Chemical Physics Letters 625 (2015) 91–97

Contents lists available at ScienceDirect

Chemical Physics Letters

journal homepage: www.elsevier.com/locate/cplett

Big data reduction by fitting mathematical functions
A search for appropriate functions to fit Ramachandran surfaces

Anita Rágyanszki a, Klára Z. Gerlei a, Attila Surányi a, András Kelemen b, Svend J. Knak Jensen c,*, Imre G. Csizmadia a,d, Béla Viskolcz a

a Department of Chemical Informatics, University of Szeged, Boldogasszony sgt. 6., H-6725 Szeged, Hungary
b Department of Applied Informatics, University of Szeged, Boldogasszony sgt. 6., H-6725 Szeged, Hungary
c Department of Chemistry, Aarhus University, Langelandsgade 140, DK-8000 Aarhus C, Denmark
d Department of Chemistry, University of Toronto, 80 St. George Street, M5S 3H6 Toronto, Ontario, Canada

* Corresponding author. E-mail address: [email protected] (S.J. Knak Jensen).

Article info

Article history:
Received 11 December 2014
In final form 17 February 2015
Available online 25 February 2015

Abstract

The potential energy surface associated with internal rotation of a pair of geminal functional groups was studied using electron structure calculations. The functional groups were attached to a methylene carbon and were chosen as saturated hydrocarbons, unsaturated hydrocarbons and heteroatom containing moieties like amide bonds in various orientations. For the majority of the studied compounds extended Fourier expansions, augmented with Gaussian functions, were needed to achieve accuracy within a few kJ/mol. The present letter aims to take the first steps of a bottom up solution for protein folding by finding the functions of small peptide residues.

© 2015 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.cplett.2015.02.031
0009-2614/© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Protein folding is one of the biggest conundrums of our century. One reason that this problem has not been solved yet is the big data associated with it. Reducing this data set is a possible approach toward simplification. In order to make these simplifications, we first have to deeply understand the mathematical properties of conformational spaces.

It has not been explored as yet how the complexity of the function must change with increasing complexity of the potential energy surface.

Conformational analysis of organic molecules was initiated before the middle of the 20th century. Simple hydrocarbons like ethane (H3C–CH3), propane (CH3–CH2–CH3) and n-butane (CH3–CH2–CH2–CH3) were the basic molecules that exhibited such structural characteristics that provided the basis of conformational analysis. Methyl rotations (–C–CH3) and ethyl rotations (–CH2–CH2–) were very similar in a variety of compounds, as illustrated in Figure 1.

The minima for methyl rotations occurred at −60°, +60° and 180° for ethane, propane and for the anti (a) orientation of n-butane. These are considered ideal values. For the gauche (g+ or g−) orientation in n-butane, the methyl rotation remained in the same vicinity (52.3°) [1], thus it is still regarded as ideal. The two methyl rotations in propane also produced an ideal surface where 3 × 3 = 9 minima occurred regularly at −60°, +60° and 180° values of the dihedral angles. In n-butane the dihedral angles associated with the rotation about the central C–C bond were −69.5°, +69.5° and 180°. On the basis of this it would have been easy to assume that this is the general rule for conformational analysis.

In 1963 Ramachandran [2] attempted to perform conformational analysis on simple peptides, and the pattern for such a double rotor, –CONH–CH2–CONH–, turned out to be quite different from the propane surface. More recently it turned out that even a compound as simple as n-pentane behaved in a non-ideal fashion [3]. Consequently, it now appears to be reasonable to assume that surface topology is a function of the rotating moieties.

To establish the topology of the surface a set of grid points needs to be computed. The locations of the minima on the surface represented by the grid points will lead to the topological image of the potential energy surface (PES). Fitting analytic functions to the grid points is a mathematical technique that has already been developed. Actually, any complete set of functions (power series [4], trigonometric or Gaussian) may be used to fit a function that describes a potential energy curve or surface. Power series can be used successfully to fit PES in the reaction subspace [5], while in the conformational subspace trigonometric functions are favoured [1].

Fitting mathematical functions to potential energy curves was investigated by several authors [6–10]. Pople and his co-workers attempted to use very simple trigonometric functions to achieve




    Figure 1. A: Modification of conformational potential energy curve by increasing degree of substitution. B: Conformational potential energy surface (PES) of propane.

    [Scheme 1: structural formulas of the general geminally substituted methane (R–CH2–R) and of compounds I (propane), II (n-pentane), III (N-acetyl-glycine N-methylamide), IV (N-acetyl-alanine N-methylamide) and V (N-acetyl-valine N-methylamide).]

    Scheme 1. Molecules studied as double rotors (φ and ψ) to generate the potential energy surface. The side chain orientation is measured by χ.


    a reasonable accuracy. In 1988, Chung pointed out that a full Fourier expansion is needed [11]. In the meantime Peterson et al. [1] started to use an extended Fourier series in which Gaussian functions augmented the trigonometric expansion. Work has also continued using rather large multi-term Fourier expansions [12].

    A variety of molecular systems were also fitted by quantitative analytic functions [13–18]. Such surfaces may be used for qualitative visualization. This is usually accomplished by 2D-contour or pseudo 3D plots. These were historically demonstrated in several cases such as CH3 [19], CH2–S(O)H [20] and CH2–S(O2)H [21].

    This process was expanded, first for one dimensional cases [22], and the present letter deals with two dimensional problems.

    2. Methods

    The choices of basis set and level of theory are crucial. However, we need to compromise our choice so that we can extend our current study to larger peptides in the future. Energies, E, associated with internal rotation were calculated quantum mechanically using the B3LYP/6-31G(d) implementation of the density functional theory in the Gaussian 09 [23,24] software package. The choice of B3LYP/6-31G(d) was inspired by the finding that it happened to give results in good agreement with results obtained using a much higher level of theory [25]. The calculations were carried out for a range of dihedral angle values, φ and ψ, of the five molecules (Scheme 1) in the interval [−180°, 180°] with grid points at 15° increments in order to generate the surface. This required 25 points along each of the two independent variables, thus a total of 25² = 625 grid points had to be computed for a given 2D surface. Figure 2 defines the nomenclature of the idealized topology of the 2D conformational potential energy surface and the Ramachandran map.

    A Levenberg–Marquardt algorithm [26], a nonlinear least square method with a local minimizer, was used to fit the functions with two independent variables, φ and ψ, to the surface data. The fits were carried out using the MATLAB [27] software. The goodness of the fit was monitored by the calculated R² and the RMSE values. Details of the formalism are found in the Supplementary Material section (Methodology 1), where RMSE shows the differences of the calculated grid points and the fitted function.

    We would like to note that our approach to study PES for peptides has proved useful for several other systems [5,28,29].

    3. Current scope and future prospective

    Finding the right methodologies for data set reduction is arduous work. When proposing such methodologies it is reasonable to start with the description of small compounds, and aim for a bottom up solution. The long term goal would require extensive research as illustrated by Scheme 2.

    Scheme 2. A schematic flow-chart showing the phases of extensive research from the optimization of fitted functions to the use of the results in a force field description.

    Mathematical software, such as Matlab, will make the least square fit of a given function to the grid points. However, it is up to the researcher to choose the explicit form of the mathematical function. The purpose of the present letter is to explore how the complexity of the mathematical function increases with the complexity of molecular structure. One needs to keep in mind two criteria:

    (i) The fitting should achieve near chemical accuracy or values somewhat better.
    (ii) The fitted function should have as few parameters as possible in order to achieve big data reduction effectively.

    In the case of a single amino acid diamide the needed 625 grid points are easily manageable. However, the problem grows exponentially with the increasing size of the peptide molecule.

    For example, using 15° increments for an oligopeptide with 10 amino acids leads to 25 points along each of the φ and ψ angles, which requires something like 10^28 grid points due to the following relationship [30] (1):

    \[ N = 25^{2n} = 625^{n} = 625^{10} \approx 10^{28} \qquad (1) \]

    The molecules that were studied are shown in Scheme 1. Of the five compounds the first two (I and II) have saturated rotating moieties attached to the central CH2 skeleton. The second set (III–V) contains amide bonds (–CONH–) with three different orientations in the rotating moieties. Compounds III–V have dissymmetrical structure: III is actually glycine diamide, –CONH–CH2–CONH–, IV is alanine diamide, –CONH–CHMe–CONH–, and V is valine diamide, –CONH–CH(CHMe2)–CONH–.

    Figure 2. Schematic topology of conformational PES map showing the −360° to 360° range of dihedral angles of double rotors. A: Hydrocarbons, B: Peptides (Ramachandran Map).

    Clearly in the first two compounds (I and II) the rotating moieties connected to the central methylene carbon are tetrahedral. In contrast, the last three compounds (III–V) have flat, trigonal planar, rotating moieties with heteroatoms.

    This set of five compounds (I–V) represents increasing complexity of the potential energy surface. Of course the increasing complexity in appearance may lead to increasing complexity of the mathematical function to be fitted.
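    The grid size quoted in the Methods section and the growth estimate of Eq. (1) can be checked with a few lines of Python. This is only an illustrative sketch: the function names `dihedral_grid` and `grid_points` are hypothetical and do not come from the paper, and the calculation assumes two backbone dihedral angles per residue, as in Eq. (1).

```python
import math

def dihedral_grid(step_deg=15, lo=-180, hi=180):
    """Dihedral values from lo to hi (both ends included), in degrees."""
    n = (hi - lo) // step_deg + 1
    return [lo + i * step_deg for i in range(n)]

def grid_points(n_residues, points_per_angle=25):
    """Grid points needed for n residues with two dihedrals (phi, psi) each."""
    return points_per_angle ** (2 * n_residues)

axis = dihedral_grid()                      # 25 values: -180, -165, ..., 180
surface = [(phi, psi) for phi in axis for psi in axis]

print(len(axis), len(surface))              # 25 625
print(f"{grid_points(10):.2e}")             # 9.09e+27, i.e. about 10^28
```

    Each added residue multiplies the number of grid points by 625, which is why a fitted function with a fixed, small number of parameters becomes attractive as the peptide grows.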

    4. Results and discussion

    The complexity of the mathematical function that describes the PES is expected to depend on the complexity of the corresponding molecular structure.

    Figure 3. Potential energy surfaces for the compounds I and II in Scheme 1 calculated for the gas phase at the level B3LYP/6-31G(d).

    Figure 4. Fully relaxed potential energy surfaces for the compounds III–V in Scheme 1 calculated for the gas phase at the level B3LYP/6-31G(d).

    Each surface had a minimal set of functions (2), which includes a summation up to 6, yielding 6 × 4 = 24 terms of simplified Fourier expansions [31] with two independent variables.

    \[ E_a(\varphi,\psi) = a_0 + \sum_{m=1}^{6} \left( a_1 \cos m\alpha\varphi + b_1 \cos m\alpha\psi + a_2 \sin m\alpha\varphi + b_2 \sin m\alpha\psi \right) \qquad (2) \]

    where a0 is the constant term in the series, α is the conversion factor from degrees to radians and m indexes the terms. To achieve higher accuracy, a set of Gaussian functions (3) was fitted to the recognizable critical points.

    \[ E_b(\varphi,\psi) = \sum_{m=1}^{9} A_m \exp\left[ -\frac{(\varphi-\varphi_{0m})^2}{2\sigma_{\varphi m}^2} - \frac{(\psi-\psi_{0m})^2}{2\sigma_{\psi m}^2} \right] \qquad (3) \]

    where m indexes the terms, A is the amplitude, φ0 and ψ0 define the center and σφ and σψ are the φ and ψ extensions of the ellipsoid.

    For the sake of the transformation of the coordinate system an extended two dimensional Fourier series (4) is needed.

    \[ E_c(\varphi,\psi) = \sum_{m=1}^{6} \big[ f_1 \cos(m\alpha\varphi + m\alpha\psi)\, f_2 \cos(m\alpha\varphi - m\alpha\psi) + f_3 \cos(m\alpha\varphi + m\alpha\psi)\, f_4 \sin(m\alpha\varphi - m\alpha\psi) + f_5 \sin(m\alpha\varphi + m\alpha\psi)\, f_6 \cos(m\alpha\varphi - m\alpha\psi) + f_7 \sin(m\alpha\varphi + m\alpha\psi)\, f_8 \sin(m\alpha\varphi - m\alpha\psi) \big] \qquad (4) \]

    Figures 3 and 4 show the conformational PES for compounds I–V. The optimized parameters of the five molecules and the accuracy of the fitted surface are listed in Tables 1 and 2. The fitted parameters are listed in the Supplementary Materials in Tables S1 and S2.

    Tables 1 and 2 also summarize the symmetry properties of the surfaces studied and the corresponding fitted functions. The inhomogeneity of the molecule's main chain reduces the number of symmetry axes and the surface becomes less symmetric; consequently, the function which describes that surface is increasingly more complicated. As a result, the R² values gradually decrease from unity and the RMSE of the fitted functions gradually increases (Table 1).

    Table 1
    Summary of the minimum energy critical points of compounds I–IV and accuracy of the surface fit. Angles are in degrees; E, RMSE and |Eopt − Efit| are in kJ mol⁻¹. The fitted function is given by the equation numbers of the terms used.

    Compound  Unique     Optimized results        Fitted results           R²     RMSE   Fitted function   Number of    |Eopt − Efit|
              minima     φ       ψ       E        φ       ψ       E                                        fitted
                                                                                                           parameters
    I         9           60.36   60.72   0.00     60.36   60.72   0.00    0.998  0.344  (2)                25          0
    II        4           64.69   89.66  14.15     67.28   89.90   9.90    0.997  0.992  (2) + (3)          70          4.25
              2           68.28   67.90   7.07     68.28   67.90   5.79                                                 1.28
              4           65.40  180.00   3.62     65.40  180.00   2.51                                                 1.11
              1          179.23  179.23   0.00    179.23  179.23   0.63                                                 0.63
    III       δL, δD     122.78   21.93  10.24    120.20   19.39  12.65    0.929  4.631  (2) + (3) + (4)   118          2.41
              βL         179.99  180.00   2.58    178.8   178.8    2.66                                                 0.08
              γL, γD      82.15   68.54   0.00     81.43   50.41   2.28                                                 2.28
    IV        δD         165.06   30.94  29.28    159.3    28.79  24.50    0.911  5.725  (2) + (3) + (4)   118          4.78
              αD          67.92   26.68  24.21     67.17   21.11  28.07                                                 3.86
              δL         126.55   21.42  13.04    117.10   19.10  14.71                                                 1.67
              γD          73.52   57.39  10.88     67.33   67.17  10.05                                                 0.83
              βL         158.33  163.41   5.97    155.50  167.00   6.06                                                 0.09
              γL          82.87   72.60   0.00     86.36   78.89   2.23                                                 2.23

    Table 2
    Summary of the minimum energy critical points of the valine diamide (V) and accuracy of the surface fit. χ indicates the orientation of the sidechain (Scheme 1). To the 625 DFT computed grid-points the fit was performed using fitting functions (2) + (3) + (4) and 118 parameters. R² and RMSE were 0.913 and 6.364 kJ mol⁻¹, respectively. Angles are in degrees; E and |Eopt − Efit| are in kJ mol⁻¹.

    Unique      Optimized results                  Fitted results           |Eopt − Efit|
    minima      φ       ψ       χ       E          φ       ψ       E
    αD (a)       62.93   39.08  164.98  29.08       60.00   38.18  31.19    2.11
    αD (g−)      50.87   39.25   68.52  29.88       52.87   38.18  31.19    1.31
    αD (g+)      50.91   42.45   69.38  34.46       52.73   41.82  35.01    0.55
    αL (a)       86.81   22.80  173.73  22.46       85.45   23.64  16.98    5.48
    βL (a)      126.58  132.80  175.43   7.31      125.50  136.40   3.95    3.36
    βL (g−)     132.08  159.92   60.83   5.11      132.70  154.50   0.24    4.87
    βL (g+)     152.50  153.74   71.60   5.68      150.90  150.90   1.45    4.23
    δD (a)      128.55   65.43  178.76  31.50      128.50   63.64  28.42    3.08
    δD (g−)     131.95   40.53   67.14  37.58      132.70   41.82  32.61    4.97
    δD (g+)     160.47   36.22   64.15  29.76      160.50   34.55  23.91    5.85
    δL (g−)     122.94   18.74   66.09  10.99      121.80   20.00  14.92    3.93
    δL (g+)     130.10   21.79   71.94  16.24      129.10   20.00  16.91    0.67
    εD (a)       77.03  142.69  177.34  40.48       78.18  143.60  43.02    2.54
    εD (g−)      68.84  174.90   41.88  44.92       67.27  172.70  47.90    2.98
    εD (g+)      74.35  164.18  102.17  47.55       74.55  165.50  49.77    2.22
    εL (g−)     122.84   18.70   66.10  10.99      121.80   20.00  13.14    2.15
    εL (g+)     130.11   21.79   71.94  16.24      129.10   20.00  16.91    0.67
    γD (a)       73.51   60.81  173.19   9.10       70.91   60.64  14.18    5.08
    γD (g−)      59.54   33.73   65.04  19.24       60.00   34.55  17.28    1.96
    γD (g+)      62.65   38.66   72.09  20.76       63.64   35.44  19.81    0.95
    γL (a)       83.49   78.96  175.19   0.00       78.18   81.82   3.28    3.28
    γL (g−)      83.88   65.26   71.82   1.85       85.45   67.27   5.81    3.96
    γL (g+)      84.28   72.33   60.61   3.74       85.45   74.55   8.15    4.41

    Tables 1 and 2 indicate that the PESs which are highly symmetrical can be fitted by a simple two dimensional Fourier expansion (2). In contrast, the PESs of the molecules that have strong intramolecular interactions can only be described with more complicated functions. The number of the Fourier expansions depends on the number of the higher maxima on the surface. If the surface has only diagonal and central symmetry, it is required to incorporate Gaussian functions (3). The number of the higher maxima and the minima determine the number of these functions.

    The molecules III–V contain a peptide bond and have a dissymmetrical structure, which is described by a surface with a symmetry center only for the glycine diamide (III). In contrast, the chiral alanine diamide (IV) and the valine diamide (V) have no symmetry whatsoever. The increasing complexity in the structure leads to increasing complexity of the fitted mathematical functions. The surfaces of glycine diamide, alanine diamide and valine diamide need 6 terms of 2 dimensional Fourier expansions, 9 terms of Gaussian functions and 6 terms of extended Fourier functions.

    The PES was originally described by 625 energy values defined by the two rotated dihedral angles. Creating a function describing the conformational space of the species drastically reduced the amount of data (Tables 1 and 2) and made it possible to perform a more involved mathematical analysis on the functions than is ever doable on a matrix of data points.

    While the Fourier expansion defines mostly the periodicity of the surface, the Gaussian functions define the minima and the maxima and the extended Fourier series defines the surface dissymmetry. It seems that the complexity of the molecule and the number of the symmetry axes specify the complexity and the accuracy of the fitted function. Table S3 illustrates this point in the cases of propane and n-pentane. The number of functions needed is mostly determined by the complexity of the molecule. Although the fitting becomes more appropriate by applying more functions, there is a limit; the fitted surfaces of simple molecules are only worsened if an excessive number of functions is used.
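    To make the fitting step concrete, the sketch below fits only the simple Fourier term of Eq. (2) to a 25 × 25 grid and reports R² and RMSE. It is a minimal illustration under stated assumptions, not the authors' procedure: the paper used a Levenberg–Marquardt fit in MATLAB, whereas this sketch swaps in ordinary linear least squares (normal equations), which is sufficient here because Eq. (2) alone is linear in its coefficients. The Gaussian terms of Eq. (3) are nonlinear in their centers and widths and would need a nonlinear optimizer. The test surface, the barrier height `V` and all function names are hypothetical.

```python
import math

DEG2RAD = math.pi / 180.0   # conversion factor from degrees to radians
M_MAX = 6                   # highest harmonic in Eq. (2)

def basis(phi, psi):
    """Basis functions of Eq. (2): constant, then cos/sin of m*phi and m*psi."""
    row = [1.0]
    for m in range(1, M_MAX + 1):
        row += [math.cos(m * DEG2RAD * phi), math.cos(m * DEG2RAD * psi),
                math.sin(m * DEG2RAD * phi), math.sin(m * DEG2RAD * psi)]
    return row

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def fit_fourier(points, energies):
    """Linear least squares through the normal equations (A^T A) c = A^T E."""
    rows = [basis(phi, psi) for phi, psi in points]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * e for r, e in zip(rows, energies)) for i in range(k)]
    return solve(ata, atb)

def goodness(points, energies, coeffs):
    """Return (R^2, RMSE) of the fitted surface evaluated on the grid."""
    pred = [sum(c * b for c, b in zip(coeffs, basis(p, s))) for p, s in points]
    mean = sum(energies) / len(energies)
    ss_res = sum((e - q) ** 2 for e, q in zip(energies, pred))
    ss_tot = sum((e - mean) ** 2 for e in energies)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(energies))

# Hypothetical propane-like surface with minima at -60, +60 and 180 degrees;
# the barrier height V (kJ/mol) is illustrative, not a value from the paper.
V = 12.0
axis = [-180 + 15 * i for i in range(25)]
grid = [(phi, psi) for phi in axis for psi in axis]
energy = [0.5 * V * (2 + math.cos(3 * DEG2RAD * p) + math.cos(3 * DEG2RAD * s))
          for p, s in grid]

coeffs = fit_fourier(grid, energy)
r2, rmse = goodness(grid, energy, coeffs)
print(len(grid), "grid points ->", len(coeffs), "parameters")
print(f"R^2 = {r2:.6f}, RMSE = {rmse:.2e} kJ/mol")
```

    The 25 coefficients returned here match the parameter count reported for propane in Table 1 (1 constant + 6 × 4 Fourier coefficients). By the same counting, 9 Gaussians with five parameters each add 45 parameters (70 in total), and 6 extended Fourier terms with eight coefficients each add 48 more (118 in total), consistent with the counts in Tables 1 and 2.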

    In the gas phase, at this level of theory, not all of the peptide minima appear on the surface. In the case of glycine diamide (Figure S1) five conformers (βL, γL, γD, δL, δD) and in the case of alanine diamide (Figure S2) six conformers (αD, βL, γL, γD, δL, δD) were located instead of the ideal nine conformers shown in Figure 1B. In the case of the valine diamide twenty three conformers (Table 2) instead of the possible twenty seven were located. Four of the minima, αL (g−), αL (g+), δL (a) and εL (a), could not be located. The comparison of the fitted surfaces suggested that for a given number of grid points, such as 25² = 625, simple hydrocarbons can be fitted more accurately than peptides. However, the Ramachandran surfaces of glycine, alanine and valine diamide can be fitted with acceptable R² values (0.91 ≤ R² ≤ 0.93). More important are the deviations of the optimized minimum energy points from the fitted ones. These are given in the last column of Tables 1 and 2. These 625 grid points can be represented by mathematical functions containing less than 120 parameters. It is hoped that this may lead to the successful big data reduction needed to understand protein folding. A fitted analytic function would reduce this big data into a more manageable set. The last column of Tables 1 and 2 shows that the maximum deviation between the optimized and the fitted energies of the minima is within 1.5 kcal/mol (about 6.3 kJ mol⁻¹).

    5. Conclusion

    The PES can be described with an accurate multi variable mathematical function. Depending on the structural complexity of the given molecule, the surface can be characterized by a combination of Fourier series and Gaussian functions.

    The molecules which have two symmetric rotational groups have surfaces with all the bilateral symmetry axes, and these can be described with a linear combination of the single Fourier series. In the case when the molecule has heteroatoms, the surface becomes less symmetric, the fitted functions are increasingly more complicated, and Gaussian functions are needed to fit the surface. If the surface has only central symmetry or has no symmetry, the fitting needs a dissymmetrical correction function, in the form of an extended Fourier series.

    The present study suggests that the potential energy surfaces and hypersurfaces of flexible molecules, such as peptides, containing several internal bond-rotations, may be reasonably well represented by these types of fitting method. For such macromolecules the grid points would be in the domain of big data.

    Acknowledgements

    The authors acknowledge the financial support within TÁMOP-4.2.2.A-11/1/KONV-2012-0047 "New functional material and their biological and environmental answers", TÁMOP-4.2.2.C-11/1/KONV-2012-0010 "Supercomputer the national virtual laboratory", HUSRB/1002/214/193 "Bile Acid Nanosystems as Molecule Carriers in Pharmaceutical Applications" and TÁMOP 4.2.4. A/2-11-1-2012-0001, "National Excellence Program: Elaborating and operating an inland student and researcher personal support system", subsidized by the European Union and co-financed by the European Social Fund.

    The authors would like to thank Milán Szőri for helpful discussions. The authors thank M. Labádi and L. Müller for the administration of the computer clusters used for this work.

    Appendix A. Supplementary data

    Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.cplett.2015.02.031.

    References

    [1] M.R. Peterson, I.G. Csizmadia, J. Am. Chem. Soc. 100 (1978) 6911.
    [2] G.N. Ramachandran, C. Ramakrishnan, V. Sasisekharan, J. Mol. Biol. 7 (1963) 95.
    [3] G. Tasi, B. Nagy, G. Matisz, T.S. Tasi, Comput. Theor. Chem. 963 (2011) 378.
    [4] D. Autrey, N. Meinander, J. Laane, J. Phys. Chem. A 108 (2003) 409.
    [5] I. Szabó, G. Czakó, Nat. Commun. 6 (2015) 5972.
    [6] J.D. Lewis, T.B. Malloy, T.H. Chao, J. Laane, J. Mol. Struct. 12 (1972) 427.
    [7] L. Radom, W.J. Hehre, J.A. Pople, J. Am. Chem. Soc. 94 (1972) 2371.
    [8] L. Radom, W.A. Lathan, W.J. Hehre, J.A. Pople, J. Am. Chem. Soc. 95 (1973) 693.
    [9] L. Radom, J.A. Pople, J. Am. Chem. Soc. 94 (1970) 4786.
    [10] M. Head-Gordon, J.A. Pople, J. Phys. Chem. 97 (1993) 1147.
    [11] A. Chung-Phillips, J. Chem. Phys. 88 (1988) 1764.
    [12] T.A.K. Kehoe, M.R. Peterson, G.A. Chass, B. Viskolcz, L. Stacho, I.G. Csizmadia, J. Mol. Struct. THEOCHEM 666–667 (2003) 79.
    [13] M.R. Peterson, I.G. Csizmadia, J. Am. Chem. Soc. 101 (1979) 1076.
    [14] T.A. Modro, W.G. Liauw, M.R. Peterson, I.G. Csizmadia, J. Chem. Soc. Perkin 2 (1979) 1432.
    [15] G.R. DeMare, O.P. Strausz, M.R. Peterson, I.G. Csizmadia, J. Comput. Chem. 1 (1980) 141.
    [16] M.R. Peterson, G.R. DeMare, I.G. Csizmadia, O.P. Strausz, J. Mol. Struct. 86 (1981) 131.
    [17] M.R. Peterson, I.G. Csizmadia, Theor. Org. Chem. 3 (1982) 190.
    [18] M.R. Peterson, G.R. DeMare, I.G. Csizmadia, O.P. Strausz, J. Mol. Struct. 92 (1983) 239.
    [19] R.E. Kari, I.G. Csizmadia, J. Chem. Phys. 1443 (1969).
    [20] S. Wolfe, A. Rauk, I.G. Csizmadia, Can. J. Chem. 113 (1969).
    [21] S. Wolfe, A. Rauk, I.G. Csizmadia, J. Am. Chem. Soc. (1969) 1567.
    [22] A. Rágyanszki, A. Surányi, I.G. Csizmadia, A. Kelemen, S.J. Knak Jensen, S.Y. Uysal, B. Viskolcz, Chem. Phys. Lett. 599 (2014) 169.
    [23] M.J. Frisch, et al., Gaussian 09, Revision A.1, Gaussian, Inc., Wallingford, CT, USA, 2009.
    [24] A.D. Becke, J. Chem. Phys. 98 (1993) 5648.
    [25] G. Endrédi, A. Perczel, O. Farkas, M.A. McAllister, G.I. Csonka, J. Ladik, I.G. Csizmadia, J. Mol. Struct. THEOCHEM 391 (1997) 15.
    [26] J.J. Moré, Lect. Notes Math. 630 (1978) 105.
    [27] MATLAB 2013b, The MathWorks, Natick, 2013.
    [28] A.J.C. Varandas, Phys. Chem. Chem. Phys. 13 (2011) 9796.
    [29] D. Autrey, N. Meinander, J. Laane, J. Phys. Chem. A 108 (2004) 409.
    [30] I. Jákli, A. Perczel, B. Viskolcz, I.G. Csizmadia, Protein Model, Springer International Publishing, 2014.
    [31] G.P. Tolstov, Fourier Series, Courier-Dover, 1976.
