
Predicting Protein Ligand Binding Affinities
Statistics Assessment

We have been given a set of data containing experimental and calculated binding affinities of different drugs with five proteins. Each protein interacts with a drug that has a common backbone; different ligands are then attached to the backbone to see how the binding affinity for that protein is affected. The data contain an experimental measurement of the binding energy for each protein-drug interaction. This is used to evaluate the various computational methodologies for calculating the binding energy, which should correlate with the experimental measurements.

Statistical Methods

Two published sets of data have been recalculated using newer versions of the software, and these are compared to see whether the new software correlates with the original. There is then a group of results that take the configurations from the programs CDOCKER and Glide and "rescore" the binding energies using various solvation models. These are compared to the experimental values to see if the rescoring is worthwhile and leads to improved results.

To compare the data, correlation tests are used. These give a value between -1 and 1, with 1 being perfectly correlated and -1 perfectly anti-correlated; values close to 0 suggest there is no correlation. Pearson r and Kendall tau are used to compare the data. Pearson r measures whether there is a linear relationship between the data. Kendall tau compares the ranks of the data, giving a high correlation if the ordering of one set matches the ordering of the other. Kendall tau was chosen over Spearman rank as the rank-ordering test because it is less sensitive to error and outliers, and gives more reliable p-values with the small sample sizes we have.

We can also use the Kruskal-Wallis test to determine whether an apparent difference in results is due to random fluctuations within one underlying distribution, or whether there is a statistically significant (p-value below a chosen threshold, usually 0.05) difference between the data. Kruskal-Wallis is a non-parametric alternative to ANOVA: unlike ANOVA, it does not assume the underlying data are normally distributed.
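As a rough illustration of these three tests (a minimal sketch in Python with SciPy, not the script actually used for this assessment), the snippet below applies them to a pair of made-up binding-energy vectors:

    # Minimal sketch (made-up numbers, not the assessment data) of the
    # three tests described above, using SciPy.
    from scipy.stats import pearsonr, kendalltau, kruskal

    # Hypothetical experimental and calculated binding energies (kcal/mol)
    experimental = [-12.3, -12.0, -9.9, -9.4, -8.9, -8.7, -8.4]
    calculated   = [-11.8, -12.5, -10.2, -8.9, -9.1, -8.0, -7.9]

    r, r_p     = pearsonr(experimental, calculated)    # linear relationship
    tau, tau_p = kendalltau(experimental, calculated)  # agreement in rank order

    # Kruskal-Wallis asks whether two (or more) groups of values could come
    # from the same underlying distribution, without assuming normality
    h, kw_p = kruskal(experimental, calculated)

    print(f"Pearson r        = {r:.3f} (p = {r_p:.3f})")
    print(f"Kendall tau      = {tau:.3f} (p = {tau_p:.3f})")
    print(f"Kruskal-Wallis H = {h:.3f} (p = {kw_p:.3f})")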

1a) Both tests are used to show how the software versions compare, with the results summarised in the following table.

                        Pearson r                Kendall tau
                        Docking      E-Novo      Docking      E-Novo
β-Secretase             0.3981015    0.7729043   0.7142857     0.5238095
Factor Xa               0.6341803    0.2333622   0.4285714     0.1774892
HIV-1 Protease          0.9832536    0.06793663  0.8974359    -0.1538462
Src Tyrosine Kinase     0.8992893    0.750692    0.6111111     0.6666667
Thrombin                0.9054913    0.6542668   0.7272727     0.5757576

The two protocols used are docking and e-novo. Docking estimates the binding energy using a simplified scoring function, while e-novo uses the MM-GBSA method to calculate the binding energy. Looking at the scatter plots helps to assess which method of comparison is preferable.

[Scatter plots comparing the published and recalculated values (docking and e-novo) for β-Secretase, Factor Xa, HIV-1 Protease, Src Tyrosine Kinase and Thrombin]

This clearly highlights that the docking versions compare very well for HIV-1 Protease. The Pearson value is higher as there is a clear linear relationship between the data, whereas the Kendall value is slightly lower because some of the data are out of order, so the ranks do not match as well. Similarly, the docking values for Src Tyrosine Kinase and Thrombin have fairly good Pearson values, while the Kendall values are lower. In the β-Secretase plots the docking values match well for the most part, whereas the e-novo values do not; because of the outlier in the docking data, the Pearson values suggest the opposite, whereas the Kendall values reinforce what the plot shows. For our purpose, where we are trying to predict which drug molecule will perform best, the order rather than the absolute value is what matters most. Kendall tau gives a better representation of the order and is less affected by outliers, so it is the better statistic to use. From the table it is clear that only the docking values for HIV-1 Protease perform well, with the β-Secretase and Thrombin docking values doing reasonably well. All other correlation between the software versions is poor, and the good value for HIV-1 Protease is probably because it is a common protein used to parameterise software, so it may have been used in the docking parameterisation and is therefore likely to give a good result.
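As a toy illustration of this point (made-up numbers, not the assessment data), a single extreme value can make Pearson r look strong even when the ordering of the remaining points barely agrees, while Kendall tau only sees the ranks:

    # Toy demonstration of outlier sensitivity: one extreme point dominates
    # Pearson r, while Kendall tau reflects the poorly matched ordering.
    from scipy.stats import pearsonr, kendalltau

    published    = [-9.9, -9.4, -8.9, -8.7, -8.4, -8.1, -20.0]   # one extreme value
    recalculated = [-8.7, -8.3, -9.6, -8.1, -9.2, -8.5, -21.0]   # otherwise scrambled order

    r, _   = pearsonr(published, recalculated)
    tau, _ = kendalltau(published, recalculated)
    print(f"Pearson r   = {r:.2f}")   # high, dominated by the outlier pair
    print(f"Kendall tau = {tau:.2f}") # low, reflecting the scrambled ordering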

1b) All the calculations are then compared to the experimental data to test their reliability at predicting the binding energies. The boxplots of all the data for β-Secretase are given below.

[Boxplots of the β-Secretase binding energies from each method: CDOCKER and Glide]

This clearly highlights that, because of the differences in the absolute values of the data, a direct comparison between the values would be futile. Therefore a correlation test is run comparing each method to the experimental results.
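A minimal sketch of how each method's values might be correlated against the experimental column for one protein is given below; the data frame, its values and the column names are hypothetical stand-ins for the real data file.

    # Sketch: correlate each method column against the experimental column.
    # The values and column names below are illustrative only.
    import pandas as pd
    from scipy.stats import pearsonr, kendalltau

    df = pd.DataFrame({
        "Experimental":       [-12.3, -11.8, -10.4, -9.9, -9.1],
        "Published.Docking.": [-45.2, -44.1, -40.3, -41.7, -38.9],
        "Prime.":             [-60.1, -63.4, -55.2, -57.8, -52.0],
    })

    for method in df.columns.drop("Experimental"):
        r, _   = pearsonr(df["Experimental"], df[method])
        tau, _ = kendalltau(df["Experimental"], df[method])
        print(f"{method:20s}  Pearson r = {r:6.3f}   Kendall tau = {tau:6.3f}")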

Pearson R for CDOCKER (BS = β-Secretase, FX = Factor Xa, HIV = HIV-1 Protease, SRC = Src Tyrosine Kinase, T = Thrombin)

                       BS            FX            HIV           SRC          T
Published.Docking.     0.150193086   0.70237753    0.7262222    -0.3249626   -0.1962866
Published.E.Novo       0.458988242   0.54217209    0.29718846    0.8725476    0.8216115
Calculated.Docking.    0.533193975   0.34251907    0.67575162   -0.1202974   -0.0426417
Calculated.E.Novo      0.308323829   0.11651234    0.05959577    0.747189     0.6644117
Prime.                 0.559628897  -0.16005358    0.3660104     0.7411163    0.4440498
GBHCT.                 0.269487855   0.28751846    0.61758725    -            0.4590031
GBOBC1.bondi.          0.255397166   0.07310325    0.34048686    -            0.3920258
GBOBC1.mbondi2.        0.226624285   0.17219549    0.53346374    -            0.3952366
GBOBC2.bondi.          0.243324324  -0.11027275    0.53783198    -            0.4900985
GBOBC2.mbondi2.        0.239919032   0.15532593    0.53816835    -            0.3878184
Gbn.                   0.008088969  -0.05589634    0.39138282    -            0.4855234

Pearson R for Glide

                       BS           FX           HIV          SRC           T
Docking.              -0.1192823    0.4192664    0.8500414    0.77022959    0.8472736
Prime.                 0.6003032    0.2534774    0.6175242    0.74445147   -0.3008123
GBHCT.                 0.4208759    0.7021079    0.6437029   -0.03389874    0.5337817
GBOBC1.bondi.          0.3661624    0.6235863    0.6646345    0.10762653    0.6577541
GBOBC1.mbondi2.        0.2843004    0.6899113    0.7471562   -0.15807626    0.5536183
GBOBC2.bondi.          0.2374968    0.6360806    0.7209065    0.05876312    0.6630854
GBOBC2.mbondi2.        0.2974548    0.582852     0.6533925    0.10162357    0.4687353
Gbn.                   0.1284518    0.6287407    0.5739456    0.19666697    0.6536765

Kendall tau for CDOCKER

                       BS            FX            HIV         SRC          T
Published.Docking.     0.04761905    0.57826634    0.6410256  -0.1714986   -0.2121212
Published.E.Novo       0.23809524    0.42174312    0.4358974   0.7431605    0.6969697
Calculated.Docking.    0.33333333    0.32609004    0.6923077  -0.1143324    0
Calculated.E.Novo      0.14285714    0.17826255    0.1025641   0.4573296    0.4545455
Prime.                 0.42857143   -0.16956682    0.4358974   0.5144958    0.3333333
GBHCT.                 0.04761905    0.23043696    0.4871795   -            0.3939394
GBOBC1.bondi.          0.04761905    0.03043507    0.3333333   -            0.2121212
GBOBC1.mbondi2.        0.04761905    0.15217535    0.4102564   -            0.2424242
GBOBC2.bondi.          0.04761905   -0.07391374    0.4615385   -            0.3030303
GBOBC2.mbondi2.        0.04761905    0.10000095    0.5897436   -            0.1515152
Gbn.                  -0.04761905   -0.04782654    0.4102564   -            0.3030303

Kendall tau for Glide

                       BS           FX          HIV         SRC          T
Docking.               0.04761905   0.1608711   0.7692308   0.6288281    0.60606061
Prime.                 0.42857143   0.1347839   0.5641026   0.6288281   -0.09090909
GBHCT.                 0.23809524   0.6130493   0.3846154   0.0571662    0.51515152
GBOBC1.bondi.          0.23809524   0.4478303   0.3076923  -0.1143324    0.51515152
GBOBC1.mbondi2.        0.23809524   0.5521791   0.4102564   0.0571662    0.45454545
GBOBC2.bondi.          0.23809524   0.5000047   0.3333333   0.0571662    0.66666667
GBOBC2.mbondi2.        0.23809524   0.5087005   0.3846154  -0.1143324    0.54545455
Gbn.                   0.14285714   0.5521791   0.2051282   0.285831     0.51515152

None of these suggest any consistent correlation across any of the methods, with the only values above 0.8 coming from the Pearson tests. As discussed earlier, these are less likely to represent the correct ordering of the binding energies, which is the most important outcome of the calculations. Looking at boxplots of these tables provides a clearer representation.

[Boxplots of the correlation values: Pearson R for CDOCKER, Pearson R for Glide, Kendall tau for CDOCKER, Kendall tau for Glide]
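As a rough sketch of how such boxplots could be generated (one box per method, showing the spread of its correlations across the five proteins), the snippet below uses only the first three rows of the Kendall tau CDOCKER table, rounded to two decimal places:

    # Sketch of the correlation boxplots, using a small subset of the
    # Kendall tau CDOCKER table above (values rounded to 2 d.p.).
    import matplotlib.pyplot as plt
    import pandas as pd

    kendall_cdocker = pd.DataFrame(
        {"BS": [0.05, 0.24, 0.33], "FX": [0.58, 0.42, 0.33],
         "HIV": [0.64, 0.44, 0.69], "SRC": [-0.17, 0.74, -0.11], "T": [-0.21, 0.70, 0.0]},
        index=["Published.Docking.", "Published.E.Novo", "Calculated.Docking."])

    # One box per method: the spread of its correlations across the proteins
    kendall_cdocker.T.boxplot(rot=90)
    plt.ylabel("Kendall tau vs experiment")
    plt.tight_layout()
    plt.show()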

Both the Pearson and Kendall results show correlation averages of roughly 0.2-0.6 for CDOCKER, while for Glide Kendall gives roughly 0.2-0.6 and Pearson 0.4-0.8. None of the methods is particularly well correlated with the experimental values, suggesting that none of them is very reliable for predicting binding energies.

1c) Looking at the boxplots, it appears that the published e-novo results are best for CDOCKER while the docking results are best for Glide, as these have the largest average correlation with the experimental energies. However, this does not indicate whether the differences are statistically significant. To take the analysis further, a new approach is taken using the ideas of the Friedman test. The Friedman test is another non-parametric alternative to ANOVA; it works with the values of the ranks themselves and compares those, rather than just comparing the rank-ordering as Kruskal-Wallis does. This has been implemented by taking the binding energies for each protocol and converting them to their rank, for example:

Ligand No.   Binding Energy   Rank
1            -9.94            3
2            -12.00           2
3            -9.43            4
4            -12.30           1
5            -8.92            5
6            -8.73            6
7            -8.37            7

The experimental rank is then subtracted from each of the calculated ranks and the absolute value of the difference taken. Any value equal to zero therefore has the correct rank, and any other value represents how far that data point is from the correct rank. A method with all its values far from zero is worse than a method with all its values close or equal to zero.
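A minimal sketch of this procedure for one protein is given below; the experimental energies are those from the example table above, while the two protocols' energies are made up.

    # Sketch of the rank-difference procedure: convert energies to ranks,
    # subtract the experimental ranks, then test the absolute differences
    # with Kruskal-Wallis. Protocol energies here are placeholders.
    import numpy as np
    from scipy.stats import rankdata, kruskal

    experimental = np.array([-9.94, -12.00, -9.43, -12.30, -8.92, -8.73, -8.37])
    protocols = {
        "Docking": np.array([-41.2, -44.8, -40.1, -43.5, -39.0, -38.2, -37.5]),
        "E.Novo":  np.array([-55.3, -60.2, -58.7, -61.0, -52.4, -51.8, -50.9]),
    }

    exp_rank = rankdata(experimental)          # most negative energy gets rank 1
    abs_diffs = {
        name: np.abs(rankdata(energies) - exp_rank)
        for name, energies in protocols.items()
    }

    h, p = kruskal(*abs_diffs.values())        # do the protocols differ?
    print(f"Kruskal-Wallis H = {h:.3f}, p = {p:.3f}")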

We can then run a Kruskal-Wallis test to examine whether these absolute rank differences differ significantly between protocols, resulting in the following p-values.

                        CDOCKER    Glide
β-Secretase             0.9696     0.9998
Factor Xa               0.01811    0.002931
HIV-1 Protease          0.339      0.7422
Src Tyrosine Kinase     0.3436     0.1267
Thrombin                0.1999     0.5764
All separate            0.3219     0.7534
All combined            0.7675

This shows that the only protein for which the methods differ significantly is Factor Xa (using 0.05 as the threshold p-value). Considering all proteins together, whether the CDOCKER and Glide data are combined or kept separate, there is no statistically significant difference between the methods. Comparing the boxplots for Factor Xa with those for the protein with the next lowest p-values, Src Tyrosine Kinase, shows how much more the Factor Xa means vary between methods.

[Boxplots of the absolute rank differences by method: Factor Xa CDOCKER, Factor Xa Glide, Src Tyrosine Kinase CDOCKER, Src Tyrosine Kinase Glide]

The mean values for Factor Xa span a wider range, suggesting the methods give significantly different results from each other. The Src Tyrosine Kinase means span a range of only about three, showing a much closer distribution and suggesting that all the methods are equivalent there. The protocols with a mean closer to zero are the better methods, as zero represents the experimental ordering; for Factor Xa these are the published docking in CDOCKER and GBHCT in Glide. This is reinforced by an ANOVA test, where the following tables summarise the difference between each method's mean and the grand mean of all the Factor Xa data (given in each table header). The more negative the value, the closer that method's mean is to zero and the better the method. The most negative values are for the published docking and GBHCT.

Factor Xa CDOCKER (grand mean = 6.475207)

Method                 Deviation from grand mean
Calculated.Docking.    -1.157
Calculated.E.Novo      -0.157
GBHCT.                 -0.8388
Gbn.                    1.0248
GBOBC1.bondi.           1.2521
GBOBC1.mbondi2.        -0.0207
GBOBC2.bondi.           1.7975
GBOBC2.mbondi2.         0.343
Prime.                  1.9793
Published.Docking.     -2.9298
Published.E.Novo       -1.2934

Factor Xa Glide (grand mean = 4.272727)

Method                 Deviation from grand mean
Docking.                2.4091
GBHCT.                 -1.5
Gbn.                   -0.9545
GBOBC1.bondi.          -0.2727
GBOBC1.mbondi2.        -0.9091
GBOBC2.bondi.          -0.4545
GBOBC2.mbondi2.        -0.6364
Prime.                  2.3182
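As a sketch of how these per-method deviations from the grand mean could be computed, the snippet below uses placeholder absolute rank differences (not the Factor Xa data):

    # Sketch: mean absolute rank difference per method, minus the grand mean.
    # The rank differences here are placeholders, not the assessment data.
    import numpy as np
    import pandas as pd

    abs_diffs = {
        "Published.Docking.": np.array([0, 1, 2, 1, 0, 3, 1]),
        "GBHCT.":             np.array([2, 0, 1, 4, 2, 1, 3]),
        "Prime.":             np.array([5, 3, 4, 2, 6, 1, 4]),
    }

    long = pd.DataFrame(
        [(m, d) for m, diffs in abs_diffs.items() for d in diffs],
        columns=["method", "abs_rank_diff"])

    grand_mean = long["abs_rank_diff"].mean()
    deviation = long.groupby("method")["abs_rank_diff"].mean() - grand_mean
    print(f"grand mean = {grand_mean:.3f}")
    print(deviation.sort_values())   # most negative = closest to the experimental order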

1d) We can now run a similar test, except that instead of grouping the data by method we group it by ligand within each protein. This helps to show whether specific ligands do badly across all the different methods. We cannot combine the data between proteins, as ligand 1 of Factor Xa does not correspond to ligand 1 of Thrombin, for example; however, we can combine all the Glide and CDOCKER data within each protein.
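A sketch of the regrouping is given below, assuming for one protein a hypothetical table of absolute rank differences with one row per ligand and one column per method:

    # Sketch: Kruskal-Wallis grouped by ligand instead of by method.
    # The table of absolute rank differences below is hypothetical.
    import pandas as pd
    from scipy.stats import kruskal

    rank_diff_table = pd.DataFrame(
        {"Published.Docking.": [0, 1, 4, 1, 0, 6, 2],
         "Prime.":             [2, 0, 5, 3, 1, 6, 0],
         "GBHCT.":             [1, 2, 6, 0, 1, 5, 3]},
        index=[f"ligand {i}" for i in range(1, 8)])

    # Each group is now one ligand's rank differences across all the methods
    groups = [row.values for _, row in rank_diff_table.iterrows()]
    h, p = kruskal(*groups)
    print(f"Kruskal-Wallis across ligands: H = {h:.3f}, p = {p:.4f}")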

The following table gives the p-values, summarising for which proteins the ligands differ significantly.

                        CDOCKER     Glide       Combined
β-Secretase             1.18E-01    2.84E-07    2.20E-16
Factor Xa               2.42E-07    5.10E-09    3.60E-13
HIV-1 Protease          1.27E-07    4.52E-09    1.26E-14
Src Tyrosine Kinase     0.03109     0.01639     0.0006762
Thrombin                1.55E-05    0.01828     1.04E-05

With the exception of β-Secretase for CDOCKER alone (p = 0.118), all of these differences are statistically significant (again using 0.05 as the threshold p-value). We can now examine the boxplots to see which ligands correlate well with the experimental ordering (mean close to zero) and which do not (mean far from zero).

[Boxplots of the absolute rank differences by ligand for each protein: β-Secretase, Factor Xa, HIV-1 Protease, Src Tyrosine Kinase, Thrombin]

The worst ligands for each protein can clearly be seen on each graph: ligand six for β-Secretase; ligands three, ten and sixteen for Factor Xa; ligand ten for HIV-1 Protease; ligand seven for Src Tyrosine Kinase; and ligand seven for Thrombin. The ANOVA test can be run again, giving the following deviations from the mean, with the most negative values marking the best-performing ligands and the most positive the worst-performing ligands.

β-Secretase (grand mean = 1.834586)
ligand      1         2         3         4         5         6         7
deviation   1.218     0.5865    -1.203    -1.0977   -0.5188   2.5338    -1.5188

Factor Xa (grand mean = 5.547847)
ligand      1         2         3         4         5         6         7         8         9         10        11
deviation   0.084     -1.1      3.978     2.057     0.136     -3.074    -2.548    -0.653    -3.706    4.768     -1.127
ligand      12        13        14        15        16        17        18        19        20        21        22
deviation   0.347     -0.285    -1.443    -0.969    4.136     -2.916    1.768     -0.048    0.557     -0.758    0.794

HIV-1 Protease (grand mean = 2.550607)
ligand      1         2         3         4         5         6         7         8         9         10        11        12        13
deviation   -1.551    0.028     1.291     -0.919    -1.709    -1.498    -0.551    -0.287    1.081     3.713     -0.498    -0.287    1.186

Src Tyrosine Kinase (grand mean = 2.299145)
ligand      1         2         3         4         5         6         7         8         9
deviation   -1.0299   -1.2991   -0.5684   -1.0299   -0.1453   0.8547    3.0855    0.5085    -0.3761

Thrombin (grand mean = 2.763158)
ligand      1         2         3         4         5         6         7         8         9         10        11        12
deviation   -0.1316   -0.8684   -0.0263   0.2368    -1.0263   1.0263    1.7105    -1.6053   1.4474    -0.9737   0.7105    -0.5

This again just gives a numerical indication of the results visible in the boxplots. The ANOVA test is not strictly applicable here, as the rank differences are not normally distributed, so the absolute values should not be relied on; they do, however, strengthen the argument and make the boxplots easier to interpret.

Conclusions

1e) These results show that there is no statistically significant difference between any of the rescored methods and the original calculations, on either the old or the new software, when compared to the experimental values. This all assumes the experimental values are correct and error-free, which of course will not be true. However, as we are principally considering whether the ordering is correct rather than the absolute values, since this is what matters when choosing which drug will work best, the experimental error will have less of an effect.

It is also clear, however, that the calculations perform differently depending on the ligand attached to the drug molecule. This is an obvious shortcoming in the protocols that is present across all the tested proteins. Future work could investigate how these ligands affect the results, as this could provide valuable insight into improving the performance of the calculations.