“INTELLIGENT” CONSENSUS PREDICTIONS FOR DAPHNIA TOXICITY OF AGROCHEMICALS

Pathan Mohsin Khan 1, Kunal Roy 2,3, Emilio Benfenati 3

1 Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), Chunilal Bhawan, 168, Maniktala Main Road, 700054 Kolkata, India

2 Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, 188 Raja S C Mullick Road, 700032, Kolkata, India

3 Laboratory of Environmental Chemistry and Toxicology, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via La Masa, 19, 20156, Milano, Italy

[Overall Methodology, workflow figure. Molecular descriptors: Dragon + PaDEL. Validation parameters: MAE, Tropsha criteria, rm². External: Q²F1, Q²F2, rm²(test), MAE(95%). Internal: R², Q², rm²(LOO), MAE(95%).]

Introduction

Agrochemicals are a broad class of chemical products widely used in agriculture to prevent, destroy, or control harmful organisms (insects, fungi, microbes, and weeds) or diseases, or to protect crops before and after harvest in order to minimize losses or enhance yield.

Over the last few years, the ecotoxicological hazard potential of agrochemicals has received much attention from industry and regulatory agencies.

Only limited experimental ecotoxicological data are available for such compounds.

Quantitative structure-toxicity relationship (QSTR) modeling is a ligand-based statistical approach that has proved useful for filling data gaps.

In the present work, we generated QSTR models for the daphnia toxicity of different classes of agrochemicals (fungicides, herbicides, insecticides, and microbiocides) using only simple and interpretable two-dimensional descriptors, and then rigorously validated them using test set compounds. The validated individual and global models were subjected to “intelligent” consensus model generation using the ICP tool (http://dtclab.webs.com/software-tools), with the objective of improving prediction quality and reducing prediction errors. The individual and consensus models were then used to predict the toxicity of an external dataset of biocides to determine the predictive ability of the models.

According to the developed models, toxicity generally increases with lipophilicity, the number of halogen atoms (X) on an aromatic ring, the number of substituted benzene C(sp2) atoms, the number of chlorine atoms, the frequency of C-Cl at topological distance 5, the number of multiple bonds, the number of heavy atoms, the number of rotatable bonds, and carbon chain length. Toxicity decreases with polarity, the presence of an ether moiety in an aliphatic chain, the presence of two oxygen atoms at topological distance 8, molecular branching, the count of hydrogen bond acceptor atoms, and/or polar surface area.

Overall Methodology

[Workflow figure. Descriptor classes: ring descriptors, E-state descriptors, molecular properties, connectivity indices, ETA descriptors, functional group counts, 2D atom pairs. Validation steps: external set prediction (biocides dataset) and ECOSAR comparison.]

Key Points

[Figure: Summary of features responsible for the toxicity of agrochemicals]

[Figure: Comparison between our models and ECOSAR predictions]

Why Consensus?

A single model cannot guarantee the best-quality predictions for all compounds.

A single model does not cover the entire chemical space, whereas a consensus combines features from different models and therefore covers a wider range.

Consensus helps to reduce prediction errors.

Four types of consensus are proposed:

I. CM0: Simple average of predictions
II. CM1: Average of predictions from the 'qualified' individual models
III. CM2: Weighted average of predictions from the 'qualified' individual models
IV. CM3: Best selection of predictions (compound-wise) from the 'qualified' individual models
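The four consensus schemes can be illustrated with a minimal Python sketch for a single compound. This is not the ICP tool's implementation: the poster does not describe how a model becomes 'qualified' or how weights are chosen, so the boolean `qualified` flags, the per-model `errors` estimates, and the inverse-error weighting are all assumptions made for illustration.

```python
from statistics import mean


def cm0(preds):
    """CM0: simple average of the predictions from all individual models."""
    return mean(preds)


def cm1(preds, qualified):
    """CM1: average of the predictions from 'qualified' models only."""
    kept = [p for p, q in zip(preds, qualified) if q]
    return mean(kept) if kept else None


def cm2(preds, qualified, errors):
    """CM2: weighted average over 'qualified' models.

    Assumption: each weight is 1/error, so lower-error models count more.
    Errors must be positive.
    """
    kept = [(p, 1.0 / e) for p, q, e in zip(preds, qualified, errors) if q]
    if not kept:
        return None
    total_weight = sum(w for _, w in kept)
    return sum(p * w for p, w in kept) / total_weight


def cm3(preds, qualified, errors):
    """CM3: compound-wise best selection.

    Assumption: 'best' means the qualified model with the smallest
    error estimate for this compound.
    """
    kept = [(e, p) for p, q, e in zip(preds, qualified, errors) if q]
    return min(kept)[1] if kept else None


# Hypothetical pEC50 predictions for one compound from three models.
preds = [5.0, 6.0, 7.0]
qualified = [True, True, False]   # third model is not 'qualified' here
errors = [0.5, 1.0, 0.2]          # hypothetical per-model error estimates

print(cm0(preds), cm1(preds, qualified),
      cm2(preds, qualified, errors), cm3(preds, qualified, errors))
```

In this sketch a compound for which no model qualifies returns `None` under CM1 to CM3; how such compounds are actually handled is not stated in the poster.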

Error estimation and predictivity comparison

The predictions of a model are not reliable unless they are compared with standards and tested on external dataset compounds.

We employed an external dataset of 67 biocides. The prediction quality (R²pred) of the three individual models was 0.47, 0.50, and 0.47, with mean absolute errors of 1.407, 1.395, and 1.422, respectively. For consensus model 3 (CM3), R²pred was 0.49 and the mean absolute error was reduced to 1.37.
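The external-set statistics quoted above can be computed as in the following sketch. It uses the standard definitions: MAE is the mean of the absolute residuals, RMSEp is the root mean squared error of prediction, and R²pred (equivalent to Q²F1) compares the squared residuals with the squared deviations of the observed test values from the training-set mean. The function names and the numerical values are hypothetical and are not taken from the poster's datasets.

```python
import math


def mae(y_obs, y_pred):
    """Mean absolute error of the predictions."""
    return sum(abs(o - p) for o, p in zip(y_obs, y_pred)) / len(y_obs)


def rmsep(y_obs, y_pred):
    """Root mean squared error of prediction."""
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    return math.sqrt(press / len(y_obs))


def r2_pred(y_obs, y_pred, y_train_mean):
    """R2pred (Q2F1): 1 - PRESS / sum((y_obs - mean of training set)^2)."""
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_ref = sum((o - y_train_mean) ** 2 for o in y_obs)
    return 1.0 - press / ss_ref


# Hypothetical observed and predicted pEC50 values for three test compounds.
y_obs = [2.0, 4.0, 6.0]
y_pred = [2.5, 3.5, 6.5]
train_mean = 4.0  # hypothetical mean pEC50 of the training set

print(mae(y_obs, y_pred))                  # 0.5
print(rmsep(y_obs, y_pred))                # 0.5
print(r2_pred(y_obs, y_pred, train_mean))  # 0.90625
```

Because MAE and R²pred measure different things, a consensus model can lower the MAE without raising R²pred, which is the pattern reported for the biocides set.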

Prediction errors (RMSEp) were compared with those of ECOSAR, which is widely preferred for the ecotoxicological prediction of organic chemicals.

The comparison was made only on the test sets of the models.

Our models offered better predictive efficiency and a larger chemical domain.

Consensus models offered better predictivity than the simple QSTR models.

[Dataset figure: global models and individual models (fungicides, microbiocides, herbicides, and insecticides models). Set sizes listed in the figure: Ntrain = 81 and Ntest = 26; Ntrain = 36 and Ntest = 12; Ntrain = 112 and Ntest = 35; Ntrain = 111 and Ntest = 36.]

References

Waxman MF, The Agrochemical and Pesticides Safety Handbook. CRC Press, 1998.

Roy K, Kar S, Das RN, Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment. Academic Press, NY, 2015.

Roy K, Ambure P, Kar S, Ojha PK, Is it possible to improve the quality of predictions from an “intelligent” use of multiple QSAR/QSPR/QSTR models? J Chemom 32, 2018, e2992.

US EPA, The ECOSAR (ECOlogical Structure Activity Relationship) Class Program, 2012.

Acknowledgement

PMK thanks the Department of Pharmaceuticals, Ministry of Chemicals and Fertilizers, Govt. of India for a fellowship. KR thanks the European Commission for financial assistance under the project VERMEER [LIFE16 ENV/IT/000167].

[Workflow figure labels: GADCV, PLS; Ntrain = 313, Ntest = 105]

Daphnia toxicity data (pEC50 values)

Flutianil: pEC50 = 6.98
Propiconazole: pEC50 = 4.02
Pentachlorophenol: pEC50 = 5.95
Acetic acid: pEC50 = 2.08
Cyfluthrin: pEC50 = 9.23
Propylene glycol: pEC50 = 1.83
Tributyltin methacrylate: pEC50 = 2.95
Tributyltin oxide: pEC50 = 7.90
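For readers unfamiliar with the pEC50 scale, the sketch below shows the usual conversion: pEC50 is the negative base-10 logarithm of the EC50, so a higher value means higher toxicity. The poster does not state the concentration unit behind its pEC50 values; the molar unit used here is an assumption, and the function names and inputs are hypothetical.

```python
import math


def pec50_from_molar(ec50_mol_per_l):
    """pEC50 = -log10(EC50), with EC50 in mol/L (assumed unit)."""
    return -math.log10(ec50_mol_per_l)


def pec50_from_mg_per_l(ec50_mg_per_l, mol_weight):
    """Convert an EC50 in mg/L to mol/L via the molecular weight (g/mol),
    then take the negative logarithm."""
    ec50_mol_per_l = ec50_mg_per_l / 1000.0 / mol_weight
    return pec50_from_molar(ec50_mol_per_l)


# Hypothetical compound: EC50 of 1.0 mg/L and molecular weight of 100 g/mol.
print(pec50_from_mg_per_l(1.0, 100.0))
```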