Consensus QSAR modeling for the toxicity of organic chemicals against Pseudokirchneriella subcapitata using 2D descriptors
Kabiruddin Khan1, Kunal Roy1*
1Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, 188 Raja S C Mullick Road, 700032, Kolkata, India
Ecotoxicity of organic chemicals (OCs)
Protocol followed
• OCs are the most extensively used class of chemicals
• Many are potential PBT (persistent, bioaccumulative and toxic) candidates
• They include pharmaceuticals, UV filters, hormones, biocides and endocrine-disrupting chemicals
• Environmental risk assessment of many OCs became mandatory in the US (since 1995) and the EU (since 1993)
• NOEC and ECx are the most widely used measures of ecotoxicity
What is NOEC?
• No-observed-effect concentration
• Highest tested concentration at which the measured endpoint shows no significant change relative to the control population
• Represents the lower threshold for toxicity
What is ECx?
• Denotes the x% effective concentration, where x% is the change in the population of the studied species (x can be 5-100%)
ECx replacing NOEC?
• OECD and SETAC recommend replacing NOEC with ECx (x = 5-20)
• At the USEPA, effluent testing relies on NOEC and EC25
• Under the EU REACH regulation, NOEC and ECx are fundamental information for indicating toxicity
• An OECD workshop held in 1996 reached consensus on the need to replace NOEC with ECx
Objectives
1. To propose robust QSAR models to calculate EC10 and EC50 concentrations in algae
2. To demonstrate the advantage of consensus modeling in error reduction
3. To identify features responsible for algal immobility
4. To fill the data gap of untested/unknown compounds using the developed models
5. To analyze the predictive efficiency of the developed models
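The models in this poster are fitted to pEC10 and pEC50 rather than to raw concentrations. The following is a minimal sketch of that conversion, assuming the usual convention pECx = -log10(ECx in mol/L). The function names and the example concentration and molecular weight are hypothetical and are not taken from the dataset.

```python
import math


def to_pec(ec_mol_per_l: float) -> float:
    """Convert an effective concentration in mol/L to the negative log scale."""
    if ec_mol_per_l <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(ec_mol_per_l)


def from_pec(pec: float) -> float:
    """Convert a pECx value back to a concentration in mol/L."""
    return 10.0 ** (-pec)


def mg_per_l_to_mol_per_l(conc_mg_per_l: float, mol_weight_g_per_mol: float) -> float:
    """Convert mg/L to mol/L using the molecular weight in g/mol."""
    return conc_mg_per_l / 1000.0 / mol_weight_g_per_mol


# Hypothetical compound: EC50 of 1.5 mg/L and molecular weight of 150 g/mol.
ec50_molar = mg_per_l_to_mol_per_l(1.5, 150.0)
print(f"pEC50 = {to_pec(ec50_molar):.2f}")
```

On this scale a larger pEC value means a lower effective concentration, that is, a more toxic compound.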
Applicability domain assessment pEC50
References
Reduction in MAE (mean absolute error) and predictivity enhancement
Data gap filling
Conclusion
pEC50 models: equations (pEC50Algae in mol/L)
1   pEC50Algae = 3.67917 + 0.02888 × CrippenMR + 0.39725 × MLOGP + 0.60323 × B06[C-N] + 0.78557 × B05[C-Cl] + 1.16506 × B02[N-S] − 1.46683 × nSO2OH
2   pEC50Algae = 3.90575 + 0.02754 × CrippenMR + 0.2973 × Log Kow + 0.67324 × B06[C-N] + 0.83114 × B05[C-Cl] + 1.12005 × B02[N-S] − 1.05917 × nSO2OH
3   pEC50Algae = 3.90595 + 0.03373 × CrippenMR + 0.28608 × Log Kow + 0.96872 × B05[C-Cl] + 1.07463 × B02[N-S] − 0.20679 × H-051 − 1.23943 × nSO2OH
4   pEC50Algae = 3.943 + 0.02707 × CrippenMR + 0.30156 × Log Kow + 0.67215 × B06[C-N] + 0.81936 × B05[C-Cl] + 0.61348 × F02[N-S] − 1.15045 × nSO2OH
5   pEC50Algae = 3.92235 + 0.03364 × CrippenMR + 0.28524 × Log Kow + 0.96588 × B05[C-Cl] + 0.65962 × F02[N-S] − 0.20618 × H-051 − 1.23579 × nSO2OH
CM0 Average of predictions from all input individual models
CM1 Average of predictions from 'qualified' individual models
CM2 Weighted average predictions from 'qualified' individual models
CM3 Best selection of predictions (compound-wise) from 'qualified' individual models [Winner Model]

pEC50 models: statistics
M. no  Div  LV  R²    Q²    MAEtrain100%  MAEtrain95%  Prediction qualitya  Q²F1  MAEtest100%  MAEtest95%  Prediction qualitya
1      ED   3   0.72  0.70  0.69          0.62         Moderate             0.70  0.67         0.60        Good
2      ED   3   0.72  0.70  0.70          0.63         Moderate             0.71  0.63         0.56        Moderate
3      ED   3   0.72  0.70  0.69          0.62         Moderate             0.71  0.64         0.57        Good
4      ED   3   0.71  0.70  0.70          0.63         Moderate             0.70  0.64         0.57        Moderate
5      ED   3   0.71  0.70  0.70          0.63         Moderate             0.70  0.65         0.59        Moderate
CM0    -    -   -     -     -             -            -                    0.72  0.62         0.55        Good
CM1    -    -   -     -     -             -            -                    0.72  0.62         0.55        Good
CM2    -    -   -     -     -             -            -                    0.72  0.62         0.54        Good
CM3    -    -   -     -     -             -            -                    0.71  0.61         0.53        Good
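Each individual model above is a linear equation in a handful of 2D descriptors, so a prediction is just the intercept plus a weighted sum. The sketch below evaluates pEC50 Model 1 with the coefficients listed above. The function name and the descriptor values of the example compound are hypothetical; in practice the values would come from descriptor-calculation software.

```python
# Coefficients of pEC50 Model 1 as listed in the poster.
MODEL_1 = {
    "intercept": 3.67917,
    "CrippenMR": 0.02888,
    "MLOGP": 0.39725,
    "B06[C-N]": 0.60323,
    "B05[C-Cl]": 0.78557,
    "B02[N-S]": 1.16506,
    "nSO2OH": -1.46683,
}


def predict_pec50(descriptors, model=MODEL_1):
    """Return intercept + sum(coefficient * descriptor value)."""
    names = [name for name in model if name != "intercept"]
    missing = [name for name in names if name not in descriptors]
    if missing:
        raise KeyError(f"missing descriptors: {missing}")
    return model["intercept"] + sum(model[name] * descriptors[name] for name in names)


# Hypothetical compound (made-up descriptor values, for illustration only).
example = {
    "CrippenMR": 40.0,
    "MLOGP": 2.5,
    "B06[C-N]": 0,
    "B05[C-Cl]": 1,
    "B02[N-S]": 0,
    "nSO2OH": 0,
}
print(f"predicted pEC50 = {predict_pec50(example):.3f}")
```

The same function works for the other individual models once their coefficient dictionaries are written out in the same way.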
pEC10 models: equations (pEC10Algae in mol/L)
1   pEC10Algae = 1.93569 + 0.02315 × CrippenMR + 0.00496 × MW + 0.3396 × MLOGP + 0.64439 × Log Kow − 1.71498 × nSO2OH + 1.48957 × B02[N-S] − 2.24679 × X3A − 0.17172 × H-051
2   pEC10Algae = 1.93569 + 0.02315 × CrippenMR + 0.00496 × MW + 0.3396 × MLOGP + 0.64439 × B10[C-N] − 1.71498 × nSO2OH + 1.48957 × F01[C-C] − 2.24679 × X3A − 0.17172 × H-051
3   pEC10Algae = 1.93569 + 0.02315 × CrippenMR + 0.00496 × MW + 0.3396 × MLOGP + 0.64439 × B10[C-N] − 1.71498 × nSO2OH + 1.48957 × B02[N-S] − 2.24679 × X3A − 0.17172 × H-051
4   pEC10Algae = 1.93569 + 0.02315 × CrippenMR + 0.00496 × MW + 0.3396 × MLOGP + 0.64439 × Log Kow − 1.71498 × nSO2OH + 1.48957 × F01[C-C] − 2.24679 × X3A − 0.17172 × H-051
5   pEC10Algae = 1.71202 + 0.02658 × CrippenMR + 0.00556 × MW + 0.349452 × MLOGP − 1.8637 × nSO2OH + 1.26739 × F01[C-C] − 1.98596 × X3A − 0.17195 × H-051 − 0.01353 × F04[C-S]
CM0 Average of predictions from all input individual models [Winner Model]
CM1 Average of predictions from 'qualified' individual models
CM2 Weighted average predictions from 'qualified' individual models
CM3 Best selection of predictions (compound-wise) from 'qualified' individual models

pEC10 models: statistics
M. no  Div  LV  R²    Q²    MAEtrain100%  MAEtrain95%  Prediction qualitya  Q²F1  MAEtest100%  MAEtest95%  Prediction qualitya
1      SR   2   0.70  0.68  0.70          0.62         Moderate             0.77  0.61         0.56        Good
2      SR   2   0.70  0.68  0.70          0.62         Moderate             0.77  0.61         0.56        Moderate
3      SR   2   0.70  0.68  0.70          0.62         Moderate             0.77  0.61         0.56        Good
4      SR   2   0.70  0.68  0.70          0.62         Moderate             0.77  0.61         0.56        Good
5      SR   2   0.70  0.68  0.71          0.63         Moderate             0.76  0.62         0.57        Good
CM0    -    -   -     -     -             -            -                    0.77  0.61         0.55        Good
CM1    -    -   -     -     -             -            -                    0.76  0.62         0.56        Good
CM2    -    -   -     -     -             -            -                    0.76  0.62         0.56        Good
CM3    -    -   -     -     -             -            -                    0.76  0.63         0.57        Moderate
Acknowledgement
• KK thanks the Indian Council of Medical Research, New Delhi, for financial support in the form of a senior research fellowship.
Conclusion
• Externally validated consensus QSAR models (pEC10, pEC50) are proposed to predict the acute toxicity of organic compounds in the green alga Pseudokirchneriella subcapitata
• The models were validated rigorously following the OECD guidelines for QSAR validation
• The models were developed using the genetic algorithm-partial least squares (GA-PLS) technique followed by a consensus approach
• The QSAR models have a large applicability domain and were verified with different internationally accepted validation metrics
• The developed QSAR models point strongly to the logP dependence of the toxicity of organic chemicals
• Variables such as MLOGP, MR and Log Kow (observed lipophilicity) make the highest positive contributions to aquatic toxicity
• More polar groups such as oxygen (nSO2OH moiety) and alpha hydrogen (H-051 moiety) showed an inverse correlation with algal toxicity
• The results emphasize the use of consensus modeling and its application in reducing prediction error
• The obtained QSAR models can help identify and prioritize chemicals of highest concern and support the production of safer alternatives, in order to comply with REACH regulations on hazardous chemicals
• The modeling approach will be useful for designing safer replacements for organic compounds toxic to green algae, and the developed models can be applied to detect a priori, from chemical structures, potential contaminants toxic mainly to Pseudokirchneriella subcapitata
Data gap filling
• Consensus models (CM3) were applied to predict 64 additional synthetic organic compounds for which no absolute observed responses were reported in the source paper of Kusk et al.
• First, the five individual PLS models (pEC50) were used to derive predicted acute toxicity values for the 64 compounds; the models could predict all 64 compounds with the confidence suggested by the "Prediction reliability indicator" (PRI) tool
• In the second step, consensus model 3 (CM3) was used to estimate the acute toxicities of the 64 compounds, and the predictions were compared with the range values provided in the manuscript
• The model predicted 53 of the 64 compounds (82%) within ±2 log units of the lower range value and 51 of the 64 compounds (80%) within ±2 log units of the higher range value
• In the external set, 37 molecules (53%) were predicted with a deviation of less than 1 log unit
• Seven molecules were outside the domain of predictions as given by the PRI tool
• The precise prediction of the additional data points supports the value of the developed QSAR models, the consensus approach and the practical application of the PRI tool in predicting unknown/untested chemicals
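The comparison against reported range values can be sketched as a simple count of predictions that fall within a tolerance of each range bound. This is only an illustration: it assumes the bounds are expressed on the same pEC50 scale as the predictions and that the ±2 log unit limit is inclusive, and all numbers and names below are hypothetical.

```python
def count_within(predicted, reference, tol=2.0):
    """Count predictions whose absolute deviation from the reference is <= tol."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return sum(1 for p, r in zip(predicted, reference) if abs(p - r) <= tol)


# Hypothetical consensus predictions and range bounds (log units).
predicted = [4.2, 5.1, 3.0, 6.8]
lower_bound = [3.0, 5.0, 5.5, 6.0]
upper_bound = [4.0, 6.0, 6.5, 9.5]

hits_lower = count_within(predicted, lower_bound)
hits_upper = count_within(predicted, upper_bound)
n = len(predicted)
print(f"within 2 log units of lower bound: {hits_lower}/{n} ({100 * hits_lower / n:.0f}%)")
print(f"within 2 log units of upper bound: {hits_upper}/{n} ({100 * hits_upper / n:.0f}%)")
```

Passing `tol=1.0` gives the stricter "less than about 1 log unit" style of summary used for the external set.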
Reduction in MAE (mean absolute error) and predictivity enhancement
• Predictions derived from multiple models are more reliable than those from a single QSAR model, as the former can cover a larger number of features
• The multiple PLS models developed for each endpoint (pEC10, pEC50) were further subjected to consensus modeling using the Intelligent Consensus Predictor tool (available at http://teqip.jdvu.ac.in/QSAR_Tools/) in order to reduce test set errors, measured as MAE (MAEtest100%, MAEtest95%), and to enhance the prediction performance of the models
• Four types of consensus modeling were carried out in the present study (Roy et al., 2018): CM0 (simple average of predictions), CM1 (average of predictions from the 'qualified' individual models), CM2 (weighted average predictions (WAPs) from the 'qualified' individual models) and CM3 (best selection of predictions, compound-wise, from the 'qualified' individual models). A model is 'qualified' for a given test set compound when the training set contains at least 3 compounds similar to it
• In both cases, the consensus models were better in quality than the individual models
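The simplest of the four schemes, CM0, and the MAE-based error measures can be sketched as follows. This is not the Intelligent Consensus Predictor tool itself: the function names and data are hypothetical, and the reading of MAE95% as the MAE after discarding the 5% of compounds with the largest absolute residuals is an assumption, since the poster does not define it.

```python
from statistics import mean


def cm0(predictions_by_model):
    """CM0: simple average of the individual model predictions, per compound."""
    return [mean(values) for values in zip(*predictions_by_model)]


def mae(observed, predicted):
    """Mean absolute error over all compounds (MAE100%)."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def mae_trimmed(observed, predicted, drop_percent=5):
    """MAE after dropping the given percentage of largest residuals (assumed MAE95%)."""
    residuals = sorted(abs(o - p) for o, p in zip(observed, predicted))
    n_drop = len(residuals) * drop_percent // 100
    return mean(residuals[: len(residuals) - n_drop])


# Hypothetical observed values and predictions from two individual models.
observed = [5.0, 4.0, 6.0, 3.0]
model_a = [5.2, 4.1, 5.0, 3.3]
model_b = [4.8, 3.9, 5.4, 3.1]

consensus = cm0([model_a, model_b])
print("MAE model A:", round(mae(observed, model_a), 3))
print("MAE model B:", round(mae(observed, model_b), 3))
print("MAE CM0:    ", round(mae(observed, consensus), 3))
```

CM1 to CM3 additionally need a per-compound rule for which models are 'qualified' and, for CM2, a weighting scheme; those details are not given in the poster and are left out of this sketch.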
Applicability domain assessment pEC50
• The AD of the developed models was assessed using the DModX approach implemented in the SIMCA-P tool, as proposed by Wold et al. (2001)
• For the pEC50 models, 22 outliers were present in the training set and 5 molecules in the test set were outside the domain of applicability
• Interestingly, 20 outliers were common to the models for both endpoints: 15 (an allergen), 54 (cyclohexane), 57 (insecticide), 90 (an intermediate), 115 (a hazardous chemical), 136 (personal care product (PCP)), 143 (surfactant), 163 (flavoring agent), 172 (contaminant), 266 (reaction intermediate), 267 (a contaminant), 280 (flavoring agent), 285 (flavoring agent), 294 (a hazardous chemical), 331 (PCP ingredient), 350 (herbicide), 352 (a hazardous chemical), 369 (contaminant), 370 (pharmaceutical) and 372 (a contaminant)
Applicability domain assessment pEC10
• The models developed using the pEC10 data gave 22 outliers in the training set, and 5 chemicals in the test set were outside the domain of applicability
• The outliers were retained in the final models because the respective models predicted them with moderate precision (absolute predicted residual < 2 log units)
References
1. K.O. Kusk, A.M. Christensen, and N. Nyholm, Algal growth inhibition test results of 425 organic chemical substances. Chemosphere 204 (2018), pp. 405-412.
2. A. Beasley, S.E. Belanger, J.L. Brill, and R.R. Otter, Evaluation and comparison of the relationship between NOEC and EC10 or EC20 values in chronic Daphnia toxicity testing. Environ Toxicol Chem. 34 (2015), pp. 2378-2384.
3. K. Roy, and I. Mitra, On various metrics used for validation of predictive QSAR models with applications in virtual screening and focused library design. Combinatorial Chemistry & High Throughput Screening 14 (2011), pp. 450-474.
4. C.W. Yap, PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints. J Comput Chem. 32 (2011), pp. 1466-1474.
5. H. Golmohammadi, Z. Dashtbozorgi, and W.E. Acree Jr, Quantitative structure–activity relationship prediction of blood-to-brain partitioning behavior using support vector machine. European Journal of Pharmaceutical Sciences 47 (2012), pp. 421-429.
6. R.W. Kennard, and L.A. Stone, Computer aided design of experiments. Technometrics 11 (1969), pp. 137-148.
7. P.M. Khan, and K. Roy, Current approaches for choosing feature selection and learning algorithms in quantitative structure-activity relationships (QSAR). Expert Opin Drug Discov. 13 (2018), pp. 1075-1089.
8. K. Roy, P. Ambure, S. Kar, and P.K. Ojha, Is it possible to improve the quality of predictions from an "intelligent" use of multiple QSAR/QSPR/QSTR models? J Chemom. 32 (2018), p. e2992.
9. K. Roy, P. Ambure, and S. Kar, How Precise Are Our Quantitative Structure–Activity Relationship Derived Predictions for New Query Chemicals? ACS Omega 3 (2018), pp. 11392-11406.
Data curation for model development
Results and Analysis (process, metrics and output)
[Figure panels: Process; Metrics (pEC50); Metrics (pEC10); Output]
[Figure: Summary of features present in organic chemicals responsible for enhancing and reducing algal toxicity]
Table caption: Statistical qualities of all five models developed against algae species using the pEC50 endpoint. a MAE criteria
Table caption: Statistical qualities of all five models developed against algae species using the pEC10 endpoint. a MAE criteria
Mechanistic interpretation pEC10
As in the pEC50 QSAR models, molar refractivity, lipophilicity and the presence of sulfur/nitrogen (captured by the CrippenMR, MLOGP, B02[N-S], B10[C-N], F01[C-C] and F04[C-S] descriptors) enhanced pEC10 values, while the presence of more polar atoms such as oxygen (nSO2OH moiety) reduced them. Molecular weight was a significant descriptor, appearing in all five QSAR models with a positive contribution to pEC10. Among the other descriptors, B10[C-N], F01[C-C] and F04[C-S] contributed positively, while X3A contributed negatively towards pEC10.
Mechanistic interpretation pEC50
CrippenMR (molar refractivity) gives a measure of molecular size, volume and polarizability. The first two attributes enhance the lipophilicity of a molecule, while the last has an electronic component; in other words, an increase in MR does not always increase the size or lipophilicity of a molecule. Molecules with higher molar refractivity were generally toxic irrespective of their logP values. The highly toxic compounds with higher molar refractivity include 253 (a pharmaceutical), 72 (a cationic surfactant), 234 (a bulky molecule) and 317 (Amycin, a pharmaceutical). Compounds 72 (a cationic surfactant), 317 (a pharmaceutical) and 386 (a herbicide) are toxic molecules with high MR values (128, 111, 94) but very low logP values (1.18, -0.87, 1.67). The modeled dataset covers a wide range of log Kow/MLOGP values, from -6.7 to 7.2, and therefore a large chemical space.
Mechanistic interpretation pEC50 Mechanistic interpretation pEC10
Data curation steps (protocol followed):
1. Structural data collection
2. Remove salts
3. Remove mixtures
4. Identify organic chemicals
5. Optimize and normalize structures
6. Aromatize, add explicit hydrogens and save