Organic pollutants environmental fate: modeling and prediction of global persistence by molecular descriptors
P.Gramatica, F.Consolaro and M.Pavan
QSAR Research Unit, Dept. of Structural and Functional Biology, University of Insubria, Varese, Italy
e-mail: [email protected]
Web: http://fisio.dipbsf.uninsubria.it/dbsf/qsar/QSAR.html
INTRODUCTION
The persistence of organic compounds in various environmental compartments is mainly governed by the rates at which they are removed by chemical and/or physical processes. Half-life in each compartment is the most commonly used criterion for studying persistence, but such data are available for only a few organic compounds, vary greatly between compartments and depend on laboratory tests. As most literature data are reported as ranges of values, we used the average values as input data in QSPR studies.
Validated OLS regression models have been developed using different theoretical molecular descriptors to predict mean half-life values in the atmosphere, soil, surface water and groundwater for more than 90 supposed POPs of different chemical classes (pesticides, PAHs, PCBs, etc.). All the regression models have been validated by cross-validation, and the reliability of the predicted data has been checked by the leverage procedure.
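The validation steps named above (OLS fit, leave-one-out cross-validation, leverage check) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' software: the function name `ols_loo_leverage` is hypothetical, and the warning leverage of 3p'/n (p' = number of model parameters) is an assumed, commonly used cut-off that the poster does not state.

```python
import numpy as np

def ols_loo_leverage(X, y):
    """Fit OLS with intercept; return coefficients, R2, Q2(LOO) and leverages."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])     # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    H = A @ np.linalg.pinv(A)                     # hat (projection) matrix
    h = np.diag(H)                                # leverage of each object
    resid = y - A @ coef
    loo_resid = resid / (1.0 - h)                 # exact LOO residuals for OLS
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / tss
    q2 = 1.0 - np.sum(loo_resid ** 2) / tss
    return coef, r2, q2, h

# Synthetic data for illustration only (not the poster's data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 1.0 + X @ np.array([0.8, -0.5]) + 0.1 * rng.normal(size=30)

coef, r2, q2, h = ols_loo_leverage(X, y)
h_star = 3 * (X.shape[1] + 1) / len(y)  # assumed warning leverage, 3p'/n
print(f"R2 = {r2:.3f}, Q2(LOO) = {q2:.3f}")
print("objects above the warning leverage:", np.flatnonzero(h > h_star))
```

Objects whose leverage exceeds the cut-off lie far from the bulk of the training set in descriptor space, so their predictions are extrapolations and should be treated with caution.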
These predicted values are obviously not the real half-lives but reasonable estimates. They have been used together in a Principal Component Analysis to produce useful indexes of POP persistence: PC1 as a global persistence index and PC2 as a compartment-related persistence index. These two indexes have also been modeled, allowing fast screening and ranking of organic compounds by their persistence.
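As a rough sketch of how PC scores can be obtained from a table of half-lives (rows = compounds, columns = air, surface water, groundwater, soil), the example below runs a PCA through the singular value decomposition. Autoscaling of the columns is an assumption, the function name `pca_scores` is hypothetical, and the data are synthetic.

```python
import numpy as np

def pca_scores(data):
    """Autoscale the columns, then return scores, loadings and explained variance (%)."""
    X = np.asarray(data, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscaling (assumed)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s                                    # object coordinates on the PCs
    loadings = Vt.T                                   # variable contributions to the PCs
    explained = 100.0 * s ** 2 / np.sum(s ** 2)
    return scores, loadings, explained

# Synthetic log half-lives for 20 hypothetical compounds in 4 compartments.
rng = np.random.default_rng(0)
common = rng.normal(size=(20, 1))                     # shared "persistence" factor
data = common + 0.5 * rng.normal(size=(20, 4))

scores, loadings, explained = pca_scores(data)
print("explained variance (%):", np.round(explained, 1))
print("PC1 scores of the first three compounds:", np.round(scores[:3, 0], 2))
```

The first column of `scores` plays the role of a global index and the second that of a contrast between compartments; their signs are arbitrary, as in any PCA.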
DATA SET
Our data set consists of 33 organic pollutants, mainly supposed POPs, for which half-life values in air, surface water, groundwater and soil have been collected from Howard [2] and Rodan [3]. It must be emphasized that these values are subject to considerable variation, so presenting a single value is over-simplistic; for this reason we considered the mean value of the half-life range. The mean values were always transformed into logarithmic units to linearize the experimental range of variation.
[2] Howard, P.H. et al. Handbook of Environmental Degradation Rates (1991).
[3] Rodan, B.D. et al. Screening for Persistent Organic Pollutants: Techniques To Provide a Scientific Basis for POPs Criteria in International Negotiations. Environ. Sci. Technol., 33(20), 3482-3488 (1999).
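The preprocessing described in the DATA SET paragraph (mean of the reported range, then a logarithmic transform) can be sketched as below. The arithmetic mean and base-10 logarithm are assumptions, since the poster states neither; the function name and the example range are hypothetical.

```python
import math

def log_mean_half_life(low, high):
    """Return log10 of the arithmetic mean of a reported half-life range."""
    if low <= 0 or high <= 0:
        raise ValueError("half-life bounds must be positive")
    return math.log10((low + high) / 2.0)

# Hypothetical range for illustration only (not taken from the data set):
# a half-life reported as 10 to 1000 hours.
value = log_mean_half_life(10.0, 1000.0)
print(round(value, 3))
```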
MOLECULAR DESCRIPTORS
The molecular structure has been represented by a wide set of molecular descriptors (about 170) calculated by the software DRAGON 1.0 of R. Todeschini (http://www.disat.unimib.it/chm):
• mono-dimensional: counts and fragments descriptors
• two-dimensional: topological descriptors
• three-dimensional: 3D-WHIM (Weighted Holistic Invariant Molecular) [1]
[1] R. Todeschini and P. Gramatica, 3D-modelling and prediction by WHIM descriptors. Part 5. Theory development and chemical meaning of the WHIM descriptors, Quant. Struct.-Act. Relat., 16 (1997) 113-119.
Are half-life ranges usefully QSPR-modeled and predicted by theoretical molecular descriptors?
n.mod  response       n.obj.  n.var.  variables                Q2 LOO  Q2 LMO (20%)  R2
1      groundwater    24      3       nC-E2m-E1e               89.2    88.8          91.9
2      surface water  33      5       nR09-ROUV-IDMT-E1p-Vu    72.2    70.5          81.2
3      soil           30      3       IDM-E2m-G2e              77.8    76.9          83.2
4      air            32      4       nR07-BAL-UI-E1e          83.8    81.8          87.2
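The Q2 LMO (20%) column refers to leave-many-out cross-validation with 20% of the objects excluded at a time. A minimal sketch of one common variant (repeated random exclusion) is given below; the number of repetitions, the pooling of PRESS over rounds, the function name `q2_lmo` and the synthetic data are all assumptions, not details taken from the poster.

```python
import numpy as np

def q2_lmo(X, y, leave_out=0.2, n_rounds=200, seed=0):
    """Leave-many-out Q2: repeatedly refit OLS without a random fraction of objects."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept
    k = max(1, int(round(leave_out * n)))         # objects excluded per round
    rng = np.random.default_rng(seed)
    press = 0.0
    tss = 0.0
    for _ in range(n_rounds):
        test = rng.choice(n, size=k, replace=False)
        train = np.setdiff1d(np.arange(n), test)
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        pred = A[test] @ coef                     # predictions for excluded objects
        press += np.sum((y[test] - pred) ** 2)
        tss += np.sum((y[test] - y[train].mean()) ** 2)
    return 1.0 - press / tss

# Synthetic data for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=30)
q2 = q2_lmo(X, y)
print(f"Q2 LMO (20%) = {100 * q2:.1f}")
```

A Q2 LMO close to Q2 LOO, as in the table, suggests that the model does not depend strongly on any small group of objects.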
Log h.l. air = -16.95 + 1.46 nR07 + 3.51 BAL + 0.73 UI + 15.82 E1e
nR07: number of rings with 7 atoms
BAL: Balaban index
UI: unsaturation index
E1e: directional WHIM
Log h.l. groundwater = -0.04 + 0.09 nC + 1.14 E2m + 6.88 E1e
nC: number of carbon atoms
E2m, E1e: directional WHIMs
[Figure: GROUND WATER. Predicted half life plotted against experimental half life; points are labeled with compound numbers.]
Log h.l. surf. water = -0.66 + 0.47 nR09 - 0.06 ROUV + 0.01 IDMT + 14.37 E1p + 0.09 Vu
nR09: number of rings with 9 atoms
ROUV: Rouvray index
IDMT: total information index on the distance magnitude
E1p: atomic distribution directional WHIM
Vu: global shape and dimensional WHIM
[Figure: SURFACE WATER. Predicted half life plotted against experimental half life; points are labeled with compound numbers.]
Log h.l. soil = -3.46 + 0.58 IDM + 0.99 E2m + 0.48 G2e
IDM: mean information index on the distance magnitude
E2m, G2e: directional WHIMs
[Figure: SOIL. Predicted half life plotted against experimental half life; points are labeled with compound numbers.]
[Figure: AIR. Predicted half life plotted against experimental half life; points are labeled with compound numbers.]
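To show how the four regression equations above would be applied to a new compound, the sketch below stores their coefficients and evaluates one model. The coefficients are copied from the equations; the descriptor values, the dictionary layout and the function name are hypothetical, and real descriptor values would have to come from the descriptor software.

```python
# Coefficients copied from the four equations above; "const" is the intercept.
MODELS = {
    "air": {"const": -16.95, "nR07": 1.46, "BAL": 3.51, "UI": 0.73, "E1e": 15.82},
    "groundwater": {"const": -0.04, "nC": 0.09, "E2m": 1.14, "E1e": 6.88},
    "surface water": {"const": -0.66, "nR09": 0.47, "ROUV": -0.06,
                      "IDMT": 0.01, "E1p": 14.37, "Vu": 0.09},
    "soil": {"const": -3.46, "IDM": 0.58, "E2m": 0.99, "G2e": 0.48},
}

def predict_log_half_life(compartment, descriptors):
    """Evaluate the linear model of one compartment for a dict of descriptor values."""
    model = MODELS[compartment]
    return model["const"] + sum(
        coef * descriptors[name]
        for name, coef in model.items()
        if name != "const"
    )

# Hypothetical descriptor values, for illustration only.
example = {"nC": 12, "E2m": 0.4, "E1e": 0.5}
print(round(predict_log_half_life("groundwater", example), 3))
```

A missing descriptor raises `KeyError`, which is preferable here to silently treating it as zero.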
The PC scores have been used as indexes of POP persistence: PC1 (EV% = 51.2) as a global persistence index and PC2 (EV% = 28.3) as a compartment-related persistence index.
These two indexes have also been modeled by molecular descriptors selected by Genetic Algorithm, with satisfactory predictive power; this allows fast screening and ranking of organic compounds by their persistence. The data predicted by this QSPR approach, based on a few descriptors of the molecular structure, could be usefully applied in modelling the environmental fate of organic pollutants, including chemicals not yet synthesised.
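Descriptor selection by Genetic Algorithm can be illustrated with the toy sketch below, which searches for small descriptor subsets that maximise Q2 LOO of an OLS model. Every design choice here (population size, truncation selection, uniform crossover, mutation rate, the limit of four descriptors, the function names) is an assumption made for illustration; the poster does not describe the settings actually used.

```python
import numpy as np

def q2_loo(X, y):
    """Leave-one-out Q2 of an OLS model with intercept, via the hat matrix."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)
    loo_resid = (y - H @ y) / (1.0 - np.diag(H))
    return 1.0 - np.sum(loo_resid ** 2) / np.sum((y - y.mean()) ** 2)

def ga_select(X, y, max_vars=4, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Toy GA over boolean descriptor masks; fitness is Q2 LOO."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]

    def fitness(mask):
        k = mask.sum()
        if k == 0 or k > max_vars:
            return -np.inf                      # reject empty or oversized models
        return q2_loo(X[:, mask], y)

    def random_mask():
        mask = np.zeros(n_desc, dtype=bool)
        size = rng.integers(1, max_vars + 1)    # between 1 and max_vars descriptors
        mask[rng.choice(n_desc, size=size, replace=False)] = True
        return mask

    pop = [random_mask() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]       # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            take_first = rng.random(n_desc) < 0.5             # uniform crossover
            child = np.where(take_first, parents[i], parents[j])
            child ^= rng.random(n_desc) < p_mut               # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return np.flatnonzero(best), fitness(best)

# Synthetic data: only columns 2 and 7 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 10))
y = 5.0 * X[:, 2] + 1.0 * X[:, 7] + 0.1 * rng.normal(size=40)
selected, q2 = ga_select(X, y)
print("selected descriptor columns:", selected, "Q2 LOO:", round(q2, 3))
```

Using a cross-validated statistic as the fitness, rather than R2, penalises subsets that only fit noise.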
PC1 and PC2 scores as persistence indexes
PC1 (overall persistence index) = 9.22 + 3.14 AAC - 6.32 E2s - 17.49 E1e - 0.16 Tm
n = 91  R2 = 85.1  Q2 LOO = 82.6  Q2 LMO = 82.2  s = 0.565  F 86 = 122.878  SDEC = 0.549  SDEP = 0.595
PC2 (media persistence index) = 10.31 - 8.29 IDE - 0.48 G2p + 9.93 E1p + 5.46 Ks + 0.09 Ve
n = 91  R2 = 78.9  Q2 LOO = 75.1  Q2 LMO = 74.5  s = 0.502  F6, 85 = 63.762  SDEC = 0.485  SDEP = 0.527
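The reported statistics can be cross-checked against each other. Assuming the usual definitions SDEC = sqrt(RSS/n) and s = sqrt(RSS/(n - p - 1)), with p the number of descriptors (4 for the PC1 model, 5 for the PC2 model), the standard error s follows from SDEC. The definitions and the function name are assumptions; the input numbers are the ones reported above.

```python
import math

def s_from_sdec(sdec, n, p):
    """Standard error of estimate implied by SDEC for n objects and p descriptors."""
    return sdec * math.sqrt(n / (n - p - 1))

s_pc1 = s_from_sdec(0.549, n=91, p=4)   # PC1 model: AAC, E2s, E1e, Tm
s_pc2 = s_from_sdec(0.485, n=91, p=5)   # PC2 model: IDE, G2p, E1p, Ks, Ve
print(round(s_pc1, 3), round(s_pc2, 3))
```

Under these assumptions the computed values agree with the reported s = 0.565 and s = 0.502 to three decimals, which supports reading SDEC as the fitting error and SDEP as its cross-validated counterpart.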
Principal Component Analysis on experimental plus QSPR-predicted half life data
Cum E.V.% = 79.5
Principal Component Analysis on QSPR-predicted half life data
Cum E.V.% = 78.6
[Figures: for each analysis, a plot of PC 1 against PC 2 with points labeled by compound number and loadings for half-life in air, surface water, groundwater and soil. The plots are annotated PERSISTENCE, SOLUBLES and VOLATILE, and SORBED.]
[Figure: PC 1. Predicted PC 1 scores plotted against PC 1 scores; points are labeled with compound numbers.]
[Figure: PC 2. Predicted PC 2 scores plotted against PC 2 scores; points are labeled with compound numbers.]