

Organic pollutants environmental fate: modeling and prediction of global persistence by molecular descriptors

P.Gramatica, F.Consolaro and M.Pavan

QSAR Research Unit, Dept. of Structural and Functional Biology, University of Insubria, Varese, Italy

e-mail: [email protected]

Web: http://fisio.dipbsf.uninsubria.it/dbsf/qsar/QSAR.html

INTRODUCTION

The persistence of organic compounds in the various environmental compartments is mainly governed by the rates at which they are removed by chemical and/or physical processes. Half-life in each compartment is the most commonly used criterion for studying persistence, but such data are available for only a few organic compounds, vary greatly among compartments and depend on laboratory tests. As most literature data are reported as ranges of values, we used the average values as the input data in the QSPR studies.

Validated OLS regression models have been developed, using different theoretical molecular descriptors, to predict mean half-life values in the atmosphere, soil, surface water and groundwater for more than 90 supposed POPs of different chemical classes (pesticides, PAHs, PCBs, etc.). All the regression models have been validated by cross-validation, and the predicted data have been checked for reliability by the leverage procedure.

These predicted values are obviously not the real half-lives, but reasonable estimates. They have been used together in a Principal Component Analysis to produce useful indexes of POP persistence: PC1 as a global persistence index and PC2 as a compartment-related persistence index. These two indexes have also been modeled, allowing fast screening and ranking of organic compounds by persistence.
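The leverage check mentioned above can be illustrated with a minimal sketch. It assumes the usual hat-matrix definition of leverage for an OLS model with intercept and a warning value of h* = 3(p + 1)/n; the poster does not state which cutoff was used, so `warning_leverage` is an assumption, and the data below are synthetic.

```python
import numpy as np

def leverages(X_train, X_new=None):
    """Leverage of each object with respect to the training descriptor space.

    h_i = x_i^T (A^T A)^-1 x_i, where A is the training matrix with an
    intercept column. If X_new is None, training leverages are returned.
    """
    A = np.column_stack([np.ones(len(X_train)), X_train])
    xtx_inv = np.linalg.inv(A.T @ A)
    B = A if X_new is None else np.column_stack([np.ones(len(X_new)), X_new])
    return np.einsum("ij,jk,ik->i", B, xtx_inv, B)

def warning_leverage(n_objects, n_vars):
    """Assumed cutoff h* = 3 (p + 1) / n; not stated in the poster."""
    return 3.0 * (n_vars + 1) / n_objects

# Synthetic descriptors: 24 training objects, 3 variables (illustrative only)
rng = np.random.default_rng(0)
X_train_demo = rng.normal(size=(24, 3))
X_far_demo = np.array([[10.0, 10.0, 10.0]])  # far outside the training domain

h_train_demo = leverages(X_train_demo)
h_far_demo = leverages(X_train_demo, X_far_demo)
h_star_demo = warning_leverage(24, 3)
print(h_far_demo[0] > h_star_demo)
```

A prediction for an object whose leverage exceeds the cutoff is an extrapolation outside the structural domain of the model and should be treated with caution.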

DATA SET

Our data set consists of 33 organic pollutants, mainly supposed POPs, for which half-life values in air, surface water, groundwater and soil have been collected from Howard [2] and Rodan [3]. It must be emphasized that these values are subject to considerable variation, so presenting a single value is over-simplistic; this is why we considered the mean value of the half-life range. The mean values were always transformed to logarithmic units to linearize the experimental range of variation.

[2] Howard, P.H. et al. Handbook of Environmental Degradation Rates (1991).
[3] Rodan, B.D. et al. Screening for Persistent Organic Pollutants: Techniques To Provide a Scientific Basis for POPs Criteria in International Negotiations. Environ. Sci. Technol., 33(20), 3482-3488 (1999).
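As a small illustration of this preprocessing step, the sketch below takes a reported half-life range and returns the logarithm of its mean. It assumes an arithmetic mean and a base-10 logarithm, neither of which is stated explicitly in the poster, and the example range is invented.

```python
import math

def log_mean_half_life(low, high):
    """Return log10 of the arithmetic mean of a reported half-life range.

    Assumptions (not stated in the poster): arithmetic mean, base-10 log.
    """
    if low <= 0 or high <= 0:
        raise ValueError("half-lives must be positive")
    return math.log10((low + high) / 2.0)

# Hypothetical range of 10 to 1000 time units (not taken from the data set)
demo_value = log_mean_half_life(10.0, 1000.0)
print(round(demo_value, 3))
```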

MOLECULAR DESCRIPTORS

The molecular structure has been represented by a wide set of molecular descriptors (about 170) calculated by the software DRAGON 1.0 of R. Todeschini (http://www.disat.unimib.it/chm):

• mono-dimensional: count and fragment descriptors
• two-dimensional: topological descriptors
• three-dimensional: 3D-WHIM (Weighted Holistic Invariant Molecular) descriptors [1]

[1] R. Todeschini and P. Gramatica, 3D-modelling and prediction by WHIM descriptors. Part 5. Theory development and chemical meaning of the WHIM descriptors, Quant. Struct.-Act. Relat., 16 (1997) 113-119.

Are half-life ranges usefully QSPR-modeled and predicted by theoretical molecular descriptors?

n.mod  response       n.obj.  n.var.  variables                   Q2 LOO  Q2 LMO (20%)  R2
1      groundwater    24      3       nC, E2m, E1e                89.2    88.8          91.9
2      surface water  33      5       nR09, ROUV, IDMT, E1p, Vu   72.2    70.5          81.2
3      soil           30      3       IDM, E2m, G2e               77.8    76.9          83.2
4      air            32      4       nR07, BAL, UI, E1e          83.8    81.8          87.2
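The validation statistics in the table can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' procedure: it assumes Q2 = 1 - PRESS/TSS, with leave-one-out (LOO) and repeated random leave-many-out (LMO, 20% of the objects removed each time). Several conventions exist for the reference mean in TSS; the choices below are assumptions.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def r2(X, y):
    """Fitting R2 = 1 - RSS/TSS."""
    residuals = y - predict(fit_ols(X, y), X)
    return 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)

def q2_loo(X, y):
    """Leave-one-out Q2 = 1 - PRESS/TSS (TSS about the full-data mean)."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_ols(X[keep], y[keep])
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def q2_lmo(X, y, fraction=0.2, repeats=200, seed=0):
    """Leave-many-out Q2 from repeated random splits (one possible convention)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_out = max(1, int(round(fraction * n)))
    press = tss = 0.0
    for _ in range(repeats):
        out = rng.choice(n, size=n_out, replace=False)
        keep = np.setdiff1d(np.arange(n), out)
        coef = fit_ols(X[keep], y[keep])
        press += np.sum((y[out] - predict(coef, X[out])) ** 2)
        tss += np.sum((y[out] - y[keep].mean()) ** 2)
    return 1.0 - press / tss

# Synthetic example: 30 objects, 3 descriptors (illustrative only)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(30, 3))
y_demo = 1.0 + X_demo @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=30)
print(r2(X_demo, y_demo), q2_loo(X_demo, y_demo), q2_lmo(X_demo, y_demo))
```

For an OLS model, Q2 LOO computed this way can never exceed R2, which is the pattern seen in every row of the table.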

Log h.l. air = -16.95 + 1.46 nR07 + 3.51 BAL + 0.73 UI + 15.82 E1e

nR07: number of rings with 7 atoms
BAL: Balaban index
UI: unsaturation index
E1e: directional WHIM

Log h.l. groundwater = -0.04 + 0.09 nC + 1.14 E2m + 6.88 E1e

nC: number of carbon atoms
E2m, E1e: directional WHIMs

[Figure: GROUND WATER. Predicted half life (vertical axis) vs. experimental half life (horizontal axis), both axes from 0.2 to 2.6; points labelled by compound number.]

Log h.l. surf. water = -0.66 + 0.47 nR09 - 0.06 ROUV + 0.01 IDMT + 14.37 E1p + 0.09 Vu

nR09: number of rings with 9 atoms
ROUV: Rouvray index
IDMT: total information index on the distance magnitude
E1p: atomic distribution directional WHIM
Vu: global shape and dimensional WHIM

[Figure: SURFACE WATER. Predicted half life (vertical axis) vs. experimental half life (horizontal axis), both axes from -3 to 5; points labelled by compound number.]

Log h.l. soil = -3.46 + 0.58 IDM + 0.99 E2m + 0.48 G2e

IDM: mean information index on the distance magnitude
E2m, G2e: directional WHIMs
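The four regression equations can be applied directly once the descriptors are available. The sketch below encodes the coefficients exactly as printed on the poster (rounded to two decimals, so the predictions are approximate); the function name `predict_log_half_life`, the dictionary layout and the example descriptor values are hypothetical and not taken from the data set.

```python
# Intercept and coefficients as printed on the poster (rounded values).
MODELS = {
    "air": (-16.95, {"nR07": 1.46, "BAL": 3.51, "UI": 0.73, "E1e": 15.82}),
    "groundwater": (-0.04, {"nC": 0.09, "E2m": 1.14, "E1e": 6.88}),
    "surface_water": (-0.66, {"nR09": 0.47, "ROUV": -0.06, "IDMT": 0.01,
                              "E1p": 14.37, "Vu": 0.09}),
    "soil": (-3.46, {"IDM": 0.58, "E2m": 0.99, "G2e": 0.48}),
}

def predict_log_half_life(compartment, descriptors):
    """Evaluate the linear model for one compartment.

    `descriptors` maps descriptor names to values; any descriptor the
    model needs but does not find raises a KeyError.
    """
    intercept, coefs = MODELS[compartment]
    missing = [name for name in coefs if name not in descriptors]
    if missing:
        raise KeyError(f"missing descriptors for {compartment}: {missing}")
    return intercept + sum(c * descriptors[name] for name, c in coefs.items())

# Invented descriptor values, for illustration only
demo_descriptors = {"nC": 6, "E2m": 0.3, "E1e": 0.1}
demo_prediction = predict_log_half_life("groundwater", demo_descriptors)
print(round(demo_prediction, 2))
```

Descriptor values must come from the same software and settings used to build the models; values computed differently would not be comparable with the fitted coefficients.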

[Figure: SOIL. Predicted half life (vertical axis) vs. experimental half life (horizontal axis), both axes from -0.2 to 2.2; points labelled by compound number.]

[Figure: AIR. Predicted half life (vertical axis) vs. experimental half life (horizontal axis), both axes from -2 to 3; points labelled by compound number.]

The PC scores have been used as indexes of POP persistence: PC1 (E.V.% = 51.2) as a global persistence index and PC2 (E.V.% = 28.3) as a compartment-related persistence index.

These two indexes have also been modeled by molecular descriptors selected by Genetic Algorithm, with satisfactory predictive power; this allows fast screening and ranking of organic compounds by persistence. The data predicted by this QSPR approach, which is based on a few descriptors of the molecular structure, could be usefully applied in environmental fate modelling of organic pollutants, including chemicals not yet synthesised.
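How PC scores and explained variance (E.V.%) are obtained from a table of half-lives can be sketched as below. The example assumes autoscaled columns (mean-centred, unit variance) and uses a singular value decomposition; the poster does not state the scaling used, and the input matrix here is synthetic rather than the real half-life data.

```python
import numpy as np

def pca_scores(data, n_components=2):
    """PCA by SVD on autoscaled data (autoscaling is an assumption).

    Returns scores, loadings and the explained variance (%) of the
    retained components.
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2) * 100.0
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    return scores, loadings, explained[:n_components]

# Synthetic stand-in for 91 compounds x 4 compartments (log half-lives)
rng = np.random.default_rng(1)
common = rng.normal(size=(91, 1))  # shared "overall persistence" factor
demo_data = common + 0.5 * rng.normal(size=(91, 4))

demo_scores, demo_loadings, demo_ev = pca_scores(demo_data)
print(demo_ev)
```

In this layout the first column of the scores plays the role of PC1 and the second that of PC2; the sign of each component is arbitrary and may need flipping so that higher scores mean higher persistence.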

PC1 and PC2 scores as persistence indexes

PC1 (overall persistence index) = 9.22 + 3.14 AAC - 6.32 E2s - 17.49 E1e - 0.16 Tm
n = 91   R2 = 85.1   Q2 LOO = 82.6   Q2 LMO = 82.2   s = 0.565   F 86 = 122.878   SDEC = 0.549   SDEP = 0.595

PC2 (media persistence index) = 10.31 - 8.29 IDE - 0.48 G2p + 9.93 E1p + 5.46 Ks + 0.09 Ve
n = 91   R2 = 78.9   Q2 LOO = 75.1   Q2 LMO = 74.5   s = 0.502   F 6, 85 = 63.762   SDEC = 0.485   SDEP = 0.527
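The reported statistics can be cross-checked against each other. The sketch below assumes the commonly used definitions s = sqrt(RSS/(n - p - 1)), SDEC = sqrt(RSS/n), SDEP = sqrt(PRESS/n), R2 = 1 - RSS/TSS and Q2 = 1 - PRESS/TSS, where p is the number of descriptors; the poster does not spell these out, so they are assumptions, and the checks hold only to within the rounding of the printed values.

```python
import math

def sdec_from_s(s, n, n_vars):
    """SDEC implied by the standard error s: SDEC = s * sqrt((n - p - 1) / n)."""
    return s * math.sqrt((n - n_vars - 1) / n)

def sdep_from_sdec(sdec, r2_percent, q2_percent):
    """SDEP implied by SDEC, R2 and Q2: PRESS/RSS = (1 - Q2) / (1 - R2)."""
    ratio = (100.0 - q2_percent) / (100.0 - r2_percent)
    return sdec * math.sqrt(ratio)

# Values as printed for the PC1 model (4 descriptors) and PC2 model (5 descriptors)
pc1_sdec = sdec_from_s(0.565, 91, 4)
pc2_sdec = sdec_from_s(0.502, 91, 5)
pc1_sdep = sdep_from_sdec(0.549, 85.1, 82.6)
pc2_sdep = sdep_from_sdec(0.485, 78.9, 75.1)

print(round(pc1_sdec, 3), round(pc2_sdec, 3))
print(round(pc1_sdep, 3), round(pc2_sdep, 3))
```

Under these assumed definitions the recomputed values agree with the printed SDEC and SDEP to within rounding, which suggests the statistics of both models are internally consistent.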

[Figure: Principal Component Analysis on experimental plus QSPR-predicted half-life data. PC 1 vs. PC 2 scores, points labelled by compound number, with loadings for the half-lives in groundwater, air, surface water and soil. Annotations: PERSISTENCE; SOLUBLES and VOLATILES; SORBED. Cum. E.V.% = 79.5]

[Figure: Principal Component Analysis on QSPR-predicted half-life data. PC 1 vs. PC 2 scores, points labelled by compound number, with loadings for the half-lives in surface water, air, groundwater and soil. Annotations: PERSISTENCE; SOLUBLES and VOLATILES; SORBED. Cum. E.V.% = 78.6]

[Figure: PC 1. Predicted PC 1 scores (vertical axis) vs. PC 1 scores (horizontal axis), both axes from -4 to 4; points labelled by compound number.]

[Figure: PC 2. Predicted PC 2 scores (vertical axis) vs. PC 2 scores (horizontal axis), both axes from -3 to 3; points labelled by compound number.]