Page 1

Data mining of toxic chemicals & database-based toxicity prediction

Jiansuo Wang & Luhua Lai

Institute of Physical Chemistry, Peking University, P. R. China

Page 2

Our goal: to introduce risk assessment of chemicals in the early stage of drug design.

Candidates generated by computer-aided design

Initial screening of chemical toxicity

Leads which are a bit “safer”

Page 3

Beyond the inherent complexity of toxicity, computer-aided drug design adds its own characteristics and difficulties to the problem:

• The virtually generated molecules are numerous.

• The molecules designed as drugs may be structurally diverse.

• Little or no information is available about the molecules other than their chemical structure.

Page 4

How to evaluate the bio-activity (toxicity) of a large number of molecules from their structure alone?

• In terms of structure-activity rules: expert systems.

• In terms of statistical models: QSAR (Qualitative/Quantitative Structure-Activity Relationship).

Page 5

How to extract rules/models of toxic chemicals from a database of toxic chemicals to aid toxicity assessment?

• Structural features of toxic chemicals: statistical analysis, similarity analysis, cluster analysis

• QSAR models of toxic chemicals: QSAR combined with cluster analysis

Both are applied to the database RTECS.

Page 6

What features characterize toxic chemicals?

Molecular weight

Atomic composition of molecules

Groups of molecules

Rings of molecules

An initial database analysis shows no distinct difference between toxic chemicals and drugs in these basic molecular features.

Page 7

Classification of toxic substances according to action modes:

1) substances that exhibit extremes of acidity, basicity, dehydrating ability, or oxidizing power;

2) reactive substances that contain functional groups prone to react with biomolecules in a damaging way;

3) heavy metals;

4) lipid-soluble compounds;

5) species that bind to biomolecules, reversibly or irreversibly, and alter their normal function; and so on.

Manahan, S. E. Toxicological Chemistry

Page 8

Structure patterns

Structure patterns take into account both the integrity of each molecule as a whole and the specificity of action modes among molecules.

A molecular structure pattern is defined as a template comprising a given framework and some given groups.

It represents the common structural features shared by a series of molecules that are likely to act in a toxicologically similar manner.

Page 9

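How to get molecular structure patterns?

• Dissect the molecules

[Figure: dissection of an example molecule. The molecule is split into side-chains and a framework, and the framework is split further into ring-systems and linkers. Surviving fragment labels: MeO-, HCl, C2H5-N-(CH2)2-OCO-, CH3.]

• Similarity comparison

[Equation: similarity between molecules A and B, defined from the quantities S and R]

• Cluster analysis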

Page 10

Do structure patterns really exist in the database of toxic chemicals?

The underlying idea of structure patterns:

Specificity of action modes implies structural correlation among molecules with a similar action mode.

The embodiment of structure patterns in the database:

• Parallel analysis: structural similarity among the molecules in the databases should converge as the size of the databases grows from small to large.

• Cross analysis: a large enough database should have some predictive power for new toxic chemicals.

Page 11

[Figure: curve of coverage rate vs. database size, with 0.6 as the similarity limit.]

The figure shows that, for a given prediction accuracy, the predictive ability of the database tends to converge once the database is large enough. This suggests that structure patterns may exist in the database.
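The coverage rate plotted against database size can be sketched as follows. This is a minimal sketch, assuming each molecule is represented as a set of structural keys and that similarity is the Tanimoto coefficient; the slides do not say which similarity measure underlies this curve, so `tanimoto`, `coverage_rate`, and the toy fragment keys are illustrative assumptions. Only the 0.6 similarity limit is taken from the slide.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of structural keys (assumed representation)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def coverage_rate(queries, database, limit=0.6):
    """Fraction of query molecules that have at least one database molecule
    with similarity >= limit (0.6 is the limit quoted on the slide)."""
    if not queries:
        return 0.0
    covered = sum(
        1 for q in queries
        if any(tanimoto(q, d) >= limit for d in database)
    )
    return covered / len(queries)


# Toy data: each molecule is a set of hypothetical fragment keys.
database = [{"ring6", "Cl", "NH", "C=O"}, {"ring5", "OMe", "ester"}]
queries = [
    {"ring6", "Cl", "NH", "C=O", "CH3"},  # similarity 4/5 to the first entry
    {"ring5", "OMe"},                     # similarity 2/3 to the second entry
    {"S", "P", "NO2"},                    # shares nothing with the database
]

# Coverage as the database grows from one entry to two.
for size in (1, 2):
    print(size, coverage_rate(queries, database[:size]))
```

Repeating the calculation for growing subsets of the database gives one point of the curve per subset size; a flattening curve is what the slide reads as convergence.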

Page 12

The findings of the systematic analysis of the database indicate that:

structure patterns are likely to exist, and

it is both necessary and feasible to search for them.

Page 13

The representative molecules of some structure patterns of toxic chemicals

Chemical-count   CAS-Number     Chemical structure (text labels surviving from the drawing)
401              140-41-0       NH, Cl, O, NCH3, OCOCCl3
384              55-45-8        NH, Cl, COOCH2CH2NH4Cl
383              102585-42-2    Cl, OCH2CH(CH3)N, COCH2, CH2CH3, NCH2CH3, CH2CH3, .HCl
352              73972-98-2     OC=O, C(CH3)3, Cl, Cl, Cl, Cl, HOOC
323              2828-42-4      NHCOON, CH3, CH3

Page 14

Data mining of toxic chemicals: QSAR combined with structure patterns

A two-step strategy to explore noncongeneric toxic chemicals in the database: screening for structure patterns, then deriving a detailed relationship between structure and activity.

First, an efficient similarity comparison is used to screen chemical patterns for further QSAR analysis.

Then, a QSAR study of each structure pattern provides an estimate of activity as well as the detailed relationship between activity and structure.

Page 15

An example of the implementation

The representative molecule of the structure pattern (WLN: T6VMVMV FHJ F2Y&1 F2U1; CAS number: 115-44-6):

[Figure: structure drawing of the representative molecule; surviving labels: N, N, O, O, O, CH2-CH=CH2, C2H5(CH3)CH]

• Select one structure pattern.

• By computing molecular similarity, we obtain 189 chemicals from the database RTECS whose similarity to the representative molecule is higher than 0.6.

• According to the species observed and the route of exposure, the chemicals fall mainly into five major categories.

• Build CoMFA models relating structure to LD50 values for three series of chemicals.
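The screening and grouping steps of this example can be sketched as below. This is a minimal sketch under stated assumptions: the record fields (`id`, `similarity`, `species`, `route`) and all values are placeholders invented for illustration, not RTECS data, and the similarity values are taken as already computed. Only the 0.6 threshold and the grouping by species and route of exposure come from the slide.

```python
from collections import defaultdict


def screen_and_group(records, limit=0.6):
    """Keep records whose similarity to the representative molecule exceeds
    `limit`, then group them by (species, route of exposure)."""
    groups = defaultdict(list)
    for rec in records:
        if rec["similarity"] > limit:
            groups[(rec["species"], rec["route"])].append(rec)
    return dict(groups)


# Placeholder records, NOT RTECS data: field names and values are assumptions.
records = [
    {"id": "mol-1", "similarity": 0.82, "species": "rabbit", "route": "intravenous"},
    {"id": "mol-2", "similarity": 0.71, "species": "rabbit", "route": "intravenous"},
    {"id": "mol-3", "similarity": 0.65, "species": "mouse", "route": "oral"},
    {"id": "mol-4", "similarity": 0.40, "species": "mouse", "route": "oral"},
]

groups = screen_and_group(records)
for key, members in sorted(groups.items()):
    print(key, [m["id"] for m in members])
```

Each resulting group holds activities measured under one species and one route, which is what makes them comparable enough to serve as a single series for a QSAR model.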

Page 16

Rabbit-intravenous: cross-validated and final fit CoMFA analysis with five components; 37 chemicals, q2 = 0.608, r2 = 0.981, F = 323.
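The two statistics quoted here differ in how the predictions are made: r2 measures how well the final model fits the data it was trained on, while q2 is computed from predictions for molecules that were left out during fitting. The sketch below illustrates the difference with leave-one-out cross-validation. It is a simplified stand-in: CoMFA fits a partial least squares model on field descriptors, whereas this example swaps in an ordinary least-squares line on a single made-up descriptor so that it stays self-contained. The data values are invented, and using the overall mean in the q2 denominator is one common convention, assumed here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r2_and_q2(xs, ys):
    """Return (r2, q2): fitted r-squared and leave-one-out cross-validated q-squared."""
    n = len(ys)
    my = sum(ys) / n
    ss_tot = sum((y - my) ** 2 for y in ys)

    # r2: residuals of the model fitted on all points.
    a, b = fit_line(xs, ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

    # q2: each point is predicted by a model fitted without it (PRESS).
    press = 0.0
    for i in range(n):
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a_i + b_i * xs[i])) ** 2

    return 1 - ss_res / ss_tot, 1 - press / ss_tot


# Made-up descriptor and activity values, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]

r2, q2 = r2_and_q2(xs, ys)
print(r2, q2)
```

Because left-out points are harder to predict than fitted ones, q2 does not exceed r2 for a least-squares fit, which matches the ordering of the two values on the slide.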

Page 17

Rabbit-intravenous: contour map of the final CoMFA model. For steric effects, more bulk near green and less bulk near yellow favors higher activity; for electrostatic effects, more positive charge near blue and more negative charge near red favors higher activity.

Page 18

The performance of the overall procedure demonstrates that:

• Such a stepwise scheme is feasible and effective for mining a database of toxic chemicals.

• The scheme takes account of the structural diversity of toxic chemicals.

• The scheme is a compromise between speed and accuracy.

Page 19

dbToxPre: database-based toxicity predictor of chemicals

[Flowchart: the inquiry molecule and the database of toxic chemicals are passed to ShapeAnal, which yields a structure-related set. This set feeds two branches: field-based similarity analysis, giving close molecules and similarity-activity; and flexible CoMFA analysis, giving a CoMFA model and activity prediction.]

Page 20

dbToxPre

The program mainly includes four parts:

1) a fast and efficient clustering selection of molecules based on molecular shape

2) field-based similarity computation of molecular structure based on shape clusters

3) flexible CoMFA analysis of molecules based on shape clusters

4) a database of toxic chemicals suitable for such a procedure

The characteristics of the program:

fast; efficient; dynamically combined with the database

Page 21

ShapeAnal: fast & efficient shape analysis of molecules

Inquiry molecule

Marking of atoms in the molecule

Structure description: dimension, ring systems, relative orientation of ring-system atoms

Alignment of molecule shapes

Structure-related set

Page 22

Molecular Field

• Concept: continuous property fields around the molecule produced by the molecular atoms.

• Similarity analysis of molecular fields (Carbo index):

$$R_{AB} = \frac{\int P_A P_B \, dv}{\left(\int P_A^2 \, dv\right)^{1/2} \left(\int P_B^2 \, dv\right)^{1/2}}$$

• Comparative Molecular Field Analysis, CoMFA
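The Carbo index is the overlap of two property fields divided by the product of their norms, so it equals 1 for fields of identical shape regardless of magnitude. A minimal numerical sketch follows, assuming both fields have been sampled at the same points of a uniform grid and flattened into lists; in that case the volume element cancels and the integrals reduce to sums. The function name `carbo_index` and the toy field values are illustrative assumptions, not part of the program described in the slides.

```python
import math


def carbo_index(field_a, field_b):
    """Carbo similarity index R_AB for two property fields sampled on the same
    uniform grid; the volume element dv cancels between numerator and denominator."""
    if len(field_a) != len(field_b):
        raise ValueError("fields must be sampled on the same grid")
    overlap = sum(pa * pb for pa, pb in zip(field_a, field_b))
    norm_a = math.sqrt(sum(pa * pa for pa in field_a))
    norm_b = math.sqrt(sum(pb * pb for pb in field_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a field that is zero everywhere has no defined index")
    return overlap / (norm_a * norm_b)


# Toy field samples (made up for illustration).
p_a = [0.0, 1.0, 2.0, 1.0]
p_b = [0.0, 2.0, 4.0, 2.0]   # same shape as p_a, twice the magnitude
p_c = [1.0, 0.0, 0.0, -1.0]  # a differently shaped field

print(carbo_index(p_a, p_b))
print(carbo_index(p_a, p_c))
```

The first pair scores essentially 1 because the index is insensitive to scaling; this is why the Carbo index compares the shape of fields rather than their absolute strength.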

Page 23

Evolutionary Algorithm: considering flexibility of molecules

• Community/Population: structure-related set

• Species/Chromosome: combination of rotatable single bonds in the molecules

• Convergence: steady state of sorting

• Procedure: parent generation, congeneric mutation, child generation
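The loop of parent generation, mutation, and child generation can be sketched as below. This is a hedged illustration, not the authors' implementation: a chromosome is taken to be a list of torsion angles (one per rotatable single bond), `score` is a hypothetical stand-in for the field-based similarity that the real program would evaluate, and "steady state of sorting" is interpreted as the ranked set of survivors staying unchanged for `patience` consecutive generations. All names and parameter values are assumptions.

```python
import random


def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def score(torsions, target):
    """Hypothetical stand-in for a similarity score: higher is better."""
    return -sum(angular_diff(t, g) ** 2 for t, g in zip(torsions, target))


def mutate(torsions, rng, step=30.0):
    """Perturb the torsion angle of one randomly chosen rotatable bond."""
    child = list(torsions)
    i = rng.randrange(len(child))
    child[i] = (child[i] + rng.uniform(-step, step)) % 360.0
    return child


def evolve(target, pop_size=20, max_gen=200, patience=20, seed=0):
    """Mutation-only evolutionary search with elitist (parents + children) selection."""
    rng = random.Random(seed)
    n = len(target)
    parents = [[rng.uniform(0.0, 360.0) for _ in range(n)] for _ in range(pop_size)]
    parents.sort(key=lambda c: score(c, target), reverse=True)
    initial_best = score(parents[0], target)
    stale = 0
    for _ in range(max_gen):
        children = [mutate(p, rng) for p in parents]
        merged = sorted(parents + children, key=lambda c: score(c, target), reverse=True)
        survivors = merged[:pop_size]
        # Convergence check: the sorted survivor list has stopped changing.
        stale = stale + 1 if survivors == parents else 0
        parents = survivors
        if stale >= patience:
            break
    return parents[0], score(parents[0], target), initial_best


target = [60.0, 180.0, 300.0]  # made-up reference conformation
best, best_score, initial_best = evolve(target)
print(best, best_score, initial_best)
```

Keeping parents in the selection pool means the best score never gets worse from one generation to the next, at the cost of slower exploration than schemes that replace the whole population.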

Page 24

Fast field-based similarity analysis

Structure-related set

Molecular alignment based on framework shape

EA: conformation mutation & similarity comparison

Similarity analysis & activity prediction

Page 25

Flexible CoMFA

• The procedure of CoMFA

• Characteristics: considering conformational flexibility & hydrophobic field

Structure-related set

Molecular alignment based on framework shape

EA: conformation mutation & CoMFA

CoMFA model & activity prediction

Page 26

Rebuilding of toxic-chemical database

• Selection of DBMS

Michael Stonebraker’s classification:

simple data & no query: file system

complex data & no query: object-oriented DBMS

simple data & query: relational DBMS

complex data & query: object-relational DBMS (PostgreSQL)

• Sketch map of the design of Toxdb

[Figure: Toxdb, comprising StructInfo, AcuteTox, ...]

Page 27

Database-based toxicity prediction of chemicals assesses the activity of the inquiry molecule from a series of related molecules in the database. The purposes:

• To make the best use of available knowledge about related chemicals.

• To offset the uncertainty of any single data point through mutual correction among a series of molecules.
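One simple way to pool evidence from a series of related molecules is a similarity-weighted mean of their known activities, so that no single measurement dominates the estimate. The slides do not state which estimator the program uses; the sketch below is an assumption chosen for illustration, with made-up similarity and activity values and a hypothetical function name `predict_activity`. The 0.6 cut-off mirrors the similarity limit used elsewhere in the talk.

```python
def predict_activity(neighbors, limit=0.6):
    """Similarity-weighted mean activity over related molecules.

    `neighbors` is a list of (similarity, activity) pairs for database molecules;
    pairs with similarity below `limit` are ignored. Returns None when no
    molecule is similar enough to support a prediction."""
    kept = [(s, a) for s, a in neighbors if s >= limit]
    if not kept:
        return None
    total = sum(s for s, _ in kept)
    return sum(s * a for s, a in kept) / total


# Made-up (similarity, activity) pairs; the last one falls below the cut-off.
neighbors = [(0.9, 2.0), (0.7, 3.0), (0.6, 2.5), (0.3, 9.0)]
print(predict_activity(neighbors))
```

The outlying activity of 9.0 is excluded because its molecule is not similar enough, and the remaining three values are averaged with more weight on the closest neighbors.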

Page 28

Conclusion

• Initial analysis of the toxic-chemical database supports the concept of structure patterns of toxic chemicals.

• QSAR combined with structure patterns provides an alternative way to explore noncongeneric toxic chemicals in the database.

• Database-based toxicity prediction draws on the database dynamically to assist risk assessment of chemicals.

• Data mining & toxicity prediction: visualization computation

Storage computation: effective computation integrated into reasonable data storage

Reference & paper:

1. Data mining of toxic chemicals: structural patterns and QSAR, Jiansuo Wang, Luhua Lai, Youqi Tang, J. Mol. Modelling, 1999, 252-262.

2. Predictive toxicology of toxic chemicals and database mining, Jiansuo Wang, Luhua Lai, Youqi Tang, Chinese Science Bulletin, 2000, 45, 12, 1093-1097.

3. Structural features of toxic chemicals for specific toxicity, Jiansuo Wang, Luhua Lai, Youqi Tang, J. Chem. Inf. Comput. Sci., 1999, 39, 6, 1173-1189.

Page 29

Acknowledgements

Prof. Luhua Lai

Prof. Youqi Tang

Mr. Alan Gelberg

…...