

J. Chem. Inf. Comput. Sci. 1995, 35 (6), 979-1000

ESSESA: An Expert System for Structure Elucidation from Spectra. 6. Substructure Constraints from Analysis of 13C-NMR Spectra

Hong Huixiao,* Han Yinling, Xin Xinquan, and Shi Yufeng

Chemistry Department, Nanjing University, Nanjing 210093, People's Republic of China

Received January 9, 1995@

This paper describes the knowledge base for 13C-NMR spectral analysis and the interpretation program that analyzes 13C-NMR spectral data in ESSESA. The logical representation and the production system rules for the analysis of 13C-NMR spectra are discussed, as well as inferential models useful in 13C-NMR spectral analysis. The unsaturation and the atomic composition of an unknown compound, together with the substructure constraints from the analysis of the infrared spectrum and the first-order 1H-NMR spectrum, are passed to the interpretation program, which develops substructure constraints from the 13C-NMR spectral data by inference from the knowledge base. The knowledge base contains 1277 substructures.

INTRODUCTION

The elucidation of the structure of organic compounds by spectroscopic methods, such as 1H-NMR, 13C-NMR, IR, mass spectrometry, and UV/VIS spectroscopy, is still essentially an empirical procedure. This is because the derivation of constitution from the spectra (refs 1 and 2) contains steps whose complexity prohibits formalization. Therefore the practical application of automatic interpretation programs (ref 3) is far more limited than is sometimes postulated in the literature (ref 4). Artificial intelligence, especially the expert system method of deriving rules from a knowledge base, is generally applicable. ESSESA is an expert system for structure elucidation from spectral analysis (refs 5-9).

One of the most powerful tools for determining an unknown organic structure is 13C-NMR spectroscopy. The increasing availability of sophisticated NMR instrumentation has made 13C-NMR spectroscopy a routine technique, even with only a few milligrams of a compound. One important aspect of 13C-NMR spectral data is the excellent correlation between structural features and the corresponding spectral properties, which is of enormous practical and theoretical significance. The interpretation of 13C-NMR spectral data is usually based on direct comparison of the spectrum of the unknown with a large reference collection of known structures and their well-assigned resonance lines. In principle the interpretation of 13C-NMR spectra can be done manually, but this is usually cumbersome and time-consuming. Automation of this task is therefore important, and many systems based on databases (refs 10-14) and on other methods (refs 15-18) have been developed.

The interpretation of 13C-NMR spectra is often based on comparison with standard reference spectra stored in libraries. When suitable spectra are not available for comparison, other methods must be employed to evaluate complex experimental spectra in order to determine the chemical structure. An expert system is one of the methods useful in this task. From a 13C-NMR spectrum the types of carbon atoms in a structure and their structural environments can be derived. In some cases the information from the 13C-NMR spectrum leads to a unique structure, but in many cases a large number of structural possibilities remain. In ESSESA the complete structure of a molecule is obtained by computer when data from several spectroscopic techniques are used simultaneously. The analysis of 13C-NMR spectral data gives the third set of substructure constraints, which is used in the generation of the complete structure.

KNOWLEDGE BASE OF 13C-NMR SPECTRA ANALYSIS

The conventional approach taken by chemists to 13C-NMR spectral interpretation is based upon models, often simplified, of the physical processes underlying resonance and the resulting spectral absorption. The physical models can be used to relate specific spectral signals to particular structural components of the molecule. Usually, several factors together determine the detailed characteristics (such as chemical shift and peak multiplicity) of a 13C-NMR spectral signal, and these factors are often in some sort of hierarchical relationship that defines their relative importance. The initial analysis of a spectral signal may identify the presence of a specific type of carbon in the unknown molecule, and more detailed analysis of the form of the signal may determine aspects of the larger environment of that type of carbon.

The chemist's knowledge of 13C-NMR spectral analysis that has been incorporated into ESSESA is encoded in the form of spectral feature-substructure relationship rules written in PROLOG, which comprise the knowledge base for 13C-NMR spectral analysis.

If the set of specific 13C-NMR absorption peaks given by an unknown compound is $W_c$, such that

$$W_c = [W_{ci}(\delta, m)], \quad i = 1, 2, 3, \ldots, n \tag{1}$$

where $\delta$ is the chemical shift and $m$ is the multiplicity, and the set of substructures in the knowledge base of 13C-NMR spectral analysis is $S_c$,

$$S_c = [S_{cj}], \quad j = 1, 2, 3, \ldots, k \tag{2}$$

then the set of specific absorption peaks that corresponds to a substructure $S_{cj}$ is $W_{cj}$, which is expected to be given by eq 3. The set of substructures from IR spectral analysis is $S_{ir}$ (eq 4), and the set of substructures from first-order 1H-NMR spectral analysis is $S_h$ (eq 5). In order to analyze the 13C-NMR spectrum of an unknown compound, it is necessary to pick out from the set $S_c$ the subset of substructures that satisfies eq 6.

[Equations 3-6 are not legible in this copy.]

@ Abstract published in Advance ACS Abstracts, September 1, 1995.

0095-2338/95/1635-0979$09.00/0 © 1995 American Chemical Society

According to this procedure, the construction of the knowledge base for 13C-NMR spectral analysis requires that the logical representation formula (eq 3) of the set of substructures $S_c$ be found and encoded in PROLOG rules. In ESSESA, the information in eq 3 is expressed by production system rules, i.e., the spectral feature-substructure relationships. Given a sufficiently large number of spectra, or alternatively the appropriate information derived from published observations, it is possible to associate certain absorption peaks with the corresponding carbon types and their structural environments. Thus a set of rules can be developed to derive the subset of substructures indicated by a specific 13C-NMR spectrum.

In ESSESA, the 13C-NMR spectrum is not the only source of substructure constraints; the IR and first-order 1H-NMR spectra are used as well. The IR spectrum and the first-order 1H-NMR spectrum are the first and second spectra to be analyzed, and the results of the IR spectral analysis (eq 4, ref 5) and of the first-order 1H-NMR spectral analysis (eq 5, ref 9) are used as constraints in the 13C-NMR spectral analysis.

On the basis of eq 6, deriving the subset of substructures that is consistent with the IR, first-order 1H-NMR, and 13C-NMR spectral data and with other chemical information amounts to confirming that the expected absorption peaks of each candidate substructure exist in the spectral data. Equation 6 can be written as production system rules; for example, the production system rule for identifying the substructure RC(=O)Cl is as follows:

IF in the 13C-NMR spectrum of an unknown compound there is a peak with chemical shift 167-180 ppm and multiplicity S, and the >C=O substructure exists from the IR spectral analysis,

THEN the substructure RC(=O)Cl may be present in the structure of the unknown compound.

In ESSESA this rule can be written in PROLOG as follows:

    % A peak matches RC(=O)Cl if its shift C lies in 167-180 ppm and its multiplicity M is a singlet.
    peakc("RC(=O)Cl", C, M) :- C >= 167, C =< 180, !, M = "S".

    % RC(=O)Cl is proposed only if such a peak exists and IR analysis found >C=O.
    subc("RC(=O)Cl") :- peakc("RC(=O)Cl", C, M), !, subir(">C=O").
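The same rule can be sketched outside PROLOG. The following is a minimal illustration in Python, used here only as a stand-in for the paper's PROLOG; the names `Peak` and `acyl_chloride_possible` and the example peak values are hypothetical and do not come from ESSESA.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    shift_ppm: float   # chemical shift in ppm
    multiplicity: str  # "S", "D", "T", or "Q"


def acyl_chloride_possible(peaks, ir_substructures):
    """Apply the RC(=O)Cl rule quoted in the text.

    The rule fires when a singlet lies in 167-180 ppm AND the IR
    analysis has already reported the >C=O substructure.
    """
    has_peak = any(
        167 <= p.shift_ppm <= 180 and p.multiplicity == "S" for p in peaks
    )
    return has_peak and ">C=O" in ir_substructures


# Illustrative peak list (made-up values, not measured data).
example_peaks = [Peak(173.5, "S"), Peak(47.1, "T"), Peak(9.3, "Q")]
print(acyl_chloride_possible(example_peaks, {">C=O"}))  # True
print(acyl_chloride_possible(example_peaks, set()))     # False
```

As in the production rule, a positive result only means that the substructure may be present; it does not confirm it.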

Figure 1. Overview of the structure of the interpreter program. [Flowchart: the IR constraint, the 1H-NMR constraint, the ring and double-bond information, and the digitized 13C-NMR spectrum are passed to the interpreter program, which outputs the substructure constraints.]

There are 1277 substructures in the knowledge base used by ESSESA for 13C-NMR spectral analysis; they are listed in Table 1. The spectral feature-substructure relationship rules in Table 1 were generated from refs 19-21. There is overlap within this table; for example, substructure 14 partially overlaps substructure 15. This problem is resolved before the complete structure candidates are generated from the information derived from the IR, 1H-NMR, and 13C-NMR spectral analyses. The mutually consistent set of substructures to be used by the structure generator is obtained by a procedure that finds those combinations of permitted substructures that are compatible with the overall composition of the molecule and with the constraints derived from all of the spectral data. Each such combination of substructures may then be used to define a distinct problem that is passed to a subsequent structure generation program.
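The overlap mentioned above can be made concrete with a small sketch. It is written in Python rather than the paper's PROLOG, and it reads "overlap" as two entries with the same multiplicity whose chemical shift ranges intersect, which is an interpretation rather than a definition given in the text. The four rows are taken from Table 1 as read from this copy; the function names are hypothetical.

```python
from itertools import combinations

# (no., substructure, (low ppm, high ppm), multiplicity), as read from Table 1.
ROWS = [
    (13, "RCH2Ar", (32, 40), "T"),
    (14, "RCH2CO-", (30, 50), "T"),
    (15, "RCH2CS-", (41, 56), "T"),
    (16, "RCH2CN", (15, 20), "T"),
]


def ranges_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlapping_pairs(rows):
    """Return (no., no.) pairs with equal multiplicity and intersecting ranges."""
    return [
        (r1[0], r2[0])
        for r1, r2 in combinations(rows, 2)
        if r1[3] == r2[3] and ranges_overlap(r1[2], r2[2])
    ]


print(overlapping_pairs(ROWS))  # [(13, 14), (14, 15)]
```

Under this reading, a triplet at 45 ppm would satisfy the shift conditions of both entries 14 and 15, so the IR constraint (>CO versus >CS) is what separates them.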

INTERPRETER PROGRAM

The interpretation of the 13C-NMR spectrum starts once the ring and double-bond characteristics of the entered molecular formula, the digitized 13C-NMR spectrum, and the results of the IR and first-order 1H-NMR analyses have been passed to the interpreter program for 13C-NMR spectral analysis. Using eq 6 and the knowledge base, the interpreter program identifies the various substructure fragments that may be present in the structure. The result of the interpretation is a substructure list that is used in the structure generation. The structure of this interpreter program is shown in Figure 1.

The interpreter program works with the set of 1277 substructures. Each of these substructures is correlated with a defined pattern of 13C-NMR spectral absorption. The spectral features that characterize substructure fragments normally consist of spectral ranges within which specific types of peaks are expected. The initial set of substructures is, in effect, screened against the entered spectral data, and any substructure whose requisite spectral pattern is absent is discarded. The result of this analysis is a subset of the 1277 substructures. Each member of the subset is related to absorption data that are consistent with the 13C-NMR spectrum of the unknown compound.
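The screening step described above can be sketched as a filter over the rule set. This is a minimal illustration in Python, not the ESSESA implementation (which is written in PROLOG); the `Rule` class, the function names, and the peak list are hypothetical, while the five rules are copied from Table 1 as read from this copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    number: int
    substructure: str
    shift_range: tuple      # (low ppm, high ppm)
    multiplicity: str       # "S", "D", "T", or "Q"
    ir_required: frozenset  # IR substructures that must have been found


RULES = [
    Rule(1, "RCH3", (5, 35), "Q", frozenset()),
    Rule(6, "-COCH3", (16, 27), "Q", frozenset({">CO"})),
    Rule(14, "RCH2CO-", (30, 50), "T", frozenset({">CO"})),
    Rule(18, "RCH2Cl", (40, 55), "T", frozenset()),
    Rule(21, "RCH2O-", (60, 75), "T", frozenset()),
]


def rule_holds(rule, peaks, ir_found):
    """A rule holds if a matching peak exists and all IR premises are met."""
    low, high = rule.shift_range
    peak_ok = any(
        low <= shift <= high and mult == rule.multiplicity
        for shift, mult in peaks
    )
    return peak_ok and rule.ir_required <= ir_found


def screen(rules, peaks, ir_found):
    """Discard every substructure whose requisite pattern is absent."""
    return [r for r in rules if rule_holds(r, peaks, ir_found)]


# Hypothetical peak list: (chemical shift in ppm, multiplicity).
PEAKS = [(8.0, "Q"), (22.0, "Q"), (37.0, "T"), (208.0, "S")]

kept = screen(RULES, PEAKS, frozenset({">CO"}))
print([r.number for r in kept])  # [1, 6, 14]
```

Each rule is treated as a goal whose premises must all be satisfied; when the IR analysis reports no >CO, rules 6 and 14 fail on that single premise and only rule 1 survives.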

In the interpretation, a goal-driven inference strategy was used. In such an inference model, substructures from the knowledge base are used as the goals, and the program seeks spectral patterns that fulfill the premises of the goals, as defined by the production rules. If any single premise of a goal is not satisfied, that goal is determined to be false; that is to say, that particular substructure cannot be contained

Table 1

| no. | substructure | chemical shift (ppm) | multiplicity | IR constraint | 1H-NMR constraint |
|---|---|---|---|---|---|
| 1 | RCH3 | 5-35 | Q | | |
| 2 | -OCH3 | 50-60 | Q | | |
| 3 | >NCH3 | 30-50 | Q | | |
| 4 | -SCH3 | 15-25 | Q | | |
| 5 | -SOCH3 | 40-45 | Q | | |
| 6 | -COCH3 | 16-27 | Q | >CO | |
| 7 | ArCH3 | 11-27 | Q | Ar | |
| 8 | >C=CCH3 | 8-35 | Q | >C=C< | |
| 9 | -C#CCH3 | 3-16 | Q | -C#C- | |
| 10 | RCH2R1 | 20-44 | T | | |
| 11 | RCH2C=C< | 30-41 | T | >C=C< | |
| 12 | RCH2C#C- | 18-25 | T | -C#C- | |
| 13 | RCH2Ar | 32-40 | T | Ar | |
| 14 | RCH2CO- | 30-50 | T | >CO | |
| 15 | RCH2CS- | 41-56 | T | >CS | |
| 16 | RCH2CN | 15-20 | T | -CN | |
| 17 | RCH2F | 80-86 | T | | |
| 18 | RCH2Cl | 40-55 | T | | |
| 19 | RCH2Br | 30-36 | T | | |
| 20 | RCH2I | 5-18 | T | | |
| 21 | RCH2O- | 60-75 | T | | |
| 22 | RCH2S- | 23-41 | T | | |
| 23 | RCH2SO- | 50-55 | T | >SO | |
| 24 | RCH2SO2- | 55-65 | T | >SO2 | |
| 25 | RCH2N< | 40-55 | T | | |
| 26 | RCH2NO2 | 73-78 | T | -NO2 | |
| 27 | RCH2PO< | 38-43 | T | >PO- | |
| 28 | >C=CCH2C=C< | 20-34 | T | >C=C< | |
| 29 | >C=CCH2C#C- | 25-35 | T | -C#C- | |
| 30 | >C=CCH2Ar | 28-40 | T | Ar,>C=C< | |
| 31 | >C=CCH2CO- | 30-44 | T | >CO,>C=C< | |
| 32 | >C=CCH2CS- | 40-55 | T | >CS,>C=C< | |
| 33 | >C=CCH2CN | 20-30 | T | -CN,>C=C< | |
| 34 | >C=CCH2F | 65-81 | T | >C=C< | |
| 35 | >C=CCH2Cl | 40-50 | T | >C=C< | |
| 36 | >C=CCH2Br | 30-36 | T | >C=C< | |
| 37 | >C=CCH2I | 5-12 | T | >C=C< | |
| 38 | >C=CCH2O- | 64-76 | T | >C=C< | |
| 39 | >C=CCH2S- | 20-40 | T | >C=C< | |
| 40 | >C=CCH2SO- | 47-60 | T | >SO,>C=C< | |
| 41 | >C=CCH2SO2- | 55-68 | T | >SO2,>C=C< | |
| 42 | >C=CCH2N< | 42-55 | T | >C=C< | |
| 43 | >C=CCH2NO2 | 63-78 | T | -NO2,>C=C< | |
| 44 | >C=CCH2PO< | 29-33 | T | >PO-,>C=C< | |
| 45 | -C#CCH2C#C- | 8-12 | T | -C#C- | |
| 46 | -C#CCH2Ar | 25-45 | T | Ar,-C#C- | |
| 47 | -C#CCH2CO- | 26-45 | T | >CO,-C#C- | |
| 48 | -C#CCH2CN | 15-36 | T | -CN,-C#C- | |
| 49 | -C#CCH2F | 72-80 | T | -C#C- | |
| 50 | -C#CCH2Cl | 30-40 | T | -C#C- | |
| 51 | -C#CCH2Br | 10-15 | T | -C#C- | |
| 52 | -C#CCH2I | 2-15 | T | -C#C- | |
| 53 | -C#CCH2O- | 44-58 | T | -C#C- | |
| 54 | -C#CCH2S- | 12-28 | T | -C#C- | |
| 55 | -C#CCH2SO- | 30-37 | T | >SO,-C#C- | |
| 56 | -C#CCH2SO2- | 37-44 | T | >SO2,-C#C- | |
| 57 | -C#CCH2N< | 28-48 | T | -C#C- | |
| 58 | -C#CCH2NO2 | 60-70 | T | -NO2,-C#C- | |
| 59 | -C#CCH2PO< | 15-20 | T | >PO-,-C#C- | |
| 60 | ArCH2Ar | 30-50 | T | Ar | |
| 61 | ArCH2CO- | 36-51 | T | >CO,Ar | |
| 62 | ArCH2CS- | 47-52 | T | >CS,Ar | |
| 63 | ArCH2CN | 25-35 | T | -CN,Ar | |
| 64 | ArCH2F | 80-100 | T | Ar | |
| 65 | ArCH2Cl | 39-50 | T | Ar | |
| 66 | ArCH2Br | 23-30 | T | Ar | |
| 67 | ArCH2I | 0-6 | T | Ar | |
| 68 | ArCH2O- | 65-80 | T | Ar | |
| 69 | ArCH2S- | 32-53 | T | Ar | |
| 70 | ArCH2SO- | 55-69 | T | >SO,Ar | |
| 71 | ArCH2SO2- | 50-65 | T | >SO2,Ar | |
| 72 | ArCH2N< | 40-60 | T | Ar | |
| 73 | ArCH2NO2 | 70-90 | T | -NO2,Ar | |
| 74 | ArCH2PO< | 28-38 | T | >PO-,Ar | |

Table 1 (Continued)

| no. | substructure | chemical shift (ppm) | multiplicity | IR constraint | 1H-NMR constraint |
|---|---|---|---|---|---|
| 75 | -COCH2CO- | 43-55 | T | >CO | |
| 76 | -COCH2CS- | 53-63 | T | >CS,>CO | |
| 77 | -COCH2CN | 18-30 | T | -CN,>CO | |
| 78 | -COCH2F | 80-95 | T | >CO | |
| 79 | -COCH2Cl | 40-58 | T | >CO | |
| 80 | -COCH2Br | 20-36 | T | >CO | |
| 81 | -COCH2I | 6-26 | T | >CO | |
| 82 | -COCH2O- | 65-76 | T | >CO | |
| 83 | -COCH2S- | 32-53 | T | >CO | |
| 84 | -COCH2SO- | 52-58 | T | >SO,>CO | |
| 85 | -COCH2SO2- | 58-67 | T | >SO2,>CO | |
| 86 | -COCH2N< | 46-56 | T | >CO | |
| 87 | -COCH2NO2 | 70-85 | T | -NO2,>CO | |
| 88 | -COCH2PO< | 30-42 | T | >PO-,>CO | |
| 89 | -CSCH2CS- | 60-75 | T | >CS | |
| 90 | -CSCH2CN | 28-40 | T | -CN,>CS | |
| 91 | -CSCH2F | 90-100 | T | >CS | |
| 92 | -CSCH2Cl | 50-63 | T | >CS | |
| 93 | -CSCH2Br | 40-55 | T | >CS | |
| 94 | -CSCH2I | 20-30 | T | >CS | |
| 95 | -CSCH2O- | 70-86 | T | >CS | |
| 96 | -CSCH2S- | 30-45 | T | >CS | |
| 97 | -CSCH2SO- | 38-50 | T | >SO,>CS | |
| 98 | -CSCH2SO2- | 43-56 | T | >SO2,>CS | |
| 99 | -CSCH2N< | 50-64 | T | >CS | |
| 100 | -CSCH2NO2 | 80-92 | T | -NO2,>CS | |
| 101 | CNCH2O- | 46-62 | T | -CN | |
| 102 | CNCH2S- | 8-20 | T | -CN | |
| 103 | CNCH2SO- | 32-46 | T | >SO,-CN | |
| 104 | CNCH2SO2- | 40-48 | T | >SO2,-CN | |
| 105 | CNCH2N< | 27-35 | T | -CN | |
| 106 | FCH2O- | 110-121 | T | | |
| 107 | FCH2S- | 71-83 | T | | |
| 108 | FCH2SO- | 90-100 | T | >SO | |
| 109 | FCH2SO2- | 100-108 | T | >SO2 | |
| 110 | FCH2N< | 80-91 | T | | |
| 111 | FCH2PO< | 62-73 | T | >PO- | |
| 112 | ClCH2O- | 70-83 | T | | |
| 113 | ClCH2S- | 32-40 | T | | |
| 114 | ClCH2SO- | 54-62 | T | >SO | |
| 115 | ClCH2SO2- | 61-68 | T | >SO2 | |
| 116 | ClCH2N< | 43-54 | T | | |
| 117 | ClCH2PO< | 24-37 | T | >PO- | |
| 118 | BrCH2O- | 58-72 | T | | |
| 119 | BrCH2S- | 24-32 | T | | |
| 120 | BrCH2SO- | 44-52 | T | >SO | |
| 121 | BrCH2SO2- | 51-59 | T | >SO2 | |
| 122 | BrCH2N< | 34-45 | T | | |
| 123 | BrCH2PO< | 18-30 | T | >PO- | |
| 124 | ICH2O- | 25-40 | T | | |
| 125 | ICH2S- | 10-19 | T | | |
| 126 | ICH2SO- | 20-34 | T | >SO | |
| 127 | ICH2SO2- | 23-36 | T | >SO2 | |
| 128 | ICH2N< | 20-30 | T | | |
| 129 | ICH2PO< | 5-15 | T | >PO- | |
| 130 | -OCH2O- | 82-104 | T | | |
| 131 | -OCH2S- | 42-60 | T | | |
| 132 | -OCH2SO- | 50-55 | T | >SO | |
| 133 | -OCH2SO2- | 55-80 | T | >SO2 | |
| 134 | -OCH2N< | 60-89 | T | | |
| 135 | -OCH2PO< | 61-67 | T | >PO- | |
| 136 | -SCH2S- | 32-48 | T | | |
| 137 | -SCH2SO- | 42-58 | T | >SO | |
| 138 | -SCH2SO2- | 47-63 | T | >SO2 | |
| 139 | -SCH2N< | 43-61 | T | | |
| 140 | -SCH2PO< | 38-54 | T | >PO- | |
| 141 | -SO2CH2N< | 55-70 | T | >SO2 | |
| 142 | >NCH2N< | 50-60 | T | | |
| 143 | >NCH2PO< | 40-54 | T | >PO- | |
| 144 | NO2CH2O- | 90-110 | T | -NO2 | |
| 145 | NO2CH2S- | 52-70 | T | -NO2 | |
| 146 | NO2CH2SO- | 58-65 | T | >SO,-NO2 | |
| 147 | NO2CH2SO2- | 63-80 | T | >SO2,-NO2 | |
| 148 | NO2CH2N< | 70-90 | T | -NO2 | |

Table 1 (Continued)

| no. | substructure | chemical shift (ppm) | multiplicity | IR constraint | 1H-NMR constraint |
|---|---|---|---|---|---|
| 149 | NO2CH2PO< | 70-80 | T | >PO-,-NO2 | >CH2 |
| 150 | CH2=C< | 80-138 | T | >C=C< | CH2=C< |
| 151 | CH2=N- | 135-150 | T | >C=N- | CH2=N- |
| 152 | RCH(R1)R2 | 22-50 | D | | >CH- |
| 153 | RCH(R1)C=C< | 40-60 | D | >C=C< | >CH- |
| 154 | RCH(R1)C#C- | 25-36 | D | -C#C- | >CH- |
| 155 | RCH(R1)Ar | 36-60 | D | Ar | >CH- |
| 156 | RCH(R1)CO- | 36-70 | D | >CO | >CH- |
| 157 | RCH(R1)CS- | 33-49 | D | >CS | >CH- |
| 158 | RCH(R1)CN | 16-37 | D | -CN | >CH- |
| 159 | RCH(R1)F | 90-123 | D | | >CH- |
| 160 | RCH(R1)Cl | 52-68 | D | | >CH- |
| 161 | RCH(R1)Br | 40-74 | D | | >CH- |
| 162 | RCH(R1)I | 14-48 | D | | >CH- |
| 163 | RCH(R1)O- | 62-91 | D | | >CH- |
| 164 | RCH(R1)S- | 22-68 | D | | >CH- |
| 165 | RCH(R1)SO- | 59-69 | D | >SO | >CH- |
| 166 | RCH(R1)SO2- | 50-65 | D | >SO2 | >CH- |
| 167 | RCH(R1)N< | 40-70 | D | | >CH- |
| 168 | RCH(R1)NO2 | 80-98 | D | -NO2 | >CH- |
| 169 | RCH(R1)PO< | 26-44 | D | >PO- | >CH- |
| 170 | >C=CCH(R)C=C< | 33-57 | D | >C=C< | >CH- |
| 171 | >C=CCH(R)C#C- | 27-36 | D | -C#C- | >CH- |
| 172 | >C=CCH(R)Ar | 35-52 | D | Ar,>C=C< | >CH- |
| 173 | >C=CCH(R)CO- | 38-64 | D | >CO,>C=C< | >CH- |
| 174 | >C=CCH(R)CS- | 40-60 | D | >CS,>C=C< | >CH- |
| 175 | >C=CCH(R)CN | 20-38 | D | -CN,>C=C< | >CH- |
| 176 | >C=CCH(R)F | 93-120 | D | >C=C< | >CH- |
| 177 | >C=CCH(R)Cl | 50-68 | D | >C=C< | >CH- |
| 178 | >C=CCH(R)Br | 42-59 | D | >C=C< | >CH- |
| 179 | >C=CCH(R)I | 20-48 | D | >C=C< | >CH- |
| 180 | >C=CCH(R)O- | 60-80 | D | >C=C< | >CH- |
| 181 | >C=CCH(R)S- | 42-60 | D | >C=C< | >CH- |
| 182 | >C=CCH(R)SO- | 65-70 | D | >SO,>C=C< | >CH- |
| 183 | >C=CCH(R)SO2- | 53-70 | D | >SO2,>C=C< | >CH- |
| 184 | >C=CCH(R)N< | 50-70 | D | >C=C< | >CH- |
| 185 | >C=CCH(R)NO2 | 90-110 | D | -NO2,>C=C< | >CH- |
| 186 | >C=CCH(R)PO< | 34-52 | D | >PO-,>C=C< | >CH- |
| 187 | -C#CCH(R)C#C- | 25-30 | D | -C#C- | >CH- |
| 188 | -C#CCH(R)Ar | 32-50 | D | Ar,-C#C- | >CH- |
| 189 | -C#CCH(R)CO- | 34-52 | D | >CO,-C#C- | >CH- |
| 190 | -C#CCH(R)CS- | 35-47 | D | >CS,-C#C- | >CH- |
| 191 | -C#CCH(R)CN | 27-36 | D | -CN,-C#C- | >CH- |
| 192 | -C#CCH(R)F | 85-100 | D | -C#C- | >CH- |
| 193 | -C#CCH(R)Cl | 55-67 | D | -C#C- | >CH- |
| 194 | -C#CCH(R)Br | 46-54 | D | -C#C- | >CH- |
| 195 | -C#CCH(R)I | 18-38 | D | -C#C- | >CH- |
| 196 | -C#CCH(R)O- | 53-91 | D | -C#C- | >CH- |
| 197 | -C#CCH(R)S- | 34-59 | D | -C#C- | >CH- |
| 198 | -C#CCH(R)SO- | 46-57 | D | >SO,-C#C- | >CH- |
| 199 | -C#CCH(R)SO2- | 50-60 | D | >SO2,-C#C- | >CH- |
| 200 | -C#CCH(R)N< | 35-70 | D | -C#C- | >CH- |
| 201 | -C#CCH(R)NO2 | 75-90 | D | -NO2,-C#C- | >CH- |
| 202 | -C#CCH(R)PO< | 24-40 | D | >PO-,-C#C- | >CH- |
| 203 | ArCH(R)Ar | 47-62 | D | Ar | >CH- |
| 204 | ArCH(R)CO- | 50-61 | D | >CO,Ar | >CH- |
| 205 | ArCH(R)CS- | 51-64 | D | >CS,Ar | >CH- |
| 206 | ArCH(R)CN | 30-41 | D | -CN,Ar | >CH- |
| 207 | ArCH(R)F | 95-115 | D | Ar | >CH- |
| 208 | ArCH(R)Cl | 65-99 | D | Ar | >CH- |
| 209 | ArCH(R)Br | 40-60 | D | Ar | >CH- |
| 210 | ArCH(R)I | 17-24 | D | Ar | >CH- |
| 211 | ArCH(R)O- | 60-80 | D | Ar | >CH- |
| 212 | ArCH(R)S- | 30-45 | D | Ar | >CH- |
| 213 | ArCH(R)SO- | 55-70 | D | >SO,Ar | >CH- |
| 214 | ArCH(R)SO2- | 57-70 | D | >SO2,Ar | >CH- |
| 215 | ArCH(R)N< | 40-75 | D | Ar | >CH- |
| 216 | ArCH(R)NO2 | 81-95 | D | -NO2,Ar | >CH- |
| 217 | -COCH(R)CO- | 47-70 | D | >CO | >CH- |
| 218 | -COCH(R)CS- | 49-68 | D | >CS,>CO | >CH- |
| 219 | -COCH(R)CN | 35-50 | D | -CN,>CO | >CH- |
| 220 | -COCH(R)F | 88-98 | D | >CO | >CH- |
| 221 | -COCH(R)Cl | 56-80 | D | >CO | >CH- |
| 222 | -COCH(R)Br | 38-50 | D | >CO | >CH- |

Table 1 (Continued)

| no. | substructure | chemical shift (ppm) | multiplicity | IR constraint | 1H-NMR constraint |
|---|---|---|---|---|---|
| 223 | -COCH(R)I | 17-24 | D | >CO | >CH- |
| 224 | -COCH(R)O- | 56-92 | D | >CO | >CH- |
| 225 | -COCH(R)S- | 37-61 | D | >CO | >CH- |
| 226 | -COCH(R)SO- | 55-63 | D | >SO,>CO | >CH- |
| 227 | -COCH(R)SO2- | 58-68 | D | >SO2,>CO | >CH- |
| 228 | -COCH(R)N< | 55-70 | D | >CO | >CH- |
| 229 | -COCH(R)NO2 | 80-87 | D | -NO2,>CO | >CH- |
| 230 | -COCH(R)PO< | 38-48 | D | >PO-,>CO | >CH- |
| 231 | -CSCH(R)CS- | 50-68 | D | >CS | >CH- |
| 232 | -CSCH(R)CN | 36-52 | D | -CN,>CS | >CH- |
| 233 | -CSCH(R)F | 90-100 | D | >CS | >CH- |
| 234 | -CSCH(R)Cl | 60-80 | D | >CS | >CH- |
| 235 | -CSCH(R)Br | 40-52 | D | >CS | >CH- |
| 236 | -CSCH(R)I | 19-25 | D | >CS | >CH- |
| 237 | -CSCH(R)O- | 55-90 | D | >CS | >CH- |
| 238 | -CSCH(R)S- | 46-60 | D | >CS | >CH- |
| 239 | -CSCH(R)SO- | 58-67 | D | >SO,>CS | >CH- |
| 240 | -CSCH(R)SO2- | 60-70 | D | >SO2,>CS | >CH- |
| 241 | -CSCH(R)N< | 53-68 | D | >CS | >CH- |
| 242 | -CSCH(R)NO2 | 85-95 | D | -NO2,>CS | >CH- |
| 243 | -CSCH(R)PO< | 38-50 | D | >PO-,>CS | >CH- |
| 244 | CNCH(R)O- | 42-58 | D | -CN | >CH- |
| 245 | CNCH(R)S- | 27-48 | D | -CN | >CH- |
| 246 | CNCH(R)SO- | 40-50 | D | >SO,-CN | >CH- |
| 247 | CNCH(R)SO2- | 41-52 | D | >SO2,-CN | >CH- |
| 248 | CNCH(R)N< | 50-60 | D | -CN | >CH- |
| 249 | CNCH(R)PO< | 25-40 | D | >PO-,-CN | >CH- |
| 250 | FCH(R)O- | 104-120 | D | | >CH- |
| 251 | FCH(R)S- | 63-75 | D | | >CH- |
| 252 | FCH(R)SO- | 70-78 | D | >SO | >CH- |
| 253 | FCH(R)SO2- | 75-85 | D | >SO2 | >CH- |
| 254 | FCH(R)N< | 72-82 | D | | >CH- |
| 255 | ClCH(R)O- | 84-113 | D | | >CH- |
| 256 | ClCH(R)S- | 60-70 | D | | >CH- |
| 257 | ClCH(R)SO- | 65-72 | D | >SO | >CH- |
| 258 | ClCH(R)SO2- | 67-73 | D | >SO2 | >CH- |
| 259 | ClCH(R)N< | 70-84 | D | | >CH- |
| 260 | ClCH(R)PO< | 60-65 | D | >PO- | >CH- |
| 261 | BrCH(R)O- | 86-94 | D | | >CH- |
| 262 | BrCH(R)S- | 53-63 | D | | >CH- |
| 263 | BrCH(R)SO- | 60-70 | D | >SO | >CH- |
| 264 | BrCH(R)SO2- | 63-72 | D | >SO2 | >CH- |
| 265 | BrCH(R)N< | 66-74 | D | | >CH- |
| 266 | BrCH(R)PO< | 50-60 | D | >PO- | >CH- |
| 267 | ICH(R)O- | 40-49 | D | | >CH- |
| 268 | ICH(R)S- | 30-40 | D | | >CH- |
| 269 | ICH(R)SO- | 37-47 | D | >SO | >CH- |
| 270 | ICH(R)SO2- | 40-50 | D | >SO2 | >CH- |
| 271 | ICH(R)N< | 30-40 | D | | >CH- |
| 272 | ICH(R)PO< | 25-35 | D | >PO- | >CH- |
| 273 | -OCH(R)O- | 93-114 | D | | >CH- |
| 274 | -OCH(R)S- | 78-86 | D | | >CH- |
| 275 | -OCH(R)SO- | 75-85 | D | >SO | >CH- |
| 276 | -OCH(R)SO2- | 80-90 | D | >SO2 | >CH- |
| 277 | -OCH(R)N< | 74-88 | D | | >CH- |
| 278 | -OCH(R)PO< | 71-74 | D | >PO- | >CH- |
| 279 | -SCH(R)S- | 38-62 | D | | >CH- |
| 280 | -SCH(R)SO- | 48-58 | D | >SO | >CH- |
| 281 | -SCH(R)SO2- | 53-63 | D | >SO2 | >CH- |
| 282 | -SCH(R)N< | 51-70 | D | | >CH- |
| 283 | -SCH(R)PO< | 53-65 | D | >PO- | >CH- |
| 284 | >NCH(R)N< | 51-90 | D | | >CH- |
| 285 | >NCH(R)PO< | 45-55 | D | >PO- | >CH- |
| 286 | NO2CH(R)O- | 100-120 | D | -NO2 | >CH- |
| 287 | NO2CH(R)S- | 80-90 | D | -NO2 | >CH- |
| 288 | NO2CH(R)SO- | 90-100 | D | >SO,-NO2 | >CH- |
| 289 | NO2CH(R)SO2- | 95-105 | D | >SO2,-NO2 | >CH- |
| 290 | NO2CH(R)N< | 90-98 | D | -NO2 | >CH- |
| 291 | NO2CH(R)PO< | 85-95 | D | >PO-,-NO2 | >CH- |
| 292 | RCH=C< | 97-130 | D | >C=C< | -CH=C< |
| 293 | RCH=N- | 159-173 | D | >C=N- | -CH=N- |
| 294 | >C=CCH(C=C<)C=C< | 51-58 | D | >C=C< | >CH- |
| 295 | >C=CCH(C=C<)C#C- | 36-46 | D | -C#C-,>C=C< | >CH- |
| 296 | >C=CCH(C=C<)Ar | 35-50 | D | Ar,>C=C< | >CH- |

Table 1 (Continued)

| no. | substructure | chemical shift (ppm) | multiplicity | IR constraint | 1H-NMR constraint |
|---|---|---|---|---|---|
| 297 | >C=CCH(C=C<)CO- | 36-41 | D | >CO,>C=C< | >CH- |
| 298 | >C=CCH(C=C<)CS- | 38-48 | D | >CS,>C=C< | >CH- |
| 299 | >C=CCH(C=C<)CN | 27-33 | D | -CN,>C=C< | >CH- |
| 300 | >C=CCH(C=C<)F | 95-110 | D | >C=C< | >CH- |
| 301 | >C=CCH(C=C<)Cl | 55-65 | D | >C=C< | >CH- |
| 302 | >C=CCH(C=C<)Br | 45-55 | D | >C=C< | >CH- |
| 303 | >C=CCH(C=C<)I | 25-35 | D | >C=C< | >CH- |
| 304 | >C=CCH(C=C<)O- | 72-87 | D | >C=C< | >CH- |
| 305 | >C=CCH(C=C<)S- | 45-54 | D | >C=C< | >CH- |
| 306 | >C=CCH(C=C<)SO- | 50-60 | D | >SO,>C=C< | >CH- |
| 307 | >C=CCH(C=C<)SO2- | 55-68 | D | >SO2,>C=C< | >CH- |
| 308 | >C=CCH(C=C<)N< | 50-65 | D | >C=C< | >CH- |
| 309 | >C=CCH(C=C<)NO2 | 80-90 | D | -NO2,>C=C< | >CH- |
| 310 | >C=CCH(C=C<)PO< | 50-65 | D | >PO-,>C=C< | >CH- |
| 311 | >C=CCH(C#C-)C#C- | 22-33 | D | -C#C-,>C=C< | >CH- |
| 312 | >C=CCH(C#C-)Ar | 22-34 | D | Ar,>C=C<,-C#C- | >CH- |
| 313 | >C=CCH(C#C-)CO- | 24-36 | D | >CO,>C=C<,-C#C- | >CH- |
| 314 | >C=CCH(C#C-)CS- | 26-36 | D | >CS,>C=C<,-C#C- | >CH- |
| 315 | >C=CCH(C#C-)CN | 18-28 | D | -CN,>C=C<,-C#C- | >CH- |
| 316 | >C=CCH(C#C-)F | 80-95 | D | >C=C<,-C#C- | >CH- |
| 317 | >C=CCH(C#C-)Cl | 42-53 | D | >C=C<,-C#C- | >CH- |
| 318 | >C=CCH(C#C-)Br | 32-43 | D | >C=C<,-C#C- | >CH- |
| 319 | >C=CCH(C#C-)I | 15-28 | D | >C=C<,-C#C- | >CH- |
| 320 | >C=CCH(C#C-)O- | 60-65 | D | >C=C<,-C#C- | >CH- |
| 321 | >C=CCH(C#C-)S- | 32-42 | D | >C=C<,-C#C- | >CH- |
| 322 | >C=CCH(C#C-)SO- | 38-48 | D | >SO,>C=C<,-C#C- | >CH- |
| 323 | >C=CCH(C#C-)SO2- | 43-53 | D | >SO2,>C=C<,-C#C- | >CH- |
| 324 | >C=CCH(C#C-)N< | 48-58 | D | >C=C<,-C#C- | >CH- |
| 325 | >C=CCH(C#C-)NO2 | 65-78 | D | -NO2,>C=C<,-C#C- | >CH- |
| 326 | >C=CCH(C#C-)PO< | 35-48 | D | >PO-,>C=C<,-C#C- | >CH- |
| 327 | >C=CCH(Ar)Ar | 51-58 | D | Ar,>C=C< | >CH- |
| 328 | >C=CCH(Ar)CO- | 55-65 | D | >CO,>C=C<,Ar | >CH- |
| 329 | >C=CCH(Ar)CS- | 53-60 | D | >CS,>C=C<,Ar | >CH- |
| 330 | >C=CCH(Ar)CN | 35-46 | D | -CN,>C=C<,Ar | >CH- |
| 331 | >C=CCH(Ar)F | 95-110 | D | >C=C<,Ar | >CH- |
| 332 | >C=CCH(Ar)Cl | 56-65 | D | >C=C<,Ar | >CH- |
| 333 | >C=CCH(Ar)Br | 44-53 | D | >C=C<,Ar | >CH- |
| 334 | >C=CCH(Ar)I | 26-35 | D | >C=C<,Ar | >CH- |
| 335 | >C=CCH(Ar)O- | 72-79 | D | >C=C<,Ar | >CH- |
| 336 | >C=CCH(Ar)S- | 44-55 | D | >C=C<,Ar | >CH- |
| 337 | >C=CCH(Ar)SO- | 50-60 | D | >SO,>C=C<,Ar | >CH- |
| 338 | >C=CCH(Ar)SO2- | 55-65 | D | >SO2,>C=C<,Ar | >CH- |
| 339 | >C=CCH(Ar)N< | 57-74 | D | >C=C<,Ar | >CH- |
| 340 | >C=CCH(Ar)NO2 | 80-92 | D | -NO2,>C=C<,Ar | >CH- |
| 341 | >C=CCH(Ar)PO< | 48-62 | D | >PO-,>C=C<,Ar | >CH- |
| 342 | >C=CCH(CO-)CO- | 55-75 | D | >CO,>C=C< | >CH- |
| 343 | >C=CCH(CO-)CS- | 53-73 | D | >CS,>C=C<,>CO | >CH- |
| 344 | >C=CCH(CO-)CN | 35-55 | D | -CN,>C=C<,>CO | >CH- |
| 345 | >C=CCH(CO-)F | 94-118 | D | >C=C<,>CO | >CH- |
| 346 | >C=CCH(CO-)Cl | 56-74 | D | >C=C<,>CO | >CH- |
| 347 | >C=CCH(CO-)Br | 45-63 | D | >C=C<,>CO | >CH- |
| 348 | >C=CCH(CO-)I | 28-43 | D | >C=C<,>CO | >CH- |
| 349 | >C=CCH(CO-)O- | 71-79 | D | >C=C<,>CO | >CH- |
| 350 | >C=CCH(CO-)S- | 47-53 | D | >C=C<,>CO | >CH- |
| 351 | >C=CCH(CO-)SO- | 52-70 | D | >SO,>C=C<,>CO | >CH- |
| 352 | >C=CCH(CO-)SO2- | 56-74 | D | >SO2,>C=C<,>CO | >CH- |
| 353 | >C=CCH(CO-)N< | 57-63 | D | >C=C<,>CO | >CH- |
| 354 | >C=CCH(CO-)NO2 | 80-95 | D | -NO2,>C=C<,>CO | >CH- |
| 355 | >C=CCH(CO-)PO< | 45-60 | D | >PO-,>C=C<,>CO | >CH- |
| 356 | >C=CCH(CS-)CS- | 54-63 | D | >CS,>C=C< | >CH- |
| 357 | >C=CCH(CS-)CN | 36-45 | D | -CN,>C=C<,>CS | >CH- |
| 358 | >C=CCH(CS-)F | 94-105 | D | >C=C<,>CS | >CH- |
| 359 | >C=CCH(CS-)Cl | 55-64 | D | >C=C<,>CS | >CH- |
| 360 | >C=CCH(CS-)Br | 45-54 | D | >C=C<,>CS | >CH- |
| 361 | >C=CCH(CS-)I | 27-36 | D | >C=C<,>CS | >CH- |
| 362 | >C=CCH(CS-)O- | 71-80 | D | >C=C<,>CS | >CH- |
| 363 | >C=CCH(CS-)S- | 45-54 | D | >C=C<,>CS | >CH- |
| 364 | >C=CCH(CS-)SO- | 50-59 | D | >SO,>C=C<,>CS | >CH- |
| 365 | >C=CCH(CS-)SO2- | 55-64 | D | >SO2,>C=C<,>CS | >CH- |
| 366 | >C=CCH(CS-)N< | 56-65 | D | >C=C<,>CS | >CH- |
| 367 | >C=CCH(CS-)NO2 | 80-89 | D | -NO2,>C=C<,>CS | >CH- |
| 368 | >C=CCH(CS-)PO< | 47-56 | D | >PO-,>C=C<,>CS | >CH- |
| 369 | >C=CCH(CN)F | 85-94 | D | >C=C<,-CN | >CH- |
| 370 | >C=CCH(CN)Cl | 48-57 | D | >C=C<,-CN | >CH- |

Table 1 (Continued)

| no. | substructure | chemical shift (ppm) | multiplicity | IR constraint | 1H-NMR constraint |
|---|---|---|---|---|---|
| 371 | >C=CCH(CN)Br | 35-44 | D | >C=C<,-CN | >CH- |
| 372 | >C=CCH(CN)I | 18-29 | D | >C=C<,-CN | >CH- |
| 373 | >C=CCH(CN)O- | 61-70 | D | >C=C<,-CN | >CH- |
| 374 | >C=CCH(CN)S- | 35-44 | D | >C=C<,-CN | >CH- |
| 375 | >C=CCH(CN)SO- | 40-49 | D | >SO,>C=C<,-CN | >CH- |
| 376 | >C=CCH(CN)SO2- | 45-54 | D | >SO2,>C=C<,-CN | >CH- |
| 377 | >C=CCH(CN)N< | 52-60 | D | >C=C<,-CN | >CH- |
| 378 | >C=CCH(F)F | 110-135 | D | >C=C< | >CH- |
| 379 | >C=CCH(F)Cl | 90-105 | D | >C=C< | >CH- |
| 380 | >C=CCH(F)Br | 75-90 | D | >C=C< | >CH- |
| 381 | >C=CCH(F)I | 55-70 | D | >C=C< | >CH- |
| 382 | >C=CCH(F)O- | 100-115 | D | >C=C< | >CH- |
| 383 | >C=CCH(F)S- | 95-110 | D | >C=C< | >CH- |
| 384 | >C=CCH(F)N< | 60-70 | D | >C=C< | >CH- |
| 385 | >C=CCH(Cl)Cl | 65-72 | D | >C=C< | >CH- |
| 386 | >C=CCH(Cl)Br | 50-65 | D | >C=C< | >CH- |
| 387 | >C=CCH(Cl)I | 30-45 | D | >C=C< | >CH- |
| 388 | >C=CCH(Cl)O- | 83-93 | D | >C=C< | >CH- |
| 389 | >C=CCH(Cl)S- | 50-65 | D | >C=C< | >CH- |
| 390 | >C=CCH(Cl)N< | 35-54 | D | >C=C< | >CH- |
| 391 | >C=CCH(O-)O- | 85-110 | D | >C=C< | >CH- |
| 392 | >C=CCH(O-)S- | 46-58 | D | >C=C< | >CH- |
| 393 | >C=CCH(O-)N< | 82-95 | D | >C=C< | >CH- |
| 394 | >C=CCH(O-)NO2 | 90-110 | D | -NO2,>C=C< | >CH- |
| 395 | >C=CCH(S-)S- | 44-54 | D | >C=C< | >CH- |
| 396 | >C=CCH(S-)N< | 60-69 | D | >C=C< | >CH- |
| 397 | >C=CCH(S-)NO2 | 85-96 | D | -NO2,>C=C< | >CH- |
| 398 | >C=CCH(N<)N< | 65-80 | D | >C=C< | >CH- |
| 399 | >C=CCH(N<)NO2 | 90-110 | D | -NO2,>C=C< | >CH- |
| 400 | >C=CCH=C< | 95-135 | D | >C=C< | -CH=C< |
| 401 | >C=CCH=N- | 146-165 | D | >C=C<,>C=N- | -CH=N- |
| 402 | -C#CCH(C#C-)Ar | 18-34 | D | Ar,-C#C- | >CH- |
| 403 | -C#CCH(C#C-)CO- | 20-36 | D | >CO,-C#C- | >CH- |
| 404 | -C#CCH(C#C-)CS- | 21-35 | D | >CS,-C#C- | >CH- |
| 405 | -C#CCH(C#C-)CN | 15-28 | D | -CN,-C#C- | >CH- |
| 406 | -C#CCH(C#C-)F | 75-90 | D | -C#C- | >CH- |
| 407 | -C#CCH(C#C-)Cl | 40-53 | D | -C#C- | >CH- |
| 408 | -C#CCH(C#C-)Br | 34-45 | D | -C#C- | >CH- |
| 409 | -C#CCH(C#C-)I | 8-19 | D | -C#C- | >CH- |
| 410 | -C#CCH(C#C-)O- | 52-64 | D | -C#C- | >CH- |
| 411 | -C#CCH(C#C-)S- | 30-42 | D | -C#C- | >CH- |
| 412 | -C#CCH(C#C-)N< | 35-48 | D | -C#C- | >CH- |
| 413 | -C#CCH(C#C-)NO2 | 60-74 | D | -NO2,-C#C- | >CH- |
| 414 | -C#CCH(Ar)Ar | 37-53 | D | Ar,-C#C- | >CH- |
| 415 | -C#CCH(Ar)CO- | 39-55 | D | >CO,-C#C- | >CH- |
| 416 | -C#CCH(Ar)CS- | 40-54 | D | >CS,-C#C- | >CH- |
| 417 | -C#CCH(Ar)CN | 22-34 | D | -CN,-C#C- | >CH- |
| 418 | -C#CCH(Ar)F | 85-94 | D | Ar,-C#C- | >CH- |
| 419 | -C#CCH(Ar)Cl | 60-74 | D | Ar,-C#C- | >CH- |
| 420 | -C#CCH(Ar)Br | 50-60 | D | Ar,-C#C- | >CH- |
| 421 | -C#CCH(Ar)I | 33-48 | D | Ar,-C#C- | >CH- |
| 422 | -C#CCH(Ar)O- | 62-72 | D | Ar,-C#C- | >CH- |
| 423 | -C#CCH(Ar)S- | 30-40 | D | Ar,-C#C- | >CH- |
| 424 | -C#CCH(Ar)N< | 45-60 | D | Ar,-C#C- | >CH- |
| 425 | -C#CCH(Ar)NO2 | 65-79 | D | -NO2,Ar,-C#C- | >CH- |
| 426 | -C#CCH(CO-)CO- | 40-55 | D | >CO,-C#C- | >CH- |
| 427 | -C#CCH(CO-)CS- | 38-55 | D | >CS,>CO,-C#C- | >CH- |
| 428 | -C#CCH(CO-)CN | 25-40 | D | -CN,>CO,-C#C- | >CH- |
| 429 | -C#CCH(CO-)F | 80-92 | D | >CO,-C#C- | >CH- |
| 430 | -C#CCH(CO-)Cl | 45-60 | D | >CO,-C#C- | >CH- |
| 431 | -C#CCH(CO-)Br | 33-48 | D | >CO,-C#C- | >CH- |
| 432 | -C#CCH(CO-)I | 18-33 | D | >CO,-C#C- | >CH- |
| 433 | -C#CCH(CO-)O- | 58-70 | D | >CO,-C#C- | >CH- |
| 434 | -C#CCH(CO-)S- | 33-46 | D | >CO,-C#C- | >CH- |
| 435 | -C#CCH(CO-)N< | 44-56 | D | >CO,-C#C- | >CH- |
| 436 | -C#CCH(CO-)NO2 | 68-81 | D | -NO2,>CO,-C#C- | >CH- |
| 437 | -C#CCH(CS-)CS- | 40-55 | D | >CS,-C#C- | >CH- |
| 438 | -C#CCH(CS-)CN | 24-36 | D | -CN,>CS,-C#C- | >CH- |
| 439 | -C#CCH(CS-)F | 80-93 | D | >CS,-C#C- | >CH- |
| 440 | -C#CCH(CS-)Cl | 45-54 | D | >CS,-C#C- | >CH- |
| 441 | -C#CCH(CS-)Br | 35-44 | D | >CS,-C#C- | >CH- |
| 442 | -C#CCH(CS-)I | 18-28 | D | >CS,-C#C- | >CH- |
| 443 | -C#CCH(CS-)O- | 58-70 | D | >CS,-C#C- | >CH- |
| 444 | -C#CCH(CS-)S- | 38-48 | D | >CS,-C#C- | >CH- |


Table 1 (Continued)

no.  substructure  chemical shift  multiplicity  IR constraint  1H-NMR constraint

445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517

-C#CCH(CS-)N < -C#CCH(CS-)N02 -C#CCH(CN)F -C#CCH(CN)Cl

-C#CCH(CN)I -C#CCH(CN)O- -C#CCH(CN)S- -C#CCH(CN)N < -C#CCH(F)F -C#CCH(F)Cl

-C#CCH(F)I -C#CCH(F)O- -C#CCH(F)S- -C#CCH(F)N < -C#CCH(Cl)Cl

-C#CCH(Cl)I -C#CCH(Cl)O- -C#CCH(Cl)S- -C#CCH(Cl)N < -C#CCH(O-)0- -C#CCH(O-)S-

-C#CCH(CN)Br

-C#CCH(F)Br

-C#CCH(Cl)Br

-C#CCH(O-)N< -C#CCH(O-)N02 -C#CCH(S-)S- -C#CCH(S-)N < -C#CCH(S-)N02 -C#CCH(N< )N < -C#CCH(N<)NO2 -C#CCH=C < -C#CCH=N- ArCH(Ar)Ar ArCH(Ar)CO- ArCH(Ar)CS- ArCH(Ar)CN ArCH(Ar)F ArCH(Ar)Cl ArCH(Ar)Br ArCH(Ar)I ArCH(Ar)O- ArCH(Ar)S- ArCH(Ar)N< ArCH(Ar)N02 ArCH(C0-)CO- ArCH(C0-)CS- , ArCH(C0-)CN ArCH(C0-)F ArCH(C0-)Cl ArCH(C0-)Br ArCH(C0-)I ArCH(C0-)0- ArCH(C0-)S- ArCH(C0-)N< ArCH(CO-)N02 ArCH(CS-)CS- ArCH(CS-)CN ArCH(CS-)F ArCH(CS-)Cl ArCH(CS-)Br ArCH(CS-)I ArCH(CS-)0- ArCH(CS-)S- ArCH(CS-)N < ArCH(CS-)N02 ArCH(CN)N< ArCH(F)F ArCH(F)Cl ArCH(F)Br ArCH(F)I ArCH(F)O- ArCH(F)S-

518 ArCH(F)N<

42-54 68-79 70-84 40-55 28-39 10-21 48-60 25-36 41-54

105-120 78-95 65-80 44-58 88-103 70-88 45-64 50-60 38-53 20-35 70-80 40-58 43-60 89-106 52-64 73-86 80-100 40-50 54-65 75-86 55-75 80-100 90-120

130- 150 55-65 54-60 52-60 37-43 90- 105 57-66 47-56 29-40 71-87 52-64 59-77 88-105 65-80 63-75 44-53 91-99 57-68 49-58 30-38 75-87 42-48 54-61 80-92 60-75 55-65 90-99 58-69 50-60 30-40 74-86 42-50 52-62 79-90 57-64

130-150 95-110 80-95 60-75

100-120 70-85 75-90

D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D

> cs,-C#C- -NOz,>CS,-C#C- -CN,-C#C- -CN,-C#C- -CN,-C#C- -CN,-C#C- -CN,-C#C- -CN,-C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -N02,-C#C- -C#C- -C#C- -N02,-C#C- -C#C- -N02,-C#C- -C#C-, > c-c -= -C#C- , > C-N- Ar >CO,Ar 'CS,Ar -CN,Ar Ar Ar Ar Ar Ar Ar Ar -N02,Ar > C0,Ar >CS,Ar, > CO -CN,Ar, > CO >CO,Ar > C0,Ar > C0,Ar >CO,Ar >CO,Ar >CO,Ar >CO,Ar -N02, > C0,Ar >CS,Ar

>CS,Ar > CS ,Ar >CS,Ar >CS,Ar >CS,Ar >CS,Ar >CS,Ar -N02, > CS,Ar Ar,-CN Ar Ar Ar Ar Ar Ar Ar

-CN, > CS,Ar

>CH- >CH- 'CH- 'CH- >CH- 'CH- >CH- >CH- >CH- >CH- >CH- >CH- 'CH- 'CH- >CH- >CH- >CH- >CH- >CH- 'CH- >CH- 'CH- 'CH- 'CH- >CH- >CH- 'CH- >CH- >CH- >CH- 'CH- -CH=C< -CH=N- >CH- >CH- >CH- >CH- 'CH- >CH- >CH- >CH- 'CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- 'CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- 'CH- 'CH-

Table 1 (Continued)

no.  substructure  chemical shift  multiplicity  IR constraint  1H-NMR constraint

519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592

ArCH(Cl)Cl ArCH(Cl)Br ArCH(Cl)I ArCH(Cl)O- ArCH(Cl)S- ArCH(Cl)N< ArCH(O-)O- ArCH(O-)S- ArCH(O-)N< ArCH(O-)NO2 ArCH(S-)S- ArCH(S-)N< ArCH(S-)NO2 ArCH(N<)N< ArCH(N<)NO2 ArCH=C< ArCH=N- -COCH(CO-)CO- -COCH(CO-)CS- -COCH(CO-)CN -COCH(CO-)F -COCH(CO-)Cl -COCH(CO-)Br -COCH(CO-)I -COCH(CO-)O- -COCH(CO-)S- -COCH(CO-)N< -COCH(CO-)NO2 -COCH(CS-)CS- -COCH(CS-)CN -COCH(CS-)F -COCH(CS-)Cl

-COCH(CS-)I -COCH(CS-)0- -COCH(CS-)S- -COCH(CS-)N < -COCH(CS-)N02 -COCH(CN)F -COCH(CN)Cl

-COCH(CS-)Br

-COCH(CN)Br -COCH(CN)I -COCH(CN)O- -COCH(CN)S- -COCH(CN)N < -COCH(F)F -COCH(F)CI -COCH(F)Br -COCH(F)I -COCH(F)O- -COCH(F)S- -COCH(F)N' -COCH(CI)CI -COCH(Cl)Br -COCH(Cl)I -COCH(CI)O- -COCH(Cl)S- -COCH(CI)N< -COCH(O-)0- -COCH(O-)S- -COCH(O-)N < -COCH(O-)N02 -COCH(S-)S- -COCH(S-)N < -COCH(S-)NOz -COCH(N<)N< -COCH(N<)N02 -COCH=C< -COCH=N- -CSCH(CN)F -CSCH(CN)CI -CSCH(CN)Br -CSCH(CN)I -CSCH(CN)O-

63-72 55-64 40-54 68-80 48-60 60-69 99-118 60-75 72-86

105-120 47-55 64-72 70-85 73-88 80-94 95-120

130-144 60-78 55-65 38-50 95-120 58-70 48-60 30-43 74-83 50-58 54-64 75-90 5 5 - 6 4 38-50 95-108 58-70 48-60 30-40 74-85 48-60 58-70 82-90 86-96 50-62 38-50 21-34 63-74 38-48 55-65

125-135 92-105 78-92 58-72

105-120 65-75 70-80 63-71 55-60 33-47 85-95 50-65 55-70 87-105 48-60 72-82 90-110 45-54 70-78 88-100 58-65 95-110

110-146 145-165 80-92 50-62 38-40 20-33 60-72

D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D

Ar Ar Ar Ar Ar Ar Ar Ar Ar -N02,Ar Ar Ar -NO?,Ar Ar -N02,Ar Ar, > C=C < Ar, > C=N- >co 'CS, 'CO -CN,>CO >co > co > co > co > co > co >co -N02,'CO > cs, > co -CN,>CS,>CO > c s , > c o >cs, >co > CS,'CO >cs, >co > cs, > co >cs, >co > c s , > co -NOz,>CS,'CO >CO,-CN >CO,-CN >CO,-CN >CO,-CN >CO,-CN >CO,-CN >CO,-CN 'CO >co >co >co >co > co > co 'CO > co > co > co > co >co > co > co > co -NOz,>CO > co >co -NOz,>CO > co -NOz,>CO > c o , > c = c < >CO,>C=N- >CS,-CN >CS,-CN >CS,-CN >CS,-CN 'CS,-CN

>CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- 'CH- >CH- 'CH- 'CH- -CH-C< -CH=N- >CH- >CH- >CH- 'CH- >CH- >CH- 'CH- 'CH- >CH- >CH- 'CH- >CH- >CH- >CH- >CH- 'CH- >CH- >CH- 'CH- 'CH- >CH- >CH- >CH- >CH- 'CH- 'CH- >CH- >CH- >CH- >CH- >CH- >CH- 'CH- >CH- >CH- >CH- >CH- >CH- 'CH- >CH- >CH- > CH- > CH- >CH- >CH- >CH- 'CH- >CH- >CH- >CH- -CH=C< -CH=N- >CH- >CH- >CH- >CH- 'CH-

Table 1 (Continued)

no.  substructure  chemical shift  multiplicity  IR constraint  1H-NMR constraint

593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666

-CSCH(CN)S- -CSCH(CN)NI -CSCH(F)F -CSCH(F)Cl -CSCH(F)Br -CSCH(F)I -CSCH(F)O- -CSCH(F)S- -CSCH(F)N < -CSCH(Cl)Cl

-CSCH(CI)I -CSCH(Cl)O- -CSCH(Cl)S- -CSCH(CI)N < -CSCH(O-)0- -CSCH(O-)S- -CS CH(0-)N < -CSCH(O-)NO2 -CSCH(S-)S- -CSCH(S-)N< -CSCH(S-)N02 -CSCH(N < )N < -CSCH(N<)NOz -CSCH=C< -CSCH=N- CNCH(0-)0- CNCH(0-)S- CNCH(0-)N < CNCH(0-)NO2 CNCH(S-)S- CNCH(S-)N < CNCH(S-)N02

CNCH(N<)N02

-CSCH(Cl)Br

CNCH(N<)N<

CNCH=C< CNCH=N- FCH(F)O- FCH(F)S- FCH(F)SO- FCH(F)SO2-

FCH(Cl)O- FCH(Cl)S- FCH(Cl)SO- FCH(Cl)SO2-

FCH(O-)O- FCH(O-)S- FCH(O-)SO- FCH(O-)SO2- FCH(O-)N< FCH(S-)N<

FCH(F)N<

FCH(Cl)N<

FCH(N<)N< FCH=C< FCH=N- ClCH(Cl)O- ClCH(Cl)S- ClCH(Cl)SO2-

ClCH(O-)O- ClCH(O-)S- ClCH(O-)N< ClCH(S-)N<

ClCH=N-

ClCH(Cl)N<

ClCH=C<

BrCH(Br)O- BrCH(Br)S- BrCH(Br)N< BrCH(O-)O- BrCH(O-)S- BrCH(O-)N< BrCH(S-)N< BrCH=C<

35-45 52-65

120-130 90-102 13-88 55-70

100-115 60-72 72-83 60-73 50-60 31-45 80-92 40-55 50-60 83-100 45-58 70-82 85-100 42-52 60-72 80-95 53-62 83-98

105-130 140-160 95-116 58-73 70-84

102-118 40-50 55-66 73-84 65-80 80-94 92-118

135-144 114-118 85-98 96-105

110-116 94-104 80-95 55-65 60-70 75-85 63-70

100-110 73-86 78-90 85-95 80-90 45-60 60-70

130-148 160-180 70-88 45-53 66-74 65-70 90-105 55-67 70-80 40-55

105-144 145-160 60-75 45-60 55-70 85-100 65-80 75-86 40-50

105-126

D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D

>CS,-CN >CS,-CN ' cs 'CS 'CS ' cs ' cs ' cs ' cs 'CS > cs > cs 'CS ' cs ' cs > cs > cs ' cs -N02,>CS > cs ' cs -NOs,'CS > cs -NOz,>CS ' c s , > c-c < > CS, > C=N- -CN -CN -CN -NOz,-CN -CN -CN -N02,-CN -CN -N02,-CN -CN,'C=C< -CN,'C=N-

'SO > so2

'SO 'SO2

'SO ' so2

> c=c < > C=N-

' so2

> c=c <

>CH- 'CH- 'CH- >CH- 'CH- >CH- >CH- 'CH- 'CH- 'CH- 'CH- >CH- 'CH- 'CH- >CH- 'CH- 'CH- 'CH- 'CH- 'CH- 'CH- 'CH- 'CH- 'CH- -CH=C' -CH=N- 'CH- 'CH- 'CH- 'CH- 'CH- 'CH- 'CH- 'CH- 'CH- -CH=C< -CH=N- 'CH- 'CH- >CH. 'CH- 'CH- 'CH- 'CH- >CH. 'CH- 'CH- 'CH- 'CH- 'CH. 'CH- 'CH- 'CH- 'CH- -CH=C< -CH=N- >CH- 'CH- >CH- >CH- 'CH- 'CH- 'CH- >CH- -CH=C< -CH=N- "CH- 'CH- >CH- 'CH- 'CH- >CH- 'CH- -CHsC<


Table 1 (Continued)

no.  substructure  chemical shift  multiplicity  IR constraint  1H-NMR constraint

667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740

BrCH-N- -OCH(O-)0- -OCH(O-)S- -OCH(O-)N < -OCH(O-)N02 -OCH(O-)PO <

-OCH(S-)N < -OCH(S-)NO2 -OCH( S-)PO < -OCH(N<)N< -OCH(N <)NO2 -OCH=C< -0CH-N- -SCH(S-)S- -SCH( S-)N < -SCH(S-)N02 -SCH( S-)PO < -SCH(N')N< -SCH(N<)NOz -SCH=C< -S CH-N-

-OCH(S-)S-

>NCH(N<)N<

> NCH=C < >NCH(N<)NO*

>NCH=N- N02CH=C < N02CH=N- > POCH=C < >POCH=N- HC#C- -CH=(Ar) RCHO ArCHO RCHS ArCHS FCH(R)F ClCH(R)Cl BrCH(R)Br

FCH(R)CI -OCH(R)CN RC(R I )(Rz)R3

ClCH(R)NOz

RC(RI)(R~)C=C< RC(Rl)(Rz)C#C- RC(RI) (R~)A~ RC(RI)(RZ)CO- RC(R I )(R2)CS- RC(Ri)(Rz)CN RC(RI)(R~)F RC(Ri)(Rz)Cl RC(Rd(R2)Br RC(Ri ) (W RC(RI)(R~)O- RC(Ri)(RdS- RC(RI)(R~)SO- RC(R I )(RdSOz- RC(Ri)(RdN< RC(RI )(R2)PO < > C=CC( R)( R1)C-C < > C=-CC(R)(Ri)C#C-

>C=CC(R)(Rl)CO- > C-CC(R)(Ri)CN > C=CC(R)(Ri)F >C=CC(R)(Ri)Cl

>C=CC(R)(Ri)I > C=CC(R)(Rl)O- >CsCC(R)(Rl)S- >C=CC(R)(Ri)SO- > C=CC(R)(Ri)SO2- > C=CC(R)(Ri)N< > C=CC(R)(Rl)NOz

>C=CC(R)(RI)Ar

>C==CC(R)(RI)B~

135-150 109-122 80-95 90-102

130-140 94-97 60-72 80-90

110-120 80-90 85-95

115-130 117-137 150-170 54-67 65-16 85-100 41 -46 70-80 90- 100

104-144 134-148 60-80 80-100

116-150 150-170 136-160 160-180 124-132 170-185 22-93

100-140 197 - 206 206-216 190-210 200-220 105-114 72-91 50-60 88-111 92-102 50-65 30-66 40-62 38-44 33-50 45-64 36-54 27-46 89-96 62-76 54-66 37-52 68-87 39-53 56-65 58-68 50-63 32-53 40-50 44-50 45-55 46-63 44-57 90-100 71 -82 65-86 40-50 72-89 45-56 58-64 62-65 55-65 92- 100

D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S

> C=N-

-NO2 >PO-

-NO2 >PO-

-NO1 > c=c < > C=N-

-NO1 >PO-

-NO2 >c=c< >C=N-

-NO2 >c=c< > C=N- > C=C <,-NO? > C=N-,-N02 >c=c <, > PO- >C=N-,>PO- -C#C- Ar > co Ar, > CO > cs Ar,>CS

-NO2

-CN

>c=c< -C#C- Ar > co >cs -CN

-CH=N- > CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- >CH- 'CH- 'CH- -CH=C< -CH=N- 'CH- >CH- 'CH- >CH- 'CH- >CH- -CH=C< -CH=N- >CH- >CH- -CH=C< -CH=N- -CH=C< -CH=N- -CH=C< -CH=N- -C#CH Ar-H -CHO -CHO -CHS -CHS >CH- 'CH- >CH- >CH- >CH- >CH-

Table 1 (Continued)

no.  substructure  chemical shift  multiplicity  IR constraint  1H-NMR constraint

741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814

-C#CC(R)(R,)C#C-

-C#CC(R)(RI)CO- -C#CC(R) (R1)CS- -C#CC(R) (R1)CN -C#CC(R) (R1)F -C#CC(R) (R1)Cl

-C#CC(R) (RI)I -C#CC(R) (R1)O- -C#CC(R) (R1)S- -C#CC(R) (Rl)N< -C#CC(R) (R1)NOz

ArC(R) (R1)CO- ArC(R) (R1)CS-

-C#CC(R)(RI)Ar

-C#CC(R) (RJBr

Arc@) ( R I M

ArC(R) (RI)CN ArC(R) (RdF ArC(R)(Ri)Cl ArC(R)(RdBr ArC(R)(RdI ArC(R)(RI)O- ArC(R)(R])S-

ArC(R)(Ri)N< ArC(R)WN02

ArC(R)(Rl)SO-

-COC(R)(Rl)CO- -COC(R)(Rl)CS- -COC(R)(Rl)CN -COC(R)(Rl)F -COC(R)(RI)C~

-COC(R)(Rl)I -COC(R)(Rl)O- -COC(R)(Rl)S- -COC(R)(RI)SOZ- -COC(R)(Rl)N< -COC(R)(Rl)NOz -CSC(R)(Rl)CS- -CSC(R)(R])CN -CSC(R)(R])F -CSC(R)(RI )C1

-COC(R)(RI )Br

-CSC(R)(RI )Br -CSC(R)(Rl)I -CSC(R)(R,)O- -CSC(R)(Rl)S- -CSC(R)(RI)SO~- -CSC(R)(Rl)N< -CSC(R)(Rl)NOz CNC(R)(Rl)O- CNC(R)(R])S- CNC(R)(R])N< FC(R)(RI)O- FC(R)(RI)S- FC(R)(Ri)N< CIC(R)(RI)O- CIC(R)(Rl)S- ClC(R)(RI)N< BrC(R)(RI)O- BrC(R)(Rl)S- BrC(R)(RI)N< -OC(R)(R,)O- -OC(R)(R,)S- -OC(R)(Rj)SOz- -OC(R)(Rl)N< -SC(R)(Rj)S- -SC(R)(Rl)N< -SC(R)(Rl)PO< -SO~C(R)(RI)SO~- -SOzC(R)(Rl)N< 'NC(R)(Ri)N < NOZC(R)(RI)O- NOZC(R)(RI)S- NOzC(R)(Ri)N<

42-55 44-56 45-60 43-55 38-50 80-95 65-76 60-70 35-48 64-83 40-50 44-56 85-100 37-54 50-55 45-52 39-49 90-100 72-85 60-70 35-50 75-89 40-50 45-57 55-66 90-100 59-82 55-70 50-60 95-110 69-81 58-70 40-50 65-88 40-50 45-57 48-60 90- 103 50-70 48-62 90-105 68-79 55-65 40-52 62-80 40-50 48-58 47-59 90- 105 69-83 38-50 51-59

110-120 80-94 85-98 94-103 60-70 70-82 79-86 50-60 60-74 92-117 70-85 75-90 80-100 45-60 58-70 51-59 80-91 90-96 75-90

100-120 80-100 90-110

S -C#C- S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S

Ar,-C#C- >co,-C#C- ' cs,-C#C- -CN,-C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -N02,-C#C- Ar >CO,Ar >CS,Ar -CN,Ar Ar Ar Ar Ar Ar Ar > s0,Ar Ar -NOz,Ar 'CO 'CS, > co -CN,>CO ' co > co ' co 'CO 'CO 'CO 'SO2,'CO > co -N02,> CO 'CS -CN,>CS ' cs ' cs ' cs >cs 'CS 'CS ' soz, ' cs 'CS -N02,>CS -CN -CN -CN

' so2

>PO- ' so2 ' so2


Table 1 (Continued)

no.  substructure  chemical shift  multiplicity  IR constraint  1H-NMR constraint

815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888

RC(R,)=C< 92-120 RC (R I)=N- 172-187 >C=CCR(C=C <)C=C < 69 - 80 > C-CCR(C-C <)C#C- > C=CCR(C=C <)Ar > C=CCR(C=C <)CO- >C-CCR(C--C <)CS- >C=CCR(C=C <)CN >C=CCR(C=C <)F >C=CCR(C=C <)Cl > C=CCR(C=C <)Br >C=CCR(C=C <)I >C=CCR(C=C <)O- >C=CCR(C=C <)S-

> C=CCR(C=C<)NO:, > C=CCR(C#C-)C#C- >C=CCR(C#C-)Ar > C=CCR(C#C-)CO- > C=CCR(C#C-)CS- > C=CCR(C#C-)CN > C=CCR(C#C-)F >C=CCR(C#C-)CI > C=CCR(C#C-)Br > C=CCR(C#C-)I > C-CCR(C#C-)O- > C=CCR(C#C-)S- > C=CCR(C#C-)N < >C=CCR(C#C-)N02

> C=CCR(Ar)CO- > C=CCR( Ar)C S -

> C=CCR(C=C <)N <

> C=CCR(Ar)Ar

>C=CCR(Ar)CN >C=CCR(Ar)F >C=CCR(Ar)Cl > C=CCR(Ar)Br > C=CCR(Ar)I >C=CCR(Ar)O- > C=CCR(Ar)S- > C=CCR(Ar)N < >C=CCR(Ar)NOz > C=CCR(CO-)CO- > C=CCR(CO-)CS- > C=CCR(CO-)F >C=CCR(CO-)Cl >C=CCR(CO-)Br

C=CCR(CO-)I > C=CCR(CO-)0- > C=CCR(CO-)S- >C=CCR(CO-)N< >C=CCR(CO-)NO2 > C=CCR(CS-)CS- > C=CCR(CS-)CN > C=CCR(CS-)F > C=CCR(CS-)Cl

> C=CCR(CS-)I >C=CCR(CS-)0- > C=CCR(CS-)S- > C=CCR(CS-)N < > C=CCR(CS-)N02

> C=CCR(CS-)Br

>C=CCR(CN)F >C=CCR(CN)Cl >C=CCR(CN)Br >C=CCR(CN)I > C=CCR(CN)O- > C=CCR(CN)S- >C=CCR(CN)N< >C=CCR(F)F > C=CCR(F)CI > C=CCR(F)Br > C=CCR(F)I > C=CCR(F)O- >C=CCR(F)S-

50-60 55-70 68-76 64-74 42-56 95-110 73-85 60-80 40-60 72-89 50-60 60-70 74-99 40-50 50-60 55-65 50-61 38-46 85-100 65-75 53-65 35-46 65-90 45-55 60-70 80-92 46-53 50-60 48-60 40-50 95-108 70-85 60-70 40 - 60 72-88 50-60 64-70 80-95 40-49 40-55 80-95 70-82 60--72 40-55 72-84 35-50 50-60 70-85 40-50 40-48 80-95 70-80 60-70 45-60 70-83 35-48 50-60 72-83 70-85 60-70 50-60 35-50 60-75 30-42 63-72 97-112 75-95 65-75 55-66 85-100 60-72

S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S

>c=c< > c=c < -C#C-, >c=c < Ar, > C=C < > c o , > c = c < > cs, > c=c < -CN,>C=C< > c=c < > c=c < >c=c< > c=c < >c=c< >c=c< >c=c< -NOz,>C=C< -C#C-, > c=c < Ar,>C=C <,-C#C- 'CO, >C=C<,-C#C- >CS,>C=C<,-C#C- -CN, > C=C ' ,-C#C- > c=c < ,-C#C- >C=C<,-C#C- > c=c < ,-C#C- >C=C<,-C#C- > c=c < ,-C#C- > c=c < ,-C#C- >C=C<,-C#C- -NO>, > C=C < ,-C#C- Ar,>C=C< >CO,>C=C <,Ar >CS,>C=C <,Ar -CN,>C=C<,Ar > C=C < , Ar > C=C<,Ar >C=C<,Ar > C=C < ,Ar > C=C < ,Ar > C=C < ,Ar >C=C<,Ar -NOl,>C=C< ,Ar > co,>c=c < >CS,>C=C < ,'CO > c=c <, 'CO >C=C<,>CO >C=C<,>CO >c=c < ,>co >C=C<,>CO >c=c <,>CO >c=c <,>CO -NO1,>C=C<,>CO >cs, >c=c< -CN,>C=C<,>CS >c=c <,> cs >c=-C<,>cs >C=C<,>CS >c=c<,> cs >c=c <, >cs >c=c<,>cs >C=C<,>CS -N02,>C=C<,>CS > C-C < ,-CN >C=C<,-CN >C=C<,-CN >C=C<,-CN >C=C<,-CN > C=C < ,-CN >C=C<,-CN > c=c < > c=c < > c=c < >c=c< > c=c < >C=C<

>C=N.

Table 1 (Continued)

no.  substructure  chemical shift  multiplicity  IR constraint  1H-NMR constraint

889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962

>C=CCR(F)N< >C<CR(Cl)CI >C=CCR(Cl)Br >C=CCR(Cl)I

>C=CCR(CI)N<

>C-CCR(Cl)O- >C=CCR(Cl)S-

> C=CCR(O-)0- >C=CCR(O-)S- > C=CCR(O-)N < >C=CCR(O-)NOz >C=CCR(S-)S- >C=CCR(S-)N< > C=CCR(S-)N02

>C=CCR(N<)NOz > C=CCR(N <)N <

>C=CCR=C<

-C#CCR(C#C-)Ar > C=CCR-N-

-C#CCR(C#C-)CO- -C#CCR(C#C-)CS- -C#CCR(C#C-)CN -C#CCR(C#C-)F -C#CCR(C#C-)Cl -C#CCR(C#C-)Br -C#CCR(C#C-)I -C#CCR(C#C-)0- -C#CCR(C#C-)S- -C#CCR(C#C-)N < -C#CCR(C#C-)N02 -C#CCR(Ar)Ar -C#CCR(Ar)CO- -C#CCR(Ar)CS-

-C#CCR(Ar)F -C#CCR(Ar)Cl -C#CCR(Ar)Br -C#CCR(Ar)I

-C#CCR(Ar)S-

-C#CCR(Ar)NOz

-C#CCR(Ar)CN

-C#CCR(Ar)O-

-C#CCR(Ar)N <

-C#CCR(CO-)CO- -C#CCR(CO-)CS- -C#CCR(CO-)F -C#CCR(CO-)Cl -C#CCR(CO-)Br -C#CCR(CO-)I -C#CCR(CO-)0- -C#CCR(CO-)S- -C#CCR(CO-)N < -C#CCR(CO-)NOz -C#CCR(CS-)CS- -C#CCR(CS-)CN -C#CCR(CS-)F -C#CCR(CS-)Cl

-C#CCR(CS-)I -C#CCR(CS-)0- -C#CCR(CS-)S- -C#CCR(CS-)N -C#CCR(CS-)N02 -C#CCR(CN)F -C#CCR(CN)Cl

-C#CCR(CN)I -C#CCR(CN)O- -C#CCR(CN)S- -C#CCR(CN)N < -C#CCR(F)F -C#CCR(F)Cl

-C#CCR(F)I -C#CCR(F)O-

-C#CCR(CS-)Br

-C#CCR(CN)Br

-C#CCR(F)Br

75-85 85-94 75-85 50-65 88-100 60-70 70-85 93-106 65-75 78-95

100-113 49-60 60-72 80-95 65-80 90-105 95-135

167-174 30-40 40-60 45-55 28-40 75-90 55-65 43-55 25-36 60-80 40-50 55-60 65-80 70-78 45-60 43-55 38-49 90-100 65-75 55-65 38-50 70-80 45-55 60-70 75-90 40-50 40-52 78-92 65-75 58-66 36-46 70-81 35-50 50-60 70-84 40-50 40-48 80-92 65-80 56-66 35-50 70-80 35-48 45-58 70-80 70-82 58-70 50-58 30-40 65-75 30-40 50-60 92-105 70-85 60-70 45-56 80-92

S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S

7 c-c < 7c=c< 7 c=c < 'C=C< > C-C < > c-c < > c=c < >c=c< >c=c< >c=c < -NOz,>C=C< > c-c < > c=c < -NO?, > C=C < >c=c< -NOz,'C=C< > c-c < > C-C < , > C-N- Ar,-C#C- >CO,-C#C- 'CS,-C#C- -CN,-C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -NOz,-C#C- Ar,-C#C- >CO,-C#C- >CS,-C#C- -CN,-C#C- Ar,-C#c- Ar,-C#C- Ar,-C#C- Ar,-C#C- Ar,-C#C- Ar,-C#C- Ar,-C#C- -N02,Ar,-C#C- 'CO,-C#C- > c s , > c0,-c#c- >CO,-C#C- > c0,-C#C- >CO,-C#C- >CO,-C#C- >co,-C#C- >CO,-C#C- > c0,-C#C- -NO2,> C0,-C#C- >CS,-C#C- -CN, > CS,-C#C- > cs,-C#C- 'CS,-C#C- >CS,-C#C- 'CS,-C#C- >CS,-C#C- 'CS,-C#C- 'CS,-C#C- -NOz,>CS,-C#C- -CN,-C#C- -CN,-C#C- -CN,-C#C- -CN,-C#C- -CN,-C#C- -CN,-C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C-


Table 1 (Continued)

no.  substructure  chemical shift  multiplicity  IR constraint  1H-NMR constraint

963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999

1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 101 1 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036

-C#CCR(F)S- -C#CCR(F)N< -C#CCR(Cl)Cl -C#CCR(Cl)Br -C#CCR(Cl)I -C#CCR(Cl)O- -C#CCR(Cl)S- -C#CCR(Cl)N < -C#CCR(O-)0- -C#CCR(O-)S- -C#CCR(O-)N < -C#CCR(O-)NO2 -C#CCR( S-)S- -C#CCR(S-)N < -C#CCR(S-)N02 -C#CCR(N')N' -C#CCR(N <)NO2 -C#CCR=C < -C#CCR=N- ArCR( Ar)Ar ArCR(Ar)CO- ArCR(Ar)CS- ArCR(Ar)CN ArCR(Ar)F ArCR(Ar)CI ArCR(Ar)Br ArCR(Ar)I ArCR(Ar)O- ArCR(Ar)S- ArCR(Ar)SO- ArCR(Ar)N < ArCR(Ar)N02 ArCR(C0-)CO- ArCR(C0-)CS- ArCR(C0-)CN ArCR(C0-)F ArCR(C0-)C1 ArCR(C0-)Br ArCR(C0-)I ArCR(C0-)0- ArCR(C0-)S- ArCR(C0-)N < ArCR(CO-)N02 ArCR(CS-)CS- ArCR(CS-)CN ArCR(CS-)F ArCR(CS-)C1 ArCR(CS-)Br ArCR(CS-)I ArCR(CS-)0- ArCR(CS-)S- ArCR(CS-)N < ArCR(CS-)N02 ArCR(F)F ArCR(F)Cl ArCR(F)Br ArCR(F)I ArCR(F)O- ArCR(F)S- ArCR(F)N< ArCR(C1)Cl ArCR(C1)Br ArCR(C1)I ArCR(C1)O- ArCR(C1)S- ArCR(Cl)N< ArCR(0-)0- ArCR(0-)S- ArCR(0-)N < ArCR(O-)N02 ArCR(S-)S- ArCR( S-)N< ArCR(S-)N02 ArCR(N<)N<

55-70 65-15 80-90 70-80 50-60 90-100 65-75 70-80 90-100 62-73 70-82 95-105 46-55 50-62 70-85 60-70 80-92

120-135 150- 160 50-70 55-73 51-67 46-52 92-105 73-85 63-69 40-50 75-86 50-60 60-74 54-73 80-92 57-65 55-63 45-55 95-106 75-86 65-75 45-60 85-95 48-56 64-68 90-100 55-65 40-53 93-104 70-82 60-72 40-55 80-92 45-53 60-70 85-97

105-120 85-100 75-86 50-65 97-108 70-85 85-97 75-86 65-78 40-50 75 - 85 50-60 60-70 93-108 60-75 70-80 95-110 50-59 60-70 80-95 70-80

S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S s S S S S

-C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -C#C- -NO*,-C#C- -C#C- -C#C- -NO?,-C#C- -C#C- -N02,-C#C- -C#C-,>c=c < -C#C-,>C=N- Ar > C0,Ar >CS,Ar -CN,Ar Ar Ar Ar Ar Ar Ar >SO,Ar Ar -NO*,Ar >CO,Ar >CS,Ar,>CO -CN,Ar, > CO > C0,Ar >CO,Ar >CO,Ar >CO,Ar >CO,Ar >CO,Ar > C0,Ar -NOz,>CO,Ar >CS,Ar -CN,>CS,Ar >CS,Ar >CS,Ar >CS,Ar > CS,Ar >CS,Ar >CS,Ar >CS,Ar -NOz,>CS,Ar Ar Ar Ar Ar Ar Ar Ar Ar Ar Ar Ar Ar Ar Ar Ar Ar -N02.Ar Ar Ar -N02,Ar Ar

Table 1 (Continued)

no.  substructure  chemical shift  multiplicity  IR constraint  1H-NMR constraint

1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110

ArCR(N<)NOz ArCR=C < ArCR=N- -COCR(CO-)CO- -COCR(CO-)CS- -COCR(CO-)CN -COCR(CO-)F -COCR(CO-)Cl

-COCR(CO-)I -COCR(CO-)0- -COCR(CO-)S- -COCR(CO-)N < -COCR(CO-)NOz -COCR(CS-)CS- -COCR(CS-)CN -COCR(CS-)F -COCR(CS-)Cl

-COCR(CS-)I -COCR(CS-)0- -COCR(CS-)S- -COCR(CS-)N -COCR(CS-)NOz -COCR(CN)F -COCR(CN)CI

-COCR(CN)I -COCR(CN)O- -COCR(CN)S- -COCR(CN)N < -COCR(F)F -COCR(F)Cl

-COCR(F)I -COCR(F)O- -COCR(F)S- -COCR(F)N < -COCR(Cl)Cl

-COCR(CI)I -COCR(Cl)O- -COCR(Cl)S- -COCR(CI)N ' -COCR(O-)0- -COCR(O-)S- -COCR(O-)N -COCR(O-)NOz -COCR(S-)S- -COCR(S-)N < -COCR(S-)N02 -COCR(N<)N< -COCR(N<)NO2 -COCR=C< -COCR=N- -CSCR(CN)F -CSCR(CN)Cl

-COCR(CO-)Br

-COCR(CS-)Br

-COCR(CN)Br

-COCR(F)Br

-COCR(CI)Br

-CSCR(CN)Br -CSCR(CN)I -CSCR(CN)O- -CSCR(CN)S- -CSCR(CN)N < -CSCR(F)F -CSCR(F)Cl

-CSCR(F)I -CSCR(F)O- -CSCR(F)S- -CS CR(F)N < -CSCR(Cl)CI

-CSCR(Cl)I -CSCR(Cl)O- -CSCR(Cl)S-

-CSCR(F)Br

-CSCR(CI)Br

90-105 95-120

160-174 72-83 70-80 62-75 85-100 65-75 53-59 40-50 71-80 50-60 60-70 80-95 65-76 55-65 80-93 62-72 50-60 40-50 70-80 40-52 55-65 75-90 95- 115 70-85 60-70 40-50 80-90 50-60 60-70

100-120 95-110 90-100 60-72 95-105 75-85 85-95 83-90 75-85 50-60 88-98 70-80 75-86 90-100 73-87 84-95

100-115 53-61 65-78 90-100 75-88

105-115 125-149 170-180 90-110 75-88 66-78 48-60 85-95 60-70 70-86

100-113 95-105 85-97 60-70 92-105 70-82 80-95 80-88 70-82 48-60 90- 100 60-75

S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S

-NOZ,Ar Ar, > C=C < Ar, > C=N- > co > c s , > co -CN,>CO > co > co >co > co >co > co > co -N02,>CO > c s , > co -CN,>CS,'CO > c s , >co >CS,'CO >CS,'CO >CS, >CO > c s , > co > c s , > c o >CS,>CO -NOz,'CS,>CO >CO,-CN >CO,-CN >CO,-CN 2'20,-CN >CO,-CN >CO,-CN '

>CO,-CN > co > co >co > co > co >co > co > co > co 'CO >co > co > co >co 'CO > co -NOz,>CO >co > co -NOz,>CO > co -N02, > CO >CO,>C=C < >CO,>C=N- >CS,-CN >CS,-CN >CS,-CN >CS,-CN >CS,-CN >CS,-CN >CS,-CN > cs I C s 'CS >cs 'CS > cs 'CS 'CS > cs >CS > cs > cs

Table 1 (Continued)

no.  substructure  chemical shift  multiplicity  IR constraint  1H-NMR constraint

1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184

-CSCR(Cl)N< -CSCR(O-)0- -CSCR(O-)S- -CSCR(O-)N < -CSCR(O-)N02 -CSCR(S-)S- -CSCR(S-)N< -CSCR(S-)N02 -CSCR(N<)N< -CSCR(N ' )NO2 -CSCR=C' -CSCR=N- CNCR-C < CNCR=N- FCR(F)O- FCR(F)S-

FCR(C1)O- FCR(C1)S-

FCR(0-)0- FCR(0-)S- FCR(0-)N< FCR(O-)NOz FCR(S-)S- FCR(S-)N< FCR(S-)NOz

FCR(F)N<

FCR(C1)N <

FCR(N<)N < FCR(N <)NO2 FCR=C < FCR=N- ClCR(C1)O- ClCR(C1)S- ClCR(C1)N < ClCR(0-)0- ClCR(0-)S- ClCR(0-)N < ClCR(O-)N02 ClCR(S-)S- ClCR( S-)N < ClCR(S-)NOI ClCR(N<)N< ClCR(N <)NO2 ClCR=C<

BrCR(Br)O- BrCR(Br)S- BrCR(Br)N< BrCR(0-)0- BrCR(0-)S- BrCR(0-)N< BrCR(O-)NO? BrCR(S-)S- BrCR(S-)N< BrCR(S-)N02 BrCR(N<)N< BrCR(N<)NOz BrCR=C < BrCR=N-

ClCR=N-

-OCR(O-)0- -OCR(O-)S- -OCR(O-)N< -OCR(O-)NO? -OCR(S-)S- -OCR(S-)N < -OCR(S-)NO2 -OCR(N<)N< -OCR(N<)NO* -OCR(N<)PO< -OCR=C< -0CR-N- -SCR(S-)S- -SCR(S-)N< -SCR(S-)NOz

70-80 90-100 70-90 80-95

100-115 50-60 60-70 85-100 60-75 85-100

135-159 165-175 110-134 155-170 130-145 105-120 110-125 100-115 80-95 90-100

125-140 100-110 110-120 135-150 85-100 95-110

110-120 105-115 130-142 154-169 170-185 100-115 80-90 90-100

105-118 80-92 90-105

120-135 60-70 70-82 90-105 80-95

100-110 134-138 148-168 90-100 70-80 80-90 95-110 75-90 85-98

115-130 55-68 65-80 85-96 75-86 95-106

112-122 120-148 111-124 90-110

100-115 125-140 75-90 85-96

100-110 90-100

110-122 104-110 150-174 164-184 40-50 55-66 75-86

S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S

>cs 'CS > cs >cs -N02,>CS > cs > cs -NOz,>CS >cs -N02,'CS >cs,> c=c >CS,>C=N- -CN,>C=C< -CN,>C=N-

-NO*

-NO2

-NO2

-NO2

-NO>

-NO2

-NO2

Table 1 (Continued)

no.  substructure  chemical shift  multiplicity  IR constraint  1H-NMR constraint

1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258

-SCR(N<)N< -SCR(N<)NOz -SCR=C< -SCR=N- >NCR(N<)N<

> NCR=C < >NCR(N<)NOz

>NCR=N- N02CR=C< NOzCR=N- RC#C- -CR=(Ar) RCRlO ArCRO CICR(R1)Cl ArCH2SO- -OC#C- >NC#C- -SC#C- RC#N -OWN > NC#N -SC#N > c=c=c < O=C=C<

s-c-c < -coc (CO-)=C< A r C A r C < CNC (CN)=C < >C=CC (C=C<)=C< > C=CC (N.<)=C < -N=CCR=C < x=cc (O-)=C< -coc (0-)=C < ICR-C < -0c (O-)-C< - 0 C (N<)=C< -OCCl=C < - 0 c (S-)=C < - O C B r C < -OCI=C < 'c=ccoc=c< ~c=ccoco- > C=CCOAr > C=CCOR -COCOAr

-N=C=C <

-COCOR -CSCOR ArCOAr ArCOR 'C=CCOO- - c o c o o - RCOO- > C=CCON < -C#CCON < -COCON< RCONX RCOS- -C#CCOF RCOF > C=CCOCl RCOBr >C=CCHO -C#CCHO -0CHO -0coo- >NCOO- "CON< >C=CC (C=C<)=N- >C-CC (0-)=N- >C=CC (N<)=N- -COC (S-)=N- RC (S-)=N-

76-81 86-98

120-139 159-175 99-104

115-126 120-145 148-169 145-160 170-186 61-123

120-140 188-225 186-215 67-83 52-63 80-95 75-90 64-76

107-128 108-112 105-120 108-117 160-220 160-207 150-227 265-275 100-105 110-120 73-85

100-150 130-135 130-140 153-191 131-201 95-105

149-183 130-168 136-149 148-167 120-139 86-91

171-210 180-196 170-206 188-210 180-197 177-205 184- 191 180-201 186-215 160-180 157-168 162-178 15 1 - 174 150- 160 151-162 160-180 184-205 141-150 155-163 153- 169 165-176 171-195 15 1 - 175 210-225 148-158 147-157 146-167 144-156 158-175 142-166 157- 166 163-174

S S S S S S S S S S S S S S S T S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S D D D S S S S S S S S

Ar, > SO -C#C- -C#C- -C#C- -CN -CN -CN -CN > c=c=c < 0-c-c <

s=c=c< >c=c <, > co Ar, > C=C < -CN,>C=C < >C=C' >C=C< >C=C< > c=c < >c=c < , > co > c=c < > c-c < >c=c< >c=c< >c=c< > c=c < >c=c< >C=C <, >co > c=c < , >co >C=C<,>CO,Ar >C=C <,>CO Ar, > CO >co >CO,'CS Ar,>CO Ar, > CO >C=C <, >co >co > co >c=c < , > c o -C#C-,>CO >co > co > co -C#C-, > co > co >c=c <, > co > co >c-c 4 ,>co -C#C-, 'CO 'CO > co > co co

>C=C < , > CeN- >C=C 4 , > C-N- ' C-C < , > C-N- > CO, > C-N- 'CsN-

-N=C-C <

'CH-

-CHO -CHO -CHO


Table 1 (Continued)

no.    substructure   chemical shift   multiplicity   IR constraint   1H-NMR constraint
1259   >NC(N<)=N-     153-161          S              >C=N-
1260   >NCH=N-        137-149          D              >C=N-           -CH=N-
1261   RCR1S          256-279          S              >CS
1262   -OC(O-)S       185-194          S              >CS
1263   CH3CS-         32-42            Q              >CS             -CSCH3
1264   RCF3           117-126          S
1265   RCCl3          86-111           S
1266   RCHCl2         67-83            D                              >CH-
1267   RCBr3          55-60            S
1268   Br2CHCO-       31-40            D              >CO             >CH-
1269   Br2CHAr        35-41            D              Ar              >CH-
1270   Br2CHR         49-62            D                              >CH-
1271   CH3SO2-        32-45            Q              >SO2            -SO2CH3
1272   CH3PO<         9-18             Q              >PO-            >POCH3
1273   -COC#C-        70-88            S              -C#C-,>CO
1274   FC#C-          88-92            S              -C#C-
1275   ClC#C-         77-81            S              -C#C-
1276   BrC#C-         40-50            S              -C#C-
1277   IC#C-          0-3              S              -C#C-

Table 2. The Digitized 13C-NMR Spectrum of C19H31NO

chemical shift   multiplicity
 14.1            Q
 22.6            T
 25.8            T
 29.3            T
 31.9            T
 35.8            T
 36.7            T
 40.6            T
126.4            D
128.6            D
139.1            S
173.2            S

Table 3. Analysis Result of 13C-NMR Spectrum of C19H31NO

no.  LNSCS code  substructure

Figure 2. The 13C-NMR spectrum of C19H31NO.

in the unknown structure. If all the premises of a goal are satisfied, then the goal is considered to be true, and the substructure corresponding to the goal may be embedded within the complete structure of the unknown compound.

c c . 3 c .3cc .3 c.3cc1-c.c.=c.c.=c1. C.3CN.2CI=C.C.=C.C.=Cl. N.2CC.3 c.3cc.-o =C.2(Ar) =C . ( Ar) N.2C(=O)C.3

CH3C4

3CCH2Ph > NCH2Ph >NCH2C+ +CCH2C(=O)-

3CCH2Ct

=C<(Ar) =CH-(Ar) +CCON<

EXAMPLES

N-Phenethylundecanamide, C19H31NO (structure 1), is taken as an example to illustrate how the constraints from 13C-NMR spectral analysis are obtained in ESSESA. Figure 2 shows the 13C-NMR spectrum of N-phenethylundecanamide.

CH3-(CH2)9-C(=O)-NH-CH2-CH2-C6H5    (1)

First, the 13C-NMR spectrum is entered. The spectral data must be presented in digital form; the means of digitization is unimportant as long as it is accurate. ESSESA accepts the following ranges for the digitized data: chemical shifts of 0.00-250 ppm and multiplicities of S (singlet), D (doublet), T (triplet), and Q (quartet). The digitized data from the spectrum in Figure 2 are given in Table 2. The constraints from the IR spectral analysis of N-phenethylundecanamide, together with the ring and double-bond characteristics and the atomic composition obtained in that analysis, are passed to the interpreter program to assist the analysis of the 13C-NMR spectrum.
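The input described above can be sketched in a few lines of Python. This is only an illustration, not the ESSESA input routine: the names `Peak`, `parse_spectrum`, and `degrees_of_unsaturation` are hypothetical, and the whitespace-separated input layout is an assumption. The shift range, the multiplicity codes, and the peak list of Table 2 come from the text; the ring and double-bond count uses the standard formula for a C/H/N/O/halogen composition.

```python
from dataclasses import dataclass

VALID_MULTIPLICITIES = {"S", "D", "T", "Q"}  # singlet, doublet, triplet, quartet
SHIFT_MIN, SHIFT_MAX = 0.0, 250.0            # ppm range accepted according to the text


@dataclass(frozen=True)
class Peak:
    shift: float        # chemical shift in ppm
    multiplicity: str   # one of S, D, T, Q


def parse_spectrum(text: str) -> list[Peak]:
    """Parse whitespace-separated 'shift multiplicity' pairs into validated peaks."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating shift/multiplicity tokens")
    peaks = []
    for shift_token, mult in zip(tokens[0::2], tokens[1::2]):
        shift = float(shift_token)
        if not SHIFT_MIN <= shift <= SHIFT_MAX:
            raise ValueError(f"shift {shift} ppm is outside {SHIFT_MIN}-{SHIFT_MAX} ppm")
        if mult not in VALID_MULTIPLICITIES:
            raise ValueError(f"unknown multiplicity {mult!r}")
        peaks.append(Peak(shift, mult))
    return peaks


def degrees_of_unsaturation(c: int, h: int, n: int = 0, halogens: int = 0) -> int:
    """Rings plus double bonds for a C/H/N/O/halogen formula (oxygen does not count)."""
    return (2 * c + 2 + n - h - halogens) // 2


# Digitized spectrum of C19H31NO as listed in Table 2.
TABLE_2 = """14.1 Q 22.6 T 25.8 T 29.3 T 31.9 T 35.8 T
36.7 T 40.6 T 126.4 D 128.6 D 139.1 S 173.2 S"""

peaks = parse_spectrum(TABLE_2)
unsaturation = degrees_of_unsaturation(c=19, h=31, n=1)
```

For C19H31NO the formula gives (2*19 + 2 + 1 - 31) / 2 = 5, which agrees with one benzene ring (four) plus one carbonyl group (one) in structure 1.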

As the interpreter program acquires these data, it uses the rules in the knowledge base to compare the stored spectral patterns with the digital input data and so identify the substructures that might contribute to the N-phenethylundecanamide structure. The resulting constraints from the 13C-NMR spectral analysis of N-phenethylundecanamide are listed in Table 3. The structure of N-phenethylundecanamide can be compared with the substructures shown in Table 3, which ESSESA decided should be present in the structure on the basis of the 13C-NMR spectrum. Substructures 1, 2, 3, 4, 6, 7, 8, and 9 really exist in the structure of N-phenethylundecanamide. Substructures 4 and 5 from the analysis of the 13C-NMR spectrum do not exist in the structure of N-phenethylundecanamide. They can be deleted by the analysis of all the constraints from IR, 1H-NMR, and 13C-NMR together with the mutual substructure consistency analysis; the details of this process will be presented in the next paper.
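The comparison of stored spectral patterns with the digitized input can be sketched as a range lookup. The rule numbers, substructure labels, and shift ranges below are illustrative textbook-style values chosen for this example; they are assumptions and are not the entries of the ESSESA knowledge base.

```python
from typing import Dict, List, NamedTuple, Tuple

Peak = Tuple[float, str]  # (chemical shift in ppm, multiplicity)


class ShiftRule(NamedTuple):
    number: int          # hypothetical rule number, not an ESSESA substructure number
    substructure: str
    low: float           # lower bound of the shift range, ppm
    high: float          # upper bound of the shift range, ppm
    multiplicity: str


# Illustrative rules only; ranges are assumptions made for this sketch.
RULES = [
    ShiftRule(1, "CH3-C", 5.0, 30.0, "Q"),
    ShiftRule(2, "C-CH2-C", 15.0, 45.0, "T"),
    ShiftRule(3, "=CH- (Ar)", 110.0, 140.0, "D"),
    ShiftRule(4, "=C< (Ar)", 120.0, 150.0, "S"),
    ShiftRule(5, "C-C(=O)-N<", 160.0, 180.0, "S"),
    ShiftRule(6, "C-C(=O)-O-", 165.0, 180.0, "S"),
]


def candidate_substructures(peaks: List[Peak],
                            rules: List[ShiftRule]) -> Dict[Peak, List[str]]:
    """For each peak, list every substructure whose pattern matches it."""
    matches: Dict[Peak, List[str]] = {}
    for shift, multiplicity in peaks:
        matches[(shift, multiplicity)] = [
            rule.substructure
            for rule in rules
            if rule.multiplicity == multiplicity and rule.low <= shift <= rule.high
        ]
    return matches


# Digitized spectrum of Table 2.
peaks = [(14.1, "Q"), (22.6, "T"), (25.8, "T"), (29.3, "T"), (31.9, "T"),
         (35.8, "T"), (36.7, "T"), (40.6, "T"), (126.4, "D"), (128.6, "D"),
         (139.1, "S"), (173.2, "S")]
assignments = candidate_substructures(peaks, RULES)
```

With these illustrative rules the peak at 173.2 ppm matches both carbonyl patterns, so both are kept as candidates. This mirrors the behavior described in the text, where substructures that are not actually present may be proposed and must be removed later using the other constraints.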

The analysis results for other examples tested by ESSESA are shown in Table 4; they appear to contain almost all of the carbon-atom-centered substructures that should exist in the test structures.

DISCUSSION

Structure elucidation of compounds is mainly performed by spectroscopic methods such as mass spectrometry, IR, and NMR spectroscopy. 13C-NMR spectroscopy plays an important part in the structure elucidation process of unknown


of functional groups with comparably narrow, and therefore characteristic, shift ranges. The interpretation of the spectral data is mainly based on comparison with suitable reference material taken from the literature. For this purpose, computerized data collections have been developed using search strategies of different complexity. These systems retrieve only previously stored information in a rather simple way, using matching criteria that depend on the algorithm applied. This method works very well for known compounds. For unknown compounds, however, artificial intelligence, especially the expert system method, is a very useful way to elucidate the structure from spectral data. ESSESA is an expert system for this task, in which the 13C-NMR spectral data are used to derive the carbon atom types and their structural environments for the complete structure generation of unknown compounds.

The chemical shift in NMR spectra is a physical property related to the structural environment. The chemical shifts of two peaks in a 13C-NMR spectrum will be very close if the structural environments of the two carbon atoms are very similar, and the assignment of such peaks is very difficult. In ESSESA all possible explanations of such peaks are used as substructure constraints in order to generate all possible candidate structures for an unknown compound. The correct substructures that really exist in the structure of the unknown compound are then obtained by analysis of all substructure constraints from IR, 1H-NMR, and 13C-NMR spectral data.

Some substructures generate peaks that have similar chemical shifts. Such peaks can be assigned to several different substructures, but not all of these substructures are really included in the unknown structure. The incorrect substructures will be deleted by the constraints from IR and 1H-NMR spectral analysis and by the mutual substructure consistency analysis in a subsequent step in ESSESA. The details of this process will be presented in the next paper.
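The deletion step can be pictured as a set difference between the 13C-NMR candidates and the substructures ruled out by the other analyses. The sketch below is only a schematic of that idea: the actual procedure is deferred to the next paper, and the function name and the labels `substructure_a` to `substructure_c` are hypothetical placeholders.

```python
from typing import Dict, Set


def prune_candidates(candidates: Set[str],
                     excluded_by: Dict[str, Set[str]]) -> Set[str]:
    """Remove every candidate that some other analysis has ruled out."""
    excluded = set().union(*excluded_by.values())
    return candidates - excluded


# Hypothetical labels; not taken from Table 3 or Table 4.
candidates_13c = {"substructure_a", "substructure_b", "substructure_c"}
excluded = {
    "IR": {"substructure_c"},
    "1H-NMR": {"substructure_b", "substructure_c"},
}
surviving = prune_candidates(candidates_13c, excluded)
```

Keeping the exclusions grouped by their source makes it possible to report which analysis removed a given candidate.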

The substructures in Table 1 are types of carbon atoms with their structural environment out to topological distance one. If substructures were considered with a structural environment extending to a topological distance greater than one, there would be many more substructures, and the chemical shift ranges would be narrower than those in ESSESA; for example, in Munk's COCOA (ref 22) over 5000 chemical shift ranges of carbon-atom-centered substructures have been implemented. Doing so has the advantage of more comprehensive analysis results, with fewer proposed substructures that do not exist in the test structure, and it yields larger substructures that accelerate the generation of candidate structures, but the knowledge base would be larger.

Table 4. Results of Analysis of Some 13C-NMR Spectra

Results of Analysis by ESSESA (Substructure No.)   Structure   Not Exist in Structure   Exist in Structure

1, 7, 805, 815, 899, 90J, 1209, 3, 8, 698, 1191, 1196, 1212, 1038, 1090, 1180, 1216, 1219, 1222, 1236, 1238, 1241, 1242, 1252, 1253

1260

1239, 1248

535, 698, 1196

10, 72    25, 698, 1196, 143

2, 698, 1196, 1204

1, 3, 10, 25, 152, 153, 156, 292, 709, 710, 713, 1242

16

6, 8, 11, 14, 28, 31, 42, 75, 86, 142, 167, 170, 173, 184, 217, 224, 284, 294, 308, 400, 545, 586, 679, 724, 729, 739, 777, 864, 1090, 1215, 1221, 1232, 1238, 1253

273, 525, 670, 1170, 1197, 1198, 1232, 1241, 1243, 1251, 1252

10, 14, 61, 72, 152, 156, 167, 204, 217, 224, 284, 499, 584, 762, 774, 811, 1031, 1047, 1088, 1238, 1241, 1251, 1253

2, 292, 698, 815, 1196, 1239

1, 13, 25, 228, 698, 720, 1196, 1242, 1252

13, 60, 61, 142, 152, 156, 273, 525, 668, 1237, 1238, 1241, 1251, 1252, 1253

2, 10, 14, 25, 155, 698, 1196, 1242

709, 712, 714, 754, 1238, 1241, 1252, 1253

1, 7, 698, 713, 1196, 1242

2, 76, 89, 99, 134, 273, 525, 1238, 1241, 1242, 1253

10, 18, 60, 65, 72, 75, 79, 82, 86, 116, 134, 255, 273, 525, 668, 1170, 1238, 1251, 1253

1, 21, 698, 1196, 1252

1, 14, 21, 25, 698, 1196, 1242

516, 720, 750, 767, 802, 919, 993, 1029, 1172, 1241, 1252

1, 698, 1195, 1196, 1240, 1253, 1273

7, 13, 698, 1196    1, 10, 273, 525

10, 134, 152, 156, 167, 1197, 1232

7, 14, 60, 63, 77, 802, 1205

1, 3, 14, 21, 1238, 1242

1, 10, 13, 698, 1196, 1204, 1238

21, 82, 273, 525    68, 698, 700, 1196

,M, 7, 13, 61, 72, 709, 713, 724, 777, 1198

1, 10, 14, 25, 698, 755, 1196, 1197

1, 25, 698, 1196, 1238    7, 10, 14, 61, 69, 72, 83, 131, 1242

ACKNOWLEDGMENT

We thank the editor, Dr. G. W. A. Milne, for encouraging the writing of this paper. We owe a deep debt of gratitude to Professor E. Pretsch for his advice on and criticism of the knowledge base in this paper.

21, 163,698, 1196

156,217,543 21, 163,224, 1238

compounds, because this method allows the direct determination of the carbon atom types and their structural environment. The manual interpretation of 13C-NMR data utilizes in most cases only multiplicity information, commonly derived from J-modulated or APT spectra, and the detection

REFERENCES AND NOTES

(1) Silverstein, R. M.; Bassler, G. C.; Morrill, T. C. Spectrometric Identification of Organic Compounds, 5th ed.; J. Wiley: New York, 1991.

(2) Clerc, J. T.; Pretsch, E.; Seibl, J. Structural Analysis of Organic Compounds by Combined Application of Spectroscopic Methods; Elsevier: Amsterdam, 1981.

(3) Gray, N. A. B. Computer-Assisted Structure Elucidation; J. Wiley: New York, 1986.

(4) Gray, N. A. B. Chem. Intell. Lab. Sys. 1986, 5, 11-32.

(5) Huixiao, H.; Xinquan, X. ESSESA: An Expert System for Structure Elucidation from Spectra. 1. Knowledge Base of Infrared Spectra and Analysis and Interpretation Program. J. Chem. Inf. Comput. Sci. 1990, 30(3), 203-210.

(6) Huixiao, H.; Xinquan, X. ESSESA: An Expert System for Structure Elucidation from Spectra. 2. Novel Algorithm of Perception of the Linear Independent Smallest Set of Smallest Rings. Anal. Chim. Acta 1992, 262, 179-191.

(7) Huixiao, H.; Xinquan, X. ESSESA: An Expert System for Structure Elucidation from Spectra. 3. LNSCS for Chemical Knowledge Representation. J. Chem. Inf. Comput. Sci. 1992, 32(1), 116-120.

(8) Huixiao, H.; Xinquan, X. ESSESA: An Expert System for Structure Elucidation from Spectra. 4. The Canonical Representation of Structures. J. Chem. Inf. Comput. Sci. 1994, 34(4), 730-734.

(9) Huixiao, H.; Xinquan, X. ESSESA: An Expert System for Structure Elucidation from Spectra. 5. Substructure Constraints from Analysis of First-Order 1H-NMR Spectra. J. Chem. Inf. Comput. Sci. 1994, 34(6), 1259-1266.

(10) Kalchhauser, H.; Robien, W. CSEARCH: A Computer Program for Identification of Organic Compounds and Fully Automated Assignment of Carbon-13 Nuclear Magnetic Resonance Spectra. J. Chem. Inf. Comput. Sci. 1985, 25, 103-108.

(11) Chen, L.; Robien, W. MCSS: A New Algorithm for Perception of Maximal Common Substructures and Its Application to NMR Spectral Studies. 2. Applications. J. Chem. Inf. Comput. Sci. 1992, 32(5), 507-510.

(12) Robien, W. Computer-Assisted Structure Elucidation of Organic Compounds III: Automatic Fragment Generation from 13C-NMR Spectra. Mikrochim. Acta 1986(II), 271-279.

(13) Milne, G. W.; Heller, S. R. NIH/EPA Chemical Information System. J. Chem. Inf. Comput. Sci. 1980, 20(4), 204-211.

(14) Milne, G. W.; Zupan, J.; Heller, S. R. Spectra-Structure Relationship in Carbon-13 Nuclear Magnetic Resonance Spectroscopy. Results From a Large Data Base. Org. Magn. Reson. 1979, 12(5), 289-296.

(15) Christie, B. D.; Munk, M. E. The Application of Two-Dimensional Nuclear Magnetic Resonance Spectroscopy in Computer-Assisted Structure Elucidation. Mikrochim. Acta 1987(I), 347-361.

(16) Egolf, D. S.; Jurs, P. C. Simulation of Carbon-13 Nuclear Magnetic Resonance Spectra of Substituted Cyclopentanes and Cyclopentanols. Anal. Chem. 1987, 56, 1586-1593.

(17) Balasubramanian, K. Computer Generation of NMR Signal and Intensity Patterns. J. Magn. Reson. 1988, 77, 33-39.

(18) Bangov, I. P. Use of 13C Chemical Shift/Charge Density Linear Relationship for Ranking Chemical Structures. Mikrochim. Acta 1986(II), 281-298.

(19) Pretsch, E.; Clerc, J. T.; Seibl, J.; Simon, W. Tables of Spectral Data for Structure Determination of Organic Compounds; Springer-Verlag: New York, 1983.

(20) Kalinowski, H. O.; Berger, S.; Braun, S. Carbon-13 NMR Spectroscopy; John Wiley & Sons: New York, 1988.

(21) Levy, G. C.; Lichter, R. L.; Nelson, G. L. Carbon-13 Nuclear Magnetic Resonance Spectroscopy; John Wiley & Sons: New York, 1980.

(22) Christie, B. D.; Munk, M. E. Structure Generation by Reduction: A New Strategy for Computer-Assisted Structure Elucidation. J. Chem. Inf. Comput. Sci. 1988, 28, 87.

CI9500039