(45) HPP-73-3    CS-73-361

AN ALGORITHM FOR THE CONSTRUCTION OF THE GRAPHS OF ORGANIC MOLECULES

by

HAROLD BROWN
LARRY MASINTER

STAN-CS-73-361
May 1973

COMPUTER SCIENCE DEPARTMENT
School of Humanities and Sciences
STANFORD UNIVERSITY

Stanford University Libraries, Dept. of Special Collections


An algorithm for the construction of the graphs of organic molecules

By Harold Brown and Larry Masinter

ABSTRACT:

A description and a formal proof of an efficient computer-implemented algorithm for the construction of graphs is presented. This algorithm, which is part of a program for the automated analysis of organic compounds, constructs all of the non-isomorphic, connected multi-graphs based on a given degree sequence of nodes and which arise from a relatively small "catalog" of certain canonical graphs. For the graphs of the more common organic molecules, a catalog of most of the canonical graphs is known, and the algorithm can produce all of the distinct valence isomers of these organic molecules.

This work was supported in part by ARPA Contract SD-183 and NSF Grant GP


I. Introduction. The usual problem in analytical organic chemistry is to determine the molecular structure of an unknown organic compound. The heuristic DENDRAL program for explaining empirical data [1,2] is a machine-implemented program which applies artificial intelligence techniques to this problem of molecular structure determination. The primary input to this program is the mass spectrum of the unknown compound. Secondary inputs, if available, include the nuclear magnetic resonance spectrum and the elementary formula of the compound. The output of the program is a list consisting of one or more molecular graphs, each of which represents a heuristically plausible explanation of the given input. These graphs are rank ordered with respect to their relative plausibility scores. Central to this DENDRAL program are algorithms which generate all of the distinct valence isomers of a given set of atoms. It is these algorithms that we will describe in this paper.

In graph theoretical terms, the problem that we consider is the following: Given a degree sequence of nodes which corresponds to the valences of the atoms of a given organic compound, algorithmically construct a representative set of the distinct isomorphism classes of the connected, loop-free multi-graphs based on that degree sequence. (For brevity, such graphs will be referred to as clf-graphs.) Moreover, for pragmatic reasons, the algorithm should:

a) Be machine implementable in a reasonably efficient manner.

b) Be able to accept as additional input certain constraints arising from the chemical heuristics of the situation, for example, generate only those graphs which contain a certain substructure.

c) Generate no isomorphic graphs in the intermediate stages of the algorithm.

This last restriction is necessary because of the time and the storage limitations when using a machine.

In section II of the paper we define the concept of a superatom, and we briefly describe the subalgorithms used by the main algorithm. In section III, first a conceptual description and then a formal description of the main algorithm is given. Section IV contains the graph theoretical results on which the main algorithm is based, and in section V a formal proof of the completeness and the correctness of the main algorithm is presented.

II. Terminology and Subalgorithms. The concept of a superatom is used throughout the paper. A superatom is a collection of atoms bonded together into a cyclic structure which possesses at least one unassigned valence, i.e., at least one free valence. Graph theoretically, a superatom is a connected, loop-free multi-graph with no isthmuses, i.e., with no edge whose deletion disconnects the graph, and with at least one unassigned valence. If a superatom A has k free valences, then in forming molecular structures which include A, A behaves almost like an atom of valence k. The difference between forming structures including A and forming structures including an atom of valence k is the following: The k free valences on an atom of valence k are, as edge endpoints in a graph, indistinguishable, i.e., barring optical isomerism, the free valences on the atom implicitly admit as symmetry group the group $S_k$, the full permutation group on k objects. However, the k free valences on the superatom A usually are distinguishable from a symmetry viewpoint, and the free valences on A will, in general, admit only a subgroup of $S_k$. For example, the superatom in Figure 1 has three free valences, but the group of symmetries of these free valences is only a two-element subgroup of $S_3$. Thus, associated with each superatom with k free valences is a subgroup of $S_k$, namely, the group of symmetries on its free valences.

By the term tree we will always mean a graph whose nodes can represent either atoms or superatoms and whose edges are all isthmuses.

Also, throughout the paper, a collection of atoms given by their valences, i.e., a degree sequence, will be represented as an integral n-vector $V = (a_1, a_2, \ldots, a_n)$, where $a_i$ is the number of atoms of valence i in the collection.

For a degree sequence $V = (a_1, a_2, \ldots, a_n)$, the unsaturation of V, $U(V)$, is defined as $U(V) = \tfrac{1}{2}\bigl(2 + \sum_{i=1}^{n} (i-2)a_i\bigr)$. If there exists at least one clf-graph based on V, i.e., a clf-graph with $a_i$ nodes of valence i, then $U(V)$ is precisely the minimal number of edges of any clf-graph based on V which must be deleted in order to make the graph into a tree of atoms.

The description of the main algorithm makes use of the following subalgorithms:

a) A tree generator subalgorithm, TG. TG accepts as input a list of valences corresponding to atoms and superatoms, each superatom with its associated free valence group, and outputs all non-isomorphic trees based on these atoms and superatoms. Such a tree generator algorithm is described in [3].

b) A labeler subalgorithm, LAB. LAB accepts as input an ordered list of m distinct symbols, a group H of permutations on these symbols, and a list of m not necessarily distinct labels. It outputs all labelings of the given symbols with the given labels that are non-equivalent with respect to H, subject to a set, possibly empty, of constraints of the form that certain types of symbols must be labeled with certain types of labels. Such a labeling algorithm is described in [4].

c) A "catalog", CAT, containing certain connected, loop-free, isthmus-free multi-graphs (for brevity, clfif-graphs) underlying, in a sense to be made explicit later, the graphs of organic molecules. A list of these underlying graphs is known for most of the common organic molecules, and consists of a relatively small number of graphs.
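The unsaturation $U(V)$ defined in this section is a one-line computation. The following is a minimal sketch, not part of the original report: the function name `unsaturation` and the tuple encoding of V (index 0 holds $a_1$) are assumptions made for illustration, and the two molecules are merely convenient examples.

```python
from fractions import Fraction

def unsaturation(v):
    """Return U(V) = (2 + sum_i (i - 2) * a_i) / 2 for V = (a_1, ..., a_n).

    v[i - 1] holds a_i, the number of atoms of valence i (illustrative encoding).
    A Fraction is returned so that a non-integral value stays visible.
    """
    total = 2 + sum((i - 2) * a for i, a in enumerate(v, start=1))
    return Fraction(total, 2)

# C6H6: six univalent hydrogens and six quadrivalent carbons.
benzene = (6, 0, 0, 6)
# C6H14: fourteen hydrogens and six carbons.
hexane = (14, 0, 0, 6)

print(unsaturation(benzene), unsaturation(hexane))
```

A value of 0 corresponds to a degree sequence whose clf-graphs are trees of atoms; a non-integral value signals that no clf-graph can be based on V.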

III. Main Algorithm. We give now a brief conceptual description of the main algorithm. The input to the algorithm is a degree sequence, i.e., an integral n-vector V.

1. Determine all distinct allowable partitions of V into atoms and superatom sets with assigned free valences. These partitions are based on the unsaturation of V.

2. For each superatom set, determine all the distinct allowable allocations of the free valences to the atoms of the set.

3. For each such free valence allocation, determine recursively the allowable sets of atoms remaining after the deletion of the bivalent atoms and the "pruning" of any resulting loops. This recursion is done until: a) the remaining bivalent atoms in any clfif-graph based on the set must all be on edges, or b) one of two special cases is encountered.

4. For each such set of atoms, if condition (a) terminates the recursion, "look up" in the "catalog" all the clfif-graphs based on the non-bivalent atoms in the set and for each such graph label the edges with the bivalent atoms. If condition (b) terminates the recursion, directly "write down" the allowable graphs.

5. For each such graph, recursively label the atoms with loops and the loops and edges with bivalent atoms.

6. For each graph so obtained, label the atoms with the free valences.

7. For each set of atoms and superatoms obtained as above, use the tree generator to construct all the non-isomorphic clf-graphs based on these atoms and superatoms.

The following formal description of the main algorithm is presented as a sequence of subalgorithms:

Definitions. Given $V = (a_1, a_2, \ldots, a_n)$, define

a) $U(V) = \bigl(2 + \sum_{i=1}^{n} (i-2)a_i\bigr)/2$.

b) $TD(V) = \sum_{i=1}^{n} i\,a_i$.

c) $N(V) = \sum_{i=1}^{n} a_i$.

d) $M(V) = \max\{\, 1 \le i \le n \mid a_i \ne 0 \,\}$.

Note that $U(V) = (2 + TD(V) - 2N(V))/2$.

A. Superatom Partitioner (SAP).

1. (Test) If $U(V)$ is a non-negative integer and $2M(V) \le TD(V)$, continue; else STOP.

2. If $U(V) = 0$, go to TG.

3. Do 12 for each non-trivial summand $\bar V$ of V such that $P(\bar V) = 0$ and $N(\bar V) \ge 2$.

4. Do 11 for $1 \le k \le \lfloor N(\bar V)/2 \rfloor$.

5. Do 10 for each distinct k-term ordered partition of $U(V)$, say $U(V) = u_1 + u_2 + \cdots + u_k$ where $1 \le u_1 \le u_2 \le \cdots \le u_k$.

6. Do 9 for each distinct set $\{ (V_i, u_i) \mid 1 \le i \le k \}$ where $\sum_{i=1}^{k} V_i = \bar V$ and $N(V_i) \ge 2$, $1 \le i \le k$.

7. Compute $FV_i = 2(U(V_i) - u_i)$, $1 \le i \le k$.

8. If $FV_i < \max\{ 1,\ 2M(V_i) - TD(V_i) \}$ for any $1 \le i \le k$, go to 6.

9. Put $\{ \bar V;\ (V_1, FV_1), \ldots, (V_k, FV_k) \}$ on the list GLSA(V).

10. Continue.

11. Continue.

12. Continue.

13. If $P(V) = 0$, put $\{ V;\ (V, 0) \}$ on the list GLSA(V).

B. Free Valence Partitioner (FVP).

Input: $(V_i = (0, v_2, \ldots, v_n),\ FV_i)$ where $N(V_i) \ge 2$, $2(U(V_i) - u_i) = FV_i$, $u_i \ge 1$, and $FV_i \ge \max\{ 1,\ 2M(V_i) - TD(V_i) \}$.

1. Determine all sets of non-negative integers $\{ f_{jt} \mid c_j = \max\{0,\ j + (FV_i - TD(V_i))/2\} \le t \le j-2,\ j = 2, \ldots, n \}$ satisfying:

i) $\sum_{t=c_j}^{j-2} f_{jt} = v_j$, $j = 2, \ldots, n$.

ii) $\sum_{j=2}^{n} \sum_{t=c_j}^{j-2} t\, f_{jt} = FV_i$.

(Note that $f_{20} = v_2$.)

2. For each such set, form $W = W(V_i, FV_i, \{f_{jt}\}) = (0, w_2, \ldots, w_n)$ where $w_s = \sum_{s \le h \le n,\ h \ge c_h + s} f_{h,(h-s)}$, and put W on the list GLFV($V_i$, $FV_i$).

C. Loop-Bivalent Partitioner (LBP).

(LBP is applied recursively until either $m_1(W) = m_0(W) = 0$ or $m_1(W) < m_0(W)$.)

Input: $W = (0, w_2, \ldots, w_n)$ where $N(W) \ge 2$, $U(W)$ is a positive integer, and $2M(W) \le TD(W)$.

1. Compute $m_1(W) = \min\{ w_2,\ \sum_{j=4}^{n} \lfloor (j-2)/2 \rfloor w_j \}$ and $m_0(W) = \max\{ 0,\ w_2 + (2M(W) - TD(W))/2 \}$.

2. If $m_0(W) > m_1(W)$, go to Special Case.

3. If $m_0(W) = m_1(W) = 0$, go to Catalog.

4. Do 9 for $p = m_0(W), \ldots, m_1(W)$.

5. Determine all sets of non-negative integers $\{ p_{jt} \mid 3 \le j \le n,\ 0 \le t \le \lfloor (j-2)/2 \rfloor \}$ satisfying:

i) $\sum_{t=0}^{\lfloor (j-2)/2 \rfloor} p_{jt} = w_j$, $3 \le j \le n$.

ii) $\sum_{j=3}^{n} \sum_{t=0}^{\lfloor (j-2)/2 \rfloor} t\, p_{jt} = p$.

(Note that $p_{30} = w_3$.)

6. Do 8 for each such set $\{p_{jt}\}$.

7. Form $Y = Y(W, p, \{p_{jt}\}) = (0, y_2, \ldots, y_n)$ where $y_s = \sum p_{m,(m-s)/2}$, the sum being taken over those m with $s \le m \le n$ for which $m - s$ is divisible by 2. (Note that $p_{20} = 0$.)

8. If $2M(Y) \le TD(Y)$, put Y on the list GLLB(W, p).

9. Continue.

D. Special Case (SC).

(Here $m_1(W) < m_0(W)$.)

1. If $m_1(W) = 0$, put the graph consisting of the ring with $N(W)$ bivalent nodes and its associated node symmetry group on the list GLGPH(W) and go to Loop Labeler; else go to 2.

2. If $m_1(W) = s > 0$, put the graph consisting of one node of degree $2(s+1)$ with $s+1$ loops attached and its associated edge-loop symmetry group on the list GLGPH(W) and go to Bivalent Labeler.
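The Free Valence Partitioner (FVP) above amounts to a constrained enumeration. The following brute-force sketch illustrates conditions (i) and (ii) and the formation of W; it is not the authors' implementation. The names `compositions` and `free_valence_partitions` and the tuple encoding of $V_i$ are assumptions, and it is assumed that the input already satisfies the FVP input conditions (in particular that $FV_i - TD(V_i)$ is even).

```python
from itertools import product

def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def free_valence_partitions(v, fv):
    """Return one vector W per admissible set {f_jt} for V_i = v and FV_i = fv.

    v[j - 1] is the number of nodes of valence j (v[0] is expected to be 0).
    """
    n = len(v)
    td = sum(j * a for j, a in enumerate(v, start=1))
    per_valence = []
    for j in range(2, n + 1):
        c_j = max(0, j + (fv - td) // 2)   # lower bound on t for valence j
        ts = range(c_j, j - 1)             # t = c_j, ..., j - 2
        # Condition (i): for fixed j the f_jt sum to v_j.
        per_valence.append(
            [tuple(zip(ts, comp)) for comp in compositions(v[j - 1], len(ts))]
        )
    results = []
    for choice in product(*per_valence):
        # Condition (ii): the free valences handed out total FV_i.
        if sum(t * f for row in choice for t, f in row) != fv:
            continue
        w = [0] * n
        for row_j, row in zip(range(2, n + 1), choice):
            for t, f in row:
                w[row_j - t - 1] += f      # valence j with t free valences -> valence j - t
        results.append(tuple(w))
    return results

# Six quadrivalent nodes carrying six free valences in total.
for w in free_valence_partitions((0, 0, 0, 6), 6):
    print(w)
```

For larger inputs a direct recursion over j with early pruning on the running total of free valences would avoid generating and then discarding most candidates; the sketch favours readability over speed.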

E. Catalog (CAT).

Input: $Y = (0, y_2, \ldots, y_n)$ where $m_1(Y) = m_0(Y) = 0$.

1. If $y_2 \ne 0$, "look-up" all non-isomorphic clfif trivalent graphs G with $y_3$ nodes and their associated edge symmetry groups, EGRP(G); else go to 6.

2. Do 4 for each such graph G.

3. Determine all distinct ordered partitions of $y_2$ into $TD(G)/2$ non-negative integral terms.

4. For each such partition, say $b_1 + \cdots + b_{TD(G)/2} = y_2$, use the general labeling subalgorithm to determine all distinct, with respect to EGRP(G), labelings of the edges of G with the labels $b_1, \ldots, b_{TD(G)/2}$, replace the label $b_i$ on the labeled graphs by $b_i$ bivalent nodes, and place the resultant graphs and their node symmetry groups on the list GLGPH(Y).

5. Go to Loop-Bivalent Labeler.

6. If $y_2 = 0$, "look-up" all non-isomorphic clfif-graphs based on Y, and put these graphs and their associated node symmetry groups on the list GLGPH(Y).

7. Go to Loop-Bivalent Labeler.

F. Loop-Bivalent Labeler (LBL).

(LBL is applied recursively to each graph G on GLGPH(Y) up the path of vectors from which G arose until reaching the $W = W(V_i, FV_i, \{f_{jt}\})$ on GLFV($V_i$, $FV_i$) heading the path.)
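Steps 3 and 4 of CAT distribute $y_2$ bivalent nodes over the edges of a catalog graph, keeping one labeling per orbit of the edge symmetry group. The sketch below is an illustration only: instead of the two-stage scheme in the text (ordered partition, then the labeler of [4]) it enumerates every assignment of counts to edges by brute force and keeps the lexicographically smallest member of each orbit. The names `compositions` and `distinct_labelings`, and the representation of a group as an explicit list of index permutations, are assumptions.

```python
from itertools import permutations

def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def distinct_labelings(total, group):
    """Return one representative per orbit of edge labelings summing to `total`.

    `group` must list every element of the symmetry group (identity included),
    each as a tuple g in which g[i] is the edge that position i is mapped to.
    """
    parts = len(group[0])
    seen = set()
    representatives = []
    for labels in compositions(total, parts):
        # The smallest image under the group is a canonical form for the orbit.
        canon = min(tuple(labels[g[i]] for i in range(parts)) for g in group)
        if canon not in seen:
            seen.add(canon)
            representatives.append(canon)
    return representatives

# Two trivalent nodes joined by three parallel edges; its edge symmetry group
# is assumed here to be the full symmetric group on the three edges.
edge_group = list(permutations(range(3)))
print(distinct_labelings(2, edge_group))
```

With the trivial group (identity only) every assignment is its own orbit, so the same call returns all assignments; this makes the effect of the symmetry group easy to see on small cases.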

Loop Labeler (LL).

Input: A clf-graph G, its node group NGRP(G) and the vector $Y = Y(W, p, \{p_{jt}\})$ on GLL(W, p) from which G arose.

1. Compute $p_t = \sum_{j=3}^{n} p_{jt}$, $t = 0, \ldots, \lfloor (n-2)/2 \rfloor$.

2. Determine all distinct, with respect to NGRP(G), labelings of the nodes of G with $p_t$ t's subject to the constraint that $p_{jt}$ t's must go on nodes of valence $j - 2t$.

3. For each such labeling, replace the label t by t loops.

Bivalent Labeler (BL).

Input: A connected graph H with p loops, the edge-loop group of H, ELGRP(H), and $w_2$, the number of bivalent nodes in the vector W from which H arose.

1. Compute h = p + number of (non-loop) edges in H.

2. Do 4 for each ordered partition of $w_2$ into h non-negative terms, say $w_2 = b_1 + \cdots + b_h$, $b_1 \ge \cdots \ge b_h$, such that at least p of the $b_i$'s are non-zero.

3. Determine all distinct, with respect to ELGRP(H), labelings of the edges and loops of H with the labels $\{ b_i \mid 1 \le i \le h \}$ subject to the constraint that each loop gets a positive label.

4. For each such labeling, replace the label $b_i$ by $b_i$ bivalent nodes, $1 \le i \le h$, and place the resultant graph and its node group on the list GLGPH(W).

G. Free Valence Labeler (FVL).

Input: A graph G, its node group NGRP(G), and the vector $W = W(V_i, FV_i, \{f_{jt}\})$ from which G arose.

1. Compute $f_t = \sum_{j=2}^{n} f_{jt}$, $t = 0, \ldots, n + (FV_i - TD(V_i))/2$.

2. Determine all distinct, with respect to NGRP(G), labelings of the nodes of G with $f_t$ t's subject to the constraint that $f_{jt}$ t's must go on nodes of valence $j - t$.

3. For each such labeling, replace each label t by t free valences and put the resultant graph and its free valence symmetry group on the list GLGPH($V_i$, $FV_i$).

H. Input for Tree Generator.

For $\{ \bar V;\ (V_1, FV_1), \ldots, (V_k, FV_k) \}$ on the list GLSA(V), input to TG:

a) $V - \bar V$. (The atoms)

b) A k-tuple of graphs $(G_1, \ldots, G_k)$ and their associated free valence groups where $G_i$ is on the list GLGPH($V_i$, $FV_i$), $1 \le i \le k$. (The superatoms).

IV. Theoretical Results. We now present the graph-theoretical results on which the main algorithm is based. Even though some of these results are known, we give proofs for the sake of clarity and completeness. Throughout this section, $V = (a_1, \ldots, a_n)$ denotes an integral n-vector.

Lemma 1. If there is at least one clf-graph G based on $V = (a_1, \ldots, a_n)$, then $U(V)$ is a non-negative integer.

Proof. A spanning tree of G has $N(V) - 1$ edges, and G has $TD(V)/2$ edges. Hence $U(V) = TD(V)/2 - (N(V) - 1)$ is a non-negative integer.

Lemma 2. If $2M(V) \le TD(V)$ and $U(V)$ is a non-negative integer, then there is at least one clf-graph based on V.

Proof. Let $m = M(V)$, the maximum valence of V. Given non-negative integers $b_2, \ldots, b_m$, $b_m \ne 0$, let $b_1 = L(b_2, \ldots, b_m)$ be the least non-negative integer such that there is at least one clf-graph based on $B = (b_1, b_2, \ldots, b_m)$. Since for sufficiently large $b_1$, e.g., $b_1 = \sum_{i=2}^{m} i\,b_i - 2\bigl(\sum_{i=2}^{m} b_i - 1\bigr)$, such a graph exists, $L(b_2, \ldots, b_m)$ is well-defined.

Assume $b_1 = L(b_2, \ldots, b_m) \ge 2$, and let G be a clf-graph based on B. If there were two distinct nodes $N_1$ and $N_2$ of G both of which had adjacent nodes of valence 1, we could delete these valence 1 nodes and add the edge $(N_1, N_2)$ to G, obtaining a clf-graph with a lesser number of nodes of valence 1. Thus all valence 1 nodes of G must be adjacent to the same node, say N. If there was a node $N_1$ adjacent to N and a node $N_2 \ne N$ adjacent to $N_1$, we could delete the edge $(N_1, N_2)$ and two of the valence 1 nodes and add the edges $(N_1, N)$ and $(N, N_2)$, obtaining a clf-graph with a lesser number of valence 1 nodes. (See Figure 2.) Thus all nodes adjacent to N are adjacent only to N, and since G is connected, all nodes must be adjacent to N. (See Figure 3.) Hence, N must be the unique node of valence m, i.e., $b_m = 1$, and $b_1 = m - \sum_{i=2}^{m-1} i\,b_i$.

Now assume the lemma is false, i.e., $2m \le TD(V)$, $U(V)$ is a non-negative integer and there is no clf-graph based on $V = (a_1, \ldots, a_m)$. Then $a_1 < L(a_2, \ldots, a_m)$. If $L(a_2, \ldots, a_m) \ge 2$, then $a_1 < m - \sum_{i=2}^{m-1} i\,a_i$, and $TD(V) < 2m$, a contradiction. If $L(a_2, \ldots, a_m) = 0$, then $a_1 < 0$, a contradiction. If $L(a_2, \ldots, a_m) = 1$, then $a_1 = 0$. Since a clf-graph based on $(1, a_2, \ldots, a_m)$ does exist, by Lemma 1, $c = 1 + \sum_{i=2}^{m} (i-2)a_i$ is even. Hence $U(V) = (c+1)/2$ is not integral, a contradiction.

Theorem 1. There exists at least one clf-graph based on V if and only if $2M(V) \le TD(V)$ and $U(V)$ is a non-negative integer.

Proof. The sufficiency is just Lemma 2. For the necessity, let $m = M(V)$. By Lemma 1, $U(V)$ is a non-negative integer. Assume that $2m > TD(V)$. Then $m > a_1 + 2a_2 + \cdots + (m-1)a_{m-1}$. Hence $a_m = 1$ and $m > \sum_{i=1}^{m-1} i\,a_i$. This contradicts the loop-free assumption.

Corollary 1. V determines a loop-free tree if and only if $U(V) = 0$. Moreover, in this case every graph based on V is a loop-free tree.

Theorem 2. There is a clfif-graph based on V if and only if $U(V)$ is a positive integer and $2M(V) \le TD(V)$.

Proof. If there is a clfif-graph based on V, then by Theorem 1 and Corollary 1, $U(V)$ is a positive integer and $2M(V) \le TD(V)$. Conversely, by Theorem 1 there exist clf-graphs based on V. Assume that every such graph has at least one isthmus. Let G be a graph based on V with a minimal number of isthmuses, i.e., a "least criminal." We will show that G can be modified to a clf-graph based on V with a lesser number of isthmuses, a contradiction to our assumption.

Let the edge (A,B) be an isthmus of G, and let X and Y be the connected components of G\(A,B). Since $U(V) > 0$, not both X and Y are trees. Say X is not a tree and A is in X. Let $\mathcal{C}$ be an elementary circuit in X, and let C be a node on $\mathcal{C}$ of minimal path length from A, where we consider here a multiple edge as a circuit. (Note that we may have A = C.) By the choice of C, there is a path from A to C which is edge-disjoint from $\mathcal{C}$. Let D be a node on $\mathcal{C}$ adjacent to C. (See Figure 4.)

Case 1. Y is a tree.

Since G is loop-free, there must be a valence 1 node N in Y. Say N is attached to the node E of Y. Then, delete (C,D), add (D,E), and move N from E to C. (See Figure 5.)

Case 2. Y is not a tree.

Let $\mathcal{E}$ be an elementary circuit in Y, E a node of $\mathcal{E}$ of minimal path length from B and F a node of $\mathcal{E}$ adjacent to E. Then, delete (C,D) and (E,F), and add (D,F) and (C,E). (See Figure 6.)

In both cases, the resulting graph is a clf-graph based on V and has at least one less isthmus than G. Hence our assumption was false, and there is at least one clfif-graph based on V.

Lemma 3. Let $\bar V$ and $FV_i$, $i = 1, \ldots, k$, be as in Superatom Partitioner. Then $\sum_{i=1}^{k} FV_i = 2(k - U(V - \bar V))$.

Lemma 4. In a loop-free graph based on V, there are $TD(V)/2 - a_1$ edges between non-univalent nodes.

Theorem 3. Let $V = (0, a_2, \ldots, a_n)$, $N(V) \ge 2$, $U(V)$ a positive integer, and $2M(V) \le TD(V)$. Let $m_0(V) = \max\{ 0,\ a_2 + (2M(V) - TD(V))/2 \}$ and $m_1(V) = \min\{ a_2,\ \sum_{i=4}^{n} \lfloor (i-2)/2 \rfloor a_i \}$. Then $m_0(V) \le m_1(V)$ except in precisely the following two cases:

a) $m_1(V) = 0$ and $m_0(V) = 2$

b) $m_1(V) = s > 0$ and $m_0(V) = s + 1$

Moreover, case (a) holds if and only if $a_i = 0$ for $i \ge 3$, and case (b) holds if and only if $a_{2(s+1)} = 1$ and $a_i = 0$ for $i \ge 3$, $i \ne 2(s+1)$.
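The existence criterion of Theorem 1 and the quantities $m_0$, $m_1$ of Theorem 3 are plain arithmetic on V. The following is a small sketch under assumed names (`total_degree`, `node_count`, `max_valence`, `clf_graph_exists`, `loop_bounds`), with V encoded as a tuple whose index 0 holds $a_1$ and which has at least two entries and at least one non-zero entry; it is an illustration, not the authors' code.

```python
def total_degree(v):
    """TD(V): sum of i * a_i."""
    return sum(i * a for i, a in enumerate(v, start=1))

def node_count(v):
    """N(V): total number of nodes."""
    return sum(v)

def max_valence(v):
    """M(V): largest valence that actually occurs (v must not be all zero)."""
    return max(i for i, a in enumerate(v, start=1) if a)

def clf_graph_exists(v):
    """Theorem 1: U(V) is a non-negative integer and 2M(V) <= TD(V)."""
    twice_u = 2 + total_degree(v) - 2 * node_count(v)   # equals 2 * U(V)
    return twice_u >= 0 and twice_u % 2 == 0 and 2 * max_valence(v) <= total_degree(v)

def loop_bounds(v):
    """Return (m_0(V), m_1(V)) as defined in Theorem 3.

    Assumes the hypotheses of Theorem 3, so that 2M(V) - TD(V) is even.
    """
    a2 = v[1]
    m0 = max(0, a2 + (2 * max_valence(v) - total_degree(v)) // 2)
    m1 = min(a2, sum(((i - 2) // 2) * a for i, a in enumerate(v, start=1) if i >= 4))
    return m0, m1

print(clf_graph_exists((6, 0, 0, 6)))   # six univalent and six quadrivalent nodes
print(loop_bounds((0, 2, 0, 0)))        # two bivalent nodes only: case (a)
print(loop_bounds((0, 2, 0, 1)))        # one quadrivalent node, two bivalent: case (b), s = 1
```

Comparing the two components of `loop_bounds` reproduces the dispatch of the partitioner in section III: $m_0 > m_1$ selects the special case, $m_0 = m_1 = 0$ selects the catalog, and otherwise p ranges from $m_0$ to $m_1$.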

Proof. Let $m_0(V) > m_1(V)$. If $m_1(V) = a_2$, then $2M(V) - TD(V) > 0$, a contradiction. Hence $m_1(V) = \sum_{i=4}^{n} \lfloor (i-2)/2 \rfloor a_i$. Assume $m_1(V) = 0$. Then $a_j = 0$, $j > 3$. By Theorem 1, a clf-graph based on V exists. Hence $a_3$ is even. From $0 < m_0(V) = (2M(V) - 3a_3)/2$, it follows that $a_3 = 0$ and $m_0(V) = 2$. The converse is immediate. Assume that $m_1(V) = s > 0$. Then, $a_2 + (2M(V) - TD(V))/2 > s > 0$, and $M(V) > 2s + 3a_3 + \cdots + (M(V)-1)a_{M(V)-1} \ge 2s$. Hence $a_{M(V)} = 1$. Also, $s = m_1(V) = \sum_{i=4}^{M(V)} \lfloor (i-2)/2 \rfloor a_i$ implies that $2s \ge M(V) - 3$. Thus, $2s + 3 \ge M(V) > 2s$. This implies that $a_i = 0$, $3 \le i < M(V)$. Since a clf-graph based on V exists, $M(V)$ must be even. Thus $M(V) = 2(s+1)$. A direct computation gives that $m_0(V) = s + 1$. The converse follows directly from $2M(V) \le TD(V)$.

Lemma 5. Let V, $m_0(V)$, and $m_1(V)$ be as in Theorem 3. Assume that $m_0(V) = m_1(V) = 0$ and $a_2 \ne 0$. Then $a_3 \ne 0$ and is even and $a_i = 0$ for $i > 3$.

Proof. $a_2 \ne 0$ implies that $m_1(V) = \sum_{i=4}^{n} \lfloor (i-2)/2 \rfloor a_i = 0$. Thus $a_i = 0$, $i > 3$. Now $m_0(V) = 0$ implies that $2M(V) - 3a_3 \le 0$. Hence $a_3 \ne 0$. Since a clf-graph based on V exists, $a_3$ must be even.

V. Proof of Main Algorithm.

A. Every clf-graph based on V occurs. Let G be a clf-graph based on V. We will show that the algorithm produces a graph isomorphic to G.

Since G exists, by Theorem 1, $U(V)$ is a non-negative integer and $2M(V) \le TD(V)$. If $U(V) = 0$, the tree generator produces a graph isomorphic to G. Assume $U(V) > 0$. Let B be the set of isthmuses in G, and let $G_1, \ldots, G_k$ be those connected components

Page 18: AN ALGORITHM FOR THE CONSTRUCTIONOF GRAPHSOF …md899rr3691/md899rr3691.pdf · An Algorithmforthe Construction ofthe Graphs of Organic Molecules Harold Brown and Larry Masinter T

16

L

of G\B having at least two nodes. Since U(V) > 0, G _,s not a tree

and kil. Also, no G^ has univalent nodes. Let V. = (0, al , ..., a 1)," -" n_ i "where a. is the number of valence j nodes of C, and let FV. be thei i

toff,

inff, lii£k. Set

and V depend only on the isomorphism

number of elements of B connected

class of g. Now P(V) = 0 and N(V) >. 2. Thus V is an allowable

summand of of Yin SAP. Also, N(Vi > i2,li i i k, and 1ik 1 [n(V)/2Set Uj_ =. (2U(Vi ) - FVi )/2. Let be the graph obtained from C

by replacing in G each of the FV. connected components of G/Bconnected to G^ by a univalent node. Then & is a elf-graph whichis not a tree. Let Vi be the n-vector associated with £ . Then

UHVi ) = \xLi and is a positive integer. Moreover, by Lemma 3,

and, after suitable reindexing, { V; (v. , FV, ),..., (V, , FV ) }the list 11 k' k 'is on.GLSA(V).

For a fixed i, I<_i<_ k , let f be the number of valence

Since G^ is a elf-graph and has isthmuses only to univalent nodes,

fj t = 0 for t > j-2. Also, by Lemma 4, "_^ has (TD(V.) - FV.)/2

edges between non-uivalent nodes. Hence, any node of valence j

in GL has at least Cj = max{ 0, j + (FV£ - TD(Vi >)/2 } univalentnodes- attached, i.e., f . = 0 for t < c . . The set

** J

i f jt I2- —n' c i 1. t li" 2 1 satisfies the conditions of FVP.J the listHence, w(Vi> FV £ , {fj>) is onYGLFVtV^ FV.) and depends only on

the isomorphism class of G. .

_ kV = Z V . Note that k, V., FV.

i=l x ' i' l

kZ UjL = U(V). Since is a elf-graph, by Theorem 1,

2M("v\) = 2m(V.) <_TD(V.) = TDCV^ + FV.. Thus FV. 12M(V.) - TD(V.),

2 nodes of G^ with t univalent nodes attached, 2f,j <n, 0 < t <j.

Page 19: AN ALGORITHM FOR THE CONSTRUCTIONOF GRAPHSOF …md899rr3691/md899rr3691.pdf · An Algorithmforthe Construction ofthe Graphs of Organic Molecules Harold Brown and Larry Masinter T

17

Let O be the graph obtained from G. by deleting the FV.

univalent nodes and their connecting edges. Then

with C|, and G^ is a clfif-graph. Assume that m

Q

(W) <_ m (W) where

mQ and m.^ are as in LP. By Theorem 3, G. is not one of the special

case graphs, and hence deleting the bivalent nodes of G. leavesi

a connected, isthmus-free graph _". with at least two nodes and say

t^ loops. Since G. is isthmus-free, each node of ~G. must have at

least two non-loop edges attached. Hencen

xi l min{ w_, E [_(i-2)/2jwi } = rn^W). Let G*! be the graph

obtained from G. by deleting the t. loops. G. is a clfif-graph.

Hence, by Theorem 1, 2M(W) - 4t.

and w 2 + (2M(W) - TD(W))/2 < t..

number of valence j nodes of ~G.J i

Thus ti iim0(W). Let p. be the

with t loops , 4 <_ j <_ n ,0l x 1 L_J-2)/2j . Then {p.} satisfies the conditions of LP and

V is precisely the n-vector Y(W, t^ <P- t >). Since G' is a elf-graph,, v tlje list 1

2M(Y) __TD(Y) and V is onY(3LLB(W, ti ). Moreover, V depends only on

the isomorphism class of G.

Recursive application of the above process yields a sequence

Yu = W(V FV <fft}j t }) ' Yl = Y< w »' V <P jt > >. *2 ' "-.. Y^ , wheres.i

Y. is on GLL(Y. ,P- ,) for some allowable p. and either■* J J " J

«_"

(i) mQ (Y^ ) = mx(Y^) =0, or (ii) mQ (Y^) > m <V* ). Moreover, V*i i i i J

depends only on the isomorphism class of G. . . Assume (i) holds1>j-l

W = W(Vi> FVi , {fj t > ) = (0, w2 ,..., wn) is the n-vector associated

Let Ybe the n-vector associated with G". . Then M(W) <_ M(Y) + 2t..

Hence, by Theorem 1, 2M(W) - 4t± ± 2M(Y) <_ TD(Y) = TD(W) - 2w - 2t.,

of graphs GiQ = C., G^ = G^, Gi2 , ..., G , and associated n-vectorsi

Page 20: AN ALGORITHM FOR THE CONSTRUCTIONOF GRAPHSOF …md899rr3691/md899rr3691.pdf · An Algorithmforthe Construction ofthe Graphs of Organic Molecules Harold Brown and Larry Masinter T

18

_._

for Y= V = (0,y_ ..,y ). Now by Lemma 5, V has only bivalenti

and trivalent nodes or no bivalent nodes. In the first case deleting

the bivalent nodes of G. leaves a proper elfif, trivalent graph.... iIn the second case G. is a graph with no bivalent nodes. In either

*. ISi *case, a graph G. isomorphic toff. is produced by CAT. If (ii)

XS

"

XS

"1 * 1holds, by Theorem 3, ff. is either the ring with y„ bivalent nodes

i>o "

mC

1or ff. with its bivalent nodes deleted is a single node of valence

is . b

1*-2m.(Y) with m (V) loops attached, and a graph ff. isomorphic to

mj \J

XS

"... 1 ... ,ff. is produced by BL. Hence we have that a graph ff. isomorphic

XS " xs "1... . 1to G . and the vector V are produced in the algorithm as input

i ... ito LLBL. Since C. , induces in the obvious manner a loop and/oris.-l r

1 sV i Vcbivalent node labeling of ff. arising from V , ff. is isomorphic

XS "

S

" "* J. XS " ** X1 1 *" 1 ito one of the graphs produced by LLBL with input (ff . , V ) .

XS " S " "* X1 1Recursively applying LLBL up the sequence {V.}, we obtain a graph

G\ isomorphic

toff,

with associated n-vector W. = W(V., FV. , {f._.} )i i i i* i" l ]t

in the output, i= 1, ..., k. With the input (ff ! , W.), FVL produces

a graph C| isomorphic toff.. Hence, (V-V, ffj, ..., q') is among

the inputs to TG arising from V, and with this input TG produces

a graph isomorphic to ff .B . The algorithm produces no redundances. Let ff and Hbe two graphs

produced by the algorithm with input V. We will show that ff and H

are not isomorphic, i.e., ff ¥H.Assume G = H . TG is irredundant. Hence if ff

and H arise from the input to TG, (V-V, ff , ..., ff ) and

(V-V, H , , ..., H,,) y respectively, where the ff. and H. are superatoms

with attached free valences, we must have that V = V, k = k' , and

Page 21: AN ALGORITHM FOR THE CONSTRUCTIONOF GRAPHSOF …md899rr3691/md899rr3691.pdf · An Algorithmforthe Construction ofthe Graphs of Organic Molecules Harold Brown and Larry Masinter T

19

the $H_j$ can be so indexed that $H_j \cong G_j$, $1 \le j \le k$. Hence, $H_j$ and $G_j$ both must have the same free valence $FV_j$, and $G$ and $H$ must arise from the same SAP of $V$, say $\{\bar{V}; (V_1, FV_1), \ldots, (V_k, FV_k)\}$.

Now since the labeler is irredundant, so is FVL. Hence from $H_i \cong G_i$ it follows that the graphs $G'_i$ and $H'_i$ defined as in (A) must be isomorphic. Also, $G'_i$ and $H'_i$ must arise from the same FVP of $(V_i, FV_i)$, say $W_i = W(V_i, FV_i, \{f_{jt}\})$, $1 \le i \le k$. Since BL is irredundant, it follows that the graphs $G''_i$ and $H''_i$ defined as in (A) must be isomorphic. Similarly, since LL is irredundant, the graphs $G'''_i$ and $H'''_i$ must be isomorphic. Moreover, $G'''_i$ and $H'''_i$ must arise from the same LP of $W_i$, say $Y_i = Y(W_i, p_i, \{p_{jt}\})$, $1 \le i \le k$. Applying the above argument recursively to $G'''_i$ and $H'''_i$, we have that the CAT "look-up" (or SC determination) graphs, say $G^*_i$ and $H^*_i$, respectively, must be isomorphic and that the sequence of loop partitions leading from $Y_i$ to the partitions determining $G^*_i$ and $H^*_i$ must be identical.

Now $G^*_i \cong H^*_i$ if and only if $G^*_i$ and $H^*_i$ are equal. Thus $G^*_i = H^*_i$, $1 \le i \le k$.

Hence we have that if $G \cong H$, they arise from the identical sequence of vector inputs to the various labelers and TG, and that $G^*_i = H^*_i$, $1 \le i \le k$. Since for a given input the labelers produce only non-isomorphic graphs, we must have, successively, that $G'''_i = H'''_i$, $G''_i = H''_i$, $G'_i = H'_i$, and $G_i = H_i$, $1 \le i \le k$. Hence if $G \cong H$, they both arise from the same input to TG. But for a given input, TG produces only non-isomorphic graphs, a contradiction.
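The irredundancy property argued in part B (no two output graphs are isomorphic) can be checked by brute force on small cases. The following is only an illustrative sketch, not the labelers or TG of the paper: the helper names `canonical_form` and `is_irredundant` are hypothetical, multigraphs are assumed to be given as edge lists with repeated pairs standing for multiple bonds, and the canonical form is found by trying every node relabeling, which is feasible only for a handful of nodes.

```python
from itertools import permutations

def canonical_form(n, edges):
    """Smallest sorted edge list over all relabelings of nodes 0..n-1.

    `edges` is a list of (u, v) pairs; a repeated pair is a multiple bond.
    Two multigraphs are isomorphic exactly when their canonical forms agree.
    """
    best = None
    for perm in permutations(range(n)):
        relabeled = sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges)
        if best is None or relabeled < best:
            best = relabeled
    return tuple(best)

def is_irredundant(n, graphs):
    """True if no two multigraphs in `graphs` are isomorphic."""
    forms = [canonical_form(n, g) for g in graphs]
    return len(set(forms)) == len(forms)

# Connected 4-node multigraphs whose node degrees are 3, 3, 2, 2.
ring_with_double_bond = [(0, 1), (0, 1), (0, 2), (1, 3), (2, 3)]
bicyclic = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
# The first graph again, with nodes 0<->2 and 1<->3 exchanged.
ring_relabeled = [(2, 3), (2, 3), (2, 0), (3, 1), (0, 1)]

print(is_irredundant(4, [ring_with_double_bond, bicyclic]))
print(is_irredundant(4, [ring_with_double_bond, ring_relabeled]))
```

The first list is irredundant because only one of its graphs has a double bond; the second is not, because its two members differ only by a renaming of nodes.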


C. The partitioners produce only allowable partitions of $V$. We will now show that every sequence of partitions of $V$ produced by the algorithm yields at least one clf-graph based on $V$.

Let $Y = Y(W, p, \{p_{jt}\})$ be a member of the list LL($W$, $p$). By direct computation, $U(Y) = U(W) - p$. Also, $p \le \sum_{i=4}^{n} (i-2) w_i / 2 < U(W)$. Thus $U(Y)$ is a positive integer. By construction, $2M(Y) \le TD(Y)$. Since $m(W) \ge m_0(W)$, it follows from Theorem 3 that $N(Y) \ge 2$. Hence by Theorem 2, there is a clfif-graph based on $Y$. In the case $m(W) < m_0(W)$, Theorem 3 assures us that a clfif-graph based on $W$ exists. Thus LP produces only allowable output.

Let $W = W(V_i, FV_i, \{f_{jt}\})$ be a member of the list LVF($V_i$, $FV_i$). By direct computation, $U(W) = U(V_i) - FV_i/2 = u_i$, a positive integer. Also, $TD(W) = TD(V_i) - FV_i$. Now $M(W) = \max\{\, j - t \mid 3 \le j \le n,\ 0 \le t \le j-2,\ f_{jt} \ne 0 \,\}$, and $j - t \le (TD(V_i) - FV_i)/2$. Hence, $2M(W) \le TD(V_i) - FV_i = TD(W)$. By Theorem 2 there is a clfif-graph based on $W$, and FVP produces only allowable output.

Let $\{\bar{V}; (V_1, FV_1), \ldots, (V_k, FV_k)\}$ be a member of the list LSA($V$), say $V_i = (0, a^i_2, \ldots, a^i_n)$, $1 \le i \le k$. Let $H_i = (FV_i, a^i_2, \ldots, a^i_n)$. Now $2U(H_i) = 2U(V_i) - FV_i = 2u_i$, and $U(H_i)$ is a positive integer. Also, $P(H_i) = FV_i \ge 1$, and $TD(H_i) = TD(V_i) + FV_i \ge 2M(V_i) = 2M(H_i)$. Thus, by Theorem 2, there is a clfif-graph based on $H_i$, $1 \le i \le k$. Hence, at least one set $\{G_1, \ldots, G_k\}$ of superatoms based on the given partition of $V$ exists. If $\tilde{V}$ is the n-vector corresponding to $V - \bar{V}$ and the free valences of the $G_i$, then a direct computation


using Lemma 3 gives that $U(\tilde{V}) = 0$. By Corollary 1, a loop-free tree based on $V - \bar{V}$ and the $G_i$ exists. Thus the sequence of partitions of $V$ does yield at least one clf-graph based on $V$.
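The "direct computations" used for the superatom step in part C can be checked numerically. The sketch below rests on assumptions that are not stated in this section: a degree vector is taken to be $(a_1, \ldots, a_n)$ with $a_j$ the number of nodes of degree $j$; $TD$ is the total degree; $U$ is taken as $1 + \sum_j (j-2)a_j/2$ (chosen because it reproduces the identity $2U(H_i) = 2U(V_i) - FV_i$ and gives $U = 0$ for a tree); $M$ is the largest degree present; and $P$ is $a_1$. The function names are hypothetical.

```python
from fractions import Fraction

def total_degree(v):
    """TD(v): sum of j * a_j over the degree vector v = (a_1, ..., a_n)."""
    return sum(j * a for j, a in enumerate(v, start=1))

def unsaturations(v):
    """U(v), assumed here to be 1 + (sum of (j - 2) * a_j) / 2."""
    return 1 + Fraction(sum((j - 2) * a for j, a in enumerate(v, start=1)), 2)

def max_degree(v):
    """M(v), assumed here to be the largest degree j with a_j nonzero."""
    return max(j for j, a in enumerate(v, start=1) if a)

def attach_free_valences(v, fv):
    """H = (fv, a_2, ..., a_n): add fv degree-one nodes to v = (0, a_2, ..., a_n)."""
    return (v[0] + fv,) + tuple(v[1:])

# Example: six degree-4 nodes with six free valences (a benzene-like superatom).
v_i = (0, 0, 0, 6)
fv_i = 6
h_i = attach_free_valences(v_i, fv_i)

# Identities used in the proof of part C.
print(2 * unsaturations(h_i) == 2 * unsaturations(v_i) - fv_i)
print(total_degree(h_i) == total_degree(v_i) + fv_i)
print(max_degree(h_i) == max_degree(v_i))
print(total_degree(h_i) >= 2 * max_degree(h_i))
```

For this example $U(V_i) = 7$ and $U(H_i) = 4$, a positive integer, and $P(H_i) = 6 \ge 1$, so the hypotheses quoted from Theorem 2 in the proof are met under these assumed definitions.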

VI. Acknowledgements. The DENDRAL concept and its applications to organic chemistry were originally conceived by Professor Joshua Lederberg. The authors thank Professor Lederberg for his guidance and his critical suggestions, which have made this work possible.

Computer Science Department

Stanford University

Stanford, California 94305


REFERENCES

1. E. A. Feigenbaum, B. G. Buchanan and J. Lederberg, On generality and problem solving: A case study using the DENDRAL program, in: Machine Intelligence 6 (Edinburgh University Press, Edinburgh, 1971) 165-190.

2. B. G. Buchanan, G. L. Sutherland and E. A. Feigenbaum, A program for generating explanatory hypotheses in organic chemistry, in: Machine Intelligence 4 (Edinburgh University Press, Edinburgh, 1969) 209-254.

3. J. Lederberg, et al., A tree generation algorithm, To appear.

4. H. Brown, L. Hjelmeland and L. Masinter, Constructive graph labeling using double cosets, To appear, Discrete Math.


FOOTNOTES.

1. More formally, if $X$ is an m-element set and $H$ is a group of permutations on $X$, then a labeling of $X$ with $n_1$ labels $a_1$, $n_2$ labels $a_2$, ..., $n_k$ labels $a_k$, $n_1 + n_2 + \cdots + n_k = m$, is a mapping $\varphi: X \to \{a_1, a_2, \ldots, a_k\}$ such that $|\varphi^{-1}(a_i)| = n_i$. Two such mappings $\varphi$ and $\psi$ are equivalent with respect to $H$ if there is an $\eta \in H$ such that $\varphi = \psi\eta$.
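The definition in footnote 1 can be made concrete by enumerating labelings and keeping one representative per equivalence class. This is a brute-force sketch for illustration only; it is not the double-coset method of reference 4, and the function name `inequivalent_labelings` and the tuple encoding of permutations are assumptions made here.

```python
from itertools import permutations

def inequivalent_labelings(m, group, label_counts):
    """One representative per class of labelings of X = {0, ..., m-1}.

    `group` is a list of permutations of X, each a tuple h with h[x] the
    image of x; it is assumed to be closed under composition and inverse.
    `label_counts` maps each label to the number of elements that receive it.
    A labeling phi is stored as a tuple with phi[x] the label of x.
    """
    multiset = [lab for lab, n in sorted(label_counts.items()) for _ in range(n)]
    assert len(multiset) == m
    representatives = set()
    for phi in set(permutations(multiset)):
        # All labelings equivalent to phi: x -> phi[h[x]] for h in the group.
        orbit = {tuple(phi[h[x]] for x in range(m)) for h in group}
        representatives.add(min(orbit))
    return sorted(representatives)

# X = four positions on a ring, H = the four rotations of the ring.
rotations = [tuple((x + s) % 4 for x in range(4)) for s in range(4)]
reps = inequivalent_labelings(4, rotations, {"a": 2, "b": 2})
print(reps)  # [('a', 'a', 'b', 'b'), ('a', 'b', 'a', 'b')]
```

Of the six labelings with two labels a and two labels b, four are rotations of aabb and two are rotations of abab, so exactly two classes remain.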

Figure 1.

Figure 2.

Figure 6.