
IEEE TRANSACTIONS ON COMPUTERS, DECEMBER 1971

Correspondence

Comments on "A Nonlinear Mapping for Data Structure Analysis"

JOSEPH B. KRUSKAL

Abstract-A recent paper by Sammon¹ describes a useful technique for analyzing data and helping find clusters. It consists of a nonlinear mapping of the original data into two dimensions (for visual inspection) which is superior for many purposes to use of the first two principal components. The present correspondence points out that essentially the same mapping can be achieved by a widely used, easily obtainable computer program for multidimensional scaling called M-D-SCAL.

In a recent paper Sammon¹ described a very interesting method for nonlinear mapping of multivariate data into two dimensions in order to permit the discovery of various data structures, particularly clusters. It is now possible to perform a procedure virtually identical to Sammon's by use of a modern, easily obtainable, widely used program for multidimensional scaling. (Incidentally, an application to medical data of very similar ideas has been published by Thompson and Woodbury [1].)

Briefly, in Sammon's notation d_ij* is the distance between the given data points i and j, and d_ij is the distance between the corresponding points i and j in two dimensions. To find these points in two dimensions, he minimizes the expression

$$E = \frac{1}{\sum_{i<j} d_{ij}^*} \sum_{i<j} \frac{[d_{ij}^* - d_{ij}]^2}{d_{ij}^*}$$

where each summation runs over $i < j$; $i, j = 1, \cdots, N$. (Notice that the $d_{ij}^*$ are constant, and only the $d_{ij}$ vary as the points in two dimensions are moved around.) In the program M-D-SCAL (version 5M), which has been distributed on request to dozens of computer centers and whose earlier versions had been requested by hundreds of computer centers all over the world, one of the options permits this expression to be minimized:

$$\left[\text{Sammon's expression} \times \frac{\sum_{i<j} d_{ij}^*}{\sum_{i<j} d_{ij}^2 / d_{ij}^*}\right]^{1/2}$$

In practice, the denominator under Sammon's expression is so nearly constant over the region of interest that it hardly changes the resulting configuration. To use M-D-SCAL in this way, it is necessary to use the control phrases WFUNCTION, SFORM1, REGRESSION = POLYNOMIAL = 1, REGRESSION = NOCONSTANT, and to add the Fortran function subroutine

      FUNCTION WTRAN(ARGUMT)
      WTRAN = 1.0/ARGUMT
      END

to the program deck.

A complete description of this particular type of multidimensional scaling was first published in Kruskal [2], [3].²
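The near-equivalence Kruskal asserts can be checked numerically. Below is a small NumPy sketch (the function names and toy data are illustrative, not part of M-D-SCAL) that computes Sammon's expression and the stress that results from weighting each pair by w_ij = 1/d_ij*, which is the weighting the WTRAN routine installs. Both vanish exactly when the two-dimensional distances reproduce the given ones.

```python
import numpy as np

def sammon_E(D_star, D):
    """Sammon's expression: (1 / sum d*) * sum (d* - d)^2 / d*, over i < j.
    Both arguments are symmetric N x N distance matrices."""
    iu = np.triu_indices_from(D_star, k=1)   # pairs with i < j
    ds, d = D_star[iu], D[iu]
    return np.sum((ds - d) ** 2 / ds) / np.sum(ds)

def weighted_stress(D_star, D):
    """Stress with weights w = 1/d* (the WTRAN weighting):
    sqrt( sum w (d* - d)^2 / sum w d^2 ), over i < j."""
    iu = np.triu_indices_from(D_star, k=1)
    ds, d = D_star[iu], D[iu]
    w = 1.0 / ds
    return np.sqrt(np.sum(w * (ds - d) ** 2) / np.sum(w * d ** 2))

# toy data: distances among 10 random points (the "given" d_ij*)
# versus a slightly perturbed configuration's distances d_ij
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 5))
D_star = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
D = D_star * (1 + 0.05 * rng.standard_normal(D_star.shape))
D = (D + D.T) / 2                            # keep it symmetric
np.fill_diagonal(D, 0.0)
print(sammon_E(D_star, D), weighted_stress(D_star, D))
```

Because the two quantities differ only by a square root and a nearly constant factor, configurations that make one small make the other small as well, which is why the M-D-SCAL option reproduces Sammon's mapping.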

REFERENCES

[1] H. Thompson, Jr., and M. Woodbury, "Clinical data representation in multidimensional space," Comput. Biomed. Res., vol. 3, pp. 58-73, 1970.

[2] J. Kruskal, "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis," Psychometrika, vol. 29, pp. 1-27, Mar. 1964.

[3] -, "Nonmetric multidimensional scaling: A numerical method," Psychometrika, vol. 29, pp. 115-129, June 1964.

Manuscript received March 22, 1971; revised April 2, 1971.
The author is with Bell Telephone Laboratories, Inc., Murray Hill, N. J. 07974.
¹ J. W. Sammon, Jr., IEEE Trans. Comput., vol. C-18, pp. 401-409, May 1969.
² The computer program together with a carefully written instruction manual may be obtained from the Coordinator of Computer Applications, Marketing Science Institute, 1033 Massachusetts Ave., Cambridge, Mass. 02138 (attention: Miss W. H. Tsiang) for approximately $10. An annotated selective bibliography on multidimensional scaling, with references to 3 books and 16 articles (excluding application articles), may be obtained from the author.

Comments on "A New Algorithm for Generating Prime Implicants"

S. R. DAS

One of the major areas in switching theory research has been concerned with obtaining suitable algorithms for the minimization of Boolean functions in connection with the general problem of their economic realization. A solution of the minimization problem, in general, involves consideration of two distinct phases. In the first phase all the prime implicants of the function are found, while in the second phase, from this set of all the prime implicants, a minimal subset (according to some criterion of minimality) of prime implicants is selected such that their disjunction is equivalent to the function and from which none of the prime implicants can be dropped without sacrificing equivalence. Many different algorithms exist for solving both the first and the second phase of this minimization problem. In a recent paper,¹ Slagle et al. describe a new algorithm for the generation of all the prime implicants of a Boolean function. As claimed by the authors, this algorithm is different from those previously given in the literature. The algorithm is efficient, does not generate the same prime implicant more than once (though the algorithm sometimes generates some nonprime implicants), and does not need a large capacity of memory for implementation on a digital computer. The algorithm works equally well with either the conjunctive or the disjunctive (both canonical and noncanonical) form of the function. In the conjunctive case, the algorithm is applied once to get all the prime implicants, while in the disjunctive case, the algorithm is first applied to get all the prime implicates, and is next applied to the set of all these prime implicates to get all the prime implicants of the function.
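For concreteness, a prime implicant can be computed directly from the definition: an implicant (a product term that forces the function to 1) from which no literal can be dropped without losing that property. The brute-force sketch below is purely illustrative and is not the authors' algorithm; cubes are tuples over {0, 1, None}, with None marking a dropped variable.

```python
from itertools import product

def prime_implicants(f, n):
    """All prime implicants of a Boolean function f of n variables.
    A cube is an implicant if f is 1 on every assignment it covers,
    and prime if no literal can be removed while keeping that property."""
    def covers(cube):
        axes = [(c,) if c is not None else (0, 1) for c in cube]
        return product(*axes)

    def is_implicant(cube):
        return all(f(*a) for a in covers(cube))

    cubes = [c for c in product((0, 1, None), repeat=n) if is_implicant(c)]
    primes = []
    for c in cubes:
        shrinks = [c[:i] + (None,) + c[i + 1:]
                   for i in range(n) if c[i] is not None]
        if not any(is_implicant(s) for s in shrinks):
            primes.append(c)
    return primes

# f = x1 x2 + x2' x3; its prime implicants are x1 x2, x2' x3,
# and the consensus term x1 x3
f = lambda x1, x2, x3: (x1 and x2) or ((not x2) and x3)
print(prime_implicants(f, 3))
```

The cost is exponential (3^n cubes), which is exactly why specialized algorithms such as the one under discussion matter.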
The function being specified algebraically in a sum-of-products or in a product-of-sums form, the basic approach of the authors consists in first finding the frequency ordering of the different literals appearing in the product or sum terms, respectively, and next carrying out a process of expansion of the function around the different variables in one or more levels through a series of trees, called semantic trees. A semantic tree is defined by the authors as a tree to each node of which is attached either a circle or a cross, called a terminating node, or a set of clauses, called a nonterminating node, and to each branch of which is attached a literal. From the final semantic tree, the prime implicates or prime implicants are finally found by collecting the sets of all the literals at the branches on the paths from the top down to all the circled nodes. Slagle et al. also showed how the same algorithm can be used to find the minimal forms of Boolean functions as well. On carefully going through the paper and studying the algorithm developed by the authors, I would like to make the following comments.

It may be emphasized first that the algorithm described by the authors of the aforementioned paper for the generation of prime implicants is very straightforward and simple, and represents a worthwhile contribution in the field, though the basic idea utilized in the development of the algorithm is not altogether new and has previously been used by other authors [1]-[3] in finding the prime implicants of Boolean functions in a more or less similar manner. Scheinman [1] developed a tabular technique for the generation of prime implicants of a Boolean function based on successive expansion around the variables starting from the minterm-type expression. In the method of Scheinman all the prime implicants of the function are sometimes not generated, though the prime implicants necessary for finding the minimal solutions are always obtained. Further, this method, like that of Slagle et al., is directly applicable for finding the minimal forms of the function. An algebraic approach based primarily on successive expansion to generate all the prime implicants of a Boolean function utilizing the maxterm-type expression was first proposed by Nelson [2]. This basic idea of Nelson was subsequently utilized by Das and Choudhury [3] in developing a tabular method for a more efficient generation of all the prime implicants of a Boolean function starting from the maxterm-type expression represented in decimal mode. The semantic tree approach of Slagle et al. is almost similar to the method of Das and Choudhury, except that, in the method of Das and Choudhury, the expansion, unlike that by Slagle et al., is carried out successively about all the variables starting from the highest weighted one in different levels. The authors also extended their tabular method for generating prime implicants of functions having many unspecified fundamental products, utilizing a very novel idea suggested in a paper by McCluskey [4]. In developing their algorithm the authors of the aforesaid paper failed to mention these existing and closely related techniques. The idea of the present communication is thus to draw the readers' attention to the existence of these related papers.

Manuscript received February 16, 1971; revised May 5, 1971. This work was supported in part by the National Research Council of Canada under Grants A-875 and A-1690.
The author is with the Department of Electrical Engineering, University of Ottawa, Ottawa, Ont., Canada.
¹ J. R. Slagle, C.-L. Chang, and R. C. T. Lee, IEEE Trans. Comput., vol. C-19, pp. 304-310, Apr. 1970.
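The successive-expansion principle that these methods share (Scheinman's, Nelson's, Das and Choudhury's, and the semantic trees alike) is the Boole/Shannon decomposition f = x·f|x=1 + x'·f|x=0, applied around one variable at a time. A minimal sketch, illustrative only and not a reconstruction of any of the published tabular methods:

```python
from itertools import product

def cofactors(f, i):
    """Return (f with x_i fixed to 1, f with x_i fixed to 0);
    by the Boole/Shannon identity, f = x_i * f1 + x_i' * f0."""
    def fix(v):
        return lambda *xs: f(*(xs[:i] + (v,) + xs[i:]))
    return fix(1), fix(0)

# f(x1, x2, x3) = x1 x2 + x2' x3, expanded around x2 (index 1):
# the cofactors reduce to f1(x1, x3) = x1 and f0(x1, x3) = x3
f = lambda x1, x2, x3: (x1 and x2) or ((not x2) and x3)
f1, f0 = cofactors(f, 1)

# verify the identity on every assignment
for x1, x2, x3 in product((0, 1), repeat=3):
    assert bool(f(x1, x2, x3)) == \
        bool((x2 and f1(x1, x3)) or ((not x2) and f0(x1, x3)))
```

Each expansion step yields two simpler subfunctions, which is the branching that the semantic trees record node by node.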

REFERENCES

[1] A. H. Scheinman, "A method for simplifying Boolean functions," Bell Syst. Tech. J., vol. 41, pp. 1337-1346, July 1962.

[2] R. J. Nelson, "Simplest normal truth functions," J. Symbolic Logic, vol. 20, pp. 105-108, June 1955.

[3] S. R. Das and A. K. Choudhury, "Maxterm type expressions of switching functions and their prime implicants," IEEE Trans. Electron. Comput., vol. EC-14, pp. 920-923, Dec. 1965.

[4] E. J. McCluskey, Jr., "Minimal sums for Boolean functions having many unspecified fundamental products," AIEE Trans. (Commun. Electron.), vol. 81, pp. 387-392, Nov. 1962.

Comments on "An Algorithm for Finding Intrinsic Dimensionality of Data"

G. V. TRUNK

In the above paper,¹ Fukunaga and Olsen present an alternative method of estimating the intrinsic dimensionality of data. Their proposed algorithm differs from others in that it relies heavily on operator interaction and provides a method of specifying variable local regions. The authors state: "This variability is critical as the practical problem of determining dimensionality depends on the size and number of samples in the local regions." This is illustrated in their summary Table II (B), in which, for local region sizes containing five and ten samples, the indicated dimensionalities are one and three, respectively, when using the 1 percent eigenvalue criterion; and one and two, respectively, when using the 10 percent criterion. While the authors may have a decision rule to select the correct answer from the summary table, I did not see it in their paper; and without such a rule, I do not believe the problem has been solved satisfactorily.
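The eigenvalue criterion at issue can be made concrete: take the k nearest neighbors of a point, form their covariance matrix, and count the eigenvalues exceeding some fraction of the largest. The sketch below is a hedged illustration; the thresholding convention is an assumption of mine, and Fukunaga and Olsen's exact normalization may differ.

```python
import numpy as np

def local_dimensionality(points, idx, k, threshold=0.01):
    """Estimate the dimensionality at points[idx]: take its k nearest
    neighbors and count covariance eigenvalues that exceed `threshold`
    times the largest one (0.01 for the 1 percent criterion,
    0.10 for the 10 percent criterion)."""
    d = np.linalg.norm(points - points[idx], axis=1)
    nbrs = points[np.argsort(d)[1:k + 1]]        # skip the point itself
    eig = np.sort(np.linalg.eigvalsh(np.cov(nbrs, rowvar=False)))[::-1]
    return int(np.sum(eig > threshold * eig[0]))

# toy data: a 2-D plane embedded in 5-D with slight noise
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2)) @ rng.standard_normal((2, 5))
X += 1e-6 * rng.standard_normal(X.shape)
print(local_dimensionality(X, 0, 20))
```

The sensitivity Trunk criticizes is visible here: the answer depends on both k (the size of the local region) and the threshold, and nothing in the procedure itself says which combination to trust.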

While the size of the local region is critical for Fukunaga and Olsen, it is not nearly as important for the statistical method [1]. In order to demonstrate this, consider the three Gaussian examples presented by Fukunaga and Olsen. One hundred cases of each example, an example consisting of 50 20-dimensional vectors, were analyzed by the statistical method; the results are shown in Table I: out of 300 cases, only two incorrect answers were obtained. For all these cases, the local region for each point was defined by its five nearest neighbors. The statistical method is also very fast: the running times for examples 1, 2, and 3 were 2.7, 2.9, and 3.1 s, respectively, on a CDC 3800 computer.²

Manuscript received May 4, 1971.
The author is with the Radar Division, Naval Research Laboratory, Washington, D. C. 20390.
¹ K. Fukunaga and D. R. Olsen, IEEE Trans. Comput., vol. C-20, pp. 176-183, Feb. 1971.
² While the running times used by Fukunaga and Olsen were 76, 120, and 140 s, no valid comparison can yet be made since their computer was not identified.

TABLE I
ESTIMATED DIMENSIONALITIES OF THE 100 CASES
CONSIDERED FOR EACH GAUSSIAN EXAMPLE

Gaussian Examples in     Estimated Dimensionalities
Fukunaga and Olsen       1      2      3      4
        1              100      0      0      0
        2                0    100      0      0
        3                0      0     98      2

The authors state that previous investigators had not considered the noise problem and then attack the problem by using a large number of samples. They estimate very accurately the eigenvalues and note the small difference in the eigenvalues due to the parameters and those due to the noise. However, in 1968, a "filtering" method [2], which does not require a large number of samples, was proposed as a solution to the noise problem. This method defines a pseudo signal-to-noise ratio R:

$$R = \frac{D}{\sigma (K - N)^{1/2}} \qquad (1)$$

where K is the dimensionality of the vector space, N is the intrinsic dimensionality of the data, σ is the standard deviation of the noise on each basis vector, and D is the average distance from a point to its (N+1)-nearest neighbor. It was shown that when R > 12, the noise does not affect the estimation of dimensionality. When R < 12, the data can be filtered in the following manner. The original M points are randomly divided into L subgroups, each subgroup containing M/L points. This filtering has increased the signal-to-noise ratio in each subgroup since the average distance D has been increased. The algorithm for finding the optimal number of subgroups L and the manner of combining the results of the various subgroups is presented in [2]. Whether or not the filtering method can be used in conjunction with Fukunaga and Olsen's method is not known, since the latter requires a minimum number of points in the local region in order to estimate the covariance matrix.
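The filtering rule above can be sketched numerically: compute R from (1), and if R < 12, split the M points into L random subgroups, which stretches the average (N+1)-neighbor distance D and hence raises R in each subgroup. The function names and toy data below are mine, and the optimal choice of L from [2] is not implemented; L is simply a parameter here.

```python
import numpy as np

def pseudo_snr(X, n_intrinsic, sigma):
    """R = D / (sigma * sqrt(K - N)) of eq. (1): D is the average distance
    from each point to its (N+1)-st nearest neighbor, K the dimensionality
    of the vector space, N the intrinsic dimensionality."""
    K = X.shape[1]
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    dist.sort(axis=1)                    # column 0 is the point itself
    D = dist[:, n_intrinsic + 1].mean()  # (N+1)-st nearest neighbor
    return D / (sigma * np.sqrt(K - n_intrinsic))

def filtered_snr(X, n_intrinsic, sigma, L, rng):
    """Split the M points into L random subgroups of about M/L points
    each; subsampling stretches D, raising R in every subgroup."""
    groups = np.array_split(rng.permutation(len(X)), L)
    return np.mean([pseudo_snr(X[g], n_intrinsic, sigma) for g in groups])

# toy data: a noisy curve (N = 1) in K = 3 dimensions, M = 200 points
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 1, 200))
sigma = 1e-3
X = np.c_[np.cos(t), np.sin(t), t] + sigma * rng.standard_normal((200, 3))
print(pseudo_snr(X, 1, sigma), filtered_snr(X, 1, sigma, 4, rng))
```

For one-dimensional data, quartering the number of points roughly quadruples the nearest-neighbor spacing, so the filtered R is substantially larger than the raw one.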

REFERENCES

[1] G. V. Trunk, "Statistical estimation of the intrinsic dimensionality of data collections," Inform. Contr., vol. 12, pp. 508-525, May-June 1968.

[2] -, "Representation and analysis of signals: Statistical estimation of intrinsic dimensionality and parameter identification," Gen. Syst., vol. 13, pp. 49-76, 1968.

Authors' Reply³

K. FUKUNAGA AND D. R. OLSEN

In the statistical method proposed by Trunk [2], [3], he calculates the density function $p(X \mid N)$, where $X = [x_1 \cdots x_K]^T$ is the observed random vector and $N$ $(= 1, 2, \cdots)$ is the intrinsic number of parameters for the random vector. The most likely N for the observed vectors is determined by applying the multihypotheses test to these density functions. The number of random variables is reduced by using sufficient statistics such as the ratios of local distances and certain angles between local vectors, rather than the original $x_1, x_2, \cdots$.

As far as the estimation of an intrinsic dimensionality is concerned, the statistical method is in its essential nature a more accurate but more complex estimate than the local eigenvalue method of [1], where only second moments are considered. Although the statistical method requires the selection of some control parameters, these are readily set. Several assumptions were required in the derivation of the density

³ Manuscript received May 25, 1971.
The authors are with the School of Electrical Engineering, Purdue University, Lafayette, Ind. 47907.
