Comput. Lang. Vol. 14, No. 1, pp. 11-23, 1989 0096-0551/89 $3.00 + 0.00 Printed in Great Britain. All rights reserved Copyright 1989 Pergamon Press plc
CREATING EFFICIENT PROGRAMS BY EXCHANGING
DATA FOR PROCEDURES†
JOHN FRANCO and DANIEL P. FRIEDMAN Department of Computer Science, Indiana University, Bloomington, IN 47405, U.S.A.
(Received 6 July 1988; revision received 3 October 1988)
Abstract--We present a programming style which fosters the rapid development of programs with low asymptotic complexity. The crucial idea behind the new programming style is that mutually recursive procedures, each assigned the task of returning the solution of a subproblem of the given problem, are constructed directly from the problem instance. That is, the input data is compiled into procedures.
Scheme Lisp Programming languages Complexity Efficiency
Traditionally, software developers have seen data as an entity to be entirely processed by a program. However, data can be viewed as instructions for building at least some of the program text. For example, the string matching algorithm in  syntactically transforms input strings to control structures that become part of the computational process. As is the case in string matching, programs where data is compiled can be both efficient and concise.
In this paper we present the beginnings of a programming methodology of wide applicability in which some or all input data is syntactically transformed to compiled code. The transformations may occur as late as run-time. They are particularly suited to problems that can be solved by subproblem decomposition. Each subproblem is regarded as a three-state object which communicates with other objects. Generally, but not exclusively, the communications are requests for the solution or partial solution of more primitive subproblems. The states are: (1) waiting for the first request; (2) processing that request; and (3) having found the solution to the subproblem. The objects are modeled as procedures and each state change results in a change in procedure definition. The programmer specifies only a template for procedure definitions and communications links. That template is used to transform at least some of the data to an interacting network of procedures, possibly changing with time due to the generation of requests for subproblems that are implied by the data.
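The three-state model can be sketched in a few lines of Scheme. This is only an illustration under stated assumptions, not the templates developed later in the paper: the name make-subproblem is hypothetical, and the choice to signal an error when a request arrives during state 2 is an assumption (what a re-entrant request should do is problem-specific). Each state change is implemented by assigning a new procedure definition to the object.

```scheme
;; make-subproblem (hypothetical name) wraps a thunk that solves one
;; subproblem in an object that replaces its own definition as its
;; state changes.
(define (make-subproblem solve)
  (define self
    (lambda ()                           ; state 1: waiting for the first request
      (set! self                         ; state 2: processing that request
            (lambda ()                   ; (assumption: re-entry is an error)
              (error "make-subproblem" "request arrived while processing")))
      (let ((answer (solve)))
        (set! self (lambda () answer))   ; state 3: solution found
        answer)))
  (lambda () (self)))

;; Two communicating objects: b requests the solution of a twice,
;; but a is solved only once.
(define solve-count 0)
(define a
  (make-subproblem
   (lambda () (set! solve-count (+ solve-count 1)) 10)))
(define b
  (make-subproblem
   (lambda () (+ (a) (a) 1))))
```

After (b) has been called, solve-count is 1 rather than 2, which is the sense in which the three-state model prevents a subproblem from being solved twice.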
Programs developed under the proposed methodology can be both concise and efficient in time and space. Conciseness is due to the reduction of the programming effort to templates. Efficiency of time is due to the three-state model, which can prevent the same subproblem from being solved twice, and to the ability of the programs to do partial computation on compiled data arguments in the presence of non-compiled data arguments. Efficiency of space is due to the property that no space need be used on procedures representing subproblems for which there are no requests.
Unfortunately, there is no existing programming language that can accommodate the methodology as we envision it. However, the macro-expansion facility of Scheme known as extend-syntax (see [2-5]) allows partial implementation. This facility is inefficient for our purposes since it is not designed for our application. Hence, for the rest of this paper, we make use of extend-syntax only for illustrative purposes and ignore the computation time of the extend-syntax facility. Using extend-syntax we have developed concise and efficient programs for numerous graph problems (e.g. the Shortest Path problem), Object-Simulations, the Topological Sort problem, and others. Examples of such programs are presented later.
†This report is based on work supported by Air Force Office of Scientific Research under Grant No. AFOSR-84-0372 and by NSF Grants DCR 85-01277 and DCR 85-03279.
Traditionally, time required for compilation has been regarded as unimportant since only one compile is necessary using conventional programming techniques. This time is significant using our approach, however, if re-compilation is necessary for each change in data (as is the case for many graph theory problems). It may not be significant if there is both a compiled data argument and a non-compiled data argument. For example, consider the problem of simulating a finite state acceptor. Such simulations require a description of the machine and one or more inputs to the machine as data. Only the description data is compiled. Hence, re-compiles are only necessary when machine descriptions change, and compilation time can be amortized over all simulations of the same machine. Furthermore, because most of the computational effort is accounted for during compilation, the run-time of each simulation is extremely fast. This property, which holds for many problems, is similar to the property that motivates research on partial computation (see e.g. [6-10]).
At present, the methodology is ad hoc and underdeveloped. We have found little resistance to applicability in some problem domains and considerable resistance in others. The potential for its success is good, however, for the following reasons:
(1) The treatment of elements of computation as objects embedded in a communications network is well understood.
(2) It can be extended to admit concise, efficient code for a wide variety of problems.
(3) Partial computation is, to some extent, built into the methodology.
(4) The three-state mechanism fosters efficient solutions to some problems.
(5) The code between the assignment statements that change procedure definitions can be functional.
(6) The "piecewise" functional nature of procedures may lead to improved verification.
In summary, it may be possible for concise and efficient programs to be constructed by means of the techniques that are illustrated in this paper. This goal is similar to that of researchers studying partial computation and functional transformations. Before going further we touch on the work of these groups.
Functional programming has been developed with the aim of writing easily verifiable programs (we call this comprehensibility). Comprehensibility is achieved, for the most part, by attenuating considerations of state and control. The problem with functional techniques is that programs so written, in conventional style and for conventional hardware, usually are neither as efficient as possible nor understood in terms of their time and/or space complexity.
One possible remedy is the concept of transforming a functional program to an equivalent, possibly non-functional program of improved complexity [11-15]. The transformation itself can be interactive. The idea is to have a programmer produce a comprehensible program and let the transformation take care of the complexity issues. The transformation need not be efficient since it is performed only once, just prior to compilation.
Although remarkable successes have been achieved, there are several stumbling blocks to the transformation approach as now practiced. First, one or more "eureka" steps are required in order to achieve success. Second, although the problem of producing syntactic transformation rules such as "append → rplacd" seems tractable, the problem of producing transformation rules which are semantically generated is a hard one to solve correctly since an enormous amount of rule interaction must be taken into account. For example, consider the problem of producing an efficient Union-Find algorithm. It is easy to find an optimal algorithm for doing Union and an optimal algorithm for doing Find, but neither of the two is optimal for solving the Union-Find problem. Third, the transformation approach cannot be complete since the problem of determining complexity is undecidable.
Partial computation of a computer program is by definition  "specializing a general program based upon its operating environment into a more efficient program." The objective is to reduce run-time complexity by re-using the results of previous partial computations. The idea is to regard programs as data and perform an analysis which allows computation to proceed as much as possible with no or partial actual data. Analyzed programs may be manipulated in order to continue the partial computation as much as possible.
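As a small illustration of the idea of specializing a general program to part of its input, consider exponentiation. The sketch below is an assumption-laden stand-in: the names power and specialize-power are hypothetical, and closures are used in place of the residual program text that a real partial evaluator would emit. All work that depends only on the exponent n is done once, and the result is a procedure that awaits the base x.

```scheme
;; General program: raise x to the non-negative integer power n.
(define (power x n)
  (if (= n 0)
      1
      (* x (power x (- n 1)))))

;; specialize-power (hypothetical name): given n alone, perform the
;; recursion on n now and return a residual procedure of x.
(define (specialize-power n)
  (if (= n 0)
      (lambda (x) 1)
      (let ((rest (specialize-power (- n 1))))
        (lambda (x) (* x (rest x))))))

;; The tests on n happen once, here, not on every call to cube.
(define cube (specialize-power 3))
```

The call (cube 5) returns the same value as (power 5 3), but no comparison of n against 0 is made at call time.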
Our approach is different from both the partial computation and functional transformation approaches. Partial computation regards programs as data to be manipulated. Data compilation, however, regards data as programs. Doing so results in some form of partial computation in many cases. Functional transformations convert functional programs to efficient, possibly non-functional programs. The transformations require one or more "eureka" steps. Data compilation also requires a "eureka" step to develop a program that replicates in precisely the right way. It relies on the creative effort of the programmer to design a solution of low complexity. We believe our tools, under data compilation, will make this creative effort relatively easy because the metaphors behind their use will change little from one problem to the next.
The remainder of this paper contains three sections. In the next section we show how to construct code from input data at compile-time using the macro-expansion facility known as extend-syntax. We use the example of constructing a finite state acceptor to illustrate the ability of our techniques to yield comprehensible programs. This example also has both a data component that does get compiled and one that does not. We use the example of Topological Sort to illustrate the three-state paradigm. We use the example of finding cutpoints in a graph to illustrate a complicated use of our ideas. In Section 3 we consider the solution to sample dynamic networks which arise in Dynamic Programming. In Section 4 we present conclusions.
2. CONSTRUCTING CODE FROM INPUT DATA AT COMPILE-TIME
Conventional compilers lack the sophistication needed to optimize complexity. In Lisp-systems, on the other hand, macro-sublanguages provide a facility for extending the compiled language. With a macro-facility we can design algorithms that translate input data directly into efficient code. Such a solution is particularly preferable when macro definitions are easy to design as with extend-syntax [3-5].
2.1 Syntactic extension
We briefly sketch the use of this macro expansion facility in constructing what we call macro-procedures. For simplicity, the examples considered in this subsection do not make use of the three-state paradigm.
Consider the problem of testing set membership (without recursion) in a small finite set of symbols. The following is a macro-procedure for solving this problem. This procedure, which uses a curried eq? called c-eq? defined by (lambda (x) (lambda (y) (eq? x y))), is intended only for illustrative purposes:
(extend-syntax (member-in-set?)
  [(member-in-set? n ...)
   (let ([n (c-eq? 'n)] ...)
     (lambda (y) (or (n y) ...)))])
The body of this program, given the input (member-in-set? a b c), expands to
(let ([a (c-eq? 'a)] [b (c-eq? 'b)] [c (c-eq? 'c)])
  (lambda (y) (or (a y) (b y) (c y)))).
Consider the characteristic function cf = (member-in-set? a b c). The call (cf 'a) returns true. This happens as follows: y gets the value 'a and or is invoked. At some point procedure a is invoked with argument 'a. But this procedure is simply a test with symbol 'a and the argument y which has value 'a. Hence the test returns true as does the or. The call (cf 'd) returns false because no procedure matches with 'd.
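For readers whose Scheme system does not provide extend-syntax, the same macro-procedure can be sketched with the standard syntax-rules facility. This is a translation made for illustration, not the authors' code; it assumes only standard Scheme, and the definition of c-eq? is the one given in the text.

```scheme
;; Curried eq?, as defined in the text.
(define c-eq? (lambda (x) (lambda (y) (eq? x y))))

;; Each symbol n of the set becomes an identifier bound to a
;; procedure that tests its argument against that symbol.
(define-syntax member-in-set?
  (syntax-rules ()
    ((_ n ...)
     (let ((n (c-eq? 'n)) ...)
       (lambda (y) (or (n y) ...))))))

;; The characteristic function of the set {a, b, c}.
(define cf (member-in-set? a b c))
```

With these definitions, (cf 'a) returns true and (cf 'd) returns false, matching the trace given above.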
As a demonstration of the level of comprehensibility that is possible when compiling data, consider the problem of writing a finite state acceptor. Suppose the definition of a specific finite state acceptor is an initial state followed by a list of triples (S, A, P) where S is a state identifier; A is true if the state S is an accepting state and false otherwise; and P is a list of pairs (d, S') which defines the transitions to states S' given input symbol d. A macro-procedure for this
(extend-syntax (fsa)
  [(fsa init-state ([Sa final [a Sb] ...] ...))
   (letrec ([Sa (lambda (i)
                  (case (car i)
                    [a (Sb (cdr i))] ...
                    [$ final]
                    [else #f]))] ...)
     init-state)])
Fig. 1(a). A finite state acceptor.
(define test
  (lambda ()
    (let ([machine (fsa q0
                     ([q0 #t [a q0] [b q1]]
                      [q1 #t [b q1] [a q2]]
                      [q2 #f [a q2] [b q2]]))])
      (machine '(a a a b b c $)))))
Fig. 1(b). An example using the program of Fig. 1(a).
([q0 (lambda (i) (case (car i) [a (q0 (cdr i))] [b (q1 (cdr i))] [$ #t] [else #f]))]
 [q1 (lambda (i) (case (car i) [b (q1 (cdr i))] [a (q2 (cdr i))] [$ #t] [else #f]))]
 [q2 (lambda (i) (case (car i) [a (q2 (cdr i))] [b (q2 (cdr i))] [$ #f] [else #f]))])
Fig. 1(c). The expansion of machine specified in Fig. 1(b).
problem is given in Fig. 1(a). This procedure constructs a finite state machine simulator, using extend-syntax, from a machine specification in the form given above. It is essentially a one line program which expands to an n line program where n is the number of states. Each line is a case statement that leads to three outcomes: a call to the procedure corresponding to the next state depending on the next input symbol; termination of computation when the special end-of-string symbol $ is encountered; termination when an illegal symbol is encountered. Once constructed, the simulator takes a string representing the machine's input as argument. Its output is true when the simulated machine accepts its input.
The program of Fig. 1(a) simulates a finite state machine optimally modulo the compilation time. It is purely functional. If called as in the example of Fig. 1(b), as many simulations as desired may be run on one compilation. Hence, in this case compilation time is not an important time complexity factor. The reader is encouraged to rewrite fsa to take arguments of quoted data, using a purely functional paradigm.
The code for fsa may be used to simulate any finite state acceptor. The expansion for the machine simulated by the example of Fig. 1(b) is given in Fig. 1(c). The example illustrates that, with a macro-procedure, the input symbols become program identifiers which are bound to procedures as in Katayama. These procedures may be regarded as objects which communicate with each other. We used extend-syntax to construct procedure pointers and "wire" the communication links at compile-time.
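The fsa macro-procedure can also be sketched with the standard syntax-rules facility, for systems without extend-syntax. This is a translation for illustration rather than the authors' code. Two assumptions differ from Fig. 1(a): parentheses replace square brackets, and each case clause wraps its symbol in a list, as standard Scheme requires. The input is assumed to end with the symbol $.

```scheme
;; Each state becomes a procedure; each transition becomes a case
;; clause that calls the procedure for the next state.
(define-syntax fsa
  (syntax-rules ()
    ((_ init-state ((state final (sym next) ...) ...))
     (letrec ((state
               (lambda (i)
                 (case (car i)
                   ((sym) (next (cdr i))) ...
                   (($) final)          ; end of input: accept or reject
                   (else #f))))         ; symbol outside the alphabet
              ...)
       init-state))))

;; The machine of Fig. 1(b), compiled once.
(define machine
  (fsa q0
       ((q0 #t (a q0) (b q1))
        (q1 #t (b q1) (a q2))
        (q2 #f (a q2) (b q2)))))
```

The single compiled machine can then be run on any number of inputs. For example, (machine '(a a a b b $)) returns true, (machine '(a b a $)) returns false because the run ends in the non-accepting state q2, and the input of Fig. 1(b), '(a a a b b c $), returns false because c has no transition.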
Our next macro-procedure example, a solution to Topological Sort, is one of the most elementary macro-procedures that uses the three-state paradigm.
2.2 Macro-procedure solution to Topological Sort
Topological Sort is the problem of finding a total order of elements that are partially ordered. This problem is fundamental and is described at length in [19...