Shortest path forest with topological ordering: An algorithm description in SDL

Transpn Res.-B Vol. 14B, pp. 343-347. © Pergamon Press Ltd., 1980. Printed in Great Britain


ROBERT DIAL

Urban Mass Transportation Administration, Washington, D.C., U.S.A.

FRED GLOVER

University of Colorado, Boulder, CO., U.S.A.

DAVID KARNEY and DARWIN KLINGMAN

University of Texas, Austin, Texas, U.S.A.

(Received 15 November 1977; in revised form 2 August 1979)

Abstract: This short note presents a formal description of a fast and robust shortest path algorithm. Modeled on an algorithm of Pape (1974), it requires less memory than most algorithms and at the same time permits arc lengths to range between -∞ and +∞. It is described in a machine-processable language called SDL. The note opens with a brief introduction to SDL syntax.

1. INTRODUCTION

Presented below is a formal description of a shortest path algorithm that is invaluable in transportation network modeling. Its content represents the state of the art: it is one of the fastest known algorithms for finding shortest paths in the large, sparse networks that characterize urban transportation systems (Dial et al., 1977). Its form needs an apology.

It is the opinion of the first author that two problems seriously impede progress in our business. One is the lack of a common language for rigorously describing the way an algorithm works. The other is bad software. The language used below represents an attempt at UMTA to cope with both problems.

SDL† is a programming language. Its dual purpose is to provide a context in which software designs can be scrutinized by software engineers, much as an engineer scrutinizes a bridge design, and to provide a syntax that permits the fast development of error-free computer code. At UMTA, we have been quite successful in using SDL as a medium for both human-to-human and human-to-machine communication. It has obviated the need for incomprehensibly meandering flowcharts and rambling prose, which are incapable of depicting anything but quite trivial algorithms and programs. It has also eliminated lengthy, trite computer code that only the computer (and occasionally the programmer) can comprehend.

2. SDL SYNTAX

To professionals and students of models and algorithms, SDL's syntax is nearly self-explanatory. There are, however, certain features employed below that need explanation. SDL is a "top-down" language. A procedure lists its input and output data first, followed by a summary comment, then definitions of the data referenced only by the procedure (or by procedures defined within it), and ends with a list of "executable statements". The scope of a conditional or iterative statement is indicated by indentation: all subsequent, more deeply indented statements, down to the next statement at the same indentation, fall within its scope. Statements may refer to other (sub)procedures, whose definitions appear later in the same format. The definition of a procedure begins at its "procedure statement heading" and ends at its "end statement". Any other procedure defined in between has access to the embracing procedure's data definitions. A comment can end a line, provided it is introduced with two periods (..).

†Draft materials describing SDL may be obtained from the first author.

As mentioned above, there are two sets of data definitions. The first set is the input and output arguments ("call-by-value subroutine parameters"), which are listed vertically with one line per data element (like COBOL, rather than horizontally like Fortran). Input arguments are "read only", while output arguments may have their values altered by the procedure. The second set is the local data, which are listed in a similar format. This vertical format allows all the attributes of each data element to appear on the same line as its first reference. These attributes include subscript range, value range, and initial value. The syntax of a data definition is as follows (note that data names can include blanks):

data definition ::= <name> <subscripts> <range> <initial value>
name ::= (variable name, which may include blanks)
subscripts ::= (<index range list>), | ,
index range list ::= <range> | <index range list>, <range>
range ::= <lower bound> to <upper bound>
initial value ::= ,/<logical-arithmetic expression>/ | <null>

Example 1

i, 1 to 100, /3/

defines the variable i, which can take integer values between 1 and 100 and is set to 3 when the procedure is entered. [Note that a variable's "type" (integer) is implied by the type of its range's bounds.]

Example 2

m, 1 to n

defines the integer variable m, which can take values between 1 and n (n being an integer defined elsewhere). The variable is not "initialized."

Example 3

dist(1 to n), -fmax to maxdist, /for i := 1 to n: maxdist/

defines a one-dimensional floating point array whose subscript ranges from 1 to n (an integer defined elsewhere). The value of any element of dist, e.g. dist(7), can vary from -fmax to maxdist. The variable fmax is a "machine constant", which is always the largest ("single precision") value the particular computer can represent in its floating point format. The upper bound, maxdist, is a floating point variable defined elsewhere. Every element of dist is set to maxdist upon entry into the procedure. The above examples, along with a rudimentary knowledge of any programming language, should help the reader to follow the algorithm described below.
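To make the data definition productions concrete, the following is a small Python recognizer for them. It is a sketch, not part of SDL: it assumes the subscripts production reads as either a parenthesized index range list followed by a comma or a bare comma, it does not handle nested parentheses inside subscripts, and the names `parse_data_definition` and `DataDefinition` are hypothetical. Bounds and initial values are kept as unevaluated strings.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical pattern for: <name> <subscripts> <range> <initial value>
_DATA_DEF = re.compile(
    r"^\s*(?P<name>[^(,]+?)\s*"            # name, which may contain blanks
    r"(?:\((?P<subs>[^)]*)\))?\s*,\s*"     # optional (index range list), then ","
    r"(?P<lo>.+?)\s+to\s+(?P<hi>.+?)\s*"   # range: <lower bound> to <upper bound>
    r"(?:,\s*/(?P<init>.*)/)?\s*$"         # optional ,/initial value/
)


@dataclass
class DataDefinition:
    name: str
    subscripts: List[Tuple[str, str]]      # one (lower, upper) pair per dimension
    lower: str
    upper: str
    initial: Optional[str] = None          # None means "not initialized"


def _split_range(text: str) -> Tuple[str, str]:
    lo, hi = re.split(r"\s+to\s+", text.strip(), maxsplit=1)
    return lo.strip(), hi.strip()


def parse_data_definition(line: str) -> DataDefinition:
    m = _DATA_DEF.match(line)
    if m is None:
        raise ValueError(f"not a data definition: {line!r}")
    subs = m.group("subs")
    subscripts = [_split_range(part) for part in subs.split(",")] if subs else []
    return DataDefinition(m.group("name").strip(), subscripts,
                          m.group("lo").strip(), m.group("hi").strip(),
                          m.group("init"))
```

For instance, `parse_data_definition("dist(1 to n), -fmax to maxdist, /for i := 1 to n: maxdist/")` yields one subscript pair `("1", "n")`, the value range `-fmax` to `maxdist`, and the initial value text `for i := 1 to n: maxdist`.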

3. FORMAL ALGORITHM DESCRIPTION

Procedure Pape .. A shortest path forest builder.
Input .. Data input to Pape by calling routine:
    maxdist, -fmax to fmax .. Maximum path length allowed.
    n, 1 to imax .. Highest node number in network.
    arcs, 1 to imax .. Number of arcs in network.
    index(1 to n + 1), 1 to arcs + 1 .. Pointer to exit arcs.
    j(1 to arcs), 1 to n .. j-node of exit arc.
    d(1 to arcs), -fmax to fmax .. Length of exit arc.
    m, 1 to n .. Number of root nodes.
    root node(1 to m), 1 to n .. Root nodes themselves.
Output .. Data output by Pape to calling routine:
    tree(0 to n), -1 to n, /for i := 0 to n: -1/ .. Shortest path predecessor node
    dist(1 to n), -fmax to maxdist, /for i := 1 to n: maxdist/ .. and length.
    next(0 to n), 0 to n, /for i := 0 to n: 0/ .. Forward thread.
    last(0 to n), 0 to n, /for i := 0 to n: 0/ .. Reverse thread.

Summary

Introduction. Given a subset (called "roots") of the nodes (numbered 1 to n) spanned by a directed graph comprising arcs of known length, Pape connects each node in the network to its closest root node via the shortest path. The result is a disjoint set of shortest path trees, one for each root node, which are collectively called a "shortest path forest". Note that if there is only one root node, the result is a shortest path tree. (Indeed, building shortest path trees is Pape's major use.) Pape's output includes a description of each path in the forest along with its length. It can also optionally provide two lists that sequence the nodes in forward and reverse topological order. As used here, a "forward topological ordering" of the nodes is one in which a node appears later than every node on the shortest path between it and its root node. Conversely, in a "reverse topological ordering", a node appears earlier than every node on the shortest path between it and its root node.
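The ordering property can be checked mechanically. The sketch below (the function name is hypothetical) relies on one observation: if every non-root node appears after its predecessor tree(v), then by induction it appears after every node on the path back to its root. A reverse topological ordering can be checked by reversing it first.

```python
def is_forward_topological(order, tree):
    """Return True if each node in `order` appears after its predecessor.

    tree[v] is v's predecessor in the forest; roots have tree[v] == 0 and
    impose no constraint.  Following a node's predecessors leads to its
    root, so this local test implies the path-wide property in the text.
    """
    pos = {v: k for k, v in enumerate(order)}
    for v in order:
        p = tree[v]
        if p > 0 and (p not in pos or pos[p] > pos[v]):
            return False
    return True
```

With tree = [-1, 0, 3, 1, 2] (root 1, path 1, 3, 2, 4), the sequence 1, 3, 2, 4 passes and 1, 2, 3, 4 fails, because node 2 precedes its predecessor 3.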

Using a "label correcting" technique, Pape builds a shortest path forest rooted at the given nodes root node(1), ..., root node(m). Its inputs and outputs are essentially identical to those of CACM Algorithm 360 (Dial, 1969), a very widely used shortest path algorithm. For large, sparse networks it is markedly superior to Algorithm 360: it (often) runs faster, (always) uses less storage, can use non-integer arc lengths, and has no limit on path length. In addition, the network's arc lengths may be negative, provided the network contains no negative cycles. The algorithm's inputs, mechanics, and efficiency relative to Algorithm 360 and others are discussed at length in Dial et al. (1977). The algorithm is due to Pape (1974).

Input. Inputs to Pape include a network description, root nodes, and an upper bound on path length. The network description resides in one array of length n + 1, index, and two arrays of length arcs, j and d. The array j contains the terminal node numbers of all arcs in the network, stored in ascending sequence of their initial node numbers. The second array, d, has elements that correspond one to one with those of j and contains the arc lengths from which path lengths are to be calculated and minimized. (Pape cannot deal directly with "turn penalties".) Index(i) points to the first element of j (and d) representing an arc exiting from node i. Index's subscript ranges from 1 to n + 1, where n is an input parameter giving the highest node number in the network. Index(n + 1) contains the value arcs + 1, where arcs is an input parameter giving the number of arcs in the network and the length of the arrays j and d. The number of root nodes and their node numbers are input in the scalar m and the vector root node. The scalar maxdist is an upper bound on path length: Pape considers only paths that are strictly shorter than maxdist.
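The following sketch shows one way a calling routine might pack an arc list into index, j and d. It assumes arcs arrive as hypothetical (initial node, terminal node, length) triples and keeps the 1-based convention of the text by leaving slot 0 of each Python list unused; `forward_star` is an illustrative name, not part of Pape.

```python
def forward_star(n, arc_list):
    """Pack (i, j, length) triples, nodes numbered 1..n, into 1-based
    arrays index (n + 2 slots), j and d (arcs + 1 slots); slot 0 is unused."""
    arcs = sorted(arc_list, key=lambda a: a[0])   # ascending initial node
    count = [0] * (n + 2)
    for tail, _, _ in arcs:
        count[tail] += 1                          # number of arcs exiting each node
    index = [0] * (n + 2)
    index[1] = 1
    for i in range(1, n + 1):
        index[i + 1] = index[i] + count[i]        # so index[n + 1] == arcs + 1
    j = [0] + [head for _, head, _ in arcs]
    d = [0] + [length for _, _, length in arcs]
    return index, j, d
```

The arcs exiting node i are then j[index[i]], ..., j[index[i + 1] - 1], with lengths at the same positions in d; a node with no exit arcs has index[i] == index[i + 1].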

Output. The algorithm's output describes all shortest paths in two arrays, tree and dist. Tree(j) is the predecessor of j in the shortest path forest, i.e. the initial node of the unique arc terminating at j. Thus, the shortest path to j from its nearest root node is given in reverse order by the nodes j, tree(j), tree(tree(j)), etc., until a root node is encountered. Dist(j) contains the length of the shortest path between j and its closest root node.
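Recovering an individual path from tree is a short loop. The sketch below (hypothetical function name) climbs predecessors until it meets a root, whose tree entry is 0, and then reverses the result so the path reads from the root to j.

```python
def shortest_path(j, tree):
    """Return the nodes on the shortest path from j's root to j,
    or None if j was not reached (tree[j] == -1)."""
    if tree[j] == -1:
        return None
    path = [j]
    while tree[path[-1]] != 0:            # roots have tree(root) = 0
        path.append(tree[path[-1]])       # j, tree(j), tree(tree(j)), ...
    path.reverse()                        # present the path root first
    return path
```

With tree = [-1, 0, 3, 1, 2], the path to node 4 is 1, 3, 2, 4.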

The forest's topological orderings are returned in the arrays next and last. Next(0) contains the first node in the forward sequence, next(next(0)) the second, and so on, until a node number of zero is encountered. The reverse sequence is stored similarly, beginning with last(0). In the event that a node j cannot be reached from a root node on a path shorter than maxdist, dist(j) = maxdist, tree(j) = -1, and last(j) = next(j) = j.
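Reading either sequence out of its array follows the rule just stated. In the sketch below (hypothetical function name) the same loop serves both arrays: pass next for the forward sequence or last for the reverse one. Unreached nodes point to themselves and are never visited from node 0.

```python
def read_thread(links):
    """Follow a thread stored as in next or last: start at links[0]
    and stop when node 0 comes round again."""
    order = []
    v = links[0]
    while v != 0:
        order.append(v)
        v = links[v]
    return order
```

For the forest with tree = [-1, 0, 3, 1, 2], the arrays next = [1, 3, 4, 2, 0] and last = [4, 0, 3, 1, 2] give the forward sequence 1, 3, 2, 4 and the reverse sequence 4, 2, 3, 1.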

Note that the difference between the outputs of CACM Algorithm 360 and those of the present algorithm is that the tree, next, and last arrays all have a zeroth element.


Last(0) begins the reverse thread, and next(0) begins the preorder. If node is a root node, then tree(node) = 0. If node is an unreached node, tree(node) = -1. Tree(0) is always -1; it is introduced for the convenience of this procedure. In addition, while both algorithms output forward and reverse topological sequences, they are not the same sequences of nodes: the CACM algorithm's topological sequence is actually the order in which the nodes were scanned.

Mechanics. Described formally below are a main procedure and three subprocedures: Initialize, Scan Node i, and Order Nodes Topologically. The procedure Initialize merely places the given root nodes into the list of nodes to be scanned. The main procedure then draws nodes to be scanned from a forward list of reached nodes. These nodes are linked together by the next array. When next(i) = 0, node i has not been reached and is therefore not in the list; otherwise it has been reached. If next(i) < 0, i has already been scanned and is not in the list. If next(i) > 0, i has been reached but needs to be scanned (or rescanned). In this latter case next(i) contains the number of the next node to be scanned, unless i is at the end of the list, in which case next(i) = imax.
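The sign convention on next can be decoded directly. The sketch below assumes a Python list `nxt` indexed like next and a sentinel `imax` larger than any node number. The sentinel value and the helper names are hypothetical, since the SDL simply uses imax.

```python
def node_state(nxt, i):
    """Classify node i from its next() entry, following the text's encoding."""
    if nxt[i] == 0:
        return "unreached"        # never reached, not in the list
    if nxt[i] < 0:
        return "scanned"          # reached and scanned, not in the list
    return "waiting"              # in the list; nxt[i] is its successor (or imax)


def waiting_nodes(nxt, head, imax):
    """List the nodes still to be scanned, walking the list from head."""
    order = []
    i = head
    while i != imax:              # imax marks the end of the list
        order.append(i)
        i = nxt[i]
    return order
```

For example, with imax = 5 and nxt = [0, -2, 4, 0, 5], node 1 has been scanned, node 3 has never been reached, and the list starting at node 2 holds 2 and then 4, its bottom.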

The procedure Scan Node i examines each arc (i, j) directly exiting from node i. Whenever a shorter path to a node j is found, one of three actions ensues. If j is reached for the first time (i.e., next(j) = 0), it is put at the bottom of the list. If j is already in the list of reached nodes to be scanned (i.e., next(j) > 0), it is left at its current position in the list. If j has already been scanned (i.e., next(j) < 0), it is put at the head of the list.
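The three cases can be restated with a double-ended queue in place of the next array. This is a swapped-in data structure, not the note's representation: the state codes and the function name are hypothetical, and the node currently being scanned is assumed to have been removed from the deque already, so `appendleft` corresponds to inserting j directly after i.

```python
from collections import deque

# Hypothetical state codes standing in for the sign of next(j).
UNREACHED, WAITING, SCANNED = 0, 1, 2


def place_improved_node(j, queue, state):
    """Apply Pape's placement rule after j's label has just been lowered."""
    if state[j] == UNREACHED:        # next(j) = 0: reached for the first time
        queue.append(j)              # ... goes to the bottom of the list
        state[j] = WAITING
    elif state[j] == SCANNED:        # next(j) < 0: already scanned
        queue.appendleft(j)          # ... goes to the head of the list
        state[j] = WAITING
    # state[j] == WAITING (next(j) > 0): leave j where it is
```

Starting from a deque holding only node 5, improving the labels of unreached node 3, previously scanned node 2, and waiting node 5 leaves the order 2, 5, 3.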

The procedure Order Nodes Topologically need be invoked only when such a sequence is required. Using the array tree, which gives each node's predecessor in the shortest path forest, this procedure creates forward and backward topological orderings of the nodes in the arrays next and last. These lists provide the "thread" and "reverse thread" functions of Glover et al. (1974), and they are forward and backward representations of Knuth's "preorder" sequence (Knuth, 1973).
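A line-for-line Python transcription of this subprocedure, as given in the formal description, may help in following the pointer surgery. Each node starts as a one-element circular list. Each reached node's circle (itself plus any descendants already attached) is then spliced in right after its predecessor, with last acting as the backward link. The function name is hypothetical, and tree is assumed to be a Python list indexed 0..n with tree[0] = -1.

```python
def order_nodes_topologically(tree):
    """Build the forward (next) and reverse (last) threads from tree[0..n].

    Roots have tree[v] == 0 and unreached nodes tree[v] == -1.  Returns
    (nxt, last); unreached nodes are left pointing to themselves.
    """
    n = len(tree) - 1
    nxt = list(range(n + 1))               # each node points to itself
    last = list(range(n + 1))
    for v in range(n + 1):                 # look at every node exactly once
        p = tree[v]
        if p != -1:                        # is v reached?
            last[nxt[p]] = last[v]         # chain the tree node back
            nxt[last[v]] = nxt[p]          # ... around this node
            nxt[p] = v                     # hook this node to
            last[v] = p                    # ... its tree node
    return nxt, last
```

For tree = [-1, 0, 3, 1, 2] this gives next = [1, 3, 4, 2, 0] and last = [4, 0, 3, 1, 2], i.e. the preorder 1, 3, 2, 4 and its reverse.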

Main procedure
Definitions .. Local data used by Pape:
    i, 1 to imax, /imax/ .. A node number (index).
    arc, 1 to arcs .. Index into arc array.
    node, 0 to n .. Node number (index).
    root, 1 to m .. Root node index.
    bottom, 0 to imax, /root node(1)/ .. Bottom of reached node list.
.. Main process starts here.
Initialize
while i ≠ imax .. Keep scanning until the list of
    Scan Node i .. nodes to be scanned is empty.
    next(i) := -next(i) .. Mark i scanned and get next
    i := abs(next(i)) .. node to (re)scan.
Order Nodes Topologically
.. All done (end of main procedure).

Procedure Initialize .. Subprocedure descriptions begin here.
    for each root .. Chain root nodes into list of
        next(root node(root)) := i .. nodes needing to be scanned.
        i := root node(root)
        tree(i) := dist(i) := 0 .. Set root nodes' labels.
End Initialize .. End of Initialize subprocedure.


Procedure Scan Node i .. Another subprocedure definition.
    for arc := index(i) to index(i + 1) - 1 .. Look at each arc exiting i.
        if dist(i) + d(arc) < dist(j(arc)) .. Shorter path to j(arc)?
            tree(j(arc)) := i .. Yes: Update tree and
            dist(j(arc)) := dist(i) + d(arc) .. update path length labels.
            if next(j(arc)) < 0 .. Node already scanned?
                next(j(arc)) := next(i) .. Yes: Put it at head of list of
                next(i) := j(arc) .. nodes to be scanned, and
                if next(j(arc)) = imax .. if bottom of list moves,
                    bottom := j(arc) .. update its pointer.
            else if next(j(arc)) = 0 .. Otherwise, if j not yet reached,
                next(bottom) := j(arc) .. put it at bottom of list of
                bottom := j(arc) .. reached nodes to be scanned.
                next(bottom) := imax .. Mark it as the bottom.
End Scan Node i .. End of the Scan Node i subprocedure.

Procedure Order Nodes Topologically .. A third subprocedure definition.
    for each node .. Initialize by having each
        last(node) := next(node) := node .. node point to itself.
    for each node .. Look at every node exactly once.
        if tree(node) ≠ -1 .. Is node reached?
            last(next(tree(node))) := last(node) .. Chain tree node back
            next(last(node)) := next(tree(node)) .. around this node.
            next(tree(node)) := node .. Hook this node to its
            last(node) := tree(node) .. tree node.
End Order Nodes Topologically .. End of subprocedure definition.
End Pape .. End of formal description.
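For readers who want to run the labeling part of the listing, here is a sketch that transcribes Initialize, the main loop and Scan Node i into Python, keeping the next-array list with its sign convention. Several assumptions apply. The machine constant imax is modeled as n + 1, which works because any sentinel larger than every node number will do. Arrays are 1-based with slot 0 unused, and `pape_labels` is a hypothetical name. The topological ordering step is left to a separate routine operating on the returned tree.

```python
def pape_labels(n, index, j, d, roots, maxdist):
    """Return (tree, dist) for the shortest path forest rooted at `roots`.

    index has n + 2 slots, j and d have arcs + 1 slots (slot 0 unused).
    Only paths strictly shorter than maxdist are considered.
    """
    imax = n + 1                       # assumed stand-in for the machine constant imax
    tree = [-1] * (n + 1)
    dist = [maxdist] * (n + 1)         # dist[0] is unused
    nxt = [0] * (n + 1)                # 0: unreached, > 0: waiting, < 0: scanned

    # Initialize: chain the roots into the list of nodes to be scanned.
    i = imax
    for r in roots:
        nxt[r] = i
        i = r
        tree[i] = 0
        dist[i] = 0
    bottom = roots[0]

    while i != imax:                   # until the list of nodes to scan is empty
        for arc in range(index[i], index[i + 1]):      # Scan Node i
            k = j[arc]
            if dist[i] + d[arc] < dist[k]:             # shorter path to k?
                tree[k] = i
                dist[k] = dist[i] + d[arc]
                if nxt[k] < 0:                         # scanned: put at head
                    nxt[k] = nxt[i]
                    nxt[i] = k
                    if nxt[k] == imax:                 # bottom of list moved
                        bottom = k
                elif nxt[k] == 0:                      # unreached: put at bottom
                    nxt[bottom] = k
                    bottom = k
                    nxt[bottom] = imax
        nxt[i] = -nxt[i]               # mark i scanned ...
        i = abs(nxt[i])                # ... and move to the next node to (re)scan
    return tree, dist
```

On a four-node network with arcs 1→2 (4), 1→3 (5), 2→4 (1) and 3→2 (-3), rooted at node 1, node 2 is first labeled 4 via arc 1→2, scanned, later improved to 2 via node 3, and then put back at the head of the list and rescanned. The final labels are dist = 0, 2, 5, 3 and tree = [-1, 0, 3, 1, 2].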

Acknowledgements: M. R. Wigan kindly provided the authors with many useful ideas regarding this paper in particular and algorithm publication in general. It is always a joy to hear Mark's mind at work. Robert Cody also provided us with thoughtful and valuable criticism. We are very grateful to both these gentlemen.

REFERENCES

Dial R. (1969) Algorithm 360: Shortest path forest with topological ordering. Communications of the ACM 12, 632-633.

Dial R., Glover F., Karney D. and Klingman D. (1977) A computational analysis of alternative algorithms and labeling techniques for finding shortest path trees. Research Report CCS 291, Center for Cybernetic Studies, University of Texas at Austin. (To appear in Networks.)

Glover F., Klingman D. and Stutz J. (1974) The augmented threaded index method for network optimization. INFOR 12, 293-298.

Knuth D. (1973) The Art of Computer Programming, Vol. 1: Fundamental Algorithms. Addison-Wesley, Reading, Mass.

Pape U. (1974) Implementation and efficiency of Moore algorithms for the shortest route problem. Mathematical Programming 7, 212-222.
