
Parallel running of a modular simulation scheme

Brian Day, Daniel Richardson, and Nigel Cole

Tower View T1.2, University of Bristol, Bristol, UK

The idea of modularity in simulation is discussed in relation to parallel processing. A mathematical model for parallel simulation is presented. In this model a possibly nonlinear set of consistency conditions is satisfied in between time steps. Two examples are given which are fundamental to HVAC simulation: (1) heat flow in networks of heat paths and thermal capacitors, and (2) inertialess flow of fluid in networks of pipes and other components. These examples present in abstract form many of the difficulties encountered in thermal simulation of buildings with heating and ventilating plants. The question of convergence of networks of parallel processes is discussed, convergence is proved for the two examples, and a general theorem which implies unconditional convergence is developed. This theorem applies in many cases in which the processes represent physical components, and each process resets some variables to attain local steady state with its neighbors, so the sum of the local distances from steady state is at least not increased by any process.

Keywords: HVAC, simulation, parallel processing, modularity, convergence

Introduction

The modular hypothesis is that the global behavior of a complex physical system, such as a building together with plant and control systems, can be understood and simulated by considering only local interactions of the parts of the system, which are called components. According to this point of view, it should only be necessary to understand and validate a model of a component in isolation, since it should be true that a network formed by connecting valid component models will validly represent the whole system. Of course, compatibility between component models may be a problem, but the hope is that the modular approach to simulation will make validation easier and will also provide a framework for collaboration and exchange of information between different groups working in the field of simulation.

Another quite important aspect of the modular idea is that a general-purpose modular system ought to consist of a main program, which is independent of application, together with a collection of processes or subroutines which are models of components. All the application knowledge should be contained in the component subroutines.

It is generally agreed that it is desirable to be able to establish some set of consistency conditions over the whole model in between time steps. Examples of consistency conditions are given in the fourth section. These conditions may express conservation of certain physical quantities, such as mass or energy, within each component. Or the conditions might define an average value of some derivatives over the next time step, as in an implicit finite difference technique. The consistency conditions are not linear in many cases. They may not even be algebraic. We believe that the main computational problem of simulation is how to solve these consistency conditions.

Address reprint requests to Dr. Richardson at the Department of Mathematics, University of Bath, Claverton Down, Bath BA2 7AY, UK.

Received February 1988; revised August 1988.

© 1989 Butterworth Publishers

The modularity idea implies that a general-purpose main program must solve the consistency conditions without being guided by any physical intuition about the application. There seem to be three possible approaches to a solution of this difficulty.

1. Physics based. Each component process tries to establish local consistency, and the main program simply iterates until an acceptable level of global consistency is reached. Component processes may pass values of derivatives of consistency functions to one another, as well as values of local variables. In general this method may depend upon quite complex message passing between component processes. The mathematics and all the physics are kept together in the component process, and the main program is just managerial. An advantage of this approach is that because the physics and the mathematics are together, it may be possible to give algorithms which are unconditionally convergent.

2. Numerical based. In this case, the main program uses a standard general-purpose nonlinear equation solver, such as Powell's method. The component subroutines pass to the main program numerical information such as current estimated values of partial derivatives. The method is iterated until acceptable global consistency is obtained. This is called a numerical method because the central equation solver is never allowed to see the set of equations to be satisfied, but is only given current values of assorted functions and partial derivatives. This technique is used by HVACSIM+. It has the great advantage of using standard techniques. A great deal of work is being done to use parallel processing for standard numerical algorithms,1-5 and numerical simulation will automatically benefit from the results. The numerical method has convergence difficulties, however, in complex problems. This is because of the well-known intractable numerical instability of sets of nonlinear equations. It is not possible to develop a general-purpose, always convergent numerical equation solver for arbitrary sets of algebraic equations. Some of the reasons for the numerical instabilities can be seen in the second example of the fourth section.

Appl. Math. Modelling, 1989, Vol. 13, April 225

3. Computer algebra. In this approach the component subroutines would pass, in symbolic form, algebraic and possibly nonlinear consistency conditions to one another and to the main program. The main program would solve the whole set of consistency conditions symbolically and exactly. No one has yet tried this approach, but it is included here for the sake of completeness. Mechanical solution of sets of algebraic equalities and inequalities is not very well developed. This approach does not have the convergence problems of the numerical approach, but it may require unrealistic amounts of computing time for some problems.6

The rest of this paper is about the Bristol system, which is an example of a physics-based system. The underlying mathematical model is essentially parallel.
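The physics-based main loop can be sketched as follows. The component interface, names, and toy two-component "negotiation" below are illustrative assumptions (the original system was written in FORTRAN); the point is only the division of labor: each process restores local consistency and reports a local error, and the managerial main program simply iterates.

```python
# Sketch of a physics-based main loop: each component restores local
# consistency and reports a local error; the manager just iterates.
# All names and the example components are illustrative assumptions.

def run_fast_phase(components, state, tol=1e-9, max_cycles=1000):
    """Iterate component actions until every local error is below tol."""
    for cycle in range(max_cycles):
        worst = 0.0
        for comp in components:
            err = comp(state)        # component resets its outputs, returns local error
            worst = max(worst, err)
        if worst < tol:
            return cycle + 1         # converged: number of cycles used
    raise RuntimeError("fast phase diverged")

# Toy example: two "components" negotiating shared values x = y, y = (x+1)/2.
def comp_a(state):
    new = state["y"]                 # a wants x equal to y
    err = abs(state["x"] - new)
    state["x"] = new
    return err

def comp_b(state):
    new = (state["x"] + 1.0) / 2.0   # b wants y = (x+1)/2
    err = abs(state["y"] - new)
    state["y"] = new
    return err

state = {"x": 0.0, "y": 0.0}
cycles = run_fast_phase([comp_a, comp_b], state)
# The unique fixed point is x = y = 1, which the loop approaches geometrically.
```

The main program never sees the equations themselves; all it observes is the stream of local error values, which is exactly the managerial role described above.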

The components of a physical system are usually assumed to be acting in parallel rather than sequentially. Therefore the physics-based version of the modular hypothesis immediately gives some ideas as to how to apply parallel processing in the field of simulation. It might be imagined, for example, that each component in a physical system could be assigned to a processor which would simulate it. At this stage, however, it is likely that the number of processors available is considerably less than the number of components. The situation is similar to a play with a large cast of characters and not many actors. All the actors are playing most of the time, and each one may take on, in turn, many roles.

The Bristol modular system is described in the second section. The main unusual feature of this system is the idea of fast and slow modes, and underlying this the concept of fast and slow variables in the physical system. The idea is that the slow variables move smoothly with a reasonably small upper bound on their rate of change. The fast variables may, on the other hand, move very rapidly from one state to another, but usually they are very nearly in steady state with respect to each other and the current values of the slow variables. When the fast variables are near to this steady state, we will say that they are consistent with each other and with the current state of the slow variables.

In terms of components, fast variables in the output at time T are determined not just by the internal state but also by the input at time T. That is, consistency is required at each time step at all components. Thus, the passage from T to T + DT occurs in two phases, which are called fast and slow. In fast phase the components negotiate until all are satisfied as to local consistency. In slow phase time advances by DT and all components change internal state.

The original implementation of the Bristol modular system was in FORTRAN. Components were represented by subroutines with a fixed structure.

In the third section new implementations in Occam running on transputer networks are briefly described. In these implementations any group of components may be run in parallel in slow mode. In fast mode any group of disconnected components may be run in parallel. This is related to the asynchronous multicoloring idea reviewed in Voigt and Ortega.5

The modular approach is quite attractive, but there are also considerable difficulties in using it in some cases. Two illustrative examples fundamental to thermal simulation of buildings are given in the fourth section. Both examples concern potential and flow. In the first example, intended to represent heat flow, the relationship between flow and potential is linear. The second example is intended to represent inertialess mass flow in pipe networks, and the relationship between flow and potential is quadratic. In a network of heat paths with no thermal capacity, all the variables are fast. Similarly, in a pipe network, if a mass flow is altered at one point in the network, all the mass flows in the network may change virtually instantaneously. Schemes of parallel local interactions are given which reliably establish appropriate consistency conditions for these examples.

Some related problems of convergence of networks of parallel processes are discussed in the next section. General criteria are found which imply convergence of parallel processes on networks. The basic idea is that if no process increases the total error, and if processes near the boundary decrease the total error, then the whole network of processes must eventually converge to a situation of zero error.

In the final section an attempt is made to state the advantages and disadvantages of the physics-based version of modularity.

Description of the Bristol modular system

We distinguish systems on three levels: (1) physical system, (2) mathematical model, and (3) computer implementation. The mathematical model is intended to be a relatively simple abstract bridge between the complex physical reality and the intricacies of computing. Some of the features of the three levels of system are shown in Table 1.

The building blocks are components, processes, and subroutines for the three levels, respectively. The physical system runs in continuous time, whereas the mathematical model and the computer implementation run in discrete time with a time step Dt. The code representing a component is called a subroutine here because the original implementation was written in FORTRAN. In the physical system and the mathematical model, the components and their representations as processes run in parallel. For example, a heat exchanger and the weather operate simultaneously: whatever they do, they are both doing it all the time.

Table 1

Level                     Basic element   Time               Parallelism
Physical system           Component       Continuous time    Parallel
Mathematical model        Process         Discrete time Dt   Parallel
Computer implementation   Subroutine      Discrete time Dt   Sequential or parallel

A process is something with ports, parameters, internal states, an output function, and an internal state function. Processes are joined together into networks by connecting the ports to nodes. There are three possible types of ports: (1) input ports, (2) two-way ports, and (3) output ports. Each port is connected to a node, which is a location at which a vector of real values is stored. There may be different types of nodes, with different lengths of vectors and interpretations of the variables in them. For example, a fluid node has been defined. This has length 7, and each position on the node has a given interpretation. Position 2 on a fluid node, for example, is interpreted as mass flow in kg/s. The three different types of ports act differently on their nodes. Input ports only read data from their node. A two-way port may both read and write to its node. An output port has exclusive power to write to its node; it may both read and write on the node, but anything read there is necessarily written by itself. If an output port is connected to a node, the only other possible connections to it are input ports. The desirability of having two-way ports may be seen by considering fluid flow in pipe networks. A pipe naturally has two ports, one on either end, but it is not known in advance which way the fluid is flowing, and also the data do not necessarily all flow in the same direction as the fluid.

The general restrictions on connections are as follows. If an output port is connected to a node, the only other ports which are allowed to connect to the node are inputs. Thus, no two outputs go to the same node, and no output and two-way port connect to the same node. Any number of inputs may be joined to any node. No more than two two-way ports may join at a node.
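These restrictions are simple enough to check mechanically. The following sketch is an illustrative assumption about how a network editor might validate one node; the port-kind labels and function name are invented here, not taken from the Bristol system.

```python
# Sketch of a checker for the connection rules stated above.
# The port-kind labels and representation are assumptions for illustration.
from collections import Counter

def check_node(port_kinds):
    """port_kinds: list of 'in', 'two-way', 'out' for the ports at one node."""
    c = Counter(port_kinds)
    if c["out"] > 1:
        return False                  # no two outputs go to the same node
    if c["out"] == 1 and c["two-way"] > 0:
        return False                  # an output excludes two-way ports
    if c["two-way"] > 2:
        return False                  # at most two two-way ports per node
    return True                       # any number of inputs is allowed

assert check_node(["out", "in", "in"])          # output plus inputs: legal
assert not check_node(["out", "out"])           # two outputs: illegal
assert not check_node(["out", "two-way"])       # output with two-way: illegal
assert check_node(["two-way", "two-way", "in"]) # two two-way ports: legal
assert not check_node(["two-way"] * 3)          # three two-way ports: illegal
```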

The parameters of a process are real values which are needed to define the process but which do not normally change during the course of a simulation, such as the volume of a tank. The internal states of a process are real values which may change during a simulation, such as the average temperature of the air in a room.

As stated before, a process writes values to some of the positions on the nodes connected to its two-way and output ports. These values will be called output values of the process. The output function of a process determines the output values as a function of (1) the current internal state, (2) the current values on nodes to which the ports are connected, and (3) the current values of the global variables T and Dt. All the values of variables on nodes are intended to be interpreted as the average values in the time step (T, T + Dt). That is, these nodal values are intended to approximate the average values in the physical system. Note that the output values depend upon the current values on the nodes, not just on the values at the last time step. Thus remote nodal values may be interlinked without time delay.

The internal state function determines the next internal state of the process. This new internal state is a function, as before, of the current internal state, the values on the connected nodes, and the values of T and Dt. The next internal state is supposed to be the state at time T + Dt, whereas the current internal state is supposed to be the state at time T.
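The process abstraction just described can be sketched as a small class. The class layout, method names, and the thermal-capacitor example are illustrative assumptions; the sketch only shows how the output function reads current node values while the internal state function produces the state at T + Dt.

```python
# Sketch of the process abstraction described above; the class layout is an
# illustrative assumption.  Output values depend on the CURRENT node values,
# the internal state, and the globals T and Dt.

class Process:
    def __init__(self, state, output_fn, state_fn, nodes):
        self.state = state          # internal states (may change during a run)
        self.output_fn = output_fn  # (state, nodes, T, Dt) -> new output values
        self.state_fn = state_fn    # (state, nodes, T, Dt) -> state at T + Dt
        self.nodes = nodes          # node values its two-way/output ports see

    def fast_action(self, T, Dt):
        """Reset outputs from current node values; return a local error."""
        new = self.output_fn(self.state, self.nodes, T, Dt)
        err = max(abs(new[k] - self.nodes[k]) for k in new)
        self.nodes.update(new)
        return err

    def slow_action(self, T, Dt):
        """Advance the internal state from T to T + Dt; nodes are not altered."""
        self.state = self.state_fn(self.state, self.nodes, T, Dt)

# Invented example: a capacitor exposing its temperature on node "p" and
# relaxing toward the value on node "src" with time constant 2.
proc = Process(
    state={"temp": 0.0},
    output_fn=lambda s, n, T, Dt: {"p": s["temp"]},
    state_fn=lambda s, n, T, Dt: {"temp": s["temp"] + Dt * (n["src"] - s["temp"]) / 2.0},
    nodes={"p": 5.0, "src": 10.0},
)
err = proc.fast_action(T=0.0, Dt=0.1)   # writes the temperature onto node "p"
proc.slow_action(T=0.0, Dt=0.1)         # internal state moves toward "src"
```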

Fast and slow mode

Time moves in discrete steps of size Dt. Each step from T to T + Dt occurs in two phases, called fast and slow. In fast mode a process acts by

1. reading values on nodes to which it is connected
2. resetting values on two-way and output ports
3. possibly resetting some internal states
4. calculating a local error E

In slow mode a process acts by

1. reading values on nodes to which it is connected
2. resetting internal states

In slow mode, values on nodes are not altered. In slow mode any group of processes may act in parallel. Define a group of processes to be 2-disconnected if there are not two two-way ports among the group which connect to the same node. In fast mode any 2-disconnected group of processes may act in parallel. Two processes which are joined to the same node by two-way ports may not act synchronously. However, processes which are joined by input-output connections may act together, as may processes which are joined by an input and two-way link. The reason for not running together processes which are joined by two two-way ports is that one of the processes will finish before the other and the later process will overwrite the results of the earlier one. This could result in ignoring the effects of processes with short computation times. A group of processes which are not connected by pairs of two-way links may be run in parallel, since this is the same as first running them all on the original data and, after they have all run, updating the data with the results. For finding solutions of sets of linear equations in fast mode, this technique is the same as the Gauss-Seidel iterative method for components which are only connected by two-way ports, but in general the technique gives something intermediate between the Gauss-Seidel method and the Jacobi iterative method. It is the Gauss-Seidel method, but occasionally it forgets to use the most recent values of the data.
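The contrast between the two sweep orders can be seen on a minimal sketch. The 2x2 system below is an invented example: sequential sweeps reuse freshly written values (Gauss-Seidel), while a group run in parallel reads only the old data (Jacobi); the scheme in the text lies between the two.

```python
# Sketch contrasting sweep orders on the invented 2x2 fixed-point system
#   x = (y + 1)/2,  y = (x + 1)/2,   whose solution is x = y = 1.
# Sequential sweeps give Gauss-Seidel; updating both values simultaneously
# from the old data gives Jacobi.

def gauss_seidel(x, y):
    x = (y + 1.0) / 2.0      # uses the most recent y
    y = (x + 1.0) / 2.0      # uses the x just written
    return x, y

def jacobi(x, y):
    return (y + 1.0) / 2.0, (x + 1.0) / 2.0   # both read the old values

def cycles_to_converge(step, x=0.0, y=0.0, tol=1e-10):
    n = 0
    while abs(x - 1.0) > tol or abs(y - 1.0) > tol:
        x, y = step(x, y)
        n += 1
    return n

n_gs = cycles_to_converge(gauss_seidel)
n_j = cycles_to_converge(jacobi)
# Gauss-Seidel reuses fresh values, so here it needs fewer cycles than Jacobi.
```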

Define a cycle, in slow or fast mode, to be a transformation which is achieved by exactly one action of each process in the model. It should be clear that the transformation corresponding to a cycle is not deterministic, since there are choices as to which processes act in parallel and the order in which processes act.

In the transition from the state at time T to the state at time T + Dt, the first step is fast mode. In fast mode time remains at T, and as many cycles of actions occur as are necessary to bring all the local error measurements below some given global tolerance value TOL. Fast mode may go on forever, and in this case the fast process is said to be divergent. If the fast process always ceases within N cycles, the fast mode is said to be convergent in N steps. The idea of fast mode is that the processes asynchronously negotiate outputs until steady state within tolerance is reached.

In slow mode, time advances from T to T + Dt, and one cycle of internal state resetting occurs.

Associated with the fast/slow concept is an idea about the dynamics of the physical system. The physical system is supposed to be described by the orbits of two vectors F and S. The variables in F are called fast and the variables in S are called slow. In the modular mathematical model some of these variables will be internal states of processes, and others will appear in public on nodes. We assume that in the physical system it would somehow be possible to hold the slow variables constant, and in that case the fast variables would tend toward some steady state F*. The state F* depends both on S and the initial value of F. We also assume that most of the time in the physical system the fast variables F are very close to F*. For example, a control signal which is electrically transmitted might be a fast variable. If we pick a random moment to look at the physical system, we are unlikely to find a state where a control signal has been sent but not yet received.

The fast and slow ideas originated with models of buildings with plant and control systems. Time constants of buildings are much larger than time constants of plant, and these in turn are much larger than time constants of control systems.

Implementations

The original implementation is sequential, written in FORTRAN, and runs on an IBM AT compatible computer. It uses a Hercules graphics card and a Mouse Systems mouse. Networks are represented by schematic diagrams which can be input and edited graphically. About 50 different types of component have been written, mostly connected with HVAC applications. More description of the Bristol system may be found in Day.7,8

There are now also two parallel implementations. One is called the tree and the other is called the checkerboard. These run on networks of transputers, hosted by an IBM AT. We are using the language Occam and the Inmos transputer development system.

The tree implementation runs on a tree of transputers with at most three branches at each vertex. The number of branches is restricted because each transputer has connections with at most four neighbors. See Figure 1.

All the data for the model, i.e., all the values on the nodes and all the parameters and internal states and all the connections from ports to nodes, are held in memory on the root transputer. The root also has a procedure on it called the manager. There are channels for communication from the root transputer to and from each of the subsidiary transputers, t1, t2, t3, which are immediately below. Each subsidiary transputer also has channels for communication to and from the transputers, if any, immediately below it.

Each transputer in the tree has an address, which is a sequence of numbers between 1 and 3. The length of sequence needed is the depth of the tree. The address is interpreted as a list of directions for getting to the transputer from the root. Messages are passed between the root and subsidiary transputers. Each message for a subsidiary transputer is prefaced by the address of that transputer. When a subsidiary transputer is passed an address and message going down, it passes them on down the tree if the message is not meant for itself. All messages going up are passed on up toward the root.
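The addressing scheme can be sketched as follows. The function name and tuple representation are assumptions made for illustration; the sketch only shows how an address, read as a list of branch directions, determines the chain of transputers a downward message visits.

```python
# Sketch of address-based routing in the transputer tree.  A message for a
# subsidiary transputer is prefaced with its address, a sequence of branch
# numbers 1..3 read as directions from the root.  Names are assumptions.

def route(address, here=()):
    """Return the chain of transputers a downward message visits.

    address: tuple of branch numbers, e.g. (1, 3) means "child 1 of the
    root, then child 3 of that transputer"; () denotes the root itself."""
    path = [here]
    for branch in address:
        here = here + (branch,)      # descend one level along this branch
        path.append(here)
    return path

# A message addressed (1, 3) travels root -> t1 -> t13.
hops = route((1, 3))
```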

The subsidiary transputers contain procedures for all different types of components. That is, if there are 40 different component types, all 40 procedures will be on each subsidiary transputer. These are the component types, not the components. In a finite element application, there might be only one component or process type, although the number of components (elements) could be large.

A subsidiary transputer is passed the data for a process; it performs an action and returns the data. The data which are passed include the process type, which tells the transputer which procedure to use. The data also include the parameters, internal states, and values on nodes which are connected to the process.

The manager controls one cycle and tries to keep all the subsidiary transputers as busy as possible. When one of the subsidiary transputers finishes its action, the data are returned to the manager, who records them in the data structure at the root. The manager then tries to find a process which has not yet been run in the current cycle, fetches the data, and sends it to the free transputer. If it is fast mode, the manager is also constrained by the requirement that two connected processes cannot be run together if the connection consists of two two-way ports. The manager has available an array of pointers from each node to all components connected to it by two-way ports. In fast mode the manager marks the nodes which are connected to processes currently being run but not finished. Using the pointers, the manager can test whether a process is proscribed by being joined via two-way ports to a process which is currently being run. When a new process is selected to run, it can only qualify if none of the nodes to which it is connected by a two-way port is marked. Thus when one of the subsidiary transputers is free in fast mode, the manager scans through the list of processes. When one is found that has not yet been run, and which is not proscribed, the data for that process are fetched.

Figure 1. The tree of transputers: the root connects to subsidiary transputers t1, t2, t3, each of which may have up to three transputers below it (t11, t12, t13, and so on).
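The manager's qualification test can be sketched as follows. The data layout (sets of node identifiers per process) and all names are assumptions for illustration; the sketch shows only the rule that a pending process qualifies when none of the nodes reached through its two-way ports is currently marked.

```python
# Sketch of the manager's fast-mode test: a process qualifies only if none
# of the nodes reached through its two-way ports is marked as held by a
# process currently running.  The data layout is an assumption.

def pick_process(pending, twoway_nodes, marked):
    """Return the first pending process not proscribed by marked nodes."""
    for p in pending:
        if not (twoway_nodes[p] & marked):
            return p
    return None                       # every pending process is proscribed

# Invented example: node 1 is held by a process that is still running.
twoway_nodes = {"pump": {1}, "pipe": {1, 2}, "tank": {3}}
marked = {1}
chosen = pick_process(["pump", "pipe", "tank"], twoway_nodes, marked)
# "pump" and "pipe" both touch the marked node 1, so "tank" is chosen.
```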

The checkerboard implementation runs on a rectangular array of transputers. The components are distributed over the array according to position in the schematic diagram. The schematic diagram is divided into cells and arranged in such a way that communication between cells is minimized, and then one transputer is made responsible for each cell. Each cell transputer contains all the code needed to run a sequential simulation for all the components within the cell. When a component is only connected to nodes which lie within the cell, its process can be run without reference to other cells. When a transputer needs to read a value on a node which is not contained in its cell, it must send out a request for information. Similarly, when a transputer wishes to write a value to a node which is not in its territory, it must transmit the node to one of its neighbors.

Deadlock between cells is avoided by a new technique called the fading checkerboard. Cells can be red or black, and the initial state of the network has every neighbor of a black cell red, and vice versa, as in checkerboards. Red cells transmit messages, and black ones receive them. When a red cell has finished transmitting, it may turn black. When a black cell has observed that all of its red neighbors have turned black, it may turn red. These parallel implementations will be discussed in more detail in subsequent publications. All the software, in compiled form, is freely available from the authors.
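The colour rule of the fading checkerboard can be sketched on a minimal example. The grid, neighbour table, and function name below are invented for illustration; the sketch shows only the transition rule: a black cell may turn red once every neighbour has turned black, so senders and receivers can never block each other.

```python
# Sketch of the fading-checkerboard colour rule described above: red cells
# transmit, black cells receive; a red cell turns black when it finishes
# transmitting, and a black cell may turn red once all of its red
# neighbours have turned black.  The grid here is an invented example.

def may_turn_red(cell, colour, neighbours):
    """A black cell may turn red when no neighbour is still red."""
    return colour[cell] == "black" and all(
        colour[n] == "black" for n in neighbours[cell]
    )

# Three cells in a row, initially coloured like a checkerboard.
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
colour = {"a": "red", "b": "black", "c": "red"}

ok_before = may_turn_red("b", colour, neighbours)   # a and c are still red
colour["a"] = colour["c"] = "black"                 # the reds finish transmitting
ok_after = may_turn_red("b", colour, neighbours)
```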

Examples involving flow and potential in networks

The system consists of a network of points joined by directed paths. Pi will be the ith point, and Qi will be the ith directed path. Every point is connected to at least one path, and every path is connected to two points. Each point Pi has a capacitance Ci associated with it. Capacitances may be zero, or positive and infinitesimal, or positive and noninfinitesimal. Each path Qi has a resistance Ri associated with it. The resistances must be positive and not infinitesimal. There is a variable pi, called potential, associated with each point Pi; and there is a variable qi, called flow, associated with each directed path Qi. If Pi is a point, let D(Pi) be the sum of the flows on paths directed toward Pi minus the sum of the flows on paths directed away from Pi. Four rules govern the behavior of these models.

1. Boundary conditions. At each point Pi, there is either an imposed flow F(Pi,t), which may be a function of time t, or an imposed potential G(Pi,t), which may also be a function of time.

2. Conservation conditions. If Pi is a point with zero or infinitesimal capacitance, and Pi has an imposed flow, then D(Pi) + F(Pi,t) = 0. That is, the imposed flow plus the sum of the flows on the paths is zero. Note that the imposed flow may be 0.

3. Change of potential. If Pi is a point with positive noninfinitesimal capacitance, and Pi has imposed flow, then the potential pi changes according to

dpi/dt = (D(Pi) + F(Pi,t)) / Ci

The fourth rule says how the flow on a path is determined from the potentials on either end. Let Qi be a path with flow qi. Let pbi and pei be the potentials at, respectively, the beginning and the end of Qi. Define Δp = pbi − pei. Our two systems are obtained by two different versions of rule 4.

4a. Heat flow. The flow qi on Qi is determined by

qi = Δp / Ri

where Ri is the resistance of Qi.

4b. Inertialess fluid flow in pipe networks. The consistency condition is that either

Δp > 0: qi²Ri = Δp and qi > 0

or

Δp < 0: qi²Ri = -Δp and qi < 0

or Δp = 0 and qi = 0.
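The two versions of rule 4 can be written as one-line functions. In the sketch below (illustrative Python), the quadratic law carries the sign in the flow, and an optional exponent n is included because, as noted later, some pipe models replace the square by a power such as 1.86:

```python
import math

def heat_flow(dp, R):
    """Rule 4a (linear model): q = Δp / R."""
    return dp / R

def pipe_flow(dp, R, n=2.0):
    """Rule 4b: |q|**n * R = |Δp|, with q taking the sign of Δp.
    n = 2 gives the quadratic law of the paper; exponents such as
    1.86 appear in some pipe models."""
    return math.copysign((abs(dp) / R) ** (1.0 / n), dp)
```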

In either of these models the consistency conditions are all of the conditions 2 and 4. We assume in all cases that the boundary conditions are such that the consistency conditions have a unique solution.

The model with flow definition 4a is called the heat flow or linear model. Its intended interpretation is that potential means temperature and flow means heat flow. The points with positive noninfinitesimal capacitance are thermal capacitors. Of course there are many other interpretations. The model with flow definition 4b is called the hydraulic or quadratic model. Its intended interpretation is that potential means pressure and flow means mass flow. The points with positive noninfinitesimal capacitance are tanks. In either model, all the variables are fast except the potentials at points with either imposed potential or with positive noninfinitesimal capacitance.

Appl. Math. Modelling, 1989, Vol. 13, April 229

Parallel running of a modular simulation scheme: Day et al.

There are many ways of dividing such systems up into components or into components and nodes and representing the components by processes. For purposes of this discussion it will be assumed that there are three types of processes.

Junctions. These represent points with zero or infinitesimal capacitance and imposed flow (which may be zero).

Tanks. These represent points with positive noninfinitesimal capacitance or imposed potential.

Pipes. These represent paths.

All the ports are two-way. Processes have to be joined together at nodes in the Bristol system. The above way of dividing up the system therefore has the possibly unnatural consequence that junctions and pipes are joined by nodes. Two pipes cannot be joined together without a junction between them, and a junction cannot be joined to a junction or a tank without a pipe between them. In fact every node is the meeting of a pipe and either a junction or a tank.

We assume that junctions and tanks set potentials on the nodes to which they are connected. Pipes set flows on the nodes to which they are connected. The convention will be that positive flow at a node will mean flow away from the pipe toward the node.

In slow mode, tanks reset their internal potential. In fast mode a tank simply sets the potential at any node to which it is connected; the potential set at a node is the same as the tank's own internal potential, which does not change in fast mode. In fast mode junctions and pipes negotiate until all consistency conditions are satisfied up to some tolerance. A pipe reads the potentials on either side of it and sets the flow accordingly. It must also pass some additional information to the processes on either side. This information must be sufficient to allow the junctions to reset their potentials appropriately.

The pipes pass their own resistance in both directions and inform each end about the potential at the other end. There are three mass flows associated with the pipe: the mass flow which it has calculated, and also the expected mass flows on either end which can be calculated from the pipe’s reports. Define the error on the left of the pipe to be the absolute value of the difference between the current mass flow and the expected mass flow on the left. The error on the right is the absolute value of the difference between the expected mass flow on the right and the current value. The error of the pipe is the sum of these two errors.

The task of a junction in fast mode is to set its potential in such a way that after the action of the neighboring pipes, its own consistency condition will be satisfied. Let F(p) be the sum of the flows into the junction after the pipe flows have changed to be consistent with potential p. F(p) is a monotone function of p. The consistency condition is that F(p) + F(Pi,t) = 0, where F(Pi,t) is the imposed flow. F(p) depends upon the resistances of the pipes and the potentials at the other ends. If the junction has all this information, it can set p so that the local consistency condition is satisfied. Since F(p) is monotone and the correct value of p is somewhere between the lowest and highest potentials on the branches, the bisection algorithm9 can be used to calculate p. In the linear case, p can be found directly by a fairly well-known formula.10
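A sketch of the junction's calculation for the quadratic model follows (hypothetical Python; the paper specifies only that bisection is applied to the monotone function F(p)). The initial bracket is the interval between the lowest and highest far-end potentials, widened if a nonzero imposed flow pushes the root outside it:

```python
import math

def junction_potential(far_potentials, resistances, imposed_flow,
                       tol=1e-10):
    """Solve F(p) + imposed_flow = 0 by bisection, where
    F(p) = sum over pipes of sign(pk - p) * sqrt(|pk - p| / Rk)
    is the net inflow after the pipes adjust to potential p.  F is
    monotone decreasing in p, so bisection always succeeds."""
    def F(p):
        return sum(math.copysign(math.sqrt(abs(pk - p) / Rk), pk - p)
                   for pk, Rk in zip(far_potentials, resistances))
    lo, hi = min(far_potentials), max(far_potentials)
    # Widen the bracket until the root is enclosed (needed when the
    # imposed flow is nonzero).
    while F(lo) + imposed_flow < 0:
        lo -= (hi - lo) + 1.0
    while F(hi) + imposed_flow > 0:
        hi += (hi - lo) + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) + imposed_flow > 0:
            lo = mid   # too much net inflow: the potential must rise
        else:
            hi = mid
    return 0.5 * (lo + hi)
```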

The fast mode process, therefore, is that the junctions and pipes asynchronously reset their potentials and flows in order to get local consistency. The pipes also continually update their reports of potentials in either direction.

The consistency condition as a whole is just the simultaneous satisfaction of all the local consistency conditions. However, when a process establishes its own local consistency, it may disturb the consistency of neighboring processes.

Define the error of a junction to be the absolute value of F(p) - F, where F is the imposed flow. Recall that F(p) is not the current sum of the flows but the sum expected to result after the action of the neighboring pipes. If a junction knows the neighboring potentials and resistances, then when it resets its potential it will reduce its own error to zero. After the pipes have acted, it may also increase the errors of neighboring junctions. It has, so to speak, pushed its errors down the pipes. The amount by which the neighboring errors may be increased is no more than the amount by which its own error has been reduced. Therefore in this situation the action of a junction will not make the total error of the system larger than before. As long as the errors are sometimes reduced, we would expect eventual convergence. This is the basic idea of the theorem in the next section.

An additional complication is caused by the fact that pipes’ reports may be out of date. That is, sometimes junctions will act on the basis of reported potentials which were true in the last cycle but are no longer true. Nevertheless, we claim that the fast mode process converges whenever there is a unique solution of the consistency conditions and every point is connected to a point with positive noninfinitesimal capacitance.

Some version of the quadratic consistency equations is mentioned in most textbooks which are concerned with fluid flow in pipes. The mass flow is often not squared but taken to some power less than 2, such as 1.86. This number is actually a parameter of the pipe. In this case the method above continues to apply without modification.

There are two standard methods for solving the nonlinear consistency equations for networks of pipes and junctions: the Hardy-Cross technique and the quantity balance technique. See Douglas,11 pp. 387-391. It is a version of the quantity balance technique which is discussed above.

The Hardy-Cross technique is a variant of the N-dimensional Newton method. It is an iterative technique which produces a sequence of approximations to the solution. After an approximation is obtained, the partial derivatives are used to make a linear approximation to the consistency conditions. This linear approximation is locally correct. It is then solved to find the next approximation to the solution.

The difficulty with the Hardy-Cross technique, and any related numerical technique, is that no linear approximation is good enough to be reliable. In the case of pipe networks with quadratic flow, the partial derivatives of the consistency condition for a pipe change sign depending on which end of the pipe has the higher potential. The state space for the system is divided into many regions, and the partial derivatives of the consistency conditions change discontinuously between the regions.

Convergence of parallel processes

The object in this section is to give criteria which imply that a fast mode process converges. This means, as in Day,8 that all the local errors can be made arbitrarily small by taking sufficiently many cycles. More precisely, for any network and any initial state, let E be the sum of the absolute values of the local errors. Then convergence means that for any number ε > 0 we can find N so that, after N or more cycles, E < ε. The number N may depend upon the initial state. Note that there are a large but finite number of different ways of doing N cycles, depending upon the order of action of processes. Convergence means that for each ε > 0 we can find N so that no matter how N cycles are done the result is E < ε.

If the fast process does not converge, then for some ε > 0 there are arbitrarily long chains of cycles such that E > ε. Since there are only finitely many ways of doing each cycle, this means that the chains of cycles can be combined to one infinite chain of cycles in which E is permanently > ε. Thus in order to prove convergence, it is sufficient to show that in any particular chain of cycles in fast mode the value of E tends to 0.

In fast mode the tanks set potential to be the same as their own internal potential, which does not change. So after one cycle we may as well delete the tanks. We may also delete junctions where the potential is set by a boundary condition. Assume this has been done. The dangling nodes are now called boundary points. Assume that we are dealing with a single connected network, and a set of boundary conditions has been imposed so that the fast mode consistency equations have a unique solution.

In both examples, the effect of a cycle on the potentials is monotone. That is, if P is the vector of all potentials over the network and C(P) is the new vector after a cycle, then P ≥ Q implies C(P) ≥ C(Q). This is because in the fast mode process the new potential is a monotone function of the neighboring potentials. If any neighboring potential is increased, the local steady-state potential at a junction is also increased. In all cases the new potential value may be seen as a linear or nonlinear weighted average of the neighboring potentials.

Let S be an initial state of the network. Define Su to be the initial state in which every potential on a nonboundary point is initially set to the maximum potential on S. Define Sl to be the initial state in which every nonboundary potential is set to the minimum potential in S.

Because of the monotonicity condition, the state of S after N cycles is squeezed between the states of Su and Sl after N cycles. Since we assume the solution is unique, convergence for Su and Sl implies convergence for S.

In Su all divergences are negative, and in Sl all divergences are positive. In Su all the junctions will move their potentials down, and in Sl all junctions will move their potentials up. After one or several cycles, all divergences will still be of the same sign if not zero. In all sequences of cycles starting with Su all divergences will be negative or zero. Therefore all the potentials are moving downward. Since Sl is a lower bound for the potentials, the sequence of potentials converges. Similarly, any sequence starting with Sl has potentials monotone nondecreasing and bounded, and therefore these sequences also converge.
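The squeezing argument can be watched numerically on the linear model. The sketch below is illustrative (unit resistances, a four-point chain with its end potentials held fixed): one Jacobi-style cycle resets every free junction to its local steady state, the average of its neighbors, and the iterates started from Su decrease monotonically toward the solution while those from Sl increase toward it.

```python
def cycle(potentials, neighbors, fixed):
    """One cycle of the fast mode process on the linear model with unit
    resistances: each free junction is reset to the average of its
    neighbors (its local steady state); fixed points keep their value."""
    return {node: potentials[node] if node in fixed
            else sum(potentials[nb] for nb in nbs) / len(nbs)
            for node, nbs in neighbors.items()}

# Chain 0-1-2-3 with the end potentials imposed (steady state: p1=1, p2=2).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
fixed = {0, 3}
Su = {0: 0.0, 1: 3.0, 2: 3.0, 3: 3.0}   # free potentials at the maximum
Sl = {0: 0.0, 1: 0.0, 2: 0.0, 3: 3.0}   # free potentials at the minimum
for _ in range(100):
    new_Su = cycle(Su, neighbors, fixed)
    # Monotone: starting from Su every free potential moves downward.
    assert all(new_Su[n] <= Su[n] + 1e-12 for n in Su)
    Su, Sl = new_Su, cycle(Sl, neighbors, fixed)
# Both sequences squeeze onto the unique solution.
```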

In an earlier attempt to prove convergence for some related processes, a general theorem was developed. This is somewhat more complicated, but may be of independent interest, since many interactions are not monotone.

Assume a particular network which is connected. That is, any two components can be joined by a path through the network. Assume also that an error function ≥ 0 has been defined for each process, and a total error ≥ 0 has been defined for the system. The total error may be defined to be the sum of the local errors. For the theorem below to work, at least we have to assume that whenever the total error tends to zero, the sum of the local errors also tends to zero.

Define a path to be a sequence of processes in which each one is connected to the next by an output or a two-way connection. Define a boundary point to be a node which is only connected to one two-way port or to an output port.

Theorem. Consider the fast process on a network in which every node is connected via some path to a boundary point. Let S1 be a set of initial states. The fast mode process necessarily converges for any initial state in S1 if the following conditions hold over the closure of the set of all states attainable from S1.

a. The local error function adequately reflects what processes do in the sense that, if in any sequence of cycles the error of a process tends to zero, then the changes made by the process on nodes also tend to zero.

b. No process increases the total error.

c. There is a continuous function f(E), with f(E) > 0 whenever E > 0, so that each process immediately connected to a boundary point by an output or two-way port decreases the total error by at least f(E) whenever it acts with error E > 0.

d. The output function and internal state function for each process are continuous.

e. For each initial condition the values on the nodes and the internal states are all permanently bounded.


Proof. First consider a network in which there is only one node. This must be a boundary point, and therefore there is only one process in the network. The error decreases after each action of the process. The error is monotone decreasing and nonnegative and must therefore tend to some limit. Suppose this limit were A. The error is eventually confined to a closed interval [A,B], where B > A. On this interval f(E) would be bounded away from 0, if A > 0, since f(E) is continuous and f(E) > 0 whenever E > 0. Therefore the limit A must be 0.

Now use induction on the number of nodes in the network. Suppose we have proved the convergence proposition for networks with fewer than N nodes, and consider a network with N nodes. If the process does not converge, there would be a particular sequence of cycles in which the total error would be permanently bounded above zero. Assume this particular sequence of cycles. Thus we are now only dealing with a determined sequence. The total error is monotone decreasing and nonnegative and therefore tends to some limit.

It is sufficient to show that we can bring the total error down arbitrarily small. Suppose, on the contrary, that the total error tended to a limit E > 0. Pick TOL arbitrarily small. In each cycle the total error decreases as long as some process writing to a boundary point has positive error. At some point, a permanent state is reached in which every process writing to a boundary has local error < TOL. The number of cycles needed to achieve this is no more than the number needed to get to within f(TOL) of the limit for the total error. Since the error at each boundary process tends to zero, the changes made at the nodes by each boundary process also tend to zero.

Consider a process in the network which writes to a boundary point. Since the process writes to the boundary, its error can be brought permanently down as low as we like. Form a new smaller network by deleting this process and the nodes which are only connected to this process. Since the process writes to a boundary point, at least one node is deleted. By induction the fast process converges on the smaller network for any initial state. Thus for any initial state and any ε > 0, we can find N so that after N cycles the total error in the small network is < ε/3. The transformation effected by N cycles is continuous. Since a continuous function on a closed bounded set is uniformly continuous, the transformation effected by N cycles is uniformly continuous on any bounded set of initial states. Take the bounded set of initial states S to be all the possible states which can be reached by the processes from a particular initial condition. S is bounded by condition e. For any ε > 0 we can find M(ε) so that after M(ε) cycles the total error in the small network is < ε/3, no matter which initial state in S is used.

We can find a number δ < ε/3 so that no variation within δ of the boundary conditions can make more than ε/3 difference over M(ε) cycles in the small network, no matter which initial state in S is used. This is because once ε is given, M(ε) is fixed and the transformation effected by M(ε) cycles is uniformly continuous. Having found δ, now go back to the original network and run it until the changes made by the boundary process are permanently below δ. Now run the original network for M(ε) more cycles. The total error in the network is less than the error in the small network plus the boundary process error, and this is less than the error in the small network in isolation (ε/3) plus the error in the small network introduced by the boundary process (ε/3) plus the boundary process error (ε/3). Therefore the total error is < ε.

The idea of the above argument is that if a part of the network has reached, within some tolerance, steady state, it may be removed, thus reducing the problem. On the other hand, if a component near a boundary point does not reach steady state, it “leaks” total error out of the whole system, thus reducing the possibility of local error. This does imply ultimate convergence but does not give results about rate of convergence.

Let S1 be the set of initial states in which all divergences are of the same sign. A fast mode process starting in S1 remains in S1, and the total error is never increased, since junctions are never acting in opposition to one another. All the potentials are moving in the same direction. So if a potential is reset using out-of-date information, this will only mean that the potential was not moved as far as possible.

Therefore the theorem of this section can be used to give another proof that the fast mode process converges for the two examples. In order to apply the theorem directly to the examples, without using the monotonicity condition, some changes would have to be made in the action of the pipe and its error function. As matters stand, in the quadratic case, it occasionally happens (when two junctions with opposite sign divergence are neighbors, and the one with high potential moves down and the one with low potential moves up) that the action of the pipe may increase the total error.

Resistance passing techniques and speed of convergence

The families of components defined above can be expanded. The new components will either be pressure-setting types (that is, like junctions) or mass-flow-setting types (that is, like pipes). Define a join to be a junction-type component with only two connections. Define a branch to be a sequence of pipe-type components alternating with join-type components. Such branches would never be used in the small families of the example, since a branch of pipes and joins is equivalent to one pipe.

In order to speed up the convergence of the fast mode processes defined above, a resistance passing technique has been developed. In this technique the pipes and joins in a branch pass the potentials of the junctions at the two ends in both directions. Thus if the branch has k components in it, after k cycles all the components will have a report of the potentials at either end of the branch. The pipes and joins also pass the sum of the resistances in both directions. Thus if the potentials at the ends are fixed, the exact solution is found in k cycles, where k is the length of the branch. This technique is due to Peter Kimber.
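For the quadratic model, the resistance passing idea amounts to the observation that pipes in series carry a common flow against the sum of their resistances. A sketch (illustrative Python, not the authors' transputer code) of the direct solution for a branch whose end potentials are fixed:

```python
import math

def solve_branch(p_left, p_right, resistances):
    """Resistance passing on a branch of series pipes (quadratic model):
    once each component knows the end potentials and the total resistance,
    the common flow and the interior join potentials can be written down
    directly instead of iterated."""
    dp = p_left - p_right
    R_total = sum(resistances)
    q = math.copysign(math.sqrt(abs(dp) / R_total), dp)
    # Interior potentials: a signed drop of q^2 * Rk across the kth pipe.
    potentials = [p_left]
    for Rk in resistances:
        potentials.append(potentials[-1] - math.copysign(q * q * Rk, dp))
    return q, potentials
```

The signed drops sum to Δp, so the last computed potential reproduces the fixed right-hand end.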

The effect of the technique is to treat a branch almost as if it were one pipe. Thus the fast mode becomes a dialogue between junctions with at least three connections, and the computational difficulty is related to the number of such junctions.

Reports of computational experience will be given in a subsequent paper.

However, it can be said that the quantity balance method for solving nonlinear flows in pipe networks seems in general to be annoyingly slow, redeemed only by the fact of ultimate convergence. It is quite common to have networks of, say, 50 components which take more than 1000 iterations to converge to a tolerance of 1%. On the other hand, our experience is that Newton’s method does not reliably converge even when the starting point is quite close to the steady state. The Hardy-Cross method is just as unstable as Newton’s method. Even the method of steepest descent would not necessarily converge. The source of all these difficulties is that in the equation for the relationship between mass flow and pressure drop for the pipe, or in the gradient of that equation, the signs of some of the terms depend upon which of the two pressures is the highest, which cannot in general be known in advance. We do not know what would be an improvement on the quantity balance method. Suggestions would be welcomed.

Parallel processing on small transputer networks gives sufficient speed to allow us to solve moderate-sized design problems involving fluid flow.

Conclusion

In a modular simulation system, such as the Bristol system, there is, on the one hand, a main program which is perfectly general and independent of application, and on the other hand, a collection of subroutines which represent types of components of a physical system. The generality of the main program means that a good deal of work can be done to get it right, and also it may supply a framework for collaboration and exchange of knowledge among research workers. Another advantage is that the subroutines representing individual components may be relatively easy to create and validate.

There are also difficulties with the modular approach which we have adopted. One problem is to determine how to chop a coherent physical system into components and to set up a format for exchange of information among components. Another difficulty is that the modular approach, as we have understood it, means that the main program should not contain any application knowledge. A set of nonlinear equations may have to be solved between time steps. The modular idea seems to imply that the solution must be found by using general techniques. If physical intuition about the application is to guide the solution, it has to be embodied in the local actions of the components. It seems likely that in many cases special-purpose solution techniques which depend on application knowledge will be more efficient. Therefore it is to be expected that the gains to the research community promised by modularity will be paid for by computational slowness. Hopefully parallel processing will help to alleviate this problem.

References

1. Capello, P. R. Gaussian elimination on a hypercube automaton. J. Parallel and Distributed Computing 1987, 4, 288-308
2. Heller, D. E. A survey of parallel algorithms in numerical linear algebra. SIAM Rev. 1978, 20, 740-777
3. O'Leary, D. P. Parallel implementation of the block conjugate gradient algorithm. Parallel Computing 1987, 5, 127-139
4. Shao, J. and Kang, L. An asynchronous parallel mixed algorithm for linear and nonlinear equations. Parallel Computing 1987, 5
5. Ortega, J. M. and Voigt, R. G. Partial differential equations on vector and parallel computers. SIAM Rev. 1985
6. Davenport, J. and Heintz, J. Real quantifier elimination is doubly exponential. In Algorithms in Real Algebraic Geometry, eds. D. S. Arnon and B. Buchberger. Academic Press, 1988
7. Day, B. et al. A modular system for simulation of the thermal performance of buildings and their service systems. Environmental Engineering Research Unit report, University of Bristol, 1985
8. Day, B. et al. A mathematical framework for modular simulation. Mathematics and Computers in Simulation 1987
9. Burden, R. L., Faires, J. D., and Reynolds, A. C. Numerical Analysis. Prindle, Weber and Schmidt, Boston, 1981
10. Holman, J. P. Heat Transfer. McGraw-Hill, New York, 1981
11. Douglas, J. F., Gasiorek, J. M., and Swaffield, J. A. Fluid Mechanics. Pitman, 1983
