
NEUROCOMPUTING

ELSEVIER Neurocomputing 8 (1995) 315-339

Design of neurocomputer architectures for large-scale constraint satisfaction problems

H. Nikolaus Schaller *

Technical University of Munich, Arcisstr. 21, D-80290 München, Germany

Received 11 April 1994; accepted 26 January 1995

Abstract

Many technical problems, like scheduling a crossbar switch or factorizing numbers, can be easily described declaratively as constraint satisfaction problems. Especially in real-time control applications, a general purpose processor is required that solves these problems efficiently. A promising approach is a massively parallel neurocomputer architecture. The design of such a neurocomputer architecture can be divided into three levels. The first level is the programming model. This includes methods for compiling large-scale programs. We present a programming model with binary variables and range constraints as the machine language. The second level addresses the implementation by iterative constraint satisfaction procedures for range constraints based on the gradient of an error function. Two methods are developed, the Rolling-Stone and the dynamic barrier algorithm. The dynamic barrier algorithm is evaluated with the N-Queens benchmark. The third level is the realization principle of neurocomputers for constraint satisfaction problems by electronic or optical hardware.

Keywords: Constraints; Hopfield network; N-Queens; Dynamic barrier; Computer architecture

1. Introduction

Nowadays, many non-numerical technical problems are solved on digital computers by designing specialized algorithms in imperative or object-oriented programming languages on single or multi-processor machines. But the design of fast

* Corresponding author. Email: [email protected]

0925-2312/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved. SSDI 0925-2312(95)00023-2


algorithms can be very difficult and time consuming. On the other hand, it is often very simple to give a declarative description of the properties of the intended solutions of the given problem. So, a promising approach for simplifying and accelerating the whole problem solving process is to formalize this declarative description and to apply a general purpose procedure to find solutions. This procedure should be usable like traditional digital computers. Especially for on-line control applications, such as the control of a complex, redundant multi-stage crossbar switch, fast VLSI hardware would also be useful.

The principles of declarative programming have been well known for a long time in AI (e.g. PROLOG). A special branch of declarative programming is constraint satisfaction problems (CSP). A constraint satisfaction problem consists of a given set of variables and a set of logical expressions (relations). To solve the problem, an assignment of values to these variables must be found that makes all expressions true.

A well established method to find a solution (usually the problem has many, but only one is requested) is depth first search (DFS). This is essentially an algorithm trying all possible value assignments to the variables. If no constraints are violated, it has found a solution. To speed up this technique, several heuristics have been developed. The major ones are skipping (pruning) further assignments if a constraint is already violated and sorting the order of variables and the values assigned to a variable.

But there are limitations to this approach. The first one is that the design of good, fast and general heuristics is difficult. And even well designed heuristics with low computation time fail for certain large-scale problems like the N-Queens problem discussed in Section 3.1. This limit has a theoretical background. Constraint satisfaction problems and other problems can be classified as polynomial (P) and nondeterministic polynomial (NP) problems. It is a well-known, but yet unproven, conjecture that there does not exist any algorithm that finds solutions of a problem of class NP within computation time proportional to a polynomial in the problem size N. Instead, one has to expect exponential dependence of the computation time on the problem size. So, general large-scale constraint satisfaction problems require much time to be solved. Unfortunately, it is not easy to decide if a practical constraint satisfaction problem belongs to P or NP. So we have to expect that a heuristic that is both fast (i.e. computable in polynomial time) and general does not exist.

Another limitation of the depth first search method is the limited speed-up by parallelization, because the communication overhead required for load balancing methods increases with the number of processors [24] and saturates multiprocessors.

Nevertheless, it is worth looking for general constraint satisfaction procedures different from depth first search that solve most or all problems in P in short time. A promising approach pioneered by J. Hopfield [11,12] uses recurrent neural networks, although there is no evidence that they can crack NP problems. Because of the completely different communication structure, speed-up limits from communication overhead are generally avoided.


It has to be noted that many technical applications are of a different class; they are combinatorial optimization problems. These are constraint satisfaction problems in which all feasible solutions are scored by some function and only the best or nearly the best one is asked for. So, constraint satisfaction problems are slightly easier to solve, since it is known that any feasible solution is a solution. This is the reason why this work is restricted to solving constraint satisfaction problems only. But since there is a large demand for optimization in technical applications, a proposal for handling optimization problems with the architecture presented is given in Section 2.5.

The design of a general constraint satisfaction problem solver can be guided by the three computer design levels defined by Blaauw [6]: (1) programming model (architecture), describing the logical view of the machine, (2) implementation, describing the functional organization, (3) realization, describing the physical components.

For the design of a high quality programming model, most of the design principles of Blaauw can be applied: (1) orthogonality, (2) symmetry, (3) generality, (4) transparency, (5) open-endedness, (6) completeness.

Of course there are some other goals to be met by a good design. The user of the computer expects that the processor is (1) easy to program, (2) efficient, (3) scalable.

In this contribution, the programming model will use binary variables (to simplify the realization by neurons) and range constraints (between-k-and-l) for the number of variables with the value 1. We will present how to design large-scale programs using this type of constraint systematically. This program design can be supported by compilers, much like programming traditional computers in high-level programming languages.

The implementation can use any method appropriate for this programming model. It can be an optimized depth first search algorithm realized on a standard serial or parallel processor, but a faster realization can result from a neural network with a high degree of parallelism. In the latter case, there will be one neuron representing each variable and the problem is translated into an interconnection structure. This neural implementation requires regular, vectorizable operations and only a small number of global signals and parameters. The neural network can be realized electronically or optically.

So the ideas of J. Hopfield and others are picked up and combined with the design principles of computer architecture, and a general purpose constraint satisfaction processor based on neural networks is synthesized.

It should be noted that there are many other related proposals in the literature,


for example [2,9,20,30,35]. Unfortunately, some of them are not guided by design principles.

2. The programming model

In traditional computer architectures, it has proven useful to define a programming model that describes the logical view of the machine for the programmer. A programming model consists of
(1) a static description of the data and program storage elements (memory, register),
(2) a list of machine language operations and their binary codes (operation codes),
(3) the data types, addressing modes and the semantics of the operations of sequential program execution,
(4) input and output.

The programming model does not include a reference to the method by which machine language operations are interpreted, nor to the physical components the machine consists of. A neurocomputer for solving constraint satisfaction problems can be described in a similar manner.

2.1. The programming model with between-k-and-l-out-of-n constraints

In the programming model proposed (Fig. 1), the data storage elements are binary variables in a working register z, and the program (a set of between-k-and-l-out-of-n constraints [9,21,25,34]) is stored in some memory. The machine has only a single command (start), namely solving the presented problem. Thus, no sequential program control is required.


Fig. 1. Programming model for a constraint satisfaction problem solver.


Input is done by dynamically modifying constraints, while output is derived from the values assigned to the variables. Solutions are indicated by the signal solved. Then, the search process can be restarted by continue. If the problem has no (more) solutions, unsolvable

should be indicated. In detail, the programming model has the following components:

(1) The assignment to the variables is stored in the working register z_α.
(2) The output mask register o_α tells if a variable α belongs to the output of the problem or is an internal auxiliary variable.
(3) The matrix element C_iα = 1 describes that variable z_α is constrained by constraint i. It is 0 if the variable is not constrained.
(4) The values k_i and l_i are the lower and upper bound of the number of variables in constraint i that must have a 1 assigned for feasible solutions.
(5) The k and l values can be enabled individually by e_i for external control. This results in a program that can be modified by input data during runtime.
What is the meaning of a between-k-and-l-out-of-n constraint (range constraint)?

To be fulfilled, there must be between k and l variables assigned 1 out of a set of n variables. The others must be 0. If the number of ones in the set is less than k or larger than l, the constraint is violated. If k = l, a k-out-of-n constraint [23,33,32] results. The conversion of a range constraint by introducing slack variables is discussed in Section 3.2.

Why are these between-k-and-l-out-of-n constraints attractive? Firstly, they are easily transformed into a quadratic error function and into the weight matrix of a neural network as will be shown in Sections 3 and 4. The values k and l become bias inputs to the neurons which are easily controlled by analog signals if external input is required [26]. Secondly, this type of constraint is transparent, parsimonious, and general and has a large potential for programming different problems like map coloring, placement, scheduling, integer arithmetic, puzzles, and others. Simple design rules can be found in [25] that will be shown later in some examples. So most of the requirements for a high quality programming model are met.

There are several different representations of machine language programs for this programming model. The first is to give the vectors e, o, k, l and the matrix C, which is essentially a discrete linear system of inequalities. A second one is the graphical representation shown in Fig. 2, which may be most familiar to neural network researchers, since it resembles clusters of neurons ([18] and others). A small circle represents a variable and an oval a constraint for the values of all variables enclosed. The value k (if k = l) or the range k-l is written beneath the oval. In the example in Fig. 2, we have a 1-out-of-2 and a between-0-and-1-out-of-2 constraint for the variables a, b and b, c respectively. A third representation similar


Fig. 2. Example of boolean constraints.



Fig. 3. A 3 x 3 crossbar switch.

to assembler languages has been developed that can be read or generated directly by standard computers. For the example in Fig. 2, the program reads

1 = a + b

0-1 = b + c.
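As a minimal illustration of the semantics of range constraints (not part of the original paper), the following Python sketch checks a program given as a list of (variables, k, l) constraints against an assignment; the data layout is an assumption made here.

```python
def satisfied(assignment, constraint):
    """Check one between-k-and-l-out-of-n range constraint.

    assignment -- dict mapping variable names to 0 or 1
    constraint -- (variables, k, l): fulfilled if the number of variables
                  assigned 1 lies in the closed range [k, l]
    """
    variables, k, l = constraint
    ones = sum(assignment[v] for v in variables)
    return k <= ones <= l

# the program of Fig. 2: a 1-out-of-2 constraint on {a, b} and a
# between-0-and-1-out-of-2 constraint on {b, c}
program = [(("a", "b"), 1, 1), (("b", "c"), 0, 1)]
print(all(satisfied({"a": 1, "b": 0, "c": 1}, c) for c in program))  # True
```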

To give a practical example of a larger program, the crossbar switch [37] shown in Fig. 3 can be controlled by the set of variables and constraints shown in Fig. 4. There is a binary variable for each of the switches. The variables control the state of the switch (1 means closed). The horizontal and vertical constraints inhibit a short circuit caused by two switches closed on the same input or output line.

To plan a schedule for closing and opening the switches depending on connection requests, the matrix of variables is replicated for the number of time steps to plan in advance. Constraints are added for the number of time steps k_{e,a} a certain switch e → a must be closed. The first type of constraints is static, while the latter are controlled externally by dynamically changing their value k_{e,a} depending on the requests. It must be admitted that there are simple and efficient algorithms for scheduling this single-stage switch directly. But if a multi-stage switch is considered, with redundant paths and possibly defective switches, the constraint satisfaction approach shows its advantages.
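A hedged sketch of how the static part of such a crossbar program could be generated automatically; the variable naming and the at-most-one reading of the row and column constraints follow the description above, everything else is an assumption for illustration.

```python
def crossbar_constraints(n):
    """Static constraints for an n x n single-stage crossbar switch.

    Variable (e, a) is 1 iff the switch connecting input line e to output
    line a is closed.  Every input line and every output line may carry at
    most one closed switch, which rules out short circuits.
    """
    constraints = []
    for e in range(n):   # between-0-and-1-out-of-n per input line
        constraints.append(([(e, a) for a in range(n)], 0, 1))
    for a in range(n):   # between-0-and-1-out-of-n per output line
        constraints.append(([(e, a) for e in range(n)], 0, 1))
    return constraints

print(len(crossbar_constraints(3)))  # 6 constraints for the 3 x 3 switch of Fig. 3
```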

2.2. IF-THEN rules

There are methods for designing programs using range constraints systematically. Let us first construct a general IF-THEN rule of the form [26]

$$(a_1 \wedge a_2 \wedge \cdots \wedge a_m) \rightarrow (b_1 \vee b_2 \vee \cdots \vee b_n) \qquad (1)$$

Fig. 4. Constraints for scheduling a crossbar switch (only a single demand and the k value for one switch are shown).



Fig. 5. Constraints representation of an IF-THEN rule.

This can be represented as a between-1-and-(m+n)-out-of-(m+n) constraint on the set

$$\{\bar{a}_1, \bar{a}_2, \ldots, \bar{a}_m, b_1, b_2, \ldots, b_n\} \qquad (2)$$

of variables.

The negated variables are auxiliary variables with a 1-out-of-2 constraint on the negated variable ā_i and the original a_i. The resulting constraint is shown in Fig. 5. It works as follows. Let us assume the premise is true, i.e. all variables a_1, a_2, ... have a 1 assigned. Then, all negated variables ā_i must have a 0 assigned. So, at least one of the variables b_1, b_2, ... must have a 1 assigned, which is equivalent to a logical OR of their values. So the conclusion must be true. The same argument can be applied in the opposite direction (all b_1, b_2, ... are assigned 0, resulting in at least one 0 in a_1, a_2, ...).
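The construction just described can be written down mechanically. The following sketch (an illustration, not the author's compiler) emits the 1-out-of-2 negation constraints and the between-1-and-(m+n)-out-of-(m+n) rule constraint for given premise and conclusion variables.

```python
def if_then_constraints(premises, conclusions):
    """Constraints for the rule (a1 AND ... AND am) -> (b1 OR ... OR bn).

    Each premise a gets an auxiliary negated variable "~a" tied to it by a
    1-out-of-2 constraint; the rule itself becomes a single
    between-1-and-(m+n)-out-of-(m+n) constraint on the negated premises
    and the conclusions.
    """
    negated = ["~" + a for a in premises]
    constraints = [((neg, a), 1, 1) for neg, a in zip(negated, premises)]
    rule_vars = tuple(negated) + tuple(conclusions)
    constraints.append((rule_vars, 1, len(rule_vars)))
    return constraints

# (a1 AND a2) -> (b1 OR b2)
for c in if_then_constraints(["a1", "a2"], ["b1", "b2"]):
    print(c)
```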

Note that the view of a_1, a_2, ... as inputs and b_1, b_2, ... as outputs and a processing direction is arbitrary. This direction is not encoded into the constraints, since this programming language is strictly declarative and an explicit distinction between forward- and backward-chaining like in expert system shells is not required.

2.3. Arbitrary boolean constraints

Applying an arbitrary boolean constraint to a set of variables can also be done systematically [27]. The semantics of a boolean expression as a constraint is that the expression must be true for solutions. This is achieved by the following steps.
(1) Convert the given boolean expression into a minimized conjunctive form. A conjunctive form is the logical AND (conjunction) of terms derived by logical OR (disjunction) from the variables or their negations.
(2) Each disjunction of the conjunctive form imposes a between-1-and-n-out-of-n constraint on the n variables in the disjunction.
(3) The negations are auxiliary variables with 1-out-of-2 constraints as explained above.
(4) The conjunction of the disjunctions requires no special treatment, since all constraints have to be fulfilled together.
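Assuming the conjunctive form is already available as a list of clauses, the remaining steps can be sketched as follows; the literal representation and the "~" prefix for auxiliary negated variables are assumptions made for this illustration.

```python
def cnf_constraints(clauses):
    """Translate a conjunctive form into range constraints.

    clauses -- list of disjunctions, each a list of literals (name, positive).
    Every disjunction yields a between-1-and-n-out-of-n constraint; each
    negated literal is replaced by an auxiliary variable "~name" coupled to
    the original by a 1-out-of-2 constraint (added once per variable).
    """
    constraints, negated_done = [], set()
    for clause in clauses:
        members = []
        for name, positive in clause:
            if positive:
                members.append(name)
            else:
                members.append("~" + name)
                if name not in negated_done:
                    negated_done.add(name)
                    constraints.append((("~" + name, name), 1, 1))
        constraints.append((tuple(members), 1, len(members)))
    return constraints

# a single disjunction with one negated literal: a OR (NOT b) OR c
print(cnf_constraints([[("a", True), ("b", False), ("c", True)]]))
```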

An example for the expression a ∨ b ∨ c is shown in Fig. 6. A prototype of a compiler for arbitrary boolean expressions has been constructed [10,27]. It reads in a set of source file lines with boolean expressions and



Fig. 6. Conversion of a boolean disjunction into a constraint.


Fig. 7. Neural Logic Gates (from [19]).

generates a constraint program using the rules described earlier. In addition, some rules have been developed for optimizing the resulting constraint program in the number of constraints or variables [3].

A different approach for representing boolean expressions by k-out-of-n constraints was proposed in [4,18,19,21], the neural logic gates (NLG). This design principle defines macros for logical gates like NAND and NOR (Fig. 7). Using these, any boolean expression can be formulated directly by conversion of the expression structure into a tree of neural logic gates.

2.4. Large-scale problem

But this is not the end of the methodology for the development of large-scale programs. Since all boolean expressions can be represented by constraints, we are able to reuse the whole body of developments in logical switching circuit design, such as binary full adders [4], multiplier structures, and so on. And this opens a new area of large-scale applications.

Since there is no explicit direction of signal processing from the input to the output, we can constrain the output of an N × N bit parallel multiplier to a certain binary coded number P and constrain, for example, the factors A and B to be larger than 1 (Fig. 8). Solving the resulting constraints is equivalent to the


Fig. 8. Constraints for the factorization of numbers.


Fig. 9. The 7 × 7 jigsaw puzzle (from [22]).

factorization of P. If there is no solution, P is prime. So, together with neural constraint satisfaction procedures, we have a new approach to the problem of checking a given number for primality or finding its factors. This result has importance for cryptology and computer security. All details can be found in [29].

Another large-scale problem has been studied, the jigsaw puzzle (Fig. 9, [22,24]). The puzzle consists of a square matrix of N × N positions and a number of different pieces to be placed on the matrix. The pieces are the blank character, the digits 0 up to 9, the algebraic operators +, −, ×, ÷, and the numerical equivalence =.

The problem is that only the number of pieces of each type is

given and a placement with semantically correct equations must be found. The assignment of a piece s to a position x, y can be easily represented by a three-dimensional structure of binary variables z_{x,y,s} as shown in Fig. 10. The required constraints can be grouped in four classes.
(1) Problem specific constraints: enforce the number k_s of pieces of each type s. This is a k_s-out-of-(N × N) constraint for the layer s. This can be controlled by external inputs during runtime to solve different problems. Additionally, there is an at-most-1-out-of-16 constraint for each column to enforce the assignment of not more than one piece per position.


Fig. 10. Binary variables for jigsaw puzzles.


(2) Problem size constraints: make it possible to define a larger working area and to constrain the assignment of pieces to a square in some corner of the area. It is possible to define these constraints in a way that the size N of the active area becomes the k-value of a single k-out-of-n constraint that can be controlled by an external input during runtime.

(3) Syntactic constraints: describe the syntactical structure of equations, i.e. alternating digits and operators, at least one equal sign if there is at least one operator in a row, and some symmetry rules on positioning the blank characters. These constraints can be formulated by replicated patterns of boolean constraints.

(4) Semantic constraints: enforce the correctness of the equations. This requires handling of operator precedence rules, addition, subtraction, multiplication and division. Although this requires a very large number of variables and constraints, it can be formulated with the methods developed.

Constraints of type (1) and (3) have been generated and tested for the problem shown in Fig. 9. Some more details can be found in [29].
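For illustration, the problem specific constraints of class (1) could be generated along the following lines; this is a sketch only, and the piece encoding and counts are placeholders rather than the values used in [29].

```python
def jigsaw_piece_constraints(n, piece_counts):
    """Problem specific constraints (class 1) for the jigsaw puzzle.

    Variable (x, y, s) is 1 iff a piece of type s is placed at position
    (x, y).  Every type s gets a k_s-out-of-(n*n) constraint on its layer,
    and every position gets an at-most-one constraint over all types.
    """
    constraints = []
    for s, k_s in piece_counts.items():
        layer = [(x, y, s) for x in range(n) for y in range(n)]
        constraints.append((layer, k_s, k_s))
    types = list(piece_counts)
    for x in range(n):
        for y in range(n):
            constraints.append(([(x, y, s) for s in types], 0, 1))
    return constraints

# 7 x 7 board as in Fig. 9, with made-up counts for two of the 16 piece types
print(len(jigsaw_piece_constraints(7, {"blank": 13, "=": 6})))  # 2 + 49
```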

2.5. Combinatorial optimization

How to solve combinatorial optimization problems with this programming model is best explained with an example. Assume a placement problem where the least area is asked for in which a given set of pieces can be placed and there are some constraints to avoid the overlapping of pieces.

Firstly, a three-dimensional structure of variables is defined (similar to the jigsaw puzzle) to which the piece and placement constraints are added. Then, the size N of the working area is added as an explicit constraint by enabling the specified number of rows and columns in the variable structure. If this is done by a special design rule that can’t be explained here in detail, a single range constraint allows us to define a lower and upper bound for the area to be used by the placement.

Now optimization is done by repeatedly starting the constraint satisfaction procedure and checking if a solution is found. If this is done by halving the interval of the allowed area from repetition to repetition, this process will converge rapidly to the critical size, which is the optimum looked for. Generally, the idea is to make the value of the score function an explicit constraint on the variables and to tighten the bounds.
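The repeated-start scheme amounts to a bisection over the explicit bound. A minimal sketch, assuming a hypothetical predicate solvable(bound) that wraps one run of the constraint satisfaction procedure:

```python
def minimize_bound(solvable, low, high):
    """Bisection over the explicit score bound.

    solvable(bound) -- hypothetical predicate wrapping one run of the
                       constraint satisfaction procedure with the score
                       (e.g. the working-area size) constrained to <= bound.
    Returns the smallest bound in [low, high] for which a solution was
    found within the time limit, i.e. a (sub)optimal score value.
    """
    best = None
    while low <= high:
        mid = (low + high) // 2
        if solvable(mid):
            best, high = mid, mid - 1    # solution found: tighten the bound
        else:
            low = mid + 1                # no solution: allow a larger area
    return best

# toy stand-in: assume placements exist only for areas of at least 7 units
print(minimize_bound(lambda bound: bound >= 7, 1, 16))  # -> 7
```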

Of course, some comment must be made. The method explained above works only if the constraint satisfaction procedure is fast and can give the definite answer that the problem can or can’t be solved. This is partially true for depth first search, which may be fast for solvable problems and can detect unsolvable problems (but usually after a long time only). But the methods presented below are usually not able to decide the unsolvability. In fact, they may fail even for solvable problems within a certain time limit. So we can’t expect from either method the solution of every combinatorial optimization problem. But suboptimal solutions can be found,


for example those with the smallest working area size that the procedure was able to solve within a given time limit.

3. The implementation of constraint satisfaction procedures

The goals to be considered for the design of appropriate constraint satisfaction procedures can be classified into three groups.
• Hard goals
(G1) appropriateness for any set of range constraints for binary variables
(G2) independence of the rearrangement of variables or constraints
(G3) lack of arbitrary system parameters influencing the solution quality (only speed)
(G4) usage of vector operations only (for high degree parallel realization)
(G5) problem independence (general purpose design)
(G6) scalability to solve large-scale problems
• Soft goals
(G7) faster than depth first search
(G8) find all solutions with equal probability
(G9) detect unsolvable problems
• Probably unreachable goals
(G10) always fast (find a solution of any solvable problem in polynomial time)
These goals have to be met as well as possible by any proposed design.

3.1. State of the art

Let’s first look how the constraint satisfaction procedure could be designed in principle. There are the following five major classes of general methods: (1) systematically guided search (depth first search, breadth first search) (2) algebraic procedures (term rewriting [ 151, calculus) (3) divide and conquer (solve subproblems and glue solutions together) (4) deterministic relaxation into equilibrium (gradient descent, Hopfield network,

others) (5) stochastic search or relaxation (simulated annealing [8,14], genetic algorithms,

others [2,30,35]) Methods from group (1) are traditional tree search algorithms. They can be

easily applied to any set of k-out-of-n constraints in theory, but there arise some practical problems. The first is that the search tree must be pruned and sorted, otherwise the algorithm would waste much time. This can be optimized best by adding the ‘any constraint violated’ pruning rule and the ‘most constrained first’ sorting heuristics to the depth first search as explained in the introduction.

But even using these heuristics, the method becomes too slow for certain large-scale problems. Experimental results with the N-Queens problem coded by k-out-of-n constraints (see Section 3.4) suggest that the computation time of the optimized depth first search is even proportional to an exponential function of N



Fig. 11. Number of steps for first solution in sorted, pruned depth first search for the N-Queens problem.

(Fig. 11). For example, we could not find solutions for N = 80 within 10^9 trials (several days of computation time on a RISC workstation). The same result has been reported by other authors [31].

Algebraic methods (2) use rules to rewrite the expressions of the constraints. The rewriting rules are designed to result in simplified equations with just one variable on the left side and a constant on the right. Unfortunately, there are only efficient algorithms for continuous, linear equations (Gauß elimination) or inequations (SIMPLEX algorithm), but not for discrete (although linear) equations like those resulting from k-out-of-n constraints. So, this method can't be applied at all.

Currently, the author is not aware of any divide and conquer algorithm (3) that splits a given k-out-of-n problem into independent subproblems that can be solved separately. Note that all variables are usually connected by several chains of constraints.

So, only methods (4) and (5) remain and, in fact, components of both will be used in the algorithms presented here.

3.2. Gradient based constraint satisfaction procedures

To apply relaxation methods to constraint satisfaction problems, we have to define a system state and an error function. The system state is naturally determined by the variables of the constraint satisfaction problem. But we have to take into account that the constraint satisfaction problem has discrete variables, while relaxation usually requires continuous variables. So we simply define that the values to be assigned to a variable lie between 0 and 1 inclusive. This extends the search space to the interior of the hypercube spanned by the variables (Fig. 12).

To define an appropriate error function, we have to
(1) convert the between-k-and-l constraints into k-out-of-n constraints,
(2) define a quadratic error function for the constraint satisfaction problem, and
(3) increase the error in the interior of the search space to prefer the assignment of 0 or 1.



Fig. 12. Two-dimensional projection of a search space with four variables.


Fig. 13. Conversion of range constraints using slack variables (from [34]).

The first step can be done by introducing slack variables as shown in Fig. 13. A general between-k-and-l-out-of-n constraint is converted into an l-out-of-(n + l − k) constraint on the n original and l − k additional slack variables [21,25,34]. This method can be used only for constant values of k and l, of course, since they determine the (static) number of slack variables. If one or both limits are to be controlled externally, the scheme shown in Fig. 14 must be used instead.
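The conversion rule can be stated in a few lines of code (a sketch; the slack variable names are arbitrary):

```python
def remove_range(variables, k, l, prefix="slack"):
    """Convert a between-k-and-l-out-of-n constraint into a single
    l-out-of-(n + l - k) constraint by adding l - k slack variables
    (only valid for constant k and l, as noted above)."""
    slacks = [prefix + str(j) for j in range(l - k)]
    return (tuple(variables) + tuple(slacks), l, l)

# the example of Fig. 13: between-3-and-5-out-of-6 becomes 5-out-of-8
print(remove_range(["z1", "z2", "z3", "z4", "z5", "z6"], 3, 5))
```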

A quadratic error function can be defined by first describing a constraint satisfaction problem with k-out-of-n constraints as a discrete, linear equation system with b constraints and v variables [26]

$$\forall i: \; \sum_{\alpha=1}^{v} C_{i\alpha} z_\alpha = k_i \quad \text{with} \quad \forall \alpha: z_\alpha \in \{0, 1\}. \qquad (3)$$


Fig. 14. Conversion of controllable range constraints using a constant number of slack variables.


If all deviations from equality are squared and summed up, the error function

$$\sum_{i=1}^{b} \left( k_i - \sum_{\alpha=1}^{v} C_{i\alpha} z_\alpha \right)^2 \qquad (4)$$

results. This error has the property to be generally positive and to be zero only for

solutions of the constraint satisfaction problem. So, finding a solution is equivalent to finding the zeroes or the global minima of the error. This opens up the application of optimization methods.

Finally, we add the penalty term

$$+ \sum_{\alpha=1}^{v} \sum_{i=1}^{b} C_{i\alpha} z_\alpha (1 - z_\alpha) \qquad (5)$$

which increases the error if any variable has a value in the interval between 0 and 1. This allows us to distinguish between corners and the interior of the search space. The resulting error function is

$$E = \sum_{i=1}^{b} \left( k_i - \sum_{\alpha=1}^{v} C_{i\alpha} z_\alpha \right)^2 + \sum_{\alpha=1}^{v} \sum_{i=1}^{b} C_{i\alpha} z_\alpha (1 - z_\alpha). \qquad (6)$$

Why is it useful to define an error function like this one? We find that the major features of this function are that it is quadratic, and that the negative gradient x_α = −∂E/∂z_α of this quadratic function

$$x_\alpha = \sum_{i=1}^{b} C_{i\alpha}(2k_i - 1) - 2 \sum_{\beta \neq \alpha} \sum_{i=1}^{b} C_{i\alpha} C_{i\beta} z_\beta \qquad (7)$$

is linear in z_β and can be realized by the synapses of a neural network. So we can use a neural network with appropriate weights, as shown in Fig. 18 in Section 4, to calculate the gradient. If our constraint satisfaction procedure uses the gradient of the error to change its state, we can build a neural network that executes this algorithm as a massively parallel processor.
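For reference, Eqs. (6) and (7) can be evaluated directly with vector operations; the following NumPy sketch simply mirrors the formulas (the matrix/vector data layout is an assumption of this illustration, not the hardware realization of Section 4).

```python
import numpy as np

def error_and_gradient(C, k, z):
    """Error function E of Eq. (6) and its negative gradient x of Eq. (7).

    C -- (b, v) 0/1 matrix, C[i, a] = 1 if variable a occurs in constraint i
    k -- (b,) required number of ones per constraint
    z -- (v,) current (possibly continuous) assignment in [0, 1]
    """
    deviation = k - C @ z
    E = np.sum(deviation ** 2) + np.sum(C @ (z * (1.0 - z)))
    G = C.T @ C                                   # common constraints per pair
    x = C.T @ (2 * k - 1) - 2 * (G @ z - np.diag(G) * z)
    return E, x

# the program of Fig. 2 after removing the range constraint with one slack s
C = np.array([[1, 1, 0, 0],      # 1-out-of-2 on {a, b}
              [0, 1, 1, 1]])     # 1-out-of-3 on {b, c, s}
k = np.array([1, 1])
print(error_and_gradient(C, k, np.array([1.0, 0.0, 1.0, 0.0])))  # E = 0 at a solution
```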

Of course, there are many continuous optimization methods based on the gradient (see e.g. [5,8,13,36]). Let us discuss only three of them briefly:
(1) The steepest descent is also known as gradient descent. This method couples the momentary state with the negative gradient by the motion equation

$$\frac{d\vec{z}(t)}{dt} = \vec{x}(t) \qquad (8)$$

which describes that the velocity is proportional to the negative gradient of the error at the actual position. This speed is variable but becomes zero in a minimum. Unfortunately, this method can’t be applied to the error function E at all, since the error becomes negative if the assignment leaves the search space.

(2) A method that overcomes this is the Hopfield descent, which confines the steepest descent to the limits of the search space. Firstly, the change of the


internal state u is damped by a first order lag system, and secondly the assignment z is derived by a nonlinear (sigmoidal) limiting function of the state u. This results in the Hopfield motion equations [12]

$$\frac{du_\alpha}{dt} = x_\alpha - \frac{u_\alpha}{\tau} \qquad (9a)$$

$$z_\alpha = \tfrac{1}{2}\left(1 + \tanh(u_\alpha / u_0)\right) \qquad (9b)$$

with τ > 0 and u_0 > 0. Both methods presented so far have the drawback that they can get stuck at local minima, in which the gradient becomes zero but the error does not vanish.

(3) The Rolling-Stone method [7] is a proposal to overcome this problem. The main drawback of all descent methods is that the system motion comes to a rest at local or global minima. This is generally avoided in the Rolling-Stone method by making the speed constant and only changing the direction of motion into the direction of the gradient. Of course, this method can't stop at a minimum but must continue straight ahead with an ascent until it runs through a maximum and so on. It has been proven [7] that the system must run through all critical points (gradient is zero), which include all minima and maxima, and therefore runs through the global minima of E which correspond to solutions of the constraint satisfaction problem. To apply this method to the quadratic error function E with confined search space, the same nonlinear limitation trick as in the Hopfield descent is used, but with a sinusoidal function. More details can be found in [28].

But there is a different class of methods using a discrete search space and a discrete flow of time. The main idea is firstly to approximate the continuous Hopfield descent by numerical integration. This results in the update rule

$$u_\alpha(t + \Delta t) = (1 - \Delta t/\tau)\, u_\alpha(t) + \Delta t\, x_\alpha(t) \qquad (10)$$

where Δt is the time interval. Secondly, setting Δt = τ = 1 and taking the limit u_0 → 0 results in the discrete Hopfield descent with the update rules [11]

$$u_\alpha(t + 1) = x_\alpha(t) \qquad (11a)$$

$$z_\alpha(t) = \begin{cases} 0 & \text{for } u_\alpha(t) < 0 \\ 1 & \text{for } u_\alpha(t) > 0 \end{cases} \quad \text{(irrelevant if } u_\alpha = 0\text{)} \qquad (11b)$$

To avoid oscillations of this dynamical system, only a single variable may change its state in each time step. The selection of the next variable that may change its state is usually done at random with equal probability. Note that the discrete Hopfield descent has the same major drawback as its continuous counterpart: it gets trapped in local minima.
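One asynchronous time step of this discrete descent then looks as follows (a sketch; the gradient is recomputed explicitly here from Eq. (7), whereas the network of Section 4 obtains it from its synapses).

```python
import numpy as np

def discrete_hopfield_step(C, k, z, rng):
    """One asynchronous step of the discrete Hopfield descent, Eqs. (11a)/(11b):
    a single randomly selected variable adopts the sign of its gradient
    component.  As noted in the text, this rule gets trapped in local minima."""
    G = C.T @ C
    x = C.T @ (2 * k - 1) - 2 * (G @ z - np.diag(G) * z)   # gradient, Eq. (7)
    a = rng.integers(len(z))           # random selection with equal probability
    if x[a] > 0:
        z[a] = 1                       # u_a = x_a > 0  ->  z_a = 1
    elif x[a] < 0:
        z[a] = 0                       # u_a = x_a < 0  ->  z_a = 0
    return z                           # z_a unchanged if x_a == 0

# usage: z = discrete_hopfield_step(C, k, z, np.random.default_rng())
```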

Let’s comment on the fact that the discrete methods also use the gradient which is primarily a continuously defined function. It can be shown that each component -x, of the gradient describes exactly the change of the error if the assignment to


the variable z_α is changed from 0 to 1 and all others are left unchanged. The formula is

$$E\big|_{z_\alpha = 1} - E\big|_{z_\alpha = 0} = -x_\alpha. \qquad (12)$$

Together with rule (11b), this information is enough to reduce the error in every time step as long as the system has not yet found a minimum.

The behaviour of such discrete systems can be explained with a state trajectory jumping between corners of a hypercube. Every state corresponds to a certain error for the given constraint satisfaction problem. Changing a single variable assignment in every time step is equivalent to moving the state along an edge parallel to the coordinate axes. This movement is guided by the update rule to reduce the error until a corner is reached, in which no further reductions are possible. This is a stable point and is either a local or global minimum.

Analysis of the discrete Hopfield network for k-out-of-n constraints has shown that it is practically useless. Two reasons can be identified. Firstly, the random selection of variables to be changed leads to very slow convergence. The other reason lies in the local minima the method may get stuck in. Solutions for both problems are proposed in the dynamic barrier algorithm.

3.3. The dynamic barrier algorithm

The dynamic barrier algorithm consists of four distinct improvements over the discrete Hopfield descent: (1) improved state change selection rule, (2) destabilization of local minima, (3) barrier if a variable has been changed to escape from a local minimum, (4) speed-up by simultaneous state change of several variables.

The first improvement is to select the most active variable as the next one to change its state according to (11a) and (11b). This replaces the random selection rule, which has only a low probability of selecting a variable that reduces the error. For the dynamic barrier method, the activity is defined as

$$a_\alpha = x_\alpha - (2z_\alpha - 1) \sum_{i=1}^{b} C_{i\alpha}. \qquad (13)$$

This value becomes zero for all variables if the system has found a solution, and it approximates the number of constraints violated by the value of z_α. So, the most active variable is the variable that violates a large number of constraints. Because of this, it is best to allow this variable to change its state in the next time step to reduce the error rapidly. If there is a tie, one of the variables is chosen randomly. Note that this is the only randomized component of the dynamic barrier algorithm.

A further improvement could augment this rule and make it even more deterministic. Note that it has been observed in experiments that a simpler definition of a_α = |x_α|, which selects the largest change in error, gives worse results.


The main drawback of the Hopfield descent is that it gets stuck in local minima. Fortunately, it is possible to decide between local and global minima [18] because of the special construction of the error function E. The system is in a minimum if no variable can change its state according to Eq. (11b). If E is zero, then the state is a solution. Otherwise, it is a local minimum. This decision can be vectorized by checking if any a_α is nonzero, which indicates a local minimum, and by reduction using a logical conjunction.

Now, if a local minimum is detected, a destabilization analogous to the Rolling-Stone method is applied. In the dynamic barrier algorithm, the most active variable will be selected as before, but rule (11b) will be violated, i.e. the assignment of the variable will be inverted. This increases the error, of course.

In the next time step, the system would return to the same minimum in most cases. To avoid this, the dynamic barrier has been introduced. This barrier disables the variable that has changed its state for some time steps. In the picture of the hypercube, an obstacle plane is put between the values 0 and 1, perpendicular to the coordinate axis of the variable.

Of course, this barrier has to be removed after some time steps. Otherwise, the system could never reach low error levels. Experiments have shown that the best rule is to use an incremental duration, i.e. starting with one time step for the local minimum. For the next minimum, the barrier lasts two time steps and so on. Note that there is a separate barrier for each variable and several variables may be disabled concurrently.

The last improvement comes from the observation that oscillations in the case of simultaneously applying the update rule to all variables result from overshoot in common constraints. So, if two variables have no constraint in common, they may change their state in a single time step using rule (11b) and the error can't increase due to conflicts. And they can even be changed together if they have a constraint in common but the number of variables with a 1 assigned is much less or much larger than k. From this observation, the fourth improvement results, which uses markers for all variables and works as follows (see also the sketch after this list):
(1) Delete all marks.
(2) For all constraints do:
• Select the most active of the variables from this constraint with the following properties:
  • not marked and
  • enabled (i.e. no barrier) and
  • rule (11b) changes its state (reduces the error) or the system is in a local minimum.
• Mark the selected variable, remove the mark from all others in the constraint.
This algorithm guarantees that there is at most one variable marked in each constraint, and this one is most active and not disabled. It also ensures that there is always at least one variable marked (even in the case of a global minimum). The marked variables change their assignment at the next time step. If the system is in a local minimum, the dynamic barrier is built up for all marked variables.
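The sketch announced above condenses the four improvements into one loop. It follows the description in this section, but the exact bookkeeping (order of barrier decrement, tie breaking, stopping at the first solution instead of signalling solved/continue) is an assumption of this illustration.

```python
import numpy as np

def dynamic_barrier(C, k, z, max_steps=10000, rng=None):
    """Condensed sketch of the dynamic barrier algorithm (stops at the
    first solution instead of signalling solved/continue)."""
    rng = np.random.default_rng() if rng is None else rng
    b, v = C.shape
    G = C.T @ C
    col = C.sum(axis=0)                   # constraints per variable
    barrier = np.zeros(v, dtype=int)      # remaining disabled time steps
    duration = 1                          # incremental barrier duration
    for step in range(max_steps):
        x = C.T @ (2 * k - 1) - 2 * (G @ z - np.diag(G) * z)   # Eq. (7)
        activity = x - (2 * z - 1) * col                        # Eq. (13)
        if not activity.any():
            return z, step                # all constraints satisfied
        reduces = ((z == 0) & (x > 0)) | ((z == 1) & (x < 0))   # rule (11b)
        local_minimum = not reduces.any()
        marked = np.zeros(v, dtype=bool)
        for i in range(b):                # at most one marked variable per constraint
            members = np.flatnonzero(C[i])
            eligible = [a for a in members
                        if not marked[a] and barrier[a] == 0
                        and (reduces[a] or local_minimum)]
            if eligible:                  # most active wins, ties broken at random
                best = max(eligible, key=lambda a: (activity[a], rng.random()))
                marked[members] = False
                marked[best] = True
        z[marked] = 1 - z[marked]         # simultaneous state change
        barrier[barrier > 0] -= 1
        if local_minimum:                 # destabilize and build up barriers
            barrier[marked] = duration
            duration += 1
    return None, max_steps

# toy run on the two constraints of Fig. 2 (with one slack variable s)
C = np.array([[1, 1, 0, 0], [0, 1, 1, 1]])
k = np.array([1, 1])
print(dynamic_barrier(C, k, np.zeros(4, dtype=int)))
```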


Fig. 15. A solution of the 8-Queens problem.

3.4. Simulation and evaluation of the dynamic barrier algorithm

Simulations have been done with the N-Queens problem ([17,31] and others). This well known problem consists of placing N queen figures of the game of chess on an N × N chessboard, so that no two queens can attack each other (Fig. 15). Solutions exist for N = 1 and any N ≥ 4. This problem is often used as a benchmark for search algorithms because finding solutions is not trivial but solvable and the problem can be scaled to any arbitrary size. And it can be formulated easily by using 1-out-of-N constraints for rows and columns and between-0-and-1 constraints for all diagonals of an N × N square of variables [18,34].
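The constraint program for this benchmark can be generated as follows (a sketch of the formulation quoted above; length-1 diagonals are omitted since a between-0-and-1 constraint on a single variable is always fulfilled).

```python
def n_queens_constraints(n):
    """Range constraints for the N-Queens problem: 1-out-of-N for every row
    and column, between-0-and-1 for every diagonal of the N x N square of
    variables (r, c)."""
    constraints = []
    for r in range(n):
        constraints.append(([(r, c) for c in range(n)], 1, 1))   # row r
        constraints.append(([(c, r) for c in range(n)], 1, 1))   # column r
    for d in range(-(n - 1), n):                                  # diagonals r - c = d
        cells = [(r, r - d) for r in range(n) if 0 <= r - d < n]
        if len(cells) > 1:
            constraints.append((cells, 0, 1))
    for d in range(2 * n - 1):                                    # diagonals r + c = d
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        if len(cells) > 1:
            constraints.append((cells, 0, 1))
    return constraints

print(len(n_queens_constraints(8)))   # 16 row/column + 26 diagonal constraints
```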

The result of an optimized depth first search algorithm has been given already in Section 3.1. The standard discrete Hopfield descent could find a solution of the 8-Queens problem only in one of 1000 runs. With the selection rule using the activity, it found a solution in 13 runs. With the dynamic barrier, solutions were found in 972 of 1000 runs and with simultaneous updates in 993 of 1000 runs within 10000 time steps.


Fig. 16. Number of time steps of the dynamic barrier algorithm required to solve the N-Queens problem.



Fig. 17. Distribution of search time of 10000 runs of the dynamic barrier algorithm for the N-Queens problem.

More interesting is the number of time steps required to find the first solution depending on the problem size N, i.e. the empirical complexity. The results for the N-Queens problem are shown in Fig. 16. The vertical bars are the minimum and the maximum of the number of steps for 8 runs for problem sizes between 1 and 200. As can be seen, the median search time grows but stabilizes at about 800 steps, even for large problems with N = 200. This indicates O(1) complexity. In the N = 200 case, the depth first search algorithm was not able to find a solution within 10^9 steps at all.

The distribution h (bin width 10) of the search time (Fig. 17) has shown negative exponential character. The maximum (933 runs) is for the interval 10-19 steps, while the median is 147. This means that a very short search time is common and large search times are rare.

In addition, the distribution of solutions was checked for the 8-Queens problem. It has 92 solutions and the dynamic barrier algorithm found them all. Unfortunately, by using the χ² test, an equal distribution had to be rejected with 5% error probability. This has to be studied in more detail.

Experiments with the factorization problem and the jigsaw puzzle were not successful. Maybe they are in fact problems of the class NP.

3.5. Discussion

What has been achieved? Generally, we have shown that some of the drawbacks of the classical Hopfield

method can be overcome by the Rolling-Stone method or by the dynamic barrier algorithm.

Regarding the design methods, both new methods have been designed systematically by improving the original Hopfield descent method and without specializing to any constraint satisfaction problem.

Of the general goals, the hard goals (G1-G6) are all fulfilled by the dynamic barrier algorithm. This algorithm accepts any between-k-and-l-out-of-n problem description, is independent of the rearrangement of variables or constraints (i.e. the order they are presented), has no inherent system parameters, is not optimized


for a special problem, can be scaled to large problems and uses only vectorizable operations.

Only one of the soft goals has been achieved. The dynamic barrier algorithm is faster than the depth first search (G7) for large-scale N-Queens problems. But it didn’t find solutions with equal probability (G8) and is generally not able to decide between solvable and unsolvable problems (G9). This is an area for further research.

4. The realization as a neurocomputer

The main idea to realize the constraint programming model by a neurocomputer running one of the constraint satisfaction procedures developed in Section 3 is to use a single neuron per variable. If the neuron is active, i.e. fires, the value 1 is assigned to the variable, and if the neuron is inactive, a 0 is assigned. Since neurons are coupled by synapses with variable weight T_αβ (Fig. 18), they can be used to calculate the gradient of the error function for the assignment represented by the neuron activities. The constraint satisfaction procedure is built into the neuron dynamics of u and z. So we have a constraint checker and an assignment generator. Both can be designed separately:
(1) find the appropriate weight factors for a given set of k-out-of-n constraints,
(2) define a neuron dynamic that makes the system state evolve as required by the desired constraint satisfaction procedure.

4.1. Calculation of the error gradient by neural structures

The first problem has been solved earlier by Page and Tagliarini and is the k-out-of-n design rule [23,33]. This rule is as follows:
(1) If two neurons have constraints in common, set the weight T_αβ to −2 times the number of constraints in common:

$$T_{\alpha\beta} = -2 \sum_{i=1}^{b} C_{i\alpha} C_{i\beta} \quad \text{for } \alpha \neq \beta. \qquad (14)$$

This results in a mutual inhibitory feedback. Note that there is no self-feedback (T_αα = 0).
(2) The bias input I_α of the neurons (which can be regarded as a synaptic connection to a constant supply of the signal level 1) is increased from 0 by 2k_i − 1 for each constraint i the neuron belongs to:

$$I_\alpha = \sum_{i=1}^{b} (2k_i - 1)\, C_{i\alpha}. \qquad (15)$$

Note that this feature can also be used for external inputs into the neurocomputer by controlling the k-value of a constraint by a variable input amplitude.
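Written with vector operations, the design rule of Eqs. (14) and (15) is only a few lines; the NumPy formulation below is an illustration, and the variable ordering of the example (a, b, c, s) is an assumption.

```python
import numpy as np

def design_rule(C, k):
    """Weight matrix T (Eq. 14) and bias vector I (Eq. 15) of the
    k-out-of-n design rule, computed with vector operations."""
    G = C.T @ C                  # number of common constraints of each pair
    T = -2 * G
    np.fill_diagonal(T, 0)       # no self-feedback
    I = C.T @ (2 * k - 1)
    return T, I

# constraints of Fig. 2 with one slack variable, variable order a, b, c, s
C = np.array([[1, 1, 0, 0],
              [0, 1, 1, 1]])
k = np.array([1, 1])
print(design_rule(C, k))
```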

An example of the weight matrix resulting from the constraints in Fig. 2 is


Fig. 18. General structure of a recurrent neural network.

Fig. 19. Weight matrix resulting from constraints in Fig. 2.


shown in Fig. 19. Note that a slack variable s_1 has been introduced to remove the range constraint.

The weights, i.e. the constraint structure, can be stored in reprogrammable electrical circuits. The storage devices can be floating gate MOS-FETs (EEPROM), like in the experimental analog neuroprocessor 80170 by Intel [1], or a RAM in digital neurocomputers. In an optoelectronic processor, an exchangeable grey scale transparency can do the weighting (Fig. 20) of the incoherent light emitted at z_α. The sensors at position x_α are used to convert the weighted sums back into electrical signals. The external input signals can be added here.

A different network structure using two layers can be developed. In this case, the program matrix C_iα can be used directly for both weight matrices, having the

Fig. 20. Optical weight matrix.



Fig. 21. Global calculations.

benefit of binary weights (0 or 1) only [20]. Especially for an optical system, the storage may become much simpler. A single black and white transparency should be sufficient by mirroring back the signals at position x_α and combining the light emitters and sensors at z_α. Details of this idea are currently being worked out.

4.2. Definition of the dynamic of neurons

The definition of the dynamic of a single neuron results directly from the motion equations of the constraint satisfaction procedure. The only thing to be kept in mind is the efficient generation of the global signals.

Most global signals are the sum, the maximum, or the logical AND or OR of some local values. So a single party line can sum up the current injected locally, convert the sum into a voltage drop at a single global resistor and the voltage level can be picked up locally at each neuron (Fig. 21). A similar technique can be used


for determining the maximum of a value. For digital signals, wired-OR and wired-AND gates with open-collector outputs are appropriate.

The core part of the neuron can be built up from amplifiers, comparators, flipflops, logic gates, and timers. For example, the Dynamic Barrier Neural Network (DBNN) results from the dynamic barrier neuron circuit shown in Fig. 22. The signals G and S are generated by open-collector circuits and indicate global solutions (E = 0) and stable states respectively.

The Rolling-Stone Neural Network (RSNN) is described in [28].

5. Summary

This contribution has defined a declarative programming model for neurocomputers for solving large-scale constraint satisfaction problems. It has extended and systematized the development of constraint programs with k-out-of-n constraints. It has been shown how a compiler can be constructed. And finally two really large-scale problems, the factorization of numbers and a jigsaw puzzle, have been programmed.

In a second part, the implementation of the programming model with between-k-and-l-out-of-n constraints has been discussed. The proposals are based on the analysis of constraint satisfaction procedures and the requirements for a massively parallel realization without communication overhead. Based on gradient descent methods, the Hopfield approach has been analysed and several of its pitfalls have been solved, including the convergence speed and the handling of local minima.

Two new constraint satisfaction procedures have been presented, the Rolling-Stone method and the dynamic barrier algorithm. The dynamic barrier algorithm has been tested extensively with the benchmark of the N-Queens problem and has been shown to outperform an optimized depth first search algorithm, even without any special features making it especially appropriate for the N-Queens problem. Although the method is not yet able to solve the factorization problem, it is worth doing research for further general improvements, because the general neurocomputer design method developed here has a wide area of application.

In the third and last part, some proposals for the realization of these constraint satisfaction procedures by electrical or optical components have been presented. This results in the neural networks DBNN and RSNN.

Acknowledgements

The author would like to thank Prof. Dr. J. Swoboda for supervising his Dr.-Ing. thesis [29], of which the present contribution is an excerpt, and the anonymous reviewers of the draft of this paper for their valuable comments.


References

[1] 80170NX, Electrically Trainable Analog Neural Network, Experimental Data Sheet, Intel Corporation, 1991.

[2] H.-M. Adorf, Connectionism and neural networks, in: F. Murtagh and A. Heck, eds., Knowledge Based Systems in Astronomy (Springer, Heidelberg, 1989) 215-245.

[3] M. Angermayer, Entwicklung eines heuristischen Minimierungsverfahrens für eine Klasse neuronaler Strukturen, diploma thesis, Lehrstuhl für Datenverarbeitung, Technical University of Munich, 1992.

[4] M. Arai, T. Nakagawa and H. Kitagawa, An approach to automatic test pattern generation using strictly digital neural networks, Proc. IJCNN'92, Baltimore (1992) IV-474-479.

[5] M. Bazaraa, S. Mokthar and C.M. Shetty, Nonlinear Programming - Theory and Algorithms (John Wiley & Sons, New York, 1979).

[6] G.A. Blaauw, Computer architecture, Elektronische Rechenanlagen (4) (1972) 154-159. [7] J. Chao, W. Ratanasuwan and S. Tsujii, A new global optimization method: "Rolling-Stone

Scheme" and its application to supervised learning of multi-layer perceptrons, in: I. Aleksander and J. Taylor, eds., Proc. ICANN'92: Artificial Neural Networks II, Brighton (Elsevier, Amsterdam, 1992) 395-39.

[8] A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Processing (B.G. Teubner/J. Wiley, Stuttgart/Chichester, 1993).

[9] S.B. Eberhardt, T. Daud, D.A. Kerns, T.X. Brown and A.P. Thakoor, Competitive neural architecture for hardware solution to the assignment problem, Neural Networks (4) (1991) 431-442.

[10] K. Ehrenberger, Automatische Umsetzung von booleschen Funktionen in eine neuronale Struktur, diploma thesis, Lehrstuhl für Datenverarbeitung, Technical University of Munich, 1992.

[11] J.J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Nat. Acad. Sci. 79 (1982) 2554-2558.

[12] J.J. Hopfield, Neurons with graded response have collective computational properties like those of two-state neurons, Proc. Nat. Acad. Sci. 81 (1984) 3088-3092.

[13] E.L. Johnson and G.L. Nemhauser, Recent developments and future directions in mathematical programming, IBM Systems J. 31 (1992) 79-93.

[14] P.J.M. van Laarhoven and E.H.L. Aarts, Simulated Annealing: Theory and Applications (Kluwer Academic Publishers, Dordrecht, 1987).

[15] W. Leler, Constraint Programming Languages (Addison-Wesley, Reading, 1988). [16] S. Minton, M.D. Johnston, A.B. Philips and P. Laird, Minimizing conflicts: a heuristic repair

method for constraint satisfaction problems, Artificial Intelligence 58 (1992) 161-205. [17] B.A. Nadel, Representation selection for constraint satisfaction: A case study, IEEE Expert 5 (3)

(1990) 16-25. [18] T. Nakagawa and H. Kitagawa, SDNN: An O(1) parallel processing with strictly digital neural

networks for combinatorial optimization, in: T. Kohonen, K. Mäkisara, O. Simula and J. Kangas, eds., Artificial Neural Networks (Elsevier, Amsterdam, 1991) 1181-1184.

[19] T. Nakagawa, H. Kitagawa, E. Page and G. Tagliarini, SDNN3: A simple processor architecture for O(1) parallel processing in combinatorial optimization with strictly digital neural networks, Proc. IJCNN'91, Singapore (1991) 2444-2449.

[20] T. Nakagawa, K. Murakami and H. Kitagawa, Strictly digital neurocomputer based on a paradigm of constraint set programming for solving combinatorial optimization problems, Proc. ICNN’93, San Francisco (1993) 1086-1091.

[21] T. Nakagawa and K. Murakami, Evaluation of virtual slack-neurons for solving optimization problems in circuit design using neural networks based on the between-1-and-k-out-of-n design rule, Proc. WCNN'93, Portland, OR (1993) 122-125.

[22] J. Nievergelt, Parallel solution of a jigsaw puzzle, in: H. Burkhart, ed., Lecture Notes in Computer Science, Vol. 457 (Springer, Berlin, 1990).

[23] E.W. Page and G.A. Tagliarini, Algorithm development for neural networks, Proc. IEEE SPIE, Vol. 880, High Speed Computing (1988) 11-19.


[24] V. Pleßer, J. Sauerbrey and H.N. Schaller, Ein Workstation-LAN als verteiltes System zum parallelen Lösen von kombinatorischen Suchproblemen, Informationstechnik 5 (Oldenbourg, München, 1992) 273-279.

[25] H.N. Schaller, A collection of constraint design rules for neural optimization networks, in: I. Aleksander and J. Taylor, eds., Proc. ICANN'92: Artificial Neural Networks II, Brighton (Elsevier, Amsterdam, 1992) 1039-1042.

[26] H.N. Schaller, On the problem of systematically designing energy functions for neural expert systems based on combinatorial optimization networks, Proc. Neuro-Nimes'92 (EC2, Nanterre Cedex, 1992) 648-653.

[27] H.N. Schaller and K. Ehrenberger, Defining the attractor of a recurrent neural network by boolean expressions, in: S. Gielen and B. Kappen, eds., Proc. ICANN'93, Amsterdam (Springer-Verlag, London, 1993) 712-715.

[28] H.N. Schaller, Problem solving by global optimization, Proc. IJCNN'93, Nagoya, Japan (IEEE, Piscataway, 1993) 1481-1484.

[29] H.N. Schaller, Entwicklung hochgradig paralleler Rechnerarchitekturen zur Lösung diskreter Belegungsprobleme, Dissertation, Technical University of Munich, 1994.

[30] R. Sosic and J. Gu, Fast search algorithms for the N-Queens problem, IEEE Trans. SMC 21 (6) (1991) 1572-1576.

[31] H.S. Stone and J.M. Stone, Efficient search techniques - An empirical study of the N-Queen problem, IBM J. Res. Development 31 (4) (1987) 464-474.

[32] G.A. Tagliarini, Undesirable equilibria in systematically designed neural networks, Proc. IEEE Southeastcon Region Three Conf., Columbia SC. (1989) 63-67.

[33] G.A. Tagliarini, J.F. Christ and E.W. Page, Optimization using neural networks, IEEE Trans. Comput. 40 (12) (1991) 1347-1358.

[34] G.A. Tagliarini and E.W. Page, Learning in systematically designed networks, Proc. IEEE 1st. IJCNN, Washington (1989) I-497-502.

[35] Y. Takefuji, Neural Network Parallel Computing (Kluwer Academic Publishers, Boston, 1992). [36] A. Törn and A. Zilinskas, Global optimization, in: Lecture Notes in Computer Science, Vol. 350

(Springer-Verlag, Berlin, 1987). [37] T.P. Troudet and S.M. Walters, Neural network architecture for crossbar switch control, IEEE

Trans. CAS 38 (1) (1991) 42-56.

H. Nikolaus Schaller, born in 1963, is 'wissenschaftlicher Assistent' (scientific assistant) at the Chair of Data Processing of the Technical University of Munich. He received the Dipl.-Ing. diploma in 1987 and the Dr.-Ing. degree in 1994, both from the Faculty of Electrical Engineering and Information Technology of the Technical University of Munich. He has conducted research in neural networks, computer architecture and data network management and taught students in these topics since 1988. He is a member of IEEE, GI, and VDE/ITG.