ALGEBRAIC DERIVATION OF NEURAL NETWORKS
AND ITS APPLICATIONS IN IMAGE PROCESSING
by
PINGNAN SHI
B. A. Sc. (Electrical Engineering), Chongqing University, 1982
M. A. Sc. (Electrical Engineering), The University of British Columbia, 1987
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE STUDIES
THE DEPARTMENT OF ELECTRICAL ENGINEERING
We accept this thesis as conforming to the required standard
THE UNIVERSITY OF BRITISH COLUMBIA April 1991
© Pingnan Shi, 1991
In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.
The University of British Columbia Vancouver, Canada
Abstract
Artificial neural networks are systems composed of interconnected simple computing units known as artificial neurons, which simulate some properties of their biological counterparts. They have been developed and studied for understanding how brains function, and for computational purposes.
In order to use a neural network for computation, the network has to be designed in such a way that it performs a useful function. Currently, the most popular method of designing a network to perform a function is to adjust the parameters of a specified network until the network approximates the input-output behaviour of the function. Although some analytical knowledge about the function is sometimes available or obtainable, it is usually not used. Some neural network paradigms exist where such knowledge is utilized; however, there is no systematic method for doing so. The objective of this research is to develop such a method.
A systematic method of neural network design, which we call the algebraic derivation methodology, is proposed and developed in this thesis. It is developed with an emphasis on designing neural networks to implement image processing algorithms. A key feature of this methodology is that neurons and neural networks are represented symbolically, so that a network can be algebraically derived from a given function and the resulting network can be simplified. By simplification we mean finding an equivalent network (i.e., one performing the same function) with fewer layers and fewer neurons. A type of network, which we call LQT networks, is chosen for implementing image processing algorithms. Theorems for simplifying such networks are developed. Procedures for deriving such networks to realize both single-input and multiple-input functions are given.
To show the merits of the algebraic derivation methodology, LQT networks for implementing some well-known algorithms in image processing and some other areas are developed by using the above-mentioned theorems and procedures. Most of these networks are the first known neural network models of their kind; in the cases where other network models are known, our networks have the same or better performance in terms of computation time.
Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgement
A Glossary of Symbols

1 INTRODUCTION

2 BACKGROUND
2.1 Artificial Neural Networks
2.1.1 Neuron Models
2.1.2 Network Models
2.2 Image Processing
2.2.1 Image Enhancement
2.2.2 Image Restoration

3 DEFINITIONS AND NOTATIONS

4 REASONS FOR ALGEBRAIC DERIVATION
4.1 Drawbacks of the Learning Approach
4.2 The Advantages of the Analytical Approach
4.2.1 Designing the Hamming Network
4.2.2 Designing the Parity Network
4.3 The Algebraic Derivation Methodology

5 SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS
5.1 Neuron Models
5.2 LQT Networks

6 COMPUTATIONAL PROPERTIES OF LQT NETWORKS
6.1 Network Equivalence
6.2 Network Quality
6.2.1 Network Depth
6.2.2 Network Size
6.3 Criterion for Network Simplification

7 DERIVATION PROCEDURES
7.1 Realization of SISO Functions
7.2 Realization of MISO Functions

8 NETWORK REALIZATIONS OF IE TECHNIQUES
8.1 Network Realizations of Linear Techniques
8.2 Network Realizations of Non-linear Filters
8.2.1 Dynamic Range Modification
8.2.2 Order Statistic Filtering
8.2.3 Directional Filtering

9 NETWORK REALIZATIONS OF IR TECHNIQUES
9.1 Network Realizations of Linear Filters
9.2 Network Realizations of Non-linear Filters
9.3 Comparison with the Hopfield Network

10 APPLICATIONS IN OTHER AREAS
10.1 Sorting
10.2 Communication
10.2.1 Improvement over the Hamming Network
10.3 Optimization
10.3.1 Solving Simultaneous Equations
10.3.2 Matrix Inversion

11 CONCLUSIONS
11.1 Summary
11.2 Contributions
11.3 Future Work

Appendices
A THE GENERALIZED DELTA RULE
B HOPFIELD NETWORK ALGORITHM

Bibliography
List of Tables

4.1 The Exclusive OR
4.2 Network Parameters for Solving the XOR Problem
4.3 Network Parameters for Solving the Parity-3 Problem
4.4 Network Parameters for Solving the Parity-4 Problem
4.5 Number of Learning Steps
List of Figures

2.1 A Biological Neuron
2.2 An Artificial Neuron
2.3 Typical Activation Functions: (a) Linear; (b) Quasi-linear; (c) Threshold-logic; and (d) Sigmoid
2.4 A Three-layer Back-propagation Network
2.5 The Hopfield Network
2.6 The Hamming Network
2.7 Directional Smoothing Filter
2.8 Color Image Enhancement
3.1 The Schematical Representation of a Neuron
3.2 The Schematical Representation of the Input Neuron
3.3 Three Ways of Setting Up an Input Neuron
3.4 The Back-propagation Network
3.5 The Hopfield Network
3.6 The Hamming Network
4.1 Networks for Solving: (a) the XOR Problem; (b) the Parity-3 Problem; and (c) the Parity-4 Problem
4.2 Realizing Function (4.4)
4.3 Realizing Function (4.5)
4.4 The Network for Solving the Parity Problem
5.1 The Contrast Stretching Function
5.2 Typical LQT Network Topologies
5.3 Representing Part of a Network
5.4 An LQT Network
8.1 Network Architecture for Linear Filtering
8.2 Network Realization of the Order Statistic Function for n = 3
8.3 The Schematical Representation of OSnet
8.4 The Schematical Representation of Adaptive OSnet
8.5 A Max/Median Filter Realized by Using OSNets
8.6 A Max/Median Filter Realized by Using OSNets
8.7 A Network Model of the CS Filter
8.8 A Network Realization of the DA Filter for the Case N = 3
9.1 A Network Realization of the Maximum Entropy Filter
10.1 A Neural Network for Sorting an Input Array of Three Elements
10.2 The Network Model for Implementing the HC Algorithm
Acknowledgement
First of all, I would like to acknowledge the indispensable support of my wife, Angela Hui Peng. She has maintained an environment in which I have been able to devote all my energy to this thesis. Moreover, she did all the drawings in this thesis, which are far better than I could have done.
I would like to thank my supervisor, Professor Rabab K. Ward, for her guidance and support. I am grateful to her for introducing me to this challenging and fascinating world of artificial neural networks, and for providing me the opportunity to explore it.
Special thanks are due to Professor Peter Lawrence for his kindness and recommendation of a book written by P.M. Lewis, which is invaluable for my research.
Thanks are also due to Professor Geoffery Hoffmann, whose courses on neural networks and immune systems gave me a fundamental understanding of the two related fields, and the inspiration to continue my research on the former.
Finally, I would like to thank John Ip, Doris Metcalf, Kevin O'Donnell, Robert Ross,
Qiaobing Xie, Yanan Yin, and many others in this and other departments who have
made my study here pleasant and productive.
A Glossary of Symbols
Notation        Explanation

∈, ∉        Membership relation "belongs to" and its negation "does not belong to"
⊂, ⊃        Strict containment relations
( x1, ..., xn | w1, ..., wn )        Linear neuron
∀        For all
∃        There exists one element
∪, ∩        Union, intersection
∅        Empty set
−        Besides the usual "minus" operator, it also denotes set difference
⟨ x1, ..., xn | w1, ..., wn ⟩        Quasi-linear neuron
A × B        Cartesian product of sets A and B
⟪ x1, ..., xn | w1, ..., wn ⟫        Threshold-logic neuron
{| x1, ..., xn | w1, ..., wn |}        Either one of the three neuron types
f : X → Y        f is a function from set X to set Y
∘        Composition of two functions
N, Z, R, R+, B        Sets of natural numbers, integers, real numbers, positive real numbers including zero, and binary numbers
|X|, |S|, |x|        Cardinality of the set X, length of the sequence S, absolute value of the real number x
□        End of proof
■        End of solution
⇒, ⇔        Implication operators
∧, ∨, ⊕        Logic AND, OR, XOR
x        Vector
‖x‖        L2 norm of vector x
X        Matrix
1        Vector of ones
0        Vector of zeros
0, I        Zero and unit matrices
{ }        Denotes a set; various forms are {xi}, {x | P(x)}, {x1, ..., xn}, etc.
f, g        Original and distorted one-dimensional images
U, V        Original and distorted two-dimensional images
W        Window for processing images
Chapter 1
INTRODUCTION
In recent years artificial neural networks have been successfully applied to solve problems in many areas, such as pattern recognition/classification [60, 70], signal processing [24, 62], control [71, 9], and optimization [35, 37]. The essence of neural network problem solving is that the network performs a function which is the solution to the problem.
Currently, the most popular method of designing a network to perform a function is through a learning process, in which parameters of the network are adjusted until the network approximates the input-output behaviour of the function. Although some analytical knowledge about the function is sometimes available or obtainable, it is usually not used in the learning process. There are some neural network paradigms where such knowledge is utilized; however, there is no systematic method to do so. It is the objective of this thesis to develop such a method.
Many tasks, such as those in image processing, require high computational speed. Examples are real-time TV signal processing and target recognition. Conventional von Neumann computing machines are not adequate to meet the ever-increasing demand for high computational speed [66]. Among non-conventional computing machines, artificial neural networks provide an alternative means to satisfy this demand [19].
The study of artificial neural networks has been inspired by the capability of the
nervous system and motivated by the intellectual curiosity of human beings to understand
themselves.
The nervous system contains billions of neurons which are organized in networks.
A neuron receives stimuli from its neighboring neurons and sends out signals if the cumulative effect of the stimuli is strong enough. Although a great deal about individual neurons is known, little is known about the networks they form. In order to understand the nervous system, various network models, known as artificial neural networks,
have been proposed and studied [78, 52, 33, 12, 21]. These networks are composed of
simple computing units, referred to as artificial neurons since they approximate their
biological counterparts. Although these networks are extremely simple compared with
the complexity of the nervous system, some exhibit similar properties.
In addition to understanding the nervous system, artificial neural networks (referred to as neural networks hereafter) have been suggested for computational purposes. McCulloch and Pitts in 1943 [63] showed that the McCulloch-Pitts neurons could be used
to realize arbitrary logic functions. The Perceptron, proposed by Rosenblatt in 1957
[79], was designed with computational purposes in mind. Widrow proposed in
1960 [95] a network model called Madaline, which has been successfully used in adaptive
signal processing and pattern recognition [96]. In 1984, Hopfield demonstrated that his
network could be used to solve optimization problems [35, 36]. In 1986, Rumelhart and
his colleagues rediscovered the generalized delta rule [80], which enables a type of networks known as the back-propagation networks, or multi-layer perceptrons, to implement
any arbitrary function up to any specified accuracy [30]. These contributions have laid
down the foundation for applying neural networks to solve a wide range of computational
problems, especially those that digital computers take a long time to solve.
From the computational point of view, a neural network is a system composed of
interconnected simple computing units, each of which is identical to the others to some extent. This raises an important question: how can these simple computing units be used collectively to perform complex computations which they cannot perform individually?
This problem can be attacked from two directions: (1) assuming a network model
(usually inspired by certain parts of the nervous system), adjusting its parameters until
it performs a given function; (2) assuming a set of simple computing units, constructing
a network in such a way that it performs a given function. The application of the back-propagation networks is an example of the former approach; the design of the Hopfield network is an example of the latter. For reasons given in Chapter Four, we have chosen the second approach.
Most existing network models were designed for the purpose of understanding how
the brain works. As a result, these models are restricted by the characteristics of the
nervous system. We, however, are mainly concerned with applying some computing
principles believed to be employed in the nervous system, such as collective computing,
massive parallelism and distributed representation, to solve computational problems.
Consequently, the network models we are going to develop shall be restricted only by
today's hardware technology.
Therefore, a more precise definition of the task we are undertaking is in order here. The problem we are concerned with is: assuming a finite supply of simple computing units, how do we organize them in a computationally efficient way such that the network so formed can perform a given function? By simple, we mean that these units can be mass-produced with today's hardware technology. By computationally efficient, we mean that the network so formed can perform the function in a reasonably short period of time. The latter restriction is necessary not only because computational speed is our main concern but also because there exist an infinite number of networks which can perform the same function.
Although a lot can be learnt from the works of researchers such as Kohonen, Hopfield, and Lippmann on the construction of their network models, a systematic method of network design has not yet been developed. They relied mainly on their
imagination and intuition, as well as inspiration from their knowledge of the nervous system. However, to effectively use neural networks as computing machines, a systematic method of network design is very much desired. It is our objective to develop such a method.
In this thesis, we propose a methodology of network design, which consists of the
following five stages:
1. Find the minimum set of neuron models for a given class of functions;
2. Devise symbolic representations for these neurons and their networks;
3. Establish theorems for manipulating these symbols based on the computational
properties of the neurons and their networks;
4. Establish procedures for deriving neural networks from functions;
5. Use these procedures and theorems to derive and simplify network models for specified functions.
We call this methodology algebraic derivation of neural networks.
This methodology is developed with an emphasis on deriving neural networks to
realize image processing techniques. Procedures and theorems for network derivation
are developed. They are used to construct network models for realizing some image
processing techniques and techniques in other areas. The results we have obtained show
that our methodology is effective.
This thesis consists of eleven chapters. Chapter Two provides some background information on neural networks and image processing. The materials in the rest of the
thesis are believed to be new and are organized as follows. Chapter Three gives formal
definitions of neurons, neural networks, and related concepts. Chapter Four elaborates
on the reasons for our approach; we shall show the limitations of learning as used in back-propagation networks and the limitations of the conventional network design method as exemplified by the Hamming network. Chapter Five finds a set of neuron models
which are used to form networks for realizing image processing techniques, and provides
symbolic representations for these neurons and their networks. Chapter Six develops
theorems for network simplification based on some basic computational properties of the
neurons and their networks. Chapter Seven gives procedures of algebraic derivation and
explains them through some examples, which in turn are useful in network derivations
in later chapters. Chapters Eight and Nine show examples of deriving neural networks
to realize image enhancement and restoration techniques respectively. Chapter Ten contains examples of deriving neural networks for realizing some techniques in other areas. Chapter Eleven concludes our work and speculates on future work.
Chapter 2
BACKGROUND
This chapter gives necessary background information on neural networks and image processing. Section 2.1 gives an overview of neural networks with an emphasis on their applications for computing. Concepts of neurons and neural networks are introduced. Some
neuron and neural network models are described. Among the network models proposed in the literature, the back-propagation network, the Hopfield network, and the Hamming network are chosen to be described in more detail, since they are good representatives of their classes and they will be referred to in later chapters. Section 2.2 gives
an overview of problems in image enhancement and restoration and techniques used to
solve them.
2.1 Artificial Neural Networks
People have long been curious about how the brain works. The capabilities of the nervous
system in performing certain tasks such as pattern recognition are far more powerful than
today's most advanced digital computers. In addition to satisfying intellectual curiosity,
it is hoped that by understanding how the brain works we may be able to create machines
as powerful as, if not more powerful than, the brain.
The nervous system contains billions of neurons which are organized in networks.
A neuron receives stimuli from its neighboring neurons and sends out signals if the cu
mulative effect of the stimuli is strong enough. Although a great deal about individual
neurons is known, little is known about the networks they form. In order to understand
the nervous system, various network models, known as artificial neural networks, have
been proposed and studied. These networks are composed of simple computing units,
referred to as artificial neurons since they approximate their biological counterparts.
Work on neural networks has a long history. Development of detailed mathematical
models began more than 40 years ago with the works of McCulloch and Pitts [63], Hebb
[29], Rosenblatt [78], Widrow [95] and others [74]. More recent works by Hopfield [33, 34,
35], Rumelhart and McClelland [80], Sejnowski [82], Feldman [18], Grossberg [27], and
others have led to a new resurgence of the field in the 80s. This new interest is due to the development of new network topologies and algorithms [33, 34, 35, 80, 18], new analog VLSI implementation techniques [64], and some interesting demonstrations [82, 35], as well as a growing fascination with the functioning of the human brain. Recent interest
is also driven by the realization that human-like performances in areas such as pattern
recognition will require enormous amounts of computations. Neural networks provide
an alternative means for obtaining the required computing capacity along with other
non-conventional parallel computing machines.
In addition to understanding the nervous system, neural networks have been suggested
for computational purposes ever since the beginning of their study. McCulloch and
Pitts in 1943 [63] showed that the McCulloch-Pitts neurons could be used to realize
arbitrary logic functions. The Perceptron, proposed by Rosenblatt in 1957 [79], was designed with computational purposes in mind. Widrow proposed in 1960 [95]
a network model called Madaline, which has been successfully used in adaptive signal
processing and pattern recognition [96]. In 1984, Hopfield demonstrated that his network
could be used to solve optimization problems [35, 36]. In 1986, Rumelhart and his
colleagues rediscovered the generalized delta rule [80], which enables a type of networks
known as the back-propagation networks to implement any arbitrary function up to any specified accuracy. In Europe, researchers such as Aleksander, Caianiello, and Kohonen
have made conscious applications of computational principles believed to be employed
in the nervous system [3]. Some commercial successes have been achieved [4]. These
contributions have laid down the foundation for applying neural networks to solve a wide range of computational problems, especially those that digital computers take a long time to solve.
2.1.1 Neuron Models
The human nervous system consists of 10^10 to 10^11 neurons, each of which has 10^3 to 10^4 connections with its neighboring neurons. A neuron cell shares many characteristics
with the other cells in the human body, but has unique capabilities to receive, process,
and transmit electro-chemical signals over the neural pathways that comprise the brain's
communication system.
Figure 2.1 shows the structure of a typical biological neuron. Dendrites extend from
the cell body to other neurons where they receive signals at a connection point called
a synapse. On the receiving side of the synapse, these inputs are conducted to the
body. There they are summed, some inputs tending to excite the cell, others tending to
inhibit it. When the cumulative excitation in the cell body exceeds a threshold, the cell
"fires", sending a signal down the axon to other neurons. This basic functional outline has many complexities and exceptions; nevertheless, most artificial neurons model the above-mentioned simple characteristics.
An artificial neuron is designed to mimic some characteristics of its biological counterpart. Figure 2.2 shows a typical artificial neuron model. Despite the diversity of network
paradigms, nearly all neuron models (except for a few, such as the sigma-pi neuron; see [15]) are based upon this configuration.
In Figure 2.2, a set of inputs labeled (x1, x2, ..., xN) is applied to the neuron. These inputs correspond to the signals into the synapses of a biological neuron. Each signal
Figure 2.1: A Biological Neuron
Figure 2.2: An Artificial Neuron
x_i is multiplied by an associated weight w_i ∈ {w_1, w_2, ..., w_N} before it is applied to the summation block, labeled Σ. Each weight corresponds to the "strength" of a single biological synapse. The summation block, corresponding roughly to the biological cell body, adds up all the weighted inputs, producing an output e = Σ_i w_i x_i. e is then compared with a threshold t. The difference α = e − t is usually further processed by an activation function to produce the neuron's output. The activation function may be a simple linear function, a threshold-logic function, or a function which more accurately simulates the nonlinear transfer characteristic of the biological neuron and permits more
general network functions.
Most neuron models vary only in the forms of their activation functions. Some examples are the linear, the quasi-linear, the threshold-logic, and the sigmoid activation functions.
functions. The linear activation function is
fL(ct) = a
The quasi-linear activation function is
a if a > 0
0 otherwise
The threshold-logic activation function is
1 if a > 0
0 otherwise M<x) =
A n d the sigmoid activation function is
Figure 2.3 shows these four activation functions.
f(cx) A
l
• a
f(a) •
1
a - • a
(b) (c)
f(a) •
(d)
(2.1)
(2.2)
(2.3)
(2.4)
a
Figure 2.3: Typical Activation Functions: (a) Linear; (b) Quasi-linear; (c) Threshold-logic; and (d) Sigmoid.
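To make the neuron model concrete, here is a small sketch in Python of the weighted-sum neuron of Figure 2.2 together with the four activation functions (2.1) to (2.4). The function names and the sample input, weight, and threshold values are our own illustrative choices, not the thesis's notation.

```python
import math

def f_linear(a):
    # Linear activation (2.1): the identity.
    return a

def f_quasi_linear(a):
    # Quasi-linear activation (2.2): passes positive inputs, clamps the rest to 0.
    return a if a > 0 else 0.0

def f_threshold(a):
    # Threshold-logic activation (2.3): fires (outputs 1) when the input is positive.
    return 1 if a > 0 else 0

def f_sigmoid(a):
    # Sigmoid activation (2.4): smooth, saturating nonlinearity.
    return 1.0 / (1.0 + math.exp(-a))

def neuron(x, w, t, f):
    # A generic artificial neuron: weighted sum e, compared with threshold t,
    # and the difference passed through activation function f.
    e = sum(wi * xi for wi, xi in zip(w, x))
    return f(e - t)
```

For example, `neuron([1, 1], [0.5, 0.5], 0.9, f_threshold)` computes e = 1.0, α = 0.1, and fires.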
The threshold-logic and the sigmoid neurons are the most commonly used neuron models among all the models proposed in the literature. The linear neuron model was used in the early days of neural network study. The quasi-linear neuron model is rarely seen in the literature.
The threshold-logic neuron model was used by McCulloch and Pitts [63] to describe
neurons which were later known as McCulloch-Pitts neurons. Rosenblatt later used this
model to construct the well-known Perceptron [79]. Lewis and Coates also used this
model, which they called T-gate, to form networks to implement logic functions [57].
The Hopfield network proposed by Hopfield [33] also uses the threshold-logic neurons as
the primitive processing units.
The sigmoid neuron model was used out of necessity by Rumelhart and his colleagues
to construct what is later known as the back-propagation network [80]. This neuron
model is used for the convenience of back-propagating errors. Variants of this model are
used in networks which are trained by error back-propagation.
The linear neuron model was used by Kohonen and Anderson respectively to construct
networks known as linear associators [53, 5]. Since networks of linear neurons are easy to analyze and their capacities do not increase as the number of layers increases, they are not academically challenging. Nevertheless, the practical use of linear neurons cannot be overlooked. They are important in performing certain tasks such as matrix computations.
Moreover, they are the basis of other more complex neuron models.
The quasi-linear neuron model is used in this thesis as a gating device (see Chapter
Seven). This model can be viewed as the neuron model sometimes referred to as the threshold-logic node (see [58]), but without saturation. The quasi-linear activation function
was used by Fukushima to construct S-cell which is one of the primitive processing units
of his network model (see [21]).
2.1.2 Network Models
Although a single neuron can perform certain pattern detection functions, the power of
neural computing comes from connecting neurons into networks. Larger, more complex
networks generally offer greater computational capabilities. According to their structures,
neural networks are classified into two categories: feedforward and feedback (recurrent) networks.
Feedforward Networks
In feedforward networks, neurons are arranged in layers. There are connections between neurons in different layers, but no connections between neurons in the same layer. The connections are unidirectional. The output of a feedforward network at time k depends only on the network input at time (k − d), where d is the processing time of the network. In other words, a feedforward network is memoryless.

Many neural network models belong to this class; examples are the Perceptron, the back-propagation network, the self-organizing feature map [52], the counter-propagation network [30], the Neocognitron, and the functional-link network [50]. In the following, the back-propagation network is described since it is the most popular in neural network applications.
The back-propagation network was introduced by Rumelhart and his PDP group [80]. The neuron model used in this network is the sigmoid neuron. Figure 2.4 shows a three-layer back-propagation network.
The back-propagation network is used to implement a mapping from a set of patterns
(input patterns) to another set of patterns (output patterns). The set composed of both
sets of input and output patterns is referred to as the training set. The implementation is done by obtaining the parameters (weights and thresholds) of a network through an error-back-propagation training procedure known as the generalized delta rule (see Appendix
Figure 2.4: A Three-layer Back-propagation Network
A).
The training begins by initializing the parameters with random values. Then an
input pattern is fed to the network. The stimulated output pattern is compared with
the desired pattern. If there is disagreement between these two patterns, the difference
is used to adjust the parameters until the output pattern matches the desired pattern.
This process is repeated for all the input patterns in the training set until a convergence
criterion is satisfied. After the training, the network is ready to work.
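The generalized delta rule for multi-layer networks is given in Appendix A. As a minimal illustration of the training loop just described, the sketch below applies the error-driven update to a single sigmoid neuron only, on a made-up AND training set; the learning rate, epoch count, and seed are our own illustrative choices, not values from the thesis.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def train(samples, epochs=2000, eta=0.5, seed=1):
    # Initialize weights and threshold with random values, then repeatedly
    # present each pattern and nudge the parameters against the output error.
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)]
    t = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):
        for x, d in samples:
            y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) - t)
            delta = (d - y) * y * (1.0 - y)   # error times sigmoid derivative
            w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
            t -= eta * delta                  # threshold moves opposite to a bias
    return w, t

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, t = train(AND)
recall = [round(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) - t)) for x, _ in AND]
```

For this linearly separable toy set the single neuron suffices; a genuine back-propagation network extends the same update backwards through one or more hidden layers.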
The back-propagation network has been applied in many areas including pattern
recognition [82, 72, 17, 80] and image processing [84]. It has been found to perform
well in most cases. Nevertheless, it has some practical problems, one of which is that it
usually takes a long period of time to train, especially if 100% accuracy is required.
Feedback Networks
Networks in which there are connections between neurons in the same layer and/or from neurons in later layers to those in earlier layers are called feedback or recurrent networks. Examples are the Hopfield network [33], the Hamming network [58], the ART
networks [12], and the Bidirectional Associative Memories (BAM) [54]. In the following
the Hopfield network and the Hamming network are described in more detail.
Figure 2.5 shows the structure of a discrete Hopfield network. The neurons in such
a network are threshold-logic neurons. The network parameters are obtained through a
simple procedure (see Appendix B).
Figure 2.5: The Hopfield Network
After all the parameters are set up, the network is initialized with an input pattern (normally a binary pattern). Then the output of the network is fed back to its input, and the network iterates until it converges. The output after convergence is the "true" network output.
The Hopfield network was originally used as an associative memory [33]. The network
"memorizes" some patterns which are used as exemplars. When presented with a pattern,
the network will produce the exemplar which most resembles the input pattern.
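The parameter-setting procedure for the Hopfield network is given in Appendix B. The sketch below is an illustrative simulation of the associative-memory behaviour described above, assuming bipolar (±1) states, the standard Hebbian outer-product weights, and synchronous updates; these are common textbook conventions, not necessarily the thesis's exact formulation.

```python
def hopfield_weights(patterns):
    # Hebbian outer-product rule: W[i][j] accumulates p_i * p_j over the
    # stored patterns, with zero self-connections.
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def retrieve(W, state, max_iters=20):
    # Synchronous threshold-logic updates until the state stops changing;
    # the settled state is the recalled exemplar.
    n = len(state)
    state = list(state)
    for _ in range(max_iters):
        new = [1 if sum(W[i][j] * state[j] for j in range(n)) >= 0 else -1
               for i in range(n)]
        if new == state:
            return new
        state = new
    return state
```

Storing one exemplar and probing with a one-bit-corrupted copy, `retrieve` settles back to the stored pattern, which is exactly the associative recall described above.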
The Hopfield network has also been applied to various areas including optimization
[35, 36, 89] and image processing [99, 83].
The Hopfield network is a "pure" feedback network in the sense that there is only one layer of neurons and every neuron can feed its output to every other neuron. There are other networks which are mixtures of feedforward networks and "pure" feedback networks. The Hamming network proposed by Lippmann [58] is an example of such networks.
Figure 2.6 shows the structure of a Hamming network. The upper sub-network has a
feedback structure, and the lower sub-network has a feedforward structure. The neuron
model used in this network is the threshold-logic node, but all the neurons operate
in the linear range. Hence, they are virtually linear neurons.
Figure 2.6: The Hamming Network (the lower subnet calculates matching scores from inputs x_0, x_1, ..., x_{n-1}; the upper subnet picks the maximum)
The Hamming network is used on problems where inputs are generated by selecting an
exemplar and reversing bit values randomly and independently. This is a classic problem
in communication theory, which occurs when binary fixed-length signals are sent through
a memoryless binary symmetric channel. The optimum minimum error classifier in this
case calculates the Hamming distance to the exemplar for each class and selects that
class with the minimum Hamming distance [23]. The Hamming distance is the number
of bits in the input which do not match the corresponding exemplar bits. The Hamming
network implements this algorithm.
For the network shown in Figure 2.6, parameters in the lower subnet are set such
that the matching scores generated by the outputs of the lower subnet are equal to N
minus the Hamming distances to the exemplar patterns. These matching scores range
from 0 to the number of elements in the input. They are highest for those neurons
corresponding to classes with exemplars that best match the input. Parameters in the
upper subnet are fixed. All the thresholds are set to zero. Weights from each neuron to
itself are 1; weights between neurons are set to a small negative value.
After all the parameters have been set, a binary pattern is presented at the bottom
of the Hamming network. It must be presented long enough to allow the outputs of the
lower subnet to settle, and initialize the output values of the upper subnet. The lower
subnet is then removed and the upper subnet iterates until the output of only one neuron
is positive. Classification is then complete and the selected class is that corresponding
to the neuron with a positive output.
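To make the matching-score computation concrete, the following Python sketch (ours, not part of the original network description) computes the score N minus the Hamming distance for each exemplar and picks the class with the highest score, which is the minimum-error rule the Hamming network implements.

```python
def matching_score(x, exemplar):
    """Score of one lower-subnet neuron: N minus the Hamming
    distance between the input and that neuron's exemplar."""
    hamming = sum(1 for a, b in zip(x, exemplar) if a != b)
    return len(x) - hamming

def classify(x, exemplars):
    """Minimum-Hamming-distance classification: the class whose
    exemplar yields the highest matching score."""
    scores = [matching_score(x, e) for e in exemplars]
    return scores.index(max(scores))

# Two 4-bit exemplars; a hypothetical example of our own choosing.
exemplars = [[0, 0, 0, 0], [1, 1, 1, 1]]
```

Here `classify` plays the role of the upper subnet's iteration; in the actual network the maximum is found by mutual inhibition rather than an explicit maximum.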
The Hamming network has also been used as an associative memory, in which role it
has some advantages over the Hopfield network [59].
2.2 Image Processing
Images constitute a large percentage of the information people receive daily. Since images are
often degraded due to imperfections in the recording process, it is necessary to process
these images to reveal the true information they contain.

Image processing includes several classes of problems. Some basic classes are image
representation and modeling, image enhancement, image restoration, image reconstruction,
and image data compression. In this thesis, only image enhancement and image
restoration are considered. Hence, the term image processing hereafter refers
to both image enhancement and image restoration unless specified otherwise.
Image processing techniques may be divided into two main categories: transform-domain
methods and spatial-domain methods. Approaches in the first category
basically consist of computing a two-dimensional transform (e.g. the Fourier or Hadamard
transform) of the image to be processed, altering the transform, and computing the
inverse to yield an image that has been processed in some manner. Spatial-domain
techniques consist of procedures that operate directly on the pixels of the image in question.
In this thesis, both categories are introduced with an emphasis on the latter.
2.2.1 Image Enhancement
Image enhancement techniques are designed to improve image quality for human viewing.
This formulation tacitly implies that an intelligent human viewer is available to recognize
and extract useful information from an image. This viewpoint also defines the human as
a link in the image processing system. Image enhancement techniques include contrast
and edge enhancement, pseudocoloring, noise filtering, sharpening, magnifying, and so
forth. Image enhancement is useful in feature extraction, image analysis, and visual
Chapter 2. BACKGROUND 18
information display. The enhancement process itself does not increase the inherent infor
mation content in the data, but rather emphasizes certain specified image characteristics.
Enhancement algorithms are generally interactive and application-dependent.
Before we start the overview of image enhancement techniques, some notation has
to be explained. U represents the image to be processed and V represents the
enhanced image. The ij-th element of U is denoted as u_ij or u(i,j); similarly, v_ij or v(i,j)
is the ij-th element of V. A window is denoted as W and the window size is denoted as
N_W.
The following is an overview of some image enhancement techniques.
Point Operations
Point operations are zero-memory operations where a given gray level u ∈ [0, L] is mapped
into a gray level v ∈ [0, L] according to a transformation, that is,

    v = f(u)    (2.5)
Contrast Stretching Low-contrast images occur often due to poor or nonuniform
light conditions or due to nonlinearity or small dynamic range of the imaging sensor. A
typical contrast stretching transformation is

    v = { αu,              0 ≤ u < a
        { β(u − a) + v_a,  a ≤ u < b    (2.6)
        { γ(u − b) + v_b,  b ≤ u ≤ L

The slope of the transformation is chosen greater than unity in the region of stretch.
The parameters a and b can be obtained by examining the histogram of the image. For
example, the gray scale intervals where pixels occur most frequently would be stretched
most to improve the overall visibility of a scene.
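As an illustrative sketch (the function name and parameter values are ours), the piecewise-linear transformation (2.6) can be written directly in Python; continuity at u = a and u = b fixes v_a and v_b once the slopes are chosen.

```python
def stretch(u, a, b, alpha, beta, gamma, va, vb):
    """Piecewise-linear contrast stretch of Eq. (2.6)."""
    if u < a:
        return alpha * u             # compress the dark tail
    elif u < b:
        return beta * (u - a) + va   # stretch the mid-range (beta > 1)
    else:
        return gamma * (u - b) + vb  # compress the bright tail

# Stretch the mid-range [50, 200) of an 8-bit scale by a factor 1.2.
a, b, L = 50, 200, 255
alpha, beta = 0.5, 1.2
va = alpha * a              # continuity at u = a
vb = beta * (b - a) + va    # continuity at u = b
gamma = (L - vb) / (L - b)  # map [b, L] onto [vb, L]
```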
Clipping and Thresholding  A special case of contrast stretching where α = γ = 0
is called clipping. This is useful for noise reduction when the input signal is known to lie
in the range [a, b]. Thresholding is a special case of clipping where a = b = t and the output becomes
binary. For example, a seemingly binary image, such as a printed page, does not give
binary output when scanned because of sensor noise and background illumination variations.
Thresholding is used to make such an image binary.
Intensity Level Slicing  This technique permits segmentation of certain gray level
regions from the rest of the image. It is useful when different features of an image are
contained in different gray levels.

If the background is not wanted, the technique is

    v = { L, a ≤ u ≤ b
        { 0, otherwise    (2.7)

Otherwise, it is

    v = { L, a ≤ u ≤ b
        { u, otherwise    (2.8)
Histogram Modeling  The histogram of an image represents the relative frequency
of occurrence of the various gray levels in the image. Histogram-modeling techniques
modify an image so that its histogram has a desired shape. This is useful in stretching
the low-contrast levels of images with narrow histograms. Histogram modeling has been
found to be a powerful technique for image enhancement [41, 20].
Spatial Operations

Many image enhancement techniques are based on spatial operations performed on local
neighborhoods of input pixels. Often, the image is convolved with a finite impulse
response filter called a spatial mask.
Noise Smoothing  It is desirable to remove the noise from a noise-degraded
picture. A typical smoothing technique is so-called spatial averaging, where each
pixel is replaced by a weighted average of its neighborhood pixels, that is,

    v(m,n) = Σ_{(k,l)∈W} a(k,l) u(m−k, n−l)    (2.9)

where W is a suitably chosen window and a(k,l) are the filter weights. A common class of
spatial averaging filters has all equal weights, giving

    v(m,n) = (1/N_W) Σ_{(k,l)∈W} u(m−k, n−l)    (2.10)

where N_W is the number of pixels in the window W, i.e., N_W = |W|. Another spatial
averaging filter used often is given by

    v(m,n) = (1/2)( u(m,n) + (1/4)( u(m−1,n) + u(m+1,n) + u(m,n−1) + u(m,n+1) ) )    (2.11)

that is, each pixel is replaced by the average of itself and the average of its four nearest pixels.
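A minimal NumPy sketch of the equal-weight average (2.10) follows; the function name and the choice to clamp the window at the image border are ours.

```python
import numpy as np

def box_average(u, w=3):
    """Equal-weight spatial average of Eq. (2.10) over a w-by-w window.
    Border windows are clamped to the image, so N_W shrinks there."""
    m, n = u.shape
    r = w // 2
    v = np.empty_like(u, dtype=float)
    for i in range(m):
        for j in range(n):
            i0, i1 = max(i - r, 0), min(i + r + 1, m)
            j0, j1 = max(j - r, 0), min(j + r + 1, n)
            v[i, j] = u[i0:i1, j0:j1].mean()
    return v

# A unit impulse of height 9 is spread evenly over its 3 x 3 window.
impulse = np.zeros((3, 3))
impulse[1, 1] = 9.0
smoothed = box_average(impulse)
```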
Although spatial averaging can smooth a picture, it also blurs the edges. To protect
the edges from blurring while smoothing, a directional averaging filter can be useful [26].
Spatial averages v(m,n : θ) are calculated in several directions (see Figure 2.7) as

    v(m,n : θ) = (1/N_θ) Σ_{(k,l)∈W_θ} u(m−k, n−l)    (2.12)

and a direction θ* is found such that |u(m,n) − v(m,n : θ*)| is minimum. Then

    v(m,n) = v(m,n : θ*)    (2.13)

gives the desired result.
Median Filtering  Median filtering is a non-linear process useful in reducing impulse
or salt-and-pepper noise [90]. Here the input pixel is replaced by the median of the pixels
contained in a window around the pixel, that is,

    v(m,n) = median{ u(m−k, n−l) : (k,l) ∈ W }    (2.14)
Figure 2.7: Directional Smoothing Filter
where W is a suitably chosen window. The algorithm for median filtering requires
arranging the pixel values in the window in increasing or decreasing order and picking the
middle value. Generally the window size is chosen so that |W| is odd. If |W| is even,
then the median is taken as the average of the two values in the middle.
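The following NumPy sketch implements Eq. (2.14), with border windows simply truncated at the image edge (our choice); `np.median` already averages the two middle values when the window count is even, matching the convention above.

```python
import numpy as np

def median_filter(u, w=3):
    """Median filtering of Eq. (2.14) with a w-by-w window."""
    m, n = u.shape
    r = w // 2
    v = np.empty_like(u, dtype=float)
    for i in range(m):
        for j in range(n):
            patch = u[max(i - r, 0):i + r + 1, max(j - r, 0):j + r + 1]
            v[i, j] = np.median(patch)
    return v

# A single salt-noise pixel is removed entirely by the 3 x 3 median.
noisy = np.ones((5, 5))
noisy[2, 2] = 255.0
cleaned = median_filter(noisy)
```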
Unsharp Masking  The unsharp masking technique is commonly used in the printing
industry for crispening edges [75]. A signal proportional to the unsharp, or
low-pass filtered, version of the image is subtracted from the image. This is equivalent to
adding the gradient, or a high-pass signal, to the image. In general the unsharp masking
operation can be represented by

    v(m,n) = u(m,n) + λ g(m,n)    (2.15)

where λ > 0 and g(m,n) is a suitably defined gradient at (m,n). A commonly used
gradient function is the discrete Laplacian

    g(m,n) = u(m,n) − (1/4)( u(m−1,n) + u(m,n−1) + u(m+1,n) + u(m,n+1) )    (2.16)

Low-pass, Band-pass, and High-pass Filtering  Low-pass filters are useful for
noise smoothing and interpolation. High-pass filters are useful in extracting edges and in
sharpening images. Band-pass filters are useful in the enhancement of edges and other
high-pass image characteristics in the presence of noise.
If h_LP(m,n) denotes a FIR low-pass filter, then a FIR high-pass filter, h_HP(m,n),
can be defined as

    h_HP(m,n) = δ(m,n) − h_LP(m,n)    (2.17)

where δ(m,n) is a two-dimensional delta function defined as

    δ(m,n) = { 1, if m = n = 0
             { 0, otherwise    (2.18)
Such a filter can be implemented by simply subtracting the low-pass filter output from its
input. Typically, the low-pass filter would perform a relatively long-term spatial average
(for example, on a 5 x 5, 7 x 7, or larger window).
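Equation (2.17) can be realized by literally adding a delta at the mask origin, as in this NumPy sketch (the function name is ours):

```python
import numpy as np

def highpass_from_lowpass(h_lp):
    """Build the FIR high-pass mask of Eq. (2.17), h_HP = delta - h_LP.
    h_lp is an odd-sized 2-D mask whose origin is at its center."""
    h_hp = -np.asarray(h_lp, dtype=float)
    c = tuple(s // 2 for s in h_hp.shape)
    h_hp[c] += 1.0   # the delta function contributes only at the origin
    return h_hp

# From a 3 x 3 box average: the high-pass mask sums to zero, so it
# rejects constant (zero-frequency) regions.
h_lp = np.full((3, 3), 1.0 / 9.0)
h_hp = highpass_from_lowpass(h_lp)
```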
A spatial band-pass filter can be characterized as

    h_BP(m,n) = h_L1(m,n) − h_L2(m,n)    (2.19)

where h_L1(m,n) and h_L2(m,n) denote the FIRs of low-pass filters. Typically, h_L1 and h_L2 would
represent short-term and long-term averages, respectively.
Zooming  Often it is desired to zoom in on a given region of an image. This requires
taking an image and displaying it as a larger one. Typical techniques of zooming are
replication and linear interpolation [45].

Replication is a zero-order hold where each pixel along a scan line is repeated once
and then each scan line is repeated. This is equivalent to taking an M × N image,
interlacing it by rows and columns of zeros to obtain a 2M × 2N matrix, and convolving
the result with an array H defined as

    H = [ 1  1
          1  1 ]    (2.20)
This gives

    v(m,n) = u(k,l),  k = ⌊m/2⌋, l = ⌊n/2⌋,  m,n = 0,1,2,...    (2.21)

Linear interpolation is a first-order hold where a straight line is first fitted between
pixels along a row. Then pixels along each column are interpolated along a straight line.
For example, for a 2 × 2 magnification, linear interpolation along rows gives

    v1(m,2n)   = u(m,n),                    0 ≤ m ≤ M−1, 0 ≤ n ≤ N−1
    v1(m,2n+1) = (1/2)[u(m,n) + u(m,n+1)],  0 ≤ m ≤ M−1, 0 ≤ n ≤ N−1    (2.22)

Linear interpolation of the preceding along columns gives the result as

    v(2m,n)   = v1(m,n),                    0 ≤ m ≤ M−1, 0 ≤ n ≤ N−1
    v(2m+1,n) = (1/2)[v1(m,n) + v1(m+1,n)], 0 ≤ m ≤ M−1, 0 ≤ n ≤ N−1    (2.23)
Here it is assumed that the input image is zero outside [0, M−1] × [0, N−1]. The
above result can also be obtained by convolving the 2M × 2N zero-interlaced image with
the array

    H = [ 1/4  1/2  1/4
          1/2   1   1/2
          1/4  1/2  1/4 ]    (2.24)
whose origin (m = 0, n = 0) is at the center of the array. In most of the image processing
applications, linear interpolation performs quite satisfactorily. Higher-order (say, p-th order)
interpolation is possible by padding each row and each column of the input image by p rows
and p columns of zeros, respectively, and convolving it p times with H. For example,
p = 3 yields a cubic spline interpolation in between the pixels.
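Replication zooming, Eq. (2.21), is a two-line operation in NumPy; higher-order interpolation would instead interlace zeros and convolve with H as described above. The function name is ours.

```python
import numpy as np

def zoom_replicate(u):
    """2x zoom by replication, Eq. (2.21): v(m,n) = u(floor(m/2), floor(n/2))."""
    return np.repeat(np.repeat(u, 2, axis=0), 2, axis=1)

u = np.array([[1, 2],
              [3, 4]])
v = zoom_replicate(u)   # a 4 x 4 image of 2 x 2 constant blocks
```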
Transform Operations
In the transform operation enhancement techniques, zero-memory operations are performed
on a transformed image followed by the inverse transformation. We start with
the transformed image U′ = {u′(k,l)} as

    U′ = A U A^T    (2.25)

where U = {u(m,n)} is the input image. Then the inverse transform of

    v′(k,l) = f(u′(k,l))    (2.26)

gives the enhanced image as

    V = A^{−1} V′ [A^T]^{−1}    (2.27)

The transform can be the Discrete Fourier Transform (DFT) or another orthogonal transform.
Generalized Linear Filtering  In generalized linear filtering, the zero-memory
transform domain operation is a pixel-by-pixel multiplication

    v′(k,l) = g(k,l) u′(k,l)    (2.28)

where g(k,l) is called a zonal mask.

A filter of special interest is the inverse Gaussian filter, whose zonal mask for N × N
images is defined as

    g(k,l) = { exp{ (k² + l²) / (2σ²) },  0 ≤ k, l ≤ N/2 − 1
             { g(N−k, N−l),               otherwise    (2.29)

for the case when A in (2.25) is the DFT. This is a high-frequency emphasis filter that restores
images blurred by atmospheric turbulence or other phenomena that can be modeled by
Gaussian PSFs.
Root Filtering  The transform coefficients u′(k,l) can be written as

    u′(k,l) = |u′(k,l)| e^{jθ(k,l)}    (2.30)

In root filtering, the α-root of the magnitude component of u′(k,l) is taken, while retaining
the phase component, to yield

    v′(k,l) = |u′(k,l)|^α e^{jθ(k,l)},  0 < α < 1    (2.31)
For common images, since the magnitude of u′(k,l) is relatively smaller at higher spatial
frequencies, the effect of α-rooting is to enhance higher spatial frequencies relative to
lower spatial frequencies.
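A compact NumPy sketch of α-rooting (2.31) using the 2-D DFT; the name `alpha_root` is ours, and no positive constant is added to the magnitude since it is only raised to a power here, not logged.

```python
import numpy as np

def alpha_root(u, alpha=0.5):
    """Root filtering, Eqs. (2.30)-(2.31): take the alpha-root of the
    DFT magnitude, retain the phase, and invert the transform."""
    U = np.fft.fft2(u)
    V = (np.abs(U) ** alpha) * np.exp(1j * np.angle(U))
    return np.real(np.fft.ifft2(V))

u = np.arange(16.0).reshape(4, 4)
```

With alpha = 1 the filter is the identity, which gives a quick sanity check.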
Homomorphic Filtering  If the magnitude term in (2.31) is replaced by the logarithm
of |u′(k,l)|, defining

    v′(k,l) = [ log(|u′(k,l)|) ] e^{jθ(k,l)}    (2.32)

then the inverse transform of v′(k,l), denoted by v(m,n), is called the generalized cepstrum
of the image. In practice a positive constant is added to |u′(k,l)| to prevent the
logarithm from going to negative infinity. The image v(m,n) is also called the generalized
homomorphic transform, H, of the image u(m,n). The generalized homomorphic
linear filter performs zero-memory operations on the H-transform of the image followed
by the inverse H-transform. The homomorphic transformation reduces the dynamic range
of the image in the transform domain and increases it in the cepstral domain [39].
Pseudocoloring
In addition to the requirements of monochrome image enhancement, color image enhancement
may require improvement of color balance or color contrast in a color image.
Enhancement of color images is a more difficult task not only because of the added
dimension of the data but also due to the added complexity of color perception [55].

A practical approach to developing color image enhancement algorithms is shown in
Figure 2.8. The input color coordinates of each pixel are independently transformed into
another set of color coordinates, where the image in each coordinate is enhanced by its
own (monochrome) image enhancement algorithm, which could be chosen suitably from
the foregoing set of algorithms. The enhanced image coordinates T′_1, T′_2, T′_3 are inverse
transformed to R′, G′, B′ for display. Since each image plane T_k(m,n), k ∈ {1,2,3},
Figure 2.8: Color Image Enhancement (coordinate conversion, per-coordinate monochrome image enhancement, inverse coordinate transformation)
is enhanced independently, care has to be taken so that the enhanced coordinates T′_k
are within the color gamut of the R-G-B system. The choice of the color coordinate system
T_k, k ∈ {1,2,3}, in which enhancement algorithms are implemented may be problem-dependent.
2.2.2 Image Restoration
Any image acquired by optical, electro-optical or electronic means is likely to be degraded
by the sensing environment. The degradation may be in the form of sensor noise, blur
due to camera misfocus, relative object-camera motion, random atmospheric turbulence,
and so on. Image restoration is concerned with filtering the observed image to minimize
the effect of degradations. The effectiveness of image restoration filters depends on the
extent and the accuracy of the knowledge of the degradation process as well as on the
filter design criterion. Image restoration techniques are classified according to the type
of criterion used.
Image restoration differs from image enhancement in that the latter is concerned more
with accentuation or extraction of image features rather than restoration of degradations.
Image restoration problems can be quantified precisely, whereas enhancement criteria are
difficult to represent mathematically. Consequently, restoration techniques often depend
only on the class or ensemble properties of a data set, whereas image enhancement
techniques are much more image dependent.
The most common degradation or observation model is
g = H f + n (2.33)
where g is the observed or degraded image, f is the original image, H represents the
degradation, and n is the noise term. The objective of image restoration is to find the best estimate f̂ of the original
image f based on some criterion.
Unconstrained Least Squares Filters
From Equation (2.33), the noise term in the degradation model is given by
n = g - H f (2.34)
In the absence of any knowledge about n, a meaningful criterion is to seek an f such that
H f approximates g in a least-squares sense, that is,
J(f) = ||g-Hf||2 (2.35)
is minimum, where ||x|| is the L2 norm of vector x.
Inverse Filter  Solving Equation (2.35) for f yields

    f̂ = (H^T H)^{−1} H^T g    (2.36)

If H is a square matrix and assuming that H^{−1} exists, Equation (2.36) reduces to

    f̂ = H^{−1} g    (2.37)

This filter is called the inverse filter [26].
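A toy illustration of the least-squares solution (2.36); the small blur matrix and signal are invented for the example.

```python
import numpy as np

# f_hat = (H^T H)^{-1} H^T g, Eq. (2.36), on a made-up 3 x 3 blur.
H = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 1.0]])
f = np.array([2.0, 4.0, 6.0])
g = H @ f                                   # noise-free observation
f_hat = np.linalg.solve(H.T @ H, H.T @ g)   # normal equations
```

With no noise and an invertible H, the least-squares estimate recovers f exactly; in practice n makes the inverse filter noise-sensitive, which motivates the constrained filters below.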
Constrained Least-Squares Filters
In order that the restoration filters have more effect than simple inversion, a constrained
least-squares filter may be developed in which the constraints allow the designer additional
control over the restoration process.

Assuming the norm of the noise signal ||n||² is known or measurable a posteriori from
the image, the restoration problem can be reformulated as minimizing ||Qf||² subject
to ||g − Hf||² = ||n||². By using the method of Lagrange multipliers, the restoration
problem becomes finding f such that

    J(f) = ||Qf||² + α(||g − Hf||² − ||n||²)    (2.38)

is minimum.

The solution to Equation (2.38) is

    f̂ = (H^T H + γ Q^T Q)^{−1} H^T g    (2.39)

where γ = 1/α.

Pseudo-inverse Filter.  If it is desired to minimize the norm of f, that is, Q = I,
then an estimate f̂ is given by

    f̂ = (H^T H + γI)^{−1} H^T g    (2.40)

In the limit as γ → 0, the resulting filter is known as the pseudo-inverse filter [2].
Wiener Filter.  Let R_f and R_n be the correlation matrices of f and n, defined
respectively as

    R_f = E{ f f^T }    (2.41)

and

    R_n = E{ n n^T }    (2.42)

where E{.} denotes the expected value operation.

By defining

    Q^T Q ≜ R_f^{−1} R_n    (2.43)

and substituting this expression into Equation (2.39), we obtain

    f̂ = (H^T H + γ R_f^{−1} R_n)^{−1} H^T g    (2.44)

which is known as the Wiener filter [32].
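The family (2.39)-(2.44) differs only in the matrix added to H^T H before inversion; the following sketch implements the Q = I case (2.40), with the understanding that substituting γ R_f^{−1} R_n would give the Wiener filter (2.44). The function name and toy data are ours.

```python
import numpy as np

def constrained_ls(H, g, gamma):
    """Eq. (2.40): f_hat = (H^T H + gamma * I)^{-1} H^T g."""
    n = H.shape[1]
    return np.linalg.solve(H.T @ H + gamma * np.eye(n), H.T @ g)

H = np.array([[2.0, 1.0],
              [0.0, 1.0]])
f = np.array([1.0, 3.0])
g = H @ f   # noise-free observation
```

At γ = 0 the estimate coincides with the unconstrained least-squares solution; increasing γ shrinks the estimate, trading fidelity for noise suppression.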
Maximum Entropy Filter.  If the image f is normalized to unit energy, then each
scalar value f_i can be interpreted as a probability. Then the entropy of the image f would
be given by

    Entropy = − f^T ln f    (2.45)

where ln f refers to the componentwise natural logarithm, that is,

    ln f = (ln f_1, ln f_2, ..., ln f_N)^T    (2.46)

If the constrained least-squares approach is applied as before, then the negative of the entropy
could be minimized subject to the constraint that ||g − Hf||² = ||n||². Thus the objective
function becomes

    J(f) = f^T ln f − α(||g − Hf||² − ||n||²)    (2.47)

The solution to this equation is

    f̂ = exp{ −1 − 2αH^T(g − Hf̂) }    (2.48)
Bayesian Methods

In many imaging situations, for instance, image recording by film, the observation model
is non-linear:

    g = s{Hf} + n    (2.49)

where s{x} is a componentwise non-linear function of vector x. The Bayes estimation
problem associated with Equation (2.49) is to find an estimate f̂ such that

    max_f p(f|g)    (2.50)

where p(f|g) is the density function of f given g. From Bayes' law, we have

    p(f|g) = p(g|f) p(f) / p(g)    (2.51)

Maximizing the above requires that the a priori probability density on the right-hand side be defined.
Maximum A Posteriori Estimate  Under the assumption of Gaussian statistics
for f and n, with covariances R_f and R_n, the MAP estimate is the solution of maximizing
the following function [6]

    ln p(f|g) = −(1/2)(g − s{Hf})^T R_n^{−1}(g − s{Hf})
                −(1/2)(f − f̄)^T R_f^{−1}(f − f̄) + ln p(g) + K    (2.52)

where K is a constant factor and f̄ is the mean of f. The solution is

    f̂ = f̄ + R_f H^T D R_n^{−1}(g − s{Hf̂})    (2.53)

where D is a diagonal matrix defined as

    D ≜ Diag{ ds(x)/dx |_{x = b_i} }    (2.54)

and b_i are the elements of the vector b = Hf̂. Equation (2.53) is a nonlinear matrix
equation for f̂, and since f̂ appears on both sides, there is a feedback structure as well [42].
Analogies in the estimation of continuous waveforms have been derived for communication
theory problems [91].
Maximum Likelihood Estimate  Associated with the MAP estimate is the
maximum likelihood (ML) estimate, which is derived by assuming that p(f|g) = p(g|f);
that is, the vector f is a nonrandom quantity. Accordingly, Function (2.52) reduces to

    ln p(f|g) = −(1/2)(g − s{Hf})^T R_n^{−1}(g − s{Hf}) + ln p(g) + K    (2.55)

The solution of maximizing the above function satisfies

    H^T D R_n^{−1}(g − s{Hf̂}) = 0    (2.56)

that is,

    g = s{Hf̂}    (2.57)
Chapter 3
D E F I N I T I O N S A N D N O T A T I O N S
The field of neural networks has attracted many people from different disciplines. The
diversity of their backgrounds is reflected in the variety of terminologies. Although some
efforts are being made to address the terminology problem, a standard terminology has
yet to emerge [16]. For the sake of clarity, and for other reasons stated later in this
chapter, we give our definitions of neurons, neural networks, and other related concepts
here.
Definition 3.1 A neuron is a simple computing element with n inputs x_1, x_2, ..., x_n (n ≥
1) and one output y. It is characterized by n + 1 numbers, namely, its threshold t and
the weights w_1, w_2, ..., w_n, where w_i is associated with x_i. A neuron operates on a discrete
time k = 1, 2, 3, 4, ...; its output at time k + 1 is determined by its inputs at time k
according to the following rule

    y(k+1) = f_a( Σ_{i=1}^{n} w_i x_i(k) − t )    (3.1)

where f_a(.) is a monotonic function.

The function of a neuron is to map points in a multi-dimensional space X^n to points
in a one-dimensional space Y, that is,

    f_a : X^n → Y    (3.2)

From the definition, f_a is a composite function

    f_a = f_1 ∘ f_2    (3.3)
Figure 3.1: The Schematic Representation of a Neuron
where

    f_1 : R → Y    (3.4)

and

    f_2 : X^n → R    (3.5)

Function f_2 is a linear function in the sense that

    f_2(λ_1 x_1 + λ_2 x_2) = λ_1 f_2(x_1) + λ_2 f_2(x_2)  ∀λ_1, λ_2 ∈ R    (3.6)

A neuron is defined on the triple (X^n, Y, f), where f is an element of the set F of
activation functions. Let us denote the linear activation function as L, the threshold-logic
activation function as T, and the quasi-linear activation function as Q. A neuron is
schematically represented as shown in Figure 3.1, where A ∈ F.
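The update rule (3.1) is simple enough to state in code; this Python sketch (ours, not part of the definition) uses the threshold-logic activation T as an example.

```python
def neuron_output(weights, threshold, x, f_a):
    """One step of Eq. (3.1): y(k+1) = f_a(sum_i w_i x_i(k) - t)."""
    s = sum(w * xi for w, xi in zip(weights, x)) - threshold
    return f_a(s)

# Threshold-logic activation T: output 1 iff the weighted sum reaches t.
T = lambda s: 1 if s >= 0 else 0
```

With weights (1, 1) and threshold 1.5, such a neuron computes the logical AND of two binary inputs.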
Definition 3.2 A neural network is a collection of neurons, each with the same time
scale. Neurons are interconnected by splitting the output of any neuron into a number of
lines and connecting some or all of these to the inputs of other neurons. An output may
thus lead to any number of inputs, but an input may come from at most one output.

A neuron is denoted as p. The i-th neuron in a network is denoted as p_i. The set of
all the indices which specify the sequence of neurons is denoted as I. For a network of
n neurons, I = {1, 2, ..., n}.
The output of a neuron is called the state of the neuron. The state of p_i at time
k is denoted as s_i(k). In this thesis, we assume that the states of all the neurons in a
network are updated at the same time instant. Such updating is known as synchronous
updating [25]. At k = 0, the network is initialized with appropriate values, and the
network then updates itself until it halts when a certain criterion is met. The function of
a neural network with N inputs and M outputs is to map points in a multi-dimensional
space to points in another multi-dimensional space, that is, it performs the following
function

    f : X^N → Y^M    (3.7)

The input of a network is denoted as x, and the output of a network is denoted as y.
Neurons in a network are classified into three types: input neurons, output neurons,
and hidden neurons. An input neuron is a neuron whose initial state is an element of
the network input x; an output neuron is a neuron whose state at the time when the
network halts is an element of the network output y; a hidden neuron is a neuron which
does not belong to either of the first two classes. A neuron can be both an input and an
output neuron.
The input neuron is somewhat special. For example, in a feedforward network, an
input neuron has only one input line. If the input x ∈ B, then a threshold-logic neuron
can be used as the input neuron; if x ∈ V, then a quasi-linear neuron can be used as
the input neuron; if x ∈ R, then a linear neuron has to be used as the input neuron.
In all cases, a linear neuron can always be used as the input neuron. Therefore, for
the sake of simplicity, input neurons are always linear neurons unless specified otherwise.
An input neuron is schematically shown in Figure 3.2, which represents a linear neuron
with t = 0 and w = 1.
Figure 3.2: The Schematic Representation of the Input Neuron

Figure 3.3: Three Ways of Setting Up an Input Neuron (panels a-d)
There are three ways to set up an input neuron: (1) s(0) = x, s(k > 0) = 0; (2)
s(k ≥ 0) = x; and (3) s(k) = x(k). These three cases are shown in Figure 3.3. To
represent any one of the three settings, an input neuron without the input line is used
(see Figure 3.3.d).
At time k = 0, the input neurons are loaded with the network input, and the rest are
usually loaded with zeros. The computation time T_p of a network is the time between
when the network is initialized and when the solution appears on the output neurons.
Our definitions of neuron and neural network are not arbitrarily chosen. They provide
a unified framework to describe many kinds of neural network models. For instance, a
2 × 2 × 1 back-propagation network can be implemented by our network as shown
in Figure 3.4, where S denotes the sigmoid activation function. At time k = 0, the
input neurons are loaded with network input x, and the rest of the neurons with zeros.
At k = 2, the state of the output neuron is the true output of the network; hence the
computation time of this network is 2. If the input neurons are set up as in Figure 3.3.b,
then s_5(k > 2) = s_5(k = 2).
Feedback networks can also be implemented by our networks. For instance, a four-neuron
Hopfield network is implemented as shown in Figure 3.5.

Figure 3.5: The Hopfield Network

Note that here every neuron is both an input and an output neuron. After the network
is initialized, it updates itself until s_i(h + 1) = s_i(h) ∀i ∈ I = {1, 2, 3, 4}, where h is the
first time this holds. The states of all the neurons at k > h are the true network output;
hence, h is considered the computation time of this network.
Another example is the implementation of a 2 × 2 × 2 Hamming network as shown
in Figure 3.6. In Lippmann's original definition of the Hamming network, there is a control
problem: the lower subnet has to be removed after the upper subnet is initialized
with the output of the lower subnet (see [58]). Since the network itself has no means
to do so, an external mechanism has to be employed, for example, a set of synchronized
switches between the lower subnet and the upper subnet. In our networks, such a control
problem is easily solved by setting up the input neurons as shown in Figure 3.3.a.
Figure 3.6: The Hamming Network
Neurons in our networks are usually numbered in a one-dimensional manner. Nevertheless,
they can be spatially arranged in a two- or higher-dimensional manner as shown in
previous examples. In those cases, the neurons are numbered either from top-to-bottom
and left-to-right (see Figure 3.4), or from left-to-right and bottom-to-top (see Figure 3.5
and Figure 3.6).
There are two kinds of connections from p_i to p_j. One is from p_i directly to p_j, and
the other is from p_i to some other neurons and then to p_j. The former is called a direct
connection, and the latter an indirect connection.

There are also two distinct direct connections between two neurons, say, p_i and p_j,
in a network. One is from the output of p_i to the input of p_j, whose strength or weight
is denoted as w_ji. The other is from the output of p_j to the input of p_i, whose weight is
denoted as w_ij.
The matrix W whose ij-th element is w_ij is called the weight matrix of the network. A
vector whose i-th element is the threshold of p_i is called the threshold vector and is denoted as
t. A vector whose i-th element is the type of activation function of p_i is called the activation
vector and is denoted as a. A neural network is fully defined by the triple (W, t, a),
which specifies what function the network performs.
Neurons in a network are usually grouped in layers. A layer of neurons is a group of
neurons each of which has no direct connections with the others in the same group. Symbol
C_h denotes the set of all neurons belonging to layer h, and C denotes the set of all
possible C_h. From the definitions, we know that the following is true

    Σ_h |C_h| = |I|    (3.8)
The communication time t_c(i,j) from p_i to p_j is defined as the shortest period of
time it takes for data in p_i to reach p_j. For example, in the Hamming network shown in
Figure 3.6, t_c(2,3) = 1 and t_c(2,5) = 2. If there is no connection between two neurons,
then t_c(i,j) = t_c(j,i) = ∞. The communication time t_l(k,h) from layer k to layer h is
defined as

    t_l(k,h) = max{ t_c(i,j) : i ∈ C_k, j ∈ C_h }    (3.9)
A matrix C whose ij-th element is defined as

    c_ij = { 1, if w_ij ≠ 0
           { 0, otherwise    (3.10)

is called the connectivity matrix. The architecture A_p of a network is defined by the connectivity
matrix of the network. The configuration C_p of a network is defined by the triple
(C, t, a). The architecture of a network specifies the structure of the network; there
are two major classes of network architectures, namely, feedforward and feedback. The
configuration of a network specifies what class of functions the network can perform.

The connectivity matrix C of a network contains a great deal of information about
the properties of the network.
Fact 3.1 A neural network has a feedforward architecture iff c_ij ∧ c_ji = 0 ∀i, j ∈ I.

Fact 3.2 A neural network has a feedback architecture iff ∃i, j ∈ I such that c_ij ∧ c_ji = 1.

Fact 3.3 There is a direct connection between p_i and p_j iff c_ij ∨ c_ji = 1.

Fact 3.4 There is an indirect connection between p_i and p_j iff there exists a non-empty
sequence (k_1, k_2, ..., k_n) such that either c_{k_1 j} ∧ c_{k_2 k_1} ∧ ... ∧ c_{k_n k_{n−1}} ∧ c_{i k_n} = 1 or
c_{k_1 i} ∧ c_{k_2 k_1} ∧ ... ∧ c_{k_n k_{n−1}} ∧ c_{j k_n} = 1.
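Fact 3.1 and Fact 3.2 translate directly into a test on the connectivity matrix; the sketch below (ours) checks the stated condition c_ij ∧ c_ji = 0 over all pairs, self-loops included.

```python
def is_feedforward(C):
    """Fact 3.1: feedforward iff c_ij AND c_ji = 0 for all i, j.
    C is the 0/1 connectivity matrix as a list of rows."""
    n = len(C)
    return all(not (C[i][j] and C[j][i])
               for i in range(n) for j in range(n))

chain = [[0, 0],
         [1, 0]]    # p1 feeds p2 only: feedforward
mutual = [[0, 1],
          [1, 0]]   # p1 and p2 feed each other: feedback
```

By Fact 3.2, a network for which `is_feedforward` returns False has a feedback architecture.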
The neural networks of our definition are, mathematically speaking, a class of automata
networks [25]. Yet our networks differ from automata networks in the usual sense in that
their state space is inherently infinite. Moreover, our networks differ from those defined
in the field of automata networks, where neural networks are defined as systems of
McCulloch-Pitts (threshold-logic) neurons (see [7, 25]); such networks are only a subset
of ours. Nevertheless, analytical tools used in the field of automata networks are useful
in the analysis of our networks.
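Facts 3.1 and 3.3 can be checked mechanically from a connectivity matrix. The sketch below is ours, not the thesis's; the function names and the example matrices are illustrative assumptions.

```python
# Checking Facts 3.1 and 3.3 from a connectivity matrix C, where
# C[i][j] = 1 iff w_ij != 0, i.e. neuron p_j feeds neuron p_i.

def is_feedforward(C):
    # Fact 3.1: feedforward iff c_ij AND c_ji = 0 for all i, j
    n = len(C)
    return all(not (C[i][j] and C[j][i])
               for i in range(n) for j in range(n))

def directly_connected(C, i, j):
    # Fact 3.3: a direct connection exists iff c_ij OR c_ji = 1
    return bool(C[i][j] or C[j][i])

# A three-neuron chain p1 -> p2 -> p3 (0-indexed here) is feedforward.
chain = [[0, 0, 0],
         [1, 0, 0],
         [0, 1, 0]]
# Adding the reciprocal connection p2 -> p1 makes it feedback (Fact 3.2).
loop = [[0, 1, 0],
        [1, 0, 0],
        [0, 1, 0]]
print(is_feedforward(chain), is_feedforward(loop))  # True False
```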
The analysis of our networks is certainly an interesting and challenging subject. However,
the main interest of this thesis is the synthesis part, that is, how to map a function
to a network such that the network can perform the function. In other words, we are
interested in finding procedure(s) P such that

    P : A → N    (3.11)

where A is the set of functions and N is the set of triples (W, t, a).
Chapter 4
REASONS FOR ALGEBRAIC DERIVATION
The essence of neural network problem solving is that the network performs a function
which is a solution to the problem. The function can be viewed as a computing rule which
states how to process a given input to obtain the correct output. This rule is coded in the
network architecture, the weights, the thresholds, and the activation functions.
A neural network is a collection of interconnected simple computing units, each of
which can be viewed as a computational instruction. Thus constructing a network to
solve a problem is equivalent to arranging these instructions such that a computing rule
is formed which solves the problem. This is similar to programming in digital computers.
There are basically two ways of "programming" a neural network: (1) analytically deriving
the network architecture and the network parameters from a given computing rule; or
(2) assuming a network architecture and adjusting network parameters until a proper
computing rule is formed. Adjusting network parameters in such a meaningful way is
referred to as learning or training in the field of neural networks [94].
Let us refer to the first approach as the analytical approach, and the second as the
learning approach. Two typical network models represent these two approaches: the
Hamming network [59] represents the former, and the back-propagation network [80]
represents the latter.
The learning approach is necessary when no computing rule is available or at least not
completely available. The role of learning here is to find a computing rule, which solves
the given problem, based upon a partial solution of the problem. When the computing
rule is completely known, the learning approach is still useful in the sense that here the
role of learning is to automate the process of coding the computing rule.
However, there are some practical problems associated with the learning approach,
one of which is that it usually takes a very long period of time to train a network. This
problem is caused by several factors which shall be discussed in the next section.
When the computing rule is known, it is sensible to utilize it in constructing neural
networks. The analytical approach ensures proper network architectures and proper
network parameters.
However, the existing network models designed using the analytical approach are
capable of solving only a narrow range of problems. Moreover, these network models
are more or less hand-crafted, relying heavily on the imagination and intuition of those
who developed them. There is still no systematic method of network design. The
objective of this thesis is to develop such a method.
In the following sections, we shall demonstrate the drawbacks of the learning approach
with the example of training the back-propagation network to solve the parity problem;
we shall also describe the analytical approach in more detail with examples of designing
the Hamming network and of designing a network model to solve the parity problem.
We then discuss the current state of the analytical approach, and propose our method.
4.1 Drawbacks of the Learning Approach
The back-propagation network is the most popular network model used in neural network
applications because it can be trained to implement any function to any specified degree
of accuracy [30]. However, there are some practical problems with its learning process.
The first problem is that learning usually takes a long period of time even for a
small-scale problem. One cause is the back-propagation algorithm itself.
Although many improved versions have been developed [69, 28, 44], the fastest version is
still time consuming. Another cause is that it is difficult to know in advance how many
neurons each layer requires. These "magic" numbers have to be found, at least
for the time being, through trial and error. This means that after days, even weeks, of
training, one has to start the training process all over again if the number of neurons in
the network is not sufficient.
The second problem is that after training a network to implement a certain function,
one rarely gains any insight that would help in training another network to implement a
similar function, e.g., a function which is in the same family as the previous function.
The nature of image enhancement and restoration problems makes the first problem
even more severe. Take the example of implementing a median filter with a window
size of three (the simplest case). Assuming that each pixel takes integer values ranging
from 0 to 255, there are 256^3 = 16,777,216 possible input patterns. In order to
achieve a zero error rate, the training set may have to consist of all the input patterns
and their corresponding correct outputs. Such a training set is astronomically large, let
alone the cases where the window size is bigger. To underscore the complexity of learning,
suppose we had the means to have a neural network learn one pattern every millisecond;
to implement the above median filter, it would take the network about 5 hours to learn
all the patterns. At the same learning rate, implementing a median filter with a window
size of five would take a network about 35 years.
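The arithmetic behind these estimates can be reproduced directly. The sketch below is ours; the one-pattern-per-millisecond learning rate is the stated assumption from the text, and the function name is illustrative.

```python
# Exhaustive training-set size for a median filter over 8-bit pixels,
# and the time to present every pattern once at 1 ms per pattern.

def training_burden(window_size, levels=256, ms_per_pattern=1.0):
    patterns = levels ** window_size          # all possible input windows
    seconds = patterns * ms_per_pattern / 1000.0
    return patterns, seconds

p3, s3 = training_burden(3)
p5, s5 = training_burden(5)
print(p3, s3 / 3600.0)                # 16777216 patterns, about 4.7 hours
print(s5 / (3600.0 * 24 * 365))       # about 35 years for window size 5
```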
The first problem is well documented in the literature [92, 43, 81], hence we shall only
illustrate the second problem here through the example of training the back-propagation
network to solve the parity problem. This problem is impossible for the Perceptron
to solve [65], but is solvable by the back-propagation network [80]; indeed, solving it is
one of the reasons for this network model's fame. We shall start with the XOR problem,
which is the special case of the parity problem with problem size n = 2, and continue
with the cases n = 3 and n = 4.

Figure 4.1: Networks for Solving: (a) the XOR problem; (b) the Parity-3 problem; and (c) the Parity-4 problem.
Example 4.1: The Parity Problem.
The truth table of the Exclusive OR (XOR) is shown in Table 4.1. The network
architecture we have chosen is shown in Figure 4.1. The generalized delta rule (see
Appendix A) is used to train the network. All the patterns in Table 4.1 are included in
the training set. The result after the network has learnt how to solve the XOR problem,
i.e., for every input pattern its output is the same as that in Table 4.1, is shown in
Table 4.2.
Similarly, we carried out the training for n = 3 and n = 4. The network architectures
for solving these problems are also shown in Figure 4.1. The network parameters after
training are shown in Table 4.3 and Table 4.4 respectively. In each table, the number
with p_j at the top and p_i to the left is the value of the weight of the connection from
p_j to p_i; the number with t_i at the top is the value of the threshold of neuron p_i.

Table 4.1: The Exclusive OR

    x1  x2 | z
     0   0 | 0
     0   1 | 1
     1   0 | 1
     1   1 | 0

Table 4.2: Network Parameters For Solving the XOR Problem

          p1          p2
    p3    4.146451   -2.968075
    p4    1.379474   -2.464423

          t3          t4
         -1.889660    0.380817

          p3          p4
    p5   -3.351054    2.622306

          t5
         -1.845504
By looking at the three tables of network parameters, it is difficult to generalize what
the parameters of the network will be for cases where the problem size is five or larger.
Consequently, one has to start from scratch to train networks for these cases.
The number of learning steps required for the network to learn to solve the
parity problem is shown in Table 4.5 for n = 2, 3, 4, 5 and η = 0.10, 0.15, 0.20, 0.25, where
η is the gain term of the generalized delta rule (see Appendix A). Here, a learning step
is defined as the interval during which all the network parameters are updated
Table 4.3: Network Parameters For Solving the Parity-3 Problem

          p1          p2          p3
    p4   -4.396586    5.399827   -3.631473
    p5   -1.965704    2.695202   -0.726498
    p6    1.025570   -0.856012    2.323546

          t4          t5          t6
         -1.486861   -1.788286   -0.119762

          p4          p5          p6
    p7   -4.192873    4.991574   -2.708791

          t7
          0.151018
Table 4.4: Network Parameters For Solving the Parity-4 Problem

          p1          p2          p3          p4
    p5   -4.865602   -6.202526    5.522527   -6.278468
    p6   -3.045859    4.738367   -4.957426    4.048179
    p7   -6.845311    4.469164   -2.124128    4.499167
    p8   -3.836795   -2.465528    2.714290   -2.221879

          t5          t6          t7          t8
         -3.129540    3.778817    2.953945   -3.396108

          p5          p6          p7          p8
    p9   -4.868719    5.105030   -5.718770    6.778777

          t9
          1.817817
Table 4.5: Number of Learning Steps

     n   η = 0.10   η = 0.15   η = 0.20   η = 0.25
     2      9830       6485       4885       4003
     3     30652      20581      15490      12527
     4     94317      80384      69852      75428
     5    170604     120640      89585      67585
once. The learning step is assumed constant, which implies that the updating is done
in parallel. Note that the number of learning steps soon becomes very large as the
problem size increases. As a matter of fact, the so-called loading problem (see [48]),
which concerns how long it takes a neural network to learn every pattern in the training
set with respect to the set size, turns out to be NP-complete [47, 48]. Moreover, for the
back-propagation network to solve the parity problem, the training set may have to
include every possible pattern, which means that the size of the training set is two to
the power of the problem size, i.e., the size of the training set increases exponentially
with the problem size. Therefore, training a network for larger problem sizes not only
means starting all over again, but may be infeasible when the problem size is fairly large.
Finally, the network architectures shown in Figure 4.1 were chosen because we knew
they work from experiments done by other researchers [80]. Otherwise, additional work
would have been required to find the right network architectures.
4.2 The Advantages of The Analytical Approach
The analytical approach is demonstrated with examples of designing the Hamming
network and of designing a network model to solve the parity problem.
4.2.1 Designing the Hamming Network
A classic problem in communication theory is the following: given a set of binary patterns,
called exemplars, determine, for a given binary input pattern, which exemplar is closest to
it. This problem occurs when binary fixed-length signals are sent through a memoryless
binary symmetric channel. The optimum minimum-error classifier in this case calculates
the Hamming distance to the exemplar for each class and selects the class with the
minimum Hamming distance [23]. The Hamming distance is the number of bits in the
input which do not match the corresponding exemplar bits. The Hamming network
developed by Lippmann [58] implements this algorithm.
Let us denote the input pattern and the exemplar as v and e respectively. Each
element of both vectors is binary and belongs to the set {+1, -1}. Suppose there are
M exemplars {e_1, e_2, ..., e_M}; then the Hamming distance between an input pattern and
the i-th exemplar is

    d_i = (1/2)(N - v · e_i)    (4.1)

where N is the number of elements in each vector and v · e_i is the inner product of the two
vectors.
Since minimizing d_i is equivalent to maximizing b_i = N - d_i = (1/2)(N + v · e_i), the
optimum minimum-error algorithm can be restated as follows:

1. Calculate b_i ∀i ∈ {1, 2, ..., M};

2. Find the index k such that b_k = max{b_i : i ∈ {1, 2, ..., M}};

3. Output the exemplar vector e_k.
To implement this algorithm, the Hamming network is structured in two subnets: the
lower subnet calculates bi (the i t h matching score) and the upper subnet finds the index
k. Note that the Hamming network does not implement the third step.
In the lower subnet, one neuron is enough to calculate each b_i, by setting the weights
to w_i = (1/2)e_i, the threshold to θ_i = -N/2, and the activation function to

    f(a) = u if a > u
    f(a) = a if u ≥ a ≥ 0    (4.2)
    f(a) = 0 otherwise

It is assumed that this neuron operates in its linear range, that is, it is essentially a linear
neuron. The lower subnet consists of M such neurons.
For the upper subnet, Lippmann used a feedback network known as a lateral inhibition
network or winner-take-all network, which mimics the heavy use of lateral inhibition
evident in the biological neural networks of the brain [49]. The weights in this subnet
are set as follows:

    w_kl = 1 if k = l
    w_kl = -ε if k ≠ l    (4.3)

where 0 < ε < 1/M. All thresholds are set to zero. Neurons in this subnet have the
same activation function as in (4.2) and also operate in the linear range. After this
subnet converges, only one neuron's output is positive and the rest are zero. The neuron
with the nonzero output corresponds to the exemplar which has the minimum Hamming
distance to the input pattern.
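The two subnets can be sketched in a few lines. This is our own illustrative implementation of the design just described, not code from the thesis; the function names and the choice ε = 1/(2M) are assumptions.

```python
# Hamming network sketch: the lower subnet computes the matching scores
# b_i = (N + v . e_i) / 2 via weights e_i / 2 and threshold -N/2; the
# upper winner-take-all subnet iterates lateral inhibition until only the
# best-matching neuron's output remains positive.

def f_act(a, u=float("inf")):
    # activation (4.2): clamp below at 0 and above at u
    return max(0.0, min(a, u))

def hamming_classify(v, exemplars, eps=None, max_iter=1000):
    M, N = len(exemplars), len(v)
    eps = eps if eps is not None else 1.0 / (2 * M)   # needs 0 < eps < 1/M
    # lower subnet: one linear-range neuron per exemplar
    y = [f_act(sum(0.5 * ej * vj for ej, vj in zip(e, v)) + N / 2.0)
         for e in exemplars]
    # upper subnet: y_k <- f(y_k - eps * sum of the other outputs)
    for _ in range(max_iter):
        total = sum(y)
        y_new = [f_act(y[k] - eps * (total - y[k])) for k in range(M)]
        if y_new == y:          # converged: at most one positive output
            break
        y = y_new
    return max(range(M), key=lambda k: y[k])   # index of the survivor

exemplars = [(1, 1, 1, 1), (1, 1, -1, 1), (-1, -1, -1, -1)]
print(hamming_classify((1, 1, -1, 1), exemplars))   # index of exact match
```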
4.2.2 Designing the Parity Network
Now let us use the analytical approach to construct a neural network which solves the
parity problem. A function which solves this problem outputs 1 if there is an odd
number of 1's in the input pattern, and 0 otherwise. This function is the so-called
parity function.
Let

    z_i = 1 if Σ_{j=1}^N x_j ≥ i, and z_i = 0 otherwise, ∀i ∈ {1, 2, ..., N}    (4.4)

Then the following function

    y = Σ_{i=1}^N (-1)^{i-1} z_i    (4.5)

is equivalent to the parity function in the sense that the input-output behaviours are the
same.
z_i can be realized by the threshold-logic neuron shown in Figure 4.2, and y can be
realized by the threshold-logic neuron shown in Figure 4.3. Therefore, the whole network
can be built as shown in Figure 4.4.
Figure 4.2: Realizing Function (4.4)

Figure 4.3: Realizing Function (4.5)

Figure 4.4: The Network for Solving the Parity Problem
As shown by this and the previous example, one can easily derive a network using the
analytical approach. Note that both the network architecture and the network parameters
are derived simultaneously.
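The derived parity network can be verified exhaustively. The sketch below is ours: it evaluates equations (4.4) and (4.5) directly (in the network itself, the final sum sits in the output neuron of Figure 4.3).

```python
from itertools import product

def parity_network(x):
    # equation (4.4): z_i fires when at least i of the inputs are 1
    N, s = len(x), sum(x)
    z = [1 if s >= i else 0 for i in range(1, N + 1)]
    # equation (4.5): alternating weights +1, -1, +1, ... on the z_i
    return sum((-1) ** i * zi for i, zi in enumerate(z))

# exhaustive check against the parity function for small problem sizes
for N in (2, 3, 4, 5):
    assert all(parity_network(x) == sum(x) % 2
               for x in product((0, 1), repeat=N))
print("parity network matches the parity function")
```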
The advantages offered by the analytical approach are as follows: (1) no training is
needed; (2) the network architecture derived is proper, if not optimal; and (3) the role
of every individual neuron is completely understood.
Although a lot can be learnt from the work of researchers such as Hopfield, Kohonen,
and Lippmann on the construction of their network models, they have not yet developed
a systematic method of network design. They relied mainly on their imagination,
intuition, and inspiration from their knowledge of the nervous system. However, to use
neural networks effectively as computing machines, a systematic method of network
design is very much needed.
4.3 The Algebraic Derivation Methodology
Here we propose a methodology of network design, which consists of the following five
stages:
1. Find the minimum set of neuron models for a given class of functions;
2. Devise symbolic representations for these neurons and their networks;
3. Establish theorems for manipulating these symbols based on the computational
properties of the neurons and their networks;
4. Establish procedures for deriving neural networks from functions;
5. Use these procedures and theorems to derive and simplify network models for
specified functions.
During our study of network design, we found it very useful to represent these units
and their networks symbolically. The symbols can then be manipulated to yield proper
network models. Because symbol manipulation is an important feature of our
methodology, we call it algebraic derivation of neural networks.
The approach of symbolically representing computing units and manipulating these
symbols has been used before, in [57]. However, that work was concerned only with
designing networks of threshold-logic neurons to realize logical functions. Here, we are
concerned with designing networks of diverse neuron types to realize more complex
functions.
Chapter 5
SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS
Our task is to map functions or algorithms to neural networks. As required by the
algebraic derivation methodology, the first step is to determine a minimum set of neuron
models; the second step is to devise symbolic representations for the chosen neuron
models and their networks. These two steps are completed in this chapter.
5.1 Neuron Models
Since we restrict the choice to the class of neuron models defined in Chapter Three, the
task in the first step is reduced to deciding what types of activation functions to use.
Because of practical restrictions, these functions have to be as simple as possible.
Many techniques in image enhancement and restoration involve a weighted summation
of neighboring pixels, the sum then being passed to a function to determine the value of
the processed pixel (see Section 2.2). This function is usually piecewise linear. An example
is the function used in the contrast stretching technique, shown in Figure 5.1. Denote
the contrast stretching function by f_C(x). We can write another function as

    f(x) = α f_Q(x) + β f_Q(x - a) + γ f_Q(x - b)    (5.1)

where the functions f_Q(·) are shown in Figure 5.1 as dotted lines. By proper choice of the
coefficients, we can ensure that

    f(x) = f_C(x)  ∀x ∈ [0, c]    (5.2)
Figure 5.1: The Contrast Stretching Function
Therefore, we can rewrite many functions used in image processing as sums of several
quasi-linear functions; hence, the quasi-linear activation function is chosen. A linear
activation function is also needed, to form such weighted sums (see equation (5.3) below).
Many logic operations are employed in image processing techniques; therefore, the
threshold-logic activation function is also required.

Since the functions used in image processing techniques can be expressed as compositions
of these three functions, we choose them to form the minimum set of activation functions.
Because none of these functions can be realized by combinations of the other two, this
set is indeed minimal.
The neuron models we have just chosen are the linear, quasi-linear, and threshold-logic
neurons, referred to as LQT neurons. They are the simplest among all the neuron
models proposed in the literature, and hence should pose minimal difficulty in hardware
implementation. In Chapter Seven, a formal proof is given which shows that any image
processing technique can be realized by networks of these neurons.
Note that

    f(x) = f_L(f_Q(x), f_Q(x - a), f_Q(x - b))    (5.3)

where f_L is a linear function.
For the convenience of algebraic derivation, symbols are used to represent LQT neurons.
The linear neuron is denoted as

    y = ⊏ x1, x2, ..., xn | w1, w2, ..., wn ⊐^t    (5.4)

The quasi-linear neuron is denoted as

    y = < x1, x2, ..., xn | w1, w2, ..., wn >^t    (5.5)

The threshold-logic neuron is denoted as

    y = ⌐ x1, x2, ..., xn | w1, w2, ..., wn ¬^t    (5.6)

where x_i represents the i-th input to the neuron, w_i the weight associated with x_i, and
t the threshold of the neuron. To represent any one of the LQT neurons, the following
notation is used:

    y = {| x1, x2, ..., xn | w1, w2, ..., wn |}^t    (5.7)

If w_i = 1 ∀i ∈ {1, 2, ..., n}, then the neuron is denoted as

    y = {| x1, x2, ..., xn |}^t    (5.8)

Sometimes it is more convenient to use the vector representations

    y = {| x | w |}^t    (5.9)

or

    y = {| x̄ | w̄ |}^t    (5.10)

where x = x̄ = (x1, x2, ..., xn) and w = w̄ = (w1, w2, ..., wn).
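For concreteness, the three neuron types can also be written out as functions. This is our own sketch, not the thesis's notation: each neuron forms the weighted sum minus its threshold and applies its activation.

```python
def linear_neuron(x, w, t):
    # linear: output the weighted sum minus the threshold, unchanged
    return sum(wi * xi for wi, xi in zip(w, x)) - t

def quasi_linear_neuron(x, w, t):
    # quasi-linear: clamp negative activations to zero
    a = linear_neuron(x, w, t)
    return a if a > 0 else 0.0

def threshold_logic_neuron(x, w, t):
    # threshold-logic: fire (output 1) iff the activation is positive
    return 1 if linear_neuron(x, w, t) > 0 else 0

x, w = [1.0, 2.0], [0.5, -1.0]
print(linear_neuron(x, w, -2.0))           # 0.5 - 2.0 + 2.0 = 0.5
print(quasi_linear_neuron(x, w, 0.0))      # -1.5 clamped to 0.0
print(threshold_logic_neuron(x, w, -2.0))  # 0.5 > 0, so 1
```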
5.2 LQT Networks
A network composed of LQT neurons is called a LQT network. All the networks
derived in this thesis are such networks.

Three typical topologies of LQT networks are illustrated in Figure 5.2. The networks
in Figure 5.2.a and Figure 5.2.b are feedforward, and the network in Figure 5.2.c is
feedback. Note that the network shown in Figure 5.2.b differs from conventional
feedforward neural networks in that some neurons in the third layer take inputs from the
input neurons.
Figure 5.2: Typical LQT Network Topologies
The notation for a network is similar to that for a neuron. The network input is denoted
by x and the network output by y. The output of the i-th neuron is denoted by μ_i.
The notation for the network shown in Figure 5.2.a is

    y = ⊏ < ⌐x1, x2 | w31, w32¬^t3, ⌐x1, x2 | w41, w42¬^t4 | w53, w54 >^t5,
          < ⌐x1, x2 | w31, w32¬^t3, ⌐x1, x2 | w41, w42¬^t4 | w63, w64 >^t6
          | w75, w76 ⊐^t7    (5.11)
Note that the input neurons are not explicitly expressed here. When all the variables
in the notation are independent, the notation represents a whole network. Part of a
network can also be represented, such as

    y = ⊏ μ5, μ6 | w75, w76 ⊐^t7    (5.12)

which represents the output neuron, and

    y = ⊏ < μ3, μ4 | w53, w54 >^t5, μ6 | w75, w76 ⊐^t7    (5.13)
which represents the part of the network shown in dashed lines in Figure 5.3.

Figure 5.3: Representing Part of a Network

The notation for the network shown in Figure 5.2.b is

    y = ⊏ < x1, ⌐x1, x2 | w31, w32¬^t3, ⌐x1, x2 | w41, w42¬^t4 | w51, w53, w54 >^t5,
          < ⌐x1, x2 | w31, w32¬^t3, ⌐x1, x2 | w41, w42¬^t4, x2 | w63, w64, w62 >^t6
          | w75, w76 ⊐^t7    (5.14)
Another way of representing a whole network is

    y = ⊏ μ5, μ6 | w75, w76 ⊐^t7
      { μ5 = < x1, μ3, μ4 | w51, w53, w54 >^t5
        μ6 = < μ3, μ4, x2 | w63, w64, w62 >^t6    (5.15)
      { μ3 = ⌐ x1, x2 | w31, w32 ¬^t3
        μ4 = ⌐ x1, x2 | w41, w42 ¬^t4

where the symbol { marks one layer of neurons. Note that input neurons are not explicitly
expressed in this form either.

The form used in equations (5.11) and (5.14) is called the compact form, and the form
used in (5.15) is called the layered form. The layered form is easier to comprehend.
Sometimes a neuron can be grouped into any one of several layers. An example is
shown in Figure 5.4, where the neuron with threshold t7 can be grouped either into the
second or into the third layer. To avoid ambiguity, in this thesis such neurons are always
grouped into the layer closest to the output. In this case, the neuron with threshold t7
is grouped into the third layer, that is, the network's layered-form representation is

    y = ⊏ μ5, μ6, μ7 | w85, w86, w87 ⊐^t8
      { μ5 = ⌐ μ3, μ4 | w53, w54 ¬^t5
        μ6 = ⌐ μ3, μ4 | w63, w64 ¬^t6
        μ7 = ⌐ x2 | w72 ¬^t7    (5.16)
      { μ3 = ⌐ x1, x2 | w31, w32 ¬^t3
        μ4 = ⌐ x1, x2 | w41, w42 ¬^t4
The symbolic representation of a feedback network differs from that of a feedforward
network. Since feedback networks are dynamic, in the sense that they have to iterate
many times, an indication of the iterations has to be included in the representation.
Figure 5.4: A LQT network

The representation of the feedback network shown in Figure 5.2.c is

    y1(h+1) = ⌐ y1(h), y2(h), y3(h), y4(h) | w11, w12, w13, w14 ¬^t1
    y2(h+1) = ⌐ y1(h), y2(h), y3(h), y4(h) | w21, w22, w23, w24 ¬^t2    (5.17)
    y3(h+1) = ⌐ y1(h), y2(h), y3(h), y4(h) | w31, w32, w33, w34 ¬^t3
    y4(h+1) = ⌐ y1(h), y2(h), y3(h), y4(h) | w41, w42, w43, w44 ¬^t4
where y_i(h) is the i-th element of the network output at the h-th iteration. The vector
y(0) = (y1(0), y2(0), y3(0), y4(0))^T is the network input. Note that the neurons in this
network are both input and output neurons.
Another representation of the above feedback network is

    y(h+1) = ⌐ y(h) | W ¬^t    (5.18)

where

    y(h) = (y1(h), y2(h), y3(h), y4(h))^T    (5.19)

and

    t = (t1, t2, t3, t4)^T    (5.20)

and W is the matrix composed of the w_ij's.
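Equation (5.18) can be simulated directly. The sketch below is ours; the matrix W, the thresholds t, and the mutual-inhibition values are illustrative assumptions, not parameters from the thesis.

```python
# Iterate y(h+1) = step(W y(h) - t) until a fixed point is reached,
# using threshold-logic neurons as in equation (5.17).

def step(a):
    return 1 if a > 0 else 0

def run_feedback(W, t, y0, max_iter=100):
    y = list(y0)
    for _ in range(max_iter):
        y_next = [step(sum(W[i][j] * y[j] for j in range(len(y))) - t[i])
                  for i in range(len(y))]
        if y_next == y:         # network output no longer changes
            return y
        y = y_next
    return y

# Toy two-neuron network: self-excitation plus mutual inhibition,
# so the initially active neuron keeps the other one switched off.
W = [[2.0, -1.0],
     [-1.0, 2.0]]
t = [0.5, 0.5]
print(run_feedback(W, t, [1, 0]))   # [1, 0]
print(run_feedback(W, t, [0, 1]))   # [0, 1]
```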
Chapter 6
COMPUTATIONAL PROPERTIES OF LQT NETWORKS
As required in the third step of the algebraic derivation methodology, this chapter
establishes some theorems based on the computational properties of LQT networks. These
theorems are useful in simplifying and evaluating network realizations. We shall also
discuss some factors which affect the quality of a LQT network, and give a criterion for
network simplification.
6.1 Network Equivalence
Definition 6.1 If a function f(x) and a network N_A satisfy the condition

    f(x) = N_A(x)  ∀x ∈ X    (6.1)

where X is the domain of f, then f is said to be realized by N_A, or N_A is said
to be the network realization of f.

A well-defined LQT network can realize only one function, but a function can be
realized by many different networks; hence the concept of network equivalence.

Definition 6.2 If Network A and Network B satisfy the condition

    N_A(x) = N_B(x)  ∀x ∈ X    (6.2)

where X is the common domain of N_A and N_B, then Network A and Network B
are said to be equivalent.
The following theorems concern the property of network equivalence. Theorems 6.1 to
6.8 concern single neurons, or networks of a single neuron; the rest in this section concern
networks of multiple neurons.
Theorem 6.1 If N_A is

    y_A = {| z1, z2, ..., zn | w1, w2, ..., wn |}^t    (6.3)

and N_B is

    y_B = {| z_{i1}, z_{i2}, ..., z_{in} | w_{i1}, w_{i2}, ..., w_{in} |}^t    (6.4)

where the sequences {z_{i1}, z_{i2}, ..., z_{in}} and {w_{i1}, w_{i2}, ..., w_{in}} are permutations of
{z1, z2, ..., zn} and {w1, w2, ..., wn} respectively, then N_A and N_B are equivalent.
Theorem 6.2 Let N_A be

    y_A = {| z1, z2, ..., zn | w1, w2, ..., wn |}^t    (6.5)

If w_i = 0, then N_A is equivalent to

    y_B = {| z1, ..., z_{i-1}, z_{i+1}, ..., zn | w1, ..., w_{i-1}, w_{i+1}, ..., wn |}^t    (6.6)

Theorem 6.3 Let N_A be

    y_A = {| z1, z2, ..., zn | w1, w2, ..., wn |}^t    (6.7)

If z_i = a, where a is a constant, then N_A is equivalent to

    y_B = {| z1, ..., z_{i-1}, z_{i+1}, ..., zn | w1, ..., w_{i-1}, w_{i+1}, ..., wn |}^{t - a w_i}    (6.8)
Theorem 6.4 Let N_A be

    y_A = {| z1, z2, ..., zn | w1, w2, ..., wn |}^t    (6.9)

If z_i = z_j and i < j, then N_A is equivalent to

    y_B = {| z1, ..., z_i, ..., z_{j-1}, z_{j+1}, ..., zn | w1, ..., w_i + w_j, ..., w_{j-1}, w_{j+1}, ..., wn |}^t    (6.10)

or

    y_C = {| z1, ..., z_{i-1}, z_{i+1}, ..., z_j, ..., zn | w1, ..., w_{i-1}, w_{i+1}, ..., w_i + w_j, ..., wn |}^t    (6.11)
Theorem 6.5 If N_A is

    y_A = ⌐ z | w ¬^t    (6.12)

and N_B is

    y_B = ⌐ z | βw ¬^{βt}    (6.13)

where β > 0, then N_A and N_B are equivalent.
Theorem 6.6 Let N_A be

    y_A = ⊏ z1, z2, ..., zn | w1, w2, ..., wn ⊐^t    (6.14)

and N_B be

    y_B = < z1, z2, ..., zn | w1, w2, ..., wn >^t    (6.15)

If the output of N_A is always greater than or equal to zero, then N_A and N_B are equivalent.
Theorem 6.7 Let N_A be

    y_A = ⊏ z1, z2, ..., zn | w1, w2, ..., wn ⊐^t    (6.16)

and N_B be

    y_B = ⌐ z1, z2, ..., zn | w1, w2, ..., wn ¬^{t+δ}    (6.17)

where 1 > δ > 0. If the output of N_A is either zero or one, then N_A and N_B are
equivalent.
Theorem 6.8 Let N_A be

    y_A = < z1, z2, ..., zn | w1, w2, ..., wn >^t    (6.18)

and N_B be

    y_B = ⌐ z1, z2, ..., zn | w1, w2, ..., wn ¬^{t+δ}    (6.19)

where 1 > δ > 0. If the output of N_A is either zero or one, then N_A and N_B are
equivalent.
Theorem 6.9 Let N_A be

    y_A = {| z1, z2, ..., zn |}^t    (6.20)

and N_B be

    y_B = {| z1, ..., z_{i-1}, ⊏ z_i, ..., z_{i+m} ⊐^{t1}, z_{i+m+1}, ..., zn |}^{t2}    (6.21)

where i ∈ {1, 2, ..., n} and n ≥ i + m ≥ 1. If t = t1 + t2, then N_A and N_B are equivalent.
Theorem 6.10 Let N_A be

    y_A = {| z1, z2, ..., zn | w1, w2, ..., wn |}^t    (6.22)

and N_B be

    y_B = {| ⊏ z1 | w1 ⊐^{t1}, ⊏ z2 | w2 ⊐^{t2}, ..., ⊏ zn | wn ⊐^{tn} |}^{t_{n+1}}    (6.23)

If Σ_{i=1}^{n+1} t_i = t, then N_A and N_B are equivalent.
Theorem 6.11 Let N_A be

    y_A = {| ⊏ z | w ⊐^{t1} | α |}^{t2}    (6.24)

and N_B be

    y_B = {| z | αw |}^t    (6.25)

If t = α t1 + t2, then N_A and N_B are equivalent.
Theorem 6.12 If N_A is

    y_A = {| ⊏ z_a, z_b ⊐^{t1} | α |}^{t2}    (6.26)

and N_B is

    y_B = {| ⊏ z_a, z_b | α, α ⊐^{α t1} |}^{t2}    (6.27)

then N_A and N_B are equivalent.
Theorem 6.13 Let N_A be

    y_A = {| < z_a, z_b >^{t1} | α |}^{t2}    (6.28)

and N_B be

    y_B = {| < z_a, z_b | α, α >^{α t1} |}^{t2}    (6.29)

If α > 0, then N_A and N_B are equivalent.
Theorem 6.14 If N_A is

    y_A = {| ⊏ z1, z2, ..., zn | w1, w2, ..., wn ⊐^{t1} |}^{t2}    (6.30)

and N_B is

    y_B = {| z1, z2, ..., zn | w1, w2, ..., wn |}^t    (6.31)

where t = t1 + t2, then N_A and N_B are equivalent.
Theorem 6.15 Let N_A be

    y_A = < < z1, z2, ..., zn | w1, w2, ..., wn >^{t1} >^{t2}    (6.32)

and N_B be

    y_B = < z1, z2, ..., zn | w1, w2, ..., wn >^t    (6.33)

If t = t1 + t2 and t2 > 0, then N_A and N_B are equivalent.
The theorems mentioned so far can easily be verified by writing out the functions
realized by the networks N_A and N_B, and checking whether the functions are equivalent.
Example 6.1 Prove that Theorem 6.15 is true.

SOLUTION The function realized by N_A is

    y_A = f_Q(f_Q(zw - t1) - t2)    (6.34)

If zw - t1 > 0, then

    y_A = f_Q(zw - t1 - t2)    (6.35)

otherwise,

    y_A = 0    (6.36)

This implies

    y_A = y_B = f_Q(zw - t)    (6.37)

which is the function realized by N_B. Therefore, N_A and N_B are equivalent. □
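Theorem 6.15 can also be spot-checked numerically. The sketch below is ours; the weights and thresholds are arbitrary integer test values satisfying t2 > 0.

```python
def f_Q(a):
    # quasi-linear activation: pass positive activations, clamp the rest
    return a if a > 0 else 0

def nested(z, w, t1, t2):
    # N_A: a quasi-linear neuron feeding a second quasi-linear neuron
    return f_Q(f_Q(sum(wi * zi for wi, zi in zip(w, z)) - t1) - t2)

def single(z, w, t):
    # N_B: one quasi-linear neuron with the combined threshold
    return f_Q(sum(wi * zi for wi, zi in zip(w, z)) - t)

w, t1, t2 = [1, -2, 1], 1, 2      # t2 > 0, as the theorem requires
for z in [(3, 1, 0), (5, 0, 2), (2, 1, 3), (-1, -1, 0)]:
    assert nested(z, w, t1, t2) == single(z, w, t1 + t2)
print("Theorem 6.15 holds on all test points")
```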
6.2 Network Quality
Since a function can be realized by an infinite number of LQT networks, a criterion is
needed to choose a proper network among all its equivalents. Obviously, this choice
is based on the quality of the networks. Several factors affect the quality of a network;
they are introduced in the following subsections.
6.2.1 Network Depth
The depth of a LQT network is defined as the length of the longest path from the input
neurons to the output neurons. For the sake of simplicity, it is assumed that all LQT
neurons have the same processing time, which is taken as one time unit. For a feedforward
network, its depth equals its computation time. For a feedback network, its computation time
depends not only on the convergence property of the algorithm it implements but also
on its depth. Since our objective in using neural networks is to achieve high computation
speed, the depth of a network is an important factor in choosing one network over
another.
For a feedforward network, its depth also corresponds to the number of layers it has.
Denoting the depth of a network N_A by D(N_A) and its number of layers by L(N_A), the
following relationship holds:

    D(N_A) = L(N_A) - 1    (6.38)

6.2.2 Network Size

Neural networks eventually have to be implemented in hardware, hence we should be
aware of their size. It is natural to think of the size of a network in terms of the number
of neurons in the network. However, since a neuron is allowed to have an indefinite number
of inputs, it is more appropriate to measure the size of a network in terms of the number
of connections. Moreover, in VLSI implementation, it is the connections which occupy
most of the space on a chip.

The number of connections in a neural network equals the number of weights. Since it
is easier to count the number of weights in the symbolic representation of a LQT network,
we define the network size as the number of weights in the network.

Let us denote the number of neurons in network N_A by N(N_A) and its network size
by S(N_A). It is clear that

    N(N_A) ≤ S(N_A)    (6.39)
6.3 Criterion for Network Simplification
Both network depth and network size have to be considered when choosing a network over
its equivalents. In this thesis, the following simple criterion is used: given two networks
N_A and N_B which realize the same function, if D(N_A) < D(N_B) and S(N_A) < S(N_B),
or if D(N_A) = D(N_B) and S(N_A) < S(N_B), then N_A is chosen over N_B, and N_A is said
to be better than N_B. The process of choosing one network over another is referred to as
network simplification. The theorems in Section 6.1 were developed for this purpose.
Obviously, a network with the shortest depth and the minimum size is the best among
all the equivalent networks. Unfortunately, such a case is rare. The network with the
shortest depth often does not have the minimum size. How to effectively balance these
two factors is still an on-going research. In the following chapters, we only give upper
bounds on network depth and size for some functions.
There are also some other factors which affect the quality of a network. For example,
since there are inevitably some noise in any hardware implementation, how tolerant a
network is to noise is certainly a good measure of network quality. However, due to time
constraints, these factors are not considered in this thesis.
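The simplification criterion above can be sketched as a small predicate over (depth, size) pairs; this is our own illustration, not notation from the thesis:

```python
def better(net_a, net_b):
    """Criterion of Section 6.3: given two equivalent networks described as
    (depth, size) pairs, prefer net_a when it is strictly shallower and
    strictly smaller, or equally deep and strictly smaller."""
    d_a, s_a = net_a
    d_b, s_b = net_b
    return (d_a < d_b and s_a < s_b) or (d_a == d_b and s_a < s_b)
```

Note that this is only a partial order: a shallower but larger network and a deeper but smaller one are incomparable under the criterion.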
Chapter 7
DERIVATION PROCEDURES
This chapter provides procedures for deriving LQT networks that realize functions. The use of these procedures is demonstrated through a number of examples, which in turn are useful in deriving networks in later chapters.
A function which can be realized by a LQT network is referred to as a realizable
function; the network which realizes the function is referred to as the network realization
of the function. A function which has a known network realization is called a realized
function.
There are four classes of functions: single-input single-output (SISO) functions, single-
input multiple-output (SIMO) functions, multiple-input single-output (MISO) functions,
and multiple-input multiple-output (MIMO) functions. We shall only describe procedures
for realizing SISO and MISO functions since realizing SIMO or MIMO functions is just
multiple applications of these procedures.
From the definitions of LQT neurons, the following statements are true.
Fact 7.1 The network realization of the function

    y = Σ_{i=1}^{n} w_i x_i − t    (7.1)

is

    y = ⊏ x_1, x_2, ..., x_n | w_1, w_2, ..., w_n ⊐^t    (7.2)

Fact 7.2 The network realization of the function

    y = Σ_{i=1}^{n} w_i x_i − t  if Σ_{i=1}^{n} w_i x_i ≥ t;  y = 0 otherwise    (7.3)

is

    y = ⟨ x_1, x_2, ..., x_n | w_1, w_2, ..., w_n ⟩^t    (7.4)

Fact 7.3 The network realization of the function

    y = 1  if Σ_{i=1}^{n} w_i x_i ≥ t;  y = 0 otherwise    (7.5)

is

    y = ⌐ x_1, x_2, ..., x_n | w_1, w_2, ..., w_n ¬^t    (7.6)
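The three neuron types of Facts 7.1–7.3 can be sketched directly; the function names are ours, and the ≥-firing convention follows the facts above:

```python
def linear(xs, ws, t):
    # Fact 7.1: linear neuron, y = sum(w_i * x_i) - t
    return sum(w * x for w, x in zip(ws, xs)) - t

def semilinear(xs, ws, t):
    # Fact 7.2: outputs the excitation above threshold, else 0
    s = sum(w * x for w, x in zip(ws, xs))
    return s - t if s >= t else 0

def threshold(xs, ws, t):
    # Fact 7.3: binary threshold neuron
    return 1 if sum(w * x for w, x in zip(ws, xs)) >= t else 0
```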
Now that we have established network realizations for the above three simple func
tions, we can work on network realization of more complex functions.
7.1 Realization of SISO Functions
The procedure for deriving a network to realize a SISO function f is as follows:

step 1. Call decompose(f);

step 2. Compose all network realizations to form a network realization of f;

step 3. Apply theorems in Chapter Six to simplify the resulting network.
In the above procedure, decompose(f) is a recursive subroutine:

decompose(f):

1. Decompose the function f as

    f = f_0 ∘ (f_1, f_2, ..., f_n)

such that f_0 has a known network realization;

2. For all i ∈ {1, 2, ..., n}, if f_i is a realized function, replace it with its network realization; otherwise, call decompose(f_i);

3. Return.

The derivation procedure may be viewed as consisting of two stages: step one is the analysis stage, and steps two and three form the synthesis stage. The use of this procedure is demonstrated through the following examples. Functions in these examples are real-valued unless otherwise specified.
Example 7.1.1 Given a function F whose output is either 0 or 1, find a network realization of its complement F̄.

SOLUTION Since

    F̄ = 1 − F    (7.7)

according to Fact 7.1, the network realization of F̄ is

    F̄ = ⊏ F | −1 ⊐^{−1}    (7.8)

□
Example 7.1.2 Find a network realization of the following function

    y = a  if x ≥ t;  y = 0 otherwise    (7.9)

SOLUTION Let

    f_1(x) = 1  if x ≥ t;  f_1(x) = 0 otherwise    (7.10)

then

    y = a f_1(x)    (7.11)

which can be decomposed to

    y = f_0 ∘ f_1    (7.12)

where f_0 = a f_1. From Fact 7.3 and Fact 7.1, both f_0 and f_1 have known network realizations:

    f_0 = ⊏ f_1 | a ⊐^0    (7.13)

and

    f_1(x) = ⌐ x ¬^t    (7.14)

Therefore, the network realization of function (7.9) is

    y = ⊏ ⌐ x ¬^t | a ⊐^0    (7.15)
Example 7.1.3 Find a network realization of the following function

    y = a  if t_1 ≤ x < t_2;  y = 0 otherwise    (7.16)

SOLUTION Let

    f_1(x) = a  if x ≥ t_1;  f_1(x) = 0 otherwise    (7.17)

and

    f_2(x) = a  if x ≥ t_2;  f_2(x) = 0 otherwise    (7.18)

then

    y(x) = f_1(x) − f_2(x)    (7.19)

which can be decomposed to

    y = f_0 ∘ (f_1, f_2)    (7.20)

where f_0 = f_1 − f_2. From Fact 7.1, f_0 can be realized by a linear neuron,

    f_0 = ⊏ f_1, f_2 | 1, −1 ⊐^0    (7.21)

From the previous example, both f_1 and f_2 can be realized as

    f_1(x) = ⊏ ⌐ x ¬^{t_1} | a ⊐^0    (7.22)

and

    f_2(x) = ⊏ ⌐ x ¬^{t_2} | a ⊐^0    (7.23)

Therefore, the network realization of y(x) is

    y(x) = ⊏ ⊏ ⌐ x ¬^{t_1} | a ⊐^0, ⊏ ⌐ x ¬^{t_2} | a ⊐^0 | 1, −1 ⊐^0    (7.24)

Applying Theorem 6.10 and Theorem 6.11, this network can be simplified to

    y(x) = ⊏ ⊏ ⌐ x ¬^{t_1} | a ⊐^0, ⊏ ⌐ x ¬^{t_2} | −a ⊐^0 ⊐^0    (7.25)

Applying Theorem 6.10 again, this network can be further simplified to

    y(x) = ⊏ ⌐ x ¬^{t_1}, ⌐ x ¬^{t_2} | a, −a ⊐^0    (7.26)

□
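The simplified window network can be sketched as two threshold neurons feeding one linear neuron (function names are ours; the ≥-firing convention follows Fact 7.3):

```python
def step(x, t):
    # single-input threshold neuron with unit weight (Fact 7.3)
    return 1 if x >= t else 0

def window(x, a, t1, t2):
    # Example 7.1.3, simplified network: weights a and -a on the two
    # threshold neurons yield a exactly when t1 <= x < t2
    return a * step(x, t1) - a * step(x, t2)
```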
Example 7.1.4 Find a network realization of the following function

    y = a  if x > t;  y = 0 otherwise    (7.27)

SOLUTION Function y(x) can be decomposed to

    y = f_0 ∘ f_1(x)    (7.28)

where

    f_0 = −a f_1 + a    (7.29)

and

    f_1(x) = 1  if −x ≥ −t;  f_1(x) = 0 otherwise    (7.30)

From Fact 7.1, we know that

    f_0 = ⊏ f_1 | −a ⊐^{−a}    (7.31)

From Fact 7.3, we know that

    f_1(x) = ⌐ x | −1 ¬^{−t}    (7.32)

Substituting this into (7.31), we have

    y(x) = f_0(x) = ⊏ ⌐ x | −1 ¬^{−t} | −a ⊐^{−a}    (7.33)

□
Example 7.1.5 Find a network realization of the following function

    y = a  if x = t;  y = 0 otherwise    (7.34)

SOLUTION Let

    f_1(x) = a  if x ≥ t;  f_1(x) = 0 otherwise    (7.35)

and

    f_2(x) = a  if x > t;  f_2(x) = 0 otherwise    (7.36)

then

    y(x) = f_1(x) − f_2(x)    (7.37)

which can be decomposed to

    y = f_0 ∘ (f_1, f_2)    (7.38)

where f_0 = f_1 − f_2, whose network realization according to Fact 7.1 is

    f_0 = ⊏ f_1, f_2 | 1, −1 ⊐^0    (7.39)

We know that both f_1(x) and f_2(x) have network realizations (see Example 7.1.2 and Example 7.1.4) as

    f_1(x) = ⊏ ⌐ x ¬^t | a ⊐^0    (7.40)

and

    f_2(x) = ⊏ ⌐ x | −1 ¬^{−t} | −a ⊐^{−a}    (7.41)

Substituting (7.40) and (7.41) into (7.39), we have

    y(x) = ⊏ ⊏ ⌐ x ¬^t | a ⊐^0, ⊏ ⌐ x | −1 ¬^{−t} | −a ⊐^{−a} | 1, −1 ⊐^0    (7.42)

Applying Theorem 6.10 and Theorem 6.11, we can simplify this network to

    y(x) = ⊏ ⌐ x ¬^t, ⌐ x | −1 ¬^{−t} | a, a ⊐^a    (7.43)

□

It can be rather tedious to follow the procedure explicitly; from now on we follow it only implicitly, and put more emphasis on the simplification process.
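The simplified equality network can be checked numerically (a sketch in our own notation, assuming the ≥-firing convention of Fact 7.3):

```python
def eq_indicator(x, a, t):
    # Example 7.1.5, simplified network: two threshold neurons, one firing
    # for x >= t and one for x <= t, feed a linear neuron with weights
    # (a, a) and threshold a; both fire only when x == t
    s_ge = 1 if x >= t else 0
    s_le = 1 if -x >= -t else 0
    return a * s_ge + a * s_le - a
```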
Example 7.1.6 If function (7.34) is an integer function, find its network realization.

SOLUTION Let

    f_1(x) = a  if x ≥ t;  f_1(x) = 0 otherwise    (7.44)

and

    f_2(x) = a  if x ≥ t + 1;  f_2(x) = 0 otherwise    (7.45)

then

    y = f_1(x) − f_2(x)    (7.46)

whose network realization, according to Fact 7.1, is

    y = ⊏ f_1(x), f_2(x) | 1, −1 ⊐^0    (7.47)

Since, according to Example 7.1.2,

    f_1(x) = ⊏ ⌐ x ¬^t | a ⊐^0    (7.48)

and

    f_2(x) = ⊏ ⌐ x ¬^{t+1} | a ⊐^0    (7.49)

substituting these two equations into equation (7.47) yields

    y = ⊏ ⊏ ⌐ x ¬^t | a ⊐^0, ⊏ ⌐ x ¬^{t+1} | a ⊐^0 | 1, −1 ⊐^0    (7.50)

Applying Theorem 6.10 and Theorem 6.11, this network can be simplified to

    y = ⊏ ⌐ x ¬^t, ⌐ x ¬^{t+1} | a, −a ⊐^0    (7.51)

□
Example 7.1.7 Find a network realization of the following function

    y = a_1 if x = t_1;  y = a_2 if x = t_2;  ...;  y = a_n if x = t_n    (7.52)

SOLUTION Let

    f_i(x) = a_i  if x = t_i;  f_i(x) = 0 otherwise    (7.53)

where i ∈ {1, 2, ..., n}, then

    y = Σ_{i=1}^{n} f_i(x)    (7.54)

Recalling Fact 7.1 and Example 7.1.5, we know that

    y = ⊏ f_1(x), f_2(x), ..., f_n(x) ⊐^0    (7.55)

and

    f_i(x) = ⊏ ⌐ x ¬^{t_i}, ⌐ x | −1 ¬^{−t_i} | a_i, a_i ⊐^{a_i}    (7.56)

Substituting (7.56) into (7.55), we have

    y = ⊏ ⊏ ⌐ x ¬^{t_1}, ⌐ x | −1 ¬^{−t_1} | a_1, a_1 ⊐^{a_1}, ...,
        ⊏ ⌐ x ¬^{t_n}, ⌐ x | −1 ¬^{−t_n} | a_n, a_n ⊐^{a_n} ⊐^0    (7.57)

Applying Theorem 6.10, we can simplify this network to

    y = ⊏ ⌐ x ¬^{t_1}, ⌐ x | −1 ¬^{−t_1}, ..., ⌐ x ¬^{t_n}, ⌐ x | −1 ¬^{−t_n} | a_1, a_1, ..., a_n, a_n ⊐^a    (7.58)

where a = Σ_{i=1}^{n} a_i. □

This example shows that any single-variable finite function can be realized by a LQT network with a depth of at most 2 and a size of at most 4n. It leads to the following theorem.

Theorem 7.1 Any single-variable finite function is realizable.

Example 7.1.8 If function (7.52) is an integer function, find its network realization.

SOLUTION Let

    f_i(x) = a_i  if x = t_i;  f_i(x) = 0 otherwise    (7.59)
where i ∈ {1, 2, ..., n}. It can be shown that

    y = Σ_{i=1}^{n} f_i(x)    (7.60)

Recalling Fact 7.1 and Example 7.1.6, we know that

    y = ⊏ f_1(x), f_2(x), ..., f_n(x) ⊐^0    (7.61)

and

    f_i(x) = ⊏ ⌐ x ¬^{t_i}, ⌐ x ¬^{t_i+1} | a_i, −a_i ⊐^0    (7.62)

Substituting equation (7.62) into equation (7.61) yields

    y = ⊏ ⊏ ⌐ x ¬^{t_1}, ⌐ x ¬^{t_1+1} | a_1, −a_1 ⊐^0, ⊏ ⌐ x ¬^{t_2}, ⌐ x ¬^{t_2+1} | a_2, −a_2 ⊐^0, ...,
        ⊏ ⌐ x ¬^{t_n}, ⌐ x ¬^{t_n+1} | a_n, −a_n ⊐^0 ⊐^0    (7.63)

which can be simplified to

    y = ⊏ ⌐ x ¬^{t_1}, ⌐ x ¬^{t_1+1}, ⌐ x ¬^{t_2}, ⌐ x ¬^{t_2+1}, ..., ⌐ x ¬^{t_n}, ⌐ x ¬^{t_n+1}
        | a_1, −a_1, a_2, −a_2, ..., a_n, −a_n ⊐^0    (7.64)

If t_{i+1} = t_i + 1 for all i, then the above network realization can be further simplified to

    y = ⊏ ⌐ x ¬^{t_1}, ⌐ x ¬^{t_2}, ..., ⌐ x ¬^{t_n}, ⌐ x ¬^{t_n+1} | a_1, a_2 − a_1, ..., a_n − a_{n−1}, −a_n ⊐^0    (7.65)

□
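The consecutive-breakpoint network of (7.65) can be sketched as a lookup table over threshold neurons; the dict-based interface is our own illustration and assumes consecutive integer keys:

```python
def step(x, t):
    # threshold neuron firing when x >= t (assumed convention of Fact 7.3)
    return 1 if x >= t else 0

def lookup_net(x, table):
    # Example 7.1.8, eq. (7.65): realize an integer function given on
    # consecutive points t, t+1, ..., t+n-1 with values a_1, ..., a_n,
    # using one threshold neuron per breakpoint and one linear output neuron
    ts = sorted(table)
    a = [table[t] for t in ts]
    ws = [a[0]] + [a[i] - a[i - 1] for i in range(1, len(a))] + [-a[-1]]
    thresholds = ts + [ts[-1] + 1]
    return sum(w * step(x, t) for w, t in zip(ws, thresholds))
```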
Example 7.1.9 Find a network realization of the absolute value function

    y = |x|    (7.66)

SOLUTION Let

    f_1(x) = x  if x ≥ 0;  f_1(x) = 0 otherwise    (7.67)

and

    f_2(x) = −x  if −x ≥ 0;  f_2(x) = 0 otherwise    (7.68)

then

    y(x) = f_1(x) + f_2(x)    (7.69)

According to Fact 7.1 and Fact 7.2, the network realization is

    y = ⊏ ⟨ x ⟩^0, ⟨ x | −1 ⟩^0 ⊐^0    (7.70)
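Equation (7.70) is the familiar decomposition of |x| into two one-sided ramps; as a sketch (function names ours):

```python
def relu(x):
    # semilinear neuron <x>^0 of Fact 7.2: x when x >= 0, else 0
    return x if x >= 0 else 0

def abs_net(x):
    # Example 7.1.9, eq. (7.70): |x| = <x>^0 + <x | -1>^0
    return relu(x) + relu(-x)
```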
Example 7.1.10 Find a network realization of the following function

    y = ax + b  if x ≥ t;  y = 0 otherwise    (7.71)

SOLUTION Let

    f_1(x) = x − t  if x ≥ t;  f_1(x) = 0 otherwise    (7.72)

and

    f_2(x) = 1  if x ≥ t;  f_2(x) = 0 otherwise    (7.73)

then

    y(x) = a f_1(x) + (at + b) f_2(x)    (7.74)

Recalling Fact 7.1, Fact 7.2, and Fact 7.3, we have

    y = ⊏ ⟨ x ⟩^t, ⌐ x ¬^t | a, at + b ⊐^0    (7.75)
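The gated affine network of (7.75) can be sketched as follows (a minimal illustration, assuming the ≥-firing convention):

```python
def gated_affine(x, a, b, t):
    # Example 7.1.10, eq. (7.75): a semilinear neuron <x>^t and a threshold
    # neuron feed one linear neuron with weights a and a*t + b
    f1 = x - t if x >= t else 0        # <x>^t  (Fact 7.2)
    f2 = 1 if x >= t else 0            # threshold neuron (Fact 7.3)
    return a * f1 + (a * t + b) * f2   # a*(x - t) + a*t + b = a*x + b
```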
Example 7.1.11 Find a network realization of the following function

    y = x  if F = 1;  y = 0 otherwise    (7.76)

where F is a function whose output is either 0 or 1.

SOLUTION Let

    f_1(x) = x − a  if F = 1;  f_1(x) = 0 otherwise    (7.77)

where a < x ∀x ∈ X, and X is the domain of function (7.76). Then

    y(x) = f_1(x) + aF    (7.78)

whose network realization, according to Fact 7.1, is

    y = ⊏ f_1(x), F | 1, a ⊐^0    (7.79)

f_1(x) can be expressed as

    f_1(x) = x − a − bF̄  if x − a − bF̄ ≥ 0;  f_1(x) = 0 otherwise    (7.80)

where b > x − a ∀x ∈ X. Let

    f_11(x) = x − a − bF̄    (7.81)

then, according to Fact 7.2,

    f_1(x) = ⟨ f_11(x) ⟩^0    (7.82)

and, according to Fact 7.1,

    f_11(x) = ⊏ x, F̄ | 1, −b ⊐^0    (7.83)

Hence,

    f_1(x) = ⟨ x, F̄ | 1, −b ⟩^a    (7.84)

The network realization of F̄, according to Example 7.1.1, is

    F̄ = ⊏ F | −1 ⊐^{−1}    (7.85)

Substituting (7.85) into (7.84) yields

    f_1(x) = ⟨ x, F | 1, b ⟩^{a+b}    (7.86)

Substituting this into (7.79) yields

    y = ⊏ ⟨ x, F | 1, b ⟩^{a+b}, F | 1, a ⊐^0    (7.87)

□
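The gating construction of (7.87) can be verified numerically; this sketch assumes, as the text does, constants a < x and b > x − a over the domain:

```python
def gate(x, F, a, b):
    # Example 7.1.11, eq. (7.87): pass x through when the 0/1 signal F is 1
    f1 = x + b * F - (a + b)           # excitation of <x, F | 1, b>^(a+b)
    f1 = f1 if f1 >= 0 else 0          # semilinear neuron output
    return f1 + a * F                  # output linear neuron, weights 1, a
```

With F = 0 the semilinear neuron is driven below threshold by the large constant b, so the output collapses to 0; with F = 1 the offsets cancel and x passes through.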
Example 7.1.12 Find a network realization of the following function

    y = x  if F = 0;  y = 0 otherwise    (7.88)

where F is a function whose output is either 0 or 1.

SOLUTION According to the previous example, the network realization is

    y = ⊏ ⟨ x, F̄ | 1, b ⟩^{a+b}, F̄ | 1, a ⊐^0    (7.89)

and recalling Example 7.1.1, we know that

    F̄ = ⊏ F | −1 ⊐^{−1}    (7.90)

Substituting this equation into (7.89) and simplifying, we have

    y = ⊏ ⟨ x, F | 1, −b ⟩^a, F | 1, −a ⊐^{−a}
7.2 Realization of MISO Functions

This section shows the network realization of MISO functions. The procedure for deriving networks for MISO functions is as follows.

step 1. Decompose function f as

    f = f_1 ∘ f_L

where f_L is a linear function;

step 2. Proceed with the procedure for realizing SISO functions to find a network realization of f_1;

step 3. Compose the network realization found in step 2 and the network realization of f_L to form the network realization of f;

step 4. Use theorems in Chapter Six to simplify the resulting network.

The use of this procedure is shown through the following examples.
Example 7.2.1 Find a network realization of the following function

    f : D_1^n → D_2    (7.91)

where D_i is a finite set of numbers.

SOLUTION We consider the following two cases.

Case I: f is a linear function.

In this case we can express f as

    f = Σ_{i=1}^{n} b_i x_i + b_0    (7.92)

where x_i ∈ D_1. According to Fact 7.1, we have

    f(x) = ⊏ x_1, x_2, ..., x_n | b_1, b_2, ..., b_n ⊐^{−b_0}    (7.93)

Case II: f is a non-linear function.

Here we introduce a linear function

    f_L : D_1^n → D_3    (7.94)

and a finite non-linear function

    f_1 : D_3 → D_2    (7.95)

We now find the network realizations of both f_1 and f_L, and then compose them to form the network realization of f.

Let

    M_i = |D_i|  ∀i ∈ {1, 2, 3}

let z_i denote an element of D_3, and let a_i = f_1(z_i). According to Example 7.1.7, the network realization of f_1 is

    f_1(z) = ⊏ ⌐ z ¬^{z_1}, ⌐ z | −1 ¬^{−z_1}, ..., ⌐ z ¬^{z_{M_3}}, ⌐ z | −1 ¬^{−z_{M_3}}
        | a_1, a_1, ..., a_{M_3}, a_{M_3} ⊐^a    (7.96)

where a = Σ_{i=1}^{M_3} a_i. According to Fact 7.1,

    z = f_L(x) = ⊏ x_1, x_2, ..., x_n | b_1, b_2, ..., b_n ⊐^{−b_0}    (7.97)

Substituting (7.97) into (7.96) and simplifying, we have

    y = ⊏ ⌐ x_1, ..., x_n | b_1, ..., b_n ¬^{c_1}, ⌐ x_1, ..., x_n | −b_1, ..., −b_n ¬^{−c_1}, ...,
        ⌐ x_1, ..., x_n | b_1, ..., b_n ¬^{c_{M_3}}, ⌐ x_1, ..., x_n | −b_1, ..., −b_n ¬^{−c_{M_3}}
        | a_1, a_1, ..., a_{M_3}, a_{M_3} ⊐^a    (7.98)

where c_i = z_i − b_0. □

This example shows that any multi-variable finite function can be realized by a LQT network with a depth of at most 2 and a size of at most 2M_3(n + 1). It leads to the following theorem.
Theorem 7.2 Any multi-variable finite function is realizable.

Since a pixel takes integer values from 0 to 255, any image processing technique can be expressed as a finite function. According to this theorem and Theorem 7.1, it is clear that LQT networks are capable of realizing any image processing technique.

In Example 7.2.1, we made the assumption f = f_1 ∘ f_L; here we show under what condition this assumption is true. Denoting f_2 = f_1 ∘ f_L, we have the following theorem.

Theorem 7.3 A necessary condition for f = f_2 is that ∀x_1, x_2 ∈ D_1^n, if f(x_1) ≠ f(x_2), then f_L(x_1) ≠ f_L(x_2).

PROOF If there exist x_1, x_2 ∈ D_1^n such that f(x_1) ≠ f(x_2) but f_L(x_1) = f_L(x_2), then f_1 ∘ f_L(x_1) = f_1 ∘ f_L(x_2). This implies that f_2(x_1) = f_2(x_2), hence f_2 ≠ f. ∎

Corollary 7.3.1 A necessary condition for f = f_2 is M_3 ≥ M_2.

PROOF If M_3 < M_2, then there must exist a pair x_1, x_2 ∈ D_1^n such that f(x_1) ≠ f(x_2) but f_L(x_1) = f_L(x_2). ∎

Theorem 7.4 Functions f_1 and f_L are not unique.

PROOF This theorem will be shown to be true in some of the following examples. ∎
As shown in Example 7.2.1, the size of the network realization of a non-linear multi-variable finite function is proportional to M_3. Therefore, according to Corollary 7.3.1, the best projection function f_L is one whose codomain size M_3 equals the codomain size M_2 of the non-linear function itself.

Here we introduce a projection function which is proper for all multi-variable finite integer functions:

    z = Σ_{i=1}^{n} x_i M_1^{i−1}    (7.99)

Such a function is called a one-to-one projection since f_L(x_1) ≠ f_L(x_2) iff x_1 ≠ x_2 ∀x_1, x_2 ∈ D_1^n, which implies M_3 = M_1^n.

Another projection function, which is proper only for a certain class of finite integer functions, is

    z = Σ_{i=1}^{n} x_i    (7.100)

which has the property that M_3 = n(M_1 − 1) + 1. We will refer to functions (7.99) and (7.100) as projection I and projection II.
The following examples show network derivation of some multi-variable finite integer
functions.
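The two projection functions are straightforward to sketch (function names are ours; projection I uses 1-based exponents M_1^{i−1} as in (7.99)):

```python
def projection_1(xs, M1):
    # eq. (7.99): one-to-one base-M1 encoding, codomain size M1**n
    return sum(x * M1 ** i for i, x in enumerate(xs))

def projection_2(xs):
    # eq. (7.100): plain sum, codomain size n*(M1 - 1) + 1
    return sum(xs)
```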
Example 7.2.2 Find a network realization of the AND function

    y = x_1 ∧ x_2 ∧ ... ∧ x_n    (7.101)

SOLUTION Using projection I, f_L(x) = Σ_{i=1}^{n} 2^{i−1} x_i, the AND function can be expressed as

    y = f_1 ∘ f_L(x)    (7.102)

where

    f_1(z) = 1  if z ≥ 2^n − 1;  f_1(z) = 0 otherwise    (7.103)

which, according to Fact 7.3, has the following network realization

    f_1(z) = ⌐ z ¬^{2^n − 1}    (7.104)

According to Fact 7.1, the network realization of f_L(x) is

    f_L(x) = ⊏ x_1, x_2, ..., x_n | 1, 2, ..., 2^{n−1} ⊐^0    (7.105)

Substituting it into (7.104) yields

    y = ⌐ ⊏ x_1, x_2, ..., x_n | 1, 2, ..., 2^{n−1} ⊐^0 ¬^{2^n − 1}    (7.106)

which, according to Theorem 6.10, can be further simplified to

    y = ⌐ x_1, x_2, ..., x_n | 1, 2, ..., 2^{n−1} ¬^{2^n − 1}    (7.107)

The resulting network has a depth of 1 and a size of n. If we use projection II, f_L(x) = Σ_{i=1}^{n} x_i, and choose

    f_1(z) = 1  if z ≥ n;  f_1(z) = 0 otherwise    (7.108)

then

    y = f_1 ∘ f_L(x)    (7.109)

Since

    f_1(z) = ⌐ z ¬^n    (7.110)

and

    f_L(x) = ⊏ x_1, x_2, ..., x_n ⊐^0    (7.111)

hence

    y = ⌐ x_1, x_2, ..., x_n ¬^n    (7.112)

The resulting network also has a depth of 1 and a size of n. □

Example 7.2.3 Find a network realization of the OR function

    y = x_1 ∨ x_2 ∨ ... ∨ x_n    (7.113)
SOLUTION Using projection I, f_L(x) = Σ_{i=1}^{n} 2^{i−1} x_i, the OR function can be expressed as

    y = f_1 ∘ f_L(x)    (7.114)

where

    f_1(z) = 1  if z ≥ 1;  f_1(z) = 0 otherwise    (7.115)

From the previous example, the network realization can be easily derived as

    y = ⌐ x_1, x_2, ..., x_n | 1, 2, ..., 2^{n−1} ¬^1

If projection II is used, then we have the following network

    y = ⌐ x_1, x_2, ..., x_n ¬^1

Note that both networks have a depth of 1 and a size of n. □
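The single-neuron AND and OR networks with projection II can be sketched directly (names ours, ≥-firing convention assumed):

```python
def threshold(xs, ws, t):
    # threshold neuron (Fact 7.3): fires when the weighted sum reaches t
    return 1 if sum(w * x for w, x in zip(ws, xs)) >= t else 0

def and_net(xs):
    # eq. (7.112): unit weights, threshold n
    return threshold(xs, [1] * len(xs), len(xs))

def or_net(xs):
    # projection II OR: unit weights, threshold 1
    return threshold(xs, [1] * len(xs), 1)
```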
Example 7.2.4 Find the network realization of the XOR function

    y = x_1 ⊕ x_2 ⊕ ... ⊕ x_n    (7.116)

SOLUTION Using projection II, f_L(x) = Σ_{i=1}^{n} x_i, the XOR function can be expressed as

    y = f_1 ∘ f_L(x)    (7.117)

where

    f_1(z) = 1  if z is odd;  f_1(z) = 0 if z is even    (7.118)

Let

    f_1i(z) = 1  if z ≥ i;  f_1i(z) = 0 otherwise    (7.119)

then

    f_1(z) = Σ_{i=1}^{n} (−1)^{i−1} f_1i(z)    (7.120)

Since

    f_1i(z) = ⌐ z ¬^i    (7.121)

hence

    f_1(z) = ⊏ ⌐ z ¬^1, ⌐ z ¬^2, ..., ⌐ z ¬^n | 1, −1, ..., (−1)^{n−1} ⊐^0    (7.122)

Since

    f_L(x) = ⊏ x_1, x_2, ..., x_n ⊐^0    (7.123)

we have

    y = ⊏ ⌐ x_1, x_2, ..., x_n ¬^1, ⌐ x_1, x_2, ..., x_n ¬^2, ..., ⌐ x_1, x_2, ..., x_n ¬^n
        | 1, −1, ..., (−1)^{n−1} ⊐^0    (7.124)

The resulting network has a depth of 2 and a size of n^2 + n.

For the sake of comparison, let us use projection I. Since it is difficult to express the non-linear function f_1 for an arbitrary n, we only show the special case n = 3.

Let

    f_L(x) = Σ_{i=1}^{3} 2^{i−1} x_i    (7.125)

and

    f_1(z) = 0, 1, 1, 0, 1, 0, 0, 1  for z = 0, 1, 2, 3, 4, 5, 6, 7 respectively    (7.126)

then

    y(x) = f_1 ∘ f_L(x)    (7.127)

According to Example 7.1.8, we have

    f_1(z) = ⊏ ⌐ z ¬^1, ⌐ z ¬^3, ⌐ z ¬^4, ⌐ z ¬^5, ⌐ z ¬^7 | 1, −1, 1, −1, 1 ⊐^0    (7.128)

hence,

    y = ⊏ ⌐ x_1, x_2, x_3 | 1, 2, 4 ¬^1, ⌐ x_1, x_2, x_3 | 1, 2, 4 ¬^3, ⌐ x_1, x_2, x_3 | 1, 2, 4 ¬^4,
        ⌐ x_1, x_2, x_3 | 1, 2, 4 ¬^5, ⌐ x_1, x_2, x_3 | 1, 2, 4 ¬^7 | 1, −1, 1, −1, 1 ⊐^0

Note that this network has a size of 20, while the network derived using projection II has a size of 12 (n = 3). □

Remark Example 7.2.4 shows that using projection II results in a smaller network size; this is generally true. Although in Examples 7.2.2 and 7.2.3 using projection II has no advantage in terms of network depth or network size, the magnitude of the weights is much smaller. This is important in hardware implementation, since the value range of the weights has to be limited.
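The projection-II XOR network of (7.124) can be sketched and checked as follows (names ours, ≥-firing convention assumed):

```python
def threshold(xs, ws, t):
    # threshold neuron (Fact 7.3): fires when the weighted sum reaches t
    return 1 if sum(w * x for w, x in zip(ws, xs)) >= t else 0

def xor_net(xs):
    # eq. (7.124): n threshold neurons with unit weights and thresholds
    # 1..n, combined by a linear neuron with alternating weights +1, -1, ...
    n = len(xs)
    return sum((-1) ** (i - 1) * threshold(xs, [1] * n, i)
               for i in range(1, n + 1))
```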
Chapter 8
NETWORK REALIZATIONS OF IE TECHNIQUES
In this chapter, image enhancement (IE) techniques are re-categorized according to their
realization difficulty into two classes: linear and non-linear filtering techniques. Examples
of realizing some of the techniques are provided here to show the effectiveness of the
algebraic derivation approach. Since the linear techniques are easy to realize, they are
treated briefly with two examples; the rest of this chapter is devoted to the realization
of non-linear techniques.
8.1 Network Realizations of Linear Techniques
A two-dimensional image can be projected to a one-dimensional image. For instance, an image U whose size is K × L can be mapped to a one-dimensional image g with the following relationship

    g_i = u_{k,l}  ∀k ∈ {1, 2, ..., K}, ∀l ∈ {1, 2, ..., L}    (8.1)

where i = (k − 1)L + l. This mapping is one-to-one; that is, U can be uniquely mapped back from g using equation (8.1).
For all the linear filtering techniques, the enhancement process can be expressed, in matrix form, as

    f = E g + t    (8.2)

where g is the image to be enhanced and f is the enhanced image. For an input image of size n and an enhanced image of size m, the network realization of this function is

    f_1 = ⊏ g_1, g_2, ..., g_n | e_11, e_12, ..., e_1n ⊐^{−t_1}
    f_2 = ⊏ g_1, g_2, ..., g_n | e_21, e_22, ..., e_2n ⊐^{−t_2}
    ...
    f_m = ⊏ g_1, g_2, ..., g_n | e_m1, e_m2, ..., e_mn ⊐^{−t_m}    (8.3)

A more compact notation for this network is

    f = ⊏ g | E ⊐^{−t}    (8.4)

The architecture of (8.3) is shown in Figure 8.1.
Figure 8.1: Network Architecture for Linear Filtering
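The single-layer linear-filtering network of (8.2)–(8.3) amounts to one linear neuron per output pixel; a minimal sketch with plain lists (our own illustration):

```python
def linear_filter(E, g, t):
    # eq. (8.2): f = E g + t, with E an m x n list of rows,
    # g the flattened input image, and t the threshold vector
    return [sum(e * x for e, x in zip(row, g)) + ti
            for row, ti in zip(E, t)]
```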
Now, for the linear filtering techniques, the task left is to map those techniques to
the form shown in (8.2). The following two examples illustrate such mapping.
Example 8.1 Find a network realization of the averaging filter.
SOLUTION Suppose the following image

    U = [ u_{1,1} u_{1,2} u_{1,3} u_{1,4} ;
          u_{2,1} u_{2,2} u_{2,3} u_{2,4} ;
          u_{3,1} u_{3,2} u_{3,3} u_{3,4} ;
          u_{4,1} u_{4,2} u_{4,3} u_{4,4} ]    (8.5)

is to be enhanced by the equally weighted averaging filter of (2.10), the window of the filter being 3 × 3. The processing, in matrix form, is

    f = A g    (8.6)

where

    f = (f_1, f_2, ..., f_16)^T    (8.7)
    g = (g_1, g_2, ..., g_16)^T    (8.8)

with

    g_i = u_{k,l}  ∀k, l ∈ {1, 2, 3, 4}

and A is the 16 × 16 matrix

    1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0
    1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0
    0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0
    0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0
    1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0
    1 1 1 0 1 1 1 0 1 1 1 0 0 0 0 0
    0 1 1 1 0 1 1 1 0 1 1 1 0 0 0 0
    0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0
    0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0
    0 0 0 0 1 1 1 0 1 1 1 0 1 1 1 0
    0 0 0 0 0 1 1 1 0 1 1 1 0 1 1 1
    0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1
    0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0
    0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0
    0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1
    0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1

Comparing (8.6) with (8.2), we obtain the weight matrix and the threshold vector as follows:

    E = A,  t = 0    (8.9)

where 0 represents the zero vector whose elements are all zeros. □
Example 8.2 Find a network realization of the zooming technique.

SOLUTION Suppose the following image

    U = [ u_{1,1} u_{1,2} ;
          u_{2,1} u_{2,2} ]    (8.10)

is to be magnified two times by the replication technique described in Section 2.2, such that the resulting image has the size 4 × 4. The zooming process, in matrix form, is

    f = R g    (8.11)

where

    f = (f_1, f_2, ..., f_16)^T    (8.12)
    g = (g_1, g_2, g_3, g_4)^T    (8.13)

with

    g_i = u_{k,l}  ∀k, l ∈ {1, 2}

and R is the 16 × 4 matrix

    1 0 0 0
    1 0 0 0
    0 1 0 0
    0 1 0 0
    1 0 0 0
    1 0 0 0
    0 1 0 0
    0 1 0 0
    0 0 1 0
    0 0 1 0
    0 0 0 1
    0 0 0 1
    0 0 1 0
    0 0 1 0
    0 0 0 1
    0 0 0 1

Comparing (8.11) with (8.2), we obtain the weight matrix and the threshold vector as follows:

    E = R,  t = 0    (8.14)

□
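The replication zoom is easier to see operationally than through the 0/1 matrix R; a sketch equivalent to multiplying the flattened image by R:

```python
def zoom2x(U):
    # Example 8.2: pixel replication, each input pixel becomes a 2x2 block
    out = []
    for row in U:
        wide = [p for p in row for _ in (0, 1)]   # repeat each pixel twice
        out.append(wide)
        out.append(list(wide))                    # repeat each row twice
    return out
```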
8.2 Network Realizations of Non-linear Filters
There is no unified theory of non-linear filtering techniques. As a result, there is no
universal network structure which can be used for all of them as in the case of linear
filters.
8.2.1 Dynamic Range Modification
Contrast stretching, clipping, thresholding, and window slicing are techniques which modify the dynamic range of a given picture. For this group of techniques, the processes have the generalized form

    y = a_1 x + b_1  if t_1 ≥ x > t_0;
    y = a_2 x + b_2  if t_2 ≥ x > t_1;
    ...
    y = a_n x + b_n  if t_n ≥ x > t_{n−1}    (8.15)

The derivation of the network realization of the above function is as follows. Let

    f_i(x) = a_i' x + b_i'  if x > t_{i−1};  f_i(x) = 0 otherwise,  ∀i ∈ {1, 2, ..., n}    (8.16)

where

    a_i' = a_i − a_{i−1},  b_i' = b_i − b_{i−1}    (8.17)

with

    a_0 = b_0 = 0    (8.18)

It can be verified that

    y = f_0 ∘ (f_1, f_2, ..., f_n)    (8.19)

where f_0 = Σ_{i=1}^{n} f_i, whose network realization is

    f_0 = ⊏ f_1, f_2, ..., f_n ⊐^0    (8.20)

According to Example 7.1.10, we know that

    f_i(x) = ⊏ ⟨ x ⟩^{t_{i−1}}, ⌐ x ¬^{t_{i−1}} | c_i, d_i ⊐^0    (8.21)

where c_i = a_i' and d_i = a_i' t_{i−1} + b_i'. Substituting this into equation (8.20), we obtain the whole network as follows:

    y = ⊏ f_1, f_2, ..., f_n ⊐^0
      f_i = ⊏ μ_i1, μ_i2 | c_i, d_i ⊐^0
      μ_i1 = ⟨ x ⟩^{t_{i−1}}
      μ_i2 = ⌐ x ¬^{t_{i−1}}    (8.22)

Applying Theorem 6.10, the above network can be simplified to

    y = ⊏ μ_11, μ_21, ..., μ_n1, μ_12, μ_22, ..., μ_n2 | c_1, c_2, ..., c_n, d_1, d_2, ..., d_n ⊐^0
      μ_i1 = ⟨ x ⟩^{t_{i−1}}
      μ_i2 = ⌐ x ¬^{t_{i−1}}    (8.23)
Based on the network model of (8.23), the network realization of the contrast stretching technique (see Section 2.2.1) is

    v = ⊏ ⟨ u ⟩^0, ⟨ u ⟩^a, ⟨ u ⟩^b, ⌐ u ¬^a, ⌐ u ¬^b
        | α, (β − α), (γ − β), (v_a − αa), (v_b − v_a + βa − βb) ⊐^0

Similarly, the network realization of the clipping technique is

    v = ⊏ ⟨ u ⟩^a, ⟨ u ⟩^b, ⌐ u ¬^a | β, −β, v_a ⊐^0    (8.24)

Finally, the network realization of the thresholding technique is

    v = ⊏ ⌐ u ¬^t | v_b ⊐^0    (8.25)
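A clipping transform in the style of (8.24) can be checked numerically; this is a sketch under our own reading of the reconstructed coefficients, with v_a the output level reached at u = a:

```python
def clip_transform(u, a, b, beta, va):
    # clipping: zero below a, slope beta on [a, b), constant above b
    ramp_a = u - a if u >= a else 0        # <u>^a  (Fact 7.2)
    ramp_b = u - b if u >= b else 0        # <u>^b
    gate_a = 1 if u >= a else 0            # threshold neuron at a
    return beta * ramp_a - beta * ramp_b + va * gate_a
```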
8.2.2 Order Statistic Filtering
Order statistic filtering is a simple yet powerful technique widely used in signal processing [11, 56]. The core operation is to find the kth largest element in an array. This element is called the kth order statistic of the array. An order statistic filter is a linear combination of the order statistics. Variation in the weights results in various members of the order statistic filter (OSF) family, some of which have been intensively studied and widely used in signal processing. Median filters are one example [77, 46, 38, 68]. Order statistic filters have the virtue of being very simple and having an edge-preserving property. Their performance is, if not optimal, satisfactory in most cases.

To formulate the problem of order statistic filtering in such a way that a network realization can be derived, some functions need to be introduced: the rank function, the comparison function, and the order statistic function.

Definition 8.1 For X = (x_1, x_2, ..., x_n), the rank function of x_i is

    rank(x_i : X) = k, if x_i is the kth largest element in X    (8.26)

Definition 8.2 For X = (x_1, x_2, ..., x_n), the comparison function for x_i and x_j is

    c(x_i, x_j) = 0  if x_i > x_j;
    c(x_i, x_j) = 0  if x_i = x_j and i > j;
    c(x_i, x_j) = 1  if x_i = x_j and i < j;
    c(x_i, x_j) = 1  if x_i < x_j    (8.27)
Definition 8.3 For X = (x_1, x_2, ..., x_n), the order statistic function is

    os(k : X) = x_i  if rank(x_i : X) = k    (8.28)

From the above definitions, we have the following theorem.

Theorem 8.1 The rank function rank(x_i : X) and the sum of the comparison functions related to x_i have the following relationship:

    rank(x_i : X) = Σ_{j=1, j≠i}^{n} c(x_i, x_j) + 1    (8.29)

The core operation in order statistic filtering is to find the kth largest member of an array of pixels. For instance, median filtering finds the ((n+1)/2)th element of the input array. Therefore, it is necessary to derive the network realization of the order statistic function first.

Let

    f_i(x) = x_i  if rank(x_i : X) = k;  f_i(x) = 0 otherwise,  ∀i ∈ {1, 2, ..., n}    (8.30)

then it is trivial to show that

    os(k : X) = Σ_{i=1}^{n} f_i    (8.31)

that is,

    os(k : X) = f_0 ∘ (f_1, f_2, ..., f_n)    (8.32)

where

    f_0 = Σ_{i=1}^{n} f_i    (8.33)

which can be realized by a linear neuron as

    f_0 = ⊏ f_1, f_2, ..., f_n ⊐^0    (8.34)
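Definitions 8.1–8.3 and Theorem 8.1 can be sketched directly; note how the index tie-breaking of Definition 8.2 makes the ranking a total order even with repeated values (function names are ours):

```python
def c(x, i, j):
    # comparison function of Definition 8.2, with index tie-breaking
    if x[i] < x[j] or (x[i] == x[j] and i < j):
        return 1
    return 0

def rank(x, i):
    # Theorem 8.1, eq. (8.29): rank(x_i : X) = sum of comparisons + 1
    return sum(c(x, i, j) for j in range(len(x)) if j != i) + 1

def os(x, k):
    # Definition 8.3: the k-th largest element of X
    return next(xi for i, xi in enumerate(x) if rank(x, i) == k)
```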
Let

    F(x_i) = 1  if rank(x_i : X) = k;  F(x_i) = 0 otherwise,  ∀i ∈ {1, 2, ..., n}    (8.35)

then

    f_i(x) = x_i  if F(x_i) = 1;  f_i(x) = 0 otherwise,  ∀i ∈ {1, 2, ..., n}    (8.36)

According to Example 7.1.11, function f_i(x) can be realized as

    f_i(x) = ⊏ ⟨ x_i, F(x_i) | 1, b ⟩^{a+b}, F(x_i) | 1, a ⊐^0    (8.37)

where a < min{min{x_1}, min{x_2}, ..., min{x_n}} and b > max{max{x_1}, max{x_2}, ..., max{x_n}} − a. According to Example 7.1.8, the network realization of F(x_i) is

    F(x_i) = ⊏ ⌐ rank(x_i : X) ¬^k, ⌐ rank(x_i : X) ¬^{k+1} | 1, −1 ⊐^0    (8.38)

According to Fact 7.1, the network realization of the rank function is

    rank(x_i : X) = ⊏ c(x_i, x_1), ..., c(x_i, x_{i−1}), c(x_i, x_{i+1}), ..., c(x_i, x_n) ⊐^{−1}    (8.39)

Since

    c(x_i, x_j) = 1  if x_j > x_i;  c(x_i, x_j) = 0 otherwise,  if i > j    (8.40)

and

    c(x_i, x_j) = 1  if x_j ≥ x_i;  c(x_i, x_j) = 0 otherwise,  if i < j    (8.41)

according to Example 7.1.4 and Fact 7.3, the network realization of c(x_i, x_j) is

    c(x_i, x_j) = ⊏ ⌐ x_i, x_j | 1, −1 ¬^0 | −1 ⊐^{−1}  if i > j
    c(x_i, x_j) = ⌐ x_i, x_j | −1, 1 ¬^0  if i < j    (8.42)
Therefore, the network realization of the order statistic function is

    y = ⊏ f_1, f_2, ..., f_n ⊐^0
      f_i = ⊏ μ_i, F(x_i) | 1, a ⊐^0
      μ_i = ⟨ x_i, F(x_i) | 1, b ⟩^{a+b}
      F(x_i) = ⊏ μ_i1, μ_i2 | 1, −1 ⊐^0
      μ_i1 = ⌐ rank(x_i : X) ¬^k
      μ_i2 = ⌐ rank(x_i : X) ¬^{k+1}
      rank(x_i : X) = ⊏ c_1(x_i, x_1), ..., c_1(x_i, x_{i−1}), c_2(x_i, x_{i+1}), ..., c_2(x_i, x_n) ⊐^{−1}
      c_1(x_i, x_j) = ⊏ ⌐ x_i, x_j | 1, −1 ¬^0 | −1 ⊐^{−1}
      c_2(x_i, x_j) = ⌐ x_i, x_j | −1, 1 ¬^0    (8.43)

where i ∈ {1, 2, ..., n}. The above network can be simplified to

    y = ⊏ μ_1, μ_11, μ_12, μ_2, μ_21, μ_22, ..., μ_n, μ_n1, μ_n2 | 1, a, −a, 1, a, −a, ..., 1, a, −a ⊐^0
      μ_i = ⟨ x_i, μ_i1, μ_i2 | 1, b, −b ⟩^{a+b}
      μ_i1 = ⌐ c_i1, ..., c_i,i−1, c_i,i+1, ..., c_in | −1, ..., −1, 1, ..., 1 ¬^{k−i}
      μ_i2 = ⌐ c_i1, ..., c_i,i−1, c_i,i+1, ..., c_in | −1, ..., −1, 1, ..., 1 ¬^{k−i+1}
      c_ij = ⌐ x_i, x_j | 1, −1 ¬^0  if i > j;  c_ij = ⌐ x_i, x_j | −1, 1 ¬^0  if i < j    (8.44)

This network has a depth of 4 and a size of 2n(2n + 1). It is shown in Figure 8.2 for the case n = 3.

This network is called OSnet and is denoted as OSnet(k, n), where n is the number of inputs and k indicates that the network's output is the kth largest element of the input array. The network's schematic representation is shown in Figure 8.3.

The OSnet can be made adjustable by changing

    μ_ij = ⌐ c_i1, ..., c_i,i−1, c_i,i+1, ..., c_in | −1, ..., −1, 1, ..., 1 ¬^{k−i+j−1}    (8.45)

Figure 8.3: Schematic Representation of the OSnet
where j ∈ {1, 2}, to

    μ_ij = ⌐ c_i1, ..., c_i,i−1, c_i,i+1, ..., c_in, x_s | −1, ..., −1, 1, ..., 1, −1 ¬^{−i+j−1}    (8.46)

where x_s is used to change the value of k. This new network model is called the adaptive OSnet. Its schematic representation is shown in Figure 8.4.

Figure 8.4: Schematic Representation of the Adaptive OSnet
Median Filtering
Median filtering is a simple operation in which the median value of the data within a
sliding window centered at a point is taken as the output of that point. Median filtering
has been shown to be effective in suppressing impulsive noise components while preserving
edges of a signal. It has been applied in several areas of digital signal processing which
include speech processing and image enhancement.
For an array of (2m + 1) numbers, the median is the (m + 1)th largest number. Therefore, an OSnet(m + 1, 2m + 1) can implement the median filter. A detailed account of this network can be found in [83].
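Functionally, a one-dimensional median filter is easy to sketch; the edge handling below (leaving borders unfiltered) is our own choice, not taken from the text:

```python
def os_k(xs, k):
    # the k-th largest element, i.e. what OSnet(k, n) computes
    return sorted(xs, reverse=True)[k - 1]

def median_filter(signal, m):
    # sliding-window median with window 2m+1
    out = list(signal)
    for i in range(m, len(signal) - m):
        out[i] = os_k(signal[i - m:i + m + 1], m + 1)
    return out
```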
Separable Median Filtering. When median filtering is extended to two or more dimensions, signal features are generally distorted or deleted. The separable median filtering proposed in [68] yields somewhat better results in preserving signal features.

Separable median filters consist of two types of one-dimensional median filters, one oriented in the horizontal direction and the other in the vertical. More explicitly, the output value y(l, k) at position (l, k) is given by

    y(l, k) = median{z(l − L, k), ..., z(l, k), ..., z(l + L, k)}    (8.47)

where z is defined as

    z(p, q) = median{u(p, q − K), ..., u(p, q), ..., u(p, q + K)}    (8.48)

and u(l, k) are the sample values of the input signal.

A separable median filter with a window size of (2L + 1) × (2K + 1) can be implemented by using (2L + 2) or (2K + 2) OSnets, depending on whether the horizontal or the vertical direction is median filtered first. The network of a separable median filter with the horizontal direction filtered first is shown in Figure 8.5.
Figure 8.5: A Separable Median Filter Realized by Using OSnets
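Equations (8.47)–(8.48) can be sketched as two passes of one-dimensional medians; borders are skipped here for brevity (our choice, not the text's):

```python
def median(xs):
    # middle order statistic of an odd-length window
    return sorted(xs)[len(xs) // 2]

def separable_median(U, K, L):
    # eqs. (8.47)-(8.48): median along rows first, then along columns
    rows, cols = len(U), len(U[0])
    Z = [list(row) for row in U]
    for p in range(rows):
        for q in range(K, cols - K):
            Z[p][q] = median([U[p][q + j] for j in range(-K, K + 1)])
    Y = [list(row) for row in Z]
    for p in range(L, rows - L):
        for q in range(cols):
            Y[p][q] = median([Z[p + i][q] for i in range(-L, L + 1)])
    return Y
```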
Max/Median Filtering. Max/median filtering was proposed by Arce and McLoughlin [8] in order to achieve better preservation of signal features. In the two-dimensional case, if samples along lines separated by 45° are taken into account, the max/median filtering is defined as

    y(l, k) = max{z(s, 1), z(s, 2), z(s, 3), z(s, 4)}    (8.49)

where

    z(s, 1) = median{a(l, k − M), ..., a(l, k), ..., a(l, k + M)}
    z(s, 2) = median{a(l − M, k), ..., a(l, k), ..., a(l + M, k)}
    z(s, 3) = median{a(l + M, k − M), ..., a(l, k), ..., a(l − M, k + M)}
    z(s, 4) = median{a(l − M, k − M), ..., a(l, k), ..., a(l + M, k + M)}

where a(l, k) is the pixel value of the input picture at position (l, k), and y(l, k) is the output of the filter at the same position.

A max/median filter with a window size of (2M + 1) × (2M + 1) can be realized by five OSnets as shown in Figure 8.6.

Figure 8.6: A Max/Median Filter Realized by Using OSnets
Max/Min Filtering

Taking the neighborhood min or max is a generalization of shrinking or expanding the
1's in a two-valued picture. Iterated local min followed by iterated local max can be used
to remove high-valued blemishes on a low-valued background. Similarly, iterated local max
followed by iterated local min can be used to remove low-valued blemishes on a high-valued
background.

Finding the maximum of an array is equivalent to finding the 1st largest element;
therefore, OSnet(1,n) can be used. Finding the minimum of an array of size n is
equivalent to finding the n-th largest element; thus OSnet(n,n) can be used. Therefore,
Max/Min filtering can be implemented by using OSnet(1,n) and OSnet(n,n) iteratively.
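The iterated min-then-max scheme described above can be sketched as follows; the 3x3 neighborhood, the function names, and the replicated borders are assumptions of this sketch.

```python
import numpy as np

def local_extreme(u, fn):
    """One pass of a 3x3 local min or max (fn is np.min or np.max),
    i.e. one application of OSnet(n,n) or OSnet(1,n) over each window.
    Borders are replicated (an assumption)."""
    p = np.pad(u, 1, mode="edge")
    wins = [p[i:i + u.shape[0], j:j + u.shape[1]]
            for i in range(3) for j in range(3)]
    return fn(np.stack(wins), axis=0)

def remove_bright_blemishes(u, iters=1):
    """Iterated local min followed by iterated local max: deletes small
    high-valued spots on a low-valued background while large plateaus
    are shrunk and then restored."""
    for _ in range(iters):
        u = local_extreme(u, np.min)
    for _ in range(iters):
        u = local_extreme(u, np.max)
    return u
```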
Adaptive Order Statistic Filtering

A type of adaptive order statistic filter called the comparison and selection filter
(CS filter) is proposed by Lee and Fam [56]. The output y_l of the CS filter with parameter
h at position l over the input array X_l = (x_{l-m}, ..., x_{l+m}) is defined as

    y_l = os(m + 1 + h : X_l)   if u_l >= v_l
          os(m + 1 - h : X_l)   otherwise                                (8.50)

where u_l and v_l are the sample average and the sample median respectively, and h is an
integer satisfying 1 <= h <= m.

Let

    F = 1   if u_l >= v_l
        0   otherwise                                                    (8.51)

and

    f_1 = os(m + 1 + h : X_l)   if F = 1
          0                     otherwise                                (8.52)

    f_2 = os(m + 1 - h : X_l)   if F = 0
          0                     otherwise                                (8.53)
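Definition (8.50) can be checked with a direct sketch. The convention os(k : X) = k-th largest (so os(1 : X) is the maximum, matching OSnet(1,n)) follows the earlier sections; the boundary handling (copying unfilterable samples) is an assumption.

```python
import numpy as np

def cs_filter(x, m, h):
    """Comparison-and-selection (CS) filter, eq. (8.50): in each window
    of 2m+1 samples, output the (m+1+h)-th order statistic (counting
    from the largest) when the window mean >= the window median, and
    the (m+1-h)-th otherwise. Positions without a full window are
    copied through (an assumption)."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    for l in range(m, len(x) - m):
        w = np.sort(x[l - m:l + m + 1])[::-1]     # decreasing order statistics
        # 0-based index: rank m+1+h -> index m+h, rank m+1-h -> index m-h
        r = m + h if w.mean() >= np.median(w) else m - h
        y[l] = w[r]
    return y
```

A large positive spike pulls the mean above the median, so the filter selects a deeper order statistic and suppresses it.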
then, function (8.50) can be rewritten as

    y_l = f_1 + f_2                                                      (8.54)

which can be realized by a linear neuron as follows

    y_l = ⊂ f_1, f_2 ⊃^0                                                 (8.55)

According to Example 7.1.11, we have

    f_1 = ⊂ ⟨os(m + 1 + h : X_l), F | 1, b⟩^{a+b}, F | 1, a ⊃^0          (8.56)

and according to Example 7.1.12, we have

    f_2 = ⊂ ⟨os(m + 1 - h : X_l), F | 1, -b⟩^a, F | 1, -a ⊃^{-a}         (8.57)

In the above two equations, a <= min{os(k : X_l) : k ∈ K} and b >= max{os(k : X_l) :
k ∈ K} - a, where K = {1, 2, ..., 2m + 1}. Since os(k : X_l) >= 0 for all k ∈ K, we
choose a = 0. Consequently, the network realizations of f_1 and f_2 can be simplified to

    f_1 = ⟨os(m + 1 + h : X_l), F | 1, b⟩^b                              (8.58)

and

    f_2 = ⟨os(m + 1 - h : X_l), F | 1, -b⟩^0                             (8.59)

According to Fact 7.3, the network realization of F is

    F = ⊏ u_l, v_l | 1, -1 ⊐^0                                           (8.60)

Since the network realizations of u_l and v_l are already known, the network realization of
the CS filter is

    y_l = ⊂ f_1, f_2 ⊃^0
        f_1 = ⟨os(m + 1 + h : X_l), F | 1, b⟩^b
        f_2 = ⟨os(m + 1 - h : X_l), F | 1, -b⟩^0                         (8.61)
        F = ⊏ u_l, os(m + 1 : X_l) | 1, -1 ⊐^0
        u_l = ⊂ x_{l-m}, ..., x_l, ..., x_{l+m} | c, ..., c, ..., c ⊃^0
where c = 1/(2m + 1).

Three order statistic networks are needed to realize the above network. However, we
can combine all three networks into one network and reduce the number of weights needed.
Let us define a function as follows

    z_1 = os(m + 1 - h : X_l)
    z_2 = F                                                              (8.62)
    z_3 = os(m + 1 + h : X_l)

where F is defined as in (8.51). Recalling the network realization of the order statistic
function described earlier, we can now combine three such networks to form a network
realization for the function defined in (8.62). After simplification, the network realization
is as follows.

    z_1 = ⊂ μ_{11}, μ_{12}, ..., μ_{1n} ⊃^0
    z_2 = ⊏ μ_{21}, μ_{22}, ..., μ_{2n}, x_{l-m}, ..., x_{l+m} | 1, 1, ..., 1, -c, ..., -c ⊐^0
    z_3 = ⊂ μ_{31}, μ_{32}, ..., μ_{3n} ⊃^0                              (8.63)
        μ_{ki} = ⟨x_{l-m+i-1}, μ'_{ki}, μ''_{ki} | 1, b, -b⟩^b          i ∈ K and k ∈ {1,2,3}
        μ'_{ki} = ⊏ c_{i1}, c_{i2}, ..., c_{in} ⊐^{d_{ki}}
        μ''_{ki} = ⊏ c_{i1}, c_{i2}, ..., c_{in} ⊐^{d_{ki}+1}           i ∈ K and k ∈ {1,2,3}
        c_{ij} = ⊏ x_{l-m+i-1}, x_{l-m+j-1} | 1, -1 ⊐^1   if i > j
                 ⊏ x_{l-m+i-1}, x_{l-m+j-1} | -1, 1 ⊐^0   if i < j      i, j ∈ K

where d_{ki} = m + 1 + (k - 1)h - i, and n = 2m + 1, i.e., the number of input elements.
This network model has a depth of 4 and a size of n(8n + 5).
Figure 8.7: A Network Model of the CS Filter
Now, the CS filter has a network realization as follows:

    y_l = ⊂ f_1, f_2 ⊃^0
        f_1 = ⟨z_3, z_2 | 1, b⟩^b                                        (8.64)
        f_2 = ⟨z_1, z_2 | 1, -b⟩^0

The whole network has a depth of 5 and a size of n(8n + 5) + 6. This network is shown
in Figure 8.7.
8.2.3 Directional Filtering

Although smoothing techniques can clean a noisy image to a certain degree, a common
side-effect they have is that edges get blurred. To alleviate the blurring effect, directional
filtering is sometimes used. An example is the directional averaging filter described in
Chapter Two.

Suppose the directional averaging is done along four different directions, say, 0°, 45°,
90°, 135°; then for each direction, an average is calculated as

    v(m, n : θ_i) = (1/N) Σ_{(k,l) ∈ W_{θ_i}} u(m - k, n - l),   θ_i ∈ {(i - 1)45° : i ∈ I = {1, 2, 3, 4}}   (8.65)

where W_{θ_i} is the window along the direction of θ_i, and N is the window size, which is
the same for all the windows. An optimal direction θ* is chosen such that |u(m, n) - v(m, n :
θ*)| = min{|u(m, n) - v(m, n : θ_i)| : i ∈ I}. The desired result is

    v(m, n) = v(m, n : θ*)                                               (8.66)
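The selection rule (8.65)-(8.66) can be sketched directly; the function name and the copied border are assumptions of this minimal version.

```python
import numpy as np

def directional_average(u, M=1):
    """Directional averaging filter, eqs. (8.65)-(8.66): average along
    the 0, 45, 90 and 135 degree lines through each pixel (window size
    N = 2M+1) and keep the average closest to the centre pixel value.
    Interior pixels only; the border is copied (an assumption)."""
    y = u.astype(float).copy()
    dirs = [(0, 1), (-1, 1), (1, 0), (1, 1)]      # 0, 45, 90, 135 degrees
    n, m = u.shape
    for r in range(M, n - M):
        for c in range(M, m - M):
            avgs = [np.mean([u[r + t * dr, c + t * dc]
                             for t in range(-M, M + 1)])
                    for dr, dc in dirs]
            y[r, c] = min(avgs, key=lambda v: abs(u[r, c] - v))
    return y
```

Near a vertical edge the 90° average matches the centre pixel best, so the edge is smoothed along itself rather than across, which is the anti-blurring effect described above.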
Let y = v(m, n), y_i = v(m, n : θ_i), x = u(m, n), and φ_i = |x - y_i|. We define the
following function

    f_i = y_i   if rank(φ_i : Φ) = 4
          0     otherwise                                                (8.67)

where Φ = (φ_1, φ_2, φ_3, φ_4). It is easy to show that

    y = Σ_{i=1}^{4} f_i                                                  (8.68)

which can be realized by a linear neuron as follows.

    y = ⊂ f_1, f_2, f_3, f_4 ⊃^0                                         (8.69)

Let

    F_i = 1   if rank(φ_i : Φ) = 4
          0   otherwise                                                  (8.70)

then,

    f_i = y_i   if F_i = 1
          0     otherwise

which, according to Example 7.1.11, has a network realization

    f_i = ⊂ ⟨y_i, F_i | 1, b⟩^{a+b}, F_i | 1, a ⊃^0                      (8.71)

where a <= min{min{y_i} : i ∈ I} and b >= max{max{y_i} : i ∈ I} - a. Since y_i >= 0
for all i ∈ I, we choose a = 0. Consequently, the above network can be simplified to

    f_i = ⟨y_i, F_i | 1, b⟩^b                                            (8.72)
The network realization of F_i is

    F_i = ⊂ ⊏ rank(φ_i : Φ) ⊐^4, ⊏ rank(φ_i : Φ) ⊐^5 | 1, -1 ⊃^0         (8.73)

Because rank(φ_i : Φ) <= 4, this network can be simplified to

    F_i = ⊏ rank(φ_i : Φ) ⊐^4                                            (8.74)

The rank function has a network realization as follows.

    rank(φ_i : Φ) = ⊂ c_{i1}, ..., c_{i,i-1}, c_{i,i+1}, ..., c_{i4} ⊃^{-1}
        c_{ij} = ⊏ φ_i, φ_j | 1, -1 ⊐^0   if i > j                       (8.75)
                 ⊏ φ_i, φ_j | -1, 1 ⊐^0   if i < j

The network realization of φ_i is

    φ_i = ⊂ ⟨x, y_i | 1, -1⟩^0, ⟨x, y_i | -1, 1⟩^0 ⊃^0                   (8.76)

The network realization of y_i is

    y_i = ⊂ x^i_1, x^i_2, ..., x^i_N | d, d, ..., d ⊃^0                  (8.77)

where d = 1/N, and x^i_j is the j-th element of the sequence X^i, which is the set of
pixels in the window W_{θ_i}. Note that if N = 2M + 1, x = x^i_{M+1}.

Composing all the network realizations together, we get the network model for the
directional averaging filter. After simplification, the network model is as follows.

    y = ⊂ f_1, f_2, f_3, f_4 ⊃^0
        f_i = ⟨x^i_1, x^i_2, ..., x^i_N, F_i | d, d, ..., d, b⟩^b        i ∈ I
        F_i = ⊏ c_{i1}, ..., c_{i,i-1}, c_{i,i+1}, ..., c_{i4} ⊐^3       (8.78)
        c_{ij} = ⊏ μ_{i1}, μ_{i2}, μ_{j1}, μ_{j2} | -1, -1, 1, 1 ⊐^0
        μ_{i1} = ⟨x^i_1, ..., x^i_M, x, x^i_{M+2}, ..., x^i_N | -d, ..., -d, 1 - d, -d, ..., -d⟩^0
        μ_{i2} = ⟨x^i_1, ..., x^i_M, x, x^i_{M+2}, ..., x^i_N | d, ..., d, d - 1, d, ..., d⟩^0

where i, j ∈ I. This network model has a depth of 5 and a size of 4(3N + 17). For the
case N = 3, it is shown in Figure 8.8, where the connection weights are not shown.
Figure 8.8: A Network Realization of the DA Filter for the Case N = 3
Chapter 9
N E T W O R K R E A L I Z A T I O N S O F I R T E C H N I Q U E S
Image restoration (IR) techniques are classified here, as in Chapter Eight, into two classes:
linear and non-linear filtering techniques. The inverse filter, the Wiener filter, and
so on belong to the first class; the maximum entropy filter and the Bayesian methods
belong to the second. This chapter shows the derivation of LQT networks for realizing
some linear and non-linear filters, and a comparison with the Hopfield network, which is
commonly used for solving optimization problems.
9.1 Network Realizations of Linear Filters
For most linear filters, the essential problem is to find the best estimate f̂ such that the
following cost function

    E(f̂) = α(||g - Hf̂||^2 - ||n||^2) + ||Qf̂||^2                         (9.1)

is minimum (see Section 2.2). The solution to this problem is

    f̂ = (H^T H + λQ^T Q)^{-1} H^T g                                      (9.2)

where λ = 1/α. For function (9.2), the network realization, as shown in Chapter Eight, is

    f̂ = ⊂ g | W ⊃^0                                                      (9.3)

where

    W = (H^T H + λQ^T Q)^{-1} H^T                                        (9.4)
and 0 is a row vector of zeros.
However, the complexity of image restoration stems from the difficulty of calculating
the inversion (H^T H + λQ^T Q)^{-1}. This is the inversion of a large matrix. Suppose we are
dealing with a picture of size 64 x 64 (a very small picture); then the matrix (H^T H + λQ^T Q)
has the size 4096 x 4096. To reduce the computational complexity, Discrete Fourier
Transform (DFT) techniques [26] or iterative methods are used [40].

Here, we derive a network model which implements the steepest descent method, which
is the simplest iterative method.

The steepest descent method finds the optimal estimate by making an initial guess
and approaching the true optimal estimate step by step. Denoting the estimate at time
k as f̂(k), the difference (f̂(k + 1) - f̂(k)) is proportional to the opposite direction
of the gradient of the cost function E at time k. The gradient vector of E is usually
denoted as ∇E, which at time k is

    ∇E(k) = 2(λQ^T Q + H^T H)f̂(k) - 2H^T g                               (9.5)

Therefore, the estimate at time k + 1 is

    f̂(k + 1) = f̂(k) - δ((λQ^T Q + H^T H)f̂(k) - H^T g)
             = (I - δλQ^T Q - δH^T H)f̂(k) + δH^T g                       (9.6)

where δ takes a small positive value. The optimal δ is a function of f̂(k). For the sake of
simplicity, it is usually chosen as a constant [93].

The network realization of function (9.6) is

    f̂(k + 1) = ⊂ f̂(k) | W ⊃^{-t}                                         (9.7)

where

    W = I - δλQ^T Q - δH^T H                                             (9.8)

and

    t = δH^T g                                                           (9.9)

For an n x n image, this network has a depth of 1 and a size of n^4.
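The recurrent iteration (9.6) can be sketched as follows. The function name, default parameters, and the step-size bound are assumptions of this sketch; each loop iteration corresponds to one pass through the one-layer network.

```python
import numpy as np

def restore(g, H, Q, lam=0.01, delta=0.05, iters=500):
    """Steepest-descent restoration, eq. (9.6):
    f(k+1) = W f(k) + t with W = I - delta*(lam*Q^T Q + H^T H) and
    t = delta*H^T g. delta must satisfy
    delta < 2 / max_eigenvalue(lam*Q^T Q + H^T H) for convergence."""
    n = H.shape[1]
    A = lam * Q.T @ Q + H.T @ H
    W = np.eye(n) - delta * A
    t = delta * H.T @ g
    f = np.zeros(n)
    for _ in range(iters):
        f = W @ f + t                  # one synchronous network update
    return f
```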
The task of finding the network realization of linear filters is now reduced to finding the
weight matrix W and the threshold vector t, which involve only simple matrix operations
such as transposition and multiplication. The following are network realizations of two
well-known linear filters.

Pseudo-inverse Filter. For the pseudo-inverse filter, the weight matrix and the threshold
vector are

    W = I - δH^T H                                                       (9.10)
    t = δH^T g

Wiener Filter. For the Wiener filter, the weight matrix and the threshold vector
are

    W = I - δR_f^{-1} R_n - δH^T H                                       (9.11)
    t = δH^T g
The matrix R_f is usually assumed diagonal, hence its inverse can be easily calculated.
9.2 Network Realizations of Non-linear Filters
Similarly to the network realization of image enhancement techniques, there is no universal
network architecture for non-linear image restoration filters. Here, we show the derivation
of LQT networks for realizing some of these filters, and the limitations of LQT networks.
Maximum Entropy Filter. The cost function of a maximum entropy filter is

    J(f̂) = -f̂^T ln f̂ - λ(||g - Hf̂||^2 - ||n||^2)                        (9.12)

The gradient vector is

    ∇J(f̂) = -(ln f̂ + 1) + 2λH^T(g - Hf̂)                                 (9.13)

where 1 = (1, 1, ..., 1)^T. Ascending the gradient, the estimate at time (k + 1) is

    f̂(k + 1) = f̂(k) - δ(ln f̂(k) + 1 - 2λH^T(g - Hf̂(k)))
             = (I - 2λδH^T H)f̂(k) + (2δλH^T g - δ1) - δ ln f̂(k)          (9.14)

The realization of the term ln f̂ by using LQT networks requires a huge number of neurons.
One way to reduce the network size is to include a new type of neuron which has
the logarithm as its activation function. Since the logarithm function can be easily
implemented in analog circuits [64], it is not unreasonable to include such a neuron
model.

Let us denote the natural logarithm activation function as N. Function (9.14) can be
realized by the network model shown in Figure 9.1, where

    W = I - 2λδH^T H                                                     (9.15)

and

    t = 2δλH^T g - δ1                                                    (9.16)
Maximum Likelihood Filter. The cost function of a maximum likelihood filter is

    J(f̂) = -(1/2)(g - s{Hf̂})^T R_n^{-1}(g - s{Hf̂}) + ln p(g) + K        (9.17)

The gradient vector is

    ∇J(f̂) = H^T D R_n^{-1}(g - s{Hf̂})                                    (9.18)
Figure 9.1: A Network Realization of the Maximum Entropy Filter

where

    D ≜ Diag{ds(x)/dx |_{x=b_i}}                                         (9.19)

and b_i are the elements of the vector b = Hf̂. The estimate at time (k + 1) is

    f̂(k + 1) = f̂(k) + δH^T D R_n^{-1} g - δH^T D R_n^{-1} s{Hf̂(k)}
             = f̂(k) + t - Θ s{Hf̂(k)}                                     (9.20)

where t = δH^T D R_n^{-1} g and Θ = δH^T D R_n^{-1}. Assume the noise term n is
independent; then (R_n^{-1})^T = R_n^{-1}, and since D and R_n^{-1} are diagonal, we have

    H^T D R_n^{-1} s{Hf̂} = H^T R_n^{-1} D s{Hf̂}                          (9.21)

The term

    D s{Hf̂} = Diag(s'(b_1), s'(b_2), ..., s'(b_n))(s(b_1), s(b_2), ..., s(b_n))^T
            = (s'(b_1)s(b_1), s'(b_2)s(b_2), ..., s'(b_n)s(b_n))^T       (9.22)

where s'(b_i) = ds(x)/dx |_{x=b_i}. The term s'(b_i)s(b_i) is a multiplication of two variables,
whose network realization by LQT networks requires a huge number of neurons. One
way to reduce the network size is to use the sigma-pi neurons mentioned in Section 2.1.
Chapter 9. NETWORK REALIZATIONS OF IR TECHNIQUES 115
In the case when s(.) is linear, function (9.20) reduces to

    f̂(k + 1) = f̂(k) + δH^T R_n^{-1} g - δH^T R_n^{-1} Hf̂(k)
             = (I - δH^T R_n^{-1} H)f̂(k) + δH^T R_n^{-1} g               (9.23)

This filter then can be realized by the network model of (9.7) with

    W = I - δH^T R_n^{-1} H                                              (9.24)

and

    t = δH^T R_n^{-1} g                                                  (9.25)
Maximum A Posteriori (MAP) Filter. The cost function of a MAP filter is

    J(f̂) = -(1/2)(g - s{Hf̂})^T R_n^{-1}(g - s{Hf̂})
           - (1/2)(f̄ - f̂)^T R_f^{-1}(f̄ - f̂) + ln p(g) + K               (9.26)

The gradient vector is

    ∇J(f̂) = H^T D R_n^{-1}(g - s{Hf̂}) + R_f^{-1}(f̄ - f̂)                 (9.27)

where D is defined as in (9.19). The estimate at time (k + 1) is

    f̂(k + 1) = f̂(k) + δH^T D R_n^{-1} g - δH^T D R_n^{-1} s{Hf̂(k)} + δR_f^{-1}(f̄ - f̂(k))
             = (I - δR_f^{-1})f̂(k) + δR_f^{-1} f̄ + δH^T D R_n^{-1} g - δH^T D R_n^{-1} s{Hf̂(k)}   (9.28)

Because of the term δH^T D R_n^{-1} s{Hf̂(k)}, we have the same problem as in the
implementation of the maximum likelihood filter.

In the case when s(.) is linear, the MAP filter reduces to

    f̂(k + 1) = (I - δR_f^{-1} - δH^T R_n^{-1} H)f̂(k) + δ(R_f^{-1} f̄ + H^T R_n^{-1} g)   (9.29)

This filter can be realized by the network model of (9.7) with

    W = I - δR_f^{-1} - δH^T R_n^{-1} H                                  (9.30)

and

    t = δ(R_f^{-1} f̄ + H^T R_n^{-1} g)                                   (9.31)
9.3 Comparison with the Hopfield Network
Zhou, Chellappa, et al. applied the Hopfield network to solve an image restoration
problem [99]. The restoration criterion they chose is that of minimizing the following
cost function

    E = (1/2)||g - Hf||^2 + (λ/2)||Df||^2                                (9.32)

where || . || represents the L_2 norm, λ is the Lagrange multiplier, and f is the restored
image. Such a constrained cost function is widely used in image restoration problems
[6] and is also similar to the regularization techniques used in early vision problems [73].

The Hopfield network can minimize the following energy function:

    E_H = -(1/2)v^T T v - θ^T v                                          (9.33)

where T is the connection matrix, and v_i and θ_i are the output and the threshold of the
i-th neuron respectively. By choosing the right connection matrix T and threshold vector θ,
the minimization problem of the restoration can then be solved by the Hopfield network.

The cost function (9.32) has to be modified to fit the Hopfield network model because
an image vector is not binary. Since each pixel takes an integer value (usually ranging from
0 to 255), it can be represented by a limited number of neurons, such as

    f_i = Σ_{j=1}^{L} c_j v_{ij}                                         (9.34)
An example is c_j = 2^{j-1} and L = 8, which is a binary representation of an integer
ranging from 0 to 255. Equation (9.34) can be expressed in matrix form as

    f = C v                                                              (9.35)

where

    C = [ c_1, ..., c_L,   0, ..., 0,    ...,   0, ..., 0
          0, ..., 0,       c_1, ..., c_L, ...,  0, ..., 0
          ...                                                            (9.36)
          0, ..., 0,       0, ..., 0,    ...,   c_1, ..., c_L ]

Substituting (9.35) into (9.32), we get

    E = (1/2)(g - HCv)^T (g - HCv) + (1/2)λ v^T C^T D^T D C v
      = (1/2)(g^T g - 2g^T HCv + v^T C^T (λD^T D + H^T H)Cv)

Minimizing E is equivalent to minimizing

    E' = (1/2)(v^T C^T (λD^T D + H^T H)Cv - 2g^T HCv)                    (9.37)

Comparing (9.37) with (9.33), it can be seen that a Hopfield network can solve the
restoration problem by setting its parameters as follows:

    T = -C^T (λD^T D + H^T H)C                                           (9.38)
    θ = C^T H^T g
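The mapping (9.34)-(9.38) is a few lines of linear algebra; the function name and the Kronecker-product construction of the block-diagonal C are assumptions of this sketch.

```python
import numpy as np

def hopfield_params(H, D, g, lam, L=8):
    """Map the restoration cost (9.32) onto Hopfield parameters (9.38).
    Each of the n pixels is encoded by L binary neurons with weights
    c_j = 2**(j-1) (eq. 9.34), giving f = C v with C block diagonal
    (eq. 9.36). Returns (T, theta) so that E_H = -1/2 v^T T v - theta^T v
    matches E' of (9.37)."""
    n = H.shape[1]
    c = 2.0 ** np.arange(L)                       # c_1..c_L = 1, 2, ..., 2^(L-1)
    C = np.kron(np.eye(n), c)                     # shape (n, n*L)
    T = -C.T @ (lam * D.T @ D + H.T @ H) @ C
    theta = C.T @ H.T @ g
    return T, theta
```

Note how the neuron count is inflated by the factor L, which is the first practical problem discussed below.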
The Hopfield network can solve any optimization problem whose cost function can
be put into the quadratic form of (9.33), which is the case for most image restoration
problems. However, some practical problems arise when the Hopfield network is applied
to image restoration.
The first problem is that an enormously large number of neurons is needed for image
restoration even when the size of the picture is small. This is because several neurons
are needed to represent a pixel, since each neuron of a Hopfield network has only binary
output. Suppose eight neurons are used to represent a pixel; then for an n x n image,
8 x n^2 neurons (e.g., if n = 64, then 32,768 neurons) are needed, that is, the size of the
Hopfield network is 64n^4.
The second problem is that neurons in the Hopfield network have to be updated
sequentially, i.e., one at a time, to ensure the convergence of the network [33]. This
is a serious restriction, since it nullifies the advantage of parallelism offered by neural
networks. Moreover, the need for a huge number of neurons makes this problem much
worse.
Our network model has a size of n^4 and, most importantly, is guaranteed to
work in parallel. Some detailed comparisons between our network model and the Hopfield
network in solving optimization problems in general are available in [87].
Chapter 1 0
A P P L I C A T I O N S I N O T H E R A R E A S
The purpose of this chapter is two-fold. First of all, we shall show that the applicability
of our methodology as well as the LQT networks is not limited to the area of image
processing. Secondly, we shall show that by using our methodology, better neural network
implementations can be easily developed. To serve this purpose, we have chosen some
algorithms which are used in areas other than image processing and have known neural
network implementations developed by other researchers.
1 0 . 1 Sorting
Sorting is one of the most important operations performed by computers. It is believed
that 25%-50% of all the work performed by computers consists of sorting data [1]. In
addition, there exist other techniques which are based on sorting, for instance, order
statistic filtering. Therefore, increasing the speed of sorting is of great importance and
interest.
Since the advent of digital computers, many sorting schemes have been proposed in
the literature [1, 51, 61]. Some of the sorting schemes are executed on serial or parallel
digital computers, while others are executed on special-purpose machines. For bounded
fan-in logic circuits (networks of AND, OR, NOT gates), the minimum sorting time
achievable is O(n log_2 n) for serial sorting schemes, and O(log^2 n) for parallel sorting
schemes (n here is the number of items to be sorted) [1]. For unbounded fan-in logic
circuits, the minimum sorting time is O(1); however, the circuit size has to be more than
Chapter 10. APPLICATIONS IN OTHER AREAS 120
polynomial [13, 22]. It is implicit in [13] that if threshold-logic gates (neurons) are used,
the circuit size can be reduced to polynomial.

It is certainly possible to achieve constant sorting time using a polynomial-size LQT
network, since it employs threshold-logic neurons as one of its three kinds of basic elements.
Here, we derive such a network for sorting integers, and by doing so, give the exact size and
depth. This network implements the enumeration sorting algorithm, which is the fastest
sorting algorithm when implemented in parallel [1].
Given a sequence S = (x_1, x_2, ..., x_n) where x_i is an integer, it can be sorted in
two ways: (1) S is sorted to a sequence (y_1, y_2, ..., y_n), where y_i <= y_j if i < j; and (2)
S is sorted to a sequence (y_1, y_2, ..., y_n), where y_i >= y_j if i < j. Let us refer to the
first sequence as an increasingly sorted sequence and the second as a decreasingly sorted
sequence. Without loss of generality, we shall only consider the decreasingly sorted
sequence and denote it as S^-.

It can easily be shown that a sequence S of distinct numbers has the following
relationship with its decreasingly sorted sequence S^-: y_k = x_i iff there are (k - 1) elements
in S which are greater than x_i.

To include cases where some elements in S are equal, we introduce the following
definition and theorem.

Definition 10.1 x_i ⊳ x_j means that x_i >= x_j if i > j, or x_i > x_j if i < j.

Define G_i = {x_j : x_j ⊳ x_i, x_j ∈ S - (x_i)}; we have the following theorem.

Theorem 10.1 Given a sequence S, if we arrange its elements to form a new sequence
S' according to the following rule:

    p(x_i) = k,   if |G_i| = k - 1

where p(x_i) means the position of x_i in S', then S' is a decreasingly sorted sequence of S.
PROOF The validity of this theorem is obvious. □

To derive an LQT network for sorting, we need to restate the enumeration sorting
algorithm in a more mathematically rigorous way. This requires two special functions
introduced in Chapter Eight: the comparison function and the rank function.

The comparison function is defined as

    c(x_i, x_j) = 0   if x_i > x_j
                  0   if x_i = x_j and i > j
                  1   if x_i = x_j and i < j                             (10.1)
                  1   if x_i < x_j

and the rank function is defined as

    rank(x_i : S) = k,   if x_i is the k-th largest element in S         (10.2)

According to Theorem 10.1, the following equation is true.

    rank(x_i : S) = Σ_{j=1}^{n} c(x_i, x_j) + 1                          (10.3)

Now the enumeration sorting algorithm can be rephrased as follows:

1. Calculate c(x_i, x_j) for all i, j ∈ I = {1, 2, ..., n}, i ≠ j;

2. Calculate rank(x_i : S) for all i ∈ I;

3. For all k ∈ I, y_k = x_i if rank(x_i : S) = k.
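The three steps above can be sketched as plain Python; the function name is an assumption. Every rank is computed independently, which is what the constant-depth network exploits.

```python
def enumeration_sort_desc(xs):
    """Enumeration sort per Theorem 10.1: x_i goes to position k when
    exactly k-1 elements are 'greater' than it under the tie-breaking
    order ⊳ of Definition 10.1 (equal values ordered by index), so the
    destination of every element can be computed in parallel."""
    n = len(xs)
    out = [None] * n
    for i, xi in enumerate(xs):
        # c(x_i, x_j) = 1 iff x_j ⊳ x_i: strict for j < i, non-strict for j > i
        rank = sum(1 for j, xj in enumerate(xs)
                   if j != i and (xj >= xi if j > i else xj > xi)) + 1
        out[rank - 1] = xi
    return out
```

With duplicates, the tie-breaking rule guarantees that every position of the output is filled exactly once.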
We first derive network realizations for each step and then compose them together
to form the whole network.

For the case i > j,

    c(x_i, x_j) = 0 if x_i >= x_j, 1 otherwise
                = 1 if x_j - x_i >= 1, 0 otherwise                       (10.4)

The network realization of this function is

    c(x_i, x_j) = ⊏ x_i, x_j | -1, 1 ⊐^1                                 (10.5)

For the case i < j,

    c(x_i, x_j) = 0 if x_i > x_j, 1 otherwise
                = 1 if x_j - x_i >= 0, 0 otherwise                       (10.6)

The network realization of this function is

    c(x_i, x_j) = ⊏ x_i, x_j | -1, 1 ⊐^0                                 (10.7)

For the second step, the network realization of the rank function is

    rank(x_i : S) = ⊂ c(x_i, x_1), ..., c(x_i, x_{i-1}), c(x_i, x_{i+1}), ..., c(x_i, x_n) ⊃^{-1}   (10.8)

Substituting (10.5) and (10.7) into (10.8), we have

    rank(x_i : S) = ⊂ ⊏ x_i, x_1 | -1, 1 ⊐^1, ..., ⊏ x_i, x_{i-1} | -1, 1 ⊐^1,
                      ⊏ x_i, x_{i+1} | -1, 1 ⊐^0, ..., ⊏ x_i, x_n | -1, 1 ⊐^0 ⊃^{-1}   (10.9)

For the third step, let

    z(i, k) = x_i   if rank(x_i : S) = k
              0     otherwise                                            (10.10)

Then it can be shown that

    y_k = Σ_{i=1}^{n} z(i, k)                                            (10.11)

which has the following network realization

    y_k = ⊂ z(1, k), z(2, k), ..., z(n, k) ⊃^0                           (10.13)
Let

    F_i = 1   if rank(x_i : S) = k
          0   otherwise                                                  (10.14)

then,

    z(i, k) = x_i   if F_i = 1
              0     otherwise                                            (10.15)

According to Example 7.1.11, the network realization of this function is

    z(i, k) = ⊂ ⟨x_i, F_i | 1, b⟩^c, F_i | 1, a ⊃^0                      (10.16)

where a <= min{x_i : i ∈ I and x_i ∈ X}, b >= max{x_i : i ∈ I and x_i ∈ X} - a, and
c = a + b. X is the set of all the possible values x_i may take. The network realization of
F_i, according to Example 7.1.6, is

    F_i = ⊂ ⊏ r_i ⊐^k, ⊏ r_i ⊐^{k+1} | 1, -1 ⊃^0                         (10.17)

where r_i = rank(x_i : S). Substituting this equation into (10.16) and simplifying the
resulting network realization, we have

    z(i, k) = ⊂ ⟨x_i, ⊏ r_i ⊐^k, ⊏ r_i ⊐^{k+1} | 1, b, -b⟩^c,
                ⊏ r_i ⊐^k, ⊏ r_i ⊐^{k+1} | 1, a, -a ⊃^0                  (10.18)

Composing all the network realizations together and after some simplification, we
have the network implementation of the enumeration sorting algorithm:

    y_k = ⊂ μ_1, μ_{1,k}, μ_{1,k+1}, μ_2, μ_{2,k}, μ_{2,k+1}, ..., μ_n, μ_{n,k}, μ_{n,k+1}
            | 1, a, -a, 1, a, -a, ..., 1, a, -a ⊃^0
        μ_i = ⟨x_i, μ_{i,k}, μ_{i,k+1} | 1, b, -b⟩^c
        μ_{i,l} = ⊏ c(x_i, x_1), ..., c(x_i, x_{i-1}), c(x_i, x_{i+1}), ..., c(x_i, x_n) ⊐^{l-1}   (10.19)
        c(x_i, x_j) = ⊏ x_i, x_j | -1, 1 ⊐^1   if i > j
                      ⊏ x_i, x_j | -1, 1 ⊐^0   if i < j

where i, j, k ∈ I and l ∈ {k, k + 1}.
There are two special cases in the above network realization, namely, when k = 1 and
when k = n. When k = 1, μ_{i,1} = 1 for all i ∈ I. Therefore, the network realization of y_1 can be
further simplified to

    y_1 = ⊂ μ_1, μ_{1,2}, μ_2, μ_{2,2}, ..., μ_n, μ_{n,2} | 1, -a, 1, -a, ..., 1, -a ⊃^{-na}
        μ_i = ⟨x_i, μ_{i,2} | 1, -b⟩^a
        μ_{i,2} = ⊏ c(x_i, x_1), ..., c(x_i, x_{i-1}), c(x_i, x_{i+1}), ..., c(x_i, x_n) ⊐^1   (10.20)
        c(x_i, x_j) = ⊏ x_i, x_j | -1, 1 ⊐^1   if i > j
                      ⊏ x_i, x_j | -1, 1 ⊐^0   if i < j

where i, j ∈ I.

When k = n, μ_{i,n+1} = 0 for all i ∈ I. Therefore, the network realization of y_n can be
further simplified to

    y_n = ⊂ μ_1, μ_{1,n}, μ_2, μ_{2,n}, ..., μ_n, μ_{n,n} | 1, a, 1, a, ..., 1, a ⊃^0
        μ_i = ⟨x_i, μ_{i,n} | 1, b⟩^c
        μ_{i,n} = ⊏ c(x_i, x_1), ..., c(x_i, x_{i-1}), c(x_i, x_{i+1}), ..., c(x_i, x_n) ⊐^{n-1}   (10.21)
        c(x_i, x_j) = ⊏ x_i, x_j | -1, 1 ⊐^1   if i > j
                      ⊏ x_i, x_j | -1, 1 ⊐^0   if i < j

where i, j ∈ I.
where i , j G X.
For an input array of n elements, our network has a depth of 4 and a size of n(n^2 +
6n - 5). The computation time of our network is constant irrespective of the size of
the input array: it is always four times the processing time of a single neuron. Our
network for n = 3 is shown in Figure 10.1, where the weights which are not shown have
the value of 1.

To our knowledge, the aforementioned network is the first published neural network
which employs three kinds of neurons to implement the enumeration sorting algorithm [86].
Figure 10.1: A Neural Network for Sorting an Input Array of Three Elements
Later, Bruck and Siu proposed another neural network model which has an architecture
similar to ours [88]. It is composed of threshold-logic neurons and has a depth of 5.
Note that the size of our sorting network is polynomial only with respect to the
number of input elements. This is justifiable in applications where the value range of the
input elements is fixed, for instance, in image processing, the value range of a pixel is
restricted to [0,255].
10.2 Communication
Here, we chose an algorithm which is used in communication and has a well-known
neural network implementation called the Hamming network.
10.2.1 Improvement over Hamming Network
In Chapter Two, we introduced the Hamming network, which implements a minimum
error classification algorithm (referred to as the Hamming Classification (HC) algorithm here).
The Hamming network has some shortcomings: (1) the network output is not the
exemplar; (2) the network has to iterate several times; (3) the matching scores have to be
distinct; and (4) the output dynamic range of the neurons has to be very large. Here, we
derive a network which overcomes the above shortcomings.
Assume there are m exemplars, and the size of the input pattern is n. Denote the
input pattern as x, the output pattern as y, the i-th exemplar as e_i, and the j-th element
of e_i as e_i^j. The HC algorithm is reformulated as follows:

1. Calculate d_i = (1/2)(n + x · e_i) for all i ∈ J = {1, 2, ..., m};

2. Find d_k = max(d_i : i ∈ J);

3. Assign y_i = e_k^i for all i ∈ I = {1, 2, ..., n}.
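The three steps can be sketched in NumPy; the function name and the argmax tie-breaking by index are assumptions of this sketch.

```python
import numpy as np

def hc_classify(x, exemplars):
    """One-shot HC algorithm for +-1 patterns: the matching score
    d_i = (n + x . e_i)/2 counts agreeing bits, and the output is the
    best-matching exemplar itself. Unlike the Hamming network, no
    iteration is needed; equal scores are resolved by index order."""
    E = np.asarray(exemplars)                     # m exemplars of n bits each
    d = (len(x) + E @ np.asarray(x)) / 2.0        # matching scores d_1..d_m
    return E[int(np.argmax(d))]                   # step 3: emit the exemplar
```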
The matching score d_j can be expressed as

    d_j = (1/2) Σ_{i=1}^{n} e_j^i x_i + n/2                              (10.22)

whose network realization is

    d_j = ⊂ x | (1/2)e_j ⊃^{-n/2}                                        (10.23)

From the definition of the rank function, we know that

    d_k = max(d_j : j ∈ J)   iff   rank(d_k : D) = 1

Define

    z_k = 1   if rank(d_k : D) = 1
          0   otherwise                                                  (10.24)

then it can be shown that

    y_i = Σ_{k=1}^{m} e_k^i z_k                                          (10.25)

which has a network realization as

    y_i = ⊂ z_1, z_2, ..., z_m | e_1^i, e_2^i, ..., e_m^i ⊃^0            (10.26)

The network realization of z_k is (recall the example in the previous section)

    z_k = ⊂ ⊏ rank(d_k : D) ⊐^1, ⊏ rank(d_k : D) ⊐^2 | 1, -1 ⊃^0         (10.27)

Since rank(d_k : D) >= 1, ⊏ rank(d_k : D) ⊐^1 = 1. According to Theorem 6.3, the above
network can be simplified to

    z_k = ⊂ ⊏ rank(d_k : D) ⊐^2 | -1 ⊃^{-1}                              (10.28)
The network realization of the rank function (see Section 10.1) is

    rank(d_k : D) = ⊂ c(d_k, d_1), ..., c(d_k, d_{k-1}), c(d_k, d_{k+1}), ..., c(d_k, d_m) ⊃^{-1}   (10.29)

where

    c(d_k, d_j) = ⊏ d_k, d_j | -1, 1 ⊐^1   if k > j
                  ⊏ d_k, d_j | -1, 1 ⊐^0   if k < j                      (10.30)

Let

    c^h = ⊏ d_k, d_j | -1, 1 ⊐^h                                         (10.31)

Substituting (10.23) into (10.31), we have

    c^h = ⊏ ⊂ x | (1/2)e_k ⊃^{-n/2}, ⊂ x | (1/2)e_j ⊃^{-n/2} | -1, 1 ⊐^h
        = ⊏ x, x | -(1/2)e_k, (1/2)e_j ⊐^h
        = ⊏ x | (1/2)(e_j - e_k) ⊐^h                                     (10.32)

Substituting the above network realization into (10.29), we have

    rank(d_k : D) = ⊂ ⊏ x | e_{k,1} ⊐^1, ..., ⊏ x | e_{k,k-1} ⊐^1,
                      ⊏ x | e_{k,k+1} ⊐^0, ..., ⊏ x | e_{k,m} ⊐^0 ⊃^{-1}   (10.33)

where e_{k,j} = (1/2)(e_j - e_k).

Substituting (10.33) into (10.28), and then substituting the resulting network realization
into (10.26), after some simplification we have the network realization of the HC algorithm
as follows:

    y_i = ⊂ μ_1, μ_2, ..., μ_m | -e_1^i, -e_2^i, ..., -e_m^i ⊃^{-a_i}
        μ_k = ⊏ μ_{k,1}, ..., μ_{k,k-1}, μ_{k,k+1}, ..., μ_{k,m} ⊐^1     (10.34)
        μ_{k,j} = ⊏ x | e_{k,j} ⊐^1   if k > j
                  ⊏ x | e_{k,j} ⊐^0   if k < j

Figure 10.2: The Network Model for Implementing the HC Algorithm

where i ∈ {1, 2, ..., n}, k, j ∈ {1, 2, ..., m}, and a_i = Σ_{k=1}^{m} e_k^i. This network has a
depth of 3 and a size of m^2(n + 1) - m. It is shown in Figure 10.2 for the case m = n = 3.

Our network overcomes the shortcomings of the Hamming network mentioned at the
beginning of this section. In particular, since the processing time of the Hamming network
is at least nine units of time, the processing speed of our network is at least three times
that of the Hamming network. Furthermore, since our network is perfectly
synchronized, input patterns can be fed to the network in a pipeline fashion. The only
disadvantage of our network is that its size is bigger than that of the Hamming
network (whose size is m(m + n)). Whether this is negligible or tolerable
depends on the relative importance of hardware cost for particular applications.
10.3 Optimization
Optimization has been a major field of application for neural networks since the
publication of Hopfield's papers [35, 36]. In Chapter Nine, we derived some network
models for solving image restoration problems, which are essentially optimization
problems. Here, we show the derivation of some network models for solving other optimization
problems, and compare the performance of our networks with the Hopfield network, which
is a popular neural network used for optimization.
10.3.1 Solving Simultaneous Equations
The matrix form of the simultaneous equations is

    Ax = c                                                               (10.35)

where A is an n x m matrix, x is an m x 1 vector, and c is an n x 1 vector. The objective
here is to find the vector x which satisfies equation (10.35).

Let us define a computational energy function as follows

    E = (Ax - c)^T (Ax - c)                                              (10.36)

This energy function can be shown to be a convex function; therefore any local minimum
is also the global minimum [14]. Since E attains the value zero if and only if Ax = c,
the problem of solving the simultaneous equations is now transformed into the problem of
minimizing E. If equation (10.35) has a solution, minimizing E will always find it.
If equation (10.35) has no solution, minimizing E finds a best-fit solution in the L_2 sense.

To find the minimum of E, we use the method of steepest descent. Here the change
of x is along the best search direction, which is the opposite direction of the gradient
of E. Denoting ∇E as the gradient vector, we get

    ∇E = 2A^T Ax - 2A^T c
       = 2A^T (Ax - c)                                                   (10.37)

Therefore, the increment of x is

    Δx = -δA^T (Ax - c)                                                  (10.38)

where δ takes a small positive value. The optimum δ is a function of x. Here, for the sake
of simplicity, we choose δ as a constant.

We know that

    x(k + 1) = x(k) + Δx(k)                                              (10.39)

This leads to

    x(k + 1) = (I - δA^T A)x(k) + δA^T c
             = Wx(k) + t                                                 (10.40)

The network realization of the above function is

    x(k + 1) = ⊂ x(k) | W ⊃^{-t}                                         (10.41)

This network has a depth of 1 and a size of m^2.
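The iteration (10.40) can be sketched directly. The function name, the default step size 1/trace(A^T A) (which is always below the convergence bound 2/max-eigenvalue since the trace dominates the largest eigenvalue), and the iteration count are assumptions of this sketch.

```python
import numpy as np

def solve_iterative(A, c, delta=None, iters=2000):
    """Steepest-descent network (10.40) for A x = c:
    x(k+1) = (I - delta*A^T A) x(k) + delta*A^T c.
    For an inconsistent system this converges to the least-squares fit."""
    AtA, Atc = A.T @ A, A.T @ c
    if delta is None:
        delta = 1.0 / np.trace(AtA)   # conservative but safe step size
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = x - delta * (AtA @ x - Atc)           # one synchronous update
    return x
```

All components of x are updated simultaneously at each step, which is the parallelism argument made against the sequentially updated Hopfield network below.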
Simultaneous equations, when transformed into an optimization problem, can also be solved
by the Hopfield network; for a detailed account of how the Hopfield network is used for
solving optimization problems, see Chapter Nine. The Hopfield network, or more
precisely the discrete version of the Hopfield network, can only take binary values. Assuming
that l neurons are used to encode one number, the Hopfield network has a depth of
1 and a size of l x m^2. That is, its network size is l times that of our network.
Another problem with the Hopfield network is its oscillatory behaviour, i.e., wandering
around a minimum. This phenomenon is due to the simultaneous updating of its neurons.
One way to avoid such behaviour is to update the neurons sequentially; in fact, this is
the only way to ensure the convergence of the network. By doing so, however, the
advantage of parallelism is lost. Our network uses simultaneous updating and
has no oscillatory behaviour; it always converges smoothly to the minimum if the search
step is properly chosen.
The precision of the solution obtained by using the Hopfield network is limited by the
number of neurons used. In our network, the precision is proportional to the number of
iterations.
Some results of using our networks to solve simultaneous equations and a detailed
comparison with the Hopfield network are available in [87].
10.3.2 Matrix Inversion
In image processing and some other areas, matrix inversions are often required. Matrix
inversion is a time-consuming task when performed on digital computers. Here, we
derive an LQT network for performing matrix inversion.
The objective of finding the inverse of a matrix A is to find another matrix X
such that
AX = I (10.42)
Equation (10.42) is a collection of n simultaneous equations (assuming A is an n x n
matrix) as follows:
Ax_i = i_i   ∀ i ∈ {1, 2, ..., n} (10.43)
where x_i and i_i are the i-th column vectors of X and I respectively. We know from the
previous example that the network model for solving these simultaneous equations is
x_i(k + 1) = C x_i(k) | W D t_i (10.44)
where W = I - δA^T A and t_i = δA^T i_i. Therefore, we can perform the matrix inversion
either by using the above network n times, or by using n such networks simultaneously.
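Following the two options above, here is a sketch that runs the same iteration once per column of I (sequentially here; the n column solves are independent and could equally run as n networks in parallel; the function name is ours):

```python
import numpy as np

def invert_iteratively(A, delta=0.05, iterations=3000):
    """Approximate A^{-1} by solving A x_i = i_i for each column i_i of the
    identity, with the iteration x_i(k+1) = W x_i(k) + t_i of equation (10.44)."""
    n = A.shape[0]
    W = np.eye(n) - delta * A.T @ A          # W = I - delta A^T A, shared by all columns
    X = np.zeros((n, n))
    for i in range(n):
        t_i = delta * A.T @ np.eye(n)[:, i]  # t_i = delta A^T i_i
        x = np.zeros(n)
        for _ in range(iterations):
            x = W @ x + t_i
        X[:, i] = x                          # i-th column of the approximate inverse
    return X

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
X = invert_iteratively(A)   # A @ X is then close to the identity
```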
Chapter 11
CONCLUSIONS
11.1 Summary
In this thesis, a systematic method of neural network design, which we call algebraic
derivation methodology, is proposed and developed. This work is motivated by the need
for a systematic method with which neural networks can be constructed for computational
purposes. The lack of such a method has limited the application of neural networks in
many areas, particularly the area of image processing.
The algebraic derivation methodology consists of the following stages:
1. Find the minimum set of neuron models for a given class of functions;
2. Devise symbolic representations for these neurons and their networks;
3. Establish theorems for manipulating these symbols based on the computational
properties of the neurons and their networks;
4. Establish procedures for deriving neural networks from functions;
5. Use these procedures and theorems to derive and simplify network models for specified functions.
This methodology is then applied and developed with an emphasis on deriving networks
to realize image processing techniques.
It is shown informally in Chapter Four and then formally in Chapter Seven that any
image processing technique can be realized by networks of LQT neurons. These neurons
can be easily implemented in hardware, hence the implementation of their networks
should be straightforward.
Symbolic representations of LQT neurons and their networks (LQT networks) are then
devised in Chapter Five. Such representations enable us to algebraically manipulate the
process of network design so as to derive and simplify LQT networks.
Theorems are established in Chapter Six for manipulating the symbols mentioned above.
These theorems are analogous to the theorems of Boolean algebra and are used for network simplification.
Procedures for deriving networks to realize both single-input and multiple-input functions
are given in Chapter Seven. Their usage is demonstrated through some simple
examples, which in turn are useful in deriving LQT networks in later chapters.
To demonstrate the merits of the algebraic derivation methodology, networks for
realizing some image enhancement and image restoration techniques, as well as techniques
in some other areas, are derived in Chapters Eight, Nine, and Ten respectively. Most of
these networks are the first known neural network models; where other network models
are known, our networks have the same or better performance in terms of
computation time.
11.2 Contributions
The main contribution of this thesis is the development of the algebraic derivation
methodology. Some important features of this methodology are as follows: 1) both
the network parameters and the network architectures are derived; 2) neurons and their
networks are represented in symbolic form; and 3) these symbols can be manipulated
so as to yield proper network models.
During the development of the algebraic derivation methodology, some specific
contributions have been made. They are listed as follows.
Formal definitions of neurons and neural networks have been given. Networks so
defined can simulate the computation process of any network composed of linear-sum
neurons, such as the back-propagation network and the Hopfield network. Therefore,
they provide a unified framework for studying the computational properties of neural
networks, and may also serve as a model of general purpose neural computers.
Symbols to represent LQT neurons and their networks have been devised. Such
symbolic representations enable us to algebraically manipulate the process of network
design so as to yield proper network models.
A type of neural network, called the LQT network, has been established with the
objective of realizing image processing techniques. These networks use the simplest neuron
models, and thus should encounter minimal difficulty when implemented in hardware.
The LQT networks are capable of realizing any image processing technique and some
techniques in other areas.
To simplify LQT networks, that is, to find networks with fewer layers and/or fewer
neurons, the concept of network equivalence has been introduced, and theorems of
network equivalence have been developed.
Procedures for deriving neural networks to realize both single-input and multiple-input
functions have been developed. They are effective in deriving LQT networks, as
shown in this thesis. Moreover, they are general procedures, not restricted to
LQT networks.
These theorems and procedures also provide a tool to analyze the computational
complexity of neural networks. Consequently, not only can we evaluate different networks
which realize the same function, but we can also make some concrete comparisons between
neural networks and other parallel computing machines.
Network models have been derived for implementing some well-known algorithms in
image processing and some other areas. For example, we have derived networks for
implementing the median filter and the Wiener filter, and a network for sorting. Most
of the networks we have derived are the first known such neural network models; where
other network models are known, our networks have the same or better
performance in terms of computation time. For example, the computation time of our
sorting network is the same as that of the sorting network developed by Siu and Bruck
[88]; the computation time of our network implementing the Hamming classification
algorithm is at most half that of the Hamming network developed by
Lippmann [58].
Some interesting questions were raised during the development of the algebraic derivation
methodology, one of which is as follows: what is the minimum number of weights a
network must have in order to realize a given function? This is an important question
of both theoretical and practical value. It is evident (see [30], for example) that
any problem solvable by digital computers can be solved by neural networks in a fixed
amount of time irrespective of the problem size. However, the number of weights is
proportional to the problem size. Consequently, we may encounter a situation where a
neural network can solve a given problem in, say, nanoseconds, but it may take millions
of years to construct such a network. Therefore, it is important to know the minimum
number of weights needed to solve a problem with neural networks. To our knowledge, this
important question has not been seriously considered in the field of neural networks.
11.3 Future Work
The algebraic derivation methodology has been developed with an emphasis on deriving networks
to realize image processing techniques. It is desirable to expand this methodology to
form a unified theory of algebraic derivation. To do so, we may start by identifying
basic functions used in many areas such that a minimum set of neuron models can be
constructed, in much the same way as in Boolean algebra, where the basic functions are
AND, OR, and NOT. Of course, the basic functions we are looking for are at a much
higher computational level, so as to build more efficient computing machines.
As shown in Chapter Nine, some unconventional neuron models are more efficient for
realizing certain functions. One such neuron model is the so-called sigma-pi neuron
(see [15]). Since it is not difficult to implement such neurons in hardware [64], there is
no reason to exclude them. We may also need to search for other neuron models which
are either computationally more efficient or easier to implement in hardware. That is,
we look for the basic neuron models from the perspectives of both pure computation and
hardware implementation. This task is an essential part of the task mentioned in the
previous paragraph.
Another purpose of developing a theory of network design is to evaluate the efficiency
of neural networks as computing machines. A solid theory is needed for us to make
concrete comparisons between neural networks and other parallel computing machines.
Ultimately, neural networks have to be implemented in hardware for any practical
use. Since the neuron models used in our networks already have corresponding hardware
implementations, it is reasonable to expect easy implementation of these networks.
However, hardware problems, such as the effect of inaccuracy in setting network
parameters and thermal noise in analog circuits, have to be taken into consideration. Some
theoretical and experimental work is certainly needed.
Fault tolerance is one of the characteristics of many existing neural network paradigms,
and is considered a remedy for the inherent imperfection of analog implementations [64].
Fault tolerance is believed to be due to the distributed nature of rule representation in
neural networks and the redundancy of neurons. Our networks, however, are derived with
an emphasis on minimizing the number of neurons, thus eliminating such redundancy.
Hence, it is expected that our networks are less fault tolerant than networks having
component redundancy. This, however, may be addressed by using fault tolerance
techniques such as those already developed for digital computers. Whatever the
result, such an approach of explicitly building in fault tolerance may give
us some hints towards a better understanding of the fault tolerance of both artificial and
biological neural networks.
In this thesis, we are only concerned with deriving networks for solving problems
which have analytical solutions. There are also many interesting problems (e.g., pattern
recognition problems) which have only partial solutions. How to utilize this partial
knowledge is thus important. The underlying principle of our methodology may be
applied to help create better learning schemes in which not only network parameters
but also network architectures are adjusted during learning. Alternatively, we may derive
the network architecture from the characteristics of the problem, and then let the
network adjust its parameters in the learning process.
Appendix A
T H E G E N E R A L I Z E D D E L T A R U L E
The generalized delta rule is an iterative gradient-search algorithm. The objective of the
algorithm is to minimize the mean square error between the actual output of a
back-propagation network and the desired output.
Step 1 Initialize Weights and Thresholds:
Set all weights and neuron thresholds to small random values.
Step 2 Present Input and Desired Outputs:
Present a continuous-valued input vector x_0, x_1, ..., x_{n-1} and specify the desired
outputs d_0, d_1, ..., d_{m-1}. If the network is used as a classifier, then all desired outputs
are typically set to zero except for the one corresponding to the class the input is
from. The input could be new on each trial, or samples from a training set could
be presented cyclically until the weights stabilize.
Step 3 Calculate Actual Outputs:
Calculate outputs y_0, y_1, ..., y_{m-1}.
Step 4 Adapt Weights:
Use a recursive algorithm starting at the output neurons and working back to the
first hidden layer. Adjust weights by
w_ji(h + 1) = w_ji(h) + η δ_j x_i (A.1)
In this equation w_ji(h) is the weight from the i-th hidden neuron or from the i-th
element of the input to the j-th neuron at time h, x_i is either the output of the i-th
neuron or the i-th element of the input, η is a gain term, and δ_j is an error term for
the j-th neuron. If the j-th neuron is an output neuron, then
δ_j = y_j(1 - y_j)(d_j - y_j) (A.2)
where d_j is the desired output of the j-th neuron and y_j is the actual output.
If the j-th neuron is a hidden neuron, then
δ_j = x_j(1 - x_j) Σ_k δ_k w_kj (A.3)
where the sum over k runs over all neurons in the layer above neuron j. Thresholds are adapted in
a similar manner by assuming they are connection weights on links from auxiliary
constant-valued inputs. Convergence is sometimes faster if a momentum term is
added and weight changes are smoothed by
w_ji(h + 1) = w_ji(h) + η δ_j x_i + α(w_ji(h) - w_ji(h - 1)) (A.4)
where 0 < α < 1.
Step 5 Repeat by Going to Step 2.
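Steps 1 through 5 can be sketched for a three-layer network with sigmoid activations. This is an illustrative rendering of the rule as stated above (the layer sizes, gain η = 0.5, and helper names are our choices, not from the thesis), trained on a single input/target pair:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

# Step 1: small random weights; the threshold is folded in as a weight
# on an auxiliary constant input of 1, as described after (A.3).
W1 = rng.uniform(-0.5, 0.5, (2, 3))   # hidden layer: 2 neurons, 2 inputs + bias
W2 = rng.uniform(-0.5, 0.5, (1, 3))   # output layer: 1 neuron, 2 hidden + bias

def forward(x):
    h = sigmoid(W1 @ np.append(x, 1.0))   # Step 3: hidden outputs
    y = sigmoid(W2 @ np.append(h, 1.0))   # Step 3: actual outputs
    return h, y

def train_step(x, d, eta=0.5):
    """Steps 2 and 4: present one (input, desired output) pair and adapt weights."""
    global W1, W2
    h, y = forward(x)
    delta_out = y * (1 - y) * (d - y)                    # equation (A.2)
    delta_hid = h * (1 - h) * (W2[:, :2].T @ delta_out)  # equation (A.3)
    W2 += eta * np.outer(delta_out, np.append(h, 1.0))   # equation (A.1)
    W1 += eta * np.outer(delta_hid, np.append(x, 1.0))   # equation (A.1)

x, d = np.array([0.0, 1.0]), np.array([1.0])
err_before = ((d - forward(x)[1]) ** 2).item()
for _ in range(50):                                      # Step 5: repeat
    train_step(x, d)
err_after = ((d - forward(x)[1]) ** 2).item()            # squared error decreases
```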
Appendix B
H O P F I E L D N E T W O R K A L G O R I T H M
The Hopfield network algorithm is described as follows.
Step 1 Assign Connection Weights
w_ij = Σ_{s=1}^{m} (2x_i^s - 1)(2x_j^s - 1) for i ≠ j, and w_ij = 0 for i = j, ∀ i, j ∈ {1, 2, ..., n} (B.1)
where m is the total number of exemplars and n is the total number of neurons in
the network. In this formula w_ij is the connection weight from the j-th neuron to
the i-th neuron, and x_i^s ∈ {0, 1} is the i-th element of the exemplar for class s.
Step 2 Initialize with the Input Pattern
y_i(0) = x_i   ∀ i ∈ {1, 2, ..., n} (B.2)
In this formula y_i(k) is the output of the i-th neuron at time k and x_i ∈ {0, 1} is the
i-th element of the input pattern.
Step 3 Iterate Until Convergence
v_i(k + 1) = Σ_{j=1}^{n} w_ij y_j(k)   ∀ i ∈ {1, 2, ..., n}
y_i(k + 1) = f_T(v_i(k + 1))   ∀ i ∈ {1, 2, ..., n}
where the function f_T is the activation function of the threshold-logic neuron. The
process is repeated until the network output remains unchanged with further iterations.
The network output then represents the exemplar pattern that best matches
the input pattern.
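A sketch of the three steps, using the {0,1}-exemplar weight rule of (B.1) (the bipolar encoding there is our reconstruction of the garbled original; helper names are ours, and f_T here outputs 1 for a strictly positive activation):

```python
import numpy as np

def make_weights(exemplars):
    """Step 1: w_ij = sum_s (2 x_i^s - 1)(2 x_j^s - 1) for i != j, 0 on the diagonal."""
    B = 2 * np.asarray(exemplars) - 1   # map {0,1} elements to {-1,+1}
    W = B.T @ B
    np.fill_diagonal(W, 0)              # w_ij = 0 for i = j
    return W

def recall(W, pattern, max_iter=20):
    """Steps 2 and 3: start from the input pattern and iterate until unchanged."""
    y = np.asarray(pattern).copy()
    for _ in range(max_iter):
        y_new = (W @ y > 0).astype(int)   # y_i(k+1) = f_T(sum_j w_ij y_j(k))
        if np.array_equal(y_new, y):
            break                         # output unchanged: converged
        y = y_new
    return y

W = make_weights([[1, 1, 1, 0, 0, 0]])    # store one exemplar
noisy = [1, 1, 0, 0, 0, 0]                # the exemplar with one bit flipped
restored = recall(W, noisy)               # recovers [1, 1, 1, 0, 0, 0]
```

With this single stored exemplar, the exemplar itself is a fixed point of the iteration and the one-bit-corrupted probe is restored in one synchronous step.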
Bibliography
[1] Akl, S.G., Parallel Sorting Algorithms, Vol. 12 in Werner Rheinboldt (ed.), Notes and Reports in Computer Science and Applied Mathematics, Orlando: Academic Press, 1985.
[2] Albert, A., Regression and the Moore-Penrose Pseudoinverse, Academic Press: New York, 1972.
[3] Aleksander, I. (ed.), Neural Computing Architectures: The Design of Brain-like Machines, The MIT Press: Cambridge, Massachusetts, 1989.
[4] Aleksander, I., Thomas, W.V., and Bowden, P.A., "WISARD, A Radical Step Forward in Image Recognition," Sensor Review, Vol. 4, No. 3, pp. 120-124, 1984.
[5] Anderson, J.A., "A Simple Neural Network Generating An Interactive Memory," Mathematical Biosciences, Vol. 14, pp. 197-220, 1972.
[6] Andrews, H.C. and B.R. Hunt, Digital Image Restoration, Englewood Cliffs, NJ: Prentice-Hall, 1977.
[7] Arbib, M.A., Brains, Machines, and Mathematics, 1987.
[8] Arce, G.R. and M.P. McLoughlin, "Theoretical Analysis of the Max/Median Filter," IEEE Trans. on Acoust., Speech, Signal Processing, Vol. ASSP-34, No. 1, pp. 60-69, Jan. 1987.
[9] Barto, A.G., R.S. Sutton, and C.W. Anderson, "Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems," IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-13, No. 5, Sept./Oct. 1983.
[10] Baum, E.B., J. Moody, and F. Wilczek, "Internal Representations for Associative Memory," NSF-ITP-86-138, Institute for Theoretical Physics, University of California, Santa Barbara, California, 1986.
[11] Bovik, A.C., T.S. Huang, and D.C. Munson, Jr., "A generalization of median filtering using linear combinations of order statistics," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-31, pp. 1342-1350, Dec. 1983.
[12] Carpenter, G.A. and S. Grossberg, "The ART of Adaptive Pattern Recognition by a Self-Organizing Neural Network," IEEE Computer, pp. 77-88, March 1988.
143
[13] Chandra, A.K., L. Stockmeyer, and U. Vishkin, "Constant Depth Reducibility," SIAM J. Comput., Vol. 13, pp. 423-439, 1984.
[14] Cooper, L. and D. Steinberg, Introduction to Methods of Optimization, Saunders: Philadelphia, 1970.
[15] Durbin, R. and D.E. Rumelhart, "Product Units: A Computationally Powerful and Biologically Plausible Extension to Back-propagation Networks," Neural Computation, Vol. 1, pp. 133-142, 1989.
[16] Eberhart, Russell C., "Standardization of Neural Network Terminology," Neural Networks, Vol. 1, No. 2, pp. 244-245, June 1990.
[17] Elman, J.L., "Learning the Hidden Structure of Speech," Institute for Cognitive Science, University of California at San Diego, ICS Report 8701, Feb. 1987.
[18] Feldman, J.A. and D.H. Ballard, "Connectionist Models and Their Properties," Cognitive Science, Vol. 6, pp. 205-254, 1982.
[19] Fox, G.C. and P.C. Messina, "Advanced Computer Architectures," IEEE Computers Magazine, pp. 67-74, 1989.
[20] Frei, W., "Image Enhancement by Histogram Hyperbolization," Computer Graphics and Image Processing, Vol. 6, pp. 286-294, 1977.
[21] Fukushima, K. and S. Miyake, "Neocognitron: A New Algorithm For Pattern Recognition Tolerant of Deformations And Shifts In Position," Pattern Recognition, Vol. 15, No. 6, pp. 455-469, 1982.
[22] Furst, M., J.B. Saxe, and M. Sipser, "Parity, Circuits and the Polynomial-time Hierarchy," Proc. 22nd IEEE Symposium on Foundations of Computer Science, pp. 260-270, 1981.
[23] Gallager, R.G., Information Theory and Reliable Communication, John Wiley & Sons: New York, 1968.
[24] Gevins, A.S. and N.H. Morgan, "Applications of Neural-Network (NN) Signal Processing in Brain Research," IEEE Trans. on Acoust., Speech, Signal Processing, Vol. 36, No. 7, July 1988.
[25] Goles, Eric and Servet Martinez, Neural and Automata Networks: Dynamical Behaviour and Applications, Kluwer Academic Publishers: Boston, 1990.
[26] Gonzalez, R.C. and P. Wintz, Digital Image Processing, Reading, MA: Addison-Wesley, 1977.
[27] Grossberg, S., The Adaptive Brain I: Cognition, Learning, Reinforcement, and Rhythm, and The Adaptive Brain II: Vision, Speech, Language, and Motor Control, Elsevier/North-Holland: Amsterdam, 1986.
[28] Hagiwara, M., "Accelerated Back Propagation Using Unlearning Based on Hebb Rule," in Proc. of IJCNN'90, pp. I-617-620, Jan. 1990.
[29] Hebb, D.O., The Organization of Behavior, John Wiley & Sons: New York, 1949.
[30] Hecht-Nielsen, R., "Theory of the Back-propagation Neural Networks," in Proc. of the Intl. Joint Conf. on Neural Networks, Washington, D.C., New York: IEEE Press, pp. I-593-606, 1989.
[31] Hecht-Nielsen, R., "Counter-Propagation Networks," Proc. of IEEE First International Conference on Neural Networks, New York, 1987.
[32] Helstrom, C.W., "Image Restoration by the Method of Least Squares," J. Opt. Soc. Amer., Vol. 57, pp. 297-303, Mar. 1967.
[33] Hopfield, J.J., "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proc. Natl. Acad. Sci. USA, Vol. 79, pp. 2554-2558, April 1982.
[34] Hopfield, J.J., "Neurons with Graded Response Have Collective Computational Properties Like Those of Two-State Neurons," Proc. Natl. Acad. Sci. USA, Vol. 81, pp. 3088-3092, May 1984.
[35] Hopfield, J.J. and D.W. Tank, "Computing with Neural Circuits: A Model," Science, Vol. 232, pp. 625-633, August 1986.
[36] Hopfield, J.J. and D.W. Tank, "Simple 'Neural' Optimization Networks: An A/D Converter, Signal Decision Circuit, and a Linear Programming Circuit," IEEE Trans. on Circuits and Systems, Vol. CAS-33, No. 5, pp. 533-541, May 1986.
[37] Hopfield, J.J. and D.W. Tank, "Neural Computation of Decisions in Optimization Problems," Biol. Cybern., Vol. 52, pp. 141-152, 1985.
[38] Huang, T.S. (Ed.), Two-Dimensional Digital Signal Processing II: Transforms and Median Filters, New York: Springer-Verlag, 1981.
[39] Huang, T.S., W.F. Schreiber, and O.J. Tretiak, "Image Processing," Proc. IEEE, Vol. 59, pp. 1586-1609, 1971.
[40] Huang, T.S., D.A. Barker, and S.P. Berger, "Iterative Image Restoration," Applied Optics, Vol. 14, No. 5, pp. 1165-1168, May 1975.
[41] Hummel, R.A., "Image Enhancement By Histogram Transformation," Comput. Graph. Image Proc., Vol. 6, pp. 184-195, 1977.
[42] Hunt, B.R., "Digital Image Processing," Proc. IEEE, Vol. 63, pp. 693-708, April 1975.
[43] Irie, B. and S. Miyake, "Capabilities of Three-layered Perceptrons," in Proc. of IJCNN'89, pp. I-641-648, June 1989.
[44] Izui, Y. and A. Pentland, "Speed Up Back Propagation," in Proc. of IJCNN'90, pp. I-639-642, Jan. 1990.
[45] Jain, A.K., Fundamentals of Digital Image Processing, Prentice-Hall: New Jersey, 1988.
[46] Jayant, N.S., "Average and median-based smoothing techniques for improving speech quality in the presence of transmission errors," IEEE Trans. Commun., Vol. COM-24, pp. 1043-1045, Sept. 1976.
[47] Judd, J.S., "Learning in Networks Is Hard," Proc. of the First Intl. Conf. on Neural Networks, pp. 685-692, IEEE, San Diego, California, June 1987.
[48] Judd, J.S., Neural Network Design and the Complexity of Learning, The MIT Press: Cambridge, Massachusetts, 1990.
[49] Kandel, E.R. and J.H. Schwartz, Principles of Neural Science, Elsevier: New York, 1985.
[50] Klassen, M.S. and Y.H. Pao, "Characteristics of the Functional-link Net: A Higher Order Delta Rule Net," Proc. of IEEE Second Annual International Conference on Neural Networks, San Diego, CA, June 1988.
[51] Knuth, D.E., The Art of Computer Programming: Sorting and Searching, Vol. 3, Reading, MA: Addison-Wesley, 1973.
[52] Kohonen, T., "Self-organized Formation of Topologically Correct Feature Maps," Biological Cybernetics, Vol. 43, pp. 59-69, 1982.
[53] Kohonen, T., "Correlation Matrix Memories," IEEE Trans. on Computers, Vol. C-21, pp. 353-359, 1972.
[54] Kosko, B., "Adaptive Bidirectional Associative Memories," Applied Optics, Vol. 26, pp. 4947-4960, 1987.
[55] Kreins, E.R. and L.J. Allison, "Color Enhancement of Nimbus High Resolution Infrared Radiometer Data," Appl. Opt., Vol. 9, p. 681, March 1970.
[56] Lee, Y.H. and A.T. Fam, "An edge gradient enhancing adaptive order statistic filter," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-35, No. 5, pp. 680-695, May 1987.
[57] Lewis, P.M. II and C.L. Coates, Threshold Logic, John Wiley & Sons: New York, 1967.
[58] Lippmann, R.P., "An Introduction to Computing with Neural Nets," IEEE ASSP Magazine, pp. 4-22, April 1987.
[59] Lippmann, R.P., B. Gold, and M.L. Malpass, "A Comparison of Hamming and Hopfield Neural Nets for Pattern Classification," MIT Lincoln Laboratory Technical Report TR-769, 1987.
[60] Lippmann, R.P., "Pattern Classification Using Neural Networks," IEEE Communications Magazine, Vol. 27, No. 11, pp. 47-64, Nov. 1989.
[61] Lorin, H., Sorting and Sort Systems, Reading, MA: Addison-Wesley, 1975.
[62] Matthews, M.B. and E.S. Moschytz, "Neural Network Nonlinear Adaptive Filtering Using the Extended Kalman Filter Algorithm," in Proc. of the Intl. Neural Network Conf., Paris, July 9-13, 1990, pp. 115-118.
[63] McCulloch, W.S. and W. Pitts, "A Logical Calculus of the Ideas Immanent in Nervous Activity," Bulletin of Mathematical Biophysics, Vol. 5, pp. 115-133, 1943.
[64] Mead, C.A., Analog VLSI and Neural Systems, Addison-Wesley Publishing Company, 1989.
[65] Minsky, M. and S. Papert, Perceptrons: An Introduction to Computational Geometry, Expanded Edition, The MIT Press, 1988.
[66] Moto-oka, T. and M. Kitsuregawa, The Fifth Generation Computer: The Japanese Challenge, John Wiley & Sons, 1985.
[67] Nakagawa, Y. and A. Rosenfeld, "A note on the use of local min and max operations in digital picture processing," IEEE Trans. Sys., Man, and Cybern., Vol. SMC-8, No. 8, pp. 632-635, Aug. 1978.
[68] Narendra, P.M., "A separable median filter for image noise smoothing," IEEE Trans. Pattern Anal. Mach. Intell., Vol. PAMI-3, No. 1, pp. 20-29, Jan. 1981.
[69] Owens, A.J. and D.L. Filkin, "Efficient Training of the Back-propagation Network by Solving a System of Stiff Ordinary Differential Equations," in Proc. of IJCNN'89, pp. II-381-386, June 1989.
[70] Pao, Y.H., Adaptive Pattern Recognition and Neural Networks, Addison-Wesley Publishing Company, 1989.
[71] Parten, C.R., R.M. Pap, and C. Thomas, "Neurocontrol Applied to Telerobotics for the Space Shuttle," in Proc. of the Intl. Neural Network Conf., Paris, July 9-13, 1990, pp. 229-236.
[72] Peeling, S.M., R.K. Moore, and M.J. Tomlinson, "The Multi-Layer Perceptron as a Tool for Speech Pattern Processing Research," in Proc. IoA Autumn Conf. on Speech and Hearing, 1986.
[73] Poggio, T., V. Torre, and C. Koch, "Computational vision and regularization theory," Nature, Vol. 317, pp. 314-319, Sept. 1985.
[74] Posch, T.E., "Models of the Generation and Processing of Signals by Nerve Cells: A Categorically Indexed Abridged Bibliography," USCEE Report 290, August 1968.
[75] Powell, P.G. and B.E. Bayer, "A Method for the Digital Enhancement of Unsharp, Grainy Photographic Images," Proc. Int. Conf. Electronic Image Processing, IEEE, UK, pp. 179-183, July 1982.
[76] Pratt, W.K., "Median Filtering," in Semi-annual Report, Image Processing Institute, Univ. of Southern California, pp. 116-123, Sept. 1975.
[77] Rabiner, L.R., M.R. Sambur, and C.E. Schmidt, "Applications of a nonlinear smoothing algorithm to speech processing," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-23, No. 6, pp. 552-557, Dec. 1975.
[78] Rosenblatt, F., Principles of Neurodynamics, New York: Spartan Books, 1959.
[79] Rosenblatt, F., "The Perceptron: A Probabilistic Model for Information Storage and Organization In The Brain," Psychological Review, Vol. 65, pp. 386-408, 1957.
[80] Rumelhart, D.E., G.E. Hinton, and R.J. Williams, "Learning Internal Representations by Error Propagation," in D.E. Rumelhart & J.L. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, MIT Press, 1986.
[81] Sawai, H., A. Waibel, P. Haffner, M. Miyatake, and K. Shikano, "Parallelism, Hierarchy, Scaling in Time-delay Neural Networks for Spotting Japanese Phonemes/CV-syllables," in Proc. of IJCNN'89, pp. II-81-88, June 1989.
[82] Sejnowski, T. and C.R. Rosenberg, "NETtalk: A Parallel Network That Learns to Read Aloud," Johns Hopkins Univ. Technical Report JHU/EECS-86/01, 1986.
[83] Shi, Pingnan and Rabab K. Ward, "Using the Hopfield neural network to enhance binary noisy images," Proceedings of the Canadian Conference on Electrical and Computer Engineering, Vancouver, B.C., Canada, Nov. 3-4, 1988, pp. 760-763.
[84] Shi, Pingnan and Rabab K. Ward, "Using the perceptron to enhance binary noisy images," presented at the SCS Western Multiconference, San Diego, California, January 4-6, 1989.
[85] Shi, Pingnan and Rabab K. Ward, "A neural network implementation of the median filter," Proceedings of 1989 IEEE Pacific Rim Conference on Communications, Victoria, B.C., Canada, June 1-2, 1989, pp. 513-516.
[86] Shi, Pingnan and Rabab K. Ward, "A neural network structure for sorting non-negative integers in fixed time," Proceedings of 1989 Canadian Conference on Electrical and Computer Engineering, Montreal, Canada, Sept. 17-20, 1989, pp. 420-423.
[87] Shi, Pingnan and Rabab K. Ward, "The case for abandoning the biological resemblance restriction: An example of neural network solution of simultaneous equations," Proceedings of 1990 International Joint Conference on Neural Networks, San Diego, California, June 17-21, 1990, pp. III-875-882.
[88] Siu, Kai-Yeung and Jehoshua Bruck, "Neural Computation of Arithmetic Functions," Proceedings of the IEEE, Vol. 78, No. 10, pp. 1669-1675, Oct. 1990.
[89] Takeda, M. and J.W. Goodman, "Neural Networks for Computation: Number Representations and Programming Complexity," Applied Optics, Vol. 25, No. 18, pp. 3033-3046, 15 Sept. 1986.
[90] Tukey, J.W., Exploratory Data Analysis, Addison-Wesley: Reading, Massachusetts, 1971.
[91] Van Trees, H.L., Detection, Estimation, and Modulation Theory, Vol. I, John Wiley & Sons: New York, 1967.
[92] Waibel, A., T. Hanazawa, G. Hinton, K. Shikano, and K. Lang, "Phoneme Recognition Using Time-delay Neural Networks," ATR Internal Report TR-1-0006, Oct. 30, 1987.
[93] Walsh, G.R., Methods of Optimization, John Wiley & Sons: New York, 1975.
[94] Wasserman, P.D., Neural Computing: Theory and Practice, Van Nostrand Reinhold: New York, 1989.
[95] Widrow, B. and M.E. Hoff, "Adaptive Switching Circuits," 1960 IRE WESCON Conv. Record, Part 4, pp. 96-104, August 1960.
[96] Widrow, B. and R. Winter, "Neural Nets for Adaptive Filtering and Adaptive Pattern Recognition," IEEE Computer Magazine, pp. 25-39, March 1988.
[97] Widrow, B. and S.D. Stearns, Adaptive Signal Processing, Prentice-Hall: New Jersey, 1985.
[98] Woods, R.E. and R.C. Gonzalez, "Real Time Digital Image Enhancement," Proc. IEEE, Vol. 69, pp. 643-644, 1981.
[99] Zhou, Y.T., R. Chellappa, et al., "Image Restoration Using a Neural Network," IEEE Trans. on Acoust., Speech, and Sig. Proc., Vol. 36, No. 7, pp. 1141-1151, 1988.