CHAPTER 4
IMAGE PREPROCESSING USING EVOLVABLE HARDWARE
4.1 INTRODUCTION
Images acquired with modern cameras may be contaminated by a variety of noise sources (e.g. photon noise or on-chip electronic noise) and by distortions such as shading or improper illumination. Therefore, a preprocessing unit has to be incorporated before recognition to improve image quality. Effective processing of the grabbed image requires computing architectures that are less complicated, highly flexible and more cost-effective. In this work, image preprocessing is therefore performed using a novel evolvable hardware (EHW) architecture dedicated to implementing a high-performance image noise filter with a flexible local interconnect hierarchy. It is shown that evolutionary design is fully compliant with existing design methodologies and explores the design space more thoroughly, and that the evolved circuit performs better and can be optimized with respect to different parameters. After the initially grabbed images are processed by the EHW system, the improved images of the surfaces are used to evaluate surface finish. Because hardware units operate in parallel, implementing the evolved circuit on an FPGA allows complex computations to be performed quickly by dedicated hardware instead of software, which makes the approach well suited to online applications.
4.2 MERITS OF EHW FILTER
General-purpose image filters lack the flexibility and adaptability to handle unmodeled noise types. In contrast, filter architectures based on evolutionary algorithms are very promising because of their capability to provide solutions to hard design problems. With this approach, an image filter can follow a completely different design style, in which the whole design is performed by an evolutionary algorithm. Accordingly, an evolutionary-algorithm-based filter is designed in this work, in which either the kernel (a small weight matrix) or the whole circuit is evolved automatically.
The EHW architecture proposed in this work can evolve filters without a priori information. The proposed filter architecture follows a spatial-domain approach and filters the signal with an overlapping window. The approach chosen in this work is based on function-level evolution: its architecture includes nonlinear functions, and an evolutionary algorithm is used to find the best filter configuration.
4.3 EVOLVABLE HARDWARE
Evolvable Hardware (EHW) is a new concept in the development of online adaptive machines. In contrast to conventional hardware, whose structure is irreversibly fixed during the design process, EHW is designed to adapt to changes in task requirements or in the environment by reconfiguring its own hardware structure online and autonomously (Higuchi 1999). The capacity for adaptation is achieved through evolutionary algorithms such as the genetic algorithm (GA). EHW is hardware built on software-reconfigurable logic devices, such as FPGAs and PLDs, whose architecture can be reconfigured through genetic learning. To design conventional hardware, all the specifications of the hardware functions must be prepared in advance; EHW, in contrast, can reconfigure itself without such specifications. An FPGA is a hardware device whose architecture is determined by downloading a binary string called the architecture bits. The basic idea of EHW is to regard the architecture bits of a reconfigurable device as a chromosome for a GA, which searches for an optimal hardware structure. During genetic learning, the GA chromosome, that is, the architecture bits, is downloaded onto the reconfigurable device. In this sense, EHW can be regarded as online adaptive hardware. This is illustrated in Figure 4.1.
Figure 4.1 Configuration of device using genetic operators
4.3.1 Evolvable Hardware Classification
Apart from the classification based on how artificial evolution is applied to the design of hardware systems, EHW can also be classified according to the functions to be evolved. Only this second classification is discussed in this section; under it, EHW falls into two classes:
1. Gate-level EHW
2. Function-level EHW
4.3.1.1 Gate level evolvable hardware
In gate-level evolution, the functions in the processing element (PE) are realized as primitive gate-level functions such as AND, OR and NOT, as shown in Figure 4.2. The size of the circuits that can be evolved at the gate level is limited, which makes this approach less suited to practical applications.
Figure 4.2 Gate-level Evolution
4.3.1.2 Function level evolvable hardware
Function-level circuit evolution uses higher-level hardware functions than the primitive gates of gate-level evolution. Function-level EHW is conveniently implemented on FPGA chips in which a number of programmable functional units (PFUs) are embedded. Each PFU can perform one of several high-level functions such as addition, subtraction or multiplication. Because each PFU can select among high-level hardware functions, function-level EHW suits a wide variety of practical applications and can be used to design larger hardware circuits. This is shown in Figure 4.3. In this work, function-level EHW is used to implement the image preprocessing algorithm.
Figure 4.3 Function-level Evolution
4.4 GENETIC ALGORITHM
The genetic algorithm (Goldberg 1989) determines how the hardware structure should be reconfigured whenever a new structure is needed for better performance. The GA models the adaptation of natural and artificial systems through evolution and is well known as one of the most powerful search procedures, which is why it is chosen in this work.
The sequence of operations performed by the GA is shown in Figure 4.4. Initially, the chromosomes in the population are generated at random; they are then processed by evaluation, selection, crossover and mutation. The latter three operations are called the genetic operations, and one cycle of evaluation and genetic operations is counted as a generation. Evaluation assigns each chromosome a fitness value, which indicates how well the chromosome performs as a solution of the given problem. According to the fitness values, selection determines which chromosomes survive into the next generation. Crossover chooses pairs of chromosomes and exchanges their substrings at random. Finally, mutation randomly picks some positions in a chromosome and flips their values.
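The generational cycle described above can be sketched in Python. This is a minimal, generic sketch and not the hardware GA of this work: the toy fitness `onemax` (the count of 1 bits), roulette-wheel selection, one-point crossover and the parameter values (`length`, `pop_size`, `p_cross`, `p_mut`) are all illustrative assumptions.

```python
import random

def onemax(bits):
    """Toy fitness (illustrative assumption): the number of 1 bits."""
    return sum(bits)

def roulette_select(pop, fits, rng):
    """Fitness-proportionate (roulette-wheel) selection of one chromosome."""
    total = sum(fits)
    if total == 0:
        return rng.choice(pop)
    pick = rng.uniform(0, total)
    acc = 0
    for chrom, fit in zip(pop, fits):
        acc += fit
        if acc >= pick:
            return chrom
    return pop[-1]

def run_ga(length=32, pop_size=20, generations=50, p_cross=0.9, p_mut=0.01, seed=1):
    rng = random.Random(seed)
    # Initial state: chromosomes generated at random.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=onemax)
    for _ in range(generations):  # one evaluation + genetic-operation cycle
        fits = [onemax(c) for c in pop]  # evaluation
        nxt = []
        while len(nxt) < pop_size:
            # Selection; copies are taken so parents are never modified in place.
            a = roulette_select(pop, fits, rng)[:]
            b = roulette_select(pop, fits, rng)[:]
            if rng.random() < p_cross:  # one-point crossover: swap the tails
                cut = rng.randrange(1, length)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            for child in (a, b):  # mutation: flip randomly picked bits
                for i in range(length):
                    if rng.random() < p_mut:
                        child[i] ^= 1
                nxt.append(child)
        pop = nxt[:pop_size]
        best = max([best] + pop, key=onemax)  # best chromosome seen so far
    return best, onemax(best)

best, fit = run_ga()
print(fit, "".join(map(str, best)))
```

Keeping a copy of the best chromosome seen so far makes the reported result independent of whether the last generation happens to contain it.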
Figure 4.4 Flowchart of Genetic Algorithm
The major advantages of the GA are its robustness and its superior search performance on many types of problems without prior knowledge. An image filter, in particular, can be represented as a sequence of elementary operations that are easily encoded for handling by a genetic algorithm. However, implementing the GA in software makes fitness evaluation time-consuming. If the evaluation can be executed very quickly by a dedicated hardware device, this most serious problem of the GA is solved and the GA can be used more effectively. In this work, therefore, the GA is implemented on an FPGA chip using a VHDL program; because the FPGA consists of parallel hardware units, this removes the time constraint. Such a migration helps the genetic algorithm converge to a valid solution more quickly.
4.5 IMAGE PREPROCESSING USING EVOLUTIONARY DESIGN
The EHW architecture proposed in this work to filter the noise present in the image, subsequently realized on an FPGA-based image processing board, consists of a GA processor and a virtual reconfigurable circuit (VRC), as shown in Figure 4.5. This type of implementation integrates a hardware realization of the genetic algorithm with a reconfigurable device. These two modules of the EHW are described in the following sections.
Figure 4.5 EHW chip with the VRC and GA Processor
4.5.1 Implementation of the GA Processor
The simple GA is implemented with the following basic modules: a pseudo-random number generator, population memory, selection unit, mutation unit, fitness evaluator and output buffer, as shown in Figure 4.5.
4.5.1.1 Pseudo Random Number Generator (PRNG)
One of the most common PRNGs for FPGA implementation is the Linear Feedback Shift Register (LFSR), and this technique is also used in this work, with a word size of twelve bits. It is important to choose a good feedback polynomial so that an n-bit generator produces the maximal sequence of $2^n - 1$ random numbers, while keeping the number of taps to a minimum for efficiency. For the 12-bit word, the feedback is the XNOR of taps 12, 6, 4 and 1, corresponding to the polynomial $x^{12} + x^{6} + x^{4} + x + 1$. The block diagram of the LFSR used is shown in Figure 4.6.
Figure 4.6 LFSR implementation
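A small software model of the 12-bit LFSR can help check the tap choice. It assumes a Fibonacci (external-feedback) structure in which bit 1 is the least significant bit, the register shifts toward bit 12, and the XNOR of taps 12, 6, 4 and 1 is fed back into bit 1; the function names are hypothetical. With XNOR feedback the all-ones word is the lock-up state, so the register can safely start from zero.

```python
WIDTH = 12
MASK = (1 << WIDTH) - 1
TAPS = (12, 6, 4, 1)  # tap positions, bit 1 = least significant bit

def lfsr12_next(state):
    """Advance the 12-bit Fibonacci LFSR by one clock (XNOR feedback)."""
    fb = 1  # starting at 1 turns the XOR of the taps into their XNOR
    for t in TAPS:
        fb ^= (state >> (t - 1)) & 1
    return ((state << 1) & MASK) | fb  # shift toward bit 12, feed back into bit 1

def period(seed=0):
    """Number of clocks until the register returns to its seed state."""
    state, n = lfsr12_next(seed), 1
    while state != seed:
        state = lfsr12_next(state)
        n += 1
    return n

print(period(0))     # 4095 = 2**12 - 1 for maximal-length taps
print(period(MASK))  # 1: all-ones is the XNOR lock-up state
```

Because XNOR feedback with an even number of taps is the bitwise complement of the XOR version, the period is the same as for XOR feedback; only the forbidden state changes from all-zeros to all-ones.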
4.5.1.2 Input Buffer
The input buffer consists of RAM. The reference and noise-corrupted images are read from file and stored in the input buffer. At run time, pixels are fed from this buffer to the virtual reconfigurable circuit.
4.5.1.3 Initial population creation
A chromosome of length 250 bits is created using a 10-bit random number generator in 25 clock cycles (one 10-bit word per cycle) and used as an initial chromosome. In the hardware implementation, chromosomes are stored in the Block RAM of the FPGA. The initial population size is chosen as 16, so a total of 16 x 25 = 400 clock cycles is needed to generate the initial population.
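Initial population creation can be modelled as follows, assuming (hypothetically) that each clock cycle advances the 12-bit XNOR LFSR once and keeps its low 10 bits as the 10-bit random word. Each chromosome is held as a 250-bit Python integer (25 words of 10 bits), and the cycle counter reproduces the 16 x 25 = 400 cycles quoted above.

```python
MASK12 = 0xFFF
N_PE, BITS_PER_PE, POP_SIZE = 25, 10, 16

def lfsr12_next(state):
    """12-bit XNOR LFSR with taps 12, 6, 4, 1 (bit 1 = LSB)."""
    fb = 1
    for t in (12, 6, 4, 1):
        fb ^= (state >> (t - 1)) & 1
    return ((state << 1) & MASK12) | fb

def initial_population(seed=0):
    """Build POP_SIZE chromosomes of N_PE * BITS_PER_PE = 250 bits each."""
    state, cycles, population = seed, 0, []
    for _ in range(POP_SIZE):
        chrom = 0
        for _ in range(N_PE):  # one 10-bit word per clock cycle
            state = lfsr12_next(state)
            cycles += 1
            word = state & ((1 << BITS_PER_PE) - 1)  # hypothetical: keep the low 10 bits
            chrom = (chrom << BITS_PER_PE) | word
        population.append(chrom)
    return population, cycles

population, cycles = initial_population()
print(len(population), cycles)  # 16 400
```

Consecutive states of a single LFSR are shifted copies of one another (they share 11 of their 12 bits), so successive words are strongly correlated; a practical design might clock the register several times per word or use several LFSRs, which this sketch does not attempt.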
4.5.1.4 Fitness calculation
The mean difference per pixel (MDPP) is used as the fitness function. The original and filtered images are read from memory, and the absolute differences between corresponding pixel values are summed to evaluate the fitness.
4.5.1.5 Selection unit
The chromosome with the highest fitness, that is, the lowest MDPP, is selected as the best chromosome and retained for subsequent generations.
4.5.1.6 Mutation Unit
The chromosome with the highest fitness is selected for mutation, and bit-by-bit mutation is applied to it. Fifteen new chromosomes are created in every generation and stored in the population memory; together with the retained best chromosome, they form a population of 16.
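Taken together, the selection and mutation units keep the best chromosome and derive fifteen bit-flip mutants from it in each generation. The sketch below assumes a per-bit mutation probability `p_bit` (a hypothetical parameter; the text does not give a rate) and takes the MDPP evaluation as a caller-supplied function, since a lower MDPP means a higher fitness.

```python
import random

CHROM_BITS = 250
POP_SIZE = 16

def mutate(parent, rng, p_bit):
    """Bit-by-bit mutation: every bit flips independently with probability p_bit."""
    child = parent
    for i in range(CHROM_BITS):
        if rng.random() < p_bit:
            child ^= 1 << i
    return child

def next_generation(population, mdpp_of, rng, p_bit=0.01):
    """Keep the chromosome with the lowest MDPP and add POP_SIZE - 1 mutants of it."""
    scores = [mdpp_of(c) for c in population]
    best = population[scores.index(min(scores))]
    return [best] + [mutate(best, rng, p_bit) for _ in range(POP_SIZE - 1)]

# Usage with a stand-in score (Hamming distance to a hidden target), not a real MDPP.
rng = random.Random(0)
target = rng.getrandbits(CHROM_BITS)

def score(c):
    return bin(c ^ target).count("1")

pop = [rng.getrandbits(CHROM_BITS) for _ in range(POP_SIZE)]
start = min(score(c) for c in pop)
for _ in range(100):
    pop = next_generation(pop, score, rng)
print(start, min(score(c) for c in pop))
```

Because the best chromosome is always carried over unchanged, the best score in the population can never get worse from one generation to the next.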
4.5.1.7 Output Buffer
After the specified number of generations the evolution is complete
and the chromosome with the best fitness value is stored in the output buffer.
4.5.2 Flow diagram of reconfiguration in VRC
The configuration bits are obtained using the genetic algorithm and downloaded into the reconfigurable circuit, which results in a relocation of the hardware modules inside the VRC. The flow diagram of the reconfiguration process is shown in Figure 4.7.
Figure 4.7 Flow diagram of reconfiguration
4.6 IMPLEMENTING THE VRC
The VRC is implemented as a combinational circuit using pipelining. It consists of processing elements (PEs) arranged in rows and columns. In this work, a total of 25 PEs are used: 24 are arranged in six rows and four columns, and the 25th PE produces the final output.
4.6.1 Evolving Image Operators for Preprocessing
The function performed by each PE is selected from a set of evolved operators chosen so that the evolved circuit is inherently testable without a specialized data path. A total of 16 functions are used in this work; they are listed in Table 4.1. Another novel feature of this work is that the evolved circuit can be tested using its primary inputs and outputs alone. As a prerequisite, it is assumed that no two inputs of a processing element can be connected to the same data source.
Table 4.1 Evolved Image Processing Operators

Function Code   Function        Function Code   Function
0000            X >> 1          1000            X & 0x0F
0001            X               1001            X & 0xF0
0010            ~X              1010            X | 0x0F
0011            X & Y           1011            X | 0xF0
0100            X | Y           1100            Min(X, Y)
0101            X ^ Y           1101            Max(X, Y)
0110            (X+Y) >> 2      1110            Y << 1
0111            (X+Y) >> 1      1111            X + Y
4.6.1.1 Description of evolved operators
This section describes the important evolved image operators. Some of the operators are self-explanatory and are therefore omitted for brevity.
Function code ‘0000’: This operator shifts the first selected input, i.e. the 8-bit pixel, right by one bit position, which halves its value.
Function code ‘0001’: This is a buffer operator that transmits the first selected input to the output.
Function code ‘0010’: This operator computes the one’s complement of the first selected input.
Function codes ‘0011’, ‘0100’ and ‘0101’: These operators perform the bitwise logical AND, OR and EX-OR operations, respectively.
Function code ‘0110’: This operator adds the two selected 8-bit input pixels and then shifts the sum right by two bit positions.
Function code ‘1110’: This operator shifts the second selected input pixel left by one bit position.
Table 4.1 shows that the proposed operator set integrates two conventional hardware designs: the general hardware implementation of an image filter with a 3x3 kernel, which employs a tree of adders and shifters, and the implementation of the median filter, which employs a network of comparators and multiplexers.
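As a software reference for Table 4.1, the 16 operators can be written as a Python dictionary keyed by function code. The text states only that outputs are 8 bits wide, so this sketch assumes that `Y << 1` and `X + Y` wrap modulo 256 (keeping the low 8 bits) rather than saturating; a hardware design could choose either.

```python
MASK8 = 0xFF

# Function code -> two-input operator on 8-bit pixels (X, Y), following Table 4.1.
OPS = {
    0b0000: lambda x, y: x >> 1,
    0b0001: lambda x, y: x,                    # buffer
    0b0010: lambda x, y: ~x & MASK8,           # one's complement, kept to 8 bits
    0b0011: lambda x, y: x & y,
    0b0100: lambda x, y: x | y,
    0b0101: lambda x, y: x ^ y,
    0b0110: lambda x, y: (x + y) >> 2,
    0b0111: lambda x, y: (x + y) >> 1,         # average of X and Y
    0b1000: lambda x, y: x & 0x0F,
    0b1001: lambda x, y: x & 0xF0,
    0b1010: lambda x, y: x | 0x0F,
    0b1011: lambda x, y: x | 0xF0,
    0b1100: lambda x, y: min(x, y),
    0b1101: lambda x, y: max(x, y),
    0b1110: lambda x, y: (y << 1) & MASK8,     # assumption: wrap-around
    0b1111: lambda x, y: (x + y) & MASK8,      # assumption: wrap-around
}

print(OPS[0b0111](200, 100), OPS[0b1100](200, 100))  # 150 100
```

The adder/shifter entries (0110, 0111) and the comparator entries (1100, 1101) are the building blocks of the linear and median filters mentioned above.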
4.6.2 Configuring the PEs in the VRC
Each PE in the VRC is configured by the GA processor using genetic programming (GP), as described in Figure 4.8. The control bits are stored as a configuration word in the SRAM and are downloaded by the GA processor. The inputs and the function of each PE are selected through multiplexers controlled by this configuration word.
Figure 4.8 VRC with the internal MUX for selecting inputs and functions
In Figure 4.8, Slice 1 is an m-bit vector that selects one of the available inputs and assigns it as the first input X.
Slice 2 is an m-bit vector that selects one of the available inputs and assigns it as the second input Y.
Slice 3 is an n-bit vector that selects one of the 16 functions to be performed on X and Y; selecting among 16 functions requires n = 4.
X and Y are both 8-bit vectors, and the processed output is also an 8-bit vector.
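The configuration word of a PE can be decoded as below. Only n = 4 follows from the text; the slice width m = 3 (chosen so that 3 + 3 + 4 gives the ten bits per PE stated in Section 4.8) and the ordering of the slices, with Slice 1 in the most significant bits, are assumptions made for illustration.

```python
M_BITS = 3   # hypothetical width of Slice 1 and Slice 2
N_BITS = 4   # Slice 3 selects one of 16 functions, so 4 bits
PE_BITS = 2 * M_BITS + N_BITS  # 10 bits per PE

def decode_pe(word):
    """Split a 10-bit PE configuration word into (sel1, sel2, sel3)."""
    sel1 = (word >> (M_BITS + N_BITS)) & ((1 << M_BITS) - 1)  # Slice 1: input X
    sel2 = (word >> N_BITS) & ((1 << M_BITS) - 1)             # Slice 2: input Y
    sel3 = word & ((1 << N_BITS) - 1)                         # Slice 3: function code
    return sel1, sel2, sel3

def decode_chromosome(chrom, n_pe=25):
    """Cut a 250-bit chromosome into per-PE words, first PE in the most significant bits."""
    return [decode_pe((chrom >> ((n_pe - 1 - k) * PE_BITS)) & ((1 << PE_BITS) - 1))
            for k in range(n_pe)]

print(decode_pe(0b101_011_1110))  # (5, 3, 14)
```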
The complete configuration of the EHW chip is shown in
Figure 4.9.
Figure 4.9 Configuration of EHW chip
4.7 IMAGE PREPROCESSING ALGORITHM
4.7.1 Fitness Function
Popular performance measures for evaluating the difference between the original and filtered images include
i. Peak Signal to Noise Ratio (PSNR) and
ii. Mean Difference per Pixel (MDPP)
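In many applications, the error is expressed as a signal-to-noise ratio (SNR), as given in Equation 4.1:

$$\mathrm{SNR} = 10\log_{10}\left(\frac{\sigma^{2}}{\mathrm{MSE}}\right)\ \mathrm{dB} \qquad (4.1)$$

where $\sigma^{2}$ is the variance of the desired (original) image and MSE is the mean squared error between the original and filtered images. The peak signal-to-noise ratio (PSNR) is expressed as in Equation 4.2:

$$\mathrm{PSNR} = 10\log_{10}\left(\frac{255^{2}}{\mathrm{MSE}}\right)\ \mathrm{dB} \qquad (4.2)$$

The fitness function using MDPP is given by

$$\mathrm{MDPP} = \frac{1}{N \times N}\sum_{i=1}^{N}\sum_{j=1}^{N}\left|\mathrm{orig}(i,j) - \mathrm{filt}(i,j)\right| \qquad (4.3)$$

where $\left|\mathrm{orig}(i,j) - \mathrm{filt}(i,j)\right|$ is the absolute difference between the original image and the filtered image at pixel $(i,j)$ of an $N \times N$ image.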
In this work, the Mean Difference per Pixel (MDPP) is used as the performance measure by the fitness evaluator module in the GA processor. MDPP needs only absolute differences and accumulation, whereas PSNR also needs squaring, division and a logarithm, so MDPP is computationally easier to implement in hardware. The EHW architecture with the best (minimum) MDPP after a specified number of generations is chosen as the evolved architecture.
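Equations 4.2 and 4.3 translate directly into code. This minimal sketch takes images as lists of rows of 8-bit integers. For a fixed image size, the 1/(N x N) factor does not change which chromosome is best, so hardware can compare the raw sums of absolute differences and skip the division entirely.

```python
import math

def mdpp(orig, filt):
    """Eq. 4.3: mean absolute difference per pixel."""
    n_pix = len(orig) * len(orig[0])
    total = sum(abs(a - b) for ro, rf in zip(orig, filt) for a, b in zip(ro, rf))
    return total / n_pix

def psnr(orig, filt):
    """Eq. 4.2 for 8-bit images; infinite when the images are identical."""
    n_pix = len(orig) * len(orig[0])
    mse = sum((a - b) ** 2 for ro, rf in zip(orig, filt) for a, b in zip(ro, rf)) / n_pix
    return math.inf if mse == 0 else 10 * math.log10(255 ** 2 / mse)

orig = [[10, 20], [30, 40]]
filt = [[12, 20], [30, 38]]
print(mdpp(orig, filt))  # 1.0
```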
4.7.2 Algorithm for Image Noise Filtering
The EHW architecture chosen for image preprocessing was described in Sections 4.5 and 4.6. Using this architecture, noise is removed from the image by the following algorithm:
1. Read the corrupted image and store it in the input buffer.
2. Generate an initial population of size ‘n’, each chromosome having length L. Each chromosome contains details of the interconnections between PEs and of the function to be performed by each PE.
3. For each chromosome in the population:
   a. Take a 3x3 overlapping window and input the nine pixel values to the VRC, which processes them and replaces the center pixel. Every pixel value of the filtered image is computed from the corresponding pixel and its eight neighbors. This process is illustrated in Figure 4.10 and is repeated for the whole image.
   b. Calculate the Mean Difference Per Pixel (MDPP) using the reference image and assign it as the fitness value.
   c. Retain the chromosome that has the maximum fitness (minimum MDPP).
4. Apply the crossover and mutation operations to the selected chromosome to obtain the strings of the next generation. The roulette wheel selection procedure is chosen in this work.
5. Replace the old population.
   a. Repeat from step 3 for a specified number of generations ‘N’.
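Step 3a can be sketched as a sliding 3x3 window. The window pixels are passed in row-major order as I0 to I8, so I4 is the center pixel. Two points are assumptions not fixed by the text: every window is read from the unfiltered input image, and border pixels, which lack eight neighbors, are copied unchanged. `window_fn` stands in for the evolved VRC; in the usage below, a median filter plays that role.

```python
def filter_image(img, window_fn):
    """Replace every interior pixel by window_fn applied to its 3x3 neighborhood.

    img is a list of rows of 8-bit integers. Windows are read from the
    unfiltered input, and border pixels are copied unchanged (assumptions).
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # I0..I8 in row-major order, so window[4] is the center pixel I4
            window = [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = window_fn(window) & 0xFF
    return out

def median9(window):
    """Stand-in for the evolved circuit: the median of the nine pixels."""
    return sorted(window)[4]

noisy = [[10, 10, 10, 10],
         [10, 255, 10, 10],
         [10, 10, 10, 10]]
print(filter_image(noisy, median9))
```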
Stage I: the first 3x3 window (pixels I1,1 to I3,3) is presented to the VRC, and I2,2 is replaced by the filter output.
Stage II: the second 3x3 window (pixels I1,2 to I3,4) is presented to the VRC, and I2,3 is replaced by the filter output.
Figure 4.10 Different Pixel Windows processed by the evolved circuit
In Figure 4.10, Ip,q represents the pixel value at the pth row and qth column.
Note: The window presented as input to the evolved circuit follows a spatial filtering approach.
4.8 FPGA IMPLEMENTATION
To create an FPGA design, a designer has several options for implementing an algorithm. VHDL, originally intended as a simulation language, has since become a standard hardware design language. VHDL was chosen as the target design language in this work because of familiarity with it and its wide-ranging support, both in terms of software development tools and vendor support. In the first step, the design is created in VHDL. Next, the code’s syntax is verified and the design is synthesized, or compiled, into a library. The design is then simulated to check its functionality. Finally, the design is processed with vendor-specific place-and-route tools and mapped onto a specific FPGA in software.
The VHDL code of the algorithm described in the previous section was developed and downloaded onto the Xilinx FPGA-based image processing board, whose details are presented in Chapter 3 of this thesis. The EHW is configured to accept nine 8-bit inputs, I0 to I8, and to produce a single 8-bit output, so that it processes grayscale (8 bits/pixel) images. Every pixel value of the filtered image is calculated from the corresponding pixel and its eight neighbors, as discussed above. The operation performed on the selected input pixels depends on the configuration bits downloaded into the configuration memory from the genetic unit. The VRC consists of 25 PEs, as shown in Figure 4.11. Four PEs are implemented as a single stage of the pipeline.
Figure 4.11 Architecture of the Virtual Reconfigurable Circuit
Each PE processes two 8-bit inputs and produces a single 8-bit output, and the PE outputs are equipped with registers. Each of the two inputs of a PE can be connected to one of the outputs from the previous l columns, where l is the levels-back parameter; in this work, l = 2. Every PE applies to its two inputs the function from Table 4.1 selected by its function code configuration, sel3. The architecture of a single PE is shown in Figure 4.12. The configuration bit stream consists of ten bits for each PE. The output of the PE is given by

$$\mathrm{Output} = F_{sel3}\{\mathrm{mux}(sel1),\ \mathrm{mux}(sel2)\} \qquad (4.4)$$

where mux(sel1) and mux(sel2) are the inputs X and Y chosen by the input multiplexers and $F_{sel3}$ is the function selected by sel3.
The fewer the functions, the faster the evolution. Further functions can be included, but this depends on the available resources, as there is a trade-off between the functionality and the complexity of the hardware structure.
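A behavioral model of the VRC shows how Eq. 4.4 and the levels-back rule interact. This is a sketch under several assumptions that the text leaves open: the 24 grid PEs are evaluated as six pipeline columns of four PEs (following the statement that four PEs form one pipeline stage), the nine primary inputs act as column 0 and are therefore visible only to the first l columns, the 25th PE reads from the last l columns, and a selector larger than the number of visible sources wraps around (modulo). `ops` can be any mapping from function code to a two-input function, such as a software version of Table 4.1.

```python
def evaluate_vrc(genome, pixels, ops, rows=4, cols=6, l_back=2):
    """Compute one output pixel of a hypothetical VRC model.

    genome: rows * cols + 1 triples (sel1, sel2, sel3), listed column by column,
            with the last triple describing the output PE.
    pixels: the nine window pixels I0..I8 (treated as column 0).
    ops:    mapping from function code to a two-input function on 8-bit values.
    """
    assert len(genome) == rows * cols + 1
    columns = [list(pixels)]  # column 0 holds the primary inputs
    k = 0

    def run_pe(sources):
        nonlocal k
        sel1, sel2, sel3 = genome[k]
        k += 1
        x = sources[sel1 % len(sources)]  # input multiplexer for X
        y = sources[sel2 % len(sources)]  # input multiplexer for Y
        return ops[sel3](x, y) & 0xFF     # Eq. 4.4, kept to 8 bits

    for _ in range(cols):
        # Levels-back rule: only the previous l_back columns are visible.
        sources = [v for col in columns[-l_back:] for v in col]
        columns.append([run_pe(sources) for _ in range(rows)])
    # The 25th PE reads the last l_back columns and gives the output pixel.
    sources = [v for col in columns[-l_back:] for v in col]
    return run_pe(sources)

OPS = {0: lambda x, y: x, 1: lambda x, y: max(x, y)}  # tiny stand-in operator set
genome = [(0, 8, 1)] * 25
print(evaluate_vrc(genome, [1, 2, 3, 4, 5, 6, 7, 8, 9], OPS))  # 9
```

Registering the PE outputs column by column, as described above, would make each column one pipeline stage; this model ignores timing and evaluates the columns in order.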
Figure 4.12 Architecture of a Single Processing Element
4.8.1 VHDL Implementation of GA Processor
The GA processor model discussed in Section 4.5.1 is implemented in VHDL; the VHDL code is given in Figure 4.13. Each signal implements the 12-bit random number generator, and a sample output captured using the ModelSim simulator is shown in Figure 4.14. These bits (the GA processor output) are the configuration bits: they control the interconnections among the PEs and the function of each PE in the VRC module.
Figure 4.13 Implementation of GA processor using VHDL
Figure 4.14 ModelSim Captured GA Processor Output
4.8.2 Evolved Architecture
An architecture evolved to filter the noise present in the image is shown in Figure 4.15. Nine inputs (pixel values) are used to calculate a new pixel value. The parameters of the evolved network are: 9 inputs, 1 output, circuit topology 6x4, and l-back = 2. Only the programmable elements actually utilized are highlighted in Figure 4.15.
Figure 4.15 Virtual Reconfigurable Circuit
4.9 DISCUSSION ON THE PERFORMANCE OF EHW ALGORITHM
The previous sections of this chapter showed how the evolved image operators and architectures are used in the preprocessing phase, which forms an important part of computer vision systems. The automatic design of such image operators enables a machine vision system to adapt to changing environments automatically. Moreover, implementing the EHW algorithm on a low-cost, commercial off-the-shelf FPGA device, as done in this work, makes it well suited to online measurement of surface roughness. This section shows that the evolved solutions perform better than conventionally designed filters.
4.9.1 Experimental Results
Experimental studies were performed with the developed system, and the results support the claim that it is possible to evolve general filters that exhibit, on average, a lower MDPP than conventional filters for a given noise. For comparison, the EHW algorithm is evaluated against the super-resolution algorithms proposed by Keren et al (1988) and Vandewalle (2003). The results are shown in Figures 4.16a to 4.16c for ground surfaces under different cutting conditions, and in Figures 4.17a to 4.17c for milled surfaces under different cutting conditions. In each case, the original image was assumed to be corrupted by Gaussian noise of zero mean and variance 0.03. The results show that the proposed EHW filter gives an improved PSNR and produces a reconstructed image with enhanced sharpness that brings out the finer details of the machined surfaces. Of the methods compared, the EHW approach best removes the degradation and blurring of edges that accompany magnification. It is also worth mentioning that the proposed EHW filter uses only local knowledge (i.e. a 3x3 kernel) to enhance the image, while a conventional solution would probably need the position of the currently processed pixel.
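The test images were corrupted with zero-mean Gaussian noise of variance 0.03. The sketch below reproduces such a corruption in software, under the assumption (not stated in the text) that the variance refers to intensities normalized to [0, 1], as in MATLAB's `imnoise`; noisy values are clipped to the valid range and rounded back to 8 bits. The function name is hypothetical.

```python
import random

def add_gaussian_noise(img, variance=0.03, mean=0.0, seed=None):
    """Corrupt an 8-bit image (list of rows) with additive Gaussian noise.

    Assumption: variance is given on the [0, 1] intensity scale.
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5
    out = []
    for row in img:
        new_row = []
        for p in row:
            v = p / 255.0 + rng.gauss(mean, sigma)  # add noise on the [0, 1] scale
            v = min(max(v, 0.0), 1.0)               # clip back into the valid range
            new_row.append(round(v * 255))          # requantize to 8 bits
        out.append(new_row)
    return out

clean = [[128] * 8 for _ in range(6)]
noisy = add_gaussian_noise(clean, seed=0)
print(noisy[0])
```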
Figure 4.16 Texture of ground surfaces before and after preprocessing by different algorithms
Each panel shows the original image and the images produced by the EHW, Keren and Vandewalle algorithms:
(a) EHW: PSNR 39.41 dB; Keren: PSNR 25.18 dB; Vandewalle: PSNR 12.99 dB
(b) EHW: PSNR 36.25 dB; Keren: PSNR 23.54 dB; Vandewalle: PSNR 21.32 dB
(c) EHW: PSNR 40.02 dB; Keren: PSNR 26.39 dB; Vandewalle: PSNR 28.92 dB
Figure 4.17 Texture of milled surfaces before and after preprocessing by different algorithms
Each panel shows the original image and the images produced by the EHW, Keren and Vandewalle algorithms:
(a) EHW: PSNR 26.95 dB; Keren: PSNR 17.45 dB; Vandewalle: PSNR 18.54 dB
(b) EHW: PSNR 29.21 dB; Keren: PSNR 19.32 dB; Vandewalle: PSNR 22.61 dB
(c) EHW: PSNR 24.59 dB; Keren: PSNR 16.46 dB; Vandewalle: PSNR 19.58 dB
4.10 CONCLUSION
In this chapter, a novel EHW-based image filter for removing the noise present in images was designed and implemented on a Xilinx FPGA-based image processing board. Experimental results were obtained for ground and milled surfaces machined under different cutting conditions, and it was shown that the proposed filter is superior to the conventional designs reported in previous studies in terms of the performance measure, computation speed, implementation cost, and the ability to adapt, online and autonomously, to changes in task requirements, environment or hardware structure. The reconfigurable-architecture-based computer vision analysis enables new image enhancement features to be added, allows new standards and protocols to be implemented rapidly on an as-needed basis, and protects the investment in computing hardware. It functions as programmable hardware that offers higher performance and the flexibility of a software-based solution while retaining the execution speed of a more traditional hardware-based approach.