
CHAPTER 4

IMAGE PREPROCESSING USING EVOLVABLE HARDWARE

4.1 INTRODUCTION

Images acquired through modern cameras may be contaminated by a variety of noise sources (e.g. photon or on-chip electronic noise) and by distortions such as shading or improper illumination. A preprocessing unit therefore has to be incorporated before recognition to improve image quality. Effective processing of the grabbed image requires computing architectures that are less complicated, highly flexible and more cost-effective. In this work, image preprocessing is carried out using a novel evolvable hardware (EHW) architecture dedicated to implementing a high-performance image noise filter together with a flexible local interconnect hierarchy. It is shown that evolutionary design is fully compliant with existing design methodologies and is capable of a better exploration of the design space, and that the evolved circuit performs better and can be optimized with respect to different parameters. After the initially grabbed images are processed by the EHW system, the improved-quality images of surfaces are used for evaluation of surface finish. FPGA implementation of the evolved circuit allows complex and fast computation to be performed by dedicated hardware instead of software; because hardware units can operate in parallel, the implementation is well suited to online applications.


4.2 MERITS OF EHW FILTER

General-purpose image filters lack the flexibility and adaptability needed for un-modeled noise types. In contrast, filter architectures based on evolutionary algorithms are very promising because of their capability to provide solutions to hard design problems. With this approach, an image filter can follow a completely different design style in which the whole design is performed by an evolutionary algorithm. In this work, an evolutionary algorithm based filter is designed in which either the kernel (a small weight matrix) or the whole circuit is evolved automatically.

The EHW architecture proposed in this work can evolve filters without a priori information. The proposed filter architecture takes a spatial-domain approach and uses an overlapping window to filter the signal. The approach chosen is based on function-level evolution: the architecture includes nonlinear functions, and an evolutionary algorithm is used to find the best filter configuration.

4.3 EVOLVABLE HARDWARE

Evolvable Hardware (EHW) is a new concept in the development of online adaptive machines. In contrast to conventional hardware, whose structure is irreversibly fixed in the design process, EHW is designed to adapt to changes in task requirements or in the environment through its ability to reconfigure its own hardware structure online and autonomously (Higuchi 1999). The capacity for adaptation is achieved through evolutionary algorithms such as the Genetic Algorithm (GA). EHW is hardware built on software-reconfigurable logic devices, such as FPGAs and PLDs, whose architecture can be reconfigured using genetic learning. To design conventional hardware, all the specifications of the hardware functions must be prepared in advance. In contrast, EHW can reconfigure itself without such specifications. FPGAs are hardware devices whose architecture is determined by downloading a binary string, called the architecture bits. The basic idea of EHW is to regard the architecture bits of a reconfigurable device as a chromosome for a GA, which searches for an optimal hardware structure. The GA chromosome, that is, the architecture bits, is downloaded onto the reconfigurable device during genetic learning. In this sense, EHW can be considered online adaptive hardware. This is illustrated in Figure 4.1.

Figure 4.1 Configuration of device using genetic operators

4.3.1 Evolvable Hardware Classification

Apart from classification based on how artificial evolution is applied to the design of hardware systems, EHW can also be classified according to the functions to be evolved. Only the latter classification is discussed in this section; accordingly, EHW can be classified into two classes:

1. Gate-level EHW

2. Function-level EHW

4.3.1.1 Gate level evolvable hardware

In Gate-level evolution the functions in the processing element (PE)

are realized as primitive gate-level functions such as AND, OR, NOT, etc.

This is shown in Figure 4.2. The size of the circuits evolved by gate-level

evolution is restricted. This makes it less suited for practical applications.

Figure 4.2 Gate-level Evolution

4.3.1.2 Function level evolvable hardware

The function level circuit evolution involves higher level hardware

functions than the primitive gates of gate-level evolution. The function-level

EHW design is conveniently implemented using FPGA chips, where a

number of programmable functional units (PFU) are embedded. Each PFU


can perform one of several high-level functions such as addition, subtraction, multiplication, etc. The selectable high-level hardware functions of each PFU make function-level EHW suited to a wide variety of practical applications and to the design of larger hardware circuits. This is shown in Figure 4.3. In this work, function-level EHW is used for implementing the image preprocessing algorithm.

Figure 4.3 Function-level Evolution

4.4 GENETIC ALGORITHM

The Genetic Algorithm (Goldberg 1989) determines how the hardware structure should be reconfigured whenever a new hardware structure is needed for better performance. GA models the adaptation of natural and artificial systems through evolution and is well known as one of the most powerful search procedures; it is therefore chosen in this work.

The sequence of operations performed by the GA is shown in

Figure 4.4. At an initial state, chromosomes in the population are generated at

random, and processed by many operations, such as evaluation, selection,

crossover and mutation. The latter three operations are called the genetic

operations, and one cycle of the evaluation and the genetic operation is


counted as a generation. The evaluation assigns fitness values to the chromosomes, which indicate how well the chromosomes perform as solutions to the given problem. According to the fitness values, the selection determines which chromosomes survive into the next generation. The crossover chooses some pairs of chromosomes and exchanges their sub-strings at random. Finally, the mutation randomly picks some positions in the chromosome and flips their values.
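The cycle of evaluation, selection, crossover and mutation described above can be sketched in software as follows. This is only an illustrative sketch, not the hardware GA of this work: the function names, the toy fitness (number of 1 bits), the truncation selection that keeps the fitter half, and the mutation rate are all assumptions made for the example.

```python
import random

def crossover(a, b, rng):
    """One-point crossover: exchange the tails of two equal-length bit lists."""
    point = rng.randrange(1, len(a))          # cut somewhere inside the string
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chrom, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chrom]

def next_generation(population, fitness, rng, rate=0.02):
    """One generation: evaluation, selection, crossover, mutation."""
    ranked = sorted(population, key=fitness, reverse=True)  # evaluation
    survivors = ranked[: len(ranked) // 2]   # selection (assumed: keep fitter half)
    children = []
    while len(survivors) + len(children) < len(population):
        p1, p2 = rng.sample(survivors, 2)
        for child in crossover(p1, p2, rng):
            children.append(mutate(child, rate, rng))
    return (survivors + children)[: len(population)]

rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(16)] for _ in range(8)]
fitness = sum                                 # toy fitness: count of 1 bits
best_before = max(map(fitness, population))
for _ in range(20):
    population = next_generation(population, fitness, rng)
best_after = max(map(fitness, population))
```

Because the survivors are copied unchanged into the next generation, the best fitness in this sketch can never decrease from one generation to the next.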

Figure 4.4 Flowchart of Genetic Algorithm

The major advantages of GA are its robustness and superior search performance on many types of problems without prior knowledge. Indeed, an image filter can be represented as a sequence of elementary operations, which can easily be encoded for handling by a genetic algorithm. However, a software implementation of GA is limited by the time needed for fitness evaluation. If the evaluation can be executed very quickly by a dedicated hardware device, this most serious problem of GA is solved and GA can be used more effectively. Thus, in this work, the GA is implemented in an FPGA chip by writing a VHDL program; because the FPGA consists of parallel hardware units, this relieves the time constraint. Such a migration helps the genetic algorithm converge quickly towards a valid solution.


4.5 IMAGE PREPROCESSING USING EVOLUTIONARY DESIGN

The EHW architecture proposed in this work to filter the noise present in the image, and subsequently realized on an FPGA-based image processing board, consists of the GA processor and a virtual reconfigurable circuit (VRC), as shown in Figure 4.5. This type of implementation integrates a hardware realization of the genetic algorithm with a reconfigurable device. These two modules of the EHW are described in the following sections.

Figure 4.5 EHW chip with the VRC and GA Processor

4.5.1 Implementation of the GA Processor

The implementation of the simple GA is composed of the following basic modules: pseudo random number generator, population memory, selection unit, mutation unit, fitness evaluator and output buffer, as shown in Figure 4.5.


4.5.1.1 Pseudo Random Number Generator (PRNG)

One of the most common PRNGs for FPGA implementation is the Linear Feedback Shift Register (LFSR), and this technique is also used in this work with a word size of twelve bits. It is important to choose a good polynomial to ensure that the generator can produce a maximal-length sequence of $2^{n}-1$ random numbers, while keeping the number of taps to a minimum for efficiency. For the twelve-bit word, the polynomial $x^{12}$ (xnor) $x^{6}$ (xnor) $x^{4}$ (xnor) $x^{1}$ was used. The block diagram of the LFSR used is shown in Figure 4.6.
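The behaviour of such a generator can be checked with a short software model. The sketch below assumes a Fibonacci-style register with bits numbered 1 to 12, a left shift on every clock, and the XNOR of taps 12, 6, 4 and 1 fed into bit 1; the shift direction and the function names are assumptions, since the text specifies only the taps and the word size.

```python
TAPS = (12, 6, 4, 1)      # tap positions from the text, bits numbered 1..12
MASK = 0xFFF              # twelve-bit register

def lfsr12_step(state):
    """Advance the register by one clock using XNOR feedback."""
    feedback = 1                               # starting at 1 turns XOR into XNOR
    for tap in TAPS:
        feedback ^= (state >> (tap - 1)) & 1
    return ((state << 1) | feedback) & MASK

def lfsr12_period(seed=0):
    """Count the steps needed for the register to return to its seed."""
    state, steps = lfsr12_step(seed), 1
    while state != seed:
        state = lfsr12_step(state)
        steps += 1
    return steps

period = lfsr12_period()
```

With XNOR feedback the all-ones word is the lock-up state (the register stays there forever), so the all-zeros word is a valid seed, which is convenient after a hardware reset.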

Figure 4.6 LFSR implementation

4.5.1.2 Input Buffer

Input buffer consists of RAM. Reference and noise corrupted

images are read from the file and stored in the input buffer. During runtime

pixels are given as input to the virtual reconfigurable circuit from this input

buffer.

4.5.1.3 Initial population creation

A chromosome of length 250 bits is created using a 10 bit random

number generator in 25 clock cycles and used as the initial chromosome. In


hardware implementation, chromosomes are stored in the Block RAM of the FPGA. The initial population size is chosen as 16, so a total of 16 x 25 = 400 clock cycles is needed to generate the initial population.
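The arithmetic behind these figures can be illustrated with a small sketch that assembles one 250-bit chromosome from 25 ten-bit random words, one word per clock cycle, and repeats this 16 times. Python's `random` module stands in for the hardware random number generator here, and all names are hypothetical.

```python
import random

WORD_BITS = 10               # width of one random word
WORDS_PER_CHROMOSOME = 25    # one word per clock cycle
POPULATION_SIZE = 16

def make_chromosome(next_word):
    """Concatenate 25 ten-bit words into one 250-bit integer."""
    chromosome = 0
    for _ in range(WORDS_PER_CHROMOSOME):
        chromosome = (chromosome << WORD_BITS) | (next_word() & 0x3FF)
    return chromosome

def make_population(next_word):
    """Build the initial population of 16 chromosomes."""
    return [make_chromosome(next_word) for _ in range(POPULATION_SIZE)]

rng = random.Random(1)                        # stand-in for the hardware PRNG
population = make_population(lambda: rng.getrandbits(WORD_BITS))
chromosome_bits = WORD_BITS * WORDS_PER_CHROMOSOME     # 250
total_cycles = POPULATION_SIZE * WORDS_PER_CHROMOSOME  # 400
```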

4.5.1.4 Fitness calculation

The Mean Difference per Pixel (MDPP) is used as the fitness function. The original and filtered images are read from memory, and the absolute differences between corresponding pixel values are accumulated to evaluate the fitness.

4.5.1.5 Selection unit

The chromosome with the highest fitness (that is, the lowest MDPP) is selected as the best chromosome and is retained for subsequent generations.

4.5.1.6 Mutation Unit

The chromosome which has highest fitness is selected for mutation.

Bit by bit mutation is used. Fifteen new chromosomes are created in every

generation and stored in the population memory.
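The selection and mutation units together amount to keeping the best chromosome and filling the rest of the 16-entry population with 15 mutated copies of it. A minimal sketch is given below; the number of bits flipped per offspring (`n_flips`) and the function names are assumptions, since the text states only that bit-by-bit mutation is used.

```python
import random

def mutate_bits(chromosome, length, n_flips, rng):
    """Flip `n_flips` distinct, randomly chosen bit positions."""
    for pos in rng.sample(range(length), n_flips):
        chromosome ^= 1 << pos
    return chromosome

def next_population(population, fitness, rng, length=250, n_flips=2):
    """Keep the fittest chromosome and create mutated copies of it."""
    best = max(population, key=fitness)
    mutants = [mutate_bits(best, length, n_flips, rng)
               for _ in range(len(population) - 1)]
    return [best] + mutants

rng = random.Random(7)
population = [rng.getrandbits(250) for _ in range(16)]
toy_fitness = lambda c: -bin(c).count("1")    # toy: fewer 1 bits is fitter
new_population = next_population(population, toy_fitness, rng)
```

In the actual system the fitness would come from the MDPP of the image filtered by the circuit that the chromosome configures, with a lower MDPP counting as a higher fitness.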

4.5.1.7 Output Buffer

After the specified number of generations the evolution is complete

and the chromosome with the best fitness value is stored in the output buffer.

4.5.2 Flow diagram of reconfiguration in VRC

The configuration bits are obtained using the genetic algorithm and downloaded into the reconfigurable circuit, which results in relocation of hardware modules inside the VRC. The flow diagram of the reconfiguration process is shown in Figure 4.7.

Figure 4.7 Flow diagram of reconfiguration

4.6 IMPLEMENTING THE VRC

The VRC is implemented as a combinational circuit using the concepts of pipelining. It consists of processing elements (PEs) arranged in rows and columns. In this work, a total of 25 PEs are used: 24 are arranged in six rows and four columns, and the 25th PE produces the final output.

4.6.1 Evolving Image Operators for Preprocessing

The function to be performed by each PE is selected from a set of evolved operators such that the evolved circuit is inherently testable without the need for a specialized data path. A total of 16 functions are selected in this work; they are listed in Table 4.1. Another novel feature of this work is that


using the primary inputs and outputs alone, the evolved circuit can be tested. As a prerequisite, it is assumed that no element has both of its inputs connected to the same data source.

Table 4.1 Evolved image processing operators

Function code   Function        Function code   Function
0000            X >> 1          1000            X & 0x0F
0001            X               1001            X & 0xF0
0010            ~X              1010            X | 0x0F
0011            X & Y           1011            X | 0xF0
0100            X | Y           1100            Min (X,Y)
0101            X ^ Y           1101            Max (X,Y)
0110            (X+Y) >> 2      1110            Y << 1
0111            (X+Y) >> 1      1111            X+Y

4.6.1.1 Description of evolved operators

In this section, the important evolved image operators are described. Some of the operators are self-explanatory and are hence omitted for brevity.

Function code ‘0000’: This operator shifts the first selected input

element i.e. the 8-bit pixel to the right by one bit position.

Function code ‘0001’: This is a buffer operator and transmits the

first selected input element to the output.


Function code ‘0010’: This operator complements the first

selected input element. The one’s complement operation is performed.

Function code ‘0011, 0100, 0101’: These operators represent the

logical AND, OR and EX-OR operations respectively.

Function code ‘0110’: This operator adds the selected two 8-bit

input pixels and then right shifts by two bit positions.

Function code ‘1110’: This operator shifts the second selected

input pixel to the left by one bit position.

It can be observed from Table 4.1 that the proposed work integrates the general hardware implementation of an image filter with a 3x3 kernel, which employs a tree of adders with shifters, and an implementation of the median filter, which employs a network of comparators and multiplexers.
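The sixteen operators of Table 4.1 can be modelled in software as a lookup table indexed by the 4-bit function code. The sketch below is illustrative only. The text does not say how results wider than 8 bits are handled, so keeping the low 8 bits for `Y << 1` and `X+Y` is an assumption (a saturating adder would be an equally plausible choice), as is forming the sums of codes 0110 and 0111 at full width before shifting.

```python
# Function code -> operation on two 8-bit pixel values X and Y (Table 4.1).
FUNCTIONS = {
    0b0000: lambda x, y: x >> 1,
    0b0001: lambda x, y: x,
    0b0010: lambda x, y: ~x & 0xFF,          # one's complement within 8 bits
    0b0011: lambda x, y: x & y,
    0b0100: lambda x, y: x | y,
    0b0101: lambda x, y: x ^ y,
    0b0110: lambda x, y: (x + y) >> 2,       # assumed: full-width sum, then shift
    0b0111: lambda x, y: (x + y) >> 1,       # assumed: full-width sum, then shift
    0b1000: lambda x, y: x & 0x0F,
    0b1001: lambda x, y: x & 0xF0,
    0b1010: lambda x, y: x | 0x0F,
    0b1011: lambda x, y: x | 0xF0,
    0b1100: lambda x, y: min(x, y),
    0b1101: lambda x, y: max(x, y),
    0b1110: lambda x, y: (y << 1) & 0xFF,    # assumed: overflow bit is dropped
    0b1111: lambda x, y: (x + y) & 0xFF,     # assumed: wraps to 8 bits
}

def apply_function(code, x, y):
    """Apply the operator selected by the 4-bit function code."""
    return FUNCTIONS[code](x, y)

averaged = apply_function(0b0111, 200, 100)   # (200 + 100) >> 1
quartered = apply_function(0b0110, 200, 100)  # (200 + 100) >> 2
```

Codes 0110 and 0111 are the shifter-and-adder building blocks of a linear 3x3 kernel, while codes 1100 and 1101 are the comparators from which a median network can be assembled.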

4.6.2 Configuring the PE’s in VRC

Each PE in the VRC is configured using Genetic Programming (GP) by the GA processor, as described in Figure 4.8. The control bits are stored as a configuration word in the SRAM and are downloaded by the GA processor. The inputs that control the functionality of the VRC are selected through the multiplexer.

Figure 4.8 VRC with the internal MUX for selecting inputs and functions

In Figure 4.8, Slice 1 is an m-bit vector that selects one of the available inputs and assigns it as the first input X.

Slice 2 is an m-bit vector that selects one of the available inputs and assigns it as the second input Y.

Slice 3 is an n-bit vector that selects one of the 16 functions to be performed on X and Y.

X and Y are both 8-bit vectors, and the processed output is also an 8-bit vector.
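Splitting a configuration word into its three slices can be sketched as below. The widths m = 3 and n = 4 are assumptions: n = 4 follows from the 16 selectable functions, and m = 3 is what remains if the configuration word is ten bits per PE. The bit ordering (Slice 1 in the most significant bits) and the function name are likewise assumed for illustration.

```python
def decode_pe_config(word, m=3, n=4):
    """Split a PE configuration word into (sel1, sel2, sel3).

    Assumed layout, most significant bits first:
    [ Slice 1: m bits | Slice 2: m bits | Slice 3: n bits ]
    """
    sel3 = word & ((1 << n) - 1)               # function select
    sel2 = (word >> n) & ((1 << m) - 1)        # second input select (Y)
    sel1 = (word >> (n + m)) & ((1 << m) - 1)  # first input select (X)
    return sel1, sel2, sel3

# Example word: Slice 1 = 101, Slice 2 = 010, Slice 3 = 0110.
sel1, sel2, sel3 = decode_pe_config(0b1010100110)
```

Here the word selects input 5 as X, input 2 as Y and function code 0110, which is (X+Y) >> 2 in Table 4.1.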

The complete configuration of the EHW chip is shown in

Figure 4.9.


Figure 4.9 Configuration of EHW chip

4.7 IMAGE PREPROCESSING ALGORITHM

4.7.1 Fitness Function

Popular measures of performance used to evaluate the difference between the original and filtered images include

i. Peak Signal to Noise Ratio (PSNR) and

ii. Mean Difference per Pixel (MDPP)

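In many applications the error is expressed in terms of a signal-to-noise ratio (SNR), given in equation 4.1:

$$\mathrm{SNR} = 10 \log_{10}\left(\frac{\sigma^{2}}{\mathrm{MSE}}\right)\ \mathrm{dB} \qquad (4.1)$$

where $\sigma^{2}$ is the variance of the desired or original image and MSE is the mean squared error between the original and filtered images. The peak signal-to-noise ratio (PSNR) is expressed as equation 4.2:

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{255^{2}}{\mathrm{MSE}}\right)\ \mathrm{dB} \qquad (4.2)$$

The fitness function using MDPP is given by

$$\mathrm{MDPP} = \frac{1}{N \times N} \sum_{i,j=1}^{N} \left| \mathrm{orig}(i,j) - \mathrm{filt}(i,j) \right| \qquad (4.3)$$

where $\left| \mathrm{orig}(i,j) - \mathrm{filt}(i,j) \right|$ is the absolute difference between the original image and the filtered image at pixel $(i,j)$.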

In this work, the Mean Difference per Pixel (MDPP) is used as the performance measure by the fitness evaluator module in the GA processor, because the MDPP fitness function is computationally easier to implement in hardware than PSNR. The EHW architecture that has the best (minimum) MDPP after a specified number of generations is chosen as the evolved architecture.
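The difference in computational effort between the two measures can be seen in a short sketch. MDPP needs only subtractions, absolute values and an accumulator, whereas PSNR needs squaring, a division and a logarithm. The functions below operate on images stored as lists of rows of 8-bit values; the function names and the tiny 2x2 example are illustrative assumptions.

```python
import math

def mdpp(orig, filt):
    """Mean Difference per Pixel: mean absolute pixel difference."""
    diffs = [abs(o - f) for ro, rf in zip(orig, filt) for o, f in zip(ro, rf)]
    return sum(diffs) / len(diffs)

def psnr(orig, filt):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    sq = [(o - f) ** 2 for ro, rf in zip(orig, filt) for o, f in zip(ro, rf)]
    mse = sum(sq) / len(sq)
    if mse == 0:
        return math.inf                        # identical images
    return 10 * math.log10(255 ** 2 / mse)

original = [[10, 20], [30, 40]]
filtered = [[12, 20], [30, 36]]
example_mdpp = mdpp(original, filtered)        # (2 + 0 + 0 + 4) / 4
example_psnr = psnr(original, filtered)        # MSE = (4 + 16) / 4 = 5
```

Since the image size is fixed during evolution, the hardware can compare accumulated absolute differences directly and skip the final division.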

4.7.2 Algorithm for Image Noise Filtering

The chosen EHW architecture for image preprocessing was described in sections 4.5 and 4.6. Using this architecture, the algorithm for noise removal in the image proceeds as follows:

1. Read the corrupted image and store it in the input buffer.

2. Generate an initial population of size ‘n’, with each chromosome of length L. Each chromosome encodes the interconnections between the PEs and the function to be performed by each PE.

3. For each chromosome in the population:

a. Take a 3x3 overlapping window and input the nine pixel values to the VRC, which processes them and replaces the center pixel. Every pixel value of the filtered image is computed from the corresponding pixel and its eight neighbors. This process is illustrated in Figure 4.10 and is repeated for the whole image.

b. Calculate the Mean Difference per Pixel (MDPP) using the reference image and assign it as the fitness value.

c. Retain the chromosome that has the maximum fitness (minimum MDPP).

4. Apply the crossover and mutation operations to the selected chromosome to obtain the next-generation strings. The roulette wheel selection procedure is chosen in this work.

5. Replace the old population and repeat from step 3 for a specified number of generations ‘N’.
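Step 3a, the overlapping 3x3 window scan, can be sketched as follows. The evolved circuit is represented by an arbitrary function of nine pixel values; a median is used here purely as a stand-in, not as the filter evolved in this work. Writing results to a separate output image and copying border pixels unchanged are assumptions, since the text does not describe border handling.

```python
def filter_image(image, window_fn):
    """Slide a 3x3 window over the image and replace each center pixel."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]            # borders copied unchanged (assumed)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [image[r + dr][c + dc]    # nine inputs I0..I8, row by row
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = window_fn(window)
    return out

def median9(window):
    """Stand-in for the evolved circuit: median of the nine pixels."""
    return sorted(window)[4]

noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255                              # a single impulse-noise pixel
cleaned = filter_image(noisy, median9)
```

Successive windows overlap by two columns, so the second window of a row re-uses six of the nine pixels of the first.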

Stage I: 1st 3x3 window presented to VRC and I2,2 replaced by filter output

I1,1 I1,2 I1,3
I2,1 I2,2 I2,3
I3,1 I3,2 I3,3

Stage II: 2nd 3x3 window presented to VRC and I2,3 replaced by filter output

I1,2 I1,3 I1,4
I2,2 I2,3 I2,4
I3,2 I3,3 I3,4

Figure 4.10 Different pixel windows processed by the evolved circuit

In Figure 4.10, Ip,q represents the pixel value at the pth row and qth column.

Note: The window presented as input to the evolved circuit employs a spatial filtering approach.

4.8 FPGA IMPLEMENTATION

In order to create an FPGA design, a designer has several options

for algorithm implementation. Originally intended as a simulation language,

VHDL represents a formerly proprietary hardware design language. VHDL

was chosen as a target design language in this work because of familiarity and

its wide-ranging support, both in terms of software development tools and


vendor support. In the first stage, a design is created in VHDL. Next, the syntax of the code is verified and the design is synthesized, or compiled, into a library. The design is then simulated to check its functionality. Finally, the design is processed with vendor-specific place-and-route tools and mapped onto a specific FPGA in software.

The VHDL code of the algorithm described in the previous section is developed and imported into the Xilinx FPGA image processing board, the details of which are presented in chapter 3 of this thesis. The EHW is configured to accept nine 8-bit inputs I0 – I8 and produce a single 8-bit output, so that it processes gray-scale (8 bits/pixel) images. Every pixel value of the filtered image is calculated using the corresponding pixel and its eight neighbors, as discussed already. The operation performed on the selected input pixels depends on the configuration bits downloaded into the configuration memory from the genetic unit. The VRC consists of 25 PEs, as shown in Figure 4.11. Four PEs are implemented as a single stage of the pipeline.

Figure 4.11 Architecture of the Virtual Reconfigurable Circuit


Each PE can process two 8-bit inputs and produce a single 8-bit output. The outputs of the PEs are equipped with registers. Each of the two inputs of every PE can be connected to one of the outputs from the previous l columns, where l is the level-back parameter. In this work, l is chosen as 2. Every PE executes a function from Table 4.1 on its two inputs, selected by the function code configuration sel3. The architecture of a single PE is shown in Figure 4.12. The configuration bit stream consists of ten bits for each PE. The output of the PE is given by

$$\mathrm{Output} = F\{\mathrm{mux}(sel1),\ \mathrm{mux}(sel2),\ sel3\} \qquad (4.4)$$
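Equation 4.4 can be read as two multiplexers feeding a function unit. The sketch below models this for a single PE, together with one possible reading of the level-back rule. It makes several assumptions that are not stated in the text: the primary inputs are treated as column 0, the candidate sources of a PE are the concatenated outputs of the previous l columns, and only three of the sixteen function codes are included to keep the example short.

```python
# A reduced function table (three of the codes from Table 4.1).
FUNCTIONS = {
    0b0111: lambda x, y: (x + y) >> 1,
    0b1100: lambda x, y: min(x, y),
    0b1101: lambda x, y: max(x, y),
}

def candidate_sources(columns, col, l=2):
    """Outputs of the previous `l` columns that a PE in column `col` may use.

    `columns[0]` holds the primary inputs (assumed), `columns[k]` the
    outputs of PE column k.
    """
    flat = []
    for outputs in columns[max(0, col - l):col]:
        flat.extend(outputs)
    return flat

def pe_output(sources, sel1, sel2, sel3):
    """Output = F{mux(sel1), mux(sel2), sel3} for one processing element."""
    x = sources[sel1]                          # first multiplexer
    y = sources[sel2]                          # second multiplexer
    return FUNCTIONS[sel3](x, y) & 0xFF        # 8-bit result

sources = [12, 200, 37, 90, 5, 255, 64, 128]
mean_out = pe_output(sources, 1, 3, 0b0111)    # (200 + 90) >> 1
min_out = pe_output(sources, 5, 4, 0b1100)     # min(255, 5)
```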

The fewer the functions, the faster the evolution. Further functions can be included, but this depends on the resource requirements, as there is a trade-off between the functionality and the complexity of the hardware structure.

Figure 4.12 Architecture of a Single Processing Element


4.8.1 VHDL Implementation of GA Processor

The model of the GA processor discussed in section 4.5.1 is implemented in VHDL. The VHDL code is given in Figure 4.13. Each of the signals implements the 12-bit random number generator, and a sample output captured using the ModelSim package is shown in Figure 4.14. These bits (the GA processor output) are the configuration bits; they control the interconnection among the PEs and the functionality of each PE in the VRC module.

Figure 4.13 Implementation of GA processor using VHDL

Figure 4.14 ModelSim captured GA processor output

4.8.2 Evolved Architecture

An architecture evolved to filter the noise present in the image is shown in Figure 4.15. Nine inputs (pixel values) are used to calculate a new pixel value. The parameters of the evolved network are: 9 inputs, 1 output, circuit topology 6x4, l-back = 2. Only the programmable elements that are actually used are highlighted in Figure 4.15.


Figure 4.15 Virtual Reconfigurable Circuit

4.9 DISCUSSION ON THE PERFORMANCE OF EHW ALGORITHM

In the previous sections of this chapter, it was shown that the evolved image operators and architectures are used in the preprocessing phase and form an important part of computer vision systems. The automatic design of such image operators enables the machine vision system to adapt to changing environments automatically. Also, the implementation of the EHW algorithm in a low-cost, commercial off-the-shelf FPGA device in this work makes it well suited to on-line measurement of surface roughness. In this section, it is established that the evolved solutions are able to perform better than conventionally designed filters.


4.9.1 Experimental Results

Experimental studies were performed with the developed system, and the results support the claim that it is possible to evolve general filters that exhibit, on average, a lower MDPP than conventional filters for a given noise. For comparison, the EHW algorithm is compared with the super resolution algorithms suggested by Keren et al (1988) and Vandewalle (2003); the results are shown in Figures 4.16a to 4.16c for ground surfaces corresponding to different cutting conditions. Similarly, Figures 4.17a to 4.17c give the results for milled surfaces corresponding to different cutting conditions. In each case, Gaussian noise of zero mean and variance 0.03 was assumed to corrupt the original image. The results show that the proposed EHW filter gives an improved PSNR and produces a reconstructed image with enhanced sharpness, bringing out the finer details in the machined surfaces. The degradation and blurring of edges that accompanies magnification is removed most effectively by the EHW approach. It is also worth mentioning that the proposed EHW filter uses only local knowledge (i.e. a 3x3 kernel) to enhance the image, while a conventional solution will probably need the position of the currently processed pixel.

Figure 4.16 Texture of ground surfaces before and after preprocessing by different algorithms

Panel   EHW PSNR (dB)   Keren PSNR (dB)   Vandewalle PSNR (dB)
(a)     39.41           25.18             12.99
(b)     36.25           23.54             21.32
(c)     40.02           26.39             28.92

Each panel shows the original image and the outputs of the EHW, Keren and Vandewalle algorithms.

Figure 4.17 Texture of milled surfaces before and after preprocessing by different algorithms

Panel   EHW PSNR (dB)   Keren PSNR (dB)   Vandewalle PSNR (dB)
(a)     26.95           17.45             18.54
(b)     29.21           19.32             22.61
(c)     24.59           16.46             19.58

Each panel shows the original image and the outputs of the EHW, Keren and Vandewalle algorithms.


4.10 CONCLUSION

In this chapter, a novel EHW-based image filter for removing the noise present in an image was designed and implemented on a Xilinx FPGA-based image processing board. Experimental results were obtained for ground and milled surfaces corresponding to different cutting conditions, and it was shown that the proposed filter is superior to the conventional designs reported in previous studies in terms of the performance measure, computation speed, implementation cost and the ability to adapt, online and autonomously, to changes in task requirements, environment or hardware structure. The reconfigurable architecture based computer vision analysis enables the addition of new features in image enhancement, allows rapid implementation of new standards and protocols on an as-needed basis and protects the investment in computing hardware. It functions as programmable hardware that offers the flexibility of a software-based solution while retaining the execution speed of a more traditional hardware-based approach.