

1326 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, VOL. 36, NO. 10, OCTOBER 1989

VLSI Architecture for Digital Picture Comparison

HENG-DA CHENG, MEMBER, IEEE, HON-SON DON, MEMBER, IEEE, AND LAWRENCE T. KOU

Abstract: One of the important problems in picture processing is to identify two digital pictures of the same scene taken under different lighting conditions. This kind of problem can be found in remote sensing, satellite signal processing, and related areas. Identification can be made by matching histograms of the gray levels of the two pictures. A dynamic programming procedure can be designed to solve this problem with a worst-case time complexity of $O(m^3 \times n)$ on a uniprocessor using a straightforward computation approach, where $m$ and $n$ are the numbers of gray levels of the input picture and the reference picture, respectively. This algorithm can be easily modified to have a time complexity of $O(m^2 \times n)$. In this paper, we propose a VLSI architecture consisting of $m \times n$ processing elements with extensive parallel and pipelining computational capabilities to speed up the process. The worst-case time complexity is reduced to $O(\max(m, n))$, which is a significant improvement over the uniprocessor approach. The algorithm partition problem, an important issue in VLSI design, and the verification of the proposed architecture are also studied in this paper. A series of experiments is conducted to verify the proposed algorithms.

Index terms: Digital picture comparison, packing algorithm, very large scale integration (VLSI), algorithm partition, VLSI architecture verification.

I. INTRODUCTION

Very large-scale integration (VLSI) technology has triggered the idea of implementing computation algorithms directly in hardware with extensive parallel and pipelining computation capabilities. The employment of VLSI architectures for implementation of pattern recognition and image processing algorithms has been investigated for many applications [5]-[18].

In this paper, we propose a VLSI architecture for identifying digital pictures that have been taken from the same scene under different lighting conditions. This is an important problem related to remote sensing, satellite signal processing and other areas. The proposed VLSI architecture is also suitable for probability distribution comparisons, waveform comparisons and other "packing" or "reshaping" problems. As an important issue in VLSI design, the algorithm partition problem is also discussed. The backtracking procedure is discussed in detail. The formal verification of the proposed architecture is also given. An example is used to illustrate the operation of the proposed VLSI architecture. A series of experiments has also been conducted to verify the performance of the proposed algorithms.

Manuscript received October 14, 1988; revised February 14, 1989. This work was supported in part by the Natural Sciences and Engineering Council of Canada and in part by the National Science Foundation under Grant IRI-8710856.

H. D. Cheng is with the School of Computer Science, Technical University of Nova Scotia, Halifax, Nova Scotia, Canada, B3J 2X4.

H. S. Don is with the Department of Electrical Engineering, State University of New York, Stony Brook, NY.

L. T. Kou is with the Electrical and Computer Science Department, University of California, Davis, CA.

IEEE Log Number 8929989.

II. PRELIMINARY

Picture matching techniques have been used extensively for many applications such as curvature sequence detection [2], template matching and pattern matching [1], character recognition, aerial navigation and stereo mapping, earth resource analysis, missile guidance, intelligence gathering systems, and robotics [2], [4], [5]. There are many situations in which we want to match (or register) two pictures, or some given pattern with a picture [2]. For example,

1) given two or more pictures of the same scene taken by different sensors, we want to determine the characteristics of each pixel with respect to each sensor and then to classify the pixels;
2) given two pictures of the same scene taken at different times, we want to determine the points at which they differ and then analyze the changes that have taken place;
3) given two pictures of the same scene taken from different angles or altitudes, we want to identify the corresponding points in the pictures and then determine their distances from the camera in order to obtain three-dimensional information from the scene;
4) given a subpattern of a picture, we want to identify this subpattern.

In this paper, we study another very important application of picture matching, that is, to identify two digital pictures of the same scene taken under different lighting conditions. We assume that the two pictures have only relatively small movement between them, such that the changes of their histograms due to the relative movement are not as prominent as those due to the different lighting conditions. Mathematically, a picture can be specified by a function of two variables $F(x, y)$, where the value of $F(x, y)$ is the brightness, or $K$-tuples of brightness in the case of multi-spectral bands [2]-[4], and $(x, y)$ are the coordinates in the picture plane. In black and white pictures, the values are called gray levels, which are real, nonnegative and bounded numbers. Without loss of generality, we consider pictures as matrices with integer elements. Thus each element is a pixel. A gray-level histogram of an image is a function that represents the frequency of occurrence of each gray level in the picture.
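As a small illustration of the histogram definition above, the following Python sketch counts the occurrences of each gray level in a picture stored as a matrix of integers. The function name `gray_level_histogram` and the list-of-rows representation are illustrative assumptions, not part of the original paper.

```python
def gray_level_histogram(picture, levels):
    """Count how often each gray level 0..levels-1 occurs in a picture.

    `picture` is assumed to be a list of rows of integer gray levels.
    """
    hist = [0] * levels
    for row in picture:
        for gray in row:
            hist[gray] += 1
    return hist


# A tiny 2 x 3 picture with three gray levels (hypothetical data).
picture = [
    [0, 1, 1],
    [2, 1, 0],
]
hist = gray_level_histogram(picture, levels=3)
print(hist)  # [2, 3, 1]
```

The entries of the histogram always sum to the number of pixels, which is a convenient sanity check.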

0098-4094/89/1000-1326$01.00 © 1989 IEEE


Suppose that the gray levels are numbered with integer values from 0 to $n$. Then at gray level $p$, the corresponding histogram $H(p)$ denotes the number of pixels in the image with that gray level [2]-[4]. When pictures of the same scene are taken under different lighting conditions, different histograms are obtained. Histogram equalization techniques have been widely used to transform a picture such that its histogram approximates a given distribution. Our proposed histogram transformation technique not only can transform the histogram of a picture to that of another picture but can also provide a measure of how close the two histograms are by using a distance based on a "minimal deviation" criterion. As a result, the distance can be used for picture comparison. The details of the transformation technique are described as follows.

Let $H_1$ and $H_2$ be the histograms of two pictures obtained from the same scene with $m$ and $n$ gray levels, respectively. The following algorithm was proposed to "reshape" $H_1$ (i.e., rescale its gray levels) so that it has the minimal deviation ($Z$) from $H_2$ [19]. The problem can be defined mathematically as

$$Z = \min_{(X_0, \ldots, X_n)} \sum_{j=1}^{n} \left| H_2(j) - \sum_{i=X_{j-1}}^{X_j - 1} H_1(i) \right|$$

subject to

$$1 = X_0 \le X_1 \le \cdots \le X_n = m + 1, \qquad X_i \text{ integer, for } i = 1, \ldots, n.$$

In other words, we transform the gray levels $X_{j-1}, \ldots, X_j - 1$ in picture 1 into gray level $j$ in picture 2, for optimally chosen $X_{j-1}$ and $X_j$, $j = 1, \ldots, n$. This problem can also be interpreted as a packing problem: to pack $m$ objects of sizes $H_1(1), \ldots, H_1(m)$ into $n$ boxes of sizes $H_2(1), \ldots, H_2(n)$ in such a way that

(i) if the $i$th object has been placed in the $j$th box, the $(i+1)$th object is not allowed to be packed into the $k$th box for any $k < j$, and

(ii) the accumulated error due to space overpacked or left over is minimized.

Such a problem can be solved by using dynamic programming techniques. Let $S_j(i)$ be the minimal accumulated error caused by transforming the gray levels $1, \ldots, i$ in $H_1$ into the gray levels $1, \ldots, j$ in $H_2$. The recurrence relation is given by

$$S_j(i) = \min_{0 \le u \le i} \left\{ S_{j-1}(u) + \left| H_2(j) - \sum_{k=u+1}^{i} H_1(k) \right| \right\}$$

for $i = 0, 1, \ldots, m$ and $j = 0, 1, \ldots, n$, where the initial conditions are $S_0(0) = 0$; $S_0(i) = \sum_{u=1}^{i} H_1(u)$ for all $i = 1, \ldots, m$; $S_j(0) = \sum_{u=1}^{j} H_2(u)$ for all $j = 1, \ldots, n$; and $\sum_{k=i}^{j} H_1(k) = 0$ if $i > j$. The overall minimal accumulated error, $S_n(m)$, can then be readily computed.
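To make the objective and the dynamic programming formulation concrete, here is a minimal Python sketch. It assumes the recurrence takes the minimum over $u = 0, \ldots, i$ with an empty sum equal to zero, which is how Algorithm 1 in Section III evaluates it. The function names `deviation` and `minimal_error` are hypothetical, and the histogram values are those of the worked example in Section III.

```python
from functools import lru_cache


def deviation(h1, h2, x):
    """Deviation for break points x = [X_0, ..., X_n] (1-based gray levels)."""
    total = 0
    for j in range(1, len(h2) + 1):
        # Gray levels X_{j-1} .. X_j - 1 of picture 1 go into level j.
        packed = sum(h1[i - 1] for i in range(x[j - 1], x[j]))
        total += abs(h2[j - 1] - packed)
    return total


def minimal_error(h1, h2):
    """Evaluate S_n(m) directly from the recurrence (no optimisation)."""
    m, n = len(h1), len(h2)

    @lru_cache(maxsize=None)
    def s(j, i):
        if j == 0:
            return sum(h1[:i])  # S_0(i), with S_0(0) = 0
        if i == 0:
            return sum(h2[:j])  # S_j(0)
        # sum(h1[u:i]) is H1(u+1) + ... + H1(i); it is 0 when u == i.
        return min(
            s(j - 1, u) + abs(h2[j - 1] - sum(h1[u:i]))
            for u in range(i + 1)
        )

    return s(n, m)


h1 = [8, 2, 3, 5, 2]
h2 = [4, 3, 9, 4]
print(deviation(h1, h2, [1, 2, 3, 5, 6]))  # 8
print(minimal_error(h1, h2))               # 8
```

For this input the break points $(1, 2, 3, 5, 6)$ attain the minimum, so both calls return the same value.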

The straightforward execution of this procedure would yield the optimal solutions for all $(i, j)$ pairs with time complexity $O(m^3 \times n)$ using a uniprocessor. The algorithm can be easily improved to have the time complexity $O(m^2 \times n)$ by rearranging the computation order so that $\sum H_1(u)$ is computed only once for all $j$'s. The details are omitted here. In this paper, we propose an $m \times n$ VLSI array with extensive parallel and pipelining capabilities to speed up the computation. The time complexity for the proposed VLSI architecture is $O(\max(m, n))$, which is a significant improvement over the uniprocessor approach.
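The paper omits the details of the $O(m^2 \times n)$ variant. One common way to obtain that bound, shown here only as a sketch and not necessarily the authors' rearrangement, is to precompute prefix sums of $H_1$ so that every inner sum costs constant time. The name `minimal_error_table` is an assumption for illustration.

```python
from itertools import accumulate


def minimal_error_table(h1, h2):
    """Return a table s with s[j][i] = S_j(i), using prefix sums of H1."""
    m, n = len(h1), len(h2)
    prefix = [0] + list(accumulate(h1))  # prefix[i] = H1(1) + ... + H1(i)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    s[0] = prefix[:]                      # S_0(i)
    for j in range(1, n + 1):
        s[j][0] = s[j - 1][0] + h2[j - 1]  # S_j(0)
        for i in range(1, m + 1):
            # prefix[i] - prefix[u] replaces the inner summation loop.
            s[j][i] = min(
                s[j - 1][u] + abs(h2[j - 1] - (prefix[i] - prefix[u]))
                for u in range(i + 1)
            )
    return s


table = minimal_error_table([8, 2, 3, 5, 2], [4, 3, 9, 4])
print(table[4][5])  # 8
```

The three nested loops over $j$, $i$ and $u$ give roughly $n \times m^2 / 2$ constant-time steps.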

III. VLSI DIGITAL PICTURE COMPARATOR

We propose a VLSI architecture based on the space-time domain expansion approach [11], [12], [14], [15], which has a natural and regular configuration and can be easily implemented by applying current VLSI technology. Another important issue in VLSI design, the algorithm partition problem, is also solved by using the proposed VLSI architecture. The proposed VLSI architecture can greatly speed up the digital picture comparison procedure due to its extensive parallel and pipelining computation capabilities. Before discussing the VLSI architecture in detail, we first propose the following algorithm to carry out the computation and to perform the backtracking process, which provides the "packing" sequence that attains the minimal deviation of the "reshape".

Algorithm 1: /* Digital picture comparison */
/* Let H1 and H2 be the histograms of two pictures taken at the same scene with m and n gray levels, respectively */

begin
  S_0(0) := 0;
  for i := 1 to m do
    S_0(i) := S_0(i - 1) + H1(i);
  for j := 1 to n do
    S_j(0) := S_{j-1}(0) + H2(j);
  for j := 1 to n do
    for i := 1 to m do
    begin
      Z := ∞;
      for u := 0 to i - 1 do
      begin
        T := 0;
        for t := u + 1 to i do
          T := T + H1(t);
        T := S_{j-1}(u) + |H2(j) - T|;
        if T < Z then
        begin
          Z := T; I := u;
        end
      end;
      T' := S_{j-1}(i) + H2(j);
      if T' < Z then
      begin
        Z := T'; I := i;
      end;
      S_j(i) := Z;
      output S_j(i);
      Append index-pair (I, j - 1) to index-pair (i, j), when the identification signal arrives, and form ((I, j - 1), (i, j)).
    end
end.

Fig. 1. The structure of the (i, j)th processing element. (Legend: adder, no time unit is needed; accumulator; Min(a, b) and the corresponding index-pair, the result being kept until the next input comes; one-time-unit delay; ID. For clarity, the index-pair paths are not shown.)
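A direct software transcription of Algorithm 1 may help when checking the hardware against a reference. The Python sketch below is an illustration under stated assumptions: the name `compare_pictures`, the table layout `s[j][i]` and the dictionary `pred` (which stores, for each $(i, j)$, the index-pair partner $(I, j - 1)$) are hypothetical choices, and the identification-signal timing is ignored because it only matters in the array.

```python
def compare_pictures(h1, h2):
    """Return (s, pred): s[j][i] = S_j(i); pred[(i, j)] = (I, j - 1)."""
    m, n = len(h1), len(h2)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, m + 1):
        s[0][i] = s[0][i - 1] + h1[i - 1]      # S_0(i)
    for j in range(1, n + 1):
        s[j][0] = s[j - 1][0] + h2[j - 1]      # S_j(0)
    pred = {}
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            z, best = float("inf"), None
            for u in range(i):                 # u = 0 .. i-1
                t = sum(h1[u:i])               # H1(u+1) + ... + H1(i)
                t = s[j - 1][u] + abs(h2[j - 1] - t)
                if t < z:
                    z, best = t, u
            t2 = s[j - 1][i] + h2[j - 1]       # box j receives no object
            if t2 < z:
                z, best = t2, i
            s[j][i] = z
            pred[(i, j)] = (best, j - 1)       # index-pair ((I, j-1), (i, j))
    return s, pred


s, pred = compare_pictures([8, 2, 3, 5, 2], [4, 3, 9, 4])
print(s[4][5])       # 8
print(pred[(5, 4)])  # (4, 3)
```

The strict comparison `t < z` mirrors the pseudocode, so ties are resolved in favour of the smallest `u`.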

We can now build a VLSI array with $m \times n$ processing elements (PEs) for the above algorithm. Each processing element has a subtractor, which produces the absolute value of the difference of its two inputs, and a comparator, which compares two input values and outputs the smaller value with the corresponding index-pair to the next processing element below it. The structure of each processing element is shown in Fig. 1. The symbolic representation for each processing element is shown in Fig. 2. For clarity, we will not show the data and index-pair channels in the figures. The functions performed by the $(i, j)$th processing element are as follows:

Input: $H_2(j)$, outputs of the $(i-1, j)$th processing element, $H_1(i), H_1(i+1), \ldots, H_1(m)$, index-pair, $S_{j-1}(i-1)$ and $S_{j-1}(i)$.

Output: $S_j(i)$ and index-pair to the right processing element when the identification signal arrives, and the intermediate results to the processing element below.

Operation: Each processing element has a local connection to the processing element beneath it, which accepts the intermediate results including the accumulated errors and the index-pairs. It also has a local connection to the right processing element, which receives $S_j(i)$ and index-pair $(i, j)$ when the identification signal arrives. Each PE can perform accumulation, comparison, and computation of the absolute value $|a - b|$. Each of these computations requires one time unit. The adder is a combinational circuit that does not require a time unit, or whose delay is much smaller than a time unit. The data move from one PE to another in one time unit. There is also an identification signal, which starts at the fifth time unit; it takes two time units to move downward one PE and one time unit to move rightward one PE.

The details are discussed as follows. The input data are arranged in a way to ensure that the first datum of the $i$th row arrives at the $(i, 1)$th PE at the $(2i - 1)$th time unit.

1) Input $H_1(i)$ to the $(i, j)$th PE, which calculates the accumulation in one time unit. Then $|H_2(j) - H_1(i)|$ will be computed in another time unit.

2) $S_{j-1}(i-1)$ arrives at the $(i, j)$th processing element, which performs the $S_{j-1}(i-1) + |H_2(j) - H_1(i)|$ operation without requiring a time unit.

3) When the result of

$$\min_{0 \le u \le i-2} \left\{ S_{j-1}(u) + \left| H_2(j) - \sum_{k=u+1}^{i} H_1(k) \right| \right\}$$

arrives at the $(i, j)$th PE from the $(i-1, j)$th PE, it is compared with the result of step 2), which takes one time unit.

4) Then

$$\min_{0 \le u \le i-1} \left\{ S_{j-1}(u) + \left| H_2(j) - \sum_{k=u+1}^{i} H_1(k) \right| \right\}$$

is delayed one time unit by the delay element.

5) $S_{j-1}(i) + H_2(j)$ is computed and compared with the result of step 3), since the identification signal arrives at the same time. This takes one time unit. $S_j(i)$ and the corresponding index-pair will then be sent to the $(i, j+1)$th PE.

The entire VLSI structure will be an $m \times n$ array consisting of the PEs shown in Fig. 1. In the following we will discuss the VLSI implementation of Algorithm 1 in detail.

Algorithm 2: VLSI Implementation of Algorithm 1

This algorithm describes the structure and the interconnection of the entire array.

Input: Gray levels of the input picture, $H_1(i)$ for $1 \le i \le m$, and those of the reference picture, $H_2(j)$ for $1 \le j \le n$; indexes, index-pairs; initial conditions: $S_0(0)$, $S_0(i)$, and $S_j(0)$ (for $1 \le i \le m$, $1 \le j \le n - 1$); and the identification signal.

Output: The accumulated error $S_j(i)$ and corresponding index-pairs.

Operation: Move the gray levels $H_2(j)$ of the reference picture, the identification signal and the index $j$ from the top to the bottom, one PE per time unit. Move the gray levels $H_1(i)$ of the input picture and index $i$ from the left to the right of the VLSI array, one PE per time unit. The identification signal will be sent at the fifth time unit; it moves downward one PE in every two time units and rightward one PE per time unit. When the identification signal arrives at the $(i, j)$th PE, it will open the connection channel to the comparator which connects the right side processing element, and the result $S_j(i)$ will be sent to the processing element $(i, j+1)$. The symbolic representation of the PE is shown in Fig. 2. There is a correspondence between the index-pairs and the "packing" sequence. An index-pair $\{(i_1, j_1), (i_2, j_1 + 1)\}$ means that $H_1(k)$, $i_1 < k \le i_2$, will be packed into box $H_2(j_1 + 1)$. For example, $\{(2, 2), (4, 3)\}$ means that $H_1(3)$ and $H_1(4)$ are packed into $H_2(3)$. The entire VLSI architecture for digital picture comparison is shown in Fig. 3. We note that the proposed architecture can also be used to solve other tasks in operations research and related areas. For many applications we need to know not only the minimal deviation but also the "packing" sequence used for reaching the goal. To obtain the "packing" sequence, we have to perform a backtracking procedure, which can be done in several ways as described below.

1) Output the accumulated error matrix $S$ and/or the index-pairs to the host machine, which will perform the backtracking procedure.

2) Attach another VLSI module and use the tag of the index-pair as the search key to perform the backtracking procedure.

3) Expand the "append" operation such that it appends the index into the index list of its ancestor. An index list is formed by appending an index or an index list. We can use index $(m, n)$ as the tag to find the "packing" sequence. This will change the backtracking procedure into a forward one and speed up the computation. However, it requires a large output channel capacity, especially for the PE located at the upper right-hand corner. The upper bound of the channel capacity for the $(i, j)$th PE will be $(i + j + 1)$.

4) Add an index-pair register to each PE which consists of two parts, the first part for the first index and the second part for the second index. The second part of the index-pair register will be compared with the tag. If they match, the second part is output into the output channel and the first part is output as the tag to its top and left side neighbors. The tag will move until it matches with another index-pair. The procedure will then continue. At the $(2m + n + 3)$th time unit, a backtracking signal will be sent from the right-bottom PE and moves along the channels connecting to the left neighbor and the one on top of it; each move takes one time unit. The index $(m, n)$ is used as the tag of the $(m, n)$th PE. It needs at most $(m + n)$ time units to complete this procedure.

Fig. 4. VLSI architecture for P digital pictures comparison.

From the above description, it can be seen that the proposed architecture can compare two digital pictures by transforming the gray levels. In many applications, only the error of the "packing" is required. In such cases, we can simplify the structure of the PE and the entire VLSI architecture further. If there are $P$ digital pictures which are compared with the reference picture (or an input digital picture is compared with $P$ reference digital pictures), we can make a $P$-time expansion [11], [12], [14] as shown in Fig. 4. The time complexity will be $O(\max(P \times m, n))$ or $O(\max(P \times n, m))$, compared to the time complexity $O(m^3 \times n \times P)$ for a uniprocessor. We can also use a three-dimensional array ($P \times m \times n$ processing elements) to solve this problem, in which case the time complexity will be $O(\max(P, m, n))$. To indicate the best matched digital picture, we number the digital pictures and add a register consisting of two parts. One part is for the summation error and the other part is for the index of the numbered digital pictures. We also add a counter which is initially set to zero and starts at the $(2m + n + 3)$th time unit.

Fig. 5. An example of digital picture comparison.

The operation of the register is as follows:

begin
  error.register := ∞;
  if error.register > error.array then
  begin
    error.register := error.array;
    index.register := counter;
  end
end

The final result of index.register gives the index of the best matched digital picture. If we want to find the "packing" sequence for the best matched picture, we can reinput the best matched picture with the reference picture and perform the backtracking procedure.
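The register logic can be mimicked in software as follows. This is a sketch only: `packing_error` is a hypothetical helper that returns $S_n(m)$ using the recurrence of Section II with prefix sums, the comparison is assumed to be applied once per incoming picture, and the pictures are assumed to be numbered from zero like the counter. The second and third histograms are made-up test data.

```python
import math
from itertools import accumulate


def packing_error(h1, h2):
    """Return S_n(m) for input histogram h1 and reference histogram h2."""
    prefix = [0] + list(accumulate(h1))
    prev = prefix[:]                       # row S_0(.)
    for box in h2:
        cur = [prev[0] + box]              # S_j(0)
        for i in range(1, len(prefix)):
            cur.append(min(
                prev[u] + abs(box - (prefix[i] - prefix[u]))
                for u in range(i + 1)
            ))
        prev = cur
    return prev[-1]


def best_match(inputs, reference):
    """Keep the smallest error and its picture index, as the register does."""
    error_register = math.inf
    index_register = None
    for counter, h1 in enumerate(inputs):
        error_array = packing_error(h1, reference)
        if error_register > error_array:
            error_register = error_array
            index_register = counter
    return index_register, error_register


inputs = [[8, 2, 3, 5, 2], [4, 3, 9, 4], [1, 1, 1, 1]]
print(best_match(inputs, [4, 3, 9, 4]))  # (1, 0)
```

Because the comparison is strict, the first picture that attains the minimal error is the one retained.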

An example to illustrate the operation of the proposed algorithm and architecture is given in Fig. 5. The input picture has five gray levels: $H_1(1) = 8$, $H_1(2) = 2$, $H_1(3) = 3$, $H_1(4) = 5$, $H_1(5) = 2$. The reference picture has four gray levels: $H_2(1) = 4$, $H_2(2) = 3$, $H_2(3) = 9$, $H_2(4) = 4$. The optimal accumulated error is 8, and the index-pairs on the optimal path are $\{(0,0), (1,1)\}$, $\{(1,1), (2,2)\}$, $\{(2,2), (4,3)\}$, and $\{(4,3), (5,4)\}$.

Using $(5, 4)$ as the search key, we can find the "packing" sequence as follows. First, we output $(5, 4)$ and find the index-pair $\{(4,3), (5,4)\}$. Then we output $(4, 3)$ and use it to find the index-pair $\{(2,2), (4,3)\}$. Continuing this procedure, we will have $\{(0,0), (1,1), (2,2), (4,3), (5,4)\}$ as the "packing" sequence.
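The search-key procedure above can be sketched in a few lines of Python. The names `backtrack` and `packing` are illustrative assumptions; the index-pairs are the ones that appear along the backtracking path of this example, and the decoding rule is the one stated for Algorithm 2 (objects $H_1(k)$, $i_1 < k \le i_2$, go into box $j_1 + 1$).

```python
def backtrack(index_pairs, key):
    """Follow index-pairs {(I, j-1), (i, j)} back from `key`."""
    lookup = {second: first for first, second in index_pairs}
    path = [key]
    while path[-1] in lookup:
        path.append(lookup[path[-1]])
    return path[::-1]


def packing(path):
    """Map each box j to the list of objects k packed into it."""
    boxes = {}
    for (i1, _), (i2, j) in zip(path, path[1:]):
        boxes[j] = list(range(i1 + 1, i2 + 1))
    return boxes


pairs = [
    ((0, 0), (1, 1)),
    ((1, 1), (2, 2)),
    ((2, 2), (4, 3)),
    ((4, 3), (5, 4)),
]
path = backtrack(pairs, (5, 4))
print(path)           # [(0, 0), (1, 1), (2, 2), (4, 3), (5, 4)]
print(packing(path))  # {1: [1], 2: [2], 3: [3, 4], 4: [5]}
```

The decoded packing places $H_1(3)$ and $H_1(4)$ in box 3, giving the per-box errors 4, 1, 1 and 2, which sum to the optimal error of 8.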

IV. VERIFICATION OF THE PROPOSED ALGORITHM

We will not adopt the conventional assumption that the computation time of the processing element of a VLSI architecture is one time unit, since it cannot reflect the structural features of a particular design, especially when there are multiple data channels. We have to consider the structure of each processing element and prove that the proposed architecture can make the right data meet at the right PE at the right time, perform the correct functions, and produce the correct result. To verify Algorithm 2, we need the following lemmas and a theorem.

Lemma 1: The identification signal arrives at the $(i, j)$th processing element at the $(2i + j + 2)$th time unit.

Proof: According to the structure in Fig. 3, the identification signal is sent at the fifth time unit. It needs $2(i - 1)$ time units to reach the $i$th row and $(j - 1)$ time units to arrive at the $(i, j)$th PE. In total, $5 + 2i - 2 + j - 1 = 2i + j + 2$ time units are needed.

Lemma 2: $\sum_{k=u}^{i} H_1(k)$ will be computed at the $(u, j)$th PE at the $(i + u + j - 2)$th time unit, for all $1 \le u \le i$, $1 \le i \le m$ and $1 \le j \le n$.

Proof: First consider the case where $j = 1$. From the data arrangement in Fig. 3, the first input of the $u$th row will arrive at the boundary of the array at the $(2u - 1)$th time unit. Then $(i - u)$ time units are needed to compute $\sum_{k=u}^{i} H_1(k)$. In total, $2u - 1 + (i - u) = i + u - 1$ time units are needed. Since the computation of the $(u, k)$th processing element starts one time unit earlier than that of the $(u, k+1)$th processing element, the $(u, j)$th processing element needs $i + u + j - 2$ time units to produce the summation.

Theorem: After receiving the inputs, $S_j(i)$ will be produced at the $(2i + j + 3)$th time unit, for all $1 \le i \le m$ and $1 \le j \le n$.

Proof: We prove the theorem by double induction on $i$ and $j$.

Basis: First consider the $i = j = 1$ case. $S_0(0)$ and $S_0(1)$ are fixed values which exist already. At the first time unit, $H_1(1)$ is input into PE $(1, 1)$; it then takes one time unit to compute the accumulation. $|H_2(1) - H_1(1)|$ is then calculated in another time unit and added to $S_0(0)$. The result will be compared with $S_0(1)$ at the fourth time unit. It is delayed one more time unit by the delay element. At the fifth time unit, the result will be compared with $S_1(0)$, when the identification signal arrives at the $(1, 1)$th PE. This takes one more time unit. Finally, $S_1(1)$ is output at the sixth time unit; $6 = 2 \times 1 + 1 + 3 = 2 \times i + j + 3$.

Induction Step: The induction hypothesis is that every $(p, q)$th PE can produce the outputs and the index-pairs at the $(2p + q + 3)$th time unit, for all $1 \le p \le i$ and $1 \le q \le j$.

Now consider the $(i + 1, j)$th PE. According to Lemma 2 and the induction hypothesis, $\sum_{k=u}^{i+1} H_1(k)$ will be computed by the $(u, j)$th PE at the $(i + 1 + u + j - 2)$th time unit, for all $1 \le u \le i$, $1 \le i \le m$ and $1 \le j \le n$. The comparators are connected in a pipelined fashion, so the value

$$M = \min_{0 \le u \le i} \left\{ S_{j-1}(u) + \left| H_2(j) - \sum_{k=u+1}^{i+1} H_1(k) \right| \right\}$$

will be output from the $(i, j - 1)$th PE at the $(2i + j - 1 + 2 + 3)$th time unit. Also, $S_{j-1}(i + 1)$ will be input at the $(2 \times (i + 1) + j - 1 + 3)$th time unit. At the same time, $N = S_{j-1}(i + 1) + H_2(j)$ will be computed. According to Lemma 1, the identification signal arrives at the $(i + 1, j)$th PE at the $(2(i + 1) + j + 2)$th time unit. $M$ and $N$ will then be compared; $\min\{M, N\} = S_j(i + 1)$ will be sent to the $(i + 1, j + 1)$th PE at the $(2i + j + 5)$th time unit. Since $S_{j+1}(i + 1)$ will be computed one time unit later than $S_j(i + 1)$, $S_{j+1}(i + 1)$ will be obtained at the $(2i + j + 6) = (2(i + 1) + (j + 1) + 3)$th time unit. We thus complete the proof by induction.

Corollary: The accumulated error and the index pairs can be obtained at the (2m + n + 3)th time unit.

Proof: Follow the above theorem and let i = m and j = n .
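The timing claims can be cross-checked numerically. The sketch below propagates the identification signal through an $m \times n$ grid using only the stated movement rules (start at the fifth time unit at PE $(1, 1)$, two time units per step downward, one per step rightward) and compares the result with the closed forms of Lemma 1 and the Corollary. The function name is hypothetical, and the assumption that each output appears one time unit after the signal arrives is taken from step 5) of the PE operation.

```python
def signal_arrival_times(m, n, start=5):
    """Arrival time of the identification signal at every PE (i, j)."""
    arrive = {(1, 1): start}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if (i, j) == (1, 1):
                continue
            candidates = []
            if i > 1:
                candidates.append(arrive[(i - 1, j)] + 2)  # from above
            if j > 1:
                candidates.append(arrive[(i, j - 1)] + 1)  # from the left
            arrive[(i, j)] = min(candidates)
    return arrive


m, n = 5, 4
arrive = signal_arrival_times(m, n)
# Lemma 1: arrival at (i, j) is 2i + j + 2.
lemma_1_holds = all(t == 2 * i + j + 2 for (i, j), t in arrive.items())
# Theorem and Corollary: output one time unit after the signal arrives.
final_output_time = arrive[(m, n)] + 1
print(lemma_1_holds, final_output_time)  # True 17
```

For the $5 \times 4$ example of Fig. 5 the final result is therefore available at time unit $2m + n + 3 = 17$.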

V. ALGORITHM PARTITION

A. Computational Model and Partition Rule

When the array size differs from the problem size, we can use a one- or two-dimensional array to solve the "packing" problem, according to the space-time domain expansion approach [11], [12], [14].

The space-time domain expansion method has been used for building VLSI architectures for vector inner product, matrix-matrix multiplication, convolution, comparison operations in relational databases, fast Fourier transformation (FFT), hierarchical scene matching, context-free language recognition, pattern matching, curve detection, clustering analysis, etc. [9], [11], [12], [15]-[17]. By using the computational model based on the space-time domain expansion method, we can partition the recursive algorithms and solve them on fixed-size VLSI architectures [9], [11], [12], [15]-[17].

There are several measurements for a computational task. We will use the problem size $N$, which is the number of operations needed to solve the given task, to measure the computational task. The computation model of a VLSI architecture obtained by the space-time domain expansion can be described by the tuple $(K_1, K_2, K_3, Q_1, \ldots, Q_t)$. Here $K_i$ denotes a $K_i$-space expansion along the $x_i$ direction and $Q_j$ denotes a $Q_j$-time expansion in the $j$th time expansion. $K_i$ equals one if there is no space expansion along the $x_i$ direction, and $Q_j$ equals one if there is no expansion in the $j$th time expansion. A necessary condition for solving a recursive task by using the space-time domain expansion is

$$\prod_i K_i \times \prod_j Q_j \ge N.$$

This inequality indicates that the VLSI architecture based on the space-time expansion can perform a number of operations equal to the product of all the space-time domain expansions.

Partitioning Rule

When the computation task size is larger than the VLSI architecture size, we have to partition the algorithm to solve it on a fixed-size VLSI architecture. According to the above inequality, if we make a $k$-space condensation along the $x_i$ direction, then we also have to make a $k$-time expansion. Multi-dimensional condensations need multi-time expansions, which may require some input data to be used repeatedly. See [9], [11], [12], [14], [16], [17] for details.

B. Using One-Dimensional Array

First we assume that the size of the array is $m$. We can consider it as an $m$-space expansion along the $x_1$ direction [11], [12], [14]. The architecture of the PE is the same as the one in Fig. 1. The input channels form queues. The entire structure is shown in Fig. 6. The control signal indicates whether the linear array computes $S_1(j)$ or $S_i(j)$ for $i > 1$. The input is repeated $n$ times, so the time complexity is $O(m \times n)$.

C. Using Two-Dimensional Array with Dimensions $k \times l$

We have already discussed the case where $k = m$ and $l = n$. We now consider the other cases. According to the partition rule [11], [12], [14], we have to make an $\lceil m/k \rceil$-time expansion and an $\lceil n/l \rceil$-time expansion. There are also queues to feed the data back. The structure is shown in Fig. 7. The lengths of the queues vary with the values of $m$ and $n$ so that the right data meet at the right PE at the right time. This complicates the design of the control system and the queue structures. Hence, we prefer to use either a sufficiently large VLSI architecture or a one-dimensional array to solve the partition problem.
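To make the $\lceil m/k \rceil$ and $\lceil n/l \rceil$ time expansions concrete, the following Python sketch lists which block of the $m \times n$ computation a $k \times l$ array would cover in each pass. The function name `partition_schedule`, the row-major ordering of the blocks, and the example sizes are assumptions made for illustration; the sketch does not model the feedback queues or the data dependencies of the actual algorithm.

```python
def partition_schedule(m, n, k, l):
    """List the (row range, column range) block covered in each pass of a k x l array."""
    row_blocks = -(-m // k)  # integer form of ceil(m / k)
    col_blocks = -(-n // l)  # integer form of ceil(n / l)
    blocks = []
    for bi in range(row_blocks):
        for bj in range(col_blocks):
            rows = (bi * k, min((bi + 1) * k, m))  # half-open range [start, stop)
            cols = (bj * l, min((bj + 1) * l, n))
            blocks.append((rows, cols))
    return row_blocks, col_blocks, blocks


# Hypothetical sizes: m = 10, n = 7 on a 4 x 3 array.
row_blocks, col_blocks, blocks = partition_schedule(m=10, n=7, k=4, l=3)
print(row_blocks, col_blocks, len(blocks))  # 3 3 9
print(blocks[-1])                           # ((8, 10), (6, 7))
```

The last block is smaller than $k \times l$ whenever $k$ does not divide $m$ or $l$ does not divide $n$, which is one reason the queue lengths depend on $m$ and $n$.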

Fig. 6. Digital picture comparison using one-dimensional VLSI array.

Fig. 7. Using $k \times l$ two-dimensional array to solve the algorithm partition problem.

VI. EXPERIMENTS

We have conducted a series of experiments to verify the proposed algorithms. The office scenes shown in Fig. 8(a)-(d) were used for the experiments. The goal is to demonstrate the capability of identifying the same scene under different lighting conditions by using the proposed method. Fig. 8(a) shows the pictures of four office scenes (in the same office). Fig. 8(b)-(d) show the pictures of the same office scenes with progressively more light. Fig. 9(a)-(d) show the histograms of the pictures in Fig. 8(a)-(d). As can be seen in Fig. 9, the histograms tend to be smoothed out and their peaks shift when more light is added to the scene. However, the general shape of the histogram of each scene remains unchanged. The pictures in Fig. 8(a) are used as the reference pictures and those in Fig. 8(b)-(d) are to be identified with respect to the reference pictures. After applying the proposed algorithm, the results obtained are shown in Table I(a)-(d). In these tables, the first column gives the indexes of the reference pictures and the first row gives the indexes of the pictures to be identified. The entries of these tables are the “packing” distances (errors) between the different pairs of pictures. Note that the entries are not symmetric. From these tables, we can see that the diagonal entries are all smaller than the off-diagonal entries. This is because the basic histogram differences between scenes are greater than the differences caused by different lighting conditions. It also indicates that the proposed algorithm is suitable for identifying the

TABLE I(a)
DISTANCES OF PICTURES TAKEN UNDER LIGHTING CONDITION (a) VERSUS REFERENCE PICTURES

         1        2        3        4
1        0   183250   183725   185149
2    48188        0    78473    81387
3    72239    95191        0    83384
4    81175    73183    47792        0

TABLE I(b)
DISTANCES OF PICTURES TAKEN UNDER LIGHTING CONDITION (b) VERSUS REFERENCE PICTURES

Fig. 8. The office scene pictures used in experiments. (a) The reference pictures. (b)-(d) Pictures taken under different lighting conditions. The four office scene pictures are numbered: (1) upper left, (2) upper right, (3) lower left, (4) lower right.

same scene under different lighting conditions. In our experiments, it took about 140 min of CPU time on a VAX 11/780 to compute the distance between a pair of pictures. This large computation time also justifies the need for parallel and pipelining techniques to speed up the computation. With the VLSI architecture proposed in this paper, the same computation will take only about 0.5 ms using the same circuit technology, i.e., the same CPU cycle time as that of the VAX 11/780, since the speedup is $O(n^3)$.
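The 0.5 ms figure can be reproduced with simple arithmetic. The Python sketch below assumes that the $O(n^3)$ speedup is exactly $n^3$ (constant factor one) and that $n = 256$ gray levels; neither value is stated in the text above, and the function name `estimated_vlsi_time_ms` is hypothetical.

```python
def estimated_vlsi_time_ms(cpu_minutes, n):
    """Scale a uniprocessor CPU time by an assumed speedup factor of n**3."""
    cpu_seconds = cpu_minutes * 60
    speedup = n ** 3  # assumption: constant factor of one in the O(n^3) speedup
    return cpu_seconds / speedup * 1000.0


# n = 256 gray levels is an assumption (8-bit pictures), not a figure from the text.
print(round(estimated_vlsi_time_ms(140, 256), 3))  # 0.501
```

Under these assumptions, 140 min divided by $256^3$ gives roughly 0.5 ms, which agrees with the estimate quoted above.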

VII. CONCLUDING REMARKS

We have proposed a VLSI architecture for digital picture comparison. The time complexity is $O(\max(m, n))$ using a two-dimensional $m \times n$ array, where $m$ is the number of gray levels of the input digital picture and $n$ is the number of gray levels of the reference digital picture. The comparison process has time complexity $O(m^3 \times n)$ if a straightforward computation approach is used on a uniprocessor. If there are $P$ reference pictures, the proposed architecture solves the comparison problem in time $O(\max(m \times P, n))$, whereas a uniprocessor requires time $O(m^3 \times n \times P)$. If a three-dimensional array is used, this problem can even be solved in time $O(\max(m, n, P))$. An important issue in VLSI design, the algorithm partition problem, is also discussed. Formal verification of the proposed VLSI architecture is given. To apply our algorithm to picture comparison, we assume

TABLE I(c)
DISTANCES OF PICTURES TAKEN UNDER LIGHTING CONDITION (c) VERSUS REFERENCE PICTURES

TABLE I(d)
DISTANCES OF PICTURES TAKEN UNDER LIGHTING CONDITION (d) VERSUS REFERENCE PICTURES

that the two pictures have only a very small relative movement, such that the changes in their histograms caused by the relative movement are not as prominent as those caused by different lighting conditions. Because our picture database is limited, a series of experiments based on pictures of our office scenes has been conducted to verify the performance of the proposed algorithm. The results are satisfactory. For future study, more test pictures are needed

Fig. 9. (a)-(d) The histograms of the pictures shown in Fig. 8(a)-(d), respectively. Each panel plots the histogram against gray levels for scenes (1)-(4).

to prove the reliability of this approach. One potential problem with using histograms for picture comparison is that two totally different pictures may have similar histograms, which will certainly cause classification errors. To handle this problem, we need to compute the distances between each pair of pictures in the database to make sure that they all meet a safety margin; the input pictures with different lighting conditions can then be compared and classified more reliably. If the distance between two pictures in the database cannot meet the safety margin, additional measurements may be taken to assist in the classification task. In general, we can always expect some classification error in a pattern recognition system because of the failure of pattern features. Certainly, adding more “significant” features can improve the classification performance. The proposed algorithm is also suitable for probability distribution comparisons and waveform comparisons. The proposed VLSI architecture can be useful for remote sensing, satellite signal processing, and other related applications requiring real-time processing.
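The safety-margin check on the database can be sketched as follows, using the distances among the four reference pictures listed in Table I(a). This is a minimal illustration, not the authors' procedure: the function name `pairs_below_margin`, the margin of 50000, and the choice to take the smaller of the two directed distances for each pair are all assumptions.

```python
from itertools import combinations

# Distances among the four reference pictures, as listed in Table I(a).
REFERENCE_DISTANCES = [
    [0, 183250, 183725, 185149],
    [48188, 0, 78473, 81387],
    [72239, 95191, 0, 83384],
    [81175, 73183, 47792, 0],
]


def pairs_below_margin(distances, margin):
    """Return 1-based picture pairs whose distance falls below the margin.

    The distance is not symmetric, so the smaller of the two directions
    is used for each pair (an assumption made for this sketch).
    """
    risky = []
    for i, j in combinations(range(len(distances)), 2):
        closest = min(distances[i][j], distances[j][i])
        if closest < margin:
            risky.append((i + 1, j + 1, closest))
    return risky


# The margin of 50000 is a made-up threshold for illustration only.
print(pairs_below_margin(REFERENCE_DISTANCES, 50000))
# [(1, 2, 48188), (3, 4, 47792)]
```

Pairs returned by such a check would be the candidates for the additional measurements mentioned above.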


REFERENCES

[1] K. S. Fu, Syntactic Pattern Recognition with Application. Englewood Cliffs, NJ: Prentice-Hall, 1982.
[2] A. Rosenfeld and A. C. Kak, Digital Picture Processing, vol. 2, 2nd ed. New York: Academic, 1982.
[3] T. Pavlidis, Algorithms for Graphics and Image Processing. New York: Computer Science Press, 1982.
[4] D. H. Ballard and C. M. Brown, Computer Vision. Englewood Cliffs, NJ: Prentice-Hall, 1982.
[5] R. J. Offen, VLSI Image Processing. Collins Professional and Technical Books, William Collins Sons & Co. Ltd., 1985.
[6] K. S. Fu, VLSI for Pattern Recognition and Image Processing. Berlin, Heidelberg: Springer-Verlag, 1984.
[7] K. H. Chu and K. S. Fu, “VLSI architectures for high speed recognition of general context-free languages and finite-state languages,” in Proc. 9th Ann. Int. Symp. Comput. Arch., Austin, TX, Apr. 1982.
[8] Y. T. Chiang and K. S. Fu, “Parallel parsing algorithms and VLSI implementations for syntactic pattern recognition,” IEEE Trans. Patt. Anal. Mach. Intell., vol. PAMI-6, May 1984.
[9] H. D. Cheng and K. S. Fu, “Algorithm partition and parallel recognition of general context-free languages using fixed-size VLSI architecture,” Patt. Recog., vol. 19, no. 5, 1986.
[10] H. H. Liu and K. S. Fu, “VLSI arrays for minimum-distance classifications,” in VLSI for Pattern Recognition and Image Processing, K. S. Fu, Ed. Berlin: Springer-Verlag, 1984.
[11] H. D. Cheng and K. S. Fu, “VLSI architectures for string matching and pattern matching,” Patt. Recog., vol. 20, no. 1, 1987.
[12] H. D. Cheng and K. S. Fu, “VLSI architecture for dynamic time-warp recognition of hand-written symbols,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, June 1986.
[13] M. J. Clarke and C. R. Dyer, “Curve detection in VLSI,” in VLSI for Pattern Recognition and Image Processing, K. S. Fu, Ed. Berlin: Springer-Verlag, 1984.
[14] H. D. Cheng and K. S. Fu, “Algorithm partition for a fixed-size VLSI architecture using space-time domain expansion,” in Proc. 7th Symp. Comput. Arithmetic, Urbana, IL, June 1985.
[15] H. D. Cheng, W. C. Lin, and K. S. Fu, “Space-time domain expansion approach to VLSI and its application to hierarchical scene matching,” IEEE Trans. Patt. Anal. Mach. Intell., vol. PAMI-7, 1985.
[16] H. D. Cheng and C. Tong, “VLSI architecture for curve detection,” IEEE Comp Euro 87, Hamburg, Germany, May 1987.
[17] H. D. Cheng and C. Tong, “Clustering analyser for pattern recognition,” Visual Communication and Image Processing, Cambridge, MA, Oct. 1987.
[18] H. D. Cheng and C. Tong, “VLSI curve detector,” Patt. Recog., to be published.
[19] W. M. Chow and L. T. Kou, “Matching two digital pictures,” in Proc. Int. Computer Symp., Dec. 1978.

Heng-Da Cheng (S’81-M’85) received the B.S. degree in computer science from the Harbin Polytechnique Institute, Harbin, China, in 1967, the M.S. degree in electrical engineering from Wayne State University, Detroit, MI, in 1981, and the Ph.D. degree in electrical engineering from Purdue University, West Lafayette, IN, in 1985.

He was a Teaching and Research Staff Member of the Computer Science Department, Harbin Shipbuilding Institute, Harbin, from 1971 to 1976. He worked as a Technician at the Harbin Railway Science and Technique Research Institute, Harbin, from 1976 to 1978. He was a graduate student of the Chinese Academy of Sciences from 1978 to 1980. He was also a Research Assistant in the Advanced Automation Research Laboratory, School of Electrical Engineering, Purdue University, from 1983 to 1985. He was a Visiting Assistant Professor at the University of California, Davis, from 1985 to 1986, and an Assistant Professor, Computer Science Department, Concordia University, Montreal, from 1987 to 1988. Since 1988 he has been an Associate Professor with the Technical University of Nova Scotia, Halifax, Nova Scotia. His research interests include parallel processing, parallel algorithms, pattern recognition, artificial intelligence, VLSI architectures, and advanced computer architectures for pattern recognition, image processing, and artificial intelligence.

Dr. Cheng is a member of the Association for Computing Machinery.

Hon-Son Don (S’82-M’85) received the B.S. degree from National Taiwan University, Taipei, Taiwan, in 1977, the M.S. degree from the University of Texas, Austin, in 1981, and the Ph.D. degree from Purdue University, West Lafayette, IN, in 1985, all in electrical engineering.

He joined the Department of Electrical Engineering, SUNY at Stony Brook in 1985, where he is currently an Assistant Professor. His research interests include computer vision, pattern recognition, image processing, computer graphics, artificial intelligence, and robotics.

Dr. Don received an Outstanding Paper Award from the Pattern Recognition Society in 1985.

Lawrence T. Kou received the B.S. degree in electrical engineering from the National Taiwan University, Taipei, Taiwan, in 1943, the M.S. degree in electrophysics from the Polytechnic Institute of Brooklyn, NY, in 1967, and the Ph.D. degree in computer science from the University of California, Berkeley, in 1973.

From 1973 to 1974, he was an assistant professor at Cornell University. From 1974 to 1981, he was a research staff member at IBM Watson Research Center, NY. During this period, he was also an adjunct assistant professor and later adjunct associate professor in the Computer Science Departments at the State University of New York at Stony Brook, Columbia University, New York University, and Polytechnic Institute of New York. In 1981, he joined the faculty of the University of California, Davis, first as an associate professor and then as a professor in the Computer Science Division of the Electrical Engineering and Computer Science Department. His research interest is mainly in algorithm analysis, emphasizing approximation algorithms. He has been engaged in industrial research projects related to compiler code optimization, relational database systems, and VLSI circuit testing.