Bilateral Symmetry Detection for Real Time Robotics Applications

Wai Ho Li, Alan M. Zhang and Lindsay Kleeman
Intelligent Robotics Research Centre
Department of Electrical and Computer Systems Engineering
Monash University, Clayton, Victoria 3800, Australia
{Wai.Ho.Li, Alan.Zhang, Lindsay.Kleeman}@eng.monash.edu.au
Abstract—Bilateral symmetry is a salient visual feature of many man-made objects. In this paper, we present research that uses bilateral symmetry to identify, segment and track objects in real time using vision. Apart from the assumption of symmetry, the algorithms presented do not require any object models, such as colour, shape or three-dimensional primitives. In order to remedy the high computational cost of traditional symmetry detection methods, a novel computationally efficient algorithm is proposed. To investigate symmetry as an object feature, our fast detection scheme is applied to the tasks of object detection, segmentation and tracking. We find that objects with a line of symmetry can be segmented without relying on colour or shape models by using a dynamic programming approach. Object tracking is achieved by estimating symmetry line parameters using a Kalman filter. The tracker operates at 40 frames per second on 640x480 video while running on a standard laptop PC. We use ten difficult real world tracking sequences to test our approach. We also quantitatively analyze symmetry as a tracking feature by comparing detected symmetry lines against ground truth. Colour tracking is also performed to provide a qualitative comparison.
Index Terms—bilateral symmetry, feature detection, real time, fast, model-free, computer vision, segmentation, tracking
I. INTRODUCTION
Computer Vision systems employed in robotics for the pur-
poses of detecting, segmenting and tracking objects generally
require a priori models. These object models range in com-
plexity from simple colour histograms to three dimensional
mesh grids consisting of thousands of polygons. This initial
knowledge allows for robust operation, especially when several
models are used in a synergistic manner. For example, Boosted
Haar classifiers [Viola and Jones, 2001] allow robust multi-
scale tracking of objects after offline training on positive and
negative data sets of target objects.
Collecting data sets and constructing prior models for every
object that can appear in a robot's environment is neither prac-
tical nor cost effective for many real world situations. In many
environments, novel objects may appear without warning. For
example, a domestic robot can expect to encounter new objects
regularly, such as cans of soft drinks, cups and bowls, when
performing cleaning tasks. For increased adaptability, a robot
should possess some means of segmenting and tracking novel
objects, in real time, without any a priori models.
The work presented here attempts to remedy this problem
by providing a set of model-free solutions for object detection,
segmentation and tracking. In order to do this robustly, the set
of objects we target are limited to those with strong bilateral
symmetry. The algorithms presented here can operate rapidly
on rigid objects with bilateral symmetry. The algorithms work
especially well for objects with surfaces of revolution, such as
cups, bottles and cans, as they appear bilaterally symmetric
from many view points. Our methods will not work for
deformable symmetric objects such as humans or animals. This
research is designed to deal with objects in a limited context,
which allows for the use of a model-free approach.
The motivations for using symmetry as an object feature
are as follows. Gestalt psychology suggests that symmetry is one of
several salient features humans use to visually locate and
model objects. Indeed, many man-made objects are bilaterally
symmetric. Apart from aesthetics, many objects are intention-
ally designed to be bilaterally symmetric for practical reasons.
For example, furniture may be bilaterally symmetric to provide
better balance under load. Most drinking utensils, such as
bottles and cups, are solids of revolution to allow for easy manipulation, which makes them bilaterally symmetric when
viewed from the side. As such, bilateral symmetry appears
to be a salient feature of objects and should be useful for a
variety of robotic applications.
To avoid confusion in later sections, our definition of bilat-
eral symmetry is as follows. Bilateral symmetry is represented
as a line parameterized in a polar fashion, as shown in
Figure 1. A bilateral symmetry line is a mirror line that
bisects an object to provide two symmetric halves. This kind
of bilateral symmetry can be seen in Subfigures 7(a) and 7(b).
Another point of note is the use of the term ground truth.
By ground truth, we mean any measurement of an observed
quantity, such as an object symmetry line, that can be used
to validate other measurements. The main assumption is that
ground truth is a more accurate measurement of the physical
world than the quantities it is compared against.
Bilateral symmetry has traditionally been used in offline
applications, due to the high computational cost of detection.
To remedy this, the authors have developed a fast bilat-
eral symmetry detection algorithm [Li et al., 2005]. This
algorithm's implementation and performance are detailed in
Section II. A comparison between our algorithm and the
Generalized Symmetry Transform [Reisfeld et al., 1995] is
also presented in the same section. The detection algorithm
has been successfully applied to the tasks of static object
segmentation and object tracking. Earlier versions of the object
segmentation and tracking algorithms can be found in [Li
et al., 2006] and [Li and Kleeman, 2006] respectively. This
paper provides more detailed coverage of our segmentation
and tracking research. In addition, new experimental results
on long tracking sequences provide quantitative analysis of
symmetry as a tracking feature. We also use the colour centroid
as a tracking feature for the same video sequences to provide a qualitative comparison with a well established model-free
tracking approach operating on different visual cues.
Object segmentation is a useful tool for robots that must
find, classify and interact with objects. In situations where pre-
built object models are unavailable, a fast model-free segmen-
tation approach is needed. Our symmetry detection algorithm
has been applied to the task of locating and segmenting static
objects. We show that object contours can be found in noisy
images, without the use of prior object models, by applying
a dynamic programming approach to find symmetric edge
contours. This model-free segmentation approach can operate
in real time on 640x480 pixel images. Images of symmetric
and partially-symmetric household objects, such as cups and
bottles, are used to test the segmentation approach. The
algorithm and experimental results are detailed in Section III.
Robots that deal with moving objects generally require
the ability to perform visual tracking in real time. Object
movement can come about through purposeful robotic ma-
nipulation or accidentally, as an unintended consequence of
the robot's actions. A human user may also move objects
for teaching purposes. As mentioned earlier, many man-made
objects are bilaterally symmetric. The task of collecting and
labeling images for every single object that can appear may be
highly difficult or impossible. As such, a robot operating in
such environments should be equipped with some means to track novel objects in real time. Ideally, the method should also
allow the construction of better models over time through the
collection of object data. Section IV covers our work on real
time object tracking using symmetry, which directly addresses these
issues. Experiments are carried out on difficult tracking se-
quences, including cases where the target object is transparent,
subjected to occlusions and undergoing large orientation and
scale changes. Some sample result frames from the tracking
experiments can be found in the Appendix. Time trials show
that the tracking system can operate at 40 frames per second
on 640x480 images.
A quantitative analysis of bilateral symmetry as a tracking
feature is also performed by mounting a test object on a
custom-made pendulum. The details of the experiment and
analysis are available in Section V. The symmetry tracker
is also qualitatively compared against a simple HSV colour
tracker, which operates on different visual cues. Sample video
frames from the test sequences and error plots are located in
the Appendices.
II. FAST SYMMETRY DETECTION
A. Introduction
This section provides a brief survey of research on symme-
try detection. Levitt was the first to detail a Hough transform
scheme to detect bilateral symmetry in point clusters [Levitt,
1984]. Ogawa suggested a method of symmetry detection that
can be used to find symmetry between edge segments [Ogawa,
1991]. The Generalized Symmetry Transform [Reisfeld et al.,
1995] can detect bilateral and radial symmetry at different
scales using gradient information. Yip's symmetry detector [Yip, 2000] can detect skew symmetry in edge images using
a multi-pass Hough transform approach. More recently, a
feature-based bilateral and radial detection scheme [Loy and
Eklundh, 2006] has been used to find symmetry in clusters
of feature points. A method based on matching quartets of
SIFT [Lowe, 2004] features to detect bilateral symmetry under
perspective [Cornelius and Loy, 2006] has also been proposed.
While real time radial symmetry detection [Loy and Zelin-
sky, 2003] has been achieved, bilateral symmetry detectors
are generally used in offline processing applications due to
their high computational cost. For example, the Generalized
Symmetry Transform operates on every possible pixel pair
in the input image. It has a computational complexity of
O(n^2), where n is the total number of pixels in the input image. Yip's symmetry detector uses mid-point pairs, each
generated from two edge pixel pairs. The algorithm has a complexity of O(n_edge^4), where n_edge is the number of edge pixels. Due to their high complexity, real time detection using
these algorithms cannot be achieved for large images using
standard computing hardware at the time of writing.
Our Fast Global Reflectional Symmetry Detection algo-
rithm [Li et al., 2005] is inspired by the Hough transform
approach of Levitt [Levitt, 1984]. We improve detection speed
by rotating edge pixels through discrete detection angles as
described in Subsection II-B. When used in tracking applica-
tions, we limit the angle range of detection to further improve
performance.
B. Algorithm Description
Our approach performs symmetry detection on an image's
edge pixels. In our experiments, we have found that a million-
pixel image reduces down to an edge image with roughly
10000 to 30000 non-zero pixels. Of course, this number
will depend on the visual complexity of the scene, and the
characteristics of the edge filter. Apart from reducing data size,
symmetry detection also benefits from the noise rejection, edge
linking and weak edge retention properties of edge filters. The
Canny [Canny, 1986] edge filter is used to generate all the edge
images used in our experiments.
The polar parametrization described in Figure 1 is used for
the detected symmetry lines. Symmetry lines are represented
by their angle and distance relative to the center of the image.
Edge pixels are grouped into pairs and each pair votes for
a single line in parameter space. Unlike traditional Hough
Transform [Duda and Hart, 1972], which requires multiple
votes per edge pixel, our approach only requires a single vote
per edge pixel pair. This convergent voting scheme is similar
to that utilized in Randomized Hough Transform [Xu and Oja,
1993].
Algorithm 1 details the fast symmetry detection method.
Edge pixels are rotated about the center of the image for each discrete angle used in Hough voting, as illustrated in Figure 2.
Fig. 1. An edge pixel pair, shown in black, voting for a symmetry line with parameters R and θ.

Algorithm 1: Fast Symmetry Detection
Input: edge pixel locations
Output: sym — (R, θ) parameters of detected symmetry lines
Parameters:
BINSθ, BINSR — angle and radius quantization of the accumulator H
Dmin, Dmax — distance thresholds for edge pixel pairs
Nlines — number of symmetry lines to return

H[ ][ ] ← 0
for θindex ← 0 to BINSθ − 1 do
    Rot ← edge pixels rotated by −θ(θindex) about the image centre, grouped by scanline
    for each pair (x1, x2) in the same row of Rot do
        if |x2 − x1| < Dmin or |x2 − x1| > Dmax then
            continue to next pair
        x0 ← (x2 + x1)/2
        Increment H[x0][θindex] by 1
for i ← 1 to Nlines do
    sym[i] ← max(Rindex, θindex) of H
    Bins around sym[i] in H ← 0

Fig. 2. Edge pixel rotation and discretization procedure. Edge pixels (•) are rotated by θ about the image center, marked as a +. Then, the horizontal coordinates of the rotated pixels are inserted into the 2D array Rot. Pixels from the same scanline are placed into the same row in Rot. Pixels in the same row are paired up and each pair votes for a single symmetry line in (R, θ) parameter space. The rows containing [3, 1] in Rot will vote for the dashed symmetry line a total of five times.
The computational cost of the voting process depends on the Hough angle quantization and the vertical quantization
used in Rot. Assuming uniformly distributed edge pixels across the rows of Rot, the algorithm requires (BINSθ/D) × n_edge^2
voting operations, where BINSθ is the number of Hough angle divisions and D is the number of rows in Rot. The
accuracy of the method can be improved by increasing the
number of angle divisions, sacrificing execution time as a trade
off. The reverse is true if we increase the number of rows in
Rot. In essence, the BINSθ/D term allows for an adjustable trade
off between detection accuracy and computational efficiency.
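To make the voting stage concrete, the following C++ sketch implements the convergent voting described above. The helper names (Point, houghVote) and the row and radius quantization mappings are our own illustrative assumptions; the paper's optimized implementation may differ.

#include <cmath>
#include <vector>

struct Point { float x, y; };

// Sketch of the Hough voting core of Algorithm 1. H must be
// pre-sized to binsR rows by binsTheta columns.
void houghVote(const std::vector<Point>& edges,
               float cx, float cy,        // image centre
               int binsTheta, int binsR,  // accumulator quantization
               int numRows,               // vertical quantization of Rot
               float dMin, float dMax,    // pair distance thresholds
               float diag,                // image diagonal in pixels
               std::vector<std::vector<int>>& H)
{
    const float kPi = 3.14159265f;
    for (int t = 0; t < binsTheta; ++t) {
        const float theta = kPi * t / binsTheta;
        const float c = std::cos(-theta), s = std::sin(-theta);

        // Rotate edge pixels by -theta about the centre and group the
        // rotated x coordinates by quantized scanline (the Rot array).
        std::vector<std::vector<float>> rot(numRows);
        for (const Point& p : edges) {
            const float x = c * (p.x - cx) - s * (p.y - cy);
            const float y = s * (p.x - cx) + c * (p.y - cy);
            const int row = int((y / diag + 0.5f) * numRows);
            if (row >= 0 && row < numRows) rot[row].push_back(x);
        }

        // Each pair of pixels in the same row casts a single vote at
        // the radius bin of its midpoint.
        for (const auto& rowPix : rot)
            for (size_t i = 0; i < rowPix.size(); ++i)
                for (size_t j = i + 1; j < rowPix.size(); ++j) {
                    const float d = std::fabs(rowPix[j] - rowPix[i]);
                    if (d < dMin || d > dMax) continue; // reject pair
                    const float x0 = 0.5f * (rowPix[i] + rowPix[j]);
                    const int rBin = int((x0 / diag + 0.5f) * binsR);
                    if (rBin >= 0 && rBin < binsR) ++H[rBin][t];
                }
    }
}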
Symmetry lines are detected by looking for peaks in the
Hough accumulator. The final for-loop in Algorithm 1 describes
the non-maxima suppression algorithm used for peak
finding. Maxima in the Hough accumulator H are found iteratively. Each iteration is followed by setting the maxima
and its surrounding neighbourhood of bins to zero. As with
the edge rotation operation, the contribution of peak finding to
execution time is negligible when compared with the Hough
voting stage of the algorithm.
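The peak finding step can be sketched as follows; the suppression radii suppR and suppT are tunable values we introduce here, not parameters specified by the paper.

#include <algorithm>
#include <utility>
#include <vector>

// Iterative non-maxima suppression: repeatedly take the accumulator
// maximum, then zero its neighbourhood. H is taken by value so the
// caller's accumulator is left untouched.
std::vector<std::pair<int, int>> findPeaks(std::vector<std::vector<int>> H,
                                           int nLines, int suppR, int suppT)
{
    std::vector<std::pair<int, int>> peaks;
    const int binsR = int(H.size()), binsT = int(H[0].size());
    for (int k = 0; k < nLines; ++k) {
        int bestR = 0, bestT = 0;
        for (int r = 0; r < binsR; ++r)
            for (int t = 0; t < binsT; ++t)
                if (H[r][t] > H[bestR][bestT]) { bestR = r; bestT = t; }
        peaks.emplace_back(bestR, bestT);
        // Zero the peak and its neighbourhood so the next iteration
        // returns a different symmetry line.
        for (int r = std::max(0, bestR - suppR); r <= std::min(binsR - 1, bestR + suppR); ++r)
            for (int t = std::max(0, bestT - suppT); t <= std::min(binsT - 1, bestT + suppT); ++t)
                H[r][t] = 0;
    }
    return peaks;
}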
Our detection scheme has also been extended to allow for the detection of skew symmetry resulting from weak
perspective projection. Figure 7(d) shows detection results for
a horizontally-skewed skull-and-crossbones poison logo. This
is achieved by modifying the voting operation to vote for all
lines passing through the mid-point of an edge pair, in the
same way as the standard Hough transform. The algorithm's
order of computational complexity remains the same when
detecting skew symmetry, as the number of angle divisions in
the Hough accumulator is fixed. In addition to the constant-
factor increase in computational cost, two additional matrices,
the same size as the Hough accumulator, are required. Please
refer to Lei and Wong's work on skew symmetry detection [Lei
and Wong, 1999] for additional information concerning these
extra Hough accumulators.
C. Comparison with Generalized Symmetry
The Generalized Symmetry Transform calculates weights
based on image gradient symmetry, distance between pixels
and image gradient intensity to generate symmetry maps.
The symmetry map is essentially an image of symmetry
contributions made by all pixel pairs in the original image.
Two symmetry maps are produced by the transform, one
containing the magnitude of symmetry contribution and the
other the phase. The contributions of all points are used in the generation of the symmetry maps. For a detailed description of
the algorithm, such as the way in which the different weighting
functions are combined, please refer to the seminal paper by
Reisfeld et al. [Reisfeld et al., 1995].
In order to obtain an axis of bilateral symmetry from
the symmetry map, thresholding followed by a line search
method, such as Hough transform, is required. Various ad-
hoc techniques can also be applied to the early stages of
Generalized Symmetry to limit the required processing. This
includes ignoring pixels of low gradient magnitude and sam-
pling pixel values at a coarse scale. However, due to the use
of computationally expensive weighting functions and the fact
that every image pixel is processed, the algorithm does not
lend itself easily to real time implementation.
Generalized symmetry and fast symmetry detection results
are presented below. In order to quantify the comparison
between these two methods, synthetic test images are used.
The test images each contain a grey vase-like shape with a
vertical line of symmetry through its center. The location of
this line of bilateral symmetry is used as ground truth in our
tests. If the maximum distance between the symmetry line found
using fast symmetry and the ground truth is less than or equal
to 1 pixel, the test is successful. As generalized symmetry
does not return a symmetry line, the location with maximum
contributed value, that is, the point in the symmetry map with
maximum isotropic symmetry, is used instead. If this point is
located within 1 pixel of the ground truth symmetry line, the
test is considered a success.
Three test images are used. The first has a dark shape set
against a light background. The remaining two images have
the same vase set against backgrounds with smooth changes in
intensity. The tests are repeated after adding Gaussian noise
to the test images. All the test images are 64x81 pixels in size. Note that the detection scale of generalized symmetry is
governed by the variable σ.
For all three test images, the fast symmetry detector is able
to find the line of symmetry at the exact, ground truth, location.
The generalized symmetry transform is able to find the axis of
symmetry in test image 1, with no noise added. Lowering the
scale parameter produces a corner-detecting behaviour, as seen
in Figure 3(c). With added noise, the generalized symmetry
algorithm is only able to detect the axis of symmetry with
σ = 10. For test images with intensity variations in their backgrounds, the generalized symmetry algorithm is only
successful for test image 3 with a σ of 10. As seen in Figure 5(d), the line of symmetry is found near the top of the vase.
With the addition of Gaussian noise, the generalized approach
failed for both images, regardless of the scale factor. To save
space and needless repetition, the results of adding Gaussian
noise to test image 3 have been omitted.
The problems the Generalized Symmetry Transform has with
varying background intensity stem from its core assumptions.
The transform is designed to favour opposing image gradients,
while rejecting image gradients in the same direction. In high-
level terms, the algorithm assumes either light objects against a
dark background, or dark objects on a light background. With
variations in the background perpendicular to the symmetry
line, this leads to zero contributions being made by pixel pairs across the left and right edges of the vase. This can
be seen in the results for test image 2. The algorithm is
still able to find the correct symmetry in Figure 5 as the
gradient variations only affect pixel pairings that contribute
to horizontal symmetry.
The computational complexity of both symmetry algorithms
is O(n^2), with n equal to the number of input pixels, as the
number of possible pixel pairs is given by n(n−1)/2.
However, the fast symmetry algorithm only operates on edge
pixels, while the generalized symmetry algorithm operates
on all image pixels. The number of edge pixels is generally
much smaller than the total number of pixels in an image.
Additionally, the complexity of computations in the inner loop
of the fast symmetry detector has been drastically reduced
by edge pixel rotation. Hence, the fast symmetry algorithm
requires fewer computations than the generalized approach. Our
approach has also incorporated the post-processing stage of
applying Hough transform line detection to the symmetry map.
The calculation of local gradient intensities has been removed
by the use of edge images, which discard pixels with low
image gradient magnitude.
In order to evaluate the performance and suitability of
both algorithms for real time applications, they have been
implemented in C++. Input images of size 80x60 pixels are
used as test data in the experiments. The test image set contained the three vase images described above and their noise-added counterparts.
(a) Test Image 1 (b) Fast Symmetry
(c) Generalized Symmetry, σ = 2.5
(d) Generalized Symmetry, σ = 10, with edge image overlayed
(e) Test Image 1 with Gaussian noise added
(f) Fast Symmetry
(g) Generalized Symmetry, σ = 2.5
(h) Generalized Symmetry, σ = 10
Fig. 3. Symmetry detection results for Test Image 1. (b)-(d) contain results for the test image. (e) is Test Image 1 with added Gaussian noise. The noise has σ = 0.1, with image intensity defined between 0 and 1. (f)-(h) contain detection results for the noisy image. Bright pixels in the generalized symmetry results have high levels of detected symmetry.
(a) Test Image 2 (b) Fast Symmetry
(c) Generalized Symmetry, σ = 2.5
(d) Generalized Symmetry, σ = 10
(e) Test Image 2 with Gaussian noise added
(f) Fast Symmetry
(g) Generalized Symmetry, σ = 2.5
(h) Generalized Symmetry, σ = 10
Fig. 4. Symmetry detection results for Test Image 2. Note the intensity variation in the background of (a). (b)-(d) contain results for the test image. (e) is Test Image 2 with added Gaussian noise. The noise has σ = 0.1, with image intensity defined between 0 and 1. (f)-(h) contain detection results for the noisy image.
(a) Line 2
(b) Line 4
Fig. 8. Detection of non-object symmetry lines from Subfigure 7(c). Edge pixels have been dilated for improved visibility and are shown in black. The red edge pixels are those that voted for the green symmetry line.
In Subfigure 7(a), both the symmetry of the forearm and the symmetry
between the forearm and its shadow contribute votes to the
accumulator. However, the bottle's total symmetry contribution
is much higher. Subfigure 7(d) shows the detection of skew
symmetry using the aforementioned modified voting proce-
dure.
Subfigure 7(c) displays the detection results for a more
complicated arrangement of objects. The lines are labeled
according to the number of votes they received, with 1
being the symmetry line with the most votes. Notice that
the symmetry lines of all three objects are found. However,
background symmetries are also detected. Line 2 is due to
a combination of edge pixel noise and the symmetry of the long
horizontal shadow. Line 4 is caused by inter-object symmetry,
primarily between the two cups.
Figure 8 shows the edge pixel pairs that voted for the
non-object symmetry lines of Subfigure 7(c). Notice the large
number of edge pixels contributed by the multi-coloured mug
in both cases. This is caused by the use of low Canny edge
filtering thresholds, which produced many noisy edges due to
the mug's textured surface. We chose not to raise the Canny
thresholds, which would remove much of the edge noise, in order to
fully test the noise robustness of our symmetry detection
approach. Note also that the Canny thresholds are kept constant
TABLE II
EXECUTION TIME OF THE FAST SYMMETRY DETECTION ALGORITHM

Image Number   Image Dimensions   No. of Edge Pixels   Execution Time (ms)
1              640 x 480          9766                 136
2              640 x 480          15187                224
3              640 x 480          9622                 153
4              640 x 480          9946                 141
5              640 x 480          9497                 128
6              640 x 480          9698                 145
7              640 x 480          11688                167
8              640 x 480          11061                180
9              640 x 480          12347                196
10             640 x 480          8167                 81
11             610 x 458          6978                 97
during our experiments. Unless otherwise specified, detection
parameters are not adjusted for different objects or lighting
conditions.
The detection of background symmetry and inter-object
symmetry is unavoidable due to the lack of high-level knowledge available to our algorithm. However, the distance threshold Dmin in Algorithm 1 can be increased to reject narrow symmetry, such as line 2 in Subfigure 7(c). The expected
orientation of symmetry lines can also be used to reject
unwanted symmetry, especially when some knowledge of the
scene is available. For example, a humanoid robot trying to
manipulate cups and bottles on a table will only deal with near-
vertical lines of symmetry. As such, the angular range of the
symmetry detector can be constrained accordingly. Limiting
the range of detection angles will also improve detection
speed.
A C++ implementation of Algorithm 1 is used for all experiments. The experiment platform is a desktop PC with a
Xeon 2.2GHz CPU and 1GB of main memory. No platform-
specific optimizations, such as MMX or SSE2, are used in
the code. Referring to the parameters defined in Algorithm 1,
the Hough accumulator has 180 angle divisions (BINSθ). The number of radius divisions (BINSR) is equivalent to the size of the image diagonal in pixel units. Dmax is half the image width and Dmin is set to 5 pixels.
Borrowing from the Randomized Hough Transform [Xu and Oja,
1993], the list of edge pixels is sampled before detection. The
sampling occurs as follows. After edge detection, a random
subset of edge pixels are chosen from the edge image. In
our experiments, the subset is one quarter the size of all
edge pixels, meaning only one-in-four edge pixels is kept.
This subset is given as input to fast symmetry detection. We
have found that detection reliability and accuracy degrade
noticeably when the sampling ratio drops below 0.1, that
is, one-in-ten edge pixels. The timing results are shown in
Table II. Note that the execution times include edge filtering
as well as non-maxima suppression peak finding.
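The sampling itself is a simple shuffle-and-truncate; a minimal sketch (helper name ours), called with ratio = 0.25 for the one-in-four subset used here:

#include <algorithm>
#include <random>
#include <vector>

// Keep a random fraction of the edge pixels before voting.
template <typename PointT>
std::vector<PointT> sampleEdges(std::vector<PointT> edges, double ratio)
{
    static std::mt19937 rng{std::random_device{}()};
    std::shuffle(edges.begin(), edges.end(), rng);
    edges.resize(static_cast<std::size_t>(edges.size() * ratio));
    return edges;
}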
The execution times confirm that the amount of computation
increases as the number of edge pixels extracted from the input
image increases. More complicated images require more time as
they tend to generate more edge pixels. The detection time for
640x480 images ranges from 80 to 224 milliseconds. This will
allow for frame rates between 5 and 12 Hz, which is acceptable
for many real time applications. The use of a processing
window or smaller images, a faster PC and processor-specific
optimizations can further improve the speed of the algorithm
to meet more stringent real time requirements. The sampling
ratio of edge pixels can also be adjusted at run time to alter
the detection speed. Frame rates of 20Hz have been achieved
by reducing the input image size to 320x240 pixels.
III. SEGMENTATION USING SYMMETRY
A. Introduction
There are many definitions of Object Segmentation in
Robotics and Computer Vision. For digital images, it can
be seen as an intelligent version of Image Segmentation, a
review of which can be found in [Pal and Pal, 1993]. Here,
we define object segmentation as the task of finding sections
of an image that correspond to a 3D object in the real
world. Through object segmentation, a robot can obtain useful
information about its surroundings. For domestic robots, the
ability to quickly and robustly segment man-made objects in
the household or office environment is highly desirable. For
example, a robot designed to clean and tidy desks will need
to locate and segment common objects such as cups, pens and
books.
Object segmentation methodologies differ in their assump-
tions, as well as in their level of prior knowledge. When
models of objects are available, it can be argued that the
system is in fact performing object recognition, by matching
sensor data with pre-built models. The Generalized Hough
Transform [Ballard, 1981] is an example of a model-based
segmentation approach. A predefined parameterized model of
a 2D shape, essentially an object model, is required before the
transform can be applied.
In many situations, a priori object information, such as
shape and colour, may not be available. Also, the generation
of detailed object models can be costly and in many cases,
not fully automated. Hence, a robot that can segment objects
without requiring a prior model is much desired, especially for
use in domestic environments. Returning to the desk cleaning
robot example, in the case where it encounters a cup without
its model, a solution would be to have the robot generate its
own learning data by physically interacting with the cup. In
order to do this, the robot must begin by using a model-free
approach to object segmentation. The ability to detect and
segment objects quickly, ideally in real time, will also greatly
benefit the robot's responsiveness and robustness to changing
environments.
Colour, gradients and shape are some common visual cues
used for segmentation. Colour has proved to be useful in
segmenting a variety of entities and is relatively simple to
detect. For example, skin colour filters are widely used in face
recognition and hand tracking applications. However, many
man-made household objects are multi-colour, consisting of
several segments of different colour. For a survey of colour-
based image segmentation techniques, refer to [Skarbek and
Koschan, 1994].
There are several symmetry-based segmentation methods in
the existing literature. Methods such as [Gupta et al., 2005] apply
an existing segmentation algorithm, such as the normalized-cut
algorithm, but modify the affinity matrix using the property
of symmetry. This is a region based segmentation method and
requires the pixel values within the object to be symmetric.
As such, the method cannot segment transparent objects or
objects with asymmetric textures.
To overcome this problem, we use a purely edge based
method. Simply identifying all edge pixels that voted for the symmetry line is not acceptable due to the possibility
of coincidentally matching pairs of edge points. A more
robust method is needed. Before continuing, we must define
object segmentation. Because no prior model or geometric
properties of the object are assumed apart from its symmetry,
a definition is difficult. We define an object segmentation as
the most continuous contour symmetric about the object's line
of symmetry. While the definition is not perfect, it does allow
for the problem to be solved robustly.
The task then becomes finding the most continuous and
symmetric contour in the edge image about a detected sym-
metry line. For real time applications, the proposed algorithm
must have predictable execution times. This criterion rejects
approaches that require initialization and multiple iterations
such as active contours. Our proposed algorithm uses a single
pass Dynamic Programming (DP) approach. While much re-
search has been performed on the use of DP to find contours in
images [Yan and Kassim, 2004], [Lee et al., 2001], [Mortensen
et al., 1992], [Yu and Luo, 2002], they require a human-
selected starting point. For the object outlines being consid-
ered, these methods would require a human user to provide
an initial pair of symmetric pixels on the outline. As a major
goal of object segmentation in robotics is image understanding
without human intervention, we engineered our approach to
requires none. Section III-B describes the pre-processing stepused to remove asymmetric edge pixel pairs, which we call the
Symmetric Edge Pair Transform. Section III-C describes the
dynamic programming segmentation algorithm. Results and
processing times are presented in Section III-D.
B. The Symmetric Edge Pair Transform
We introduce the Symmetric Edge Pair Transform (SEPT)
as a preprocessing step applied prior to dynamic program-
ming segmentation. The edge image is first rotated such that
symmetric edge pairs lie in the same row. The idea behind
the transform is to parameterize a point pair by its distance
of separation and the deviation of its midpoint from the object's
symmetry line. The algorithm has also been generalized to
accommodate skew symmetry by using a non-vertical sym-
metry line after rotating the edge pixel pairs. The transform
is described in Algorithm 2.
The weighting function W() in Algorithm 2 is a monotonically decreasing function, such that the larger the deviation
(d) of the midpoint from the symmetry line, the lower the
weight. That is, the more asymmetric an edge pair is about
the symmetry line, the lower its weight in the resulting
SEPT buffer. In our implementation, the weighting function is
W(d) = 1 − d/(2 × WND). The variables d and WND are defined
in Algorithm 2. Note that the distance threshold MAXhw can
be used to limit the maximum expected width of segmented objects.
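Since Algorithm 2 is not reproduced in this section, the following C++ sketch reflects our reading of the transform as described above. It assumes the edge image has already been rotated so the symmetry line is the vertical line x = xSym, and that empty SeptBuf cells are marked with −1; these conventions and the helper names are ours.

#include <algorithm>
#include <cmath>
#include <vector>

// Fill the SEPT buffer: each edge pixel pair in a scanline is encoded
// by its half-width (column) and weighted by its midpoint deviation d
// from the symmetry line, using W(d) = 1 - d / (2 * wnd).
void septTransform(const std::vector<std::vector<float>>& rowsOfX,
                   float xSym, int maxHw, float wnd,
                   std::vector<std::vector<float>>& septBuf) // [row][maxHw], init -1
{
    for (std::size_t r = 0; r < rowsOfX.size(); ++r)
        for (std::size_t i = 0; i < rowsOfX[r].size(); ++i)
            for (std::size_t j = i + 1; j < rowsOfX[r].size(); ++j) {
                const float hw = 0.5f * std::fabs(rowsOfX[r][j] - rowsOfX[r][i]);
                const float mid = 0.5f * (rowsOfX[r][j] + rowsOfX[r][i]);
                const float d = std::fabs(mid - xSym);        // midpoint deviation
                if (d >= wnd || hw >= float(maxHw)) continue; // too asymmetric or too wide
                const float w = 1.0f - d / (2.0f * wnd);      // decreasing in d
                const int c = int(hw);
                septBuf[r][c] = std::max(septBuf[r][c], w);
            }
}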
Fig. 10. Object Segmentation and Contour Refinement. The object outline discovered by back tracking from the maximum value in the score table is shown on the left. The object outline after Contour Refinement is shown on the right.
The horizontal continuity steps can cause cycles to form in the same row of the backPtr array. To prevent these cycles, a copy of backPtr is made in Step 1. If the score from horizontal continuity is higher than from
vertical continuity then the higher score is recorded, and
backPtr is updated. Horizontal continuity is given less reward than vertical continuity. The reason for this is to reject long
horizontal edge segments, which are technically symmetric
about their mid points. The symmetry detected for these straight
lines very rarely represents actual symmetric objects. Humans
generally consider symmetric objects as those that have symmetric boundaries along the direction of the mirror line. If the
horizontal continuity reward is too high, it may also lead to
unwanted zigzag patterns in the generated contours.
After filling the score table, the best symmetric contour can
be found by starting at the highest scoring cell, and back
tracking through cells of lower weight. The back tracking
algorithm is described in Algorithm 4. Note that both copies
of back pointers are utilized by the algorithm so that there will
be no purely horizontal segment in the contour. The contour
of the object can be extracted by keeping a list of position
indices {r, c} during the back tracking process. The column index c indicates the horizontal distance of the contour from the symmetry line. An example of the resulting contour can
be seen superimposed on to SeptBuf in Figure 9(b) and in the left image of Figure 10.
The contour obtained thus far does not directly correspond
to edge pixels. This is due to the tolerance introduced in the
SEPT preprocessing, which also caused our aforementioned
edge weighting ambiguity. In order to produce a contour that
corresponds to actual edge pixels, a refinement step is taken.
In this step, the same window size used in the SEPT, WND, is employed to refine the contour. The algorithm produces a
near-symmetric outline by looking for edge pixels near the
symmetric contour within the window. The contour refinement
step is similar to Algorithm 3, substituting the SeptBuf with
Algorithm 3: Finding Continuous Symmetric Contours with Dynamic Programming
Input: SeptBuf
Output: sTab — table of scores, same size as SeptBuf
        backPtr — back pointers
Parameters:
Himg — image height
MAXhw — half of the maximum expected width of symmetric objects
{Pver, Rver} — penalty/reward for vertical continuity
{Phor, Rhor} — penalty/reward for horizontal continuity

sTab[ ][ ] ← 0
for r ← 1 to Himg do
    Step 1, vertical continuity
    for c ← 1 to MAXhw do
        if SeptBuf[r][c] is not −1 then
            cost ← SeptBuf[r][c] × Rver
        else
            cost ← Pver
        vScore[c] ← max( 0, sTab[r−1][c−1] + cost, sTab[r−1][c] + cost, sTab[r−1][c+1] + cost )
        if vScore[c] > 0 then
            Set backPtr[r][c] to record which of the 3 neighbouring cells is used to produce vScore[c]
        backPtrAux[r][c] ← backPtr[r][c]
    Step 2, horizontal continuity from left to right
    prevScore ← neg. inf.
    for c ← 1 to MAXhw do
        if SeptBuf[r][c] is not −1 then
            cost ← SeptBuf[r][c] × Rhor
        else
            cost ← Phor
        hScore ← prevScore + cost
        if vScore[c] >= hScore then
            prevScore ← vScore[c]
            columnPtr ← c
        else
            prevScore ← hScore
        if sTab[r][c] < prevScore then
            sTab[r][c] ← prevScore
            Set backPtr[r][c] to record position {r, columnPtr}
    Step 3, horizontal continuity from right to left
    Repeat Step 2, moving right to left in column index
Algorithm 4: Back tracking highest score in the score table
Input: sTab, backPtr, backPtrAux
Output: {r, c} — {row, column} indices

{r, c} ← position of MAX(sTab)
while sTab[r][c] is not zero do
    {r, c} ← backPtr[r][c]
    if r did not change, i.e. no vertical position change, then
        {r, c} ← backPtrAux[r][c]
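As a compact illustration, the sketch below implements only the vertical continuity pass (Step 1 of Algorithm 3) together with the back tracking of Algorithm 4; the two horizontal continuity passes follow the same scoring pattern. The flattened structure and names are ours, and pVer is expected to be negative (a penalty).

#include <utility>
#include <vector>

struct BackPtr { int dr, dc; }; // offset to the predecessor cell

// Score vertical continuity over the SEPT buffer (-1 marks empty
// cells), then back track from the best cell through lower scores.
void scoreAndTrack(const std::vector<std::vector<float>>& sept,
                   float rVer, float pVer, // reward / (negative) penalty
                   std::vector<std::pair<int, int>>& contour)
{
    const int hgt = int(sept.size()), wid = int(sept[0].size());
    std::vector<std::vector<float>> sTab(hgt, std::vector<float>(wid, 0.f));
    std::vector<std::vector<BackPtr>> back(hgt, std::vector<BackPtr>(wid, {0, 0}));

    for (int r = 1; r < hgt; ++r)
        for (int c = 0; c < wid; ++c) {
            const float cost = sept[r][c] >= 0.f ? sept[r][c] * rVer : pVer;
            float best = 0.f;
            BackPtr bp{0, 0};
            for (int dc = -1; dc <= 1; ++dc) { // the 3 cells above
                if (c + dc < 0 || c + dc >= wid) continue;
                const float s = sTab[r - 1][c + dc] + cost;
                if (s > best) { best = s; bp = {-1, dc}; }
            }
            sTab[r][c] = best;
            back[r][c] = bp;
        }

    // Start from the highest scoring cell and follow back pointers.
    int r = 0, c = 0;
    for (int i = 0; i < hgt; ++i)
        for (int j = 0; j < wid; ++j)
            if (sTab[i][j] > sTab[r][c]) { r = i; c = j; }
    while (sTab[r][c] > 0.f && back[r][c].dr != 0) {
        contour.push_back({r, c});
        const BackPtr bp = back[r][c];
        r += bp.dr;
        c += bp.dc;
    }
}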
(a) (b)
Fig. 11. Segmentation of a multi-colour object. The purple contour has been manually thickened and overlaid on to the edge image. The symmetry line, in yellow, is identical to the symmetry detection results shown in Figure 7(b).
the edge image. Results from contour refinement are shown in
the right image of Figure 10.
Objects with internal symmetries, such as pliers, may be
difficult to segment accurately using our approach. Both the
inside and outside edges of a set of pliers are symmetric about
the same line. In such cases it may be more useful to identify
the outer edges for a robot manipulation task. One of the
possible solutions investigated is the use of a weighting that favours symmetric edge pairs that harbour more symmetric
edge pairs between their edges. For example, in the case of the
pliers, the pairs of outer handle edges contain in between
them the inner handle edges, so the outer edge pairs are more
heavily weighted. However, this may not be a satisfactory
solution as it tends to favour widely separated edge pairs.
D. Results
Figure 11 shows the segmentation of a multi-colour object.
This result demonstrates the algorithm's ability to segment
objects of non-uniform colour. Note that the edge image is
quite noisy due to texture on the cup surface and on the book.
This noise did not adversely affect the segmentation results. In
Figure 12, all three symmetric objects are segmented using our
approach. Note that, in all of our results, no prior information
such as geometric models, object colour or texture is used. The
only information received by the segmentation algorithm is the
detected symmetry line parameters and the edge image. Due
to shadows and specular reflections, the vertical side edges of
the small, multi-coloured cup are distorted and have very large
gaps. Hence, the more symmetric and continuous elliptical
contour of the cup's opening is returned by the segmentation
algorithm. There is a slight distortion in the detected ellipse.
This distortion is caused by large gaps in the outer rim of the
Fig. 12. Object segmentation performed on a scene with multiple objects using the results from Figure 7(c). The object outlines have been thickened and rotated such that their symmetry lines are vertical.
TABLE III
EXECUTION TIME OF THE OBJECT SEGMENTATION ALGORITHM

Image No.   Size of Score Table     Cumulative No. of Edge Pairs   SEPT+DP (ms)   CR (ms)
1           356 x 576 (= 205056)    77983                          26             4
2           322 x 482 (= 155204)    142137                         25             4
3           322 x 482 (= 155204)    65479                          19             3
4           326 x 493 (= 160718)    68970                          21             4
5           402 x 476 (= 191352)    67426                          36             7
6           382 x 801 (= 305982)    44901                          36             8
7           349 x 556 (= 194044)    90104                          26             6
8           345 x 546 (= 188370)    133784                         28             4
9           402 x 777 (= 312354)    121725                         40             8
10          393 x 705 (= 277065)    177077                         32             7
11          383 x 722 (= 276526)    51475                          32             6

CR: contour refinement stage
cup in the edge image. This produced a contour that contains
a combination of the inner and outer rims of the cup's elliptical
opening.
Table III contains execution times of a C++ implementation
of our object segmentation algorithm. The same computer
described in Section II-D is used for these experiments. The
image numbers are the same as those used in Table II. The test
cases with smaller DP score tables are able to be processed
at 30 frames per second (FPS). Test cases with larger tables
can still be processed at 20FPS. The third column of Table III,
labeled No. of Edge Pairs, is the number of edge pixel pairs
processed by SEPT. In our implementation, the SEPT code that
fills the SeptBuf and the dynamic programming code are placed within the same loop to improve efficiency. As such,
their combined execution time is shown in the SEPT+DP
column. Looking at Table III, the size of the cumulative score
table appears to be the main factor affecting the execution time.
This agrees with expectations as a score is calculated for each
entry in the table. The maximum expected size of objects is
set to be the width of the image. In practice, the size of the
objects can be restricted to more reasonable bounds, especially
Fig. 13. System Diagram of Symmetry Tracker
considering the use of distance thresholds in our symmetry
detection algorithm. This will further improve execution time.
IV. OBJECT TRACKING USING SYMMETRY
A. Introduction
To represent an object without its prior model, features that
are robust to affine transformation and illumination changes
are needed. Descriptive and noise robust features, such as
SIFT [Lowe, 2004] or MSER [Matas et al., 2002], are difficult
to apply in real time applications due to their high computa-
tional costs, especially when matching against large descriptor
databases. The need for a prebuilt database of features can
also be an issue when dealing with novel objects. In order to
perform tracking in real time, model-free approaches generally
use features that are computationally inexpensive to extract
and match. Huang et al. [Huang et al., 2002] used region
templates of similar intensity. Satoh et al. [Satoh et al.,
2004] utilized colour histograms. Both approaches can tolerate
occlusions, but are unable to handle shadows and colour
changes caused by variations in lighting. Tracking objects
under different illumination conditions requires features that
do not directly rely on colour or intensity information.
Figure 13 gives an overview of the tracking process. Motion
detection results are used to limit symmetry detection to
areas with movement. The Kalman filter prediction, before the
measurement update, is used to speed up symmetry detection
by limiting the detection angle. The detection results are then
passed to the Kalman filter as measurements. The motion
detection results are refined using the symmetry line estimate
produced by the Kalman filter. This produces a near-symmetric segmentation of the object. A rotated bounding box is then
computed based on the segmentation.
B. Improving Symmetry Detection for use in Object Tracking
The raw symmetry detection results cannot be used directly
as measurements for tracking. Inter-object symmetry as well
as symmetric portions of the background, like table corners,
can overshadow the symmetry of the object being tracked.
Figure 14(a) is an example where background symmetry lines
may cause problems in tracking. The bottle's symmetry line
is weaker, in terms of its Hough vote total, than the orange
symmetry line (line 1). As such, non-object edge pixels should
(a) Top three symmetry lines (b) Angle limits
Fig. 14. Symmetry Detection for use in Object Tracking.
Left: Top three symmetry lines returned by our detector. Lines are numbered according to the quantity of votes they received, with line 1 having received the most votes. Notice that the object's symmetry line is not the strongest one in the image.
Right: Angle limits (black) imposed on symmetry detection. The angle limits are generated using the Kalman filter prediction and the prediction covariance.
be rejected before applying symmetry detection, to improve
the robustness of tracking. This is achieved by only allowing
edges in the moving portions of an image to cast votes. A
motion mask, generated using the algorithm detailed in Section
IV-C, is used to suppress background edge pixels. By doing
this, the majority of votes will be cast by edge pixel pairs
belonging to the moving object.
The state prediction of the Kalman filter is used to improve
the computational efficiency of symmetry detection. Recall
that the symmetry detector iteratively rotates the edge pixels
to find symmetry lines at different angles. The range of
rotation angles can be limited by using the Kalman filter
prediction and the prediction covariance. Figure 14(b) is an example of such angle limits provided by the Kalman
filter. By limiting the Hough voting angle, the total number
of votes cast is reduced. This greatly improves the execution
time of our symmetry detection algorithm. In our experiments, three standard deviations are added to the symmetry line angle
prediction to generate the angle limits.
C. Block Motion Masking
As seen in Figure 14(a), the amount of background sym-
metry needs to be reduced before applying the symmetry
detector. In order to do this, a binary motion mask is used
to eliminate static portions of video frames. Background
modeling approaches are inappropriate for our application due
to their assumption of a near-static background and consistent
illumination conditions. Also, background modeling is not
suitable for the detection and tracking of transparent and re-
flective objects. Instead, a fast block-based frame differencing
approach is employed to generate the motion masks. We use
the classic two-frame difference [Nagel, 1978].
The colour video frames are first converted to grayscale
images. The absolute difference between time-adjacent images
is calculated. The resulting difference image is then converted
into a block image by spatially grouping pixels into 8x8 blocks.
The choice of block size is arbitrary, and should be determined
based on the smallest scale of motion to be considered by the
tracker. The sum of pixel values in the difference image is
calculated for each 8x8 block. Each block's sum is compared
against the average value across all blocks. Blocks with a sum
higher than a multiple of the average are classified as moving
parts of a video frame. This multiple constant is determined
experimentally, by starting at a value of 1, and increasing it
until camera noise and small movements can be successfully
ignored. In all our experiments, we use a factor of 1.5.
Algorithm 5: Block Motion Detection
Input: I0, I1 — video frames at time t, t + 1
Output: mask — motion mask
Parameters: mf — motion threshold factor

diff ← |I1 − I0|
res, sum — images 1/blocksize the size of diff
sum[ ][ ] ← 0
for each block (ii, jj) of sum do
    for each pixel (m, n) of diff within block (ii, jj) do
        sum[ii][jj] ← sum[ii][jj] + diff[m][n]
res ← THRESHOLD(sum, AVERAGE(sum) × mf)
Median filter res, then dilate res
mask ← res resized by a factor of blocksize
Algorithm 5 details the procedure used to generate the
motion mask. The AVERAGE function returns the average
of its input elements. The THRESHOLD(A, b) function returns a binary image, consisting of 0s and 1s. An output element
is set to 1 if the corresponding element in A is above the threshold value b. Otherwise, it is set to 0. Median filtering on the block level is used to remove spurious motion blocks
caused by small movements and camera noise. The result, res, is then dilated to ensure that all edge pixels belonging to the
moving object are included in the masked result. The mask
is then produced by resizing the res image by a factor of blocksize.
D. Motion Mask Refinement
In Figures 15(a) and 15(c), motion masks have been over-
layed on to video frames for illustrative purposes. In actual
operation, the mask is used to suppress static edge pixels
before passing the edge image to the symmetry detector.
Images in the right column of the same figure are produced
by applying a refined motion mask to the source image. The
refined mask is produced through a two step process. Firstly,
the location of each block with motion, b, is reflected across the symmetry line. The reflected location is searched using a
local window. If none of the blocks in the window is classified
as moving, the original block b is re-classified as static. This first step removes motion that is not symmetric about the
object's symmetry line, which may have been caused by the
Fig. 15. Block Motion Masking.
Left column: Images masked by the unrefined block motion mask.
Right column: Images masked using the refined block motion mask. The symmetry line estimate from the tracker is shown in red. The refined mask is generated based on this symmetry line estimate.
end effector, and other moving objects. After the generation
of a near-symmetric mask, the second step attempts to remove
holes and gaps in the mask. This is achieved by looking for
blocks that are surrounded by multiple neighbours that contain
motion. These two steps are very efficient as they only operate
on res, which has fewer pixels than the source image. The refinement process is a single pass operation.
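The symmetry test in the first step reduces to reflecting a block centre across the polar-parameterized symmetry line and checking a window around the result. The reflection is the standard point-line reflection, sketched below with our own names; the window search itself is omitted.

#include <cmath>

struct Pt { float x, y; };

// Reflect p across the line { q : (q - centre) . (cos t, sin t) = R },
// following the (R, theta) parametrization of Figure 1.
Pt reflectAcrossLine(Pt p, float R, float theta, float cx, float cy)
{
    const float nx = std::cos(theta), ny = std::sin(theta); // line normal
    const float d = (p.x - cx) * nx + (p.y - cy) * ny - R;  // signed distance
    return { p.x - 2.f * d * nx, p.y - 2.f * d * ny };
}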
E. Kalman Filter
A Kalman filter, as described in [Bar-Shalom et al., 2002], is
used to estimate symmetry line parameters. Hough (R, θ) index values are used directly as measurements. We use a linear
acceleration motion model. The filter plant and measurement
matrices are shown below.
A = [ 1 0 1 0 1/2  0
      0 1 0 1  0  1/2
      0 0 1 0  1   0
      0 0 0 1  0   1
      0 0 0 0  1   0
      0 0 0 0  0   1 ]

H = [ 1 0 0 0 0 0
      0 1 0 0 0 0 ]

x = [ R, θ, dR/dt, dθ/dt, d^2R/dt^2, d^2θ/dt^2 ]^T
Process and measurement noise are chosen empirically.
Measurement and process noise variables are assumed to be
independent. The noise values used for all experiments are as
follows. The R measurement variance is 9 pixels^2 and the θ variance is 9 degrees^2. The diagonal elements of the process covariance matrix are (1, 0.1, 10, 1, 10, 1). The odd elements
are the position, velocity and acceleration covariances of R; the even elements are the θ covariances.
Data association and validation are performed using a validation gate. The top symmetry lines, in terms of their Hough
votes, are given to the Kalman filter's validation gate. Symmetry line parameters that generate an error above 9.21 (2-
DOF Chi-square, P = 0.01) are discarded by the gate. If no symmetry line passes through the gate without exceeding the
Chi-square error threshold, the next state will be estimated
using the state model alone.
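A sketch of the gate computation, using Eigen for the matrix algebra (our choice; the paper does not name a library). z is a candidate (R, θ) measurement, xPred and PPred are the Kalman prediction and its covariance, and Rm is the measurement noise covariance.

#include <Eigen/Dense>

bool passesGate(const Eigen::Vector2d& z,
                const Eigen::Matrix<double, 6, 1>& xPred,
                const Eigen::Matrix<double, 6, 6>& PPred,
                const Eigen::Matrix<double, 2, 6>& H,
                const Eigen::Matrix2d& Rm)
{
    const Eigen::Vector2d innov = z - H * xPred;         // innovation
    const Eigen::Matrix2d S = H * PPred * H.transpose() + Rm;
    const double err = innov.dot(S.ldlt().solve(innov)); // Mahalanobis^2
    return err <= 9.21; // 2-DOF chi-square gate, P = 0.01
}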
To use the tracker in situations where new objects are being
discovered by a robot, it must have an automatic initialization
scheme. The initial state must be set to a value close to the
moving object's symmetry line to ensure convergence. An
automatic initialization method is used to find the object's
initial state. The number of moving blocks returned by the
motion detector is continuously monitored. By looking for
a sharp jump in the detected motion, frames where objects begin to move can be found. Symmetry lines detected from the
three time-consecutive frames after an object begins to move
are used to initialize the Kalman filter. Firstly, all possible
data associations across the three frames are generated. In
our experiments, the top three symmetry lines are used as
measurements for each frame. This produces 3^3 = 27 permutations. Each data association permutation is used as a Kalman
filter measurement set. The Kalman filter is initialized using
the first measurement in the permutation, and updated using
the second and third. The validation gate errors for the updates
are accumulated and logged. After iterating through all 27
permutations, the permutations are ranked according to their
errors. The best permutation, that is, the data association
sequence with minimum error, is used to initialize the Kalman
filter. This automatic initialization procedure is used to start
the tracker for all video sequences used in our experiments,
without any manual intervention.
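The permutation search can be sketched as follows, with KalmanFilter reduced to a stub standing in for the filter described above; initFrom, update and the gate error it returns are hypothetical interfaces, not the paper's actual code.

#include <array>
#include <limits>

struct Line { double r, theta; };

struct KalmanFilter { // stub for the filter of Section IV-E
    void initFrom(const Line&) {}
    double update(const Line&) { return 0.0; } // returns gate error
};

// top3[f][k] is the k-th strongest symmetry line in frame f.
std::array<int, 3> bestAssociation(const std::array<std::array<Line, 3>, 3>& top3)
{
    double bestErr = std::numeric_limits<double>::infinity();
    std::array<int, 3> best = {0, 0, 0};
    for (int a = 0; a < 3; ++a)
        for (int b = 0; b < 3; ++b)
            for (int c = 0; c < 3; ++c) { // 3^3 = 27 permutations
                KalmanFilter kf;
                kf.initFrom(top3[0][a]);            // frame 1
                double err = kf.update(top3[1][b]); // frame 2
                err += kf.update(top3[2][c]);       // frame 3
                if (err < bestErr) { bestErr = err; best = {a, b, c}; }
            }
    return best;
}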
F. Results
The entire tracking system is implemented using C++, with
no platform specific optimizations. A notebook PC with a
1.73GHz Pentium M processor is used as the test computer.
The video frames are 640x480 pixels in size, and are recorded
at 25 frames per second. All experiments are performed using the same tracker parameter values. The Canny edge filter
thresholds are set to 30 and 60, with an aperture size of 3
pixels. The block motion detector uses a motion factor of 1.5.
Borrowing from Randomized Hough Transform [Xu and Oja,
1993], a sampling ratio of 0.6 is used to obtain a random
subset of the edge pixels.
Table IV contains the execution times of the tracking code.
Each sequence, numbered 1 to 10, contains up to 400 video
frames. The code responsible for symmetry detection, block
motion masking, mask refinement and Kalman filtering are
timed independently. The average run time of these code
segments can be found under the Average Time heading of
the table. The column labeled Init contains the time taken
to perform automatic initialization as discussed at the end of
Section IV-E. The average frame rates obtained are shown in
the column labeled as FPS. Note that the tracker is able to
perform at above 40 frames per second for many sequences.
The tracking system generates a rotated bounding box
around the object being tracked. The bounding box is oriented
such that two of its edges are parallel with the object's
symmetry line. The size of the box is determined by the refined
motion mask. Figure 16 shows two example bounding boxes,
and the motion masks from which they are generated.
Frame sequences of the tracking results can be found
at the end of this paper. Videos of tracking results can be
TABLE IV
OBJECT TRACKER EXECUTION TIMES AND FRAME RATES

#    Sym (ms)   Motion (ms)   Refine (ms)   Kalman (ms)   Init (ms)   FPS (Hz)
1    37.87      4.84          0.86          0.09          10.41       22.91
2    16.76      4.76          0.75          0.06          9.74        44.77
3    17.95      4.85          0.85          0.04          10.69       42.22
4    18.31      4.74          0.75          0.04          11.90       41.96
5    33.69      4.87          0.87          0.05          11.38       25.33
6    20.84      4.94          0.85          0.04          13.18       37.50
7    35.29      5.01          0.87          0.13          11.32       24.22
8    34.48      4.94          0.79          0.14          11.14       24.79
9    18.19      4.91          0.79          0.06          11.83       41.75
10   27.01      4.89          0.82          0.06          12.50       30.51
Fig. 16. Generation of rotated bounding boxes from refined motion masks.
Left column: Symmetry-refined motion masks.
Right column: Bounding boxes in green, symmetry lines in red.
downloaded from:
www.ecse.monash.edu.au/centres/irrc/li_iro2006.php
V. ANALYSIS OF BILATERAL SYMMETRY AS A TRACKING
FEATURE
To evaluate the accuracy of symmetry as a tracking feature
under various background conditions, detected symmetry lines
are compared to the ground truth symmetry line of an object,
as it appears in the camera image. In many cases, such ground
truth data is unobtainable due to the lack of constraints in the object's trajectory. Also, manually extracting the object's
symmetry line in long tracking sequences is not practical due
to the large number of video frames and the high likelihood
of introducing human errors. The following approach is used
to overcome these problems.
A. Finding Ground Truth
A custom-built pendulum, as seen in Figure 17, is used
to provide predictable oscillatory object motion along with
measurable ground truth. Our test object, a red plastic squeeze
bottle, is affixed to the end of the pendulum. The bottle's
symmetry line is mechanically aligned with the pendulum arm
by drilling through the center of the bottle and then passing
the pendulum arm through it. Blue markers are placed above
and below the object on the pendulum arm. The centroids of
the coloured markers are used to determine the ground truth
polar parameters of the object's symmetry line. The markers
are found automatically using colour segmentation. An exam-
ple ground truth symmetry line, extracted automatically by
segmenting the coloured markers, is shown in Figure 18.
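Converting the two marker centroids into the (R, θ) parameters of Figure 1 amounts to the following computation; a sketch with our own names, with R measured from the image centre along the line normal.

#include <cmath>

struct P { double x, y; };

// Polar parameters of the line through marker centroids c1 and c2,
// relative to the image centre (cx, cy).
void lineThroughMarkers(P c1, P c2, double cx, double cy,
                        double& R, double& theta)
{
    const double dx = c2.x - c1.x, dy = c2.y - c1.y;
    const double len = std::hypot(dx, dy);
    const double nx = -dy / len, ny = dx / len; // unit normal
    theta = std::atan2(ny, nx);
    R = (c1.x - cx) * nx + (c1.y - cy) * ny;    // signed distance
}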
[Figure labels: 1 degree-of-freedom pivot; carbon fiber tube; ground truth markers]
Fig. 17. Pendulum hardware used to generate ground truth data
Fig. 18. Ground truth symmetry axis, drawn in black. The centroids of the markers are shown as red and green circles.
The test sequences each consist of 1000 video frames
captured with the pendulum swinging in front of different
backgrounds. The following four backgrounds are used. A
white background is used as a control experiment to obtain
the errors of our detector under ideal background conditions.
However, specular reflections and shadows are still quite
prominent at some object poses. In order to test the robustness
of the detector to missing edges due to similar foreground
and background colours, red distracters are added to the back-
ground in the second tracking sequence. To increase input edge
noise, random edge noise is added to the background of the third tracking sequence. The fourth sequence consists of both
red distracters and edge noise in the pendulum's background.
Example frames taken from these sequences can be found at
the bottom of the error plots located in Appendix II.
Another advantage of using the pendulum to actuate our test object is the predictability of the object's pose over time. The accuracy of our ground truth data is analyzed using a damped pendulum model. As the range of angles during our experiments is relatively small, the small-angle approximation sin(θ) ≈ θ is applied, which makes the equation of motion linear and its solution an exponentially damped sinusoid. The damped pendulum described by Equations 1 and 2 is used as our model. Note that R(t) is a function of θ(t). The damping is modelled as an exponential, with parameter λ governing the rate of decay. MATLAB's nlinfit function is used to perform a non-linear regression, which simultaneously estimates A, λ, ω, t0, B, L and L0. The θ(t) and R(t) regressions are performed separately.
θ(t) = A e^(−λt) cos(ω(t − t0)) + B    (1)

R(t) = L θ(t) + L0    (2)
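The model is straightforward to reproduce. Below is a minimal C++ sketch of Equations 1 and 2 and of the mean absolute residual reported in Table V; the fitting itself was done with MATLAB's nlinfit as noted above, and the parameter values would come from that regression.

```cpp
#include <cmath>
#include <vector>

// Damped pendulum model of Equations 1 and 2.
struct PendulumModel {
    double A, lambda, omega, t0, B; // theta(t) parameters
    double L, L0;                   // R(t) parameters
    double theta(double t) const {
        return A * std::exp(-lambda * t) * std::cos(omega * (t - t0)) + B;
    }
    double R(double t) const { return L * theta(t) + L0; }
};

// Mean of absolute regression residuals, as reported in Table V.
double meanAbsResidual(const PendulumModel& m,
                       const std::vector<double>& t,
                       const std::vector<double>& thetaMeasured) {
    double sum = 0.0;
    for (std::size_t i = 0; i < t.size(); ++i)
        sum += std::fabs(thetaMeasured[i] - m.theta(t[i]));
    return sum / t.size();
}
```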
The absolute means of the regression residuals for the ground truth and for our symmetry detector are listed in Table V. For all four sequences, and in both R and θ, the marker-based ground truth provides a better fit than our symmetry detector. This is especially true for the θ residuals, where the ground truth results are at least three times more accurate than our detected symmetry. These results clearly demonstrate the validity of our automatic marker-based method of determining ground truth.
TABLE V
MEAN OF ABSOLUTE REGRESSION RESIDUALS

Background  Parameter     Ground Truth   Symmetry
White       R (pixels)       0.39          0.46
            θ (radians)      0.0014        0.0041
Red         R (pixels)       0.76          1.29
            θ (radians)      0.0021        0.0081
Edge        R (pixels)       1.82          2.34
            θ (radians)      0.0025        0.0063
Mixed       R (pixels)       0.51          3.06
            θ (radians)      0.0014        0.0188
B. Quantitative Comparison of Detected Symmetry and
Ground Truth
To compare detected symmetry against ground truth, we
detect symmetry for each frame in each sequence. As the
motivation behind the comparison is to evaluate our detected
symmetry as a tracking feature, not to evaluate the tracker
itself, we do not use a Kalman filter or any other temporal
estimation technique. However, we do use the block motion masking method described in Subsection IV-C on these tracking sequences prior to detection. This simulates the kind of edge data our detector will receive during a tracking operation.

Fig. 19. Example symmetry detection result from the pendulum sequence with edge noise in the background: (a) edge pixels; (b) symmetry line. The motion-masked edge pixels show that many non-object edge pixels are passed to our fast symmetry detector. The symmetry line returned by our detector is shown in blue.
The symmetry detection error for each frame is found by taking the difference between the polar parameters of our detected symmetry line and the ground truth data. The error results are shown as line plots in Appendix II. Due to the length of the experiments, only the detection errors of the first 400 frames are shown. The mean-subtracted ground truth data is plotted against a different vertical axis as a reference. The detection error is shown in blue. Histograms of the detection errors are included after the error line plots.
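One detail worth making explicit when differencing polar line parameters is that the orientation of an undirected line is only unique modulo π, so θ errors should be wrapped into (−π/2, π/2]. Whether this exact convention matches the authors' implementation is an assumption on our part; a minimal sketch:

```cpp
#include <cmath>

// Signed theta error wrapped into (-pi/2, pi/2]. An undirected line's
// orientation is only defined modulo pi, so raw differences near the
// wrap-around would otherwise appear as spurious large errors.
double thetaError(double thetaDetected, double thetaTruth) {
    const double kPi = 3.14159265358979323846;
    double e = std::fmod(thetaDetected - thetaTruth, kPi);
    if (e > kPi / 2.0)   e -= kPi;
    if (e <= -kPi / 2.0) e += kPi;
    return e;
}
```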
Figure 23 contains the symmetry detection error with the test object placed against a white background. Both the radius and θ errors are very small. This suggests that our symmetry detection scheme will provide accurate measurements to a tracker when the target object is placed against a plain background. The jumps in error magnitude tend to occur during the zero crossings of the overlaid ground truth plot. This increase in detection error appears to be correlated with object motion, as the ground truth zero crossings occur at the middle of the swing, where the object is moving the fastest. The application of a temporal filter, such as our Kalman filter tracker, should further improve the error characteristics.
Figure 24 shows that a background littered with distracters similar in colour to the test object has only a modest effect on detection error. The detection errors are larger than in the white background sequence. This is due to a reduction in the quality of detected edges, caused by a lower intensity difference between object and background pixels. This reduction in pixel contrast also adversely affects block motion masking, as seen in the colour-extracted blob of Figure 21(b). As in the white background sequence, increases in error magnitude occur at the middle of the pendulum swing.
The results in Figure 25 indicate that edge noise in the background affects detection in a similar way to red distracters. Figure 19 contains an example of the edge data given to our symmetry detector during this tracking sequence. Notice that the large amount of background edge noise has minimal impact on detection performance. Unlike the previous plots, the detection error magnitude is not noticeably higher when the object is moving the fastest. This may be due to an improved intensity contrast between the object and the background caused by the edge-laden piece of white paper.

TABLE VI
SYMMETRY DETECTION ERROR STATISTICS

Background  Parameter     Abs Mean   STD      Abs Median
White       R (pixels)     1.1256    1.1350    0.5675
            θ (radians)    0.0057    0.0048    0.0043
Red         R (pixels)     2.0550    1.7955    2.8732
            θ (radians)    0.0134    0.0110    0.0129
Edge        R (pixels)     1.2118    1.0529    0.8765
            θ (radians)    0.0078    0.0053    0.0079
Mixed       R (pixels)     3.4147    1.6186    7.4565
            θ (radians)    0.0192    0.0099    0.0375
The mixed background sequence plot, Figure 26, contains
the only large detection errors found in our pendulum ex-
periments. As with the red distracter and white background
sequences, these large errors occur during periods of fast
object motion. Given the sparseness of these error peaks,
temporal filtering should be able to correct them and a tracker
should successfully ignore them. The latter is confirmed by
our successful real world tracking experiments described in
Section IV-F.
Table VI provides a statistical summary of the symmetry detection errors. The columns, from left to right, are the mean of absolute errors, the standard deviation of errors and the median of absolute errors. Looking at the statistics, it appears that missing edges, due to distracters of similar colour to the target object, cause larger detection errors than noisy edges in the background. This result agrees with expectations, as the Hough transform voting method is inherently robust to noisy edge data. On qualitative inspection of the detection results, the relatively large detection errors of the mixed sequence appear to be caused by missing object edges during periods of fast object motion.
C. Qualitative Feature Comparison: Colour Blob Centroid
This section provides a qualitative comparison of symmetry with another commonly used tracking feature, a colour blob centroid. The centroid errors should not be compared against the symmetry errors directly, as the comparison is inherently biased, since our test object is symmetric. Colour tracking does not constrain the target object's shape, and the target can in fact be deformable. Symmetry can be seen as a cue visually orthogonal to colour, and each has its own advantages and disadvantages depending on the target application.
We use a Hue-Saturation-Value (HSV) colour filter in our experiments. We implemented the filter using the OpenCV library [Intel, 2006] histogram functions and our own C++ code. A two-dimensional histogram is used to represent Hue and Saturation. The Value component of HSV is used to reject pixels of extreme darkness or brightness, which have noisy hue characteristics. The hue and saturation are discretized into 45 and 8 histogram bins respectively. An example HSV histogram is shown in Subfigure 20(e).
The colour blob centroid is obtained as follows. We apply
the HSV filter to the input image to obtain a histogram back
projection as described by [Swain and Ballard, 1991]. The
back projection image approximately represents the probability that a pixel in the input image belongs to the target object, based on its colour. Example back projection results are shown in Figure 20, where darker pixels represent higher object probability. The object's colour histogram used to generate the back projection is built offline and optimized manually prior to any centroid detection. A binary blob is produced by thresholding the back projection image. The largest contiguous blob is kept as the object's blob; the rest are discarded. Example binary blobs are shown as yellow pixels in the images of Figure 21. The colour blob centroid, drawn as a black dot in the same images, is the center of mass of the yellow binary blob (its first moments normalized by the zeroth moment).

Fig. 20. HSV histogram back projection: (a), (c) input images; (b), (d) back projections; (e) Hue-Saturation histogram. In the back projection images, dark pixels have a high probability of being the object's colour, as shown in the Hue-Saturation histogram.
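The pipeline above is straightforward to reproduce. The sketch below uses the modern OpenCV C++ interface rather than the 2006-era library the authors used; the Value-channel limits and back projection threshold are illustrative values of our choosing, not the paper's, and only the bin counts (45 hue, 8 saturation) follow the text.

```cpp
#include <opencv2/opencv.hpp>

// Minimal sketch of the colour blob centroid pipeline: HSV histogram
// back projection [Swain and Ballard, 1991], thresholding, largest-blob
// selection, and centroid extraction.
cv::Point2d colourBlobCentroid(const cv::Mat& bgr, const cv::Mat& hueSatHist) {
    cv::Mat hsv;
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);

    // Back projection over the Hue and Saturation channels.
    const int channels[] = {0, 1};
    float hueRange[] = {0, 180}, satRange[] = {0, 256};
    const float* ranges[] = {hueRange, satRange};
    cv::Mat backProj;
    cv::calcBackProject(&hsv, 1, channels, hueSatHist, backProj, ranges);

    // Reject pixels of extreme darkness or brightness via the Value
    // channel (these limits are illustrative, not the paper's values).
    cv::Mat vMask;
    cv::inRange(hsv, cv::Scalar(0, 0, 30), cv::Scalar(180, 255, 230), vMask);
    backProj &= vMask;

    // Threshold into a binary image and keep the largest contiguous blob.
    cv::Mat bin, labels, stats, centroids;
    cv::threshold(backProj, bin, 64, 255, cv::THRESH_BINARY);
    int n = cv::connectedComponentsWithStats(bin, labels, stats, centroids);
    int best = -1, bestArea = 0;
    for (int i = 1; i < n; ++i) { // label 0 is the background
        int area = stats.at<int>(i, cv::CC_STAT_AREA);
        if (area > bestArea) { bestArea = area; best = i; }
    }
    if (best < 0) return {-1.0, -1.0}; // no blob found

    // Centroid of the largest blob.
    return {centroids.at<double>(best, 0), centroids.at<double>(best, 1)};
}
```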
Ideally, the best error measure would be the distance between the detected centroid and the ground truth centroid of the object. However, the latter would require manual segmentation of the object in all frames, including those where the object is in front of distracters of similar colour. As we are only using the errors in a qualitative manner, our error measure is simply the minimum distance between the object's centroid and the ground truth symmetry line, that is, |x cos θ + y sin θ − R| for a detected centroid (x, y) and a ground truth line (R, θ). As the object is symmetric, its actual centroid is located somewhere along its symmetry line, so zero distance means perfect centroid detection. While not ideal, this error metric should provide some indication of feature detection accuracy and reliability.
Line plots of the centroid detection errors are located in Appendix III. The mean-subtracted ground truth radius is shown alongside the error data as a dotted black line to provide a visual reference of the pendulum motion. As mentioned earlier, the zero crossings of the dotted ground truth curve coincide with the middle of the pendulum swing, where the object is moving at maximum speed.

Fig. 21. HSV blob extraction and centroid detection against different backgrounds: (a) white background; (b) red distractors in background. The extracted blob is in yellow and the centroid is shown as a black dot.
Figures 31 and 33 suggest that centroid detection is very accurate when no distracters of similar colour to the target are present. The magnitude of the average error is around 1 to 2 pixels in both cases. By inspection, the average centroid error in Figure 32 is 4 to 5 times larger than in the white background sequence. This agrees with expectations, as the background is filled with distracters of similar colour to the test object, which distort the shape of the object's binary blob. An example of this distortion can be found in Figure 21(b). The lopsided errors of Figure 34 confirm the detrimental effect of red distracters, with the error magnitude climbing much higher when the object is swinging in front of the red portion of the background. From these results, it is clear that both feature modalities have their own strengths and weaknesses. Each feature should only be applied to tracking after careful consideration of the expected object and background characteristics.
VI. CONCLUSION
We have qualitatively and quantitatively analysed the use of
bilateral symmetry as an object feature. We show that bilateral
symmetry can be detected in real time under noisy conditions
using our Hough-based fast symmetry detector. Applying the
fast symmetry detector to object segmentation, a dynamic pro-
gramming based approach is able to segment multi-coloured
objects without using any prior shape or colour information.
Real time object tracking using bilateral symmetry has also
been achieved. Our Kalman filter tracker has been successfully
tested on 10 video sequences, which include situations where
the target object is transparent or partially occluded. The tracker can also handle large changes in object scale and
orientation. Quantitative analysis of symmetry as a tracking
feature shows a minimal increase in detection error in the
presence of similarly coloured distracters and background edge
noise. A qualitative comparison with HSV colour centroid
suggests that bilateral symmetry has the level of accuracy
and reliability required of a tracking feature. Overall, bilateral
symmetry appears to be a useful and surprisingly robust object
feature for robotic applications, especially those where robots
have to deal with novel symmetric objects.
APPENDIX II
SYMMETRY ERROR PLOTS
Fig. 23. White Background: Symmetry Error Plots. Sample video frames shown at the bottom. [Two line plots versus frame number (0-400): fast symmetry radius error (pixels) and fast symmetry θ error (radians), each overlaid with the mean-subtracted ground truth radius/θ on a secondary axis.]
Fig. 24. Background with Red Distractors: Symmetry Error Plots. Sample video frames shown at the bottom. [Same layout as Fig. 23.]
Fig. 25. Background with Edge Noise: Symmetry Error Plots. Sample video frames shown at the bottom. [Same layout as Fig. 23.]
Fig. 26. Mixed Background: Symmetry Error Plots. Sample video frames shown at the bottom. [Same layout as Fig. 23.]
Fig. 27. White Background: Histograms of Symmetry Errors. [(a) symmetry line radius error (pixels); (b) symmetry line θ error (radians).]
Fig. 28. Background with Red Distractors: Histograms of Symmetry Errors. [(a) symmetry radius error (pixels); (b) symmetry θ error (radians).]
Fig. 29. Background with Edge Noise: Histograms of Symmetry Errors. [(a) symmetry radius error (pixels); (b) symmetry θ error (radians).]
Fig. 30. Mixed Background: Histograms of Symmetry Errors. [(a) symmetry radius error (pixels); (b) symmetry θ error (radians).]
APPENDIX III
COLOUR BLOB CENTROID ERROR PLOTS
Fig. 31. White Background: Colour Centroid Error Plot. [Centroid displacement error (pixels) versus frame number (0-400), overlaid with the mean-subtracted ground truth radius on a secondary axis.]
Fig. 32. Background with Red Distractors: Colour Centroid Error Plot. [Same layout as Fig. 31.]
Fig. 33. Background with Edge Noise: Colour Centroid Error Plot. [Same layout as Fig. 31.]
Fig. 34. Mixed Background: Colour Centroid Error Plot. [Same layout as Fig. 31.]
ACKNOWLEDGEMENTS
The authors would like to thank Monash University, the
Intelligent Robotics Research Centre and PIMCE ARC Centre
for their financial support. The first author would also like
to thank Konrad Schindler of the Institute of Vision Systems
Engineering at Monash University for his suggestions and
comments regarding the colour and symmetry feature compar-
ison. We also thank the anonymous reviewers for their helpful comments.
REFERENCES
[Ballard, 1981] Ballard, D. H. (1981). Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition, 13(2):111-122.

[Bar-Shalom et al., 2002] Bar-Shalom, Y., Kirubarajan, T., and Li, X.-R. (2002). Estimation with Applications to Tracking and Navigation. John Wiley & Sons, Inc.

[Canny, 1986] Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679-698.

[Cornelius and Loy, 2006] Cornelius, H. and Loy, G. (2006). Detecting bilateral symmetry in perspective. page 191, Los Alamitos, CA, USA. IEEE Computer Society.

[Duda and Hart, 1972] Duda, R. O. and Hart, P. E. (1972). Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM, 15(1):11-15.

[Gupta et al., 2005] Gupta, A., Prasad, V. S. N., and Davis, L. S. (2005). Extracting regions of symmetry. In IEEE International Conference on Image Processing (ICIP), volume 3, pages 1336, Genova.

[Huang et al., 2002] Huang, Y., Huang, T. S., and Niemann, H. (2002). A region-based method for model-free object tracking. In International Conference on Pattern Recognition (ICPR), pages 592-595, Quebec, Canada.

[Intel, 2006] Intel (2006). OpenCV: Open source computer vision library. Online. http://www.intel.com/technology/computing/opencv/.

[Lee et al., 2001] Lee, B., Yan, J.-Y., and Zhuang, T.-G. (2001). A dynamic programming based algorithm for optimal edge detection in medical images. In Proceedings of the International Workshop on Medical Imaging and Augmented Reality, pages 193-198, Hong Kong, China.

[Lei and Wong, 1999] Lei, Y. and Wong, K. C. (1999). Detection and localisation of reflectional and rotational symmetry under weak perspective projection. Pattern Recognition, 32(2):167-180.

[Levitt, 1984] Levitt, T. S. (1984). Domain independent object description and decomposition. In AAAI, pages 207-211.

[Li and Kleeman, 2006] Li, W. H. and Kleeman, L. (2006). Real time object tracking using reflectional symmetry and motion. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[Li et al., 2005] Li, W. H., Zhang, A., and Kleeman, L. (2005). Fast global reflectional symmetry detection for robotic grasping and visual tracking. In Matthews, M. M., editor, Proceedings of Australasian Conference on Robotics and Automation.

[Li et al., 2006] Li, W. H., Zhang, A. M., and Kleeman, L. (2006). Real time detection and segmentation of reflectionally symmetric objects in digital images. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[Lowe, 2004] Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110.

[Loy and Eklundh, 2006] Loy, G. and Eklundh, J.-O. (2006). Detecting symmetry and symmetric constellations of features. In Proceedings of European Conference on Computer Vision (ECCV), Graz, Austria.

[Loy and Zelinsky, 2003] Loy, G. and Zelinsky, A. (2003). Fast radial symmetry for detecting points of interest. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(8):959-973.

[Matas et al., 2002] Matas, J., Chum, O., Urban, M., and Pajdla, T. (2002). Robust wide baseline stereo from maximally stable extremal regions. Proceedings of the British Machine Vision Conference, 1:384-393.

[Mortensen et al., 1992] Mortensen, E., Morse, B., Barrett, W., and Udupa, J. (1992). Adaptive boundary detection using live-wire two-dimensional dynamic programming. In IEEE Proceedings of Computers in Cardiology, pages 635-638, Durham, North Carolina.

[Nagel, 1978] Nagel, H. H. (1978). Formation of an object concept by analysis of systematic time variations in the optically perceptible environment. Computer Graphics and Image Processing, 7(2):149-194.

[Ogawa, 1991] Ogawa, H. (1991). Symmetry analysis of line drawings using the Hough transform. Pattern Recognition Letters, 12(1):9-12.

[Pal and Pal, 1993] Pal, N. R. and Pal, S. K. (1993). A review on image segmentation techniques. Pattern Recognition, 26(9):1277-1294.

[Reisfeld et al., 1995] Reisfeld, D., Wolfson, H., and Yeshurun, Y. (1995). Context-free attentional operators: the generalized symmetry transform. International Journal of Computer Vision, 14(2):119-130.

[Satoh et al., 2004] Satoh, Y., Okatani, T., and Deguchi, K. (2004). A color-based tracking by Kalman particle filter. In International Conference on Pattern Recognition (ICPR), pages 50250