

PhD. Thesis Abstract

3D Object Learning and Recognition System Based on Planar Views

Author: Carmen de Trazegnies Otero
Supervisor: Cristina Urdiales García

Dpto. de Tecnología Electrónica, E.T.S.I. Telecomunicación, 29071-Málaga, Spain

March, 2004

1 Introduction

3D object recognition is a central topic of research in computer vision, artificial intelligence and robotics. The ability of a robot to perform complex tasks is limited by its ability to analyze visual information from its environment. Some of the features that a recognition system meant to work in a real environment should have are: i) dealing with varied types of shapes; ii) being invariant to scale, rotation, mild distortions and noise; iii) being able to detect and learn new objects on-line. However, 3D object recognition methods are usually complex, because they involve object segmentation and representation, feature extraction and matching of a usually large amount of data. Hence, most of them either work off-line or deal with a limited and previously learnt database.

View based recognition problems can be roughly divided into representing a single view of the 3D object, and combining information extracted from several views to represent a unique 3D object. Single views can be represented either by processing the whole view bitmap or by its extracted relevant features. A direct comparison or matching of whole bitmaps is not computationally feasible and, therefore, most appearance based methods work by extracting a set of features from each view. Some methods extract the relevant information from bitmaps by means of vectorial decomposition methods, such as Principal Component Analysis (PCA) [Sirovich and Everson 1992]. Thus, every view becomes a point in an N-dimensional eigen-space and objects become trajectories in such a space [Murase and Nayar 1995]. Any object can be recognized by deciding which curve it belongs to. The main drawback of these approaches is that they are quite sensitive to shadows and illumination changes [Startchik et al. 1998]. Also, they are not valid for curves intersecting in the eigen-space, so different objects included in the database should not present similar views [Murase and Nayar 1995]. Some feature based view representation techniques rely on extracting the relevant points of a subset of canonical views of the object [Rothwell et al. 1995]. However, the number and positions of relevant points typically change, not only for different object views, but also for transformed or distorted versions of the same view. Hence, the recognition stage may become very complex.

In this thesis, we propose a new view based 3D object recognition method. It relies on the sequentiality of the view capture process. Thus, not only every single view, but also the transitions between consecutive views are compared to stored models. In order to build a model from an object template, a set of uniformly distributed views around the object is studied. For every single view, the outer shape of the object is extracted. Shape has been reported to be robust to transformations, noise and illumination changes [Startchik et al. 1998]. However, real image segmentation is complex and segmented shapes may be distorted and noisy. This thesis does not focus on segmentation; hence, all objects used for the present study are either real objects captured against a simple background or synthetic objects. We represent shapes by using a new curvature function (CF) which has proven to be resistant against noise and transformations [de Trazegnies et al. 2003c]. To achieve resistance against rotation, we work in the Fourier domain with the Fast Fourier Transforms of the CFs (CFFFTs). CFFFTs are very redundant; hence, we extract their Principal Components to reduce them to the minimum possible number of significant elements. Thus, the input view is finally represented by the resulting components vector. This process is described in section 2. 3D objects are represented by a sequence of views. Feature vectors related to close points of view are usually similar and they can be clustered into classes, as described in section 3. Thus, a 3D object can be described by a class layout. Finally, we encode class layouts into Hidden Markov Models (HMM) as proposed in section 4. When a view from a new object is captured, we evaluate the matching probability of the input sequence with all available HMMs corresponding to stored templates to determine if the object is already known. Otherwise, the object is learnt as a new template. Experiments and results of the training and recognition processes are presented in section 6. The main advantages of the proposed method are: i) stored templates present a low data volume; ii) objects presenting similar shapes are grouped together; iii) the algorithm implicitly estimates the pose of the object; and iv) new templates are learnt and included in the database in an unsupervised way. Conclusions and future work are presented in section 7.

2 Single View Representation

Shape based object recognition methods rely on capturing the salient shape descriptors, which should be: i) invariant to geometric transformations; ii) robust to noise; iii) meaningful to matching algorithms; iv) robust to occlusions; and v) computationally feasible. Also, it is desirable to represent a shape with as few features as possible. Contours, reported to be good shape descriptors, are often represented by their curvature functions (CF). However, most CF calculation techniques implicitly filter contours at a fixed cutoff frequency [Agam and Dinstein 1997] [Rosin 1992]. Consequently, relevant curvature information might be lost after filtering noise. In [de Trazegnies et al. 2003c] we proposed a new adaptively estimated curvature function (AECF) which filters noise in an adaptive way depending on the natural scale of the curve to avoid both noise and distortions.

The proposed AECF is very resistant against noise, scale, translations and mild deformations (Fig. 1). It has been successfully compared to previous CFs in [de Trazegnies et al. 2003c], [de Trazegnies et al. 2004] and [Urdiales et al. 2003]. To prove its efficiency as a shape descriptor, Fig. 1 shows how an adaptively filtered version of the original shape can be recovered from our AECF. Also, similar shapes at different scales present very similar curvature functions if such functions are interpolated to a fixed length. However, rotations cause a shift in the functions. To avoid this problem, a shape is represented in the Fourier domain by the FFT of its curvature function (CFFFT), because the modulus of the CFFFT is invariant against rotation. As an additional advantage, symmetric shapes (e.g. a left and a right hand) present the same ||CFFFT||.
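The shift invariance of the Fourier modulus can be checked numerically. The following is only a minimal sketch: it uses a naive DFT instead of an FFT, the sample values are made up, and modelling a mirrored contour as a reversed and sign-flipped curvature sequence is an assumption of this example, not a definition taken from the thesis.

```python
import cmath
import math


def dft_modulus(samples):
    """Modulus of every DFT coefficient of a real sequence (naive O(n^2) DFT)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n)
    ]


def circular_shift(samples, offset):
    """Rotate the sequence, as a change of contour starting point would."""
    offset %= len(samples)
    return samples[offset:] + samples[:offset]


def close(a, b, tol=1e-9):
    """Element-wise comparison of two equally long sequences."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))


# Hypothetical curvature function sampled at 8 contour points (illustrative values).
cf = [0.0, 0.9, 0.1, -0.4, 0.0, 0.7, 0.2, -0.3]

# Rotating the shape is modelled here as a circular shift of the CF.
shifted = circular_shift(cf, 3)

# Assumption: a mirrored shape is traversed backwards with curvature sign flipped.
mirrored = [-value for value in reversed(cf)]

shift_invariant = close(dft_modulus(cf), dft_modulus(shifted))
mirror_invariant = close(dft_modulus(cf), dft_modulus(mirrored))
```

Under these assumptions both flags evaluate to true, although the three sequences themselves differ sample by sample.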

Figure 1: a) Noisy original image; b) AECF of (a); and c) Contour recovered from (b)

Low frequency terms of the ||CFFFT|| are typically more significant than the high frequency ones because the information about the main shape features relies on intrinsic distances on the order of low divisors of the CF length. This fact suggests that the intrinsic dimension of the set that contains all possible CFs corresponding to shapes is lower than the length of the CF. It can be proven that the set of all possible ||CFFFT||s forms a manifold [Klassen et al. 2004]. It may be possible to use a vectorial subspace to represent its elements with a minimal error, as long as the manifold curvature is negligible and their intrinsic dimensions are similar.

We propose to choose a random set of varied shapes and to apply Principal Component Analysis (PCA) to their ||CFFFT||s. Their first P components provide a basis of a P-dimensional subspace. In our tests, for P equal to 10 the representation error for any element of the original set is always lower than 5%. Besides, due to the manifold topology, the basis may also be valid for elements not belonging to that set. Several tests have been performed to empirically support that assertion [de Trazegnies et al. 2003b]. Hence, we can conclude that, provided that the number of shapes included in the initial set is high and the shapes are very different from one another, the PCA basis obtained in this way can be used to represent objects not included in the initial set with a minimum loss of information. In this thesis, the basis of the ||CFFFT||s vectorial subspace was extracted from a set of 27 traffic signal icons, which ranges from simple geometrical shapes to complex symbols [de Trazegnies et al. 2003b].
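To make the projection step concrete, the sketch below projects a vector onto an orthonormal basis, reconstructs it, and measures the representation error. The text does not state how the error is measured, so the relative Euclidean error used here is an assumption, and the basis, mean and spectrum are toy values rather than the basis learnt in the thesis.

```python
import math


def project(vector, mean, basis):
    """Coefficients of the centred vector on each orthonormal basis axis."""
    centred = [v - m for v, m in zip(vector, mean)]
    return [sum(c * b for c, b in zip(centred, axis)) for axis in basis]


def reconstruct(coefficients, mean, basis):
    """Rebuild an approximation of the vector from its coefficients."""
    approx = list(mean)
    for coefficient, axis in zip(coefficients, basis):
        approx = [a + coefficient * b for a, b in zip(approx, axis)]
    return approx


def relative_error(vector, approx):
    """Assumed error measure: ||x - x_hat|| / ||x|| with Euclidean norms."""
    residual = math.sqrt(sum((v - a) ** 2 for v, a in zip(vector, approx)))
    norm = math.sqrt(sum(v * v for v in vector))
    return residual / norm


# Toy data: a 4-element "spectrum" and a 2-axis orthonormal basis (P = 2).
mean = [0.0, 0.0, 0.0, 0.0]
basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
spectrum = [3.0, 4.0, 0.1, 0.0]

feature_vector = project(spectrum, mean, basis)
error = relative_error(spectrum, reconstruct(feature_vector, mean, basis))
```

In this toy case the two coefficients keep almost all of the energy, so the relative error stays well under the 5% bound quoted above; with a real basis the error depends on how well the training shapes cover the new shape.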

3 Multiple View Representation

When a camera moves slowly around a 3D object, object views tend to present a certain continuity. Thus, shapes corresponding to close points of view usually present very similar feature vectors (FVs). The variation of the captured vectors with respect to the point of view characterizes a 3D object. FVs are quite resistant against scale, so we assume that points of view differing only in their radial coordinate are represented by the same FV. Thus, an object can be represented by a 2-dimensional map of FVs. In order to represent objects in a compact way, FVs are clustered into classes. We use a modification of the Mode Analysis clustering algorithm with the Tanimoto distance to split the map into classes. The resulting number of classes depends on the cluster radius.
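The role of the cluster radius can be illustrated with a small sketch. The Tanimoto distance below follows the common definition for real-valued vectors, which the text does not spell out, and the greedy "leader" assignment is a simple stand-in chosen for brevity; it is not the modified Mode Analysis algorithm used in the thesis. The feature vectors are invented.

```python
def tanimoto_distance(a, b):
    """1 - (a.b) / (|a|^2 + |b|^2 - a.b); undefined when both vectors are zero."""
    dot = sum(x * y for x, y in zip(a, b))
    denominator = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return 1.0 - dot / denominator


def assign_classes(feature_vectors, radius):
    """Greedy stand-in clustering: join the first prototype within `radius`,
    otherwise found a new class with this vector as its prototype."""
    prototypes = []
    labels = []
    for fv in feature_vectors:
        for index, prototype in enumerate(prototypes):
            if tanimoto_distance(fv, prototype) <= radius:
                labels.append(index)
                break
        else:
            prototypes.append(fv)
            labels.append(len(prototypes) - 1)
    return labels, prototypes


# Hypothetical 2-element feature vectors for four consecutive views.
views = [[1.0, 0.0], [0.98, 0.05], [0.0, 1.0], [0.05, 1.0]]

labels_small, prototypes_small = assign_classes(views, radius=0.05)
labels_large, prototypes_large = assign_classes(views, radius=1.0)
```

With the small radius the four views split into two classes, while the large radius merges every view into a single class, which mirrors the trade-off discussed for the class layouts.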

Figs. 2.b-d show several class layouts for the object in Fig. 2.a for different clustering radii. Each class is printed in a different gray tone. In this particular case, the bilateral symmetry of the proposed object can be observed in all class layouts. The class layout in Fig. 2.b corresponds to a high cluster radius; therefore, most views are clustered into the same class. Thus, differences between planar shapes cannot be appreciated. Fig. 2.d presents the class layout for a low cluster radius. Small variations in the observation angle cause significant classification changes. Thus, too many classes are generated and, consequently, a very large number of views is required to learn a complete class layout. Further recognition processes would involve a high computational cost. Thus, the cluster radius depends on both the allowed similarity among different views and the desired maximal angular distance between consecutive viewpoints. Hence, the radius is heuristically fixed to a value of 0.05. Fig. 2.c presents the class layout for this cluster radius. It can be observed that there are some consecutive areas belonging to the same class. Nevertheless, when there is a clear difference between views, they are classified into different classes. In this case a reduced number of classes is required to completely represent a new object.

Figure 2: a) Input object; and object cluster layout for b) radius=0.05; c) radius=0.1; and d) radius=0.2

The resulting structure can be seen as an aspect graph [Cyr and Kimia 2004]. Every node of the aspect graph is encoded here by means of a cluster centroid, and represents the set of views included in this cluster. Two nodes are considered adjacent if their represented regions on the layout map are adjacent. Every node is connected to adjacent nodes by means of Markov Model transition probabilities, as described in section 4.

4 3D Object Modelling

Once a set of 3D objects represented by the proposed method is available, a new object could be recognized by simply matching its 2D class layout with the stored ones. However, this approach is not valid for several reasons: i) the pose of an input object is not known a priori, so matching may become computationally expensive; ii) we usually cannot capture all the views of an input object and therefore only partial matches can be made; and iii) noise and distortions affecting some views may cause false matches or a recognition failure. To avoid all these problems, our system should capture as few object views as possible to gather, in an incremental way, enough evidence about the true nature of a given object. Distorted and noisy views might decrease the probability of being a given object, but it should be possible to recover from these errors. This can be achieved by taking advantage of the sequentiality of views.

Markov Models (MM) have been successfully applied to image recognition when a clear sequentiality can be derived from the problem definition [Natarajan et al. 2001]. In particular, Hidden Markov Models (HMMs) have been successfully applied to planar shape recognition based on contours [Draper et al. 2001] [Zhang and Lu 2004]. The main difference between our approach and these is that we use HMMs to recognize 3D objects instead of planar shapes.

In the present work, the recognition of a 3D object is based on the sequentiality of a set of views of the observed object. New views are added to the original observed sequence if the similarity between the observed object and the stored templates is not high enough to classify it with the desired degree of confidence. If every view could be classified as belonging to a unique class, the recognition problem could be solved using classical Markov Models. However, a given view of an object could be similar to several prototypes and it is not always possible to include it in a unique class. Hence, we use a recognition method based on HMMs.

4.1 HMM definition

The choice of appropriate parameters to define the HMM is an important task for the reliability of the recognition process. In our case we define the main HMM parameters as follows:

• A set of hidden states $H_p = \{H_{p,1}, H_{p,2}, \ldots, H_{p,i}, \ldots, H_{p,M}\}$ for every template $p$. The hidden states are classes of FVs, obtained by clustering the observed FVs as described in the previous section.

• The usual initial probability distribution $\Pi^p = (\pi_1, \pi_2, \ldots, \pi_i, \ldots, \pi_M)$ of template $p$ for an HMM.

• The usual transition matrix $A^p$ representing the probability of transition between every two consecutive hidden states in the view sequence for template $p$.

• An observation probability $B^p(V^q)$ to relate the hidden states to the input sequence of views. If the distance between the $q$-th observed view ($V^q$) and the class $i$ prototype in template $p$ ($H_{p,i}$) is bigger than the defined cluster radius, $B^p_i(V^q)$ is equal to 0. Non-zero elements adopt the same value so that they satisfy the equation:

$$\sum_{i=1}^{M} B^p_i(V^q) = 1 \qquad (1)$$
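The observation probability rule can be written in a few lines. This is a sketch under stated assumptions: the function name and the example distances are hypothetical, and returning an all-zero vector when no prototype lies within the radius is a choice made here, since the text does not cover that case.

```python
def observation_probabilities(distances, radius):
    """Equal weight for every class whose prototype lies within `radius`
    of the observed view, zero for the others (see Eq. 1)."""
    within = [distance <= radius for distance in distances]
    matches = sum(within)
    if matches == 0:
        # Assumption: no class matches, so no probability mass is assigned.
        return [0.0] * len(distances)
    return [1.0 / matches if hit else 0.0 for hit in within]


# Hypothetical distances from one observed view to the M = 4 class prototypes.
distances = [0.02, 0.30, 0.04, 0.50]
b = observation_probabilities(distances, radius=0.05)
```

Here two prototypes fall within the radius, so each of them receives 0.5 and the remaining classes receive 0.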

The set of model parameters must be adjusted for every template $p$ in the database. The initial probability vector $\Pi^p$ for each known object is assigned the a priori probabilities of finding the different views of such an object at the first observed view. The transition matrix $A^p$ is calculated by means of the Baum-Welch algorithm [Rabiner 1989]. The Baum-Welch algorithm, derived from the expectation-maximization (EM) algorithm, is a local optimization method. Hence, the choice of the initial system parameters determines: i) the number of iterations needed to converge to a stable solution and ii) the tendency to converge to the optimal solution or to a secondary (local) maximum. We initialize the transition matrix $A^p$ by counting the transitions between different views on the object layout map.
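One plausible way to obtain such an initial transition matrix is to count neighbouring class pairs on the layout map and normalize each row. The sketch below assumes a 4-neighbourhood and a map without wrap-around at its borders; both are simplifications chosen for this example, and the small layout is invented.

```python
def initial_transition_matrix(layout, num_classes):
    """Row-normalized counts of class pairs found in adjacent map cells."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    rows, cols = len(layout), len(layout[0])
    for r in range(rows):
        for c in range(cols):
            here = layout[r][c]
            # Assumption: 4-neighbourhood, no wrap-around at the map borders.
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    counts[here][layout[rr][cc]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([n / total if total else 0.0 for n in row])
    return matrix


# Hypothetical 2x3 class layout map with two classes.
layout = [
    [0, 0, 1],
    [0, 1, 1],
]
a_init = initial_transition_matrix(layout, num_classes=2)
```

The resulting matrix would only serve as the starting point that Baum-Welch then refines.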

4.2 Query comparison

When an object $x$ is detected, its first view $V^1_x$ is captured and an observation probability distribution, $B(V^1_x)$, relating the observation to every view prototype is assigned to $V^1_x$. Then, the probability $P(V^1_x \mid p)$ that the object belongs to each of the available stored templates $p$, represented by their HMMs, is evaluated.

If the object is not yet recognized, new views are captured and evaluated until the object, observed as a sequence of captured views $\{V^1_x, V^2_x, \ldots, V^q_x\}$, is unequivocally recognized as one of the stored templates. As the sequence length increases, so does the complexity of the probability calculation. Hence, this calculation is usually performed in an iterative way to keep the computational load bounded. In our case, the calculation is performed by using the Forward-Backward Procedure [Rabiner 1989].
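The iterative evaluation can be sketched with the forward recursion from [Rabiner 1989], which on its own is enough to obtain the probability of a view sequence given a model. Everything specific below is illustrative: the two templates, their parameters and the observation vectors are invented, and normalizing the likelihoods across templates assumes equal prior probabilities, which the text does not state.

```python
def forward_likelihood(initial, transition, observations):
    """P(view sequence | template) by the forward recursion.

    observations[t][i] is the observation probability of hidden state i
    for the view captured at step t.
    """
    alpha = [pi * b for pi, b in zip(initial, observations[0])]
    for obs in observations[1:]:
        # Only the previous alpha vector is kept, so each new view costs O(M^2).
        alpha = [
            obs[j] * sum(alpha[i] * transition[i][j] for i in range(len(alpha)))
            for j in range(len(alpha))
        ]
    return sum(alpha)


def template_posteriors(likelihoods):
    """Assumption: equal priors, so posteriors are normalized likelihoods."""
    total = sum(likelihoods.values())
    if total == 0.0:
        return {name: 0.0 for name in likelihoods}
    return {name: value / total for name, value in likelihoods.items()}


# Two hypothetical templates with M = 2 hidden states each.
templates = {
    "a": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]),
    "b": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]]),
}
# Two captured views, both matching only the first class of each template.
views = [[1.0, 0.0], [1.0, 0.0]]

likelihoods = {
    name: forward_likelihood(initial, transition, views)
    for name, (initial, transition) in templates.items()
}
posteriors = template_posteriors(likelihoods)
```

In this toy case template "a", whose states tend to persist, explains two consecutive views of the same class much better than template "b", which illustrates how transitions and not only single views drive the decision.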

If, even after evaluating the probability of a view sequence corresponding to a complete turn around the object, the object is still unidentified, it is stored as a new template. Then, a new HMM is trained and included in the template database.

5 Virtual training

Even though the proposed recognition system is designed to work with an open database of real objects, it is interesting to have an initial set of virtual objects available for testing purposes. Virtual objects can be rotated, translated or scaled at will; hence, all view points are available and a complete object model can be constructed.

Fig. 3 presents a set of virtual 3D objects used for tests in different conditions. This database includes varied objects which present similar views.

Figure 3: 3D Object database in training order

Initially, the system has no knowledge about the database. Then, these objects are fed to the system in the training order shown in Fig. 3. Any time a new object cannot be recognized as similar to a previous template, a new model is acquired. An example of this process is shown in Fig. 4, where glass 3 is presented to the system for the first time. The first view of glass 3 is similar to one of the views of bottle 2. Hence, it presents an initial probability of being a bottle approximately equal to 0.7. However, as soon as the second view is captured, this probability drops to zero (Fig. 4.b) and the input object is stored as a new template. After its HMM is generated, glass 3 is fed again to the system to test whether it has been correctly learnt. It can be observed in Fig. 4.c that the probability of the input object of being glass 3 increases to 0.9 after only two views. Besides, Fig. 4.c also shows the probability of the input object of being a bottle. It can be observed that now the initial probability of being bottle 2 is lower than 0.1 because the system already knows what glass 3 is. The rest of the objects are subsequently presented to the system and classified either as a new object or as variations of a stored template.

It is important to note that, since some of the objects were chosen to present similar shapes, not every one of them generates a new template. For instance, box 1 and the TV set are both recognized as cubes, whose template was the first model learnt; thus, they do not generate models by themselves. This fact suggests that the database composition depends not only on the set of objects chosen for training but also on the order in which these objects are presented to the system. The following experiment is aimed at testing the learning behaviour of the system when several similar objects are studied. For this test, four cylindrical objects have been chosen. They will be referred to as cylinders a, b, c and d, following the labels in Fig. 5.

The system is forced to keep all the already learnt models of the objects in Fig. 3, except those corresponding to cylinders 1 and 2, which have been erased for this test. The cylinders in Fig. 5 are presented to the system successively. Every time the system trains a new model, all four cylinders are presented several times at random orientations in order to register a recognition rate for every one of them with respect to the models. This means that, exceptionally during this test, the automatic learning process is not allowed until recognition rates are registered. The results of these experiments are shown in Figs. 6 and 7.

Figure 4: Glass 3 training sequence: a) consecutive views of bottle 2 and glass 3; b) probability of glass 3 of being bottle 2 before training glass 3; and c) probability of glass 3 of being bottle 2 and glass 3 after training glass 3

Figure 5: Cylindric objects set

If the first acquired model corresponds to cylinder b, which has an intermediate shape, all four cylinders present relatively high recognition rates as similar to b (Fig. 6.a). The lowest recognition rate in this case corresponds to cylinder d; therefore, the next allowed training process will learn a model for this object. It can be observed (Fig. 6.b) that the recognition rates for cylinders a and b remain at the same values, while the total recognition rate of cylinder c drops from 86% to 74%. This is because, as the system already knows more cylinder models, it begins to hesitate between templates b and d and, in some trials, does not settle on a unique recognition result. If a new model for cylinder c is acquired, the complete set of cylinders can be classified (Fig. 6.c) with only three templates.

The test can be repeated with a different training order. As can be observed in Fig. 7, when the first acquired model is that of cylinder a, a complete classification of the cylinder set can only be achieved with four templates.

It is important to note that an increasing number of models of similar objects generates the need for new models of the same class, thus providing the system with a higher sensitivity to distinguish among them. This behaviour is actually similar to human learning: a person develops the skill to distinguish subtle differences when he has deep knowledge of an area, while he tends to simplify in an unknown environment.

Figure 6: Recognition rates for cylindrical objects in training order

Figure 7: Recognition rates for cylindrical objects in training order

6 3D Object Recognition

In order to prove the validity of the proposed recognition system against distortion, noise or segmentation errors, two different groups of experiments have been performed: i) recognition trials with distorted objects; and ii) recognition trials with real objects. Both of them are different aspects of the same recognition problem: how to deal with the variability of different samples of the same class of objects. This variability can be artificially induced, or be the result of a poor real image segmentation process. It is also interesting to develop the possibility of classifying unknown objects as similar to stored templates, as in the cylinders example of section 5, thus saving storage memory and computational load. The proposed system is especially suitable to deal with this kind of problem, since it does not put a high weight on one to one shape similarity, but takes advantage of the sequentiality of views to evaluate overall 3D similarity. For the following tests the system database has been extended with household objects as well as some chairs (Fig. 8). Hence, the database includes simple as well as complex objects.

Figure 8: Database of additional objects in training order

6.1 Recognition Trials with Distorted Objects

One of the challenges of object recognition and classification is to deal with distorted and noisy images. The feature vector chosen for view representation has proven to be resistant against transformations and noise, as well as mild distortions. Also, Hidden Markov Models are especially suitable for dealing with corrupted views.

Fig. 9 presents a 3D glass and three distorted versions of it. The view clustering maps for the four objects are presented beneath them. The system already has a template for the glass in Fig. 9.a. It has no distortion or noise; hence, its cluster map shows a strong cylindrical symmetry. The glass in Fig. 9.b is corrupted with noise, and so its cluster map also shows random noise. Although the glass in Fig. 9.c is linearly deformed, its cluster map does not show any change, which shows the high resistance of the FV based representation against this deformation. The glass in Fig. 9.d is partially occluded. It can be observed that this occlusion affects the cluster map only at some views. Fig. 9.e shows the probability of the distorted objects of being glass a over a sequence of 5 views. In most cases the second view already provides a correct recognition result; however, all 5 views are shown for comparison purposes. When a distorted view is not recognizable, as happens with glass d, the probability of being glass a decreases. Nevertheless, subsequent captures allow the probability to recover until the object is correctly recognized.

Figure 9: a) Glass template; b-d) Distorted glasses and their corresponding cluster layout maps; and e) probability of each distorted object of being glass 3 of Fig. 3

Figure 10: Recognition steps for a) distorted version of object in Fig. 8.c; and b) occluded version of object in Fig. 8.c

Further examples are presented in Fig. 10. In Fig. 10.a, a distorted version of the chair in Fig. 8.c is presented to the system. When the first view is analyzed, there are four candidate matches: templates c, g, h and i presented in Fig. 8. It can be observed that, at this step, the probability of the correct match is not the highest one. Nevertheless, only the most similar templates are proposed as matches. Further views accumulate recognition probability for the correct match, which is achieved after three views have been studied.

Fig. 10.b presents an example where the chair in Fig. 8.c has been partially occluded by a second object. Thus, the acquired views include the original chair plus an undesired protuberance. In this case, the overall similarity between the deformed shape and the template prototypes is enough to point out the correct template as the most probable one already at the first view. Nevertheless, as can be observed in Fig. 10.b, the system also considers templates of chairs with arms. This is reasonable because the added protuberance could be confused with a chair arm. As in the previous cases, rather than one to one similarity between views, evidence accumulation is the key to achieving a correct result [de Trazegnies et al. 2003a] [de Trazegnies et al. 2003b].

6.2 Recognition trials with real objects

After the virtual training stage was complete, we manually captured several sequences of views from real objects, similar to the trained prototypes, in an indoor environment. As this work does not focus on segmentation methods, we have obtained real shapes by a simple off-line background subtraction process. It is important to note that, even under these conditions, real shapes are affected by capture noise and segmentation errors. Also, the selected points of view for real image acquisition are manually fixed, so images are captured at different distances and they lack angular precision. Points of view are not equally spaced either. Fig. 11 shows some real views used for the examples in this section.

Figure 11: Real sequences of input images: a) TIPPEX bottle; b) tennis ball tube; c) cup 1; d) cup 2; e) chair 1; and f) chair 2

Fig. 12 shows a comparison between the probability evolution for two real objects and for the respective most similar objects in the database. In order to achieve comparable conditions, the virtual object results correspond to camera orientations as close as possible to the real ones. In Fig. 12.a, a TIPPEX bottle is presented to the system and is recognized as bottle 1 of Fig. 3. Initially, the probabilities of being bottle 1 differ significantly between the virtual bottle 1 and the TIPPEX bottle. This occurs because the first views of both bottles clearly show differences between their necks. However, after three steps, it is concluded that they belong to the same class. In this case, the smaller size of the TIPPEX bottle is not considered as a factor to distinguish it from bottle 1, because the system has been designed to dismiss scale changes.

Figure 12: Recognition steps for a) real bottle in Fig. 11.a; b) real tennis ball tube in Fig. 11.b

A similar experiment is presented in Fig. 12.b. The tennis ball tube in Fig. 11.b is compared with a cylinder. It can be seen that there are significant changes from each real capture to the next. Also, the fourth capture is out of order in the sequence, to test whether the system can recover from such errors. In this case, it is difficult for the system to make a choice until the fifth view. This occurs because the cylinder views present an artificial straight projection, which differs from the real perspective projection. Nevertheless, the evolution of the views is significant enough to finally conclude that the object belongs to the cylinder class.

Figs. 13 and 14 show recognition trials of the real objects in Fig. 11.c-f. These figures present the probability evolution of all templates considered as candidates by the system at every sequence step. The cup in Fig. 11.c is quite similar to that in Fig. 8.a. As can be observed in Fig. 13.a, its first view presents a cylindrical shape with a lateral deformation. This shape could belong to the cup in Fig. 8.a but also to a deformed version of cylinder 1 in Fig. 3. The second view presents a shape that could also be interpreted either as a cup or as a cylinder. Nevertheless, the system classifies it correctly as belonging to the template of cup a. It must be noted that, even if every single view can be related to several objects, the transition probability between sequential views can decide the result of the recognition process.
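How transitions between views can settle a decision that single views leave open can be sketched with the standard forward algorithm for discrete HMMs [Rabiner 1989]. The two models below are hypothetical: they share the same start and emission probabilities, so any single view is equally likely under both, and they differ only in their transition matrices. All states, symbols and numbers are illustrative assumptions, not values taken from the thesis.

```python
def forward_likelihood(start, trans, emit, observations):
    """Likelihood of a symbol sequence under a discrete HMM (forward algorithm).

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of observing symbol o in state s
    """
    n_states = len(start)
    # Initialisation with the first observation.
    alpha = [start[s] * emit[s][observations[0]] for s in range(n_states)]
    # Induction over the remaining observations.
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical setup: two pose states, two view symbols
# (0 = plain outline, 1 = outline with a lateral bulge).
START = [0.5, 0.5]
EMIT = [[0.2, 0.8],
        [0.9, 0.1]]
TRANS_A = [[0.9, 0.1],   # model A tends to stay in the same pose
           [0.1, 0.9]]
TRANS_B = [[0.1, 0.9],   # model B tends to alternate between poses
           [0.9, 0.1]]

# A single view cannot separate the two models...
single_a = forward_likelihood(START, TRANS_A, EMIT, [1])
single_b = forward_likelihood(START, TRANS_B, EMIT, [1])

# ...but two consecutive views can, through the transition term alone.
pair_a = forward_likelihood(START, TRANS_A, EMIT, [1, 1])
pair_b = forward_likelihood(START, TRANS_B, EMIT, [1, 1])

print(single_a == single_b, pair_a > pair_b)
```

In this toy setting the first view is uninformative by construction, and the sequence of two bulged outlines favours the model whose transitions make that pair of views plausible.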

Figure 13: Recognition steps for a) real cup in Fig. 11.c; b) real cup in Fig. 11.d

Figure 14: Recognition steps for a) real chair in Fig. 11.e; b) real chair in Fig. 11.f

In Fig. 13.b another real cup is presented to the system. In this case, the first two views are similar to both cups in Fig. 8.a and b. The object is recognized as cup b only after the third view is analyzed, because in this view its peculiar handle is clearly visible. It can be observed that the cup handle is a distinguishing feature in the recognition of cups a or b, as it should be in an intuitive definition of a cup.

The real chair in Fig. 11.e is presented for recognition (Fig. 14.a). It is similar mainly to templates c, f and g in Fig. 8. All of them have similar legs and a similarly shaped back. It can be observed that already at the second view, where the legs are clearly visible, template c, which has a high overall similarity with the presented chair, is pointed out as the unique result.

Finally, Fig. 14.b presents the recognition trial for the real chair in Fig. 11.f. It is important to note that in this case the segmentation process produced significant deformations. After two observed views, template d appears to be the most probable, because the real chair leg is similar to the back of this template. After processing the third view, the recognition result is template h, the only one with a seat, back, legs and arms similar to those of the observed chair.

Since real input objects do not have exact virtual templates, results cannot be considered exact matches. However, it must be noted that results are always objects of the same class as the observed ones and would also be considered similar by human perception. Therefore, as commented in section 5, the system does not need to start a new template training for every single object. It can take advantage of the overall shape similarity between stored templates and observed objects [de Trazegnies et al. 2003b] [de Trazegnies et al. 2003a].

7 Conclusions and Future Work

In this thesis, a 3D object recognition algorithm has been presented. The system relies on representing different views of a given object by means of a reduced feature vector extracted from the curvature of its contour. Given a sequence of views represented by a sequence of the aforementioned vectors, an object can be recognized by using Hidden Markov Models. The system has been tested on a variety of 3D objects, both artificial and real. Tests showed that similar objects were clustered together despite mild distortions, deformations, slight segmentation errors, and contour noise. It must be noted that if two objects present a very similar shape, they are clustered together as well. It is important to note that the system is not designed to work with a pre-learnt database. Instead, it can learn objects on-line.

The main advantages of the proposed method are: i) it represents 3D objects with a reduced data volume; ii) input objects can be recognized in real time; iii) objects which appear to be similar to human perception are grouped together, in spite of slight differences among them; iv) the pose of the object is implicitly estimated by the same algorithm; and v) new models are learnt and included in the database in an unsupervised way.

Future work will focus on implementing a gaze control technique in order to recognize input objects as efficiently as possible. This task will rely on probabilistic analysis of the HMM associated with each given object. Also, new clustering techniques will be analysed so that a given object is always classified in a unique cluster regardless of the number of objects.

References

G. Agam and I. Dinstein. Geometric separation of partially overlapping nonrigid objects applied to automatic chromosome classification. IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(11):1212–1222, 1997.

C. M. Cyr and B. B. Kimia. A similarity based Aspect-Graph approach to 3D object recognition. Int. Journal of Computer Vision, 57(1):5–22, 2004.

C. de Trazegnies, J. Bandera, C. Urdiales, and F. Sandoval. A real 3D object recognition algorithm based on virtual training. In IASTED Conference on Signal Processing, Pattern Recognition and Applications (SPPRA 2003), pages 342–347, Rhodes, Greece, July 2003a.

C. de Trazegnies, C. Urdiales, A. Bandera, and F. Sandoval. 3D object recognition based on curvature information of planar views. Pattern Recognition, 36(11):2571–2584, 2003b.

C. de Trazegnies, C. Urdiales, A. Bandera, and F. Sandoval. A Hidden Markov Model object recognition technique for incomplete and distorted corner sequences. Image and Vision Computing, 21(10):879–889, 2003c.

C. de Trazegnies, C. Urdiales, and F. Sandoval. Curvature based image recognition using Hidden Markov Models. Electronics Letters, 40(20):1258–1260, 2004.

B. A. Draper, U. Ahlrichs, and D. Paulus. Adapting object recognition across domains: A demonstration. Lecture Notes in Computer Science, 2095:256–267, 2001.

E. Klassen, A. Srivastava, W. Mio, and S. H. Joshi. Analysis of planar shapes using geodesic paths on shape spaces. IEEE Trans. on Pattern Analysis and Machine Intelligence, 26(3):372–383, 2004.

H. Murase and S. K. Nayar. Visual learning and recognition of 3D objects from appearance. Int. J. of Computer Vision, 14:5–24, 1995.

P. Natarajan, Z. Lu, R. Schwartz, I. Bazzi, and J. Makhoul. Multilingual machine printed OCR. International Journal of Pattern Recognition and Artificial Intelligence, 15(1):43–63, 2001.

L. R. Rabiner. A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.

P. L. Rosin. Representing curves at their natural scales. Pattern Recognition, 25(11):1315–1325, 1992.

C. A. Rothwell, A. Zisserman, D. A. Forsyth, and J. L. Mundy. Planar object recognition using projective shape representation. International Journal of Computer Vision, 16(1):57–99, 1995.

L. Sirovich and R. Everson. Analysis and management of large scientific databases. Int. J. of Supercomputing Applications, 6(1):50–68, 1992.

S. Startchik, R. Milanese, and T. Pun. Projective and illumination invariant representation of disjoint shapes. In Proc. of the Fifth European Conference on Computer Vision (ECCV ’98), page 264, Freiburg, Germany, 1998.

C. Urdiales, C. de Trazegnies, A. Bandera, and F. Sandoval. Corner detection based on adaptively filtered curvature function. Electronics Letters, 39(5):426–428, 2003.

D. Zhang and G. Lu. Review of shape representation and description techniques. Pattern Recognition, 37(1):1–19, 2004.