
PROCEEDINGS OF THE IEEE, VOL. 67, NO. 5, MAY 1979

Computer Analysis of Scenes with Curved Objects

RAMESH JAIN AND J. K. AGGARWAL, FELLOW, IEEE

Abstract-Most research efforts in scene analysis have concentrated on the analysis of block-world scenes. Having developed a good understanding of this limited world of computer vision, researchers are now trying to make computers see curved objects also. This paper presents an overview of the techniques developed for segmentation, representation, and recognition of curved objects in two-dimensional images and in three-dimensional scenes. The possible future directions of research are also discussed.

I. INTRODUCTION

COMPUTER vision systems are finding an increasing number of applications in diverse fields of science and technology. A general computer vision or scene analysis system should be capable of recognizing objects and the three-dimensional relationships between them in a given scene. In most cases, a scene analysis system is required to recognize three-dimensional objects and their three-dimensional spatial relationships on the basis of a two-dimensional representation of the scene, called an image or a picture. In general, a scene analysis system should have at least the following capabilities: 1) image processing (or preprocessing) and segmentation; 2) object recognition; and 3) spatial analysis.

From a given image of the scene the system has to extract the relevant information and discard the irrelevant information. Various operations performed for accomplishing this come under image processing. Segmentation is the process of partitioning an image into meaningful parts such that all points belonging to a part have some common property or can be represented using a mathematical or logical predicate. Object recognition in scene analysis is the process of assigning a name to a picture part on the basis of its similarity with prestored models of objects. Spatial analysis is the process of establishing the two-dimensional relationships of the image parts or the three-dimensional relationships of the objects in a scene.

Most scene analysis research until now has concentrated on the study of block-world scenes, especially scenes containing only polyhedral objects. This was justified as it appeared to be a hard problem to design systems for the analysis of real world scenes. Thus before attacking a completely unknown problem, it was considered to be a good practice to understand some of the problem solving processes involved in scene analysis by gathering experience through experimentation with a simplified problem. Many systems were designed for understanding scenes containing polyhedral blocks and some of them were fairly successful. In fact, efforts by Roberts [52], Guzman [22], Mackworth [31], Huffman [27], Waltz [58], and others have produced a deep understanding of scenes containing polyhedral blocks. In the early seventies there was a sufficiently rich collection of experience in this limited world of computer vision to provide incentive for the researchers to develop systems to deal with more complex domains.

Manuscript received May 8, 1978; revised August 7, 1978. This research was supported by the National Science Foundation under Grant ENG-74-04986.

The authors are with the Department of Electrical Engineering, University of Texas, Austin, TX 78712.

The next natural step in the scene analysis research after the scenes containing polyhedral objects is the analysis of scenes containing curved objects from their images. Some efforts have been made in this direction; however, so far no exciting results, comparable to early results of the work on polyhedra, have been obtained. It appears that the representation of polyhedral objects was easy in comparison to the representation of curved objects. In other words, in working with scenes containing polyhedral blocks it was not difficult to devise a good representation in terms of edges, vertices, and surfaces of the objects for the scene and then to analyze the scene. The representation of curved objects poses many problems, and although several methods for representing curved objects have been proposed, the search for a powerful representation is still continuing. Contrary to earlier expectations, most of the useful results of the polyhedral domain cannot be directly extended to the curved object domain.

Another direction followed for the analysis of scenes containing curved objects is to use depth information. Thus analysis is carried out, not on the basis of a two-dimensional projection of the three-dimensional scene, but on the basis of three-dimensional (two-dimensional image and depth) information about the scene. Some interesting results are available in this area [2]-[4], [18], [44], [45], [55]. Techniques also have been proposed for obtaining three-dimensional shape information using the two-dimensional image [9], [25], [26], [30]. The concept of a 2.5-dimensional image is being developed [32].

In this paper, an overview of the state of the art for the analysis of curved objects is presented. Our aim is to indicate the important trends in this area, not to present a detailed or exhaustive survey of the area. Since it was expected that an understanding of the scene analysis techniques for polyhedra would be helpful in the curved object domain, in Section II we present the main results for scenes containing polyhedral objects. In Section III we give techniques developed for the analysis of a scene containing curved objects from its image. In Section III-A, methods for segmentation of object edges are discussed. Analysis of scenes containing polyhedral objects is simplified by the use of junction properties. In Section III-B, we discuss the problems arising due to curved surfaces, such as ambiguities in the junction properties. In Section III-C, we discuss methods for finding three-dimensional shape information from image intensities. The techniques used for object recognition are given in Section III-D. Segmentation and recognition of objects in three dimensions are also receiving attention from researchers in computer vision and compose the subject of Section IV. The method of object representation employed by a vision system plays a very important role in the success of the system. Many methods have been proposed for representing a three-dimensional object such that its reliable recognition from its two-dimensional image will be possible. The problem of representation of three-dimensional objects is discussed in Section V. Finally, we indicate some important areas for further research in Section VI.

0018-9219/79/0500-0805$00.75 © 1979 IEEE

II. POLYHEDRAL BLOCKS WORLD

Roberts [52], in his pioneering paper, considered scene analysis to comprise three main processes.

1) The input process produces a line drawing from a photograph.
2) The construction process produces a three-dimensional object list from the line drawing.
3) A display process produces a two-dimensional projection of the object from any point of view.

Using three basic block models (a cube, a wedge, and a hexagonal prism), his program could represent any complex polyhedron in terms of these basic blocks (constituent parts). In fact, Roberts designed the first complete scene analysis system and his paper has been one of the most influential papers on machine perception.

Guzman's [22] SEE program demonstrated that it is possible to separate the objects (which may be occluded and hence not completely visible) in a scene without any knowledge about these objects. The basic idea demonstrated by Guzman was to make global use of information collected locally at each vertex. Though Guzman's SEE was quite successful, it relied on heuristic techniques. Huffman [27] and Clowes [15], working independently, provided a solid theoretic foundation for the analysis similar to that of Guzman. Huffman considered impossible objects (those objects which cannot exist physically) and argued that images of impossible objects can be useful in giving insight into the constraints of grammatical rules associated with the language of images. The incompatibilities among the various portions of images of impossible objects are a novel way of testing the image analysis procedures. The works of Huffman and Clowes resulted in the beginning of concentration of attention on junctions in a given image and the study of the nature of the junctions. Several common junctions which occur in images of polyhedral blocks are shown in Fig. 1.

Waltz's work [58], though related to junction labeling, gave new directions to the research in this area. Waltz considered polyhedral objects with shadows. He showed that out of a large number of possible configurations of junctions, the physical world severely constrains the way lines and vertices can fit together in line drawings. He showed further that deciding if a particular line in a drawing is a shadow, crack, obscuring edge, or internal seam can be done in a way analogous to solving a set of algebraic equations.
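The constraint idea described above can be illustrated with a small filtering loop. The sketch below is not Waltz's program and does not use the real junction catalog: the junction names, line names, labels, and the tiny candidate sets are all hypothetical, and the only rule enforced is that a line shared by two junctions must receive the same label at both ends. Candidate labelings that have no compatible partner at a neighboring junction are discarded until nothing changes.

```python
def waltz_filter(junctions, candidates):
    """Discard junction labelings that no neighboring junction can agree with.

    junctions:  dict mapping junction name -> tuple of line names meeting there.
    candidates: dict mapping junction name -> set of label tuples, one label
                per line, in the same order as in `junctions`.
    Both structures are illustrative assumptions, not the published catalog.
    """
    candidates = {name: set(labels) for name, labels in candidates.items()}
    changed = True
    while changed:
        changed = False
        for a, lines_a in junctions.items():
            for b, lines_b in junctions.items():
                if a == b:
                    continue
                shared = [line for line in lines_a if line in lines_b]
                if not shared:
                    continue
                keep = set()
                for label_a in candidates[a]:
                    # Keep label_a only if some labeling of b agrees with it
                    # on every line the two junctions share.
                    for label_b in candidates[b]:
                        if all(label_a[lines_a.index(line)] ==
                               label_b[lines_b.index(line)] for line in shared):
                            keep.add(label_a)
                            break
                if keep != candidates[a]:
                    candidates[a] = keep
                    changed = True
    return candidates


# Hypothetical three-junction drawing; "+" and "-" stand for convex and concave.
toy_junctions = {"A": ("ab", "ac"), "B": ("ab", "bc"), "C": ("ac", "bc")}
toy_candidates = {
    "A": {("+", "+"), ("-", "-")},
    "B": {("+", "-"), ("+", "+")},
    "C": {("+", "+")},
}
filtered = waltz_filter(toy_junctions, toy_candidates)
```

In this toy case the single labeling allowed at junction C propagates through the shared lines and leaves one consistent labeling at every junction, which is the behavior the text compares to solving a set of equations.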

Among these and many other systems developed for the analysis of scenes containing polyhedral objects (not discussed here), only Falk [20] and Roberts [52] tried to cope with the vagaries of real picture data.

The research in the area of polyhedral objects has resulted in a good understanding of noise-free scenes containing polyhedral objects. It was expected that many ideas from this domain would be applicable to more complex scenes. In the following sections, however, it will be found that the techniques developed for polyhedral blocks cannot be easily applied to curved objects.

Fig. 1. Common junction types in images of polyhedral blocks.

III. ANALYSIS OF TWO-DIMENSIONAL IMAGES

A. Segmentation of Object Edges

An object in a picture may be represented using either the surfaces forming it or using the borders of such surfaces. Surfaces are usually represented using regions. A region is a set of connected pixels with certain similar picture properties. There are many approaches to segmentation (for a survey of research in this area, see [59] and [51]). The borders or the edges are sets of pixels where some picture property changes. There exists a wealth of literature on edge finding in pictures (see [40] and [17] for a review of edge detection). We shall not consider methods for region growing or edge detection here.

Most scene analysis systems use edges for representing objects in scenes. In polyhedral objects, edges are always straight and hence are easily represented using their end points and slopes. With curved objects, representation of edges becomes a nontrivial problem. It is usually desirable to represent the edges after breaking them into meaningful pieces, say straight lines or circular arcs. This has two clear advantages: 1) representation becomes easy and compact; and 2) description is simplified, allowing more efficient matching at later stages.

There are many approaches to curve approximation [23], [39], [41], [46]-[49], [54], [56]. We consider here two curve approximation schemes used in the research on curved objects.

Let ψ be the angle which the tangent to the curve makes with a fixed direction, and s be the distance along the arc from the beginning. If we plot ψ as a function of s, then straight-line sections of the curve give rise to horizontal segments in the plot and circular sections give rise to sloped straight-line segments in the ψ-s plot, since ψ changes at a constant rate along a circular arc. This transformation is very efficient in capturing the features of a contour. The most important feature of the ψ-s curve is that it reduces the problem of fitting circular arcs in the picture to the simpler problem of fitting straight line segments to the ψ-s plot. Turner [56] used best fit lines to compute the curvature, orientation, and angle parameters of segments. After the initial segmentation of a curve, Turner [56] used one more pass over the segmented curve to resegment the curve in an attempt to make the overall segmentation consistent.

Fig. 2. A segmentation of the edges of a curved object using the technique of Martin and Aggarwal [39].
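The tangent-angle versus arc-length transformation described above can be tried on a sampled contour. The following is a minimal sketch, assuming the contour is available as an ordered list of (x, y) points; the function name `psi_s` and the choice of sampling the plot at segment midpoints are illustrative assumptions and are not taken from Turner [56] or the other cited schemes.

```python
import math


def psi_s(points):
    """Return (s_values, psi_values) for an open polyline of (x, y) points.

    psi is the direction of each segment, unwrapped so that it varies
    continuously along the curve; s is the arc length at the segment midpoint.
    """
    s_values, psi_values = [], []
    s, psi, previous_raw = 0.0, None, None
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        raw = math.atan2(dy, dx)
        if psi is None:
            psi = raw
        else:
            # Smallest signed turn between successive segments, in [-pi, pi).
            turn = (raw - previous_raw + math.pi) % (2 * math.pi) - math.pi
            psi += turn
        previous_raw = raw
        length = math.hypot(dx, dy)
        s_values.append(s + length / 2)
        psi_values.append(psi)
        s += length
    return s_values, psi_values


# A circular arc of radius 5: psi grows linearly with s, with slope close to 1/5.
radius = 5.0
arc = [(radius * math.cos(0.05 * k), radius * math.sin(0.05 * k)) for k in range(61)]
arc_s, arc_psi = psi_s(arc)
arc_slope = (arc_psi[-1] - arc_psi[0]) / (arc_s[-1] - arc_s[0])
```

A straight run of points yields a constant ψ, while the arc yields a line whose slope approximates the curvature, so fitting straight segments to the plot separates the two kinds of contour pieces.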

McKee and Aggarwal [41] observed that if a chain code representation of a curve is modified so that its graph against arc length is continuous, then circular arcs of the image result in straight graph lines of slope proportional to the curvature of the arc. Martin and Aggarwal [39] used an equivalent method where the total angle subtended since the starting point is graphed against arc length. When plotting the graph of the code versus arc length, care is taken to rectify distortion due to the rectilinear nature of the digitization grid. The effect of the choice of the starting point on the line fitting process is eliminated. The resulting graph is smoothed by averaging each code value over a nine point window. A standard mean-square-error technique is used to fit the straight lines. A result of such segmentation is shown in Fig. 2.
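The processing chain in the preceding paragraph (continuous chain code, arc length, nine point smoothing, least-squares line fit) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes an 8-direction chain code with values 0 to 7, uses unit length for even codes and √2 for odd codes as one simple grid correction, and shrinks the averaging window near the ends of the curve. All function names are hypothetical.

```python
import math


def unwrap_chain_code(codes):
    """Make an 8-direction chain code continuous by removing jumps of 8."""
    unwrapped = [codes[0]]
    for code in codes[1:]:
        step = (code - unwrapped[-1] + 4) % 8 - 4  # signed step in [-4, 3]
        unwrapped.append(unwrapped[-1] + step)
    return unwrapped


def arc_lengths(codes):
    """Cumulative arc length; diagonal moves (odd codes) count as sqrt(2)."""
    lengths, total = [], 0.0
    for code in codes:
        total += math.sqrt(2) if code % 2 else 1.0
        lengths.append(total)
    return lengths


def smooth(values, window=9):
    """Moving average; the window is truncated near the two ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def fit_line(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


codes = [6, 6, 7, 7, 0, 0, 1, 1, 2, 2]          # a gently turning contour
graph = smooth(unwrap_chain_code(codes))        # continuous, smoothed code
slope, intercept = fit_line(arc_lengths(codes), graph)
```

The fitted slope is in code units per unit arc length, so it is proportional to curvature in the sense used in the text; a complete segmenter would additionally decide where one fitted line ends and the next begins.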

Both methods, Turner [56] and Martin and Aggarwal [39], use smoothing of ψ-s curves by arbitrarily selecting seven and nine point windows, respectively. It seems that the segmentation will depend, for small segments, on the window size. Thus it would be interesting to study the effect of the window size on segmentation. Excepting this minor problem, both schemes perform well. Turner has the advantage of better segmentation due to a second pass over the first segmentation. A similar thing can be done with the scheme of Martin and Aggarwal. On the other hand, Turner's segmentation may be inferior to that of Martin and Aggarwal since Turner uses only four directions as against eight used by Martin and Aggarwal. It would be interesting to see the curves reproduced from the smoothed representation of a given curve. If one finds some method for good reproduction of curves from their smoothed ψ-s diagram, then the utility of these representations will certainly be enhanced.

Perkins [48], [49] uses a similar method for approximation of edges. His program examines the curvature of the connected edge points and then looks for abrupt changes in curvature to set the initial groupings. The plot of edge points in ψ-s space has the same properties as discussed above.

Fig. 3. Common junction types in images of curved objects.

There are several more techniques for approximation of curves [46], [47] but they have not been used for analysis of scenes containing curved objects.

Some researchers [10], [21] suggest that after an initial coarse segmentation, the scene analysis system should make use of models of the objects in refining the segmentation of the image. This heterarchical approach is attractive and may be useful for the analysis of scenes containing objects from a library of a few known objects. For analysis of scenes which may contain an object from a library of a moderately large number (say, 100) of objects, the computational requirements seem to become prohibitive.

B. Role of Junctions in Analysis

Most systems for analysis of scenes containing only polyhedral objects exploited the information contained in a junction. In fact, the main task of these systems was to analyze junctions to get consistent labels for each junction in the image. It seems that junctions may also be useful for the analysis of scenes containing curved objects.

The junctions in curved objects are not as clearly defined as in polyhedral objects. Common junction types in images of curved objects are shown in Fig. 3. It is expected that the possible number of junctions for curved objects will far exceed the possible number of junctions with polyhedral objects. However, it is found that the considerable increase in number of labels that curved objects cause is to some extent offset by the greater variety of junctions. Although there are more labels, they are spread over more junction types. This partially, but not completely, attenuates the increased combinatorics of the analysis. Turner used junctions in the analysis of scenes containing curved objects. He found that the analysis can be carried out with ambiguous junction classifications. The curved objects in the scenes analyzed, however, had regular surfaces. The utility of this method for irregular surface objects remains to be seen.

Chang [14] developed a system for the analysis of objects with curved surfaces. His system used the properties of junctions extensively. His system takes as input a list of vertices in a two-dimensional line drawing of a scene of bodies which have both flat and curved surfaces. The objects considered in the scenes were far from complex real world objects and the system did not face the problem of imperfect edges.

The biggest difficulty with junction analysis for curved objects is the fact that a line may undergo change in its interpretation. Labeling techniques exploit the fact that the label of a line is consistent at both of its ends. In curved objects, particularly with concave objects, this line-label consistency is not always satisfied. This makes the labeling process very difficult in that processes like that of Waltz cannot be applied. Recently, Chakravarty [13] has introduced a line and junction labeling scheme, which also is claimed to be valid for curved surface bodies. He defines an imaginary junction as a junction formed by the intersection of a concave curved line with an invisible virtual edge. This imaginary junction is helpful in analyzing partially occluded concave surfaces, which result in a change in interpretation of a line. He analyzed isolated curved objects, but did not show whether this scheme could be extended to the analysis of scenes containing more than one curved object.

Thus it appears that junctions may help in interpretation of a scene containing curved objects but their role will be very much reduced. If recognized correctly, T junctions still play an important role as a strong indicator of occlusion (or self-occlusion of the object). It is interesting to note that even after a very strong emphasis on junctions in polyhedral scenes, most systems for scenes containing curved objects have not given comparable importance to junctions.

C. Shape Information from Intensities

Krakauer [30] represents an image as an intensity-region tree. If p(t) is the set of image points of intensity t or greater then, since p(t') ⊆ p(t) if t' > t, the regions fall into a tree structure based on this subset relation. An intensity contour map of an object may be easily obtained from the intensity region tree. Krakauer showed that from the intensity region tree of an image, one may extract information about both the surface properties and the shape of an object.
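The nesting of thresholded point sets that underlies the intensity region tree can be made concrete with a small example. The sketch below is an illustration only and is not Krakauer's algorithm: it assumes a small integer image stored as a list of rows, splits each thresholded set into 4-connected regions (an assumption about how regions are formed), and links every region to the region at the next lower threshold that contains it. The names `components` and `intensity_region_tree` are hypothetical.

```python
def components(pixels):
    """Split a set of (row, col) pixels into 4-connected components."""
    remaining = set(pixels)
    found = []
    while remaining:
        seed = remaining.pop()
        stack, region = [seed], {seed}
        while stack:
            r, c = stack.pop()
            for neighbor in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    region.add(neighbor)
                    stack.append(neighbor)
        found.append(frozenset(region))
    return found


def intensity_region_tree(image):
    """Return a list of nodes {threshold, pixels, parent}; parent is an index."""
    levels = sorted({value for row in image for value in row})
    nodes, previous = [], []
    for t in levels:
        at_least_t = {(r, c) for r, row in enumerate(image)
                      for c, value in enumerate(row) if value >= t}
        current = []
        for region in components(at_least_t):
            # A region at threshold t lies inside exactly one region at the
            # previous (lower) threshold, because the point sets are nested.
            parent = next((i for i in previous if region <= nodes[i]["pixels"]), None)
            nodes.append({"threshold": t, "pixels": region, "parent": parent})
            current.append(len(nodes) - 1)
        previous = current
    return nodes


image = [
    [1, 1, 1, 1, 1],
    [1, 3, 1, 2, 1],
    [1, 1, 1, 1, 1],
]
tree = intensity_region_tree(image)
```

For this image the root is the whole picture at threshold 1, two separate bright spots appear as its children at threshold 2, and the brightest pixel forms a leaf under one of them at threshold 3.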

Horn [25], [26] utilized the fact that the intensity at a point in an image is the product of the reflectance at the corresponding object point and the intensity of illumination at that point. Thus a surface in a scene having uniform reflectance will result in an image whose intensity array may be used to determine the spatial positions of the points of that surface. However, this cannot be done using local operations alone. Horn [25] showed that the reflectivity and the gradient of the surface are related by a nonlinear first-order partial differential equation in two unknowns. He developed a method for solving this equation and thus developed a method to find the shape of a smooth opaque object from the intensity measured in an image. Later [26] he showed that the shape of an object may be obtained without solving nonlinear differential equations if one works in gradient space.
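The remark that shape cannot be recovered by local operations alone can be illustrated in gradient space. The sketch below assumes a Lambertian (ideal matte) surface lit by a distant point source, which is a common textbook model and is an assumption here rather than something stated in the text; p and q denote the surface slopes in the x and y directions, (ps, qs) is the gradient of a surface patch facing the source, and the function name is hypothetical.

```python
import math


def lambertian_reflectance(p, q, ps, qs):
    """Normalized brightness of a patch with gradient (p, q).

    Assumes a Lambertian surface and a distant source whose direction
    corresponds to the gradient (ps, qs); patches facing away return 0.
    """
    numerator = 1.0 + p * ps + q * qs
    denominator = math.sqrt(1.0 + p * p + q * q) * math.sqrt(1.0 + ps * ps + qs * qs)
    return max(0.0, numerator / denominator)


# With the source directly overhead (ps = qs = 0), two differently oriented
# patches produce exactly the same brightness.
tilted_in_x = lambertian_reflectance(1.0, 0.0, 0.0, 0.0)
tilted_in_y = lambertian_reflectance(0.0, 1.0, 0.0, 0.0)
```

Because a single brightness value is shared by a whole contour of gradients, one intensity measurement constrains (p, q) to a curve rather than a point, which is why additional constraints across neighboring points are needed.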

Barrow and Tenenbaum [9] start with the light intensity values and prepare images, called "intrinsic images," for each intrinsic characteristic, such as surface reflectance, surface orientation, and incident illumination. The intrinsic images are in registration with the corresponding input image. Barrow and Tenenbaum believe that for a point of an image the intensity value encodes all the mentioned intrinsic attributes of the corresponding scene point. The problem of decoding the intensity values to obtain intrinsic attributes is a hard problem. Barrow and Tenenbaum believe that by exploiting clues from various physical phenomena in conjunction with the physics of imaging it may be possible to recover the intrinsic images from the input image. Their belief is supported by their experience with an experimental domain. This domain may be viewed, "as an approximation of a world of colored Play-Doh objects in which surfaces are smooth, reflectance is uniform for each object, there is outdoor illumination, and the scene is imaged by a TV camera."

D. Object Recognition in Images

Object recognition is the process of matching the description of the object to a model of the object and thus (possibly) assigning a suitable name to the description. The complexity of the object recognition process depends on the deviation allowed in the object views from the stored models, on the number of different objects to be recognized, and on the nature of the objects to be recognized. Obviously, it is more difficult to design a recognizer for many objects differing only slightly from each other, than for quite different objects. It will be easier to design a recognizer for a car or motorcycle and a pedestrian than for a particular style of car. The complexity increases with the details to be considered.

In general, it seems to be a good strategy initially to generate hypotheses from a coarse analysis and then to use these initial hypotheses for the guidance in detailed matching of the object descriptions to the models [29]. If the coarse analysis results in elimination of many objects of the library as a possible match, subsequent matching will require considerably less work. However, in most of the nontrivial situations, finding a method for good coarse analysis is not simple.

McKee and Aggarwal [41] use the area of the region between the ψ-s curve for the model and for the unknown object as a measure of dissimilarity. A similar method has been used by Martin and Aggarwal [39]. It should be mentioned that in these works, care has been taken to make the ψ-s curve insensitive to translation, rotation, and starting point. Moreover, size changes can be accounted for easily. This method is certainly useful if only the outline of the object is important. In those cases where internal edges and internal details are important, this method does not seem to be very efficient. However, even in such situations, this method may be of use in coarse analysis. Usually it is easy to extract fairly accurate outlines of the object and in most situations, outlines are very helpful in narrowing down the search.
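An area-between-curves dissimilarity of the kind described above might be sketched as follows. The normalization used here is a substitute chosen for illustration and is not claimed to be the procedure of [41] or [39]: both closed outlines are assumed to be resampled to the same number of equally spaced points (which removes size), the 2π ramp of a closed curve and the mean are subtracted (which removes rotation), and the minimum over all circular shifts is taken (which removes the starting point). The function names are hypothetical.

```python
import math


def normalize(psi):
    """Remove the 2*pi ramp of a closed contour and the mean orientation."""
    n = len(psi)
    flattened = [value - 2 * math.pi * k / n for k, value in enumerate(psi)]
    mean = sum(flattened) / n
    return [value - mean for value in flattened]


def dissimilarity(psi_a, psi_b):
    """Approximate area between two normalized curves, minimized over shifts."""
    if len(psi_a) != len(psi_b):
        raise ValueError("both outlines must have the same number of samples")
    a, b = normalize(psi_a), normalize(psi_b)
    n = len(a)
    best = min(
        sum(abs(a[k] - b[(k + shift) % n]) for k in range(n))
        for shift in range(n)
    )
    return best / n  # arc length is normalized to 1, so each sample spans 1/n


n = 20
square = [0.0] * 5 + [math.pi / 2] * 5 + [math.pi] * 5 + [3 * math.pi / 2] * 5
circle = [2 * math.pi * k / n for k in range(n)]
# The same square, rotated by 0.3 rad and traced from a different start point.
moved_square = [
    square[(k + 7) % n] + (2 * math.pi if k + 7 >= n else 0.0) + 0.3
    for k in range(n)
]
same_shape = dissimilarity(square, moved_square)
different_shape = dissimilarity(square, circle)
```

The rotated and re-started square scores essentially zero against the original, while the circle scores clearly higher, which is the behavior a coarse outline matcher needs.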

Perkins [48], [49] uses concurves in matching. Concurves are very similar to ψ-s curves. As the concurves are not insensitive to translation, rotation, and starting point, part of a concurve is matched to start the general matching process. The objects are always of the same size. This method also has the same limitations as those of McKee and Aggarwal's scheme.

Guzman [23] defines a model as a generalized description of an object or a class of objects, with certain parameters left unspecified. He considers models of many types such as shape model, relation model, fixed model, rigid model, free model, sloppy model, etc. He uses a tree-search approach for modeling. His design is quite exhaustive, but its utility remains to be seen as his system [23] was presented to collect criticism and suggestions. Without results it is difficult to be sure about the correctness and effectiveness of the theories.

Barrow et al. [8] have been working with relational models. In [8], they stated, "The object recognition process is essentially one of abstraction where we say that a number of pictures all represent the same object, or possibly objects forming a class such as cups or chairs. Thus we must discard most of the information in the picture, and it is plausible that we should discard the metric information and keep only properties and interrelationships of the parts of the picture. Thus the distance between two lines may not be important but the fact that they are parallel may be." The matching process for such models is the process of finding a monomorphism from the picture description to the model. For this, two approaches are proposed. The first approach is an extension of the graph matching technique of Rastall [50] and the other is that of "hierarchical synthesis." The latter approach seems to be more attractive for complex scenes. In that method, not only is a set of known structures representing objects which might occur in the picture specified, but also a hierarchy of substructures of these is specified. The recognition proceeds by first finding the smaller substructures and then checking combinations of them to recognize larger known substructures in the picture. Application of these concepts to curved objects is demonstrated in [5], [7] and in Turner's thesis [56]. Turner implemented these concepts using POPLER, a language which is similar to PLANNER.
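Finding a monomorphism from a picture description to a model can be sketched as a small backtracking search. This is a generic illustration and not the extended graph matching technique of Rastall [50] nor hierarchical synthesis: relations are assumed to be stored as (name, part, part) triples, the cup model and part names are invented for the example, and the search simply assigns distinct model parts to picture parts so that every picture relation also holds in the model.

```python
def find_monomorphism(picture_parts, picture_relations, model_parts, model_relations):
    """Map each picture part to a distinct model part, preserving relations.

    Relations are sets of (relation_name, first_part, second_part) triples.
    Returns a dict picture_part -> model_part, or None if no mapping exists.
    """
    picture_parts = list(picture_parts)

    def consistent(mapping):
        # Every picture relation whose two parts are already mapped must
        # also be present between the corresponding model parts.
        for name, a, b in picture_relations:
            if a in mapping and b in mapping:
                if (name, mapping[a], mapping[b]) not in model_relations:
                    return False
        return True

    def extend(mapping):
        if len(mapping) == len(picture_parts):
            return dict(mapping)
        part = picture_parts[len(mapping)]
        for candidate in model_parts:
            if candidate in mapping.values():
                continue  # keep the mapping one-to-one
            mapping[part] = candidate
            if consistent(mapping):
                result = extend(mapping)
                if result is not None:
                    return result
            del mapping[part]
        return None

    return extend({})


# Hypothetical relational model of a cup and a two-part picture description.
cup_parts = ["body", "handle", "rim"]
cup_relations = {("attached", "handle", "body"), ("above", "rim", "body")}
match = find_monomorphism(
    ["p1", "p2"], {("attached", "p1", "p2")}, cup_parts, cup_relations
)
```

Only properties and relations are compared, with no metric information, in line with the quotation above; a practical matcher would also have to tolerate missing or spurious picture parts.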

Krakauer [30] developed a system which used the intensity region tree for the recognition of fruits. This method certainly has interesting features for objects with smoothly curved surfaces. However, the effectiveness of the method reduces rapidly for objects with nonsmooth surfaces. Moreover, this method treats the object representation as a single sample in a pattern classification sense and hence may pose serious problems when objects are occluded. Dudani [19], Advani [1], and McGhee [38] have developed methods for recognition of three-dimensional objects; however, these methods are similar to pattern classification methods in flavor and are not discussed here.

IV. SEGMENTATION AND RECOGNITION IN SCENES

In Section III we considered the methods for the analysis of three-dimensional scenes from their two-dimensional images. Some efforts are being made to use sensors to obtain depth information also. Agin [2], [3] used a laser ranging technique for measuring the depth. Nevatia [43] obtained depth information by simulating stereo vision using a single camera for a moving object. Duda et al. [18] use a scanning laser sensor that can provide registered arrays of intensity and range data. We will not discuss further the technique of measuring depth as we are interested in the analysis that follows data acquisition.

Agin [2], [3] developed a system for obtaining three-dimensional models using a television camera and a deflectable laser beam diverged into a plane by a cylindrical lens. He represented complex objects using generalized cylinders as the primitives. Generalized cylinders were formalized as a volume representation by an arbitrary cross section varying along a space curve axis. The objects which appeared in his examples are a Barbie doll, horse, snake, hammer, etc. Complex objects, such as dolls, were segmented into parts and described using generalized cylinder primitives. The most significant point is the use of a volume representation for the primitives as opposed to a surface representation.

A generalized cylinder consists of a space curve, or axis, and a cross section function defined on this axis. The description of a simple object may be determined by locating an axis such that the object's cross section varies in a uniform manner along the axis. Descriptions of complex objects may be built up by "cutting and pasting" the descriptions of their constituent parts. An important property of the cross section of generalized cylinders is translational invariance. Thus given a model in terms of generalized cylinders, the contours of the object may be uniquely synthesized. However, the inverse process, the generation of a model to represent a given real object, does not yield a unique answer. The segmentation of an object into parts and the determination of the relationship of the parts relative to each other are done using heuristic techniques.

Nevatia and Binford [44], [45] generalized and improved the techniques for describing the pieces. Primitive parts are volume representations and are described as generalized local cones. These local cones are the volumes swept out by translating an arbitrary cross section, maintaining it normal to the path along which it is translated, while the scale of the cross section is changed smoothly. Thus a segment of an object can be described by an axis and arbitrary normal cross section valued function. Segmentation of an object is done by cutting it into simple pieces. A simple piece should have a continuous axis and a continuous cross section function along the axis. Nevatia and Binford believe in a flexible segmentation process generating alternate propositions possibly guided by higher level routines.
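The sweep that defines a generalized cone can be sketched by sampling surface points ring by ring along an axis. The code below is a simplified illustration and not the representation used in [44], [45]: it restricts the cross section to a circle whose radius is the smoothly varying scale, estimates the axis tangent by a finite difference, and picks an arbitrary pair of normal directions to hold the cross section perpendicular to the axis. The function `sweep` and its parameters are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    length = math.sqrt(sum(x * x for x in a))
    return tuple(x / length for x in a)


def sweep(axis, radius, n_axis=20, n_ring=16, eps=1e-4):
    """Sample a circular-section generalized cone.

    axis(u) returns a 3-D point and radius(u) a scale, for u in [0, 1].
    Returns a list of rings, each a list of (x, y, z) surface points.
    """
    rings = []
    for i in range(n_axis):
        u = i / (n_axis - 1)
        centre = axis(u)
        tangent = _unit(_sub(axis(min(1.0, u + eps)), axis(max(0.0, u - eps))))
        # Any vector not parallel to the tangent yields a usable normal frame.
        reference = (0.0, 0.0, 1.0) if abs(tangent[2]) < 0.9 else (1.0, 0.0, 0.0)
        n1 = _unit(_cross(tangent, reference))
        n2 = _cross(tangent, n1)
        r = radius(u)
        ring = []
        for k in range(n_ring):
            angle = 2 * math.pi * k / n_ring
            ring.append(tuple(
                centre[j] + r * (math.cos(angle) * n1[j] + math.sin(angle) * n2[j])
                for j in range(3)
            ))
        rings.append(ring)
    return rings


# A straight vertical axis with a radius growing from 1 to 2: a truncated cone.
cone = sweep(lambda u: (0.0, 0.0, 10.0 * u), lambda u: 1.0 + u, n_axis=5, n_ring=8)
```

Going from such a model to its contours is a direct computation, as the text notes; recovering an axis and cross section from measured data is the harder, non-unique direction.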

The recognition process is then the process of matching the symbolic descriptions for the current scene with some descriptions stored in the memory. As in the case of recognition of objects from their two-dimensional images, the matching problem can be cast in graph theoretic terms. Descriptions may be viewed as graphs with the pieces and joints as nodes. The relations of these pieces and joints are the arcs of the graph. The graphs may not be exactly identical and partial graph matching may be useful.

Hollerbach [24] described pottery using multiple generalized cylinders. He developed useful qualitative descriptions which bring out the significant features and subordinate lesser ones. Soroka and Bajcsy [55] describe a system for representing three-dimensional objects by examining slices through spaces in which they are embedded.

Marr and Nishihara [36], [37] developed a method for representing three-dimensional shapes based on a hierarchy of stick figures, where each stick is an axis in the shape's generalized cone representation. In a data base, stick figures for an object at several levels of detail were stored. In Fig. 4 we reproduce a model of a human as given by this approach. The recognition process uses an image-space processor for moving between object-centered and viewer-centered coordinate frames. The interaction between the image, the model, and the image-space processor gradually relaxes the model so that its axes project onto the axes computed from the image.

The concepts of generalized cylinders or generalized cones provide powerful methods for the representation of elongated objects or of objects having elongated parts. It can be seen that most of the objects used in [3], [4], [36], [37], [44], and [45] are either elongated or have elongated parts. For spherical objects this concept is not that powerful. Nevertheless, generalized cylinders offer a useful method for representation of curved objects in three dimensions.

Fig. 4. Hierarchical stick figure representation. (a) A human. (b) Arm. (c) Lower arm. (d) Hand. (e) Finger.

V. REPRESENTATION PROBLEMS

The importance of good representation of information in scene analysis systems cannot be overemphasized. As the reader may have noted, the analysis of scenes having curved objects is facing serious representation problems. The representation problem is now attracting due attention of researchers in the area. Some representation methods were discussed in previous sections in connection with methods for segmentation and recognition. In this section, we discuss various methods for representation of objects.

A. Primal Sketch, 2.5-D and Generalized Cylinders

According to the approach taken by Marr [32]-[37], the first step in processing an image is to extract its primal sketch. The primal sketch is a primitive representation that allows the intensity changes in the image and the local geometry of an image to be made explicit. This can be done by marking significant intensity changes in the image by "place tokens" which are defined by blobs, small lines, and the ends of lines or bars. The local geometrical relations between place tokens may be represented by virtual lines joining nearby place tokens.
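
The two ingredients named above, marking significant intensity changes and joining nearby place tokens by virtual lines, can be illustrated with a toy sketch. It is not Marr's operator: a thresholded difference between neighbouring pixels stands in for the detection of intensity changes, and the function names, threshold, and distance limit are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def intensity_changes(row: List[float], threshold: float) -> List[int]:
    """Indices where the step from the previous pixel exceeds threshold."""
    return [i for i in range(1, len(row))
            if abs(row[i] - row[i - 1]) > threshold]


def virtual_lines(tokens: List[Tuple[float, float]],
                  max_dist: float) -> List[Tuple[int, int]]:
    """Index pairs of place tokens that lie within max_dist of each other."""
    pairs = []
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            if math.dist(tokens[i], tokens[j]) <= max_dist:
                pairs.append((i, j))
    return pairs


# One scan line with a bright bar between pixels 3 and 5.
row = [10, 10, 11, 40, 41, 40, 12, 11]
edges = intensity_changes(row, threshold=10)

# Four hypothetical place tokens forming two well-separated pairs.
tokens = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.5, 5.0)]
links = virtual_lines(tokens, max_dist=1.5)
```

Here `edges` marks the two sides of the bar and `links` joins only the tokens that are close together, which is the kind of local geometry the primal sketch is meant to make explicit.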

The next step in the representation not only makes information about depth, local surface orientation, and discontinuities in these quantities explicit, but also creates and maintains a global representation of depth that is consistent with the local cues. Stereo, motion, and occlusion are used as cues in finding depth and/or local changes in depth. Local surface orientation is obtained using shading, texture gradient, and perspective cues. The 2.5-dimensional sketch is a viewer-centered representation and makes explicit information about depth, local surface orientation, and discontinuities in these quantities in the image in a form that is closely matched to what early visual processes can deliver. The 2.5-dimensional sketch can convey information to other processes which may be useful in recognition tasks. For the recognition task, the representation of the object should be based on an object-centered coordinate system. The representation using generalized cones is a possible candidate for this type of representation. The recognition process using this representation was discussed in Section IV.
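
To make the contents of a viewer-centered sketch concrete, the following toy example stores depth per pixel and derives the other two quantities from it. This is an illustrative assumption only: in the approach described above, orientation comes from shading, texture gradient, and perspective cues rather than from differencing a depth map, and the function names and the `jump` threshold are hypothetical.

```python
import math
from typing import List, Tuple

Normal = Tuple[float, float, float]


def local_orientation(depth: List[List[float]]) -> List[List[Normal]]:
    """Unit surface normals from forward differences of a depth map."""
    normals = []
    for y in range(len(depth) - 1):
        row = []
        for x in range(len(depth[0]) - 1):
            dzdx = depth[y][x + 1] - depth[y][x]
            dzdy = depth[y + 1][x] - depth[y][x]
            norm = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            # For a surface z = f(x, y) the normal is proportional to (-fx, -fy, 1).
            row.append((-dzdx / norm, -dzdy / norm, 1.0 / norm))
        normals.append(row)
    return normals


def depth_discontinuities(depth: List[List[float]],
                          jump: float) -> List[Tuple[int, int]]:
    """Pixels (y, x) whose right-hand neighbour differs in depth by more than jump."""
    return [(y, x)
            for y in range(len(depth))
            for x in range(len(depth[0]) - 1)
            if abs(depth[y][x + 1] - depth[y][x]) > jump]


# A ramp receding to the right, followed by a sudden jump in depth.
depth = [[1.0, 2.0, 3.0, 9.0],
         [1.0, 2.0, 3.0, 9.0],
         [1.0, 2.0, 3.0, 9.0]]
normals = local_orientation(depth)
breaks = depth_discontinuities(depth, jump=3.0)
```

All three arrays are indexed by image position, which is what makes the representation viewer-centered rather than object-centered.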

The analysis of occluded objects is known to be a difficult task. For the analysis of occluding contours, Marr [33] made two assumptions: 1) the nearby points on a contour correspond to nearby points on the viewed surface; and 2) the distinction between convexities and concavities in a contour reflects real properties of the surface, not an artifact of perspective. It was shown that if these assumptions hold for all distant vantage points such that the line of sight lies parallel to the plane of the cross section of the cone, then the viewed surface must be a generalized cone. He developed algorithms for discovering the axis and cross section of occluded parts of the objects from the visible parts.

B. Multiple-View Representation

Minsky [42] proposed a multiple view representation for three-dimensional objects. This representation is based on the fact that if one chooses one's primitives correctly, then the number of qualitatively different views of an object may be quite small. Therefore, the representation of three-dimensional shape might consist of a catalogue of the different appearances of the shape. Underwood and Coates [57] used a similar idea in their research. However, this method of representation has not been sufficiently explored.
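
A catalogue of appearances can be pictured as a table from object names to a small set of qualitatively different views. The sketch below is purely illustrative: the objects, the view descriptors, and the function `candidates` are hypothetical and are not taken from [42] or [57].

```python
from typing import Dict, FrozenSet, List

Catalogue = Dict[str, List[FrozenSet[str]]]

# Each object is stored as a short list of qualitatively different views,
# and each view is a set of hypothetical primitive descriptors.
catalogue: Catalogue = {
    "cylinder": [frozenset({"ellipse"}),
                 frozenset({"ellipse", "parallel sides"})],
    "cone": [frozenset({"ellipse"}),
             frozenset({"ellipse", "converging sides"})],
}


def candidates(observed: FrozenSet[str], cat: Catalogue) -> List[str]:
    """Names of all objects having a stored view equal to the observed one."""
    return sorted(name for name, views in cat.items() if observed in views)


end_on = candidates(frozenset({"ellipse"}), catalogue)
side_on = candidates(frozenset({"ellipse", "parallel sides"}), catalogue)
```

In this toy catalogue the end-on view matches both objects while the side view matches only the cylinder, so a single view may be insufficient for recognition.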

Recently, Baker [6] proposed a method for building models of objects through binocular and motion parallax analysis. This method is conceptually similar to Minsky's multiple view representation. Baker has developed a method to combine information from various views in a model. It appears that the models of the same object obtained from two different sets of views may differ appreciably, making recognition a very difficult task. The research of Shapira and Freeman [53] may also be considered in this category. They, however, have allowed bodies to have only quadric or planar faces.

C. Polyhedral Approximations

Baumgart [11] used the fact that a three-dimensional object can be approximated using only polyhedral shapes. This makes manipulation of representations and the comparison between the expected view and the actual view easy. However, this representation is not unique: representations obtained on two different occasions from two different sets of views may be quite different. Moreover, using only polyhedral shapes, it seems computationally impossible to represent fine details about the curvature in the shape. Coons [16] has used rectangular patches for the specification of the surfaces of curved objects. The usefulness of these representations in scene analysis is doubtful as it is not known how to derive these representations from images.
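
The cost of representing curvature with flat faces can be illustrated by a two-dimensional analogue, chosen here only for simplicity: approximating a circle of radius r by an inscribed regular polygon with n sides. The largest gap between the circle and a side is r(1 - cos(pi/n)), so the number of sides needed grows as the tolerance shrinks. The function names below are hypothetical.

```python
import math


def max_gap(radius: float, n_sides: int) -> float:
    """Largest distance between a circle and its inscribed regular n-gon."""
    return radius * (1.0 - math.cos(math.pi / n_sides))


def sides_needed(radius: float, tolerance: float) -> int:
    """Smallest number of sides whose gap does not exceed the tolerance."""
    n = 3
    while max_gap(radius, n) > tolerance:
        n += 1
    return n


coarse_fit = sides_needed(1.0, 0.01)
fine_fit = sides_needed(1.0, 0.0001)
```

Since the gap behaves roughly like r(pi/n)^2/2 for large n, reducing the tolerance by a factor of 100 requires about 10 times as many sides. For surfaces the face count grows faster still, which is consistent with the difficulty noted above.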

D. Medial Axis Transform

The Blum Transform [12], also known as the Medial Axis Transform, bears some resemblance to the representation by generalized cylinders. This geometry of shape is based on the notion of growth outward from a point. This transform is obtained by associating with every point in the interior of a shape a maximal disk neighborhood. The Medial Axis Transform of a given closed shape consists of the centers of the maximal disks contained in a shape which are not wholly contained in any larger disk. In three dimensions the Medial Axis Transform becomes a curved surface in space.
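
The definition can be tried directly on a sampled shape. The sketch below is a brute-force approximation under stated assumptions: candidate centers are restricted to a coarse grid, the shape is a 4 by 2 rectangle whose distance to the boundary is known in closed form, and the names `medial_axis` and `rectangle_radius` are hypothetical. A disk centered at p with radius r_p lies inside the disk at q with radius r_q exactly when dist(p, q) + r_p <= r_q.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def medial_axis(samples: List[Point], radius: Callable[[Point], float],
                eps: float = 1e-9) -> List[Point]:
    """Keep the sample centers whose maximal disk is inside no other sampled disk."""
    disks = [(p, radius(p)) for p in samples]
    axis = []
    for p, rp in disks:
        covered = any(q != p and math.dist(p, q) + rp <= rq + eps
                      for q, rq in disks)
        if not covered:
            axis.append(p)
    return axis


def rectangle_radius(width: float, height: float) -> Callable[[Point], float]:
    """Radius of the largest disk centered at p inside a width x height rectangle."""
    def radius(p: Point) -> float:
        x, y = p
        return min(x, width - x, y, height - y)
    return radius


# Interior grid points of a 4 x 2 rectangle, spaced 0.5 apart.
samples = [(0.5 * i, 0.5 * j) for i in range(1, 8) for j in range(1, 4)]
axis = medial_axis(samples, rectangle_radius(4.0, 2.0))
```

The surviving centers lie on the central horizontal segment and on the diagonals running into the four corners, the familiar skeleton of a rectangle. The pairwise test is quadratic in the number of samples and is meant only to mirror the definition.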

For two-dimensional surfaces the Medial Axis Transform has some resemblance to generalized cylinders. Medial Axis Transforms, however, have a very undesirable property: a minor variation in the outline produces a major perturbation in the Medial Axis Transform of the outline. Also, in three dimensions they are inferior to generalized cylinders as their medial axis may be two dimensional and hence will be computationally more expensive in representation and manipulation.

The representation methods considered in this section make it clear that representation of curved objects is a difficult task. Many approaches are being explored.

VI. CONCLUSION

In this paper a brief review of important trends in research for the segmentation, recognition, representation, and analysis of curved objects in images and scenes has been presented. The research in this area, and in computer vision overall, may be broadly divided into two classes. One approach is highly influenced by theories of vision in psychology. Researchers pursuing this type of research are trying to develop computational theories for human vision. They expect that after these theories are developed, computer vision systems may be designed using them. The second approach is to develop algorithms for specific vision tasks. The techniques developed by this class of researchers use domain dependent knowledge extensively. These techniques are useful in developing systems working in a limited world, but lack power for general applications. Each system is tailored to suit a particular application. Obviously a general vision system should not be limited to a restricted domain. At the same time it is doubtful that a system having the capability to analyze general scenes may be designed using computational theories of human vision. The flexibility of present sensors for forming images is not comparable to that of the human eye. Thus the basic data upon which computation has to be performed has quite different characteristics than the data on which the psychophysical system has to operate.

A deep understanding of the physical processes involved in image formation by sensors (cameras) may be vital in extracting relevant information from the image. Most scene analysis systems use intensity and step changes in intensity at the first level of description. The next level is the symbolic description of the objects. Horn [25], [26] and Barrow and Tenenbaum [9] have correctly stressed the need to consider that an intensity value in an image encodes characteristics of the surface element at that point, such as incident illumination, reflectance, and surface orientation. By exploiting the constraints imposed by the real world in conjunction with the physics of the imaging process it may be possible to decode the information contained in the intensity values. This approach is in its infancy and it is difficult to predict how complex the methods for obtaining reliable values of intrinsic characteristics will be. However, such processes certainly deserve thorough investigation.
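
Why decoding an intensity value needs extra constraints can be seen with a simple shading model. The sketch assumes a Lambertian surface, for which intensity is reflectance times illumination times the cosine of the angle between the surface normal and the light direction. This model and the function name are assumptions introduced for illustration, not a method taken from [9], [25], or [26].

```python
import math
from typing import Tuple

Vector = Tuple[float, float, float]


def lambertian_intensity(reflectance: float, illumination: float,
                         normal: Vector, light_dir: Vector) -> float:
    """Image intensity of a Lambertian patch; both vectors must be unit length."""
    cos_angle = sum(n * s for n, s in zip(normal, light_dir))
    return reflectance * illumination * max(0.0, cos_angle)


light = (0.0, 0.0, 1.0)
facing = (0.0, 0.0, 1.0)  # patch facing the light directly
tilted = (math.sin(math.radians(60.0)), 0.0, math.cos(math.radians(60.0)))

# A dark patch facing the light and a bright patch tilted by 60 degrees.
dark_facing = lambertian_intensity(0.5, 1.0, facing, light)
bright_tilted = lambertian_intensity(1.0, 1.0, tilted, light)
```

The two patches produce essentially the same intensity although their reflectance and orientation differ, so a single intensity value cannot be decoded in isolation.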

For the segmentation of images most early systems relied heavily on intensity values. Later, the domain dependent knowledge about the objects was used in segmentation. The reliable segmentation of images is still an unsolved problem. This has encouraged some researchers to use range data in the segmentation and analysis of images. Though it is certain that range data will be a great help in segmentation, the added complexity in hardware may prevent these techniques from becoming very popular. It is clear that in the absence of reliable segmentation, the task of scene analysis is very difficult. The approaches emerging from the works of Marr [32]-[37] and Barrow and Tenenbaum [9] are attractive as they use general knowledge for extracting the symbolic representation at the level of surfaces and volumes.

The techniques for the recognition of the objects in images and scenes are also being actively explored. For the recognition of objects in images the representation of edges based on ψ-s plots is quite powerful. This representation may also be used for recognition of objects from their partial views. For the representation of objects in three dimensions, the concept of generalized cones may play a key role.
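
A plot of this kind records the tangent direction of a contour as a function of arc length s. The following minimal sketch computes it for a polygonal contour. The function name and the unwrapping of the angle are choices made here for illustration, not details taken from the systems reviewed above.

```python
import math
from typing import List, Tuple


def tangent_vs_arclength(contour: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return (s, psi) at the start of each edge of a polygonal contour.

    psi is the tangent angle in radians, unwrapped so that it changes by
    less than pi between consecutive edges.
    """
    plot = []
    s = 0.0
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        psi = math.atan2(y1 - y0, x1 - x0)
        if plot:
            previous = plot[-1][1]
            while psi - previous > math.pi:
                psi -= 2.0 * math.pi
            while psi - previous < -math.pi:
                psi += 2.0 * math.pi
        plot.append((s, psi))
        s += math.hypot(x1 - x0, y1 - y0)
    return plot


# A square of side 2 traversed counterclockwise.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (0.0, 0.0)]
plot = tangent_vs_arclength(square)
```

Straight segments appear as horizontal runs in such a plot and circular arcs as lines whose slope equals the curvature. A fragment of the plot can therefore be compared against a stored model even when only part of the outline is visible.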

Many special purpose systems have been implemented for the recognition of a limited set of objects. A powerful general technique is yet to emerge. Some research has been done on the recognition of objects from their partial views. Since objects must be recognized from partial views in many industrial and military applications, such techniques may be very useful. Dynamic scenes have also received little attention. The ability to recognize moving objects may also play an important role in many applications.

It is clear from the literature that many efforts are being made to develop systems capable of analyzing scenes containing general curved objects. Many researchers now feel that the early techniques developed for special applications and for polyhedra may not be useful in systems for more general scenes. This has led to a search for low-level processing independent of domain knowledge but exploiting the constraints imposed by the real world, and utilizing knowledge of the physics of imaging devices. The new approaches are certainly more independent of the domain in early processing of data, but their power is yet to be demonstrated by working systems. The emerging methodologies for the representation of objects are also independent of the domain knowledge. Recognition of objects, particularly from their partial views, requires more attention. It is certain that statistical pattern recognition procedures will not be of much use in recognition of objects in scenes as the viewpoint may change, objects may be only partially visible, and the number of different objects may be very large. Symbolic matching techniques are more powerful in recognition of objects but methods are yet to be developed for efficient and economical storage and use for models of several hundred objects. In summary, the state of the art for the analysis of scenes containing curved objects has made progress but it still has a long way to go.

ACKNOWLEDGMENT

We thank our colleagues L. Davis, W. Martin, and J. Roach for their comments and suggestions.

REFERENCES

[1] J. G. Advani, "Computer recognition of three-dimensional objects from optical images," Ph.D. dissertation, Ohio State Univ., Columbus, 1971.

[2] G. J. Agin, "Representation and description of curved objects," Stanford Artificial Intelligence Lab. Memo AIM-173, Oct. 1972.

[3] G. J. Agin and T. O. Binford, "Computer description of curved objects," in Proc. 3rd IJCAI (Stanford University, Stanford, CA), pp. 629-640, Aug. 1973.

[4] G. J. Agin, "Hierarchical representation of 3-D objects," Final rep., SRI Project 1187, Stanford Research Inst., Stanford, CA, Mar. 1977.

[5] A. P. Ambler, H. G. Barrow, C. M. Brown, R. M. Burstall, and R. J. Popplestone, "A versatile computer controlled assembly system," in Proc. 3rd IJCAI (Stanford University, Stanford, CA), pp. 298-307, Aug. 1973.

[6] H. Baker, "Three dimensional modelling," in Proc. 5th IJCAI (Cambridge, MA), pp. 649-655, Aug. 1977.

[7] H. G. Barrow and R. J. Popplestone, "Relational descriptions in picture processing," in Machine Intelligence, vol. 6, B. Meltzer and D. Michie, Eds. Edinburgh, Scotland: University Press, 1971.

[8] H. G. Barrow, A. P. Ambler, and R. M. Burstall, "Some techniques for recognizing structure in pictures," in Frontiers of Pattern Recognition, S. Watanabe, Ed. 1972.

[9] H. G. Barrow and J. M. Tenenbaum, "Recovering intrinsic scene characteristics from images," Technical Note 157, Stanford Research Inst., Stanford, CA, Apr. 1978.

[10] -, "IGS: A paradigm for integrating image segmentation and interpretation," in Proc. 3rd Int. Joint Conf. Pattern Recognition, pp. 504-513, 1976.

[11] B. G. Baumgart, "Geometric modeling of computer vision," Stanford Artificial Intelligence Laboratory Memo AIM-249, Stanford Univ., Stanford, CA, Oct. 1974.

[12] H. Blum, "Biological shape and visual science (Part I)," J. Theor. Biol., vol. 38, pp. 205-287, 1973.

[13] I. Chakravarty, "A generalized line and junction labelling scheme with applications to scene analysis," Rensselaer Polytechnic Institute, Troy, NY, Tech. Rep. CRL-55, Dec. 1977.

[14] Y. Chang, "Machine perception of objects with curved surfaces," Coordinated Science Lab., Univ. Illinois, Urbana, TR 163, May 1974.

[15] M. B. Clowes, "On seeing things," Artificial Intel., vol. 2, pp. 79-116, 1971.

[16] S. A. Coons, "Surfaces for computer-aided design of space forms," M.I.T. Project MAC Report, MAC-TR-41, June 1967.

[17] L. Davis, "A survey of edge detection techniques," Computer Graphics and Image Processing, vol. 4, pp. 248-270, 1975.

[18] R. O. Duda, D. Nitzan, and P. Barrett, "Use of range and reflectance data to find planar surface regions," Stanford Research Inst., Stanford, CA, Tech. Rep. 162, Apr. 1978.

[19] S. A. Dudani, "An experimental study of moment methods for automatic identification of three-dimensional objects from television images," Ph.D. dissertation, Ohio State Univ., Columbus, 1973.

[20] G. Falk, "Interpretation of imperfect line-data as a three-dimensional scene," Computer Science Dep., Stanford Univ., Stanford, CA, Report CS 180, 1970.

[21] E. C. Freuder, "A computer system for visual recognition using active knowledge," M.I.T., Cambridge, MA, AI-TR-351, 1976.

[22] A. Guzman, "Decomposition of a visual scene into three-dimensional bodies," in Proc. AFIPS Fall Joint Computer Conf., Dec. 1968.

[23] -, "Analysis of curved line drawings using context and global information," in Machine Intelligence 6, B. Meltzer and D. Michie, Eds. Edinburgh, Scotland: University Press, 1971.

[24] J. M. Hollerbach, "Hierarchical shape description of objects by selection and modification of prototypes," M.I.T., Cambridge, MA, AI-TR-346, Nov. 1975.

[25] B. Horn, "Obtaining shape from shading information," in Psychology of Computer Vision, P. H. Winston, Ed. New York: McGraw-Hill, 1975.

[26] -, "Understanding image intensities," Artificial Intel., vol. 8, pp. 201-231, 1977.

[27] D. A. Huffman, "Impossible objects as nonsense sentences," Machine Intelligence 6, B. Meltzer and D. Michie, Eds. Edinburgh, Scotland: University Press, 1971.

[28] -, "Curvature and creases: A primer on paper," in Proc. Conf. Computer Graphics, Pattern Recognition, and Data Structures, pp. 360-370, 1975.

[29] M. D. Kelly, "Edge detection in pictures by computer using planning," Machine Intelligence 6, B. Meltzer and D. Michie, Eds. Edinburgh, Scotland: University Press, 1971.

[30] L. J. Krakauer, "Computer analysis of visual properties of curved objects," M.I.T., Cambridge, MA, MAC-TR-82, 1971.

[31] A. K. Mackworth, "Interpreting pictures of polyhedral scenes," in Proc. 3rd IJCAI (Stanford Univ., Stanford, CA), pp. 556-563, 1973.

[32] D. Marr, "Early processing of visual information," M.I.T., Cambridge, MA, AI Memo 340, 1975.

[33] -, "Analysis of occluding contours," M.I.T., Cambridge, MA, AI Memo 372, 1976.

[34] D. Marr and T. Poggio, "A theory of human stereo vision," M.I.T., Cambridge, MA, AI Memo 451, 1977.

[35] D. Marr, "Representing visual information," M.I.T., Cambridge, MA, AI Memo 477, 1977.

[36] D. Marr and H. K. Nishihara, "Spatial disposition of axes in a generalized cylinder representation of objects that do not encompass the viewer," M.I.T., Cambridge, MA, Memo No. 341, Dec. 1975.

[37] -, "Representation and recognition of the spatial organization of three-dimensional shapes," M.I.T., Cambridge, MA, AI Memo 377, Aug. 1976.

[38] R. B. McGhee, "Automatic recognition of complex three-dimensional objects from optical images," in Learning Systems and Intelligent Robots, K. S. Fu and J. T. Tou, Eds. New York: Plenum Press, 1978.

[39] W. M. Martin and J. K. Aggarwal, "Dynamic scene analysis: The study of moving images," Univ. Texas, Austin, TR No. 184, Electronics Research Center, Jan. 1977.

[40] J. W. McKee and J. K. Aggarwal, "Finding the edges of the surfaces of three-dimensional curved objects by computer," Pattern Recognition, vol. 7, pp. 25-52, 1975.

[41] -, "Computer recognition of partial views of curved objects," IEEE Trans. Comput., vol. C-22, pp. 790-800, 1977.

[42] M. Minsky, "A framework for representing knowledge," in The Psychology of Computer Vision, P. H. Winston, Ed. New York: McGraw-Hill, 1975, pp. 211-277.

[43] R. Nevatia, "Depth measurement by motion stereo," Computer Graphics and Image Processing, vol. 5, pp. 203-214, 1976.

[44] R. Nevatia and T. O. Binford, "Structured descriptions of complex objects," in Proc. 3rd IJCAI (Stanford Univ., Stanford, CA), pp. 641-647, 1973.

[45] -, "Description and recognition of curved objects," J. Artificial Intelligence, vol. 8, pp. 77-98, 1977.

[46] T. Pavlidis and S. L. Horowitz, "Segmentation of plane curves," IEEE Trans. Comput., vol. C-23, pp. 860-870, 1974.

[47] T. Pavlidis and F. Ali, "A general syntactic shape analyzer," Computer Science Laboratory, Princeton Univ., Princeton, NJ, TR No. 221, Dec. 1976.

[48] W. A. Perkins, "A model-based vision system for industrial parts," IEEE Trans. Comput., vol. C-27, pp. 126-149, 1978.

[49] W. A. Perkins and T. O. Binford, "A corner finder for visual feedback," Computer Graphics and Image Processing, vol. 2, pp. 355-376, 1973.

[50] J. S. Rastall, "Graph family matching," Res. Memo., Dep. Machine Intelligence and Perception, Univ. Edinburgh, Edinburgh, Scotland, MIP-R-62, 1969.

[51] E. M. Riseman and M. A. Arbib, "Computation techniques in the visual segmentation of static scenes," Computer Graphics and Image Processing, vol. 6, pp. 221-276, 1977.

[52] L. G. Roberts, "Machine perception of three-dimensional solids," in Optical and Electro-Optical Information Processing, J. T. Tippett et al., Eds. Cambridge, MA: M.I.T. Press, 1965.

[53] R. Shapira and H. Freeman, "Reconstruction of curved surface bodies from a set of imperfect projections," in Proc. 5th IJCAI (Cambridge, MA), pp. 628-634, 1977.

[54] Y. Shirai, "Edge finding, segmentation of edges and recognition of complex objects," in Proc. 4th IJCAI (Tbilisi, Georgia, USSR), pp. 674-681, 1975.

[55] B. I. Soroka and R. Bajcsy, "A program for describing 3-D objects using generalized cylinders as primitives," in Proc. Pattern Recognition and Image Processing Conf. (Chicago, IL), pp. 331-339, June 1978.

[56] K. J. Turner, "Computer perception of curved objects using a television camera," Ph.D. dissertation, Univ. Edinburgh, Edinburgh, Scotland, 1974.

[57] S. A. Underwood and C. L. Coates, "Visual learning from multiple views," IEEE Trans. Comput., vol. C-24, pp. 651-661, 1975.

[58] D. Waltz, "Understanding line drawings of scenes with shadows," in The Psychology of Computer Vision, P. H. Winston, Ed. New York: McGraw-Hill, 1975.

[59] S. Zucker, "Region growing: Childhood and adolescence," Computer Graphics and Image Processing, vol. 5, pp. 382-389, 1976.
