1997 - Menezes de Sequeira, Cortez - Partitions A taxonomy of types and representations and an overview of coding techniques


Partitions: a taxonomy of types and representations and an overview of coding techniques

Manuel Menezes de Sequeira 1,2 and Diogo Cortez

Instituto de Telecomunicações, IST, 1096 LISBOA CODEX, Portugal

Recognising the importance of partition or shape coding in the field of image and video coding, and the fact that a systematisation of the subject would simplify the comparison of partition coding techniques, this paper proposes a classification of partition types and partition representations into a taxonomy tree. The partition type level of the tree classifies the possible partition types that may have to be coded. The partition representation level classifies the possible representations for each partition type.

This paper also overviews the partition coding techniques which address the considered partition types and which use the identified partition representations. Emphasis is given to binary partition coding techniques proposed within the framework of MPEG-4.

1 Introduction

During the last decades, considerable research effort has been put into the field of image and video coding. Pixel-based coding techniques, whose first steps were made in the seventies [46,26,27], matured during the eighties and the beginning of the nineties into widespread standards such as ITU-T H.261 [24], ISO/IEC MPEG1 [44], ISO/IEC MPEG2/ITU-T H.262 [45], and, more recently, ITU-T H.263 [25]. Meanwhile, research on new techniques began at the start of the eighties [38]; these new techniques have been identified [61] as using mid-level vision concepts (such as regions and textures) instead of the pixel-based low-level vision concepts used before.

1 Partially supported by JNICT.
2 Corresponding author. Address for correspondence: Torre Norte, 10-15, IT, IST, 1096 LISBOA CODEX, Portugal. Tel.: +351.1.8418461. Fax: +351.1.8418472. Email: [email protected].

Preprint submitted to Elsevier Preprint 21 November 2010



In 1993, the ISO MPEG group started a new standardisation effort: MPEG-4 [52]. There will be important differences between MPEG-4 and the existing standards: (a) it will permit easy access to and manipulation of the contents of video sequences in terms of scene objects, apart from providing the usual requirements of compression, quality, and cost [53], and (b) it will be based on a flexible syntax, enabling it to survive longer by allowing and encouraging evolution.

Obtaining a description of a scene (in a video sequence) in terms of a set of objects, i.e., image analysis, is one of the most important tasks in modern image and video representation. It is also one of the main challenges faced by researchers in this field. Image analysis algorithms usually produce partitions of the scenes into two-dimensional (or three-dimensional) regions. These partitions usually have to be coded during the image and video representation process. It has been recognised that partition information will account for a significant percentage of the bit stream (e.g., [23]). It is thus very important to develop efficient partition coding techniques.

The comparison of techniques proposed in the literature has often been hampered by the lack of systematisation of the subject. This paper attempts to fill this gap by proposing a taxonomy of partition types and representations. The partition coding techniques addressing each of the considered partition types and using the identified partition representations are then overviewed.

2 Definitions

Digital images, whether two- or three-dimensional (the third dimension is usually time), are usually obtained through a digitalisation process, which involves sampling and quantisation. The sampling pattern often takes the form of a lattice [57]. For instance, rectangular and hexagonal lattices can be used for a two-dimensional image, see Figure 1.

An image graph is a simple graph [54], defining a neighbourhood system, associated with an image. The neighbourhood systems N4, N8, and N6 (where each vertex, except those on the border of the image, has respectively 4, 8, and 6 neighbours) are usually associated with images sampled according to rectangular (N4 and N8) and hexagonal (N6) lattices, see Figure 2. Each set of pixels S in an image also has an associated graph, which is the maximal subgraph of the image graph with the set of vertices S. A maximal subgraph is a subgraph to which no further edges from the original graph can be added without also adding some vertices (in this case, pixels). Henceforward, S will be used interchangeably to mean the set of vertices S or the maximal subgraph associated with the set of vertices S.
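To make these neighbourhood systems concrete, here is a minimal Python sketch listing the neighbours of a vertex under N4, N8 and N6. It assumes row/column indexing for rectangular lattices and, for the hexagonal lattice, integer coordinates along the basis vectors u0 and u1 of Figure 1 taken 60 degrees apart (a convention chosen here, not taken from the paper); the names `OFFSETS` and `neighbours` are hypothetical.

```python
# Neighbour offsets for each neighbourhood system (assumed conventions).
OFFSETS = {
    "N4": [(-1, 0), (1, 0), (0, -1), (0, 1)],
    "N8": [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)],
    # Hexagonal lattice in basis-vector coordinates: with u0 and u1 at
    # 60 degrees, the six nearest sites are +-u0, +-u1 and +-(u0 - u1).
    "N6": [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)],
}


def neighbours(p, shape, system="N4"):
    """Return the neighbours of vertex p = (i, j) in an image of the given
    shape (extent along each coordinate) under the chosen system."""
    rows, cols = shape
    i, j = p
    result = []
    for di, dj in OFFSETS[system]:
        ni, nj = i + di, j + dj
        # Vertices on the image border simply have fewer neighbours.
        if 0 <= ni < rows and 0 <= nj < cols:
            result.append((ni, nj))
    return result
```

For an interior vertex of a 5 × 5 image the three systems yield 4, 8 and 6 neighbours; the corner vertex (0, 0) has only 2 neighbours under N4.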


[Figure 1: lattice sites in the x-y plane, with lattice basis vectors u0 and u1.]

Fig. 1. Examples of (a) rectangular and (b) hexagonal lattices (u0 and u1 are the lattice basis vectors).

[Figure 2: pixels (graph vertices) joined by graph edges.]

Fig. 2. Examples of sampling lattices with different neighbourhood systems: (a) and (b) rectangular sampling lattice with N4 and N8, respectively; (c) hexagonal sampling lattice with N6.

A partition is a digital image where the value at each pixel is a label identifying the class to which that pixel belongs. If the number of labels is restricted to two, the partition is binary; if more than two labels are possible, the partition is said to be mosaic. Partitions are usually obtained by segmenting a digital image.

A class is the set of all pixels in a partition sharing a given label. A region consists of the pixels in a connected component of a class (seen as a maximal subgraph).
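The distinction between classes and regions depends on the image graph. The sketch below (hypothetical function `regions`, with partitions stored as lists of lists of labels) labels regions as the connected components of each class by breadth-first search; run with N4 and N8 offsets on a 2 × 2 checkerboard, it finds four and two regions, respectively.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def regions(partition, offsets=N4):
    """Return a region map: each pixel gets the index of the connected
    component (under the given offsets) of its class that contains it."""
    rows, cols = len(partition), len(partition[0])
    region = [[-1] * cols for _ in range(rows)]
    count = 0
    for i in range(rows):
        for j in range(cols):
            if region[i][j] != -1:
                continue  # already assigned to a region
            # Breadth-first search restricted to pixels with the same label.
            label = partition[i][j]
            region[i][j] = count
            queue = deque([(i, j)])
            while queue:
                ci, cj = queue.popleft()
                for di, dj in offsets:
                    ni, nj = ci + di, cj + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and region[ni][nj] == -1
                            and partition[ni][nj] == label):
                        region[ni][nj] = count
                        queue.append((ni, nj))
            count += 1
    return region
```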

The class adjacency graph (CAG) is a graph with one vertex for each class in the partition (plus an extra one representing the outside of the image), and an edge between any two classes adjacent in the image graph. The region adjacency graph (RAG) is defined similarly for regions.
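A CAG can be built in a single raster scan. This sketch assumes a rectangular lattice with N4 and uses a hypothetical sentinel `OUTSIDE` for the extra vertex (partition labels must differ from it); the function name is also hypothetical. Applied to a region map instead of the partition itself, the same function yields the RAG.

```python
# Hypothetical sentinel standing for the extra "outside of the image" vertex.
OUTSIDE = "outside"


def class_adjacency_graph(partition):
    """Return the CAG edges of a partition (rectangular lattice, N4) as a set
    of frozensets, each holding the two adjacent class labels."""
    rows, cols = len(partition), len(partition[0])
    edges = set()
    for i in range(rows):
        for j in range(cols):
            a = partition[i][j]
            # Look only right and down, so each adjacent pixel pair is seen once.
            for ni, nj in ((i, j + 1), (i + 1, j)):
                if ni < rows and nj < cols and partition[ni][nj] != a:
                    edges.add(frozenset((a, partition[ni][nj])))
            # Classes touching the image border are adjacent to the outside.
            if i in (0, rows - 1) or j in (0, cols - 1):
                edges.add(frozenset((a, OUTSIDE)))
    return edges
```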

Two partitions are said to be class or region equivalent if they divide the image into equally shaped classes or regions, respectively. Two partitions are said to be equal if, apart from being class equivalent, the labels of each class are equal in both partitions. The partitions are said to be class topologically equivalent if they have the same labels and their CAGs are equal.
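Class equivalence amounts to the existence of a one-to-one correspondence between the labels of the two partitions that maps one onto the other pixel by pixel. A minimal sketch, assuming both partitions have the same size and are stored as lists of lists (function names hypothetical):

```python
def class_equivalent(p, q):
    """True if some bijection between labels turns p into q pixel by pixel."""
    forward, backward = {}, {}
    for row_p, row_q in zip(p, q):
        for a, b in zip(row_p, row_q):
            # setdefault records the first pairing seen for each label; any
            # later conflicting pairing means no bijection exists.
            if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
                return False
    return True


def equal(p, q):
    """Equal partitions: class equivalent and with identical labels."""
    return p == q
```

Region equivalence can be tested the same way on region maps, and class topological equivalence by additionally comparing the CAGs.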

The line graph of a partition is obtained by duality from its planar image graph, see Figure 3. The contours of a partition are usually defined as the subgraph of the line graph containing all vertices and edges standing between pixels with different labels (i.e., belonging to different classes).
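On a rectangular lattice, the line graph vertices can be taken as pixel corners, and each contour edge separates two 4-adjacent pixels with different labels. A minimal sketch under that indexing convention (ours, not the paper's); edges on the image border are deliberately left out, which is only one possible choice, and the function name is hypothetical:

```python
def contour_edges(partition):
    """Return the contour as a set of line-graph edges. Vertices are pixel
    corners (i, j) with 0 <= i <= rows and 0 <= j <= cols."""
    rows, cols = len(partition), len(partition[0])
    edges = set()
    for i in range(rows):
        for j in range(cols):
            # Vertical edge between pixel (i, j) and its right neighbour.
            if j + 1 < cols and partition[i][j] != partition[i][j + 1]:
                edges.add(((i, j + 1), (i + 1, j + 1)))
            # Horizontal edge between pixel (i, j) and the pixel below it.
            if i + 1 < rows and partition[i][j] != partition[i + 1][j]:
                edges.add(((i + 1, j), (i + 1, j + 1)))
    return edges
```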

[Figure 3: pixels with the line graph vertices and edges lying between them.]

Fig. 3. Line graphs: (a) rectangular and (b) hexagonal.

Contours can also be defined directly on the pixels, by selecting only the pixels at the borders between regions.
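For comparison, a pixel-based contour can be sketched by marking every pixel that has a 4-neighbour with a different label. As written, it marks pixels on both sides of each border, which is exactly the repetition discussed under Where in Section 3.3; the function name is hypothetical.

```python
def border_pixels(partition):
    """Return a binary image with 1 at every pixel having at least one
    4-neighbour with a different label."""
    rows, cols = len(partition), len(partition[0])
    border = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if (0 <= ni < rows and 0 <= nj < cols
                        and partition[ni][nj] != partition[i][j]):
                    border[i][j] = 1
                    break  # one differing neighbour is enough
    return border
```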

Generally, the contour information allows only for a representation of partitions up to region equivalence. If class equivalence, or equality, is required, then information about which regions belong to which classes (region-class information) is necessary.

Several kinds of contour vertices, as shown in Figure 4, are defined according to their degree (number of incident edges [54]): 1, dead ends; 2, regular vertices; 3, junctions; and 4, crossings.
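Given a contour as a list of edges, for example the corner-to-corner edges of a rectangular line graph, the vertex kinds follow directly from the degrees. A minimal sketch (names hypothetical); degrees above 4, which cannot occur on the rectangular line graph, are simply reported as such:

```python
from collections import Counter

KINDS = {1: "dead end", 2: "regular", 3: "junction", 4: "crossing"}


def classify_vertices(edges):
    """Map each contour vertex to its kind, given edges as vertex pairs."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {vertex: KINDS.get(d, f"degree {d}") for vertex, d in degree.items()}
```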

[Figure 4: contour graphs with dead end, regular, junction and crossing vertices marked.]

Fig. 4. Variety of vertices on contour graphs: (a) and (b) rectangular and hexagonal line graphs, and (c) rectangular image graph.

The concepts of path (a sequence of edges in a graph connecting successive vertices), circuit (a path which ends at the starting vertex), and loop (a circuit which cannot be segmented into two circuits) can be defined over a graph [54].


Circuits and paths are simple if they do not contain repeated edges. An Euler circuit is a simple circuit containing all the edges of a given graph.

3 Taxonomy of partition types and representations

A taxonomy of partition types and representations is proposed in this section. The two main levels of the taxonomy, partition type and partition representation, can be seen to correspond to the first steps taken when developing a partition coding technique: the identification of the problem to be solved corresponds to the identification of the partition type addressed by the coding technique, and the selection of the partition representation corresponds to selecting the kind of data the coding technique will manipulate. Thus, different representations, usually leading to different techniques, can be used for the same type of partitions.

During the description of the taxonomy tree levels, square brackets will be used to specify the codes representing each branch of the tree.

3.1 Partition type

The partition types can be organised in a tree with the following levels:

(i) Space: Are partitions two- [2D] or three-dimensional [3D]?
(ii) Lattice: What sort of sampling lattice was used for digitising the images from which the partitions were obtained (e.g., rectangular [R] or hexagonal [H])?
(iii) Graph: What kind of graph is superimposed on the partition (usually a neighbourhood system is specified [Nn])?
(iv) Classes: Are partitions binary [B] or mosaic [M]?
(v) Connectivity: Are the classes connected [C] or can they be disconnected [D] (on the chosen image graph)?

Figure 5 shows the partition type levels of the taxonomy tree. The leaves of the taxonomy tree correspond to different types of partition. Each type of partition can be specified by answering the five questions listed. For instance, the answers i. two-dimensional [2D], ii. hexagonal [H], iii. 6-neighbourhood [N6], iv. mosaic [M], and v. connected [C] (or, with codes, 2DHN6MC) define a type of partitions that lie in a two-dimensional space, correspond to digital images sampled according to a hexagonal lattice, are structured according to the hexagonal graph, can have more than two classes, and have only connected classes (the concepts of class and region are equivalent in this case).


[Figure 5: tree with levels Space (2D or 3D), Lattice (hexagonal or rectangular), Graph (Nn), Classes (binary or mosaic) and Connectivity (connected or disconnected); under 2D, the hexagonal branch has N6 and the rectangular branch N4 and N8, giving the leaves 2DHN6Bc, 2DHN6Mc, 2DRN4Bc, 2DRN4Mc, 2DRN8Bc and 2DRN8Mc; the example 2DHN6MC is highlighted.]

Fig. 5. The partition type taxonomy tree (in bold, the example given in the text). c stands for either C (connected classes) or D (disconnected classes).

Notice that the branches under 3D in the figure are not drawn, since this paper addresses mainly two-dimensional partitions. At the partition representation level, however, three-dimensional partitions will be considered in more detail (see next section).

3.2 Partition representation

This section introduces into the taxonomy tree more levels of detail, related to the representation chosen for the partitions. Two- and three-dimensional partitions will be dealt with separately.

3.2.1 Two-dimensional partitions

The first important decision to be made regards mosaic partitions:

(i) Handling: Should the mosaic partitions be handled as such (a single mosaic partition) [M] or should they be separated into a collection of binary partitions (each one corresponding to a different class in the original mosaic partition) [B]?

As will be discussed in Section 4.1, the handling of mosaic partitions as collections of binary partitions is often of paramount importance. For instance, when classes should be readily accessible from the coded bit stream, a collection of binary partitions may allow easier access to the various objects in a scene than the original mosaic partition.
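The [B] handling choice can be sketched in a few lines: each class of the mosaic partition becomes its own binary partition. This is a hypothetical helper assuming labels that can be sorted; each resulting binary partition can then be coded and accessed independently, at the price of representing every border between two classes in both of the corresponding binary partitions.

```python
def split_into_binary(partition):
    """Return {label: binary partition}, where each binary partition has 1
    at the pixels of that class and 0 elsewhere."""
    labels = sorted({label for row in partition for label in row})
    return {
        label: [[1 if value == label else 0 for value in row] for row in partition]
        for label in labels
    }
```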

It has been seen in Section 2 that a partition can be represented in two different ways: either by the labels of each pixel, or by contour information plus region-class information. When class equivalence is the aim, the latter provides information about the clustering of regions into a certain number of classes.

Hence, the next level in the taxonomy will be:

(ii) How: How should the partition be represented? With pixel labels [L] or with contours [C]?

For the case of partitions represented with contours, other choices have to be made: How should the contours be represented? What sort of neighbourhood system does the line graph have? These questions lead to two other levels of partition representation in the taxonomy tree:

(iii) Where: Where should contours be defined? On the image graph or on the line graph? That is, should the contour be defined on pixels [P] or on edges [E] (understood here as borders between pixels, i.e., edges of the line graph)?
(iv) Graph: What is the kind of neighbourhood system of the graph of which the contour is a subgraph [Nn]?

Figure 6 shows the partition representation levels of the taxonomy tree for the two-dimensional case.

[Figure 6: below each partition type leaf (2DHN6Bc, 2DHN6Mc, 2DRN4Bc, 2DRN4Mc, 2DRN8Bc, 2DRN8Mc), the levels Handling (binary or mosaic, for mosaic types only), How (labels or contours), Where (pixels or edges) and Graph (Nn): on the hexagonal lattice, edges use N3 and pixels N6; on the rectangular lattice, edges use N4 and pixels N4 or N8; the example 2DHN6Mc-BCEN3 is highlighted.]

Fig. 6. The partition representation taxonomy tree for the two-dimensional case (in bold, the example given in the text). c stands for either C (connected classes) or D (disconnected classes).

The 2DHN6Mc partition type with a representation separated into binary class partitions, using contours defined on edges, which have an N3 neighbourhood system, is coded as 2DHN6Mc-BCEN3, or:

Partition type: two-dimensional, hexagonal grid, N6 graph, mosaic, classes connected or disconnected according to whether c is C or D.

Partition representation: mosaic treated as independent binary partitions, contours, edges, N3 graph.
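Since the codes are built by concatenating one symbol per taxonomy level, they can be parsed mechanically. The regular-expression grammar below is our own reading of Figures 5 and 6 for the two-dimensional case only (the three-dimensional representation levels are not covered), and `parse_code` is a hypothetical name:

```python
import re

# Hypothetical grammar for the partition type part (Figure 5).
TYPE = (r"(?P<space>2D|3D)(?P<lattice>[RH])(?P<graph>N\d+)"
        r"(?P<classes>[BM])(?P<connectivity>[CDc])")
# Representation part (Figure 6); handling only applies to mosaic partitions.
REPRESENTATION = (r"(?P<handling>[BM])?(?P<how>[LC])"
                  r"(?:(?P<where>[PE])(?P<contour_graph>N\d+))?")
CODE = re.compile(TYPE + r"(?:-" + REPRESENTATION + r")?")


def parse_code(code):
    """Split a taxonomy code such as '2DHN6Mc-BCEN3' into its levels."""
    match = CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a recognised taxonomy code: {code!r}")
    # Drop the levels absent from the code (e.g. no representation part).
    return {level: value for level, value in match.groupdict().items()
            if value is not None}
```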

3.2.2 Three-dimensional partitions

As can be seen in Figure 7, for three-dimensional partitions two representations may be considered: stick to three dimensions [3D], or slice the partition along the time domain and use two-dimensional methods [2D]. Prediction of the two-dimensional partition slices can be used [Inter]; otherwise the two-dimensional partitions are considered independent [Intra]. When prediction is used, it may [M] or may not [F] use motion compensation (M stands for motion, while F stands for fixed). The motion information may be either estimated from the three-dimensional partition [23] or input from external sources (e.g., from a motion estimator working with the original three-dimensional image). Notice that the slicing to two dimensions establishes a connection to one of the two-dimensional branches at the top of the representation taxonomy shown in Figure 6, depending on the type of the resulting two-dimensional (possibly predicted) partitions.
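The simplest [2D] [Inter] [F] representation predicts each time slice from the previous one as is. A minimal sketch, assuming the 3D partition is given as a list of 2D label frames (function name hypothetical); only the positions where the prediction fails are returned, and the first slice would be coded [Intra]. With [M], the previous slice would first be motion compensated, which is not shown:

```python
def inter_fixed_residuals(partition_3d):
    """For each slice after the first, list the (row, column) positions whose
    label differs from the co-located label in the previous slice."""
    residuals = []
    for previous, current in zip(partition_3d, partition_3d[1:]):
        changed = [
            (i, j)
            for i, (row_p, row_c) in enumerate(zip(previous, current))
            for j, (a, b) in enumerate(zip(row_p, row_c))
            if a != b
        ]
        residuals.append(changed)
    return residuals
```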

[Figure 7: from the leaves of the 3D branch of the partition type taxonomy tree, the level Approach (3D or 2D); under 2D, Prediction (Intra or Inter); under Inter, Compensation (M, motion compensated, or F, not compensated); the 2D branches lead to the top of the two-dimensional representation taxonomy.]

Fig. 7. The partition representation taxonomy tree for the three-dimensional case.

3.3 Representation properties

Choosing the representation for the partitions (of a given type) depends on the properties of each representation and on how adequate they are for the task at hand. The pros and cons of some of the levels of the partition representation taxonomy tree are listed below:

Handling 3: (a) Mosaic: a single connected contour graph can separate several regions, which leads to coding efficiency when a contour representation is used; however, access to a single class shape is not easy, since the regions (and classes) are not represented individually. (b) Binary: the classes are represented independently, and thus easy access to each class is provided, though at the expense of lower coding efficiency.

3 Only for mosaic partitions.

How: (a) Labels: the identification of the class to which each pixel in the partition belongs is very simple, though the shapes of the classes are not directly represented. (b) Contours: the shapes of the classes are directly represented, albeit at the expense of requiring somewhat involved algorithms to ascertain the class of a given pixel [50,58,2].

Where: (a) Pixels: representing contours on pixels poses a number of problems, especially in the case of mosaic partitions, since using all border pixels leads to unnecessary repetition on both sides of a border; when the problem is avoided by using only one side of each border, other problems arise, e.g., how should one-pixel-wide regions or parts of regions be distinguished from the borders of thick regions? Although the problems associated with these representations have solutions, often somewhat involved, coding contours on pixels does not seem to achieve higher compression than coding contours on edges [13] (see also Section 4.6). (b) Edges: this is usually a more elegant way of representing contours, which in addition typically provides more compression than pixel-based contours [13].

4 Overview of partition coding techniques

Once the type of partitions to code has been ascertained and a partition representation selected, according to the taxonomy defined in the previous section, there are usually a number of available coding techniques. This section overviews some of these techniques. Special attention will be paid to two-dimensional partitions.

4.1 Objectives of coding

Picard [53] identified the three performance criteria that classical video source coders attempt to minimise: rate, distortion, and cost. The first relates to the desirable compression of the data to transmit, so as to reduce redundancy and also irrelevancy, if information losses are admissible during coding. The second pertains to the need to maintain the quality of the signal as high as possible, according to a possibly subjective criterion, and is applicable only to lossy coding techniques. The third has to do with implementation costs.

A fourth [performance] criterion was also identified by Picard [53]: minimise content access work. That is, the contents of a coded video sequence should be as easy to access and manipulate as possible. Such access to, or manipulation of, individual objects requires them to be coded as independently as possible in the bit stream. This fourth criterion is also being addressed in MPEG-4, and is related to one of the most important MPEG-4 functionalities: object scalability.

The question of whether to use lossy partition coding techniques is an important one. It is true that some techniques that are inherently lossy, such as parametric curves, can yield good compression [29]. However, it may be difficult, for some applications, to establish sound partition coding quality criteria. Also, when the scene objects (corresponding possibly to classes or sets of classes) are to be manipulated individually, e.g., pasting an object into a different scene, the effects of lossy partition coding can be severe, since pieces of the real object may be lost, pieces of the background can be introduced, and even object deformation may occur. This seems to indicate that lossless partition coding techniques are preferable, and that simplifications should be introduced into the partitions carefully during the segmentation process, before partition coding.

However, if lossy coding is acceptable, the losses are usually constrained so that there is:

(i) Class topological equivalence: the classes should be maintained in number and adjacency relations; that is, the CAG should not be altered. A stronger constraint can be imposed if the RAG is not allowed to change.
(ii) Small displacement of borders: the borders between the regions should change as little as possible (according to some error criterion); other constraints may be imposed, for instance on errors associated with the area and position of the regions.

4.2 Mosaic vs. binary partitions

When easy access to the contents of the video sequence is required, the shapes of the various objects (e.g., a class or a set of classes in a partition) will have to be coded independently. This requirement can be imposed even if the segmentation process resulted in a mosaic partition, reducing the problem to the coding of a series of binary partitions (see the handling level in Figure 6).

The independent coding of binary partitions also arises naturally when a layered scene representation, as proposed by Wang and Adelson [61], is used. Layered representations of scenes are also used in the MPEG-4 Video Verification Model 3.0 (VM3): each layer corresponds to a two-dimensional object of arbitrary shape, whose time snapshots are called Video Object Planes (VOPs) [52,48]. The shapes of the objects represented by VOPs can be associated with binary partitions. 4 However, if the content of the VOPs is coded through region-based techniques, then mosaic partitions will also be necessary within each VOP.

Thus, the coding of both binary and mosaic partitions may be an important issue when easy access to the contents of the video sequences is required.

4.3 Partition models

The coding efficiency always depends on the characteristics of the partitions being coded. Most of the techniques aim at genericness, though this is a somewhat hard-to-define property. By genericness it is often meant that the techniques perform well on average. The problem with this definition is that often little is known about the statistics of the partitions which need to be coded. This is a general problem in image processing: is there a statistical model for the images to process? In the case of partition coding, the statistical characterisation of input partitions depends both on the original images and on the segmentation algorithm used upstream. 5 Hence, most techniques do not address a specific model of input partitions, making only some general assumptions such as: 6

(i) the regions tend to contain a significant number of pixels, i.e., small regions are improbable;
(ii) the classes tend to contain a small number of regions;
(iii) the contours (borders between regions) tend to be simple (not ragged);
(iv) the region interiors tend not to contain too many small holes.

4.4 Class coding

Class coding is necessary when: 1. class equivalence is enough, 2. the partitions used have disconnected classes (see the connectivity level in Figure 5), and 3. the explicit labels of the partition pixels have not (yet) been coded (which is the case for contour coding techniques and for some label coding techniques). The objective of class coding is to establish which regions are grouped in the same class. This issue will not be discussed at length here. However, note that the coding methods used should take into account that:

(i) the explicit class labels are not required, since class equivalence is enough;
(ii) adjacent regions cannot belong to the same class, for otherwise they would be a single region (this can help reduce the amount of data to transmit).

4 Actually, the shapes of the VOPs can be specified in MPEG-4 using binary shape, i.e., a binary partition, or grey scale shape, which is an alpha plane specifying the transparency of each pixel.
5 Such a dependency makes it difficult to evaluate the performance of a partition coder by itself.
6 See for instance Chapter 10 of [28].

If partition equality is required, then the class labels should be coded explicitly for each region in the partition. When the classes are connected, the fact that a given label appears only once can be used to reduce the amount of data to transmit, since the degrees of freedom keep decreasing, reaching zero once the next-to-last label has been transmitted.
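The reduction in the degrees of freedom can be made concrete: if each region's label is sent as an index into the list of labels not yet used, the alphabet shrinks by one at every step and the last label costs nothing. A minimal sketch with fixed-length indices (an arithmetic coder could do slightly better); the functions are hypothetical and assume connected classes, so that the region labels form a permutation of the alphabet:

```python
import math


def encode_labels(region_labels, alphabet):
    """Return (indices, bits): each label is coded as its index among the
    labels not yet used, with ceil(log2(n)) bits when n labels remain.
    The last label is implied and is not transmitted."""
    remaining = sorted(alphabet)
    indices, bits = [], 0
    for label in region_labels:
        index = remaining.index(label)
        if len(remaining) > 1:  # with one label left there is no choice
            bits += math.ceil(math.log2(len(remaining)))
            indices.append(index)
        remaining.pop(index)
    return indices, bits


def decode_labels(indices, alphabet):
    """Invert encode_labels: replay the indices, then append the last label."""
    remaining = sorted(alphabet)
    decoded = [remaining.pop(index) for index in indices]
    return decoded + remaining
```

For four regions labelled [3, 1, 4, 2] this costs 2 + 2 + 1 = 5 bits instead of 4 × 2 = 8, against a lower bound of log2(4!) ≈ 4.58 bits.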

4.5 Label coding

Label coding techniques code partitions whose representation is based on pixel labels. The cases of binary and mosaic partitions will be addressed separately in the following.

4.5.1 Binary partitions

Binary partitions can be seen as binary (or two-tone or bi-level) images. Therefore, the techniques available for coding binary images are good candidates for coding binary partitions. While lossless techniques can be applied without any problems, lossy techniques often do pose some problems, since the type of losses they allow does not generally take into account the requirements identified in Section 4.1 for lossy partition coding.

Reviews of binary image coding can be found in [36,30] and, specifically for fax, in [31]. The lossless coding standards ITU-T T.4 and T.6 (Group 3 and Group 4 facsimile) [16,17] and ITU-T T.82 (JBIG, for progressive coding of binary images) [32] use techniques with increasing compression efficiency:

ITU-T T.4: Uses one-dimensional run-length encoding (RLE) and, optionally, also the two-dimensional modified relative element address designate (MREAD) codes, both followed by variable length coding (VLC). In the two-dimensional mode, every k-th line is coded with RLE (k is set to 2 for low resolution images and to 4 for high resolution images), while all the other lines are coded with MREAD.

ITU-T T.6: Is similar to ITU-T T.4, though the two-dimensional mode is always used and k is set to infinity, so that only MREAD is used. The resulting codes are called modified MREAD (MMREAD).

ITU-T T.82: Uses the arithmetic Q-Coder [51] to code the pixel values. The probabilities for the Q-Coder are estimated using a local context (a template) for the current pixel. Since JBIG uses resolution layers for progressive coding, two types of templates exist: the first is used in the lowest resolution layer and includes only pixels already transmitted in that layer, while the second is used for all the other layers and includes pixels not only from the current layer but also from the layer immediately below in resolution.
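The run-length representation used by the one-dimensional mode of ITU-T T.4 can be sketched as follows. The sketch assumes 0 denotes white and 1 black, follows the convention that each line starts with a (possibly empty) white run, and stops before the variable length coding step, whose code tables are not reproduced here; the function name is hypothetical:

```python
def run_lengths(line):
    """Return the alternating white/black run lengths of a binary scan line,
    starting with a white run (of length 0 if the line starts with black)."""
    runs = []
    colour, length = 0, 0  # the first run is white by convention
    for pixel in line:
        if pixel == colour:
            length += 1
        else:
            runs.append(length)  # close the current run
            colour, length = pixel, 1
    runs.append(length)
    return runs
```

For instance, the line 0 0 1 1 1 0 gives the runs [2, 3, 1], and a line starting with black, 1 0, gives [0, 1, 1].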

A technique based on a modified MMREAD code, on 16 × 16 blocks, has been proposed for the coding of binary alpha maps in the framework of MPEG-4 [1]. This technique has been adopted in VM3 [48] after the last round of core experiments on binary shape coding [59] (see also Section 4.7). Two techniques related to JBIG [5,6] have also been evaluated during the core experiments. Both use arithmetic codes with probabilities estimated from a local context around the pixel to be coded.
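Context-based probability estimation, as used by JBIG and by the JBIG-related proposals above, can be illustrated without an actual arithmetic coder by summing the ideal code lengths -log2 p that such a coder would approach. This sketch uses adaptive counts per context with a Laplace (add-one) estimator and a hypothetical four-pixel causal template, much smaller than the templates of the actual standards; it is not the Q-Coder, and the names are hypothetical:

```python
import math

# Hypothetical causal template: left, above-left, above and above-right.
TEMPLATE = [(0, -1), (-1, -1), (-1, 0), (-1, 1)]


def context_code_length(image):
    """Ideal code length in bits of a binary image coded in raster order,
    with the probability of each pixel estimated from its context."""
    rows, cols = len(image), len(image[0])
    counts = {}  # context -> [number of 0s seen + 1, number of 1s seen + 1]
    bits = 0.0
    for i in range(rows):
        for j in range(cols):
            # Template pixels outside the image are taken as 0 (white).
            context = tuple(
                image[i + di][j + dj]
                if 0 <= i + di < rows and 0 <= j + dj < cols else 0
                for di, dj in TEMPLATE
            )
            c = counts.setdefault(context, [1, 1])
            pixel = image[i][j]
            bits -= math.log2(c[pixel] / (c[0] + c[1]))
            c[pixel] += 1  # adapt the estimate after coding the pixel
    return bits
```

For an all-white 16 × 16 image only one context ever occurs and the total is log2(257), about 8 bits for 256 pixels.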

Among all the other techniques that have been proposed for binary partition coding, the morphological skeleton [41] (and, more recently, [33]) is especially relevant, mainly because this technique has lately evolved to cover mosaic partitions efficiently as well [8] (see Section 4.5.2). This technique represents the shape of a region by a set of skeleton points and a so-called quench function: the region is the union of structuring elements (of a certain shape) centred on the skeleton points and scaled according to the value of the quench function at each point.
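Decoding a morphological skeleton amounts to taking the union of scaled structuring elements. A minimal sketch, assuming a square structuring element whose scale-n version is a (2n + 1) × (2n + 1) square (other shapes are possible); computing the skeleton and quench function from a region, which is the encoder's job, is not shown, and the function name is hypothetical:

```python
def reconstruct_from_skeleton(skeleton, shape):
    """Rebuild a binary region from {(i, j): quench value n}, painting a
    (2n + 1) x (2n + 1) square centred on each skeleton point."""
    rows, cols = shape
    region = [[0] * cols for _ in range(rows)]
    for (ci, cj), n in skeleton.items():
        # Clip each structuring element to the image.
        for i in range(max(0, ci - n), min(rows, ci + n + 1)):
            for j in range(max(0, cj - n), min(cols, cj + n + 1)):
                region[i][j] = 1
    return region
```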

Since binary partitions are a special case of mosaic partitions, techniques developed for the latter may also be applied to the former, either directly or with simplifying changes, despite the fact that they do not take into account the special characteristics of binary partitions.

4.5.2 Mosaic partitions

The case of mosaic partitions is more complex. The coding of mosaic partitions has received less attention than the coding of binary partitions (however, see [8,7,60]). It is possible, nevertheless, to use binary partition coding techniques by first converting the mosaic partitions into bit planes. 7
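The bit-plane conversion of footnote 7 can be sketched as follows. The four-colouring of the regions is taken as given (finding one is not shown), and the helper names are hypothetical; the recombined image identifies the regions only up to region equivalence, as its connected components.

```python
def to_bit_planes(region_map, colour):
    """Split a region map into two bit planes, given colour[r] in 0..3 for
    each region r, with adjacent regions assumed to have different colours."""
    low = [[colour[r] & 1 for r in row] for row in region_map]
    high = [[(colour[r] >> 1) & 1 for r in row] for row in region_map]
    return low, high


def from_bit_planes(low, high):
    """Recombine the two planes into the four-colour image."""
    return [[lo | (hi << 1) for lo, hi in zip(row_lo, row_hi)]
            for row_lo, row_hi in zip(low, high)]
```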

A technique using the concept of geodesic skeleton, where the regions are described by a set of skeleton points and a quench function [8], was recently proposed. This technique was developed for mosaic partitions, being thus also applicable in the binary case, and is, in a sense, an extension of the technique proposed in [41] for binary partitions (see Section 4.5.1). The authors claim that the geodesic skeleton is preferable to chain codes whenever there are many isolated and short contour arcs to be coded, which seems to be the case when 3D-2DInterM (motion-predicted 2D partitions corresponding to time slices of a 3D partition) partition representations are used.

7 For instance, using the Four-colour theorem [54], the regions in a partition can be perfectly identified by painting them with only four colours. Hence, each region can be identified by a two-bit label, and thus two bit-planes are sufficient for representing the partition. Each of the two bit-planes can be coded independently using (lossless) binary partition coding techniques. Notice that some borders are present in both bit-planes, so this method cannot yield optimal results.

A method which is also related to geodesic skeletons has been proposed in [60]. It represents regions as a union of structuring elements with appropriate translations and scalings. Both techniques ([8,60]) allow the structuring elements to overlap already coded regions, thus avoiding duplicate coding of borders and reducing the required bit rate. Both techniques are lossy and, again, can be used for mosaic and binary partitions.

Another interesting technique, based on Johnson-Mehl tessellations, has been proposed in [7]. 8 The idea is to find germs and their germination times for each region such that the original partition is reproduced well when the germs are allowed to grow until reaching other growing germs. Though the technique proposed is lossy, it can easily be made lossless. According to the authors, the technique performed worse than the other techniques studied (straight line and polygonal approximation, chain codes, and geodesic skeletons).
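The decoding side of such a representation can be sketched by giving each pixel to the germ that reaches it first, assuming a common constant growth speed and Euclidean distances. This is a simplified model chosen for illustration, not necessarily the exact formulation of [7]; the hard part, finding germs and times that reproduce a given partition, is not shown, ties go to the lowest germ index, and the function name is hypothetical:

```python
import math


def johnson_mehl(germs, shape, speed=1.0):
    """Rebuild a partition from germs given as (i, j, t0): a germ at pixel
    (i, j) starting to grow at time t0. Each pixel is labelled with the
    index of the germ with the earliest arrival time t0 + distance / speed."""
    rows, cols = shape
    partition = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            arrival = [t0 + math.hypot(i - gi, j - gj) / speed
                       for gi, gj, t0 in germs]
            # list.index returns the first minimum, i.e. the lowest index.
            partition[i][j] = arrival.index(min(arrival))
    return partition
```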

4.6 Contour coding

At least three breeds of contour coding techniques can be distinguished:

(i) Chain codes: The contour graph is coded by a string of symbols representing the direction of the chain connecting a vertex to the next vertex on the contour. Each of these strings is called a chain code. Symbols may also represent direction changes, which makes the chain codes differential.
(ii) Parametric curves: The contours are approximated by parametric curves, whose coefficients are then coded; the most common examples are approximations by straight lines and by splines (in general, by polynomials).
(iii) Transform codes: The contours are represented as parametric curves which are coded using transform methods, followed by coefficient quantisation, in a one-dimensional equivalent of transform image coding.

All these techniques involve two steps: first the representation is changed by transforming the contours into strings of symbols (e.g., changes in chain direction, spline parameters, control points or transform coefficients, possibly quantised), and then these symbols are entropy coded.

8 [7] also contains a good review of partition coding techniques.
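The gain of the entropy coding step depends on how skewed the symbol statistics are. A quick estimate is the zeroth-order empirical entropy of the symbol string, sketched below (hypothetical helper): for a differential chain code of a smooth contour, where "go straight" dominates, one would expect it to fall below the 2 bits per symbol of a fixed-length four-direction code.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Zeroth-order empirical entropy, in bits per symbol: the lower bound on
    the average length of a symbol-by-symbol code matched to the symbol
    frequencies of this string."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())
```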

For contours defined on pixels, it is also possible to use techniques developed for binary image coding. The idea is to paint black, against a white background, all the border pixels in the partition and then use one of the techniques mentioned in Section 4.5.1. Notice, however, that lossless techniques should in general be used, since lossy techniques were not usually developed with partition coding in mind.

4.6.1 Chain codes

The contour graph is a subgraph of either the line graph (for contours defined on edges) or the image graph (for contours defined on pixels), and usually consists of a collection of paths on the original graph. Contours can thus be represented by a string of symbols representing which of the neighbours of the current graph vertex belongs to the contour or, equivalently, the direction of the (chain) link connecting it to the next vertex on the contour: these strings are called chain codes [18,19,64]. When the symbols represent direction changes, the chain codes are said to be differential [14,22]. The simplest partitions are those for which the contour graph consists of disconnected loops, that is, circuits where each vertex has exactly two neighbours in the contour graph.
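For contours on the rectangular line graph, a chain code needs only four directions. The sketch below assumes row indices growing downwards and numbers the directions counter-clockwise from east (a common convention, but an assumption here); its differential version records turns modulo 4. The names are hypothetical:

```python
# (row step, column step) -> direction: 0 = east, 1 = north, 2 = west, 3 = south.
DIRECTIONS = {(0, 1): 0, (-1, 0): 1, (0, -1): 2, (1, 0): 3}


def chain_code(vertices):
    """Absolute chain code of a closed contour, given as its sequence of
    line-graph vertices without repeating the first one at the end."""
    n = len(vertices)
    return [
        DIRECTIONS[(vertices[(k + 1) % n][0] - vertices[k][0],
                    vertices[(k + 1) % n][1] - vertices[k][1])]
        for k in range(n)
    ]


def differential(code):
    """Differential chain code of a closed contour: the direction change at
    each step, modulo 4 (0 straight, 1 left, 2 back, 3 right)."""
    return [(code[k] - code[k - 1]) % 4 for k in range(len(code))]
```

Traversing the unit square around pixel (0, 0) clockwise, through (0, 0), (0, 1), (1, 1), (1, 0), gives the chain code [0, 3, 2, 1] and the differential code [3, 3, 3, 3], i.e. four right turns.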

Binary partitions are generally simpler to code than mosaic partitions. The main difference stems from the fact that, for binary partitions, all vertices in the contour graph (at least for contours defined on the line graph) have an even number of neighbours: two for images sampled with hexagonal lattices, and two or four for images sampled with rectangular lattices. That is, the connected components of such graphs have Euler circuits, i.e., they can be drawn without lifting the pencil, according to a known theorem 9 in graph theory [54].
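Following a contour component without lifting the pencil amounts to finding an Euler circuit. Hierholzer's algorithm, sketched below, does this in time linear in the number of edges for a connected multigraph with even degrees; connectivity is assumed rather than checked, the input is a list of edges such as the corner-to-corner edges of a binary partition's contour, and the function name is hypothetical:

```python
from collections import defaultdict


def euler_circuit(edges):
    """Return an Euler circuit as a list of vertices (first == last) of a
    connected multigraph given as a list of (u, v) edges."""
    adjacency = defaultdict(list)
    for index, (u, v) in enumerate(edges):
        adjacency[u].append((v, index))
        adjacency[v].append((u, index))
    if any(len(incident) % 2 for incident in adjacency.values()):
        raise ValueError("odd-degree vertex: no Euler circuit")
    used = [False] * len(edges)
    stack, circuit = [edges[0][0]], []
    while stack:
        vertex = stack[-1]
        # Discard edges already traversed from their other end.
        while adjacency[vertex] and used[adjacency[vertex][-1][1]]:
            adjacency[vertex].pop()
        if adjacency[vertex]:
            neighbour, index = adjacency[vertex].pop()
            used[index] = True
            stack.append(neighbour)  # extend the current trail
        else:
            circuit.append(stack.pop())  # dead end: vertex joins the circuit
    return circuit[::-1]
```

Two diagonally touching pixels of a binary partition produce a figure-eight contour whose shared corner is a crossing (degree 4); the algorithm still returns a single circuit through all eight edges.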

Mosaic partitions with contours defined on edges require special treatment, since the existence of junction vertices (vertices with degree 3, see Figure 4) precludes the definition of contours as disconnected circuits. There are at least two ways of dealing with this problem:

(i) Ignore junctions and crossings. Select one of the exits and leave the others to be coded as separate contours; since initial contour points are costly to code, this solution is not optimal.

(ii) Code junctions and crossings explicitly [42]. Select one of the exits, but also code information about the junction or crossing so that one can later return and continue following the remaining exits (one in the case of a junction, two in the case of a crossing).

9 A connected multigraph [and hence also a simple graph] has an Euler circuit if and only if each of its vertices has even degree [54]; this theorem solves the so-called Königsberg bridges problem.

When junctions and crossings are explicitly coded, the compression obtained when coding a connected component of a contour depends strongly on the way the connected component is followed: where to start, which exit to follow first at each junction or crossing, etc. The coding problem can then be seen as a problem of minimising the bit rate given a certain representation syntax. This problem is similar to the generalised Königsberg bridges problem, that is, the problem of making a line drawing without lifting the pencil while minimising the length of the redrawn lines [3].

When contours are defined on pixels, the concepts of junction and crossing require a more involved definition and treatment [40,13]. In the case of binary partitions, the problem may again be solved by ignoring the presence of vertices of degree larger than two in the contour graph. Another problem of contours defined on pixels is posed by regions, or parts of regions, that are one pixel wide: they make it difficult to use a stopping condition as simple as "stop when the initial vertex of the contour is reached", which is often used when coding contours defined on edges. Such regions may also require a turning-back (180°) direction in the chain codes which, being rarely used, may make some VLCs (for instance Huffman codes) inefficient. 10
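The arithmetic of footnote 10 can be reproduced with a textbook Huffman construction. The sketch below is a generic implementation (the tie-breaking rule and the name `huffman_lengths` are arbitrary choices), not a coder used by any of the cited techniques.

```python
import heapq
import math


def huffman_lengths(probs):
    """Codeword length of each symbol in a binary Huffman code."""
    # Heap entries: (probability, unique tie-breaker, {symbol: depth so far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes all their symbols one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (p1 + p2, next_id, merged))
        next_id += 1
    return heap[0][2]


probs = {"A": 0.495, "B": 0.495, "C": 0.01}
lengths = huffman_lengths(probs)
average = sum(probs[s] * lengths[s] for s in probs)
entropy = -sum(p * math.log2(p) for p in probs.values())
print(sorted(lengths.values()), round(average, 3), round(entropy, 3))
# [1, 2, 2] 1.505 1.071
```

With this tie-breaking, B receives the one-bit codeword and A and C two bits each, a relabelling of the footnote's lengths with the same average. The gap to the entropy arises because a Huffman code spends at least one bit on every symbol, however probable; arithmetic coding does not have that restriction.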

In general, chain codes correspond to the specification of a subgraph, consisting of a set of paths, in the underlying image or line graph. A contour connected component consists of a set of paths linked at junctions and crossings. These paths can be represented by: 1. a position for the first vertex of the path, possibly indicated implicitly in previous crossing or junction information, and 2. a string of symbols, the chain codes, which may include crossing and junction information. Both the first vertex position and the chain codes are then entropy coded. The construction of the chain codes may also include contour simplification procedures.

Several techniques have been proposed in the literature for entropy coding the initial vertices and the chain codes: 1. zero-order Huffman and arithmetic coding (adaptive or not) [43,42], which tend to be inefficient, since region borders are usually very different from a Brownian random walk through the image or line graph; 2. nth-order Huffman and arithmetic coding (adaptive or not) [13,43,14]; 3. Ziv-Lempel coding [65,63], which is a form of dictionary-based coding [36]; and 4. run-length coding, which groups chain codes into runs of related symbols [34,43], usually corresponding to straight line segments [37,4,43] (and hence consisting either of a single symbol or of two symbols, with adjacent directions, which verify the conditions defined by Rosenfeld in [55]).

10 Consider an alphabet consisting of two symbols A and B with equal probabilities 0.5: the corresponding Huffman code will have one bit per symbol. If a third, improbable but possible, symbol C is added, and the probabilities are p(A) = 0.495, p(B) = 0.495, p(C) = 0.01, the numbers of bits per code word will be 1, 2, and 2, respectively. The average number of bits per symbol will be 1.505, about 40% worse than the minimum of 1.071 (the entropy of the source).
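Item 4 above can be sketched with Python's `itertools.groupby`. This simplified version only forms runs of a single repeated symbol; the two-symbol runs allowed by Rosenfeld's straightness conditions [55] and the actual codeword tables of [34,43] are not modelled, and the name `chain_code_runs` is hypothetical.

```python
from itertools import groupby


def chain_code_runs(codes):
    """Group a chain code into (symbol, run length) pairs.

    Only runs of a single repeated symbol are formed; two-symbol runs
    satisfying digital straightness conditions are not handled here.
    """
    return [(symbol, len(list(run))) for symbol, run in groupby(codes)]


codes = [0, 0, 0, 1, 1, 2, 2, 2, 2]
print(chain_code_runs(codes))  # [(0, 3), (1, 2), (2, 4)]
```

Since the chain code of a closed contour is cyclic, a run that wraps around from the last symbol to the first could also be merged; the sketch leaves that out.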

In the framework of the MPEG-4 core experiments on binary shape coding [59], extensions to basic or differential chain codes have been proposed. In [20,59] a lossy multi-grid chain code is proposed which, according to the authors, reduces the coding cost by an average of 25% with respect to differential chain codes. In [62] a method is proposed which decomposes a (differential) chain code into two chain codes with half the resolution, plus additional codes if lossless coding is desired.

4.6.2 Parametric curves

These techniques approximate contours (or contour segments) by parametric functions, usually polynomials. The functions can usually be represented by either a set of coefficients or a set of control points [56,15]. The coefficients or the coordinates of the control points are quantised and then entropy coded. Notice that when polynomials of degree one are used (with rectangular coordinates), the contours are approximated by polygons. The use of control points [49] simplifies the quantisation process, since it is simpler to control the errors introduced by quantising the coordinates of control points than the errors introduced by quantising the coefficients of a polynomial. In the case of mosaic partitions, the crossings and junctions of contours (as defined in Section 4.6.1) are frequently selected as control points [15,39].

One of the most important problems in the parametric curve representation of contours is error control. Iterative techniques are commonly used, which successively split the contour until a sufficiently small approximation error is obtained for each resulting segment [15,39]. The error is frequently calculated from the geometrical distance between the parametric curves and the real contours [39,21], but some researchers propose using the contrast across the contours, assuming it is available [15].
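A minimal sketch of such an iterative split, in the Ramer-Douglas-Peucker style, is given below. It uses the geometric distance to the chord as the error measure and straight segments (degree-one polynomials) as the parametric curves; the tolerance `tol` and the function names are illustrative, and [15,39] may differ in details such as the error measure or the choice of split point.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def split(points, tol):
    """Indices of control points for a polygonal approximation of an open segment.

    The segment is recursively split at the sample farthest from the chord
    joining its end points until every piece is within tol of its chord.
    """
    if len(points) < 2:
        return list(range(len(points)))

    def rec(i, j):
        far, dmax = None, -1.0
        for k in range(i + 1, j):
            d = point_segment_distance(points[k], points[i], points[j])
            if d > dmax:
                far, dmax = k, d
        if far is None or dmax <= tol:
            return [i]  # chord i-j is accurate enough
        return rec(i, far) + rec(far, j)

    return rec(0, len(points) - 1) + [len(points) - 1]


l_shape = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(split(l_shape, tol=0.5))  # [0, 3, 6]
```

For closed contours, or mosaic contours between junctions, the same routine would be applied to each open segment between already fixed control points.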

When control points are used, their differences along the contour graph are usually entropy coded. These methods deal with junctions and crossings in a way very similar to chain coding techniques (see Section 4.6.1).
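The two steps described above, quantising control-point coordinates and coding their differences along the contour, can be sketched as follows. The uniform quantiser, the step size and the function names are assumptions, and the entropy coder that would follow is omitted.

```python
def quantise(points, step):
    """Uniform quantisation of control-point coordinates (error <= step / 2)."""
    return [(round(x / step), round(y / step)) for x, y in points]


def encode(points, step):
    """First quantised control point as is, then differences along the contour."""
    q = quantise(points, step)
    diffs = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(q, q[1:])]
    return q[0], diffs


def decode(first, diffs, step):
    """Accumulate the differences and scale back to image coordinates."""
    x, y = first
    points = [(x * step, y * step)]
    for dx, dy in diffs:
        x, y = x + dx, y + dy
        points.append((x * step, y * step))
    return points


control = [(10.2, 12.1), (13.0, 12.4), (13.3, 15.0), (11.1, 16.8)]
first, diffs = encode(control, step=1.0)
print(first, diffs)  # (10, 12) [(3, 0), (0, 3), (-2, 2)]
print(decode(first, diffs, step=1.0))
```

With a uniform quantiser the error in each decoded coordinate is at most `step / 2`, which illustrates the simple error control that makes control points easier to quantise than polynomial coefficients.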

As part of the MPEG-4 core experiments on binary shape coding [59], parametric curve techniques have also been evaluated [21,47,35,12] (some of these techniques stem from the earlier [29]). These techniques approximate the contours with polygons or splines using a set of control points, again chosen with a split algorithm. The approximation method is selected either for each contour segment (between control points) or for each object. The proposed techniques also take advantage of the temporal redundancy between control points in successive partitions. One-dimensional transform coding methods, some of them multi-resolution, are proposed to compensate for the residual error between the parametric curve approximation and the actual contours (see the next section).

4.6.3 Transform codes

The contours are first represented as parametric curves taking values in ℝ, if the contour (or contour segment) being coded can be represented by a polar function centred somewhere in the image, or in ℝ² for other kinds of contours (or contour segments). These parametric curves (still a lossless representation) are then coded using transform methods [10], in a one-dimensional equivalent of the transform coding used in image coding (e.g., the DCT in JPEG, H.261, H.262, and H.263), i.e., the parametric curves are transformed and the resulting coefficients are quantised and entropy coded.
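As a sketch of transform coding a contour viewed as a curve in ℝ², the example below treats a closed contour as the complex sequence x + jy, takes its DFT (in the spirit of the Fourier coding of [10]), keeps only the lowest frequencies and quantises them. The choice of the DFT rather than the DCT, the cut-off `keep`, the quantiser step and the function names are assumptions for illustration; a real coder would entropy code the quantised integers rather than return the decoded contour.

```python
import cmath
import math


def dft(z):
    """Plain O(n^2) discrete Fourier transform of a complex sequence."""
    n = len(z)
    return [sum(z[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(zf):
    """Inverse of dft."""
    n = len(zf)
    return [sum(zf[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]


def fourier_contour_code(points, keep, step):
    """Transform-code a closed contour sampled as (x, y) points.

    The contour is treated as the complex sequence x + jy. Only coefficients
    whose signed frequency satisfies |f| <= keep are retained, and their real
    and imaginary parts are uniformly quantised. Returns the decoded contour.
    """
    n = len(points)
    coeffs = dft([complex(x, y) for x, y in points])
    coded = [0j] * n
    for f, c in enumerate(coeffs):
        signed = f if f <= n // 2 else f - n
        if abs(signed) <= keep:
            coded[f] = complex(round(c.real / step) * step, round(c.imag / step) * step)
    return [(w.real, w.imag) for w in idft(coded)]


n = 16
circle = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]
decoded = fourier_contour_code(circle, keep=1, step=0.5)
print(max(math.hypot(x - u, y - v) for (x, y), (u, v) in zip(circle, decoded)))
```

A circle sampled at 16 points has a single non-zero coefficient (at frequency 1), so `keep=1` reproduces it up to rounding error; contours with corners need many more coefficients.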

Transform codes have also been under scrutiny in the MPEG-4 core experiments on binary shape coding [59], both for contour coding proper and for coding the residual error after using parametric curve methods.

The first of the techniques considered in the core experiments uses a polar representation of the contour [11]. The contour is represented by a function of the polar angle, whose value is the distance between the centroid and the contour in the direction defined by the angle. 11 The one-dimensional DCT of the distance function is calculated, and its coefficients are then quantised and VLC coded. Some contours cannot be properly represented by a function of the polar angle (since more than one contour point may occur for a single angle). Hence, parts of the contour may have to be left out. These parts are handled separately using chain codes (see Section 4.6.1). This technique can also take advantage of the temporal redundancy between successive partitions.
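A minimal sketch of the polar scheme follows: the centroid as defined in footnote 11, the distance function sampled at equally spaced angles, an orthonormal one-dimensional DCT, and uniform quantisation of the coefficients. Ray marching to locate the contour, the step sizes and all function names are assumptions; the VLC stage, the chain-coded leftovers and the temporal prediction described above are not modelled.

```python
import math


def centroid(region):
    """Average of the coordinates of all the pixels in the region."""
    n = len(region)
    return sum(x for x, _ in region) / n, sum(y for _, y in region) / n


def polar_signature(region, n_angles, step=0.1):
    """Centroid-to-contour distance at n_angles equally spaced polar angles.

    region is a set of (x, y) pixel coordinates. Assumption: each ray is
    marched outwards in increments of `step` until it leaves the region,
    which only makes sense when the region is star-shaped about its centroid.
    """
    cx, cy = centroid(region)
    dists = []
    for k in range(n_angles):
        theta = 2 * math.pi * k / n_angles
        r = 0.0
        while (round(cx + (r + step) * math.cos(theta)),
               round(cy + (r + step) * math.sin(theta))) in region:
            r += step
        dists.append(r)
    return dists


def _scale(k, n):
    return math.sqrt((1 if k == 0 else 2) / n)


def dct(x):
    """Orthonormal one-dimensional DCT-II."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of dct (orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


def code_polar(dists, qstep):
    """Quantised DCT coefficients (to be VLC coded) and the decoded distances."""
    q = [round(c / qstep) for c in dct(dists)]
    return q, idct([v * qstep for v in q])


disc = {(x, y) for x in range(-12, 13) for y in range(-12, 13) if x * x + y * y <= 100}
dists = polar_signature(disc, 32)
q, decoded = code_polar(dists, qstep=0.5)
print(q[:4])
print(max(abs(a - b) for a, b in zip(dists, decoded)))
```

Because the DCT here is orthonormal, the decoded distances differ from the originals by at most `sqrt(n) * qstep / 2` at any angle, n being the number of angles.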

The other transform coding techniques tested in the MPEG-4 core experiments use either the one-dimensional DST or DCT to code not the contour itself, but the residual error (distance) between a parametric curve approximation and the actual contour [47,35,12]. In [12] the distance between the approximated and actual contours is calculated either horizontally or vertically, depending on the slope of the line between the control points of the contour segment being encoded. This substantially reduces the computation relative to the usual orthogonal distance method. In [47] a multi-resolution version of the DST is used, so as to provide contour (object) scalability.

11 The centroid is the point whose coordinates are the average of the coordinates of all the pixels in the region enclosed by the contour.
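The residual computation described for [12] can be sketched as below: the measuring direction is chosen from the chord slope, and the interior residuals are then transformed. The `|slope| <= 1` threshold, the use of the DST-I variant and the function names are assumptions, not details confirmed by the cited documents.

```python
import math


def segment_residuals(contour, a, b):
    """Signed residuals of contour samples with respect to the chord a-b.

    Hypothetical rule: when |slope| <= 1 the residual is measured vertically,
    otherwise horizontally, so no square roots or projections are needed.
    """
    (ax, ay), (bx, by) = a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        raise ValueError("control points must differ")
    if abs(dy) <= abs(dx):
        m = dy / dx  # chord closer to horizontal: vertical distances
        return [y - (ay + m * (x - ax)) for x, y in contour]
    m = dx / dy  # chord closer to vertical: horizontal distances
    return [x - (ax + m * (y - ay)) for x, y in contour]


def dst1(x):
    """Unnormalised DST-I; its basis functions vanish at both segment ends."""
    n = len(x)
    return [sum(x[i] * math.sin(math.pi * (i + 1) * (k + 1) / (n + 1)) for i in range(n))
            for k in range(n)]


contour = [(0, 0), (2, 1), (4, 2), (6, 2), (8, 2)]  # samples between two control points
res = segment_residuals(contour, (0, 0), (8, 2))
print(res)              # [0.0, 0.5, 1.0, 0.5, 0.0]
print(dst1(res[1:-1]))  # interior samples only; the ends are zero at control points
```

The residual is zero at both control points, so only interior samples are transformed; a sine basis, which also vanishes at the segment ends, suits such data, and in this example most of the energy lands in the first coefficient.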

4.7 Evaluation of coding techniques

The evaluation of the various existing partition coding techniques is an important issue, though out of the scope of this paper. As mentioned before, [7] contains a good review of some partition coding techniques together with their evaluation.

Recently, MPEG-4 has finished a round of core experiments on binary shape coding [59], in which techniques for binary partition coding were evaluated within a common framework. The core experiments usually take place between MPEG meetings. If the best of the evaluated techniques is also better than the technique used in the current version of the VM, the VM is updated. Hence, the VM is continuously evolving and serves as the reference against which all coding techniques or tools are compared. The current VM, VM3 [48], uses modified MMREAD codes [1] for binary shape coding.

5 Conclusion

A systematisation of the field of partition coding has been proposed in the form of a taxonomy tree. The tree is divided into two main levels: partition type and partition representation.

The partition type level classifies the possible partition types that may have to be coded. The partition types are classified according to: i. space (2D or 3D), ii. sampling lattice, iii. superimposed graph structure, iv. number of classes in the partition, and v. class connectivity.

The partition representation level classifies the possible representations for each partition type. The representation of two-dimensional partitions is classified according to: i. whether mosaic partitions should be broken into a set of binary partitions, ii. whether the partition should be represented by pixel labels or by contours, and iii. whether contours should be defined on the pixels or on the edges. The representation of three-dimensional partitions, on the other hand, is classified according to whether they are represented by successive two-dimensional partitions, each corresponding to a time instant, whether prediction from the previous two-dimensional partition in the sequence is used, and whether motion compensation is also used.


The proposed systematisation is believed to simplify the comparison between partition coding techniques by establishing clearly which types of partition a given partition coding technique addresses, and which partition representation that technique is based on.

An overview of the partition coding techniques available for each partition type and the corresponding partition representations has been presented. The overview includes techniques evaluated in the MPEG-4 core experiments on binary shape coding [59]. The extension of the taxonomy tree with a systematisation of partition coding techniques is left for further study.

Another issue of interest, also left for further study, is the extension of the partition tree to include a branch for line drawings or contours that may be open (and which are therefore not the dual of any partition). This is of interest since contour-based coding, or image reconstruction from edges [22,9,15], with its long history, still seems to have a large potential in image coding.

Acknowledgement

The authors would like to acknowledge the valuable comments of Prof. Fernando Pereira and of the anonymous reviewers.

References

[1] Technical description for MPEG-4 first round of test. Technical Description ISO/IEC JTC1/SC29/WG11 MPEG95/0354, Toshiba, November 1995.

[2] S. M. Ali and R. E. Burge. A new algorithm for extracting the interior of bounded regions based on chain coding. Computer Vision, Graphics, and Image Processing, 43:256-264, 1988.

[3] Richard Bellman and K. L. Cooke. The Königsberg bridges problem generalized. Journal of Mathematical Analysis and Applications, 25:1-7, 1969.

[4] Michael James Biggar and A. G. Constantinides. Thin line coding techniques. In Proceedings of the International Conference on Digital Signal Processing, Florence, Italy, September 1987.

[5] Frank Bossen and Touradj Ebrahimi. A simple and efficient binary shape coding technique based on bitmap representation. Technical Description ISO/IEC JTC1/SC29/WG11 MPEG96/0964, EPFL, July 1996.

[6] Noel Brady. Adaptive arithmetic encoding for shape coding. Technical Description ISO/IEC JTC1/SC29/WG11 MPEG96/0975, Teltec Ireland (Dublin City University), ACTS/MoMuSys, July 1996.

[7] Patrick Brigger, Antoni Gasull, Chuang Gu, Ferran Marques, Fernand Meyer, and Christophe Oddou. Contour coding. CEC Deliverable R2053/UPC/GPS/DS/R/006/b1, EPFL, UPC, CMM, LEP, December 1993.

[8] Patrick Brigger and Murat Kunt. Morphological shape representation for very low bit-rate video coding. Signal Processing: Image Communication, 7(4-6):297-311, November 1995.

[9] Stefan Carlsson. Sketch based coding of grey level images. Signal Processing, 15(1):57-83, July 1988.

[10] R. Chellappa and R. Bagdazian. Fourier coding of image boundaries. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(1):102-105, January 1984.

[11] Yu-Shin Cho, Shi-Hwa Lee, Jae-Seob Shin, and Yang-Seock Seo. Results of core experiments on comparison of shape coding tools (S4). Technical Description ISO/IEC JTC1/SC29/WG11 MPEG96/0717, Samsung AIT, March 1996.

[12] Yu-Shin Cho, Shi-Hwa Lee, Jae-Seob Shin, and Yang-Seock Seo. Shape coding tool: Using polygonal approximation and reliable error residue sampling method. Technical Description ISO/IEC JTC1/SC29/WG11 MPEG96/0565, Samsung AIT, January 1996.

[13] Diogo Cortez. Classificação e codificação de contornos. Master's thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, Lisboa, May 1995.

[14] Murray Eden and Michel Kocher. On the performance of a contour coding algorithm in the context of image coding, part I: Contour segment coding. Signal Processing, 8(4):381-386, July 1985.

[15] F. Eryurtlu, A. M. Kondoz, and B. G. Evans. Very low-bit-rate segmentation-based video coding using contour and texture prediction. IEE Proceedings: Vision, Image and Signal Processing, 142(5):253-261, October 1995.

[16] Standardization of Group 3 facsimile apparatus for document transmission. Recommendation T.4, CCITT, 1980.

[17] Facsimile coding schemes and coding control functions for Group 4 facsimile apparatus. Recommendation T.6, CCITT, 1984.

[18] Herbert Freeman. On the encoding of arbitrary geometric configurations. IRE Transactions on Electronic Computers, 10:260-268, June 1961.

[19] Herbert Freeman. Computer processing of line-drawing images. Computing Surveys, 6(1):57-97, March 1974.

[20] Antoni Gasull, Ferran Marques, and Juan A. García. Lossy image contour coding with multiple grid chain code. In Proceedings of the Workshop on Image Analysis and Synthesis in Image Coding (WIASIC'94), page B4, Berlin, Germany, October 1994. Heinrich-Hertz-Institut.

[21] Peter Gerken, Michael Wollborn, and Stefan Schultz. Polygon/spline approximation of arbitrary image region shapes as proposal for MPEG-4 tool evaluation: technical description. Technical Description ISO/IEC JTC1/SC29/WG11 MPEG95/0360, RACE/MAVT, University of Hannover, Robert Bosch GmbH, and Deutsche Telekom AG, November 1995.

[22] Donald Norman Graham. Image transmission by two-dimensional contour coding. Proceedings of the IEEE, 55(3):336-346, March 1967.

[23] Chuang Gu and Murat Kunt. Contour simplification and motion compensated coding. Signal Processing: Image Communication, 7(4-6):279-296, November 1995.

[24] Draft revision of recommendation H.261: Video codec for audiovisual services at p×64 kbit/s, CCITT study group XV, TD 35, 1989. Signal Processing: Image Communication, 2(2):221-239, August 1990.

[25] Video coding for low bitrate communication. Draft Recommendation H.263, ITU-T, December 1995.

[26] Ali Habibi. Hybrid coding of pictorial data. IEEE Transactions on Communications, COM-22(5):614-624, May 1974.

[27] Ali Habibi. Survey of adaptive image coding techniques. IEEE Transactions on Communications, COM-25(11):1275-1284, November 1977.

[28] Robert M. Haralick and Linda G. Shapiro. Computer and Robot Vision, volume I. Addison-Wesley Publishing Company, Inc., Reading, Massachusetts, 1992.

[29] Michael Hotter. Object-oriented analysis-synthesis coding based on moving two-dimensional objects. Signal Processing: Image Communication, 2(4):409-428, December 1990.

[30] Thomas S. Huang. Coding of two-tone images. IEEE Transactions on Communications, COM-25(11):1406-1424, November 1977.

[31] Roy Hunter and A. Harry Robinson. International digital facsimile coding standards. Proceedings of the IEEE, 68(7):854-867, July 1980.

[32] Progressive bi-level image compression. Recommendation T.82, ITU-T, March 1993.

[33] Remi Jeannot, Demin Wang, and Veronique Haese-Coat. Binary image representation and coding by a double-recursive morphological algorithm. Signal Processing: Image Communication, 8(3):241-266, April 1996.

[34] Toru Kaneko and Masashi Okudaira. Encoding of arbitrary curves based on the chain code representation. IEEE Transactions on Communications, 33(7):697-707, July 1985.

[35] Jong-Lak Kim, Jong-Il Kim, Jong-Tae Lim, Jin-Hun Kim, Han-Soo Kim, Kyu-Hwan Chang, and Seong-Dae Kim. Daewoo proposal for object scalability. Technical Description ISO/IEC JTC1/SC29/WG11 MPEG96/0554, Daewoo Electronics Co. Ltd. and KAIST, January 1996.

[36] Weidong Kou. Digital Image Compression: Algorithms and Standards. Kluwer Academic Publishers, Boston, 1995.

[37] M. K. Kundu, B. B. Chaudhuri, and D. Dutta Majumder. A generalised digital contour coding scheme. Computer Vision, Graphics, and Image Processing, 30:269-278, 1985.

[38] Murat Kunt, Athanassios Ikonomopoulos, and Michel Kocher. Second-generation image-coding techniques. Proceedings of the IEEE, 73(4):549-574, April 1985.

[39] Michael S. Landy and Yoav Cohen. Vectorgraph coding: Efficient coding of line drawings. Computer Vision, Graphics, and Image Processing, 30:331-344, 1985.

[40] Yuh-Tay Liow. A contour tracing algorithm that preserves common boundaries between regions. CVGIP: Image Understanding, 53(3):313-321, May 1991.

[41] Petros A. Maragos and Ronald W. Schafer. Morphological skeleton representation and coding of binary images. IEEE Transactions on Acoustics, Speech and Signal Processing, 34(5):1228-1244, October 1986.

[42] Ferran Marques, Josep Sauleda, and Antoni Gasull. Shape and location coding for contour images. In Proceedings of the Picture Coding Symposium (PCS'93), page 18.6, Lausanne, Switzerland, March 1993.

[43] T. H. Morrin, II. Chain-link compression of arbitrary black-white images. Computer Graphics and Image Processing, 5:172-189, 1976.

[44] Coding of moving pictures and associated audio for digital storage media up to about 1,5 Mbit/s. International Standard 11172, ISO/IEC, 1993.

[45] Generic coding of moving pictures and associated audio information. Draft Recommendation H.262, Draft International Standard 13818, ITU-T, ISO/IEC, January 1995.

[46] Arun N. Netravali and John O. Limb. Picture coding: A review. Proceedings of the IEEE, 68(3):366-406, March 1980.

[47] Kevin O'Connell and Damon Tull. Motorola MPEG-4 contour-coding tool technical description. Technical Description ISO/IEC JTC1/SC29/WG11 MPEG95/0447, Motorola, November 1995.

[48] Ad hoc Group on MPEG-4 Video VM Editing. MPEG-4 video verification model version 3.0. Document ISO/IEC JTC1/SC29/WG11 N1277, ISO, July 1996.

[49] David W. Paglieroni and Anil K. Jain. A control point theory for boundary representation and matching. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP'85), pages 1851-1854, Tampa, Florida, 1985. IEEE, Signal Processing Society.

[50] Theo Pavlidis. Contour filling in raster graphics. Computer Graphics, 15(3):29-36, July 1981.

[51] William B. Pennebaker, Joan L. Mitchell, Glen G. Langdon, Jr., and Ronald B. Arps. An overview of the basic principles of the Q-Coder adaptive binary arithmetic coder. IBM Journal of Research and Development, 32(6):717-726, November 1988.

[52] Fernando Pereira. MPEG4: a new challenge for the representation of audio-visual information. In Proceedings of the Picture Coding Symposium (PCS'96), pages 7-16, Melbourne, Australia, March 1996.

[53] Rosalind W. Picard. Content access for image/video coding: the fourth criterion. Technical Report 295, MIT Media Lab: Perceptual Computing Section, 1994.

[54] Kenneth H. Rosen. Discrete Mathematics and its Applications. McGraw-Hill, Inc., New York, 1991.

[55] Azriel Rosenfeld. Digital straight line segments. IEEE Transactions on Computers, 23(12):1264-1269, December 1974.

[56] Philippe Saint-Marc, Hillel Rom, and Gerard Medioni. B-spline contour representation and symmetry detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(11):1191-1197, November 1993.

[57] Jean Serra. Image Analysis and Mathematical Morphology, volume I. Academic Press, Inc., San Diego, California, 1993.

[58] Uri Shani. Filling regions in binary raster images: a graph-theoretical approach. Computer Graphics (SIGGRAPH'80 Proceedings), 14(3):321-327, July 1980.

[59] MPEG Video Subgroup. Core experiments on MPEG-4 video shape coding. Document ISO/IEC JTC1/SC29/WG11 N1326, ISO, July 1996.

[60] Shape coding with an optimized morphological region description. Contribution COST211ter, Simulation Subgroup, SIM(92)23, U.C.L., February 1992.

[61] John Y. A. Wang and Edward H. Adelson. Representing moving images with layers. IEEE Transactions on Image Processing, 3(5):625-638, September 1994.

[62] Shuichi Watanabe, Hisashi Saiga, Hiroyuki Katata, and Hiroshi Kusao. Binary shape coding based on hierarchical chain codes. Technical Description ISO/IEC JTC1/SC29/WG11 MPEG96/1045, Sharp Corporation, July 1996.

[63] Terry A. Welch. A technique for high-performance data compression. IEEE Computer, 17(6):8-19, June 1984.

[64] C. A. Wuthrich and Peter Stucki. An algorithmic comparison between square- and hexagonal-based grids. CVGIP: Graphical Models and Image Processing, 53(4):324-339, July 1991.

[65] Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, IT-23(3):337-343, May 1977.