A contour-based approach to binary shape coding using a multiple grid chain code

Signal Processing: Image Communication 15 (2000) 585}599

A contour-based approach to binary shape codingusing a multiple grid chain code

Paulo Nunes!,*, Ferran MarqueH s", Fernando Pereira!, Antoni Gasull"

!Instituto Superior Te& cnico/Instituto de Telecomunicac7 oJ es, Av. Rovisco Pais } 1049-001 Lisboa, Portugal"Universitat Polite%cnica de Catalunya, UPC, Campus Nord, Modul D5 } 08034 Barcelona, Spain

Abstract

This paper presents a contour-based approach to e$ciently code binary shape information in the context ofobject-based video coding. This approach meets some of the most important requirements identi"ed for the MPEG-4standard, notably e$cient coding and low delay. The proposed methods support both object-based lossless andquasi-lossless coding modes. For the cases where low delay is a primary requirement, a macroblock-based coding mode isproposed which can take advantage of inter-frame coding to improve the coding e$ciency. The approach presented hererelies on a grid di!erent from that used for the pixels to represent the shape } the hexagonal grid } which simpli"es thetask of contour coding. Using this grid, an appraoch based on a di!erential chain code (DCC) is proposed for the losslessmode while, for the quasi-lossless case, an approach based on the multiple grid chain code (MGCC) principle is proposed.The MGCC combines both contour simpli"cation and contour prediction to reduce the number of bits needed to codethe shapes. Results for alpha plane coding of MPEG-4 video test sequences are presented in order to illustrate theperformance of the several modes of operation, and a comparison is made with the shape-coding tool chosen byMPEG-4. ( 2000 Elsevier Science B.V. All rights reserved.

Keywords: Shape coding; Contour coding; Hexagonal grid; Multiple grid chain code; Motion estimation/compensation; Object-basedvideo coding

1. Introduction

One of the main features of the MPEG-4 stan-dard is its capability to describe scenes in terms oftheir compositing entities and their evolution intime [5]. In the MPEG-4 Video framework, eachone of these separate entities is called a video object

*Corresponding author. Tel: #351-1-841-84-60; fax: #351-1-841-84-72.

E-mail addresses: [email protected] (P. Nunes), [email protected] (F. Pereira), [email protected] (F. MarqueH s),[email protected] (A. Gasull)

(VO). VOs can be classi"ed into three types withrespect to its alpha plane information: opaque, withconstant transparency, and with variable transpar-ency [16]. Therefore, the separate transmission ofa VO requires the encoding of its alpha plane,notably the support of the alpha plane and thetransparency information. While for binary shapesonly the alpha plane support has to be coded,grey-level shapes require, in addition, the coding oftransparency information. In MPEG-4, transpar-ency information is encoded either with a singletransparency value or with techniques very similarto those used for luminance information [14]. The

IMAGE 487 Elizabeth Balaguru CSR

0923-5965/00/$ - see front matter ( 2000 Elsevier Science B.V. All rights reserved.PII: S 0 9 2 3 - 5 9 6 5 ( 9 9 ) 0 0 0 4 1 - 7

e$cient encoding of the shape information has re-quired the development of very sophisticated tech-niques, during a process where many technicalproposals were carefully evaluated [12].

During the MPEG-4 standardisation process,two main families of binary shape coding ap-proaches have been proposed: bitmap-based andcontour-based techniques. Bitmap-based tech-niques directly encode the shape pixels as belong-ing to the object or to the background. Two mainmethods have been presented in this area: themodi"ed modi"ed Reed [19] and context-basedarithmetic encoding [2]. Contour-based techniquesrepresent the shape by its boundary. Three di!erentapproaches have been proposed relying on thecontour representation: vertex-based [4], baseline-based [6], and chain-code-based [15] shapeencoders.

Due to the MPEG-4 requirements for shape cod-ing [11], various techniques have developed loss-less as well as lossy modes. However, during thecomparison process, the need for quasi-losslesstechniques arose. Such techniques may introducelosses in the encoding process but these should notbe noticeable for the user.

This paper deals with a contour-based shapecoding technique that can work in two modes:lossless and quasi-lossless. The paper presents indetail the method that was proposed by the authorsin [15,8], extends the discussion of the more com-plex cases that may be found in the encoder, anddescribes the decoder. The technique relies on therepresentation of shape information on a grid dif-ferent from that used for the luminance informa-tion. The shape grid is called the hexagonal grid[9,18] and simpli"es the contour coding task. Us-ing this grid, an approach based on a di!erentialchain code [3] is proposed for the lossless mode.The quasi-lossless case is handled by an approachbased on the multiple grid chain code (MGCC)principle [10]. The MGCC combines both contoursimpli"cation and contour prediction to reduce thenumber of bits necessary for coding the shapes.

The paper is structured as follows. After theintroduction, a brief review of the MPEG-4 binaryshape coding technique is presented in Section 2.Section 3 discusses the hexagonal grid concept.While Section 4 describes the basis of the lossless

shape-coding technique, Section 5 presents thebasis of the quasi-lossless shape-coding approach.Section 6 deals with the actual implementation ofthe proposed shape coding for both the intra-frameand inter-frame modes. Finally, in Section 7 simu-lation results are presented and conclusions aregiven.

2. The MPEG-4 binary shape coding approach

Due to its relevance in this paper, this sectiondescribes the bitmap-based binary shape codingtechnique, called context-based arithmetic encod-ing (CAE), adopted by MPEG for the MPEG-4standard [14].

To code the binary shape of a video object plane(VOP) with CAE, the alpha plane of the VOP isenclosed in the tightest rectangle containing all theshape pixels } the bounding box (BB) } with dimen-sions multiple of the macroblock (MB) dimensions,i.e. 16]16 pixels (Fig. 1). This BB is built minimis-ing the number of MBs with shape pixels. This wayof creating the BB is especially useful in the case ofperforming both shape and texture coding togethersince it decreases the number of MBs that have tobe processed and transmitted [14].

Each block of 16]16 pixels within this BB iscalled a binary alpha block (BAB) and is encoded ina raster scanning order. There are three di!erentcategories of BABs as illustrated in Fig. 1:

1. TRANSPARENT (all BAB alpha values are 0),2. OPAQUE (all BAB alpha values are 255),3. BOUNDARY (BAB alpha values are either 0 or

255).

While TRANSPARENT and OPAQUE BABs are en-coded by a single word describing the BAB type,BOUNDARY blocks may need further data, besidesthe BAB type, to completely encode all the shapepixels, notably motion information and contextdata.

2.1. Binary alpha block-type encoding

The methods used to encode the BAB are in-dicated by the BAB type. Each BAB can be

IMAGE 487

586 P. Nunes et al. / Signal Processing: Image Communication 15 (2000) 585}599

Fig. 1. Alpha plane and corresponding bounding box.

Table 1List of BAB types for CAE

BAB type Semantic VOP Type

0 MVDS""0 && NO UPDATE P-, B- VOPs1 MVDS!"0 && NO UPDATE P-, B- VOPs2 TRANSPARENT All VOP Types3 OPAQUE All VOP Types4 INTRA CAE All VOP Types5 MVDS""0 && INTER CAE P-, B- VOPs6 MVDS !"0 && INTER CAE P-, B- VOPs

encoded with one of the seven di!erent types listedin Table 1, where MVDs is the motion vector di!er-ence for shape, i.e. the di!erence between the actualmotion vector and the motion vector predictor forthe current BAB, and NO UPDATE means that thecurrent BAB can be completely reconstructed bymotion compensation without the need for anyfurther information.

2.1.1. Intra VOPsFor intra coded VOPs, the number of BAB types

allowed is restricted to three out of the seven di!er-ent values available } (2}4) } encoded using a VLCtable indexed by a BAB type context C. For eachBAB, the context is computed based on the valuesof the already encoded neighbouring BAB types,and the BAB type of the BAB being encoded.Fig. 2 shows the template used to compute thecontext C using Eq. (1), where c

kis the type of the

Fig. 2. Template for computing the BAB type context forI-VOPs.

kth BAB in the template.

C"

3+k/0

(ck!2)3k. (1)

If the computation of the context C involvesBABs outside the current VOP BB, the correspond-ing BAB types are assumed to be transparent.

2.1.2. Inter VOPsIn the case of inter coded VOPs (P- and B-VOPs)

all BAB types listed in Table 1 are allowed and theBAB-type information is encoded using a di!erentVLC table indexed by the co-located BAB type inthe selected reference VOP and the BAB type of theBAB being encoded.

If the sizes of the current and reference VOPs aredi!erent, some BABs in the VOP being encodedmay not have a co-located equivalent in the refer-ence VOP. In this case the BAB-type matrix of thereference VOP is manipulated in order to matchthe size of the current VOP [14].

2.2. Binary alpha block motion compensation

Besides the type information, motion informa-tion can also be used to encode the BAB data. Inthis case the BAB is encoded in inter mode and onemotion vector (MVs } motion vector for shape) foreach BAB is used. Similar to what happens intexture coding, in the MPEG-4 shape coding tech-nique the shape motion vectors are encoded di!er-entially, which means that for each motion vectora predictor is built from neighbouring shapemotion vectors already encoded and the di!er-ence between this predictor and the actual MVs } theMVDs } is VLC encoded. When no valid shape

IMAGE 487

P. Nunes et al. / Signal Processing: Image Communication 15 (2000) 585}599 587

Fig. 3. CAE templates for context computation for: (a) intra coded BABs; (b) inter coded BABs, where the pixel being encoded is markedwith &?'.

Table 2List of BAB sizes for CAE

CR BAB size in pixels

1 16]162 8]84 4]4

motion vectors are available for prediction texturemotion vectors can be used.

2.3. Binary alpha block size conversion

The compression ratio of the CAE technique canbe increased by allowing the shape to be encoded ina lossy mode. This can be achieved by encodinga down-sampled version of the BAB. In this casethe BAB can be down-sampled by factor of 2 or4 before context encoding and up-sampled by thesame factor after context decoding. The relationbetween the original and the down-sampled BABsizes is called conversion ratio (CR). Table 2 pres-ents the di!erent BAB sizes allowed, depending onthe value of the CR parameter. The error intro-duced by this down-sampling/up-sampling opera-tion is responsible for the lossy encoding and iscalled the conversion error.

Encoding the shape using a high down-samplingfactor provides a better compression ratio; howeverthe reconstructed shape after up-sampling may ex-hibit annoying artifacts. In order to minimise theseartifacts an adaptive non-linear up-sampling "lterhas been adopted by MPEG [14].

2.4. Context-based arithmetic encoding

For BOUNDARY blocks which are not exclusivelyencoded by motion information, context-basedarithmetic encoding is applied. For these blocks theBAB data is encoded by a single binary arithmeticcodeword (BAC). In this case, each binary shapepixel in the BAB (or in its down-sampled version) isencoded in a raster scanning order (until all pixelsare encoded). The process of encoding a givenshape pixel using CAE is the following:

1. Compute a shape context number based ona template.

2. Use this context number to index a probabilitytable.

3. Use the indexed probability to drive a binaryarithmetic encoder.

Since the encoding performance depends on thescanning order of the BAB, the encoder is allowedto transpose (matrix transposition) the BAB beforeencoding. This operation is signalled to the decoderthrough a 1 bit #ag called scanning type.

2.4.1. Context computationThe core of the CAE technique is the exploita-

tion of the spatio-temporal redundancy in themotion-compensated BAB data using causal con-texts to predict the shape pixels according to pre-de"ned templates. Fig. 3 shows the di!erenttemplates used in CAE encoding.

For intra coded BABs no motion compensationis used; thus only the spatial redundancy is ex-ploited by computing a 10 bit context for each pixel

IMAGE 487


Fig. 4. Shape and corresponding pixel-based contour representations: (a) outer contour; (b) inner contour.

of the BAB using Eq. (2) with N"9, according tothe template of Fig. 3(a), where c

k"0 for transpar-

ent pixels and ck"1 for opaque pixels.

C"

N+k/0

ck2k. (2)

Temporal redundancy can additionally be ex-ploited for inter coded BABs by computing a 9 bitcontext using also pixels from the motion compen-sated BAB (besides the pixels from the BAB beingencoded). In this case the context is computed usingEq. (2) with N"8, according to the template ofFig. 3b. As in the intra case, c

k"0 for transparent

pixels and ck"1 for opaque pixels.

When building contexts some special cases mayappear, notably some pixels may be out of thecurrent VOP BB or the template may cover pixelsfrom BABs which are not known at decoding time.These cases have special pre-de"ned rules whichare known both by the encoder and the decoder[14].

3. The basics of the contour-based approach

3.1. The hexagonal grid

As previously stated, chain-code-based shapecoding techniques rely on a contour representationto code the shape information. In order to allow thecoding of arbitrary binary shapes, an adequate rep-

resentation of the shape boundary is necessary.This representation must not only describe theshape unambiguously, but also allow decrease inthe redundancy of the encoded representation asmuch as possible.

A common approach for the binary shape repres-entation is to describe the shape by those pixels ofthe background (pixels in the BB outside the shape)that have at least one 4-connected neighbour be-longing to the shape (the foreground), de"ning anouter pixel-based contour (pixels marked witha grey level in Fig. 4(a)). In a similar way, the shapecan be described by the interior pixels that have atleast one 4-connected neighbour belonging to thebackground, de"ning an inner pixel-based contour(pixels marked with a grey level in Fig. 4(b)).Although these pixel-based descriptions allow anunambiguous representation of binary shapes, theyrequire some contour elements to be coded morethan once and the de"nition of additional chaincode symbols [18].

To simplify the process of representing the shapecontour and to avoid the above-mentioned draw-backs, a di!erent grid has been adopted based onthe edge sites: the hexagonal grid. Assuming thatthe pixels of the shape are located in a rectangulargrid (squares in Fig. 5(a)), the (hexagonal) edge gridcorresponds to all the sites located between twoadjacent pixels, extended to the image borders (seg-ments in Fig. 5(a)). It gets its name precisely fromthe fact that each edge site has six neighbour edgesites. Edge elements are active if they are between

IMAGE 487


Fig. 5. Shape and corresponding edge-based contour representation: (a) pixel grid and hexagonal edge grid; (b) edge-based contourrepresentation.

two pixels with di!erent labels, one in the back-ground and another in the foreground (Fig. 5(b)).This edge-based representation allows an easy andunambiguous conversion from the label image tothe contour image and vice versa, providing a verynatural framework to handle all shape-codingtechniques based on contour descriptions (e.g.chain-code-based techniques and polygonalapproximations).

3.2. Direct and diwerential chain code

In the context of chain-code-based techniques,the contour-based representation of a shape is builtby tracking the contour and representing eachmovement by a chain code symbol. The set ofpossible movements depends on the type of con-tour representation. In the case of the pixel-based

Fig. 6. Pixel-based contour representation using (a) 8-connectedchain code; (b) 4-connected chain code.

Fig. 7. Edge-based contour representation: example of (a) directand (b) di!erential chain code.

contour representations, eight movements areneeded, if an 8-connected chain code is considered,or four in the case of the 4-connected chain code(Figs. 6(a) and (b), respectively).

In the case of the edge-based contour representa-tion, from a given edge site only six possible sitescan be reached when tracking a contour and, there-fore, there are only six possible movements (Fig.7(a)). In this case, however, since the edge-basedrepresentation guarantees that no contour elementhas to be coded twice, it is possible to use a di!eren-tial chain code (DCC) (Fig. 7(b)). When using a dif-ferential description of the contour (as in the DCCcase), the number of possible movements reduces to3: MTURN RIGHT, TURN LEFT, STRAIGHT AHEADN(Fig. 7(b)). In order to use the di!erential descrip-tion, the coding process has to keep in memory theprevious movement. This can be done by trackingthe direction of the previous movement: MNORTH,SOUTH, EAST, WESTN.

IMAGE 487


4. Lossless shape coding: di4erential chain code(DCC)

This section applies the contour-based codingapproach described above to the lossless coding ofshape. This lossless coding technique uses the hex-agonal grid to represent the contour (edge-basedrepresentation), as described in the previous sec-tion, and a di!erential chain code representation tocode it.

As in the MPEG-4 technique, to code the binaryshape, the shape pixels are enclosed in a BB withdimensions multiple of the MB dimensions.

4.1. Selection and coding of the initial contourelements

The contour element where the shape encodingstarts is the "rst active edge site found when scann-ing the contour image from left to right and top tobottom. Therefore, the "rst active site is alwaysa horizontal element and thus the initial directionfor the encoding process can be set to EAST. Notethat the decoder shares this information and, there-fore, it does not have to be sent. To independentlycode each 4-connected region in the alpha plane,a di!erent initial contour element is used. There-fore, after coding the contour of each connectedregion, the encoder looks for a new initial elementuntil all contour elements have been coded. Sincechain coding can independently code each 4-con-nected region of a possible multi-region VO, itenables an easy rede"nition of VOs: a VO contain-ing two separate regions can easily be converted into two independent VOs.

The position of the "rst initial element is coded inan absolute way. Given the way the shape BB isconstructed [14], the "rst initial element in thecontour image will be always on the "rst row ofMBs. Thus, it is coded using log

2<OP}width bits

for the horizontal co-ordinate and 4 bits (log216)

for the vertical co-ordinate.The coding of the subsequent initial elements is

done in a way relative to the position of the pre-vious one. To do so, all possible candidates toinitial elements (i.e. all the horizontal contour ele-ments, except those in the last row) are indexedfollowing a top-to-bottom and left-to-right scan.

Each initial element is then coded using its di!eren-tial index (i.e. current}index!previous}index) withlog

2(BB}pixels!previous}index) bits, where pre-

vious}index and current}index are the indices of theprevious and current initial elements, respectively,and BB}pixels coincide with the number of possiblecandidates to initial elements (Fig. 5(a)). Therefore,the length of the word used for coding each initialelement is adapted with respect to the position ofthe previous initial element. At the decoder side, thecurrent index of the initial element is computedadding to the decoded value, the value of the pre-viously decoded index for the current VOP.

4.2. Encoding the contour elements:building the DCC

After encoding the initial element of a given4-connected region, the encoder searches for thefollowing non-coded contour element (active edgesite), testing each direction in the following order:

1. RIGHT,2. STRAIGHT AHEAD,3. LEFT.

Priority is given to the movement RIGHT so that thecontour tracking algorithm "rst completes every4-connected region.

For each contour element, the encoder outputsthe corresponding DCC symbol from the set ofpossible symbols, i.e. MRIGHT, STRAIGHT AHEAD,LEFTN. These symbols are then gathered intogroups of n symbols and Hu!man encoded. Whenthe encoder reaches the end of the current contourwith an incomplete group of symbols to code, itchooses from the corresponding Hu!man table theshortest word, in terms of number of bits, havingthis group of symbols as pre"x. The contour ofa 4-connected region is "nished when the contourtracking reaches an already coded contour element,i.e. in the case of the DCC, the initial contourelement of the region being coded. After encodingthe last element of the contour, the encoder has tosignal to the decoder whether there are more con-tours left to code or not. This is done by puttinga one bit #ag in the bitstream } the LAST}CONTOUR

#ag } following the last symbol. When set to &0', this#ag indicates that there are more contours

IMAGE 487


(associated to non-connected regions) to code, andthat, when set to &1' all contours for this alpha planehave been coded. The encoding of the alpha planestops when a chain code has been built for each4-connected region of the shape and thus all con-tour elements have been coded.

4.3. Decoding the DCC

The decoding of the DCC is the inverse of theencoding process. After decoding the initial elementof a contour, the decoder looks for the next DCCsymbol and tracks the contour according to thesymbols being decoded, until the initial element isreached. When the initial element is reached, thedecoder reads the LAST}CONTOUR #ag (1 bit), inorder to know whether there are more contours todecode for this alpha plane. If a LAST}CONTOUR

#ag"&1' is detected, the contour decoding isstopped.

After having completely decoded the contourimage for the current alpha plane, the shape de-coder has to construct the label image, this meansthe binary shape. This is done by jointly scanning,just once, the contour and label images from top todown and left to right, propagating the labels basedon the contour information.

5. Quasi-lossless shape coding:multiple grid chain code (MGCC)

The technique here proposed for quasi-losslessshape coding is an extension of the di!erentialchain coding, using the edge-based contour repres-entation, which can encode more than one contourelement with a single MGCC code. The main di!er-ence between DCC and MGCC lies in the fact thatin DCC all codes represent a movement betweenneighbour contour elements while MGCC maylink contour elements that are not neighbours. Fur-thermore, the distance between two contour ele-ments linked by a MGCC code depends on thecode. The method tries to increase the probabilityof large movements and, thus, to reduce the num-ber of codes necessary to represent the contour.Large movements result in approximations of thereal contour and lead to losses in the decoding

Fig. 8. De"nition of MGCC cells: (a) clockwise and (b) counter-clockwise.

process. Nevertheless, such losses are very smallbeing, at most, of one pixel per MGCC code.

5.1. MGCC basics

5.1.1. The MGCC cellsMGCC uses as basic cell a 3]3 pixels area of the

hexagonal grid (Fig. 8) and a single movement isused to go through the cell. In Fig. 8, starting fromthe input contour element marked with 0, sevenpossible output contour elements can be reachedM1, 2, 7N, going through the cell. Each one of theseoutput contour elements represents a di!erentmovement and thus a di!erent contour con"gura-tion. For compression e$ciency reasons, there aretwo types of cells: clockwise and counterclockwise(Figs. 8(a) and (b), respectively). Its choice dependson the evolution of the contour and maximises thenumber of contour elements coded per cell.

5.1.2. The MGCC lossesA MGCC code does not always represent

a unique contour con"guration. As illustrated inFig. 9, a movement (e.g. from 0 to 3) may corres-pond to two di!erent contour con"gurations.Therefore, the decoding algorithm has to decidebetween two possible contour con"gurations. Theambiguity in this decision introduces the codinglosses. Note, however, that the only possible error isthe labelling of the central pixel in the cell. That is,the maximum error is the wrong labelling of iso-lated pixels in the borders of the regions and neverin their interior.

IMAGE 487


Fig. 9. Ambiguities introduced by MGCC.

5.1.3. Dynamic selection of the MGCC cellsAs can be seen in Fig. 8, some of the MGCC

symbols represent larger movements than others.This is the case for symbols 3 and 4 with respect tosymbols 1 and 7. As a consequence, the more oftenthese symbols are used, the higher the performanceof the coding technique.

In order to use as many large movements aspossible, the MGCC technique dynamically selectsthe type of cell between clockwise and counter-clockwise. The selection of the next MGCC celltype relies on the prediction of the contour traject-ory, assuming boundary smoothness and usinga rule known by both the encoder and the decoder.The following selection rules try to increase thefrequency of the symbols corresponding to largermovements, assuming that the contour trajectorydoes not change from cell to cell:

Output symbol Previous cell type Next cell type

1,2,3 Clockwise CounterclockwiseCounterclockwise Clockwise

4,5,6,7 Clockwise ClockwiseCounterclockwise Counterclockwise

Fig. 10 shows the next cell types for all possiblecontour con"gurations of a counterclockwise cell.Note that the selection of the type of cell automati-cally de"nes the position of the cell in the imagegrid. Therefore, two consecutive cells do not haveto be aligned as shown in some cases of Fig. 10.

Fig. 11 illustrates the improvement achievedwhen selecting dynamically the next cell showing

two possible solutions for encoding the same con-tour segment. The "rst solution uses cell adapta-tion, and as a consequence the position of the cellused for the second movement is modi"ed, allowingthe second part of the contour to be coded usingonly movement 4. The second solution does notperform cell adaptation and an extra symbol isneeded to code the same contour.

5.2. Selection and encoding of the initial contourelements

For the selection and encoding of the initialcontour elements, the same algorithm as that forthe lossless shape coding mode is applied, i.e. theshape is enclosed in a BB and the contour elementwhere the contour encoding starts is the "rst activeedge site found when scanning the image from leftto right and top to bottom.

5.3. Encoding the contour elements:building the MGCC

The initial contour element selected is the inputelement in an MGCC cell (site marked 0 in Fig. 8).As for the lossless case, after encoding the initialelement, the encoder searches for the next non-coded contour elements, testing each direction inthe following order:

1. RIGHT,2. STRAIGHT AHEAD,3. LEFT.

Similar to the lossless case, priority is given to themovement RIGHT so that the contour tracking al-gorithm "rst completes every 4-connected region.After the encoder "nds an output element of theMGCC cell, it generates the corresponding symbolfrom the set of possible MGCC symbols, i.e. M1, 2,7N. Symbols will be grouped and Hu!man encoded.The output site becomes the input site of the follow-ing cell. The contour of a 4-connected region iscompleted when the contour tracking reaches analready coded contour element.

In the MGCC case, since the type and position ofthe cells is changed dynamically, it is possible thatthe last movement in a contour does not coincidewith the initial contour element, but with an

IMAGE 487


Fig. 10. Selection of the next MGCC cell for a counterclockwise cell.

Fig. 11. Example of MGCC cell adaptation.

element inside the "rst cell of the contour beingcoded. In order that the decoder can detect the endof the contour, it needs to know without ambi-guities all the contour elements of the "rst cell. Thisis done by sending one bit which allows to decodethe "rst MGCC cell without ambiguities after theinitial MGCC symbol. This bit tells the decoder

whether the second contour element is in theSTRAIGHT AHEAD direction or not, which is su$-cient to inform the decoder which precise contourshall draw for the "rst cell.

Similar to the lossless case, after encoding eachcontour, the encoder tells the decoder whetherthere are more contours to code by means of the

IMAGE 487


LAST}CONTOUR #ag bit. The encoding of the VOPalpha plane stops after a multiple grid chain codehas been built for each 4-connected region in theshape and all contours have been coded.

5.4. Decoding the MGCC

Although MGCC decoding is again the inverseof the encoding process, for each decoded MGCCsymbol, only the input and output contour ele-ments of the corresponding cell are known for sure.The interior elements are not precisely known andthus need to be de"ned by the decoder. For thesecases, the decoder makes a choice, labelling thecorresponding sites with a temporary `uncertaintylabela. The input and output cell elements are label-led with a de"nitive label. While temporary labelsmay be changed if topological inconsistencies arefound, de"nitive labels are never changed. Thisway, the input and output contour elements, whoselocations are "xed by the encoding process, areused as anchor sites to de"ne the interior ones.When the information provided by the input andoutput contour elements is not su$cient to allowa decision to be taken, the con"guration producingthe smoothest contour is chosen.

The decoding of each contour is similar to theDCC case. After decoding the initial contour ele-ment, the decoder decodes the "rst MGCC cell andthe #ag indicating which interior contour con"g-uration has to be chosen for the "rst cell. For thesubsequent contour element, it decodes the nextMGCC symbol and tracks the contour accordingto the symbols being decoded and the decodingrules stated above. This process continues untilreaching the initial cell. Then the decoder reads theLAST}CONTOUR #ag (1 bit) in order to knowwhether there are more contours to decode for thisalpha plane or not. If a LAST}CONTOUR #ag"&1' isdetected, contour decoding is stopped.

As in the DCC case, after having completelydecoded the contour image for the current alphaplane, the shape decoder has to construct the labelimage. This is done by jointly scanning the contourand label images, from top to down and left toright. However, since in the MGCC case the shapeis coded with losses, a special processing in the

conversion from contours to labels is neces-sary in order to detect and correct possibleinvalid contour con"gurations caused by `lessfortunatea decisions in the assignment of the tem-porary labels.

6. Object versus macroblock-based coding modes

In the previous sections, the chain-coding tech-niques for lossless and quasi-lossless intra binaryshape coding were presented. Depending onwhether the shape is coded simultaneously or not,the object-based or MB-based coding modes areused.

For non-real-time applications, where low delayis not a primary requirement, an object-based cod-ing scheme is adopted where the shape is coded(and multiplexed) simultaneously } object-basedmode. Thus, a video decoder using this mode canstart decoding motion and texture informationonly after the whole shape has been decoded. AnMB-based coding scheme is also proposed wherethe shape is divided into MBs of 16]16 pixels thatare then independently encoded in a raster scann-ing way, thus allowing the video decoder to decodethe motion and texture of each MB immediatelyafter decoding the corresponding shape }MB-basedmode. This coding mode is typically less e$cientthan the object-based coding mode since eachshape MB is independently coded without usingany information from the neighbouring source(if no motion is used). While for the object-basedcoding mode only intra coding has been con-sidered, for the MB-based coding mode both intraand inter coding modes can be used. For the intercoding mode, and in order to improve coding e$-ciency, motion compensation is used, without cod-ing the residues, and with a varying thresholddepending on the acceptable shape degradation.Another approach to object-based coding mode, inthe context of partition coding, using temporalprediction, is presented in [7].

While for the object-based coding mode, theshape is transmitted whole at once, for theMB-based coding mode, the shape information issent MB by MB. For coding purposes, eachshape MB is classi"ed into three categories:

IMAGE 487


1. TRANSPARENT (all MB alpha values are 0),2. OPAQUE (all MB alpha values are 255),3. BOUNDARY (MB alpha values are either 0 or

255).

For the "rst two types, only an MB-type word issent, while for the BOUNDARY MBs motion estima-tion is performed. Based on the sum of absolutedi!erences (SAD) between the current MB and itsbest prediction, an inter/intra coding mode deci-sion is made. Note that, for binary shapes, the SADis just the number of alpha values that di!erbetween the two MBs } current and prediction. Inconclusion, a shape MB can be coded in one of thefollowing four modes:

1. TRANSPARENT: no need to send any informationbesides the MB type } MB}TRANSPARENT.

2. OPAQUE: no need to send any information be-sides the MB type } MB}OPAQUE

3. INTRA: the DCC or MGCC schemes are inde-pendently applied to each shape MB.

4. INTER: a motion vector with two components issent (one for the vertical co-ordinate and an-other for the horizontal co-ordinate); no residuesare coded.

6.1. Intra/inter MB coding mode decision

An MB is coded in intra or inter mode dependingon the SAD value for its best prediction and on thevalue of a control parameter, called alpha threshold(a

5)3). This parameter indicates the shape degrada-

tion accepted when motion compensation is per-formed since no residues are coded. The rule forchoosing between inter and intra modes is the fol-lowing:

if (SAD"%45 13%$*#5*0/

)a5)3

)

mode"inter

else

mode"intra

a5)3"0 corresponds to lossless coding while in-

creasing a5)3

allows to achieve higher compressionrates at the expense of higher levels of controlleddegradation. This also means that inter coding can

be used both for the lossless (DCC and a5)3"0) and

lossy (MGCC) modes. Note that in the case ofselecting the lossy mode, the types of losses that areintroduced in the inter mode di!er from that of theintra mode. In the inter mode, losses can be largerthan one pixel and may appear in the interior of theshape. This is also related to the control of theintra/inter mode by means of a global measure, i.e.the SAD value.

6.2. Shape MB motion estimation/compensation

6.2.1. MB motion estimationMotion estimation is performed at the MB level.

The comparisons are made between the currentalpha MB and the displaced alpha MB in theprevious reconstructed alpha plane. A full searcharound the original MB position is used witha maximum displacement of [!16, #15] integerpixel positions. The (x, y) displacement given thesmallest SAD is chosen to give the alpha MBmotion vector.

6.2.2. Diwerential coding of alpha motion vectorsWhen using the inter mode coding, motion vec-

tors have to be transmitted. The motion vectorcomponents (horizontal and vertical) are di!eren-tially coded by using a spatial neighbourhood ofthree motion vectors already transmitted (Fig. 12).In the case where the MB is in the border of thecurrent alpha plane, the following rules apply de-pending on the number of candidate predictorsoutside the VOP alpha plane:

1. If the MB of one and only one candidate pre-dictor is outside the VOP, the correspondingmotion vector candidate predictor is set to zero.

2. If the MBs of two and only two candidate pre-dictors are outside the VOP, the correspondingmotion vectors candidate predictors are set tothe third candidate predictor.

3. If the MBs of all candidate predictors are out-side the VOP, the corresponding motion vectorscandidate predictors are set to zero.

The alpha motion vector coding is performed sep-arately on the horizontal and vertical components.For each component, the median value of the three

IMAGE 487


Fig. 12. MB neighbourhood for motion vector prediction.

candidates for the same component is computed:

Px"Median(aMV1

x, aMV2

x, aMV3

x),

Py"Median(aMV1

y, aMV2

y, aMV3

y),

where aMVNx

is the &x' co-ordinate for the alphamotion vector of candidate predictor N.

The motion vector di!erences, given by the fol-lowing expressions for each component, are thenVLC encoded:

aMVDx"aMV

x!P

x,

aMVDy"aMV

y!P

y,

where aM<Dxis the &x' co-ordinate for the di!eren-

tial alpha motion vector.

6.2.3. MB motion compensationThe decoder computes the current motion vector

in the inverse way using the motion vectors fromthe previous decoded MBs and adding the currentdecoded motion vector di!erence as follows:

aMVx"aMVD

x#P

x,

aMVy"aMVD

y#P

y,

where Px

and Pyare computed in the same way as

in the encoder. After obtaining the current motionvector, the decoder gets the corresponding alphaMB in the previous reconstructed alpha plane.

7. Results and conclusions

To complete the study of the shape codingmethods proposed here, coding results for well-known video test sequences are presented in this

Table 3Bits/alpha plane } MB-based coding mode (intra and intermodes)

DCC MGCC

a5)3"0 a

5)3"4 a

5)3"8 a

5)3"4 a

5)3"8

Kids 4113 2791 1954 2831 1948Logo 2401 1870 1550 2528 1611Cyclamen 7396 4309 2733 5016 3035Robot 5325 4364 3599 4210 3482Girl 945 480 305 520 309Stefan 766 682 570 656 559

Table 4Bits/alpha plane } MB-based coding mode (only intra mode)

DCC MGCC

Kids 4387 4329Logo 5595 5154Cyclamen 9437 9508Robot 5591 5351Girl 1215 1243Stefan 776 747

Table 5Bits/alpha plane } object-based coding mode (only intra mode)

DCC MGCC


section. Tables 3}5 present the average number ofbits per alpha plane for the following cases:

f Two of the objects of the MPEG-4 sequence`Childrena, notably the `Kidsa and the `Logoa,in SIF format (100 frames),

f One object of the sequence `Cyclamena in SIFformat (100 frames),

f One object of the MPEG-4 sequence `TotalDestructiona, called `Robota, in SIF format (44frames),

f One object of the MPEG-4 sequence `Weathera,called `Girla, in QCIF format (100 frames),

IMAGE 487


Fig. 13. Example VOPs for `Kidsa, `Logoa, `Cyclamena, `Robota, `Girla and `Stefana.

f One object of the MPEG-4 sequence `Stefana, inQCIF format (100 frames).

Fig. 13 shows one frame for each of these objectsand the corresponding alpha planes.

Tables 3 and 4 show that the use of the intercoding mode for lossless coding (i.e. DCC witha5)3"0 versus only DCC) with the MB-based

mode, allows a signi"cant reduction in the numberof bits used. This is more visible for `Logoa due tothe pure translational motion of this object butall tested objects use less bits depending on theamount and type of motion that they have. More-over, Tables 4 and 5 show the signi"cant reductionin the number of bits achieved by the object-basedmode relative to the MB-based mode, due toa much more e$cient use of spatial redundancy.

In order to allow some comparison with theMPEG-4 standard shape coding solution (CAE),Table 6 presents the results for the coding ofthe same object shapes using the technique in theMPEG-4 Video VM 8.0 [13] as implemented in theMoMuSys VM8-971119 software implementation[1]. From the results, it may be concluded thatInter CAE and the best directly comparable tech-nique proposed here (DCC object-based intra cod-ing } Table 5) have similar performance in the case

Table 6Bits/alpha plane } CAE (VM 8.0) (lossless coding)

Inter CAE Intra CAE


of the `Kidsa, `Girla and `Stefana objects; for theother objects, Inter CAE outperforms, notably forvery structured objects such as the `Logoa. How-ever, for the intra coding mode MGCC outper-forms CAE (see Tables 5 and 6).

Having seen the results, it is important to high-light that MGCC introduces new important fea-tures with respect to CAE. First, the types of codinglosses introduced by the MGCC do not change thehomotopy of the shape being coded. As reported in[17], this is a problem that has to be carefullyhandled in the case of CAE since CAE coding lossescan lead to the creation of holes inside the decodedobjects or even to the division of a single object intoseveral objects. This issue has to be taken intoaccount when encoding the VOP information.

IMAGE 487


Moreover, MGCC has the possibility to be ex-tended to code complete image partitions. Thisextension is possible by introducing the concept oftriple point in the encoding process [18].

Since the CAE technique uses tools and modes,notably arithmetic encoding and the prediction ofthe MB coding modes, that may improve the chaincode based techniques proposed here, in the futurethe work will proceed by the integration of theseadditional tools to study the corresponding perfor-mance improvement. Moreover, an object-basedcoding mode using temporal prediction will also bedeveloped.

Although shape coding has been for many yearsa topic of research, it was the push given by theMPEG-4 standardisation e!ort, and its competi-tive environment, that "nally allowed a signi"cantjump in terms of shape coding performance. Thisstrong push was motivated by the key role thatshape coding plays in the MPEG-4 object-basedrepresentation of video data. With shape, videoobjects will follow, and with video objects peoplewill interact, giving birth to a new generation ofmultimedia applications.

Acknowledgements

The authors acknowledge the support of theEuropean Commission under the ACTS projectMoMuSys. This work was also supported by the`Acc7 a8 o Integrada Luso-Espanhola, Codi"cac7 a8 o deVmHdeo para Terminais Audiovisuais MoH veisa.P. Nunes acknowledges PRAXIS XXI for his Ph.D.scholarship.

References

[1] ACTS MoMuSys Project, MPEG-4 video veri"cationModel 8.0 software implementation: MoMuSys VM8-971119, November 1997.

[2] N. Brady, F. Bossen, N. Murphy, Context-based arith-metic encoding of 2D shape sequences, in: Proceedings ofthe 1997 IEEE International Conference on Image Pro-cessing, Santa Barbara, CA, USA, October 1997.

[3] H. Freeman, On the encoding of arbitrary geometric con-"gurations, IRE Trans. Electron. Comput. EC(10) (June1961) 260}268.

[4] P. Gerken, Object-based analysis-synthesis coding of im-age sequences at very low bit rates, IEEE Trans. Circuits

Systems Video Technol., Special Issue on Very Low BitRate Video Coding 4 (3) (June 1994) 228}235.

[5] R. Koenen, F. Pereira, L. Chiariglione, MPEG-4: Contextand objectives, Signal Processing: Image Communication9 (4) (May 1997) 295}304.

[6] S. Lee, D.-S. Cho, S. Cho, S. Son, E. Jang, J.-S. Shin, Binaryshape coding using 1-D distance values from baseline, in:Proceedings of the 1997 IEEE International Conferenceon Image Processing, Santa Barbara, CA, USA, October1997.

[7] F. MarqueH s, A. Gasull, Partition coding using multi-gridchain code and motion compensation, in: Proceedings ofthe 1996 IEEE International Conference on Image Pro-cessing, Lausanne, Switzerland, September 1996, Vol. II,pp. 935}938.

[8] F. MarqueH s, P. Nunes, Description of the core experimentS4b: Multi-grid chain coding method, Doc. ISO/IECJTC1/ SC29/WG11 M1326, Chicago MPEG meeting,September 1996.

[9] F. MarqueH s, J. Sauleda, A. Gasull, Shape and locationcoding for contour images, in: Proceedings of the 1993Picture Coding Symposium, Lausanne, Switzerland,March 1993, pp. 18.6.1}18.6.2.

[10] T. Minami, K. Shinohara, Encoding of line drawings witha multiple grid chain code, IEEE Trans. Pattern Anal. andMach. Intell. PAMI-8 (2) (March 1986) 269}276.

[11] MPEG Requirements Group, MPEG-4 requirements,Version 11, Doc. ISO/IEC JTC1/SC29/WG11 N2723,Seoul MPEG meeting, March 1999.

[12] MPEG Video Group, Core experiments on MPEG-4video shape coding, Doc. ISO/IEC JTC1/SC29/WG11N1471, MaceioH MPEG meeting, November 1996.

[13] MPEG Video Group, MPEG-4 video veri"cation model8.0, Doc. ISO/IEC JTC1/SC29/WG11 N1796, StockholmMPEG meeting, July 1997.

[14] MPEG Video and SNHC Groups, FDIS of ISO/IEC14496-2 Part 2: Visual, Doc. ISO/IEC JTC1/SC29/WG11N2502, Atlantic City MPEG meeting, October 1998.

[15] P. Nunes, F. Pereira, F. MarqueH s, Multi-grid chain codingof binary shapes, in: Proceedings of the 1997 IEEE Inter-national Conference on Image Processing, Santa Barbara,CA, USA, October 1997.

[16] J. Ostermann, Coding of binary shape in MPEG-4, in:Proceedings of the 1997 Picture Coding Symposium,Berlin, Germany, September 1997, pp. 659}662.

[17] J. Ostermann, E$cient encoding of binary shapes usingMPEG-4, in: Proceedings of the 1998 IEEE InternationalConference on Image Processing, Chicago, IL, USA,October 1998.

[18] P. Salembier, F. MarqueH s, A. Gasull, Coding of partitionsequences, in. L. Torres, M. Kunt (Eds.), Video Coding:The second Generation Approach, Kluwer Academic Pub-lishers, Dordrecht, 1996, pp. 125}170.

[19] N. Yamaguchi, T. Ida, T. Watanabe, A binary shape cod-ing method using modi"ed MMR, in: Proceedings of the1997 IEEE International Conference on Image Processing,Santa Barbara, CA, USA, October 1997.

IMAGE 487


Documents

A contour-based approach to binary shape coding using a multiple grid chain code