An Implementation of Multi-Dimensional Maximally Stable … · 2015-07-28 · 3.1 Enumerating extremal regions We describe ﬁrst a method to enumerate all extremal regions of a given

An Implementation of

Multi-Dimensional

Maximally Stable Extremal Regions

Andrea Vedaldi

February 7, 2007

Contents

1 Introduction 1

2 Maximally stable extremal regions 2

3 Regions computation 33.1 Enumerating extremal regions . . . . . . . . . . . . . . . . . . . . 43.2 Computing the stability score . . . . . . . . . . . . . . . . . . . . 53.3 Cleaning up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63.4 Fitting elliptical regions . . . . . . . . . . . . . . . . . . . . . . . 6

4 Experiments 6

1 Introduction

We describe an implementation of the Maximally Stable Extremal Region ([3],Sect. 2, MSER) feature detector1 and an immediate multi-dimensional general-ization ([1], Sect. 2). We propose an algorithm (Sect. 3) that is essentially union-tree with path-compression and union-by-rank (see for instance [4]). Howeverwe do not use the N-tree graph of [4] as for the purpose of fitting ellipses tothe MSERs a much simpler data structure turns out to be sufficient (Sect. 3.4).Finally we describe a few experiments where 3-D MSERs are used to trackregions in video sequences (Sect. 4). This is different from [2] as we do notextract regions from each frame independently, but directly a 3-D region fromthe stacking of such frames.

1The implementation can be downloaded at http://vision.ucla.edu/∼vedaldi/code/mser/mser.html

1

Figure 1: MSER tracker. By computing 3-D MSERs on the stacking of theframes of a video sequence we obtain a simple tracker (Sect 4).

2 Maximally stable extremal regions

Here an image I(x), x ∈ Λ is a real function of a finite set Λ with a topology τ .Elements of Λ are called pixels. For simplicity, we take Λ = [1, 2, . . . , N ]n andthe topology τ induced by the 4-way or 8-way neighborhoods, but we do notrestrict ourselves to n = 2 as [3].

A level set S(x), x ∈ Λ of the image I(x) is the set of pixels that haveintensity not greater than I(x), i.e.

S(x) = {y ∈ Λ : I(y) ≤ I(x)}.

A path (x1, . . . , xn) is a continuous sequence of pixels (i.e. such that xi and xi+1

are 4-way or 8-way neighbors for i = 1, . . . , n − 1). A connected component Cof the set Λ is a subset C ⊂ Λ for which each pair (x1, x2) ∈ C2 of pixels isconnected by a path fully contained in C. The connected component is maximalif any other connected component C ′ containing C is equal to C. An extremalregion R is a maximal connected component of a level set S(x). We denote byR(I) the set of all extremal regions of image I.

Stability criteria. Among all extremal regions R(I), we are interested in theones that satisfy certain stability criteria which we introduce next. Let the levelI(R) of the extremal region R be the maximum image value attained in theregion R, i.e.

I(R) = supx∈R

I(x). (1)

2

x

I(R)

I(R−∆)

I(R+∆)

R−∆

R

R+∆

I(x)

Figure 2: Stability criteria. We show an extremal region R of a one dimen-sional image I(x) and the corresponding extremal regions R+∆ and R−∆ (seetext). Stability is computed based on the area variation of such regions (Sect. 2).

Let ∆ > 0. Let R+∆ be the smallest extremal region that contains iR and hasintensity which exceeds of at least ∆ the intensity of R (Fig. 2), i.e.

R+∆ = argmin{|Q| : Q ∈ R(I), Q ⊃ R, I(Q) ≥ I(R) + ∆}. (2)

Similarly, let R−∆ be the biggest extremal region containing R that has intensitywhich is exceeded by at least ∆ by R, i.e.

R−∆ = argmax{|Q| : Q ∈ R(I), Q ⊂ R, I(Q) ≤ I(R)−∆}. (3)

Consider the area variation

ρ(R;∆) =|R+∆| − |R−∆|

|R|.

The region R is maximally stable if it is a minimum for the area variation, inthe following sense: ρ(R;∆) is smaller than ρ(Q;∆) for any extremal regionQ “immediately contained” or “immediately containing” R. We say that anextremal region R immediately contains another extremal region Q if R ⊃ Qand if R′ is another extremal region with R ⊃ R′ ⊃ Q, then R′ = R. Note thatthis notion makes sense because the base set Λ is finite.

3 Regions computation

We describe an efficient algorithm for the computation of the maximally stableextremal regions of an image I(x) defined on a discrete domain Λ.

3

3.1 Enumerating extremal regions

We describe first a method to enumerate all extremal regions of a given imageI. Let x1, x2, . . . , xN ∈ Λ be a sorting2 of the image pixels by increasing inteistyvalue, i.e.

I(x1) ≤ I(x2) ≤ . . . I(xN ).

We compute extremal regions incrementally, by considering larger and largerimage subdomains Λt = {x1, x2, . . . , xt} ⊂ Λ for t = 1, . . . , N . Denote byIt = I|Λt the restriction of the image I to the subset Λt.

For t = 1, Λ1 = {x1} is trivially an extremal region of the image I! and levelI(x1). For t = 2, either x1 and x2 are connected and Λ2 is an extremal regionof I2, or they are not and {x2} is an extremal region of Λ2. Moreover Λ1 is anextremal region of I2 if, and only if, I(x2) 6= I(x1). This is captured in generalby:

Lemma 1. Let t be one of 1, 2, . . . , N − 1. Let R1, . . . , RK be all the extremalregions of It. Let

• K1 the subset of indices k for which I(Rk) 6= I(xt+1) and

• K2 the subset of indices k for which I(Rk) = I(xt+1) but xt+1 is notconnected to Rk and

• let K3 be the subset of indices k for which xt+1 is connected to Rk.

Then

1. for all k ∈ K1 ∪ K2 the set Rk is an extremal region of It+1;

2. the set R = {xt+1}∪k∈K3 is an extremal region of It+1;

3. all extremal regions of It+1 are obtained either as (1) or (2).

Proof. By definition each Rk is a maximal connected component of the setSt(Rk) = {x ∈ Λt : I(x) ≤ I(Rk)}. If k ∈ K1, then I(Rk) 6= I(xt+1), St(Rk) =St+1(Rk) and Rk is a maximal connected component of St+1(Rk) as well. Ifk 6∈ K1, then St+1(Rk) = S(Rk)∪{xt+1}. However if k ∈ K2, then Rk and xt+1

are not neighbors and Rk is still maximal in St+1(Rk). Finally, {xt+1} togetherwith all the regions Rk of level I(Rk) ≤ I(xt+1) which are neighbors of xt+1, i.e.k ∈ K3, constitute a new extremal region. To see this, note that (i) R ⊂ S(xt+1),(ii) R is connected because the subregions Rk are connected and any two pointsin two different subregions are connected through xt+1 by construction and (iii)R is maximal as if not, one could add a pixel y ∈ Λt = Λt+1−{xt+1} to R thatwould be either an extension of one of the extremal regions Rk of image It or{y} would be a new extremal region of image It by itself.

Finally, we need to show that the listing is exhaustive. So let R be anextremal region of image It+1. If R ⊂ St(R), then xt+1 6∈ R and R is equalto some Rk for k ∈ K1 ∪ K2 by the inductive hypotesis. If, on the other hand,xt+1 ∈ R, then R is obtained as (2).

2This can be done in linear time by using bucket-sort.

4

Lemma 1 suggests a simple algorithm to enumerate extremal regions. Theidea is to consider one pixel at time in the order x1, x2, . . . growing extremalregions for the intermediate images It until IN = I is reached.

Formally, this process can be implemented by means of a forest of pixels. Attime t the forest represents all the union operations that have been performedso far according to point (2) of Lemma 1. Since extremal regions are onlygenerated by such union operations, the tree stores all the extremal regions ofall intermediate images I1, . . . , It.

Let us consider the addition of pixel xt+1 to the forest. Following Lemma 1,we must search for all extremal regions R1, . . . , Rk of image It which are neigh-bors of xt+1 and join them to xt+1 to obtain the new region R. This is done byscanning the neighbors y ∈ Λt of xt+1 and, for each of them, climbing the treein search for the appropriate extremal regions Rk. In practice, we simply takethe union of all sets S(y) ∪ S(π(y)) ∪ S(π2(y)) ∪ · · · ∪ S(root(y)) = S(root(y)),where S(y) is the subtree rooted at y, π(y) is the parent of y and root(y) isthe root of the tree that contains y. While only some of S(πn(y)) are indeedextremal regions of image It, S(root(y)) always is and, since it covers all othersubsets anyway, it is sufficient to join that. The join operation is then encodedin the forset by making xt+1 parent of root(y), i.e. π(root(y))← xt+1.

This basic algorithm can be improved significantly by keeping the tree bal-anced. This is an optimization of the join operation, for which xt+1 is notnecessarily added to the forest as root; instead one uses as root one of the nodesroot(y) with the goal keeping the tree height short. Although this disruptspartially the property of the forest (some of the extremal regions of the inter-mediate images I1, I2, . . . are lost), the relevant information (i.e. the regionsthat are extremal regions of I) is preserved, as it can be verified. In particular,regions can be emitted as soon as condition (1) of the Lemma is encountered,which correspond to the case I(y) 6= I(π(y)).

3.2 Computing the stability score

Once the extremal region tree is computed, we need to calculate the area vari-ation for each region and then selecting the maximally stable ones. The area|R| of each region is computed efficiently as explained in Sect. 3.4. In order tocompute the area variation of a region R, we need to figure out the regions R−∆

and R+∆. To do this we begin by arranging the extremal regions into a treewhere R is parent of R′ if R immediately contains R′. Then each region R isconsidered and the tree is explored to find a region Q for which R = Q−∆ andthe region R+∆. This is done by scanning the regions R0 = R, R1 = π(R0),R2 = π(R1) and so on. If a region Q = Ri satisfies Q−∆ = R0, then

I(R0) ≤ I(Ri)−∆ < I(R1).

The condition is not necessary though; according to (3) we need to keep theregion of maximum area among all the candidate ones. Similarly, if Ri = R+∆,then

I(Ri) ≤ I(R0) + ∆ < I(Ri+1).

5

In this case the condition is also sufficient as at most one of such regions exist.

3.3 Cleaning up

The stability score alone may not be sufficient to select only useful regions. Inthe cleanup phase we

• remove very small and very big regions;

• remove regions which have too high area variation (even if they are indeedminima of the variation score);

• remove duplicated regions.

Duplicated regions arise because, due to noise, the same mode of the local min-ima score may correspond to more than one local minimum. Duplicated regionsare easily found by comparing each MSER R with the MSER R′ immediatelycontaining R and removing R if they are too similar.

3.4 Fitting elliptical regions

Fitting elliptical regions amount to computing for each maximally stable ex-tremal region R the first and second order moments, i.e.

µ(R) =1|R|

∑x∈R

x, Σ(R) =1|R|

∑x∈R

(x− µ)(x− µ)>.

Rather than considering directly the centered moment Σ(R), it is computation-ally more convenient to compute

M(R) =1R

∑x∈R

xx>

and use the fact that Σ(R) = M(R) − µ(R)µ(R)>. The advantage is that anyquantity which is obtained by integrating a function f(x), x ∈ Λ of the imagedomain (in particular f(x) = x and f(x) = xx>) can be computed for all regionsat once by visiting (in breath first order and from the leaves) each pixel of theforest and summing its value to the parent. The visit order is determined (andcan be recorded for later use) during the construction of the forest itself. Thissimple idea achieves the same efficiency of [4] for the purpose of fitting ellipses.

4 Experiments

Multi-dimensional extremal regions can be compute for instance on volumetricimages or video sequences. Here we explore the latter possibility, which shouldyields to a dynamic extension of MSER, or region tracker. Some results areshown in Fig. 1 and in Fig. 3.

6

Figure 3: Examples of incorrectly tracked regions. Since the shape of theregion is not constrained in any way across frames, due to cross-frame overlap-ping, regions may bleed yielding to inconsistent tracking.

References

[1] M. Donoser and H. Bischof. 3D segmentation by maximally stable volumes.In ICPR, 2006.

[2] M. Donoser and H. Bischof. Efficient maximally stable extremal region(MSER) tracking. In CVPR, 2006.

[3] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide baseline stereofrom maximally stable extremal regions. In BMVC, 2002.

[4] E. Murphy-Chutorian and M. Trivedi. N-tree disjoint-set foreset for maxi-mally stable extremal regions. In BMVC, 2006.

7

Documents

An Implementation of Multi-Dimensional Maximally Stable … · 2015-07-28 · 3.1 Enumerating extremal regions We describe ﬁrst a method to enumerate all extremal regions of a given