

Graph Partitioning by Spectral Rounding: Applications in Image Segmentation and Clustering∗

David A. Tolliver
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213.
[email protected]

Gary L. Miller
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213.
[email protected]

Abstract

We introduce a family of spectral partitioning methods. Edge separators of a graph are produced by iteratively reweighting the edges until the graph disconnects into the prescribed number of components. At each iteration a small number of eigenvectors with small eigenvalue are computed and used to determine the reweighting. In this way spectral rounding directly produces discrete solutions, whereas current spectral algorithms must map the continuous eigenvectors to discrete solutions by employing a heuristic geometric separator (e.g. k-means). We show that spectral rounding compares favorably to current spectral approximations on the Normalized Cut criterion (NCut). Results are given for natural image segmentation, medical image segmentation, and clustering. A practical version is shown to converge.

1. Introduction

Several problems in machine perception and pattern recognition can be formulated as partitioning a graph. In general, most formulations of partition quality yield NP-hard optimization problems. This raises two important questions. First, does a particular optimization problem capture good partitions for the image segmentation domain, especially in light of the optimization being NP-hard, so that we may never know the true optimum? Second, given that the optimal value is a good characterization, are approximations quickly constructible, and do they return good partitions?

One popular formulation, used in image processing and clustering, is the normalized cut (NCut) of a graph introduced by Shi and Malik [17]. The ideas contained therein were further explored by Ng et al. [15] and Yu and Shi [21],

∗This work was supported in part by the National Science Foundation under grants CCR-9902091, CCR-9706572, ACI 0086093, CCR-0085982 and CCR-0122581.

both of whom motivated multi-way partitioning algorithms. In part, our method was motivated by observations made in [15, 21]. Now, how does the NCut optimization problem fare against our two questions?

It is not difficult to construct image examples for which common image percepts do not correspond to the optimal NCut of the image (e.g. see Shental et al.'s example [16] and see Figure 3 for similar sub-optima). This is unsurprising, and an acknowledged attribute of all objective measures of cluster or partition quality (see Kleinberg [6] and Meilă [11] for treatment of this issue). But, for many images, as we shall show, there are segmentations with a smaller normalized cut value than those generated by earlier methods that are at the same time more pleasing. For example, one of the main empirical advantages of the spectral rounding technique seems to be that it is less likely to split the image in homogeneous regions (see Figure 2) while returning smaller NCut values. Thus good image segmentations are generated as graph partitions without reformulating the underlying combinatorial problem.

The two common paradigms for approximating such objective functions are 1) linear or semidefinite programming [10, 20, 2], and 2) spectral methods [4, 17]. In this paper we introduce a spectral technique that empirically improves upon existing spectral algorithms for quotient cuts. Earlier spectral methods consisted of a two-stage algorithm. In the first stage a small collection of, say k, eigenvectors with small eigenvalues is computed. In the second stage these vectors are used to map the graph vertices into $\mathbb{R}^k$, and a geometric separator is then applied [3]. Recently, Lang and Rao have proposed a compelling alternative, a flow-based technique [8] that bests geometric rounding for quotient 2-cuts of a graph.

Our approach is to skip the geometric separator step by iteratively reweighting the graph in such a fashion that it eventually disconnects. At each iteration we will use the eigenvalues and eigenvectors of the reweighted graph to determine new edge weights. At first this may seem very inefficient, since the most expensive step in the two-stage method is the eigenvector calculation. But by using the eigenvector from the prior step as a starting point for finding the new eigenvector, simple power methods seem to converge in only a small number of steps.

1.1. Problems and Results

Let G = (V, E, w) be a graph on n vertices with connectivity E; we wish to find a partitioning of the vertices such that some cut metric is minimized. An apparently equitable measure of the quality of a cut is the normalized cut (NCut), as NCut minimizes the ratio of the cut cost to the balance in total degree of the partitions. The normalized cut criterion is:

$$ nc(G) = \min_{V_1,\ldots,V_p} \frac{1}{p} \sum_{i=1}^{p} \frac{vol(V_i, V \setminus V_i)}{vol(V_i)} \qquad (1.1) $$

where $vol(V_i)$ is the sum of edge weights associated with the vertices in $V_i$, $vol(V_i, V \setminus V_i)$ is the sum of the edge weights connecting $V_i$ to the remainder of the graph, and $V_i \cap V_j = \emptyset$. The combinatorial problem in Equation 1.1 can be stated as a Quadratically Constrained Quadratic Program (QCQP) [21]. This QCQP admits a straightforward eigenvector relaxation, stated as minimization of the Rayleigh quotient over orthogonal functions.
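For concreteness, Equation 1.1 can be evaluated directly for a candidate partition. The sketch below is illustrative only (the helper name and the toy graph are ours, not the paper's):

```python
def ncut(edges, parts):
    """Average of vol(Vi, V \\ Vi) / vol(Vi) over the parts (Eq. 1.1).

    `edges` maps undirected edges (i, j), i < j, to weights; `parts`
    is a list of disjoint vertex sets covering V.
    """
    side = {v: k for k, part in enumerate(parts) for v in part}
    vol = [0.0] * len(parts)  # vol(Vi): weight incident to Vi's vertices
    cut = [0.0] * len(parts)  # vol(Vi, V \ Vi): weight leaving Vi
    for (i, j), w in edges.items():
        vol[side[i]] += w
        vol[side[j]] += w
        if side[i] != side[j]:
            cut[side[i]] += w
            cut[side[j]] += w
    return sum(c / v for c, v in zip(cut, vol)) / len(parts)

# two triangles joined by one light edge: the natural 2-cut is cheap
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
         (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0, (2, 3): 0.1}
print(ncut(edges, [{0, 1, 2}, {3, 4, 5}]))   # 0.1 / 6.1, about 0.0164
```

Splitting along the light edge scores far better than any partition that cuts through a triangle.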

1.2. Quality of the Spectral Bound

Spectral methods are so named because the second smallest eigenvalue $\lambda_2$ of the normalized Laplacian bounds the best cut obtained from a continuous vector. The associated eigenvectors provide a means of obtaining a discrete solution that satisfies the eigenvalue bound. The eigenvalue bound on the isoperimetric number of a graph $\Phi(G)$, see [14, 4], can be ported to the normalized 2-cut as

$$ \tfrac{1}{2}\lambda_2 \le nc(G) \le \Phi(G) \le \sqrt{2\lambda_2}, $$

as $\Phi(G)$ is an upper bound on $nc(G)$. The upper bound on $\Phi(G)$ is loose in general, as demonstrated by the pathological graphs constructed by Guattery and Miller in [5], while guaranteed $O(\frac{1}{\sqrt{n}})$ cut bounds were exhibited for planar graphs in [18].

Equation 1.1 has been studied in the context of image segmentation in the vision community [17, 21] and clustering in the learning community [15, 20]. In all cases a standard spectral algorithm is used. The methods [17, 21, 15] differ primarily in how the eigenvectors are used to find a feasible solution satisfying the constraints in Equation 1.1.

2. Preliminaries

Throughout this paper we let G = (V, E, w) denote an edge-weighted undirected graph without multiple edges or self-loops, where V is a set of n vertices numbered from 1 to n, E is a set of m edges, and $w : E \to [0, 1]$ is the edge weighting.

We associate four matrices with the graph. First, W, the weighted adjacency matrix:

$$ W_{ij} = \begin{cases} w_{ij} = w(i,j) & \text{if } (i,j) \in E \\ 0 & \text{otherwise} \end{cases} \qquad (2.1) $$

The weighted degree of vertex i is $d_i = \sum_{j=1}^{n} w_{ij}$. We assume that no vertex has zero degree. Second, the weighted degree matrix D is

$$ D_{ij} = \begin{cases} d_i & \text{if } i = j \\ 0 & \text{otherwise.} \end{cases} \qquad (2.2) $$

Third, the generalized Laplacian, or simply the Laplacian, of G is $L = D - W$. Finally, the normalized Laplacian of G is $\mathcal{L} = D^{-1/2} L D^{-1/2}$.
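Building the four matrices is mechanical; a pure-Python sketch (the helper name and the example weights are ours):

```python
def graph_matrices(n, edges):
    """Return (W, D, L, Ln): adjacency, degree, Laplacian, and
    normalized Laplacian D^{-1/2} L D^{-1/2}, as dense lists."""
    W = [[0.0] * n for _ in range(n)]
    for (i, j), wij in edges.items():
        W[i][j] = W[j][i] = wij            # weighted adjacency (2.1)
    d = [sum(row) for row in W]            # weighted degrees d_i
    D = [[d[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    L = [[D[i][j] - W[i][j] for j in range(n)] for i in range(n)]
    # assumes no zero degrees, as the text does
    Ln = [[L[i][j] / (d[i] ** 0.5 * d[j] ** 0.5) for j in range(n)]
          for i in range(n)]
    return W, D, L, Ln

W, D, L, Ln = graph_matrices(3, {(0, 1): 0.5, (1, 2): 0.25})
print([sum(row) for row in L])   # [0.0, 0.0, 0.0]
```

Every row of L sums to zero, and the diagonal of the normalized Laplacian is 1 whenever no vertex has zero degree.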

Rather than working directly with the normalized Laplacian we shall work with a similar system. If $D^{1/2} f = g$ and g is an eigenvector of $\mathcal{L}$ with eigenvalue $\lambda$, i.e., $\mathcal{L} g = \lambda g$, then it is an easy calculation to see that f is a generalized eigenvector for the pair (L, D) with eigenvalue $\lambda$; that is, $L f = \lambda D f$. It will be convenient to work with the generalized eigenvalues and vectors of (L, D). In this case the normalized Rayleigh quotient of the valuation f is $f^T L f / f^T D f$.

We make a simple, but important, observation aboutthese Rayleigh quotients:

Lemma 1. Given a weighted symmetric graph G = (V, E, w), the normalized Rayleigh quotient can be written as

$$ \frac{f^T L f}{f^T D f} = \frac{\sum_{(i,j) \in E,\, i<j} (f_i - f_j)^2 w_{ij}}{\sum_{(i,j) \in E,\, i<j} (f_i^2 + f_j^2) w_{ij}} \qquad (2.3) $$

where $f_i = f(v_i)$.

The main importance of Lemma 1 is that for each valuation f and each edge $e_{ij}$ we get the fraction $\frac{(f_i - f_j)^2}{f_i^2 + f_j^2}$. We will use this fraction to reweight the edge $e_{ij}$. The simplest reweighting scheme would be to replace the edge weight $w_{ij}$ with the weight $\frac{f_i^2 + f_j^2}{(f_i - f_j)^2} w_{ij}$. There are several issues with this scheme that will be addressed in the next section.
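Lemma 1 is easy to check numerically; the following sketch compares the matrix quadratic forms with the edge-wise sums of (2.3) on a made-up triangle graph and valuation:

```python
edges = {(0, 1): 0.8, (1, 2): 0.3, (0, 2): 0.5}
f = [1.0, -0.5, 0.25]   # an arbitrary valuation
n = 3

W = [[0.0] * n for _ in range(n)]
for (i, j), w in edges.items():
    W[i][j] = W[j][i] = w
d = [sum(row) for row in W]

# left-hand side of (2.3): quadratic forms with L = D - W
fLf = sum(f[i] * ((d[i] if i == j else 0.0) - W[i][j]) * f[j]
          for i in range(n) for j in range(n))
fDf = sum(d[i] * f[i] ** 2 for i in range(n))

# right-hand side of (2.3): edge-wise numerators and denominators
num = sum(w * (f[i] - f[j]) ** 2 for (i, j), w in edges.items())
den = sum(w * (f[i] ** 2 + f[j] ** 2) for (i, j), w in edges.items())

assert abs(fLf / fDf - num / den) < 1e-12
```

Each edge's contribution is exactly the fraction $(f_i - f_j)^2 / (f_i^2 + f_j^2)$ weighted by $w_{ij}$.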

3. Spectral Rounding

In this section spectral rounding is introduced as a procedure for obtaining graph cuts. At each iteration a small number of eigenvectors with small eigenvalue are computed and used to determine the reweighting w′ for the graph G = (V, E, w). We show this process induces a k-way multiplicity in the k smallest eigenvalues of L (i.e. $\lambda_i(L) = 0$ for $1 \le i \le k$). By obtaining a Laplacian with this nullspace property we guarantee that the matrix represents k disconnected subgraphs, whose vertex membership can be read off directly from the first k eigenvectors.

§3.1 defines the spectral rounding algorithm. §3.2 connects decreasing a Rayleigh quotient to reweighting the graph. In §3.6 the spectral rounding algorithm is shown to converge for a reweighting scheme proposed in §3.1.

3.1. The SR-Algorithm

For a graph $G = (V, E, w_0)$, prescribe the number of partitions k that the edge cut is to yield. Given a valid reweighting scheme, iteration of the SR-Step produces a sequence of N weightings $\{w^{(i)}\}$ such that the graph $G_N = (V, E, w^{(N)})$ is disconnected into k components by the weighting $w^{(N)}$.

Algorithm 1 SR-Step($w : \|w\|_k > 0$)
Let $F_k = [f_1 \ldots f_k]$ denote the k generalized eigenvectors of $L(G; w), D(G; w)$ associated with the k smallest eigenvalues $\Lambda_k = \mathrm{diag}([\lambda_1 \ldots \lambda_k])$.

1. compute $w_r = R(F_k, \Lambda_k)$; set $\alpha = 1$ and $w' = w_r$
2. while $\|w'\|_k \ge \|w\|_k$: $\alpha \leftarrow \frac{1}{2}\alpha$, $w' = (1 - \alpha) w + \alpha w_r$
3. return $w'$

The function R computes a new weighting of the graph given the first k eigenpairs of L, D. The norm $\|\cdot\|_k$ is taken over weightings of the graph, such that $\|w\|_k = 0$ iff the weighting w disconnects the graph into at least k pieces. A pair R, $\|\cdot\|_k$ is called a reweighting scheme if the SR-Step converges in a finite number of iterations. We define Algorithm 2, the SR-Algorithm, as the iteration of Algorithm 1 until $\|w^{(N)}\|_k \approx 0$. In the following sections we propose reweighting functions R and corresponding norms $\|\cdot\|_k$ such that the SR-Step and SR-Algorithm converge in the desired fashion.
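The $\alpha$-halving in step 2 of SR-Step can be sketched independently of the spectral machinery. Below, the reweighting function and the norm are toy stand-ins for R and $\|\cdot\|_k$, and the iteration cap is our safeguard, not part of Algorithm 1:

```python
def sr_step(w, reweight, norm_k):
    """One SR-Step: propose w_r = R(w), then halve alpha until the
    blend (1 - alpha) w + alpha w_r strictly decreases the norm."""
    w_r = reweight(w)
    alpha = 1.0
    w_new = w_r
    while norm_k(w_new) >= norm_k(w):
        alpha /= 2.0
        w_new = [(1 - alpha) * wi + alpha * wri
                 for wi, wri in zip(w, w_r)]
        if alpha < 1e-12:   # safeguard for this sketch only; the
            break           # paper's schemes guarantee termination
    return w_new

# toy stand-ins: shrink every weight; norm = sum of the weights
w_out = sr_step([1.0, 0.8, 0.6], lambda w: [0.5 * x for x in w], sum)
print(w_out)   # [0.5, 0.4, 0.3]
```

In the paper, R is computed from the first k eigenpairs and $\|\cdot\|_k$ vanishes exactly when the weighting disconnects the graph into k pieces.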

In §3.6 a simplified version of the SR-Algorithm is shown to converge on graphs with $\|\cdot\|_k < 1$. In the case of a 2-cut this reduces to $\lambda_2(\mathcal{L}) < 1$. The class of graphs satisfying this spectral constraint is very general, excluding an uninteresting collection of graphs for our purposes. In particular, if $\lambda_2 \ge 1$ then no subset of the vertices exists with more than half of its edge volume contained within it (entailed by the Cheeger bound $2\Phi(G) \ge \lambda_2$ [4]). Such graphs are often called expander graphs.

3.2. Fractional Averages: a reweighting function

By Lemma 1 we saw that the Rayleigh quotient could be written as a sum of formal fractions where the numerators are added separately from the denominators. Define a formal fraction as a pair of real numbers $\frac{a}{b}$ and its value as the real number a/b. We call the average of a set of formal fractions the fractional average. We now prove a few simple but important facts about fractional averages.

Definition 1. Given formal fractions $\frac{a_1}{b_1}, \cdots, \frac{a_n}{b_n}$, the fractional average is the formal fraction

$$ \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i} $$

where the $a_i$'s and $b_i$'s are reals.

We will simply call formal fractions fractions, and only make a distinction between the formal fraction and its value when needed. In the case when the $a_i$'s and $b_i$'s are nonnegative, we first observe that the fractional average is a convex combination of the fractions. That is, we can rewrite the sum as

$$ \sum_{i=1}^{n} \frac{b_i}{b} \cdot \frac{a_i}{b_i} $$

where $b = \sum_{i=1}^{n} b_i$. Thus the fractional average lies between the largest and smallest fraction.

Possibly a more important interpretation is obtained by viewing each fraction $\frac{a_i}{b_i}$ as the point $P_i = (b_i, a_i)$ in the plane; the value of the fraction is just its slope. The fractional average is just the vector sum of the points. Since we are only interested in the value of the fraction, the slope, we will think of the fractional average as the centroid of the points. If we multiply the numerator and denominator by a scalar w we shall say we reweighted the fraction by w. Geometrically, we are scaling the vectors or points $P_i$ and then computing the centroid.

In the next lemma we show that we can control the slopeof the fractional average by reweighting.

Lemma 2. If $\frac{a_1}{b_1} \le \cdots \le \frac{a_n}{b_n}$ and $w_1 \ge \cdots \ge w_n$ then

$$ \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i} \ge \frac{\sum_{i=1}^{n} a_i w_i}{\sum_{i=1}^{n} b_i w_i}. $$

The inequality is strict if for some pair $1 \le i < j \le n$ we have that $\frac{a_i}{b_i} < \frac{a_j}{b_j}$ and $w_i > w_j$.

Proof. It will suffice to show that

$$ \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i} - \frac{\sum_{i=1}^{n} a_i w_i}{\sum_{i=1}^{n} b_i w_i} \ge 0. \qquad (3.1) $$

Multiplying the left-hand side through by its denominators we get

$$ \sum_{i,j} a_j b_i w_i - \sum_{i,j} a_j b_i w_j = \sum_{i,j} \left( a_j b_i w_i - a_j b_i w_j \right). \qquad (3.2) $$


Observe that terms where i = j are zero. Thus we can write the sum as

$$ \sum_{i<j} a_j b_i (w_i - w_j) + a_i b_j (w_j - w_i). \qquad (3.3) $$

Rearranging the last term in the sum gives

$$ \sum_{i<j} a_j b_i (w_i - w_j) - a_i b_j (w_i - w_j). \qquad (3.4) $$

Finally we get

$$ \sum_{i<j} (a_j b_i - a_i b_j)(w_i - w_j). \qquad (3.5) $$

By the hypothesis each term in the sum above is nonnegative, which proves the inequality. The strict inequality follows when both factors of a term in the sum are positive, as prescribed in the hypothesis. □
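A quick numeric check of Lemma 2 and of the cross-term expansion derived above (the values are illustrative):

```python
a = [1.0, 2.0, 6.0]          # fractions a_i / b_i: 0.25 <= 0.5 <= 1.5
b = [4.0, 4.0, 4.0]
w = [3.0, 2.0, 1.0]          # decreasing weights

plain = sum(a) / sum(b)
rew = sum(x * y for x, y in zip(a, w)) / sum(x * y for x, y in zip(b, w))
print(plain, rew)            # 0.75 0.5416...: reweighting pulls it down

# cross terms: sum(a) sum(b w) - sum(a w) sum(b)
#              = sum over i < j of (a_j b_i - a_i b_j)(w_i - w_j)
lhs = sum(a) * sum(x * y for x, y in zip(b, w)) \
      - sum(x * y for x, y in zip(a, w)) * sum(b)
rhs = sum((a[j] * b[i] - a[i] * b[j]) * (w[i] - w[j])
          for i in range(3) for j in range(i + 1, 3))
assert abs(lhs - rhs) < 1e-12 and rew < plain
```

With these values both sides of the cross-term expansion equal 60.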

We get an exact expression by observing that the only time we got an inequality was when we cleared the denominators. Thus we have the following equation:

$$ \sum_{i=1}^{n} a_i \sum_{i=1}^{n} b_i w_i - \sum_{i=1}^{n} a_i w_i \sum_{i=1}^{n} b_i = \sum_{i<j} (a_j b_i - a_i b_j)(w_i - w_j). \qquad (3.6) $$

The Rayleigh quotient in Lemma 1 associates a formal fraction with each edge in G. One of the simplest ways to get weights satisfying the hypothesis of Lemma 2, for such a system, is to pick $w_i = \frac{b_i}{a_i} = \frac{f(u_i)^2 + f(v_i)^2}{(f(u_i) - f(v_i))^2}$, if $a_i$ is not zero. We shall call this inverse fractional reweighting. This reweighting scheme gives very large values for small values of $a_i$. We have found that using the stereographic map to normalize the inverse fractions between zero and one works well.

Observation 1. The stereographic projection $\Psi : \mathbb{R}^d \to S^d$ preserves the order of points on the real line, mapping points at $\infty$ to 1 and points at 0 to 0. Thus the inverse weight ordering of the edge update values is preserved by the stereographic map.

If we think of $\Psi$ as mapping points in $\mathbb{R}^d$ to $\mathbb{R}^{d+1}$, where we are only interested in the value in the (d+1)st dimension, then the image of $v \in \mathbb{R}^d$ is $\frac{v^T v}{v^T v + 1} \ge 0$. We use $\Psi_h$ to denote the map which returns the value in this dimension (i.e. the "height" on the sphere).
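For a scalar input the height map reads $\Psi_h(x) = x^2/(x^2 + 1)$; a minimal sketch of our reading of the formula above (the helper name is ours):

```python
def psi_h(x):
    """Height on the sphere of the stereographic image of a scalar x:
    x^2 / (x^2 + 1), monotone on x >= 0, with 0 -> 0 and inf -> 1."""
    return x * x / (x * x + 1.0)

# order of the inverse fractions is preserved, squashed into [0, 1)
vals = [0.0, 0.5, 4.0, 1e6]
heights = [psi_h(v) for v in vals]
print(heights)   # monotone increasing, all in [0, 1)
```

This is what tames the unbounded inverse fractions mentioned above: a nearly constant valuation on an edge produces a huge inverse fraction, but its height stays below 1.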

3.3. Reweighting for Multiple Valuations (step 1)

In the discussion so far we have assumed only one valuation is being used. To produce cuts of higher degree (i.e. k > 2) it is crucial that we simultaneously handle multiple valuations in computing the reweighting. Given two valuations, say f and g, we need to pick a single new weight per edge, i.e. a reweighting function R. This can be thought of as finding a reweighting of the graph, w′, such that

$$ \frac{f^T L' f + g^T L' g}{f^T D' f + g^T D' g} < \frac{f^T L f + g^T L g}{f^T D f + g^T D g}. \qquad (3.7) $$

We begin by proving that such a weighting exists. We state a corollary of Lemma 2 for the case of two fractional averages, but extending it to a larger number of terms may be done with the same argument.

Corollary 1. Given two sets of n fractions $\{\frac{a_i}{b_i}\}$ and $\{\frac{c_i}{d_i}\}$ and weights $\{w_i\}$ and $\{\omega_i\}$ such that

$$ \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i} > \frac{\sum_{i=1}^{n} a_i w_i}{\sum_{i=1}^{n} b_i w_i}, \qquad \frac{\sum_{i=1}^{n} c_i}{\sum_{i=1}^{n} d_i} > \frac{\sum_{i=1}^{n} c_i \omega_i}{\sum_{i=1}^{n} d_i \omega_i}, $$

there exists a single reweighting $\{\gamma_i\}$ satisfying

$$ \frac{\sum_{i=1}^{n} a_i + \sum_{i=1}^{n} c_i}{\sum_{i=1}^{n} b_i + \sum_{i=1}^{n} d_i} > \frac{\sum_{i=1}^{n} a_i \gamma_i + \sum_{i=1}^{n} c_i \gamma_i}{\sum_{i=1}^{n} b_i \gamma_i + \sum_{i=1}^{n} d_i \gamma_i}. \qquad (3.8) $$

Proof. First observe that the l.h.s. of Equation 3.8 is a fractional average. Thus by Lemma 2 there exists a weighting of the formal fractions $\frac{a}{b} = \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}$ and $\frac{c}{d} = \frac{\sum_{i=1}^{n} c_i}{\sum_{i=1}^{n} d_i}$ such that $\frac{a+c}{b+d} > \frac{\alpha_1 a + \alpha_2 c}{\alpha_1 b + \alpha_2 d}$. This condition ensures the existence of a single reweighting $\{\gamma_i\}$, where $\gamma_i = \alpha_1 w_i + \alpha_2 \omega_i$, satisfying the inequality in Equation 3.8.

An example weighting $\{\alpha_k\}$ for the fractional average of fractional averages is given by $\alpha_k = \frac{b_k}{a_k}$. In §? we show how this reweighting of the fractional average of two values connects to the inequality in Equation 3.7.

The fractional average over the Rayleigh quotients, $R_{fa}$, takes the fractional sum of f and g per edge, yielding one fraction per edge. Formally, for an edge (u, v), let $a_i(u, v) = \frac{1}{\lambda_i}(f_i(u) - f_i(v))^2$ and $b_i(u, v) = f_i(u)^2 + f_i(v)^2$ for k valuations. Define the reweighting function as

$$ R_{fa}(F, \Lambda, w_{uv}) = \Psi_h\!\left( \frac{\sum_{i=1}^{k} b_i(u, v)}{\sum_{i=1}^{k} a_i(u, v)} \right) w_{uv}. \qquad (3.9) $$

To successfully link the reweighting functions R to eigenvalue optimization we must specify a norm function $\|\cdot\|_k$ on weightings of G. For example, given the function $R_{fa}$ we specify the norm

$$ \|w'\|_k = \frac{\sum_{i=1}^{k} \sum_{(uv) \in E} w_{uv} (f'_i(u) - f'_i(v))^2}{\sum_{i=1}^{k} \sum_{(uv) \in E} w_{uv} (f'_i(u)^2 + f'_i(v)^2)}, $$

i.e. the fractional average of the updated eigenvectors $f'_i$ satisfying $L' f'_i = \lambda'_i D' f'_i$. Given the reweighting function $R_{sa}$, the norm $\|w'\|_k$ is the sum of the k smallest eigenvalues of $L', D'$. We now connect linear combinations of weightings to their eigenvalues.

3.4. From Rayleigh Quotients to Eigenvalues

In §3.2 we showed how, given a valuation or set of valuations of a graph, to reweight the edges so as to reduce the Rayleigh quotient. In general this does not mean that if the valuation f is an eigenvector with eigenvalue $\lambda$ of the old graph, the corresponding eigenpair $f'$ and $\lambda'$ of the new graph will have the property that $\lambda' \le \lambda$.

Given a new edge weighting w′ such that the fractional average of an eigenvector is decreased, we show that there is a linear combination of the weights of the form $w + t \cdot w'$, for t > 0, such that the associated eigenvalue is also decreased. This yields an algorithm which forces the target eigenvalue to zero, and motivates a matrix whose entries are linear functions of a scalar t, together with the following lemma.

Definition 2. Given two weightings w and w′ of G we define the matrix curve, a 1-dimensional family of matrices, as

$$ W(t) = W + t W' $$

for $t \ge 0$.

A direct consequence of the scale invariance of the Rayleigh quotient $\frac{f^T L f}{f^T D f}$ is that any linear combination $W(t) = W + t W'$ may be expressed as a convex combination $W(\alpha) = (1 - \alpha) W + \alpha W'$ on $0 \le \alpha \le 1$ (i.e. $\alpha = \frac{t}{t+1}$). The eigenstructures of the normalized Laplacians defined on $W(\alpha)$ and $W(t)$ are identical by the scale invariance of the Rayleigh quotient.

Lemma 3. Given a weighted graph G = (V, E, w), matrices L and D, a simple eigenpair $(f, \lambda)$ with $L f = \lambda D f$, and a new weighting w′ such that $\frac{f^T L' f}{f^T D' f} < \frac{f^T L f}{f^T D f} = \lambda$, then the derivative of the eigenvalue function¹ $\lambda(t)$ of the matrix curve $W(t) = W + t W'$ is well defined for small t and

$$ \frac{d\lambda(t)}{dt} < 0 $$

at t = 0.

Proof. For a simple eigenpair $(f, \lambda)$, recall that $\frac{f^T L f}{f^T D f} = \lambda$, as W(0) = W and thus L(0) = L, D(0) = D by definition. We deduce the bound on $\frac{d\lambda}{dt}$ with a simple derivation.

¹The proof that $d\lambda$ exists follows from properties of the characteristic polynomial of L, D and relies on the implicit function theorem. Details on the differentiability of $\lambda$ and f can be found in Lax [9], chapter 9.

$$ \frac{d\lambda(t)}{dt} = \frac{f^T \left( \frac{dL(t)}{dt} - \lambda \frac{dD(t)}{dt} \right) f}{f^T D f} \qquad (3.10) $$

$$ = \frac{f^T (L' - \lambda D') f}{f^T D f} \qquad \left\{ \tfrac{dL(t)}{dt} = L' \right\} \qquad (3.11) $$

$$ = f^T (L' - \lambda D') f \qquad \text{(taking } f^T D f = 1\text{)} \qquad (3.12) $$

$$ = f^T L' f - \lambda f^T D' f \qquad (3.13) $$

$$ = \left( \frac{f^T L' f}{f^T D' f} - \lambda \frac{f^T D' f}{f^T D' f} \right) f^T D' f \qquad (3.14) $$

$$ = \left( \frac{f^T L' f}{f^T D' f} - \lambda \right) f^T D' f \qquad (3.15) $$

$$ = \left( \frac{f^T L' f}{f^T D' f} - \frac{f^T L f}{f^T D f} \right) f^T D' f \qquad (3.16) $$

The bound is entailed by observing that 1) the term $f^T D' f = \sum_{v \in V} f^2(v) d'(v)$ is positive, and 2) the sign of $\frac{d\lambda(t)}{dt}$ is therefore strictly negative due to the inequality $\frac{f^T L' f}{f^T D' f} < \frac{f^T L f}{f^T D f}$. The above derivation demonstrates that the slope $\frac{d\lambda(t)}{dt}$ is negative at t = 0. By the continuity of $\lambda$ there exists a t > 0 such that $\lambda(t) < \lambda(0)$. This ensures that the eigenvalue $\lambda$ associated with f can be decreased algebraically using the procedure on fractional averages in §3.2. □
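The derivative identity used above can be checked numerically on a small generalized eigenproblem. The 2×2 pencil below is a made-up example (not a graph Laplacian); its eigenvalues come from the quadratic formula:

```python
import math

def smallest_gen_eig(L, D):
    """Smaller root of det(L - x D) = 0 for symmetric 2x2 L, D."""
    a = D[0][0] * D[1][1] - D[0][1] * D[1][0]
    b = -(L[0][0] * D[1][1] + L[1][1] * D[0][0]
          - L[0][1] * D[1][0] - L[1][0] * D[0][1])
    c = L[0][0] * L[1][1] - L[0][1] * L[1][0]
    return (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)

def quad(M, x):
    return sum(x[i] * M[i][j] * x[j] for i in range(2) for j in range(2))

L  = [[3.0, 1.0], [1.0, 2.0]]
D  = [[2.0, 0.0], [0.0, 1.0]]
Lp = [[1.0, 0.0], [0.0, 1.0]]   # the perturbation L'
Dp = [[1.0, 0.0], [0.0, 2.0]]   # the perturbation D'

lam = smallest_gen_eig(L, D)    # 1.0, with eigenvector f = (1, -1)
f = [1.0, -1.0]
# f here is unnormalized, so divide by f^T D f instead of setting it to 1
predicted = (quad(Lp, f) - lam * quad(Dp, f)) / quad(D, f)   # -1/3

h = 1e-6
Lh = [[L[i][j] + h * Lp[i][j] for j in range(2)] for i in range(2)]
Dh = [[D[i][j] + h * Dp[i][j] for j in range(2)] for i in range(2)]
finite_diff = (smallest_gen_eig(Lh, Dh) - lam) / h
assert abs(finite_diff - predicted) < 1e-3
```

The finite-difference slope of the smallest eigenvalue along the matrix curve matches $f^T (L' - \lambda D') f / f^T D f$.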

To handle multiple eigenvectors one hope might be to simultaneously bound the derivatives of the target eigenvalues $\{\lambda_2, \ldots, \lambda_k\}$ of L(t), D(t). Doing so yields the update criterion for the reweighting w′: select a w′ such that $\frac{f_i^T L' f_i}{f_i^T D' f_i} < \frac{f_i^T L f_i}{f_i^T D f_i}$, ensuring that $\lambda_i(L(t)) < \lambda_i(L)$ for $2 \le i \le k$. But in general this may be unrealistic, since we must use the same weight per edge for all the eigenvectors $f_i$. In the case where the aforementioned inequalities do not hold, we determine w′ so as to decrease the fractional average of the Rayleigh quotients (see $R_{fa}$ in §3.3). The average of the target eigenvalues tends to zero as a consequence of the decrease in the fractional average. We now state a lemma connecting the derivative of two eigenvalues $\lambda_f$ and $\lambda_g$ with the reweighting scheme suggested in the previous section.

Lemma 4. For a weighted graph G = (V, E, w) with matrices L and D and simple eigenpairs $(f, \lambda_f)$ with $L f = \lambda_f D f$ and $(g, \lambda_g)$ with $L g = \lambda_g D g$, given a reweighting w′ such that

$$ \frac{\frac{1}{\lambda_f} f^T L' f + \frac{1}{\lambda_g} g^T L' g}{f^T D' f + g^T D' g} < \frac{\frac{1}{\lambda_f} f^T L f + \frac{1}{\lambda_g} g^T L g}{f^T D f + g^T D g} = 1, \qquad (3.17) $$

then

$$ \frac{d\lambda_f(t)}{dt} + \frac{d\lambda_g(t)}{dt} < 0 $$

at t = 0.


Proof. We begin by stating a related quantity of interest, the derivative of the fractional average of the Rayleigh quotients on f and g for the matrix curve $w(t) = w + t \cdot w'$:

$$ \frac{d}{dt} \left[ \frac{\frac{1}{\lambda_f} f^T(t) L(t) f(t) + \frac{1}{\lambda_g} g^T(t) L(t) g(t)}{f^T(t) D(t) f(t) + g^T(t) D(t) g(t)} \right] \qquad (3.18) $$

and examine this derivative centered at t = 0. First we must fix the scale of the eigenvectors f(t) and g(t); we choose $f(t)^T D(t) f(t) = g(t)^T D(t) g(t) = 1$ w.l.o.g. Thus Equation 3.18 simplifies to

$$ \frac{d}{dt} \left[ \frac{\frac{1}{\lambda_f} f^T(t) L(t) f(t) + \frac{1}{\lambda_g} g^T(t) L(t) g(t)}{1 + 1} \right] = \frac{1}{2} \left( \frac{1}{\lambda_f} \frac{d\lambda_f(t)}{dt} + \frac{1}{\lambda_g} \frac{d\lambda_g(t)}{dt} \right) \qquad (3.19) $$

by the linearity of the derivative. We may now substitute the functional form of $\frac{d\lambda(t)}{dt}$:

$$ \frac{1}{2} \left( \frac{1}{\lambda_f} \frac{d\lambda_f(t)}{dt} + \frac{1}{\lambda_g} \frac{d\lambda_g(t)}{dt} \right) = \frac{1}{2} \left( \frac{1}{\lambda_f} f^T (L' - \lambda_f D') f + \frac{1}{\lambda_g} g^T (L' - \lambda_g D') g \right). \qquad (3.20) $$

Assuming the bound holds,

$$ \frac{1}{2} \left( \frac{1}{\lambda_f} f^T L' f - \frac{\lambda_f}{\lambda_f} f^T D' f + \frac{1}{\lambda_g} g^T L' g - \frac{\lambda_g}{\lambda_g} g^T D' g \right) < 0, $$

$$ \frac{1}{\lambda_f} f^T L' f - f^T D' f + \frac{1}{\lambda_g} g^T L' g - g^T D' g < 0, \qquad (3.21) $$

arriving at

$$ \frac{1}{\lambda_f} f^T L' f + \frac{1}{\lambda_g} g^T L' g < f^T D' f + g^T D' g, $$

which is equivalent to the hypothesis in Equation 3.17. The remainder of the proof follows the continuity argument in Lemma 3. □

3.5. Termination of SR-Step

In order to prove that the SR-Algorithm converges to a k-way partition we first need to show that each SR-Step terminates. In the next subsection we use this termination to show convergence. To simplify the discussion we only consider using one eigenvector. The SR-Step has two major steps. In the first step, given a valuation f, it computes a reweighting $w_r$. We claim, with a reasonable assumption about the Rayleigh quotient, that

$$ \frac{f^T L f}{f^T D f} > \frac{f^T L_r f}{f^T D_r f}. \qquad (3.22) $$

By Lemma 2 we know that Equation 3.22 is true as long as not all the fractions $\delta_{ij} = (f_i - f_j)^2 / (f_i^2 + f_j^2)$ are equal. We show that if the fractions are all equal the Rayleigh quotient is at least one, contradicting how we construct affinity graphs. We take

$$ R(f)_{ij} = \Psi_h(\delta_{ij}^{-1}) \cdot w_{ij} = \Psi_h\!\left( \frac{f_i^2 + f_j^2}{(f_i - f_j)^2} \right) \cdot w_{ij} \qquad (3.23) $$

as the reweighting function R used in the SR-Step.

Lemma 5. Let G = (V, E, w) be a connected graph and f a valuation such that all the fractions $\delta_{ij} = (f_i - f_j)^2 / (f_i^2 + f_j^2)$ are equal. Then $\frac{f^T L f}{f^T D f} \ge 1$.

Proof. Let f be a valuation of G with Rayleigh quotient $\lambda$. Suppose all $\delta_{ij}$ over all edges of G take the same value. Observe that: 1) $\delta_{ij} > 1$ if the signs of $f_i$ and $f_j$ differ, 2) $\delta_{ij} = 1$ if $f_i f_j = 0$, 3) $\delta_{ij} < 1$ if $f_i f_j > 0$. If we are not in case 3) then the value of each fraction is at least one. Thus by Lemma 1 the Rayleigh quotient is at least one. □

Lemma 6 can be easily generalized to the case of multiple valuations under the update rule $R_{fa}$ in §3.3. As the fractional average of the generalized Rayleigh quotients is a convex sum, it lies between the largest ($\lambda_k$) and smallest ($\lambda_2$) eigenvalues. If $\lambda_k < 1$ holds, the method will terminate as stated above with the update rule $R_{fa}$.

Lemma 6. Algorithm SR-Step terminates for graphs such that $\lambda_k < 1$.

Proof. Follows from Lemma 3 and Lemma 5. □

3.6. Convergence of SR-Algorithm

By Lemma 6 we know that each step of procedure SR-Step produces a new weighted graph such that the norm $\|\cdot\|_k$ is strictly decreasing. We show that in fact the limit norm must be zero, i.e., the SR-Algorithm converges.

Again for simplicity of the exposition we only considerthe case of a single eigenvector.

Let $G_i$, $D_i$, $\lambda_i$, and $x_i$ be the ith graph, its degree matrix, its second eigenvalue, and its eigenvector; that is, $G_i x_i = \lambda_i D_i x_i$. We also assume that each $x_i$ is a unit-length vector. Thus the eigenvectors all belong to a compact region in $\mathbb{R}^n$, the unit sphere. The Bolzano-Weierstrass theorem dictates that such a space contains an accumulation point, say x. Let $\{x^{(i)}\}$ be a subsequence of eigenvectors that converges to x, and let G′, D′, and λ′ be the corresponding graph, degree matrix, and eigenvalue. The eigenvectors converge to x and the eigenvalues converge to some value $\lambda$. To ensure that the graph also converges to a unique graph we apply the Bolzano-Weierstrass theorem again to the weighted graphs, which belong to a compact set in $\mathbb{R}^m$. Thus, we may also assume the graphs and degree matrices converge to G′, D′, and λ′ such that $G' x' = \lambda' D' x'$.

Lemma 7. The sequence of $\lambda_i$, as defined above, converges to $\lambda = 0$.

Proof. Suppose that λ′ > 0. We know by Lemma 6 that if we run SR-Step on G′ we will get a new graph with λ′′ < λ′. Let $\varepsilon = \lambda' - \lambda''$. We can also run SR-Step on each $G'_i$, getting a new eigenvalue $\lambda''_i$. Let $\varepsilon_i = \lambda'_i - \lambda''_i$. Since SR-Step is a continuous function in $G'_i$ and $x'_i$, the $\varepsilon_i$ converge to $\varepsilon$. For sufficiently large i it follows that $\varepsilon_i \ge \varepsilon/2$. But this implies that $\lambda'_i$ goes to $-\infty$, which contradicts the fact that they are bounded below by zero. □

4. Empirical Evaluation

We show that spectral rounding compares favorably torecent spectral approximations for the Normalized Cut cri-terion (NCut). For the purpose of our evaluation we com-pared our method, denotedSR, to that of Yu and Shi [21],denotedEIG, as it reliably returns superior cuts relative tomethods which use K-means and less of the eigenvectors,those based on [15]. In the following sections results aregiven in geometric clustering, natural image segmentation,and medical image segmentation. In all cases, the samegraphG = (V,E,w), is passed to both algorithms. Tocompare the partitioning we report the expected improve-ment in NCut value, on the initial graphG = (V,E, w),and thevariation of information between clusterings.

The first comparison, the expected improvement in value, can be expressed as a positive scalar c such that nc(SR) ≈ c · nc(EIG) on average. In the case where c < 1 spectral rounding finds an improved cut on average.
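One plausible reading of this statistic (a hypothetical helper; the text does not specify whether c is a per-graph ratio averaged over graphs or a ratio of averages, so the per-graph average is assumed here):

```python
def improvement_factor(nc_sr, nc_eig):
    # nc_sr, nc_eig: paired NCut values for the same graphs.
    # c < 1 means SR found a better (smaller) cut than EIG on average.
    ratios = [s / e for s, e in zip(nc_sr, nc_eig)]
    return sum(ratios) / len(ratios)
```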

Throughout this section we report the variation of information described in [12] to measure the difference between two partitionings, Pa and Pb, of the vertex set V, defined as

Dvi(Pa, Pb) = H(Pa) + H(Pb) − 2 I(Pa, Pb)    (4.1)

with

I(Pa, Pb) = Σ_{i=1..p} Σ_{j=1..p} (n_{i,j}/n) log( n · n_{i,j} / (n_i · n_j) ),
H(P) = −Σ_{k=1..p} (n_k/n) log(n_k/n),

where n_k is the number of vertices in the kth partition, n_{i,j} is the size of the overlap between the ith partition in Pa and the jth partition in Pb, and n = |V|. The entropy term H(P) can be thought of as the number of bits required to represent the distribution of partition sizes, whereas I(Pa, Pb) is the mutual information between the two partitionings Pa and Pb. Thus Dvi can be thought of as the number of bits needed to represent the cumulative disagreement between the overlapping partitions of Pa and Pb. As expected, Dvi(Pa, Pa) = 0.
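The quantities above can be computed directly from two label assignments over the same vertex set. A minimal sketch (hypothetical helper name; partitions are represented as label lists in a common vertex order):

```python
from collections import Counter
from math import log

def variation_of_information(a, b):
    # a, b: cluster labels for the same n vertices
    n = len(a)
    assert n == len(b)
    sizes_a = Counter(a)          # n_i: partition sizes in Pa
    sizes_b = Counter(b)          # n_j: partition sizes in Pb
    overlap = Counter(zip(a, b))  # n_ij: pairwise overlap sizes

    def entropy(sizes):
        # entropy of the distribution of partition sizes
        return -sum((nk / n) * log(nk / n) for nk in sizes.values())

    # mutual information between the two partitionings
    mi = sum((nij / n) * log(n * nij / (sizes_a[i] * sizes_b[j]))
             for (i, j), nij in overlap.items())
    return entropy(sizes_a) + entropy(sizes_b) - 2 * mi
```

Identical partitionings (up to relabeling) have distance zero, matching Dvi(Pa, Pa) = 0.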

4.1. Random Geometric Graphs

We compare SR and EIG in the expected partition cost on a collection of random geometric graphs. The vertices

Figure 1. A geometric graph (|V| = 300), and two 5-way cuts: nc(SR) = .064, nc(EIG) = .109.

of G = (V, E, w) are associated with uniformly distributed coordinates in R^d. The edge set E(G) is then constructed using the following rule: for {u, v ∈ V | u ≠ v}, (u, v) ∈ E ⟺ dist(u, v) < r. We sampled 10000 graphs with 1000 vertices and chose the radius r such that the expected degree of each vertex was approximately log(|V|). As shown in Figure 1, such graphs afford a large number of relatively small cuts. Table 1 contains the improvement factor and the cluster divergence. We note that the divergence distance, relative to the partition entropy H(SR), highlights that the NCut improvements are not due to a small number of boundary vertex exchanges, but rather that SR and EIG return significantly different subgraphs.
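The construction rule can be sketched as follows (a hypothetical helper; the unit cube in R^d and the naive O(n²) distance test are assumptions for illustration, not details from the paper):

```python
import math
import random

def random_geometric_graph(n, r, d=2, seed=0):
    # sample n vertices uniformly in the unit cube of R^d
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(d)] for _ in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            # edge rule from the text: (u, v) in E  iff  dist(u, v) < r
            if math.dist(pts[u], pts[v]) < r:
                edges.append((u, v))
    return pts, edges
```

For d = 2 an interior vertex has expected degree about n·π·r², so choosing r ≈ sqrt(log(n)/(π·n)) matches the paper's target of expected degree ≈ log|V|.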

            Dvi(SR, EIG)    nc(SR) = c · nc(EIG)
geo-graph   0.910 ± .219    c = .690 ± .113

Table 1. Comparison between spectral rounding SR and the multi-way cut algorithm of Yu and Shi [21] EIG. The partition entropy for SR was H(SR) ≈ 1.935.

4.2. Image Segmentation

The parameters used in constructing a weighted graph from an image were fixed for all the results presented in this section. The graph G = (V, E, w) represents an image as follows. For each pixel in the image a vertex in V is assigned. If two pixels are connected in E, a weight in w is determined based on the image data. The graph connectivity, E, was generated by connecting pixels to 15% of their neighboring pixels in a 10 pixel radius. The initial weighting w of the graph G = (V, E, w) was determined using the Intervening Contour cue described in [17]. This cue assigns small weights to pixels which lie on opposite sides of a strong image boundary, and large weights otherwise.
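The connectivity rule above (15% of neighbors within a 10 pixel radius) can be sketched as follows. The Intervening Contour weighting itself is not reproduced here, so this hypothetical helper builds only the edge set E; how the 15% subset is chosen is an assumption (a uniform random sample):

```python
import math
import random

def image_graph_edges(height, width, radius=10, frac=0.15, seed=0):
    # one vertex per pixel, indexed v = y * width + x
    rng = random.Random(seed)
    # integer offsets within the given radius (excluding the pixel itself)
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)
               if (dy, dx) != (0, 0) and math.hypot(dy, dx) <= radius]
    k = int(frac * len(offsets))
    edges = set()
    for y in range(height):
        for x in range(width):
            # connect each pixel to ~15% of its neighbors in the radius
            for dy, dx in rng.sample(offsets, k):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width:
                    u, v = y * width + x, ny * width + nx
                    edges.add((min(u, v), max(u, v)))
    return edges
```

A weight w(u, v) would then be attached to each edge from the image data, small across strong boundaries and large otherwise, per the Intervening Contour cue of [17].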

4.2.1 Natural Image Segmentation

We compiled a set of 100 images from Google Images using the keywords farm, sports, flowers, mountains, & pets. Examples from this data set, and segmentations, can be found in Figure 2. Again, we note that changes in the cut value often correlate with large changes in the co-membership relationships on the image pixels. To quantitatively compare the methods on natural images we report the divergence distance and NCut improvement factor c.

          Dvi(SR, EIG)    nc(SR) = c · nc(EIG)
natural   1.23 ± .160     c = .536 ± .201

Table 2. Comparison between spectral rounding SR and the multi-way cut algorithm of Yu and Shi [21] EIG on segmentations of natural images. The average cluster entropy over SR-segmentations of the image collection is 1.62 ± .4.

4.2.2 Medical Image Segmentation

To a degree, clustering methods are only successful insofar as they are useful in servicing a particular task. We have selected a simple medical task, segmenting out the left ventricle (a fluid sac located in the brain), as it is well defined, i.e. the boundary of the percept is agreed upon by experts. While this task would seem to be relatively easy, a successful automatic method represents a significant reduction in human effort for a common labeling task.

The test set was constructed from a collection of 200 NMR images containing the left ventricle. The collection was built by taking 5 slices each from scans of 40 individuals. Images were selected randomly from slices containing the left ventricle. As shown in Figure 5, the appearance of the ventricle varies substantially in shape and size.

The comparison of segmentations obtained from spectral rounding and the eigenvector method of Yu and Shi [21] with the human labels is given in Table 3. The divergence distance and expected cut improvement are given in Table 4. The average cluster entropy for SR was 0.611 ± .131. As this is a two-class problem, this suggests that one of the segments tends to be much smaller than the other. This is due to the often small size of the ventricle in the image.

                  nc(SR)       nc(EIG) [21]
Pr(v ∈ T(Im))     .95 ± .04    .37 ± .12

Table 3. The value Pr(v ∈ T(Im)) is reported over the population of images, where T(Im) is the expert's hand segmentation and Pr(v ∈ T(Im)) is the probability that a pixel v in a segment is also contained in T(Im). This statistic was computed for the segment with the largest overlap with T(Im).
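The statistic in Table 3 can be sketched as follows (a hypothetical helper; segments and the expert segmentation T(Im) are represented as sets of pixel indices):

```python
def overlap_probability(segments, truth):
    # segments: list of sets of pixel ids from one segmentation
    # truth: set of pixel ids in the expert hand segmentation T(Im)
    # pick the segment with the largest overlap with T(Im)
    best = max(segments, key=lambda s: len(s & truth))
    # fraction of that segment's pixels also contained in T(Im)
    return len(best & truth) / len(best)
```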

          Dvi(SR, EIG)    nc(SR) = c · nc(EIG)
medical   1.856 ± .192    c = .598 ± .237

Table 4. The divergence and expected value improvement for the medical image data set. The average cluster entropy for SR segmentations on the medical data set was 0.611 ± .131.

5. Discussion

We have presented a new rounding technique for the eigenvectors of graph Laplacians in order to partition a graph. The method was shown to converge and demonstrated empirical improvements over the standard rounding strategy on a variety of problems. In ongoing work we are seeking a theoretical bound for the SR-algorithm.

Figure 5. Examples of the left ventricle, and qualitative results for the SR and EIG algorithms (nc(SR) = .019 vs. nc(EIG) = .061; .024 vs. .057; .021 vs. .021; .048 vs. .068). Segmentations required approximately 1.2 seconds for EIG and 1.9 seconds for SR.

The initial eigencomputation remains a hurdle to the integration of spectral methods into large-scale vision and data-mining systems. At present the best known [7, 1] average-case time bound on ε-approximate estimation of an extremal eigenpair is O(m√(n/ε)), for generalized Laplacians with n = |V| and m = |E|. Fortunately, recent theoretical results hold out the promise of nearly-linear time eigencomputation. Recent work by Spielman and Teng [19] on linear time linear solvers suggests that ε-approximate eigencomputation may soon fall into the same time complexity.

References

[1] S. Arora, E. Hazan, and S. Kale. Fast algorithms for approximate semidefinite programming using the multiplicative weights update method. FOCS, pages 339–348, 2005.

[2] S. Arora, S. Rao, and U. V. Vazirani. Expander flows, geometric embeddings and graph partitioning. In STOC, pages 222–231, 2004.

[3] T. F. Chan, J. R. Gilbert, and S.-H. Teng. Geometric spectral bisection. Manuscript, July 1994.



Input Data | Feature Map | Eig [21] | SR
k=4: nc(EIG) = .0151, nc(SR) = .0064
k=5: nc(EIG) = .0119, nc(SR) = .0030
k=5: nc(EIG) = .0069, nc(SR) = .0033
k=5: nc(EIG) = .0019, nc(SR) = .0015

Figure 2. The first four rows illustrate with qualitative examples the improvements in NCut value for natural images. Column three contains segmentations generated by the published code of Yu and Shi [21]. Column four contains results for Spectral Rounding. The number of segments k is fixed for each comparison. We emphasize that the cut cost was evaluated on identical combinatorial problems (graphs).

Input Data | Feature Map | Eig [21] | SR
k=2, comparison: nc(EIG) = .0021, nc(SR) = .0021

Figure 3. The second to last row illustrates a 2-way cut in which the NCut values are nearly identical, but which support very different percepts.

[4] F. R. K. Chung. Spectral Graph Theory. Number 92 in Regional Conference Series in Mathematics. Amer. Math. Soc., Providence, 1997.

[5] S. Guattery and G. L. Miller. On the quality of spectral separators. Matrix Analysis & Applications, 19(3), 1998.

[6] J. Kleinberg. An impossibility theorem for clustering. Advances in Neural Information Processing Systems, 2003.

[7] J. Kuczynski and H. Wozniakowski. Estimating the largest eigenvalue by the power and Lanczos algorithms with a random start. Matrix Analysis & Applications, 13(4):1094–1122, 1992.

[8] K. Lang and S. Rao. A flow-based method for improving the expansion or conductance of graph cuts. Integer Programming and Combinatorial Optimization, pages 325–337, June 2004.



Input Data | Eig [21] | Intermediate | SR final
k=6, SR iteration: nc(EIG) = .0074; i = 1, nc(SR) = .0062; i = 4, nc(SR) = .0057

Figure 4. A sequence of iterations projected onto the feasible set, starting left with the solution from Yu's method and ending with the fourth and final SR iteration on the right. Notice that the large cuts in the sky and field shift to small cuts in the area around the farm.

[9] P. D. Lax. Linear Algebra. Pure and Applied Mathematics. Wiley-Interscience, 1st edition, 1997.

[10] F. T. Leighton and S. Rao. An approximate max-flow min-cut theorem for uniform multicommodity flow problems with applications to approximation algorithms. In FOCS, pages 422–431. IEEE, Oct. 1988.

[11] M. Meilă. Comparing clusterings – an axiomatic view. ICML, pages 577–584, 2005.

[12] M. Meilă. Comparing clusterings by the variation of information. COLT, pages 173–187, 2003.

[13] G. L. Miller and D. A. Tolliver. Graph partitioning by spectral rounding. SCS Technical Report, May 2006.

[14] B. Mohar. Isoperimetric numbers of graphs. Journal of Combinatorial Theory, Series B, 47:274–291, 1989.

[15] A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2002.

[16] N. Shental, A. Zomet, T. Hertz, and Y. Weiss. Learning and inferring image segmentations with the GBP typical cut algorithm. ICCV, 2003.

[17] J. Shi and J. Malik. Normalized cuts and image segmentation. In PAMI, 2000.

[18] D. Spielman and S. Teng. Spectral partitioning works: Planar graphs and finite element meshes. FOCS, 1996.

[19] D. Spielman and S. Teng. Nearly linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. STOC, pages 81–90, 2004.

[20] E. P. Xing and M. I. Jordan. On semidefinite relaxation for normalized k-cut and connections to spectral clustering. TR-CSD-03-1265, University of California, Berkeley, 2003.

[21] S. Yu and J. Shi. Multiclass spectral clustering. In ICCV, October 2003.
