The full-degree spanning tree problem

The Full-Degree Spanning Tree Problem

Randeep Bhatia,∗ Samir Khuller, Robert Pless, Yoram J. SussmannComputer Science Department and Institute for Advanced Computer Studies, University of Maryland,College Park, Maryland 20742

The full-degree spanning tree problem is defined as fol-lows: Given a connected graph GGG = (VVV, EEE), find a span-ning tree TTT to maximize the number of vertices whosedegree in TTT is the same as in GGG (these are called verticesof “full” degree). This problem is NP-hard. We presentalmost-optimaloptimaloptimal approximation algorithms for it assumingthat coRcoRcoR ≠≠≠ NPNPNP. For the case of general graphs, our ap-proximation factor is Θ(

√nnn). Using Håstad’s result on

the hardness of an approximating clique, we can showthat if there is a polynomial time approximation algo-rithm for our problem with a factor of OOO(nnn1/2−εεε) then coRcoRcoR= NPNPNP. Additionally, we present two algorithms for op-timally solving small instances of the general problemand experimental results comparing our algorithm tothe optimal solution and the previous heuristic used forthis problem. © 2000 John Wiley & Sons, Inc.

Keywords: graph; algorithm; approximation; spanning tree;NP-hardness; full degree

1. INTRODUCTION

In this paper, we study the problem of computing aspanning tree T in a connected graph G = (V, E) to max-imize the number of vertices of full degree. These arevertices whose degree in the tree T is the same as theirdegree in the graph G. This problem was first mentionedby Lewinter [8]. Tree optimization problems have beenvery well studied (see [3] for a survey).

The problem arises as a central problem in the workby Pothof and Schut [11] which has to do with water-distribution networks. To measure flow in pipes, you caninstall flowmeters on edges. However, you do not needto install flowmeters on all edges in the network. If you

Received September 1999; accepted July 2000Correspondence to: R. Pless; e-mail: [email protected] preliminary version of this paper appeared in the 10th Annual ACM–SIAM Symposium on Discrete Algorithms, 1999∗Present address: Bell Labs, Lucent Technologies, Murray Hill, NJ07974. This research was done while this author was at the Universityof MarylandContract Grant Sponsor: NSF; contract grant number: CAREER AwardCCR-9501355

c© 2000 John Wiley & Sons, Inc.

install flowmeters on edges of a cotree (a cotree H is a setof edges in the graph, such that the deletion of edges inH leaves a spanning tree in the graph), then we can inferthe flow on the edges of the spanning tree due to flowconservation (and the flow into and out of terminals).This requires that we install exactly m−n+1 flowmeters,where m is the number of edges and n is the number ofvertices in the network, since every cotree has m−(n−1)edges. Unfortunately, “the cost of flow meters is severaltimes the cost of pressure meters” [11]. Pressure metersare installed at vertices, and one can compute the flow onan edge by measuring the pressures on both the incidentvertices. Thus, we are looking for a cotree that is incidenton as few vertices as possible (to minimize cost). Thesavings is exactly the number of full-degree vertices ina spanning tree, since these vertices have zero degree inthe corresponding cotree and we do not have to installpressure meters at these vertices. Hence, we would liketo maximize the number of vertices of full degree. Onepressure meter is installed at each vertex that does nothave full degree; by using these meters, we can computethe flow on each edge in the cotree and then infer theflow on each edge in the tree due to flow conservation(and the flow into and out of terminals).

1.1. Previous Work

Pothof and Schut [11] suggested the following heuris-tic to maximize the number of full degree vertices: De-fine the weight of each edge to be the sum of the de-grees of the incident vertices and compute a minimum-weight spanning tree. As discussed in their paper, thereare cases when their algorithm does very poorly. Con-sider a wheel graph (a cycle with an additional centralvertex with edges to all vertices on the cycle); then theiralgorithm may find no vertices of full degree! In fact, forthis particular graph, our full-degree spanning tree algo-rithm (for arbitrary graphs) finds a spanning tree withn/3 vertices of full degree.

NETWORKS, Vol. 36(4), 203–209 2000

1.2. Our Results

The full-degree spanning tree problem is NP-hard[1, 2]. We consider approximation algorithms for thisproblem—these are algorithms that have a polynomialrunning time and produce a suboptimal solution. The ra-tio of the optimal solution to the solution that we pro-duce (for a maximization problem) is referred to as theapproximation ratio ρ.

In Subsection 2.1, we provide an algorithm with an ap-proximation ratio of O(

√n) for the full-degree spanning

tree (FDST) problem. In Subsection 2.2, we show that ifthere is a polynomial time algorithm for the FDST prob-lem with a worst-case approximation ratio of O(n1/2−ε)then coR = NP. This proves that our worst-case approx-imation ratio cannot be improved significantly, if NP-complete problems do not have randomized polynomialalgorithms. Section 3 discusses absolute bounds on thesize of the FDST solution.

In Section 4, we give two algorithms to compute theoptimal solution for small-problem instances and com-pare and contrast our heuristic with the one given byPothof and Schut [11]. For large random graphs (we triedvarious kinds of general graphs and planar graphs gener-ated by LEDA [9] and Stanford Graphbase [7]), we foundthat our heuristic consistently gave better solutions. Wealso performed tests to compare our algorithm’s solutionto the optimal solution. For random planar graphs, wefound that, in all cases except one, our heuristic gave theoptimal solution. For arbitrary random graphs, on halfthe tests, our algorithm found the optimal solution, andon the other half of the tests, it missed by one vertex (inone case it missed by two vertices).

Another paper independently considered the samequestion, giving algorithms to find exact solutions tothe FDST problem for interval graphs, cocomparabil-ity graphs, graphs of bounded asteroidal number, andbounded tree-width graphs and giving a polynomial timeapproximation scheme for this problem when the inputgraph is planar [2].

2. APPROXIMATION ALGORITHMS

2.1. Full-degree Spanning Tree Algorithm

Let the given graph be G = (V, E) and let G haven vertices. Let G1 be a graph defined on vertex set V :G1 = (V, E1), where E1 ⊆ E. Let X ⊆ E be a set ofedges. We define G1 ⊕ X(G1 X) to be the graph withedge set E1 ∪ X(E1\X) and vertex set V. Let Y ⊆ V bea set of vertices. We define EY ⊆ E to be the set of alledges at least one of whose endpoints is incident on somevertex in Y. We define G1 ⊕ Y(G1 Y) to be the graphG1 ⊕ EY(G1 EY). These operations are augmentation

(reduction) of G1 by a given set of vertices or edges. LetN(v) be the set of neighbors of vertex v.

We now present the Greedy Star-Insertion Algorithmthat finds a good solution to the FDST problem. Thealgorithm is extremely simple. We show how to imple-ment it efficiently and then prove an O(

√n) bound for its

worst-case approximation factor.

High-level Description. The algorithm first sorts thevertices in nondecreasing order of degree, then considersthem one-by-one to be added to the current solution. LetS be the full-degree vertices (initially, S is empty), andF, the spanning tree that we are constructing. At eachstep, we insert a vertex into S, together with its incidentedges (a “star”), as long as it does not create a cycle. Insome sense, the algorithm is similar to Kruskal’s MSTalgorithm, except that we are inserting vertices (alongwith all their incident edges) rather than simply addingsingle edges at each step.

The main hurdle regarding an efficient implementationis to check if the vertex vi that we are considering canbe legally inserted or not, without creating a cycle inF. The edges incident on vi are in two categories: edgesalready in F and edges not in F as yet. If the set ofvertices vi∪u|(vi, u) /∈ F and u ∈ N(vi) all belong todistinct connected components of F, then we can insert vi

without creating a cycle in F. However, if two verticesbelong to the same component, then adding the edgesincident to vi will create a cycle in F.

The algorithm uses the well-known Union-Find datastructure (see Tarjan [12]) to maintain the connectedcomponents induced by the current spanning forest F.Maintaining also a bit vector encoding edge membershipin F allows the check (e ∈ F?) to be done in constanttime. The algorithm scans the adjacency list of the cur-rent vertex vi and, for each adjacent edge not already inthe forest, checks to see which component the oppositevertex lies in. If all the components thus discovered alongwith the component in which vi lies are distinct, then vi

can be added to the current solution. We do this checkusing an array Validate[]. While processing vi, when wescan an edge (vi, u) /∈ F, and u belongs to component c,we set Validate[c] = i. If for some other edge (vi, u′) /∈ F,u′ also belongs to component c, then we can detect thatwe are trying to write i in location Validate[c] that al-ready contains i. This will detect the case when twoneighbors of vi belong to the same component.

A detailed description of the algorithm is given inFigure 1. We assume that our input is a graph G = (V, E)and the output is a spanning tree F of G and a set ofvertices S that have full degree in F.

Theorem 1. The Greedy Star-Insertion Algorithm de-livers a solution of size at least (OPT)/(2

√n + 1) where

OPT is the size of the optimal solution. Moreover, thealgorithm has a worst-case time bound of O(mα(m, n)),where n, m denote the number of vertices and edges in

204 NETWORKS–2000

FIG. 1. Algorithm for finding a good FDST.

the graph, respectively, and α(m, n) is the inverse Acker-mann function [12].

Proof. Let TOPT be an optimal solution to the FDSTproblem. Let OPT be the number of vertices of full de-gree. Let d be a degree threshold. Let Ad(Bd) be the setof full degree vertices of degree ≤ (>)d in an optimalsolution. Hence, OPT = |Ad| + |Bd|. We will bound Ad

and Bd separately. Let Id denote the set of vertices of de-gree ≤ d in S, where S is the set of full degree verticesobtained by the Greedy Star-Insertion Algorithm. Notethat the vertices in Id have full degree in the final solu-tion F. Lemma 2 shows that |Ad| ≤ d|Id| and Lemma4 shows that |Bd| ≤ (n − 2)/(d − 1). Therefore, since|S| ≥ max1, |Id|, we have

OPT

|S| =|Ad| + |Bd|

|S| ≤ d|Id| + n−2d−1

max1, |Id| ≤ d +n

d − 1.

Let d =√

n + 1. Then, we have

|S| ≥ OPT

2√

n + 1.

Finally, note that in the Greedy Star-Insertion Algorithmeach edge is considered at most twice, once for each oneof its incident vertices, and each vertex is consideredonce. Also note that for each edge considered by thealgorithm a constant number of Union and Find opera-tions are invoked. This, therefore, implies the bound onthe running time. The first sorting step can be done inlinear time, as we are only sorting degrees which areintegral valued in the range 1, . . . , n.

Lemma 2 (Bound on Low-degree Vertices). |Ad| ≤d|Id|, where Ad (Id) is the set of full-degree verticesof degree ≤ d in the optimal solution (the tree F outputby the algorithm).

Proof. Id is a forest, with at most d|Id| nodes.Suppose these nodes form a collection of componentsC1, . . . Ck. We can bound the number of edges:

∑(|Ci|−

1) ≤ d|Id|. Consider any v ∈ Ad. The star of this node(including v) is incident on at least two nodes of some Ci;either it was chosen by the greedy algorithm, in whichcase the whole star is in Ci, or it was not picked becauseit would have created a cycle. If there are multiple suchcomponents, pick any one to charge. Each componentCi gets charged at most |Ci| − 1 times (the number ofedges in it from F). Each node of OPT defines connec-tivity between two nodes of some Ci using the optimaltree. Since the optimal solution is acyclic, there cannotexist more than |Ci| − 1 connections in a subset of sizeCi. So, Ad ≤ ∑

(|Ci| − 1) ≤ d|Id|.The following corollary follows easily from Lemma 2:

Corollary 3. For graphs with degree bounded by d theGreedy Star-Insertion Algorithm delivers a solution ofsize at least OPT/d, where OPT is the size of the optimalsolution.

Lemma 4 (Bound on High Degree Vertices). |Bd| ≤(n−2)/(d−1), where Bd is the set of full-degree verticesof degree > d in the optimal solution.

NETWORKS–2000 205

Proof. For ease of presentation, we drop the sub-script on Bd in the following proof.

Note that GB = (V, EB) must be a forest. For eachvertex v ∈ B, let OUT(v) be the number of edges incidentto v whose other endpoint is not in B. Let IN(v) be thenumber of edges incident to v whose other endpoint is inB. Since for each such v we have IN(v) + OUT(v) ≥ d,by summing, we obtain

∑v∈B(IN(v) + OUT(v)) ≥ d|B|.

Let E′B be the edges between vertices in B. On the left-

hand side of the summation, the edges in E′B are counted

twice and the edges in EB − E′B are counted once. We

thus obtain |EB| + |E′B| ≥ |B|d. Since |E′

B| ≤ |B| − 1,we obtain |B| − 1 + (n − 1) ≥ |B|d. Simplifying givesthe bound of |B| ≤ (n − 2)/(d − 1).

2.2. Lower Bounds on Approximation Factorfor FDST

Let the input graph be G = (V, E) and let G haven vertices. We show that it is not possible to design anO(n(1/2−ε))(ε > 0) approximation algorithm for the FDSTproblem unless coR = NP.

This shows that our approximation algorithm is al-most optimal assuming that coR ≠ NP. In addition, ourlower bound results hold for bipartite graphs.

We establish the lower bound by a linear reductionfrom the independent set problem to the FDST prob-lem. It is known that no polynomial time O(n1−ε)(ε > 0)approximation algorithm exists for the independent setproblem, unless coR = NP [5].

Theorem 5. No O(n(1/2−ε))(ε > 0) approximation algo-rithm exists for the FDST problem, unless coR = NP.

Proof. Given a graph H, an input instance of the in-dependent set problem (we will assume w.l.o.g. that Hhas at least two edges and that there are no isolated ver-tices in H), we create an instance graph G of the FDSTproblem as follows: We also assume that the maximumindependent set in H has size at least three. G can beviewed as a four-layer graph whose edges only connectvertices in adjacent layers. Hence, G is bipartite. Layerone consists of just one vertex a. Layer two has one ver-tex for every vertex of H, and every vertex in the secondlayer is connected to a. Layer three has one vertex forevery edge of H, and if (u, v) is an edge in H, then thecorresponding vertex in the third layer is connected tothe vertices in layer two, corresponding to the verticesu and v of H. Finally, layer four has two vertices b andc, which are both connected to every vertex in the thirdlayer.

Let T be a feasible solution to the FDST problem forgraph G. First, note that if any two vertices have at leasttwo common neighbors in G then they both cannot havefull degree in T. Hence, only one vertex from amongb, c has full degree in T. This is because H has at

least two edges and, hence, b and c have at least twocommon neighbors. Similarly, at most one vertex in thethird layer of G has full degree in T. This is because bothb and c are in the neighborhood of every vertex in thethird layer. If vertex a has full degree in T, then noneof the vertices in the third layer of G have full degreein T, since these vertices have two common neighborsin layer two. Finally, note that all vertices in the secondlayer with full degree in T must form an independent setin H.

The above implies that if H has an independent setof size i then there is a feasible solution of size i + 1,to the FDST problem, for the graph G. In this solution,vertex a and the vertices in the independent set have fulldegree. Similarly, if there is a feasible solution to theFDST problem for the graph G of size i + 1, then if bor c has full degree (only one of them can be a full-degree vertex), then no pair of vertices in the secondlayer can have full degree. To see this, suppose that vand v′ are two vertices in the second layer that havefull degree. Since they have edges to vertex a and sinceeach one of them has an edge to a vertex in layer three(corresponding to the edge incident on these vertices inH), which are both connected to either vertex b or c:By the assumption that one of the vertex b or c has fulldegree, this would thus imply the presence of a cyclein the solution to the FDST, a contradiction. We cannotpick more than one full-degree vertex in layer three inany case, and also we can only pick vertex a or a vertexin layer three of full degree, but not both. Therefore,we will be able to pick at most three vertices of fulldegree. Hence, at least i vertices in layer two have fulldegree in this feasible solution and, therefore, H has anindependent set of size i.

By our reduction, an α-approximation algorithm forthe FDST problem implies an α-approximation algo-rithm (up to an additive term) for the independent setproblem. Let N be the number of vertices in H. Then, thelower bound for the independent set problem [5] estab-lishes that α = Ω(N(1−ε))(ε > 0) unless coR = NP. Notethat G has n = O(N2) vertices, where N is the number ofvertices in H. This, therefore, yields the claimed lowerbound on α in terms of n.

3. ABSOLUTE SIZE OF THE SOLUTIONTO THE FDST PROBLEM

In this section, we show that we can always find asolution to the FDST problem of size Ω(n/∆2), where ∆is the maximum degree in the input graph G. In addition,we give an example of a graph, with maximum degree ∆,for which the size of any solution to the FDST problemis O(n/∆2).

Let G2 be the graph obtained from G by adding ad-ditional edges between those vertices of G that have acommon neighbor in G. Note that any two vertices that

206 NETWORKS–2000

belong to an independent set of G2 do not have a com-mon neighbor in G. Hence, the set of edges of G incidenton the vertices in any independent set of G2 do not con-tain any cycles. Hence, the set of vertices in any inde-pendent set of G2 form a solution to the FDST problemon the graph G. Observe that we can pick a maximalindependent set in G2 of size Ω(n/∆2). Therefore, wecan always find a solution to the FDST problem, of sizeΩ(n/∆2). The following example shows that, in general,the biggest possible solution to the FDST problem willbe of size O(n/∆2):

For our example, we use the graph shown in Figure2 (∆ ≥ 4). The vertices of this graph can be thought ofas the points on a grid of size (∆/2 + 1) × (∆/2 + 1):There are ∆/2 + 1 vertices in any horizontal strip andthere are ∆/2 + 1 vertices in any vertical strip. Thus,n ≈ ∆2/4. For every horizontal (vertical) strip, there isa horizontal (vertical) clique that connects the verticesof the strip. Thus, every vertex is in two cliques, each ofsize ∆/2+1: one vertical and the other horizontal. Everyvertex, thus, has degree ∆. Clearly, no two vertices thatbelong to a common clique can be of full degree in theFDST, since any three vertices in a clique are connectedin a cycle. Also, every pair of vertices that are not in acommon clique have two common neighbors, one in ahorizontal clique and one in a vertical clique. Thus, themaximum number of full-degree vertices in an FDST onthis graph is 1. Clearly, this can be extended for largervalues of n by duplicating this structure.

Note: For planar graphs, there are excellent (abso-lute) bounds on the size of independent sets obtained bythe greedy algorithm (see Papadimitriou and Yannakakis[10]). Perhaps such bounds can be obtained for indepen-dent sets in the square (G2) of bounded degree planargraphs? (If the degree is unbounded, then K2,n−2 is anexample of a planar graph whose square is a clique, andthere is only one vertex of full degree.)

FIG. 2. Tight example for the size of the FDST solution.

4. OPTIMAL SOLUTIONS AND EXPERIMENTALRESULTS

We implemented the Pothof and Schut heuristic [11],as well as our Greedy Star-Insertion heuristic, and com-pared the performance of the two heuristics. We referto the heuristics as PS and BKPS, respectively. To ex-perimentally determine the empirical efficiency of theGreedy Star-Insertion Algorithm, it is necessary to findthe optimal solution. We give an integer program that isfeasible to run for small graphs, up to about 30 nodeswith arbitrary connectivity, and then we give an algo-rithm that is especially efficient for larger, dense graphs.

4.1. Integer Program Formulation

∀e, xe ∈ 0, 1, xe = 1 iff e ∈ spanning tree

∀v, fv ∈ 0, 1, fv = 1 iff v has full degree

maximize∑

v∈V

fv

with constraints:∑

e

xe = |V| − 1

∀v, ∀e incident on v, fv ≤ xe.

This integer program, as written, does not ensure thatthe set of edges with xe = 1 forms a spanning tree. Toenforce this condition, the integer program must be aug-mented with constraints that enforce connectivity. Thiscan be implemented by arbitrarily defining a root node,creating a commodity for each vertex, and enforcing thatthere is a valid flow, along the spanning tree edges, fromeach node to the root. If the set of edges for which xe = 1has size |V|−1 and connects every vertex to an arbitraryvertex, it must be a spanning tree.

FIG. 3. The recursive call structure for dense graphs—the numberinside the circle is the number of unassigned nodes, a measure of howmuch work the algorithm has left to do. For the instance shown here,making one node full degree prohibits 20 others, leading to a slowexponential growth.

NETWORKS–2000 207

FIG. 4. Experimental comparison of new approximation algorithm BKPS and previous heuristic PS for random planar and random arbitrarygraphs.

4.2. Dense Graphs

A different algorithm can find the optimal solutionefficiently for dense graphs. No pair of nodes can si-multaneously have full degree if they have at least twopaths of length at most two between them. Thus, in densegraphs, choosing one node as having full degree imme-diately eliminates the possibility for many other nodesto have full degree. This leads to a recursive algorithmwhich we outline as follows (see Fig. 3): Initially, colorall nodes gray. Start each recursive iteration by choosingany gray vertex v, which may or may not be a full degreenode in the OPT solution. We follow both possibilities:In case one, we assume that v does not have full degree,we color v black, and recursively choose another graynode to continue. In the other case, v does have full de-gree, we color v white, and then color black all the nodesin the graph which can now no longer have full degreein the spanning tree. The point is that, while the runningtime of this algorithm is exponential, it grows slowly fordense graphs, because we get to color many nodes blackeach time that we postulate a node as having full degree.

4.3. Experiments

Preliminary testing has been done on various kindsof graphs generated in LEDA [9] and Stanford Graph-base [7] (random planar and nonplanar graphs) of sizesranging from 10 vertices to 200 vertices. Figure 4 dis-plays the results of tests on graphs with 100 vertices forvarious edge densities.

The first chart shows the performance of the two al-gorithms on random planar graphs of 100 vertices. Thesecond chart shows the performance on random graphsof 100 vertices. The x axis records the number of edges

in the graph, and the y axis records the number of full-degree nodes that we find.

For small graphs, we were able to compare the al-gorithm solution with the optimal solution, computed asdescribed above. For random planar graphs, only in onecase (out of about 35 tests) was there a difference be-tween the optimal solution and BKPS and that, too, byonly one vertex. For arbitrary random graphs (out of25 tests), for half the tests, BKPS obtained an optimalsolution, and for the other half of the tests, the opti-mal solution had one more vertex (in one case, BKPSmissed by two vertices). The BKPS algorithm can be fur-ther improved in two ways: The degree of a node maybe viewed as a measure of “how many vertices of fulldegree are possibly prevented from being picked.” The“effective” degree of a node is computed as if the edgesof the star were deleted from the graph; the performanceis improved by dynamically updating the degree order-ing of the nodes. An additional strategy that uses localsearch to improve the final solution of BKPS was dis-cussed in [6].

Acknowledgments

We thank Jan Schut for sending us a copy of [11], HalGabow for suggestions improving Lemma 2, and SudiptoGuha for useful comments.

REFERENCES

[1] R. Bhatia, S. Khuller, R. Pless, and Y. Sussmann, The fulldegree spanning tree problem, Tech. Report CS-TR-3931,University of Maryland, 1998.

[2] H.J. Broersma, A. Huck, T. Kloks, O. Koppius, D. Kratsch,H. Müller, and H. Tuinstra, Degree-preserving forests, Net-works 35 (2000), 26–39.

208 NETWORKS–2000

[3] G. Galbiati, A. Morzenti, and F. Maffioli, On the approx-imability of some maximum spanning tree problems, TheorComput Sci 181 (1997), 107–118.

[4] M.R. Garey and D.S. Johnson, Computers and intractabil-ity: A guide to the theory of NP-completeness, Freeman,San Francisco, 1979.

[5] J. Håstad, Clique is hard to approximate within n1−ε, 37thAnn Symp on Foundations of Computer Science, 1996, pp.627–636.

[6] S. Khuller, R. Bhatia, and R. Pless, On local search andplacement of meters in networks, 11th Ann ACM–SIAMSymp on Discrete Algorithms, 2000, pp. 319–328.

[7] D.E. Knuth, The Stanford graphbase, ACM/Addison Wes-ley, New York, 1993.

[8] M. Lewinter, Interpolation theorem for the number ofdegree-preserving vertices of spanning trees, IEEE TransCirc Syst CAS-34 (1987), 205.

[9] K. Mehlhorn and S. Näher, LEDA: A platform for com-binatorial and geometric computing, Commun ACM 38(1995), 96–102; http://mpi-sb.mpg.de/LEDA/leda.html.

[10] C.H. Papadimitriou and M. Yannakakis, Worst-case ratiosin planar graphs and the method of induction on faces,22nd Ann Symp on Foundations of Computer Science,1981, pp. 358–363.

[11] I.W.M. Pothof and J. Schut, Graph-theoretic approach toidentifiability in a water distribution network, Memoran-dum No 1283, Universiteit Twente, 1995.

[12] R.E. Tarjan, Data structures and network algorithms, Soci-ety for Industrial and Applied Mathematics, Philadelphia,1983.

NETWORKS–2000 209

Documents

The full-degree spanning tree problem