Min-Max Problems on Factor-Graphs (proceedings.mlr.press/v32/ravanbakhsh14.pdf)



Min-Max Problems on Factor-Graphs

Siamak Ravanbakhsh∗ MRAVANBA@UALBERTA.CA
Christopher Srinivasa‡ CHRIS@PSI.UTORONTO.CA
Brendan Frey‡ FREY@PSI.UTORONTO.CA
Russell Greiner∗ RGREINER@UALBERTA.CA

∗ Computing Science Dept., University of Alberta, Edmonton, AB T6G 2E8 Canada
‡ PSI Group, University of Toronto, ON M5S 3G4 Canada

Abstract

We study the min-max problem in factor graphs, which seeks the assignment that minimizes the maximum value over all factors. We reduce this problem to both min-sum and sum-product inference, and focus on the latter. In this approach the min-max inference problem is reduced to a sequence of Constraint Satisfaction Problems (CSPs), which allows us to solve the problem by sampling from a uniform distribution over the set of solutions. We demonstrate how this scheme provides a message passing solution to several NP-hard combinatorial problems, such as min-max clustering (a.k.a. K-clustering), the asymmetric K-center clustering problem, K-packing and the bottleneck traveling salesman problem. Furthermore, we theoretically relate the min-max reductions and several NP-hard decision problems such as clique cover, set cover, maximum clique and Hamiltonian cycle, therefore also providing message passing solutions for these problems. Experimental results suggest that message passing often provides near-optimal min-max solutions for moderate-size instances.

1. Introduction

In recent years, message passing methods have achieved remarkable success in solving different classes of optimization problems, including maximization (e.g., Frey & Dueck 2007; Bayati et al. 2005), integration (e.g., Huang & Jebara 2009) and constraint satisfaction problems (e.g., Mezard et al. 2002). When formulated as a graphical model, these problems correspond to different modes of inference: (a) solving a CSP corresponds to sampling from a uniform distribution over satisfying assignments, while (b) counting and integration usually correspond to estimation of the partition function and (c) maximization corresponds to Maximum a Posteriori (MAP) inference. Here we introduce and study a new class of inference over graphical models, i.e., (d) the min-max inference problem, where the objective is to find an assignment that minimizes the maximum value over a set of functions.

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).

The min-max objective appears in various fields, particularly in building robust models under uncertain and adversarial settings. In the context of probabilistic graphical models, several different min-max objectives have been previously studied (e.g., Kearns et al. 2001; Ibrahimi et al. 2011). In combinatorial optimization, min-max may refer to the relation between maximization and minimization in dual combinatorial objectives and their corresponding linear programs (e.g., Schrijver 1983), or it may refer to min-max settings due to uncertainty in the problem specification (e.g., Averbakh 2001; Aissi et al. 2009).

Our setting is closely related to a third class of min-max combinatorial problems known as bottleneck problems. Instances of these problems include the bottleneck traveling salesman problem (Parker & Rardin 1984), min-max clustering (Gonzalez 1985), the k-center problem (Dyer & Frieze 1985; Khuller & Sussmann 2000) and the bottleneck assignment problem (Gross 1959).

Edmonds & Fulkerson (1970) introduce a bottleneck framework with a duality theorem that relates the min-max objective in one problem instance to a max-min objective in a dual problem. An intuitive example is the duality between the min-max cut separating nodes a and b (the cut with the minimum of the maximum weight) and the min-max path between a and b (the path with the minimum of the maximum weight) (Fulkerson 1966). Hochbaum & Shmoys (1986) leverage the triangle inequality in metric spaces to find constant-factor approximations to several NP-hard min-max problems under a unified framework.


The common theme in a majority of heuristics for min-max or bottleneck problems is the relation of the min-max objective with a CSP (e.g., Hochbaum & Shmoys 1986; Panigrahy & Vishwanathan 1998). In this paper, we establish a similar relation within the context of factor-graphs, by reducing the min-max inference problem on the original factor-graph to that of sampling (i.e., solving a CSP) on the reduced factor-graph. We also consider an alternative approach where the factor-graph is transformed such that the min-sum objective produces the same optimal result as the min-max objective on the original factor-graph. Although this reduction is theoretically appealing, in its simple form it suffers from numerical problems and is not further pursued here.

Section 2 formalizes the min-max problem in probabilistic graphical models and provides an inference procedure by reduction to a sequence of CSPs on the factor graph. Section 3 reviews Perturbed Belief Propagation equations (Ravanbakhsh & Greiner 2014) and several forms of high-order factors that allow efficient sum-product inference. Finally, Section 4 uses these factors to build efficient algorithms for several important min-max problems with general distance matrices. These applications include problems, such as K-packing, that were not previously studied within the context of min-max or bottleneck problems.

2. Factor Graphs and CSP Reductions

Let x = {x_1, ..., x_n} denote a set of discrete variables, where x ∈ X ≜ X_1 × ... × X_n. Each factor f_I(x_I) : X_I → Y_I ⊂ ℝ is a real-valued function with range Y_I over a subset of the variables; here I ⊆ {1, ..., n} is a set of indices. Given the set of factors F, the min-max objective is

    x* = argmin_x max_{I∈F} f_I(x_I)    (1)

This model can be conveniently represented as a bipartite graph, known as a factor graph (Kschischang & Frey 2001), that includes two sets of nodes: variable nodes x_i and factor nodes f_I. A variable node i (note that we will often identify a variable x_i with its index "i") is connected to a factor node I if and only if i ∈ I. We will use ∂ to denote the neighbors of a variable or factor node in the factor graph; that is, ∂I = {i s.t. i ∈ I} (which is the set I) and ∂i = {I s.t. i ∈ I}.

Let Y = ⋃_I Y_I denote the union of the ranges of all factors. The min-max value belongs to this set: max_{I∈F} f_I(x*_I) ∈ Y. In fact, for any assignment x, max_{I∈F} f_I(x_I) ∈ Y.

Example (Bottleneck Assignment Problem): given a matrix D ∈ ℝ^{N×N}, select a subset of entries of size N that includes exactly one entry from each row and each column, and whose largest value is as small as possible. As an application, assume the entry D_{i,j} is the time required by worker i to finish task j. The min-max assignment minimizes the maximum time required by any worker to finish his assignment. This problem is also known as bottleneck bipartite matching and belongs to class P (e.g., Garfinkel 1971). Here we show two factor-graph representations of this problem.

Figure 1. Factor-graphs for the bottleneck assignment problem: (a) the categorical variable representation; (b) the binary variable representation.

Categorical variable representation: Consider a factor-graph with x = {x_1, ..., x_N}, where each variable x_i ∈ {1, ..., N} indicates the column of the selected entry in row i of D. For example, x_1 = 5 indicates that the fifth column of the first row is selected (see Figure 1(a)). Define the following factors: (a) local factors f_{i}(x_i) = D_{i,x_i}, and (b) pairwise factors f_{i,j}(x_{i,j}) = ∞·I(x_i = x_j) − ∞·I(x_i ≠ x_j) that enforce the constraint x_i ≠ x_j. Here I(.) is the indicator function, i.e., I(True) = 1 and I(False) = 0. Also, by convention, ∞·0 ≜ 0. Note that if x_i = x_j, f_{i,j}(x_{i,j}) = ∞, making this choice unsuitable in the min-max solution. On the other hand, with x_i ≠ x_j, f_{i,j}(x_{i,j}) = −∞ and this factor has no impact on the min-max value.

Binary variable representation: Consider a factor-graph where x = [x_{1-1}, ..., x_{1-N}, x_{2-1}, ..., x_{2-N}, ..., x_{N-N}] ∈ {0,1}^{N×N} indicates whether each entry is selected (x_{i-j} = 1) or not (x_{i-j} = 0) (Figure 1(b)). Here the factors f_I(x_I) = −∞·I(∑_{i∈∂I} x_i = 1) + ∞·I(∑_{i∈∂I} x_i ≠ 1) ensure that only one variable in each row and column is selected, and the local factors f_{i-j}(x_{i-j}) = x_{i-j} D_{i,j} − ∞(1 − x_{i-j}) have any effect only if x_{i-j} = 1.

The min-max assignment in both of these factor-graphs asdefined in eq. (1) gives a solution to the bottleneck assign-ment problem.
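Eq. (1) can be made concrete with a small brute-force sketch (ours, not the paper's: the helper `min_max_assignment`, the 3×3 toy matrix and the factor encoding below are made up for illustration):

```python
from itertools import product

def min_max_assignment(domains, factors):
    """Brute-force eq. (1). `factors` is a list of (indices, f) pairs,
    where f maps a tuple of values for those indices to a real number."""
    best_x, best_v = None, float("inf")
    for x in product(*domains):
        v = max(f(tuple(x[i] for i in idx)) for idx, f in factors)
        if v < best_v:
            best_x, best_v = x, v
    return best_x, best_v

# Categorical representation of the bottleneck assignment problem:
# x_i is the column selected in row i; local factors carry D[i][x_i] and
# pairwise factors return +inf when x_i == x_j (and -inf otherwise).
D = [[4, 1, 9], [2, 8, 3], [5, 6, 7]]
factors = [((i,), lambda t, i=i: D[i][t[0]]) for i in range(3)]
factors += [((i, j), lambda t: float("inf") if t[0] == t[1] else float("-inf"))
            for i in range(3) for j in range(3) if i < j]
x_star, y_star = min_max_assignment([range(3)] * 3, factors)
# x_star == (1, 2, 0): rows pick columns 1, 2, 0, with bottleneck value 5.
```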

For any y ∈ Y in the range of factor values, we reduce the original min-max problem to a CSP using the following reduction. For any y ∈ Y, the μ_y-reduction of the min-max problem eq. (1) is given by

    μ_y(x) ≜ (1/Z_y) ∏_I I(f_I(x_I) ≤ y)    (2)

where

    Z_y ≜ ∑_{x∈X} ∏_I I(f_I(x_I) ≤ y)    (3)


is the normalizing constant and I(.) is the indicator function.¹ This distribution defines a CSP over X, where μ_y(x) > 0 iff x is a satisfying assignment. Moreover, Z_y gives the number of satisfying assignments.

We will use f_I^y(x_I) ≜ I(f_I(x_I) ≤ y) to refer to the reduced factors. The following theorem is the basis of our approach to solving min-max problems.

Theorem 2.1² Let x* denote the min-max solution and y* be its corresponding value, i.e., y* = max_I f_I(x*_I). Then μ_y(x) is satisfiable for all y ≥ y* (in particular μ_y(x*) > 0) and unsatisfiable for all y < y*.

This theorem enables us to find a min-max assignment by solving a sequence of CSPs. Let y_(1) ≤ ... ≤ y_(N) be an ordering of the elements y ∈ Y. Starting from y = y_(⌈N/2⌉), if μ_y is satisfiable then y* ≤ y; on the other hand, if μ_y is not satisfiable, y* > y. Using binary search, we need to solve log(|Y|) CSPs to find the min-max solution. Moreover, at any time-step during the search we have both upper and lower bounds on the optimal solution: y̲ < y* ≤ ȳ, where μ_y̲ is the latest unsatisfiable reduction and μ_ȳ is the latest satisfiable reduction.
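The search described above can be sketched as follows (our own illustration: `is_satisfiable` stands in for a CSP solver applied to the μ_y-reduction, here replaced by a brute-force check on a toy bottleneck assignment instance):

```python
from itertools import permutations

def min_max_by_bisection(thresholds, is_satisfiable):
    """Binary search over sorted candidate values y for the smallest y
    whose mu_y-reduction is satisfiable, i.e. the min-max value y*."""
    lo, hi = 0, len(thresholds) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_satisfiable(thresholds[mid]):
            best = thresholds[mid]  # y* <= thresholds[mid]: go lower
            hi = mid - 1
        else:
            lo = mid + 1            # y* > thresholds[mid]: go higher
    return best

# Toy oracle: the bottleneck assignment mu_y-reduction is satisfiable iff
# some permutation keeps every selected entry of D at or below y.
D = [[4, 1, 9], [2, 8, 3], [5, 6, 7]]
def sat(y):
    return any(all(D[i][p[i]] <= y for i in range(3))
               for p in permutations(range(3)))

Y = sorted({v for row in D for v in row})
y_star = min_max_by_bisection(Y, sat)  # solves 4 CSPs instead of |Y| = 9
```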

Example (Bottleneck Assignment Problem): Here we define the μ_y-reduction of the binary-valued factor-graph for this problem by reducing the constraint factors to f^y(x_I) = I(∑_{i∈∂I} x_i = 1) and the local factors to f^y_{i-j}(x_{i-j}) = x_{i-j} I(D_{i,j} ≤ y). The μ_y-reduction can be seen as defining a uniform distribution over all valid assignments (i.e., each row and each column has a single selected entry) in which none of the N selected entries is larger than y.

2.1. Reduction to Min-Sum

Kabadi & Punnen (2004) introduce a simple method to transform instances of bottleneck TSP to TSP. Here we show how this result extends to min-max problems over factor-graphs.

Lemma 2.2 Any two sets of factors, {f_I}_{I∈F} and {f'_I}_{I∈F}, have identical min-max solutions

    argmin_x max_I f_I(x_I) = argmin_x max_I f'_I(x_I)

if for all I, J ∈ F, x_I ∈ X_I and x_J ∈ X_J:

    f_I(x_I) < f_J(x_J) ⇔ f'_I(x_I) < f'_J(x_J)

This lemma simply states that what matters in the min-max solution is the relative ordering of the factor values.

¹ To always have a well-defined probability, we define 0/0 ≜ 0.

² All proofs appear in Appendix A.

Let y_(1) ≤ ... ≤ y_(N) be an ordering of the elements in Y, and let r(f_I(x_I)) denote the rank of y_I = f_I(x_I) in this ordering. Define the min-sum reduction of {f_I}_{I∈F} as

    f'_I(x_I) = 2^{r(f_I(x_I))}    ∀I ∈ F

Theorem 2.3

    argmin_x ∑_I f'_I(x_I) = argmin_x max_I f_I(x_I)

where {f'_I}_I is the min-sum reduction of {f_I}_I.

Although this allows us to use min-sum message passing tosolve min-max problems, the values in the range of factorsgrow exponentially fast, resulting in numerical problems.
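The reduction can be sanity-checked exhaustively on a tiny instance (a sketch of ours; the two factor tables are invented, and we assume the maximum is attained uniquely):

```python
from itertools import product

# Two factors over binary variables: f1 depends on (x1,), f2 on (x1, x2).
f1 = {(0,): 3.0, (1,): 1.0}
f2 = {(0, 0): 2.0, (0, 1): 5.0, (1, 0): 4.0, (1, 1): 2.0}

# Min-sum reduction: replace each factor value by 2**rank, where rank is
# the position of the value in the sorted set of all factor values.
values = sorted(set(f1.values()) | set(f2.values()))
rank = {v: r for r, v in enumerate(values)}
g1 = {k: 2 ** rank[v] for k, v in f1.items()}
g2 = {k: 2 ** rank[v] for k, v in f2.items()}

minmax = min(product([0, 1], repeat=2),
             key=lambda x: max(f1[(x[0],)], f2[x]))
minsum = min(product([0, 1], repeat=2),
             key=lambda x: g1[(x[0],)] + g2[x])
# Both objectives select the same assignment, here (1, 1).
```

The exponential spacing is what makes the largest factor dominate the sum, and it is also the source of the numerical problems just noted.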

3. Solving CSP-reductions

Previously, in solving CSP-reductions, we assumed an ideal CSP solver. However, finding an assignment x such that μ_y(x) > 0, or otherwise showing that no such assignment exists, is in general NP-hard (Cooper 1990). Nevertheless, message passing methods have been successfully used to provide state-of-the-art results in solving difficult CSPs. We use Perturbed Belief Propagation (PBP; Ravanbakhsh & Greiner 2014) for this purpose. By using an incomplete solver (Kautz et al. 2009), we lose the lower bound y̲ on the optimal min-max solution, as PBP may not find a solution even if the instance is satisfiable.³ However, the following proposition states that, as we increase y from the min-max value y*, the number of satisfying assignments to the μ_y-reduction increases, making it potentially easier to solve.

Proposition 3.1

    y_1 < y_2 ⇒ Z_{y_1} ≤ Z_{y_2}    ∀ y_1, y_2 ∈ ℝ

where Z_y is defined in eq. (3).

This means that the sub-optimality of our solution is re-lated to our ability to solve CSP-reductions – that is, as thegap y−y∗ increases, the µy-reduction potentially becomeseasier to solve.

PBP is a message passing method that interpolates between Belief Propagation (BP) and Gibbs sampling. At each iteration, PBP sends a message from variables to factors (ν_{i→I}) and vice versa (ν_{I→i}). The factor-to-variable message is given by

    ν_{I→i}(x_i) ∝ ∑_{x_{I∖i} ∈ X_{∂I∖i}} f^y_I(x_i, x_{I∖i}) ∏_{j∈∂I∖i} ν_{j→I}(x_j)    (4)

³ To maintain the lower bound one should be able to correctly assert unsatisfiability.


where the summation is over all the variables in I except for x_i. The variable-to-factor message for PBP is slightly different from that of BP; it is a linear combination of the BP message update and an indicator function, defined based on a sample x̂_i from the current estimate of the marginal μ(x_i):

    ν_{i→I}(x_i) ∝ (1 − γ) ∏_{J∈∂i∖I} ν_{J→i}(x_i) + γ I(x_i = x̂_i)    (5)

where

    x̂_i ∼ μ(x_i) ∝ ∏_{J∈∂i} ν_{J→i}(x_i)    (6)

For γ = 0 we have BP updates, and for γ = 1 we have Gibbs sampling. PBP starts at γ = 0 and linearly increases γ at each iteration, ending at γ = 1 in its final iteration. At any iteration, PBP may encounter a contradiction, where the product of incoming messages to node i is zero for all x_i ∈ X_i. This could mean either that the problem is unsatisfiable or that PBP is not able to find a solution. However, if it reaches the final iteration, PBP produces a sample from μ_y(x), which is a solution to the corresponding CSP. The number of iterations T is the only parameter of PBP, and increasing T increases the chance of finding a solution (the only downside is time complexity; n.b., there is no chance of a false positive).
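The updates (4)-(6) can be sketched for the simplest CSP used in this paper, pairwise disequality (inverse Potts) constraints, here instantiated as graph 2-coloring. This is our own simplified illustration (the function name, the instances and the final verification pass are all made up), not the authors' implementation:

```python
import random

def perturbed_bp_2coloring(edges, n, T=100, seed=0):
    """PBP sketch for a CSP with inverse-Potts factors x_i != x_j.
    Returns a proper 2-coloring, or None on contradiction/failure."""
    rng = random.Random(seed)
    nbrs = {i: [] for i in range(n)}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    # nu[(i, j)]: message from variable i to the factor on edge {i, j}.
    nu = {(i, j): [0.5, 0.5] for i in range(n) for j in nbrs[i]}
    xhat = [0] * n
    for t in range(T):
        gamma = t / max(T - 1, 1)  # linear schedule from 0 (BP) to 1 (Gibbs)
        for i in range(n):
            # Factor-to-variable messages, eq. (4): for x_i != x_j this is
            # just the incoming message evaluated at the opposite color.
            inc = {j: [nu[(j, i)][1], nu[(j, i)][0]] for j in nbrs[i]}
            # Marginal estimate, eq. (6), and a sample from it.
            marg = [1.0, 1.0]
            for m in inc.values():
                marg = [marg[0] * m[0], marg[1] * m[1]]
            z = marg[0] + marg[1]
            if z == 0.0:
                return None  # contradiction: no color has positive support
            xhat[i] = 0 if rng.random() < marg[0] / z else 1
            # Variable-to-factor update, eq. (5): BP mixed with the sample.
            for j in nbrs[i]:
                bp = [1.0, 1.0]
                for k, m in inc.items():
                    if k != j:
                        bp = [bp[0] * m[0], bp[1] * m[1]]
                s = bp[0] + bp[1]
                bp = [v / s for v in bp] if s > 0 else [0.5, 0.5]
                nu[(i, j)] = [(1 - gamma) * bp[c] +
                              gamma * (1.0 if c == xhat[i] else 0.0)
                              for c in (0, 1)]
    # Report the final sample only if it actually satisfies all constraints.
    return xhat if all(xhat[a] != xhat[b] for a, b in edges) else None
```

On a path graph this typically returns a proper coloring; on an odd cycle (unsatisfiable with two colors) it always returns None, in line with the no-false-positive property.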

3.1. Computational Complexity

PBP's time complexity per iteration is identical to that of BP, i.e.,

    O( ∑_I |∂I| |X_I| + ∑_i |∂i| |X_i| )    (7)

where the first summation accounts for all factor-to-variable messages (eq. (4))⁴ and the second one accounts for all variable-to-factor messages (eq. (5)).

To perform binary search over Y we need to sort Y, which requires O(|Y| log(|Y|)) time. However, since |Y_I| ≤ |X_I| and |Y| ≤ ∑_I |Y_I|, the cost of sorting is already contained in the first term of eq. (7), and may be ignored in the asymptotic complexity.

The only remaining factor is that of the binary search itself, which is O(log(|Y|)) = O(log(∑_I |X_I|)), i.e., at most logarithmic in the cost of a PBP iteration (the first term in eq. (7)). Also note that the factors added as constraints only take the two values ±∞, and have no effect on the cost of the binary search.

As this analysis suggests, the dominant cost is that of sending factor-to-variable messages, where a factor may depend on a large number of variables. The next section shows that many interesting factors are sparse, which allows efficient calculation of messages.

⁴ The |∂I| term accounts for the number of messages that are sent from each factor. However, if the messages are calculated synchronously this factor disappears. Although more expensive, in practice asynchronous message updates perform better.

3.2. High-Order Factors

The factor-graph formulation of many interesting min-max problems involves sparse high-order factors. In all such factors, we are able to significantly reduce the O(|X_I|) time complexity of calculating factor-to-variable messages. Efficient message passing over such factors has been studied by several works in the context of sum-product and max-product inference (e.g., Potetz & Lee 2008; Gupta et al. 2007; Tarlow et al. 2010; Tarlow et al. 2012). The simplest form of sparse factor in our formulation is the so-called Potts factor, f^y_{i,j}(x_i, x_j) = I(x_i = x_j)φ(x_i). This factor assumes the same domain for all the variables (X_i = X_j ∀i, j) and its tabular form is non-zero only along the diagonal. It is easy to see that this allows the marginalization of eq. (4) to be performed in O(|X_i|) rather than O(|X_i| |X_j|). Another factor of similar form is the inverse Potts factor, f^y_{i,j}(x_i, x_j) = I(x_i ≠ x_j), which ensures x_i ≠ x_j. In fact, any pairwise factor that is a constant plus a band-limited matrix allows O(|X_i|) inference (e.g., see Section 4.4).
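As an illustration, for a pairwise factor whose table is a constant (here 1) off the diagonal and arbitrary on the diagonal, the sum in eq. (4) collapses to a precomputed total minus one correction term. A sketch of ours, where the hypothetical `allowed_diag[x]` plays the role of the reduced diagonal entry I(φ(x) ≤ y):

```python
def potts_message(nu_j, allowed_diag):
    """O(|X|) factor-to-variable message through a reduced pairwise factor
    whose table is 1 off the diagonal and allowed_diag[x] at entry (x, x).
    Naive marginalization of eq. (4) would cost O(|X|^2)."""
    total = sum(nu_j)
    return [total - (0.0 if allowed_diag[x] else nu_j[x])
            for x in range(len(nu_j))]
```

For example, `potts_message([2.0, 3.0, 5.0], [True, False, True])` returns `[10.0, 7.0, 10.0]`, matching an explicit double loop over the 3×3 factor table.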

In Section 4, we use cardinality factors, where X_i = {0, 1} and the factor is defined based on the number of non-zero values, i.e., f^y_K(x_K) = f^y_K(∑_{i∈K} x_i). The μ_y-reduction of the factors we use in the binary representation of the bottleneck assignment problem is in this category. Gail et al. (1981) propose a simple O(|∂I| K) method for f^y_K(x_K) = I(∑_{i∈K} x_i = K). We refer to this factor as the K-of-N factor and use similar algorithms for the at-least-K-of-N factor f^y_K(x_K) = I(∑_{i∈K} x_i ≥ K) and the at-most-K-of-N factor f^y_K(x_K) = I(∑_{i∈K} x_i ≤ K) (see Appendix B). An alternative for more general forms of high-order factors is the clique potential of Potetz & Lee (2008). For large K, more efficient methods evaluate the sum of pairs of variables using auxiliary variables forming a binary tree and use the Fast Fourier Transform to reduce the complexity of K-of-N factors to O(N log(N)²) (see Tarlow et al. (2012) and its references).
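The flavor of these cardinality computations can be sketched with a prefix/suffix dynamic program over partial counts (our own illustrative implementation of an exactly-K-of-N factor, O(NK) per factor instead of O(2^N); the Appendix B algorithms referenced above are analogous but not reproduced here):

```python
def exactly_k_messages(nu, K):
    """Sum-product messages through an exactly-K-of-N cardinality factor.
    nu[i] = (p0, p1) is the incoming message for binary variable i;
    returns out[i] = (m0, m1), the factor-to-variable message."""
    N = len(nu)
    def counts(msgs):
        # dp[c] = weighted number of assignments with exactly c ones (c <= K)
        dp = [1.0] + [0.0] * K
        table = [dp[:]]
        for p0, p1 in msgs:
            ndp = [0.0] * (K + 1)
            for c in range(K + 1):
                ndp[c] += dp[c] * p0
                if c + 1 <= K:
                    ndp[c + 1] += dp[c] * p1
            dp = ndp
            table.append(dp[:])
        return table
    pre = counts(nu)        # pre[i][c]: first i variables contribute c ones
    suf = counts(nu[::-1])  # suf[j][c]: last j variables contribute c ones
    out = []
    for i in range(N):
        rest = suf[N - 1 - i]
        m0 = sum(pre[i][c] * rest[K - c] for c in range(K + 1))
        m1 = sum(pre[i][c] * rest[K - 1 - c] for c in range(K))
        out.append((m0, m1))
    return out
```

With uniform unnormalized messages `nu = [(1.0, 1.0)] * 3` and K = 1, each variable receives (2.0, 1.0): two ways for the other variables to hold the single one, and one way to leave it to this variable.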

4. Applications

Here we introduce the factor-graph formulations for several NP-hard min-max problems. Interestingly, the CSP-reduction for each case is an important NP-hard problem. Table 1 shows the relationship between each min-max problem and the corresponding CSP, and the factor α in the constant-factor approximation available for each case. For example, α = 2 means the result reported by some algorithm is guaranteed to be within a factor 2 of the optimal min-max value y* when the distances satisfy the triangle inequality. The table also includes the complexity of the message passing procedure (assuming asynchronous message updates) in finding the min-max solution. See Appendix C for details.

Table 1. Min-max combinatorial problems and the corresponding CSP-reductions. The last column reports the best α-approximations when the triangle inequality holds; * indicates the best possible approximation.

min-max problem | μy-reduction | msg-passing cost | α
min-max clustering | clique cover problem | O(N²K log(N)) | 2* (Gonzalez 1985)
K-packing | max-clique problem | O(N²K log(N)) | N/A
(weighted) K-center problem | dominating set problem | O(N³ log(N)) or O(NR² log(N)) | min(3, 1 + max_i w_i / min_i w_i) (Dyer & Frieze 1985)
asymmetric K-center problem | set cover problem | O(N³ log(N)) or O(NR² log(N)) | log(N)* (Panigrahy & Vishwanathan 1998; Chuzhoy et al. 2005)
bottleneck TSP | Hamiltonian cycle problem | O(N³ log(N)) | 2* (Parker & Rardin 1984)
bottleneck asymmetric TSP | directed Hamiltonian cycle | O(N³ log(N)) | log(n)/log(log(n)) (An et al. 2010)

Figure 2. The factor-graphs for different applications: (a) a factor-graph common to min-max clustering, bottleneck TSP and K-packing (categorical), though the definition of factors differs in each case; (b) K-packing (binary); (c) sphere packing; (d) K-center.

4.1. Min-Max Clustering

Given a symmetric matrix of pairwise distances D ∈ ℝ^{N×N} between N data-points, and a number of clusters K, min-max clustering seeks a partitioning of the data-points that minimizes the maximum distance between all pairs of points in the same partition.

Let x = {x_1, ..., x_N} with x_i ∈ X_i = {1, ..., K} be the set of variables, where x_i = k means that point i belongs to cluster k. The Potts factor f_{i,j}(x_i, x_j) = I(x_i = x_j) D_{i,j} − ∞·I(x_i ≠ x_j) between any two points is equal to D_{i,j} if points i and j belong to the same cluster and −∞ otherwise (Figure 2(a)). When applied to this factor-graph, the min-max solution x* of eq. (1) defines the clustering of points that minimizes the maximum intra-cluster distance.

We now investigate properties of the μ_y-reduction for this factor-graph. The y-neighborhood graph for a distance matrix D ∈ ℝ^{N×N} is defined as the graph G(D, y) = (V, E), where V = {1, ..., N} and E = {(i, j) | D_{i,j} ≤ y}. Note that this definition is also valid for an asymmetric adjacency matrix D; in such cases, the y-neighborhood graph is a directed graph.

The K-clique-cover C = {C_1, ..., C_K} for a graph G = (V, E) is a partitioning of V into at most K partitions such that ∀i, j, k: i, j ∈ C_k ⇒ (i, j) ∈ E.

Figure 3. Min-max clustering of 100 points with varying numbers of clusters (x-axis). Each point is an average over 10 random instances. The y-axis is the ratio of the min-max value obtained by message passing (T = 50 iterations for PBP) over the min-max value of FPC. (left) Clustering of random points in 2D Euclidean space; the red line is the lower bound on the optimal result based on the worst-case guarantee of FPC. (right) Using a symmetric random distance matrix where D_{i,j} = D_{j,i} ∼ U(0, 1).

Proposition 4.1 The μ_y-reduction of the min-max clustering factor-graph defines a uniform distribution over the K-clique-covers of G(D, y).
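Equivalently, a satisfying assignment of this μ_y-reduction is a partition whose clusters are cliques of G(D, y), which is cheap to verify (our own helper and toy matrix, for illustration):

```python
def is_k_clique_cover(D, y, clusters):
    """Check that `clusters` (disjoint lists of point indices) is a clique
    cover of the y-neighborhood graph G(D, y): every intra-cluster pair
    (i, j) must satisfy D[i][j] <= y."""
    return all(D[i][j] <= y
               for C in clusters for i in C for j in C if i != j)

D = [[0, 1, 8],
     [1, 0, 2],
     [8, 2, 0]]
```

Here `is_k_clique_cover(D, 1, [[0, 1], [2]])` holds while `is_k_clique_cover(D, 1, [[0, 1, 2]])` does not, so for K = 2 clusters the min-max value of this toy instance is 1.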

Figure 3 compares the performance of min-max clustering using message passing to that of the Furthest Point Clustering (FPC) of Gonzalez (1985), which is 2-optimal when the triangle inequality holds. Note that the message passing solutions are superior even when using Euclidean distances.

4.2. K-Packing

Given a symmetric distance matrix D ∈ ℝ^{N×N} between N data-points and a number of code-words K, the K-packing problem is to choose a subset of K points such that the minimum distance D_{i,j} between any two code-words is maximized. Here we introduce two different factor-graph formulations for this problem.

4.2.1. FIRST FORMULATION: BINARY VARIABLES

Let binary variables x = {x_1, ..., x_N} ∈ {0,1}^N indicate a subset of points of size K that are selected as code-words (Figure 2(b)). Use the factor f_K(x) = ∞·I(∑_i x_i ≠ K) − ∞·I(∑_i x_i = K) (here K = {1, ..., N}) to ensure this constraint. The μ_y-reduction of this factor for any −∞ < y < +∞ is a K-of-N factor as defined in Section 3.2. Furthermore, for any two variables x_i and x_j, define the factor f_{i,j}(x_{i,j}) = −D_{i,j} x_i x_j − ∞(1 − x_i x_j). This factor is effective only if both points are selected as code-words. The use of −D_{i,j} converts the initial max-min objective to min-max.

4.2.2. SECOND FORMULATION: CATEGORICAL VARIABLES

Define the K-packing factor-graph as follows: let x = {x_1, ..., x_K} be the set of variables, where x_i ∈ X_i = {1, ..., N} (Figure 2(a)). For every pair 1 ≤ i < j ≤ K, define the factor f_{i,j}(x_i, x_j) = −D_{x_i,x_j} I(x_i ≠ x_j) + ∞·I(x_i = x_j). Here each variable represents a code-word and the last term of each factor ensures that code-words are distinct.

Proposition 4.2 The μ_y-reduction of the K-packing factor-graph for the distance matrix D ∈ ℝ^{N×N} defines a uniform distribution over the cliques of G(−D, −y) that are larger than K.
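Concretely, a set S of K points has minimum pairwise distance at least y exactly when S is a clique of G(−D, −y). A brute-force sketch of the K-packing objective (ours; exponential in N, for illustration only):

```python
from itertools import combinations

def max_min_k_packing(D, K):
    """Choose K of N points maximizing the minimum pairwise distance
    (the max-min twin of the min-max objective used in the text)."""
    N = len(D)
    return max(min(D[i][j] for i, j in combinations(S, 2))
               for S in combinations(range(N), K))

D = [[0, 1, 4],
     [1, 0, 2],
     [4, 2, 0]]
```

On this toy matrix `max_min_k_packing(D, 2)` is 4 (points 0 and 2), while for K = 3 every pair counts and the value drops to 1.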

The μ_y-reduction of our second formulation is similar to the factor-graph used by Ramezanpour & Zecchina (2012) to find non-linear binary codes. The authors consider the Hamming distance between all binary vectors of length n (i.e., N = 2^n) to obtain binary codes with a known minimum distance y. As we saw, this method is O(N²K²) = O(2^{2n}K²), i.e., it grows exponentially in the number of bits n. In the following section, we introduce a factor-graph formulation specific to categorical variables with Hamming distance that has message passing complexity O(n²K²y), i.e., not exponential in n. Using this formulation we find optimal binary codes where both n and y are large.

4.2.3. SPHERE PACKING WITH HAMMING DISTANCE

Our factor-graph defines a distribution over K binary vectors of length n such that the distance between every pair of binary vectors is at least y.⁵ Finding so-called "nonlinear binary codes" is a fundamental problem in information theory, and a variety of methods have been used to find better codes, trying either to maximize the number of code-words K or their minimum distance y (e.g., see Litsyn et al. 1999 and its references). Let x = {x_{1-1}, ..., x_{1-n}, x_{2-1}, ..., x_{2-n}, ..., x_{K-n}} be the set of our binary variables, where x_i = {x_{i-1}, ..., x_{i-n}} represents the i-th binary vector or code-word. Additionally, for each 1 ≤ i < j ≤ K, define an auxiliary binary vector z_{i,j} = {z_{i,j,1}, ..., z_{i,j,n}} of length n (Figure 2(c)).

⁵ For convenience we restrict this construction to the case of binary vectors. A similar procedure may be used to find maximally distanced ternary and q-ary vectors, for arbitrary q.

Figure 4. (left) Using message passing to choose K = 30 out of N = 100 random points in the Euclidean plane to maximize the minimum pairwise distance (with T = 500 iterations for PBP); touching circles show the minimum distance. (right) Example of an optimal ternary code (n = 16, y = 11, K = 12), found using the K-packing factor-graph of Section 4.2.3. Each of the K = 12 lines in the figure contains one code-word of length 16, and every pair of code-words differs in at least y = 11 digits.

For each distinct pair of binary vectors x_i and x_j, and a particular bit 1 ≤ k ≤ n, the auxiliary variable z_{i,j,k} = 1 iff x_{i-k} ≠ x_{j-k}. We then define an at-least-y-of-n factor over z_{i,j} for every pair of code-words, to ensure that they differ in at least y bits.

In more detail, define the following factors on x and z: (a) z-factors: for every 1 ≤ i < j ≤ K and 1 ≤ k ≤ n, define a factor to ensure that z_{i,j,k} = 1 iff x_{i-k} ≠ x_{j-k}:

    f(x_{i-k}, x_{j-k}, z_{i,j,k}) = I(x_{i-k} ≠ x_{j-k}) I(z_{i,j,k} = 1) + I(x_{i-k} = x_{j-k}) I(z_{i,j,k} = 0)

This factor depends on three binary variables, therefore we can explicitly define its value for each of the 2³ = 8 possible inputs.

(b) distance-factors: for each z_{i,j} define an at-least-y-of-n factor:

    f_K(z_{i,j}) = I(∑_{1≤k≤n} z_{i,j,k} ≥ y)
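Together, the z-factors and distance-factors simply enforce a lower bound on the code's minimum Hamming distance, which is straightforward to verify for any candidate code (our own helper, not from the paper):

```python
from itertools import combinations

def min_hamming_distance(code):
    """Minimum pairwise Hamming distance of a code, given as a list of
    equal-length strings; the quantity the at-least-y-of-n factors bound."""
    return min(sum(a != b for a, b in zip(u, v))
               for u, v in combinations(code, 2))
```

For example, `min_hamming_distance(["000", "011", "101"])` is 2, so this 3-word code would satisfy the μ_y-reduction for y ≤ 2.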

Table 2 reports some optimal codes, including codes with a large number of bits n, recovered using this factor-graph. Here Perturbed BP used T = 1000 iterations.

4.3. (Asymmetric) K-center Problem

Given a pairwise distance matrixD ∈ <N×N , the K-centerproblem seeks a partitioning of nodes, with one center per


(a) K-center (Euclidean) (b) Facility Location (c) BTSP (Euclidean) (d) BTSP (Random)
Figure 5. (a) K-center clustering of 50 random points in a 2D plane with various numbers of clusters (x-axis). The y-axis is the ratio of the min-max value obtained by message passing (T = 500 for PBP) over the min-max value of the 2-approximation of Dyer & Frieze (1985). (b) Min-max K-facility location formulated as an asymmetric K-center problem and solved using message passing. Squares indicate 20 potential facility locations and circles indicate 50 customers. The task is to select 5 facilities (red squares) to minimize the maximum distance from any customer to a facility. The radius of circles is the min-max value. (c,d) The min-max solution for Bottleneck TSP with different numbers of cities (x-axis) for 2D Euclidean space as well as asymmetric random distance matrices (T = 5000 for PBP). The error-bars in all figures show one standard deviation over 10 random instances.

Table 2. Some optimal binary codes from Litsyn et al. 1999 recovered by the K-packing factor-graph, in order of increasing y. n is the length of the code, K is the number of code-words and y is the minimum distance between code-words.

n  K  y  |  n  K  y  |  n  K  y  |  n  K  y
 8  4  5 | 11  4  7 | 14  4  9 | 16  6  9
17  4 11 | 19  6 11 | 20  8 11 | 20  4 13
23  6 13 | 24  8 13 | 23  4 15 | 26  6 15
27  8 15 | 28 10 15 | 28  5 16 | 26  4 17
29  6 17 | 29  4 19 | 33  6 19 | 34  8 19
36 12 19 | 32  4 21 | 36  6 21 | 38  8 21
39 10 21 | 35  4 23 | 39  6 23 | 41  8 23
39  4 25 | 43  6 23 | 46 10 25 | 47 12 25
41  4 27 | 46  6 27 | 48  8 27 | 50 10 27
44  4 29 | 49  6 29 | 52  8 29 | 53 10 29

partition such that the maximum distance from any node to the center of its partition is minimized. This problem is known to be NP-hard, even for Euclidean distance matrices (Masuyama et al. 1981). Frey & Dueck (2007) use max-product message passing to solve the min-sum variation of this problem, a.k.a. the K-median problem. A binary variable factor-graph for the same problem is introduced in Givoni & Frey (2009). Here we introduce a binary variable model for the asymmetric K-center problem. Let x = {x1−1, . . . , x1−N, x2−1, . . . , x2−N, . . . , xN−N} denote N^2 binary variables, where xi−j = 1 indicates that point i participates in the partition that has j as its center. Now define the following factors:

A. local factors: ∀ i ≠ j, f{i−j}(xi−j) = Di,j xi−j − ∞(1 − xi−j).

B. uniqueness factors: every point is associated with exactly one center (which can be itself). For every i, consider I = {i−j | 1 ≤ j ≤ N} and define fI(xI) = ∞I(∑_{i−j∈I} xi−j ≠ 1) − ∞I(∑_{i−j∈I} xi−j = 1).

C. consistency factors: when j is selected as a center by any node i, node j also recognizes itself as a center. ∀ j, i ≠ j, define f(x{j−j,i−j}) = ∞I(xj−j = 0 ∧ xi−j = 1) − ∞I(xj−j = 1 ∨ xi−j = 0).

D. K-of-N factor: only K nodes are selected as centers. Letting K = {i−i | 1 ≤ i ≤ N}, define fK(xK) = ∞I(∑_{i−i∈K} xi−i ≠ K) − ∞I(∑_{i−i∈K} xi−i = K).
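Putting the four factor types together, the min-max value of a candidate assignment can be evaluated directly, which is useful for checking the output of message passing. A sketch under our own encoding (x as an N×N 0/1 matrix with x[i, j] standing for xi−j):

```python
import numpy as np

def kcenter_value(x, D, K):
    # Min-max value of an assignment under the K-center factor-graph:
    # the largest D[i, j] among active x_{i-j}, or inf if any hard
    # factor is violated. A sketch; the matrix layout is our encoding.
    x, D = np.asarray(x), np.asarray(D, dtype=float)
    N = len(D)
    if any(x[i].sum() != 1 for i in range(N)):        # B. uniqueness factors
        return float("inf")
    for j in range(N):                                # C. consistency factors
        if any(x[i, j] == 1 for i in range(N) if i != j) and x[j, j] != 1:
            return float("inf")
    if x.diagonal().sum() != K:                       # D. K-of-N factor
        return float("inf")
    return float((D * x).max())                       # A. local factors
```

For instance, assigning all three points of a small instance to center 0 yields the largest distance to that center; selecting two self-centers when K = 1 violates the K-of-N factor and evaluates to infinity.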

For variants of this problem, such as the capacitated K-center problem, additional constraints on the maximum/minimum number of points in each group may be added as at-least/at-most K-of-N factors.

We can significantly reduce the number of variables and the complexity (which is O(N^3 log(N))) by bounding the distance to the center of the cluster, y. Given an upper bound y, we may remove all the variables xi−j with Di,j > y from the factor-graph. Assuming that at most R nodes are at distance Di,j ≤ y from every node j, the complexity of min-max inference drops to O(N R^2 log(N)).
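The pruning step above can be sketched as follows (the function name and dictionary representation are ours): for each node i, keep only the candidate centers j with Di,j ≤ y.

```python
def prune_variables(D, y):
    # For each node i, keep only candidate centers j with D[i][j] <= y;
    # all other variables x_{i-j} are removed from the factor-graph.
    N = len(D)
    return {i: [j for j in range(N) if D[i][j] <= y] for i in range(N)}
```

If at most R candidates survive per node, the factor-graph has O(NR) variables instead of N^2.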

Figure 5(a) compares the performance of message-passing and the 2-approximation of Dyer & Frieze (1985) when the triangle inequality holds. The min-max facility location problem can also be formulated as an asymmetric K-center problem where the distance to all customers is ∞ and the distance from a facility to another facility is −∞ (Figure 5(b)).

The following proposition establishes the relation between the K-center factor-graph above and the dominating set problem as its CSP reduction. A K-dominating set of graph G = (V, E) is a subset of nodes D ⊆ V of size |D| = K such that any node in V \ D is adjacent to at least one member of D, i.e., ∀ i ∈ V \ D, ∃ j ∈ D s.t. (i, j) ∈ E.
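The K-dominating set condition is straightforward to test for a given subset. A sketch with an adjacency-set representation of G (names are ours):

```python
def is_k_dominating_set(adj, dom, K):
    # True iff dom is a K-dominating set: |dom| = K and every node
    # outside dom has at least one neighbour inside dom.
    # adj maps each node to the set of its neighbours.
    dom = set(dom)
    return len(dom) == K and all(adj[v] & dom for v in adj if v not in dom)
```

On the path graph 0-1-2, the middle node alone is a 1-dominating set, while an endpoint is not.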

Proposition 4.3 For a symmetric distance matrix D ∈ ℝN×N, the µy-reduction of the K-center factor-graph


above is non-zero (i.e., µy(x) > 0) iff x defines a K-dominating set for G(D, y).

Note that in this proposition (in contrast with Propositions 4.1 and 4.2) the relation between the assignments x and the K-dominating sets of G(D, y) is not one-to-one, as several assignments may correspond to the same dominating set. Here we establish a similar relation between the asymmetric K-center factor-graph and the set-cover problem.

Given a universe set V and a set S = {V1, . . . , VM} s.t. Vm ⊆ V, we say C ⊆ S covers V iff each member of V is present in at least one member of C, i.e., ⋃_{Vm∈C} Vm = V. Now we consider a natural set-cover problem induced by any directed graph. Given a directed graph G = (V, E), for each node i ∈ V, define a subset Vi = {j ∈ V | (j, i) ∈ E} as the set of all nodes that are connected to i. Let S = {V1, . . . , VN} denote all such subsets. An induced K-set-cover of G is a set C ⊆ S of size K that covers V.

Proposition 4.4 For a given asymmetric distance matrix D ∈ ℝN×N, the µy-reduction of the K-center factor-graph as defined above is non-zero (i.e., µy(x) > 0) iff x defines an induced K-set-cover for G(D, y).

4.4. (Asymmetric) Bottleneck Traveling Salesman Problem

Given a distance matrix D ∈ ℝN×N, the task in the Bottleneck Traveling Salesman Problem (BTSP) is to find a tour of all N points such that the maximum distance between two consecutive cities in the tour is minimized (Kabadi & Punnen 2004). Finding any constant-factor approximation for arbitrary instances of this problem is NP-hard (Parker & Rardin 1984).

Let x = {x1, . . . , xN} denote the set of variables, where xi ∈ Xi = {0, . . . , N − 1} represents the time-step at which node i is visited. Also, we assume modular arithmetic (modulo N) on members of Xi, e.g., N ≡ 0 (mod N) and 1 − 2 ≡ N − 1 (mod N). For each pair of variables xi and xj, define the factor (Figure 2(a))

f{i,j}(xi, xj) = ∞I(xi = xj) − ∞I(|xi − xj| > 1) + Di,j I(xi = xj − 1) + Dj,i I(xi = xj + 1)    (8)

where the first term ensures xi ≠ xj and the second term means this factor has no effect on the min-max value when nodes i and j are not consecutively visited in a path. The third and fourth terms express the distance between i and j, depending on the order of visit. Figure 6 shows the tabular form of this factor. In Appendix B.2 we show an O(N) procedure to marginalize this type of factor.

Here we relate the min-max factor-graph above to a uniform distribution over Hamiltonian cycles.

 ∞    Di,j  −∞   · · ·  −∞    −∞    Dj,i
 Dj,i  ∞    Di,j · · ·  −∞    −∞    −∞
 −∞    Dj,i  ∞   · · ·  −∞    −∞    −∞
  .     .    .   . . .   .     .     .
 −∞    −∞   −∞   · · ·   ∞    Di,j  −∞
 −∞    −∞   −∞   · · ·  Dj,i   ∞    Di,j
 Di,j  −∞   −∞   · · ·  −∞    Dj,i   ∞

Figure 6. The tabular form of f{i,j}(xi, xj) used for the bottleneck TSP.
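The table in Figure 6 can be generated mechanically from Eq. (8). A sketch (the function name is ours) that fills an N×N array with ∞ on the diagonal, Di,j one step above it (cyclically), Dj,i one step below, and −∞ everywhere else:

```python
import numpy as np

def btsp_factor_table(D_ij, D_ji, N):
    # Tabular form of f_{i,j}(x_i, x_j) with modular time-steps:
    # inf forbids x_i = x_j, D_ij sits where x_i = x_j - 1 (mod N),
    # D_ji where x_i = x_j + 1 (mod N), and -inf (neutral for the
    # max) fills the remaining entries.
    INF = float("inf")
    f = np.full((N, N), -INF)
    np.fill_diagonal(f, INF)
    for t in range(N):
        f[t, (t + 1) % N] = D_ij   # i visited just before j
        f[(t + 1) % N, t] = D_ji   # j visited just before i
    return f
```

The wrap-around entries in the corners (Dj,i at the top-right, Di,j at the bottom-left) come from the modular arithmetic on the time-steps.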

Proposition 4.5 For any distance matrix D ∈ ℝN×N, the µy-reduction of the BTSP factor-graph (shown above) defines a uniform distribution over the (directed) Hamiltonian cycles of G(D, y).

Figure 5(c,d) reports the performance of message passing (over 10 instances), as well as a lower-bound on the optimal min-max value, for tours of different length (N). Here we report the results for random points in 2D Euclidean space as well as asymmetric random distance matrices. For the symmetric case, the lower-bound is the maximum over all nodes j of the distance from j to its second-closest neighbor (every node on a tour has two neighbors). For the asymmetric random distance matrices, the maximum is taken over the minimum-length incoming edge and minimum-length outgoing edge of each node.6
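The lower bounds used in Figure 5(c,d) can be computed in a few lines. A sketch (names are ours; the asymmetric case omits the tie-breaking refinement of the footnote):

```python
import numpy as np

def btsp_lower_bound(D, symmetric=True):
    # Every node has two tour neighbours, so in the symmetric case the
    # distance to its second-closest node bounds the largest edge at it;
    # in the asymmetric case the cheapest incoming and outgoing edges
    # of each node give the bound.
    N = len(D)
    D = np.asarray(D, dtype=float)
    off = ~np.eye(N, dtype=bool)            # mask out the diagonal
    if symmetric:
        return float(max(np.sort(D[j][off[j]])[1] for j in range(N)))
    ins = [D[off[:, j], j].min() for j in range(N)]   # cheapest in-edge
    outs = [D[j, off[j]].min() for j in range(N)]     # cheapest out-edge
    return float(max(max(ins), max(outs)))
```

On a 3-city symmetric instance the tour uses every edge, so the bound is tight there.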

5. Conclusion

This paper introduces the problem of min-max inference in factor-graphs and provides a general methodology to solve such problems. We use factor-graphs to represent several important combinatorial problems, such as min-max clustering, K-clustering, bottleneck TSP and K-packing, and use message passing to find near-optimal solutions to each problem. In doing so, we also suggest a message passing solution to several NP-hard decision problems, including the clique-cover, max-clique, dominating-set, set-cover and Hamiltonian path problems. For each problem we also analyzed the complexity of message passing and established its practicality using several relevant experiments.

References

Aissi, Hassene, Bazgan, Cristina, and Vanderpooten, Daniel. Min–max and min–max regret versions of combinatorial optimization problems: A survey. European Journal of Operational Research, 197(2):427–438, 2009.

An, Hyung-Chan, Kleinberg, Robert D, and Shmoys, David B. Approximation algorithms for the bottleneck asymmetric traveling salesman problem. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pp. 1–11. Springer, 2010.

6 If for one node the minimum-length incoming and outgoing edges point to the same city, the second minimum-length incoming and outgoing edges are also considered in calculating a tighter bound.


Averbakh, Igor. On the complexity of a class of combinatorial optimization problems with uncertainty. Mathematical Programming, 90(2):263–272, 2001.

Bayati, Mohsen, Shah, Devavrat, and Sharma, Mayank. Maximum weight matching via max-product belief propagation. In ISIT, pp. 1763–1767. IEEE, 2005.

Chuzhoy, Julia, Guha, Sudipto, Halperin, Eran, Khanna, Sanjeev, Kortsarz, Guy, Krauthgamer, Robert, and Naor, Joseph Seffi. Asymmetric k-center is log* n-hard to approximate. Journal of the ACM (JACM), 52(4):538–551, 2005.

Cooper, Gregory F. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42(2):393–405, 1990.

Dyer, Martin E and Frieze, Alan M. A simple heuristic for the p-centre problem. Operations Research Letters, 3(6):285–288, 1985.

Edmonds, Jack and Fulkerson, Delbert Ray. Bottleneck extrema. Journal of Combinatorial Theory, 8(3):299–306, 1970.

Frey, Brendan J and Dueck, Delbert. Clustering by passing messages between data points. Science, 315(5814):972–976, 2007.

Fulkerson, DR. Flow networks and combinatorial operations research. The American Mathematical Monthly, 73(2):115–138, 1966.

Gail, Mitchell H, Lubin, Jay H, and Rubinstein, Lawrence V. Likelihood calculations for matched case-control studies and survival studies with tied death times. Biometrika, 68(3):703–707, 1981.

Garfinkel, Robert S. An improved algorithm for the bottleneck assignment problem. Operations Research, pp. 1747–1751, 1971.

Givoni, Inmar E and Frey, Brendan J. A binary variable model for affinity propagation. Neural Computation, 21(6):1589–1600, 2009.

Gonzalez, Teofilo F. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306, 1985.

Gross, O. The bottleneck assignment problem. Technical report, DTIC Document, 1959.

Gupta, Rahul, Diwan, Ajit A, and Sarawagi, Sunita. Efficient inference with cardinality-based clique potentials. In Proceedings of the 24th International Conference on Machine Learning, pp. 329–336. ACM, 2007.

Hochbaum, Dorit S and Shmoys, David B. A unified approach to approximation algorithms for bottleneck problems. Journal of the ACM (JACM), 33(3):533–550, 1986.

Huang, Bert and Jebara, Tony. Approximating the permanent with belief propagation. arXiv preprint arXiv:0908.1769, 2009.

Ibrahimi, Morteza, Javanmard, Adel, Kanoria, Yashodhan, and Montanari, Andrea. Robust max-product belief propagation. In Signals, Systems and Computers (ASILOMAR), 2011 Conference Record of the Forty Fifth Asilomar Conference on, pp. 43–49. IEEE, 2011.

Kabadi, Santosh N and Punnen, Abraham P. The bottleneck TSP. In The Traveling Salesman Problem and Its Variations, pp. 697–735. Springer, 2004.

Kautz, Henry A, Sabharwal, Ashish, and Selman, Bart. Incomplete algorithms, 2009.

Kearns, Michael, Littman, Michael L, and Singh, Satinder. Graphical models for game theory. In Proceedings of the Seventeenth Conference on UAI, pp. 253–260. Morgan Kaufmann Publishers Inc., 2001.

Khuller, Samir and Sussmann, Yoram J. The capacitated k-center problem. SIAM Journal on Discrete Mathematics, 13(3):403–418, 2000.

Kschischang, FR and Frey, BJ. Factor graphs and the sum-product algorithm. Information Theory, IEEE, 2001.

Litsyn, Simon, Rains, E. M., and Sloane, N. J. A. The table of nonlinear binary codes. http://www.eng.tau.ac.il/~litsyn/tableand/, 1999. [Online; accessed Jan 30, 2014].

Masuyama, Shigeru, Ibaraki, Toshihide, and Hasegawa, Toshiharu. The computational complexity of the m-center problems on the plane. IEICE Transactions (1976-1990), 64(2):57–64, 1981.

Mezard, M., Parisi, G., and Zecchina, R. Analytic and algorithmic solution of random satisfiability problems. Science, 297(5582):812–815, August 2002. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1073287.

Panigrahy, Rina and Vishwanathan, Sundar. An O(log* n) approximation algorithm for the asymmetric p-center problem. Journal of Algorithms, 27(2):259–268, 1998.

Parker, R Gary and Rardin, Ronald L. Guaranteed performance heuristics for the bottleneck travelling salesman problem. Operations Research Letters, 2(6):269–272, 1984.

Potetz, Brian and Lee, Tai Sing. Efficient belief propagation for higher-order cliques using linear constraint nodes. Computer Vision and Image Understanding, 112(1):39–54, 2008.

Ramezanpour, Abolfazl and Zecchina, Riccardo. Cavity approach to sphere packing in Hamming space. Physical Review E, 85(2):021106, 2012.

Ravanbakhsh, Siamak and Greiner, Russell. Perturbed message passing for constraint satisfaction problems. arXiv preprint arXiv:1401.6686, 2014.

Schrijver, Alexander. Min-max results in combinatorial optimization. Springer, 1983.

Tarlow, Daniel, Givoni, Inmar E, and Zemel, Richard S. HOP-MAP: Efficient message passing with high order potentials. In International Conference on Artificial Intelligence and Statistics, pp. 812–819, 2010.

Tarlow, Daniel, Swersky, Kevin, Zemel, Richard S, Adams, Ryan Prescott, and Frey, Brendan J. Fast exact inference for recursive cardinality models. arXiv preprint arXiv:1210.4899, 2012.