COMBINATORIALLY CONSTRAINED PORTFOLIO OPTIMIZATION … · 2018. 7. 18. · proposed message passing method with Gurobi’s branch-and-bound method for solving the portfolio problem

COMBINATORIALLY CONSTRAINED PORTFOLIO OPTIMIZATION USING MESSAGE PASSING ALGORITHMS

by

Alexia Yeo

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

Department of Mechanical and Industrial Engineering, University of Toronto

© Copyright 2018 by Alexia Yeo

Abstract

COMBINATORIALLY CONSTRAINED PORTFOLIO OPTIMIZATION USING MESSAGE

PASSING ALGORITHMS

Alexia Yeo

Master of Applied Science

Department of Mechanical and Industrial Engineering

University of Toronto

2018

Portfolio optimization aims to find the investment strategy over a set of assets that minimizes the portfolio’s variance. Real-life portfolio considerations, such as limiting the number of assets held or requiring assets to be traded in lots, add combinatorial constraints to the original problem that are computationally expensive for commercial solvers. The aim of this thesis is to explore the use of message passing approaches to solve the portfolio problem with these hard constraints. Message passing algorithms are a class of algorithms used to solve inference problems in graphical models. They are known to recover good sub-optimal solutions, and we test this for the posed problem by comparing against exact branch-and-bound methods. Computational results for the portfolio problem with varying cardinality, target return and portfolio size confirm that message passing is a viable method for finding sub-optimal solutions and can return tight bounds around the exact solution.

To Teresa, Bernard, Annie and Carlo Alberto

Acknowledgements

Thank you to my supervisor Professor Roy H. Kwon for introducing me to the field of message passing.

I have had an incredibly explorative and enriching master’s experience researching this new area. I

thoroughly enjoyed discovering and applying this non-traditional avenue for optimization. I am grateful

for his guidance throughout this research process in helping to improve the quality of my research.

I am thankful to Professor Scott Sanner and Professor Merve Bodur for serving on my thesis committee.

I appreciate the time they gave to review my thesis and provide insightful comments and suggestions.

Contents

Dedication
Acknowledgements
Table of Contents

1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Outline

2 Inference in Probabilistic Graphical Models
  2.1 Probabilistic Graphical Models
  2.2 Message Passing Algorithms
  2.3 Approximate Inference
    2.3.1 Linear Programming Relaxations
    2.3.2 Max-Product Linear Programming Algorithm
    2.3.3 Tightening LP Relaxations
  2.4 Approximate Inference with Constraints
    2.4.1 The Method of Zhang et al.
  2.5 Message Passing Applied to Optimization Problems
  2.6 Discussion

3 Portfolio Optimization
  3.1 Markowitz Portfolio Problem
  3.2 Round-lot Constraint
  3.3 Cardinality Constraint
  3.4 Markowitz portfolio problem with round-lot and cardinality constraints
  3.5 Previous Work

4 Quadratic Integer Optimization
  4.1 Introduction
  4.2 Branch and Bound Algorithm
  4.3 Graphical Model Reformulation of the Unconstrained QIP
  4.4 Experiments
  4.5 Discussion

5 Message Passing Applied to Portfolio Optimization
  5.1 Graphical Model Reformulation of the Portfolio Problem
    5.1.1 Solving the maximum-a-posteriori problem
  5.2 Experiments
  5.3 Discussion

6 Conclusion
  6.1 Concluding Remarks
  6.2 Future Work

Bibliography

Chapter 1

Introduction

1.1 Motivation

In modern portfolio theory, a risk-averse investor seeks the proportions of investment in a set of assets that minimize the portfolio variance while achieving a target portfolio return. This problem is formulated as a mathematical program known as the Markowitz Model [27]. The traditional Markowitz Model can be solved efficiently to optimality for large problem sizes using available methods such as interior-point algorithms [14]. However, extensions to the Markowitz Model that account for practical concerns, such as cardinality and round-lot constraints, are less trivial to solve [14]. The cardinality constraint is an important extension used to limit the transaction costs associated with investing in many assets [14]. Furthermore, large institutional traders mainly trade in round-lots, where one lot represents a batch of assets [14]. Cardinality and round-lot constraints draw special attention from academics, and in this thesis, because they transform the original Markowitz Model from a continuous problem into a pure integer one [10, 14, 49]. Typical approaches for finding solutions under these constraints focus on branch-and-bound algorithms, heuristics, and Lagrangian relaxation approaches. Commercial solvers using branch-and-bound methods are currently available to handle integer programming, and their solutions are often used as benchmarks. Thus, in this thesis, we compare the solutions generated by a proposed message passing method with those from Gurobi’s integer programming solver [18].

1.2 Contribution

In this thesis, we model the portfolio optimization problem as a Markov random field, a type of probabilistic undirected graphical model. A Markov random field is a useful abstraction that can encode the covariance relationships between the assets in our portfolio. A class of algorithms known as message passing algorithms exists for performing inference on graphical models. We review existing literature where optimization problems have been reformulated as inference problems on graphical models and message passing has been successfully applied. Traditionally, message passing algorithms operate on an unconstrained objective function [2]. We study a recently developed method to incorporate constraints within the message passing scheme and perform inference over a constrained Markov random field. We conduct a thorough computational study to compare the effectiveness of the proposed message passing method with Gurobi’s branch-and-bound method for solving the portfolio problem with varying portfolio returns, cardinality limits and asset pool sizes. The contributions can be summarized as follows:

• Reformulation of the portfolio optimization problem with round-lot and cardinality constraints as an inference problem over a probabilistic graphical model

• Solution of the recast portfolio inference problem using existing message passing methods, including max-product linear programming message updates [17], a cluster tightening scheme [36], and Zhang et al.’s constraint parameter method for including global constraints [50]

1.3 Outline

In Chapter 2, we review the literature on probabilistic graphical models, including message passing algorithms for approximate inference and algorithms for models with constraints. We also review previous applications of these inference techniques to optimization problems. In Chapter 3, we review the literature on portfolio optimization problems with cardinality and round-lot constraints and discuss the existing optimization methods used to solve them. In Chapter 4, we build an undirected graphical model to represent an unconstrained binary quadratic integer problem. With a series of experiments, we evaluate the performance of the max-product linear programming message passing algorithm on this problem type. Results are compared with Gurobi’s branch-and-bound solver for integer programs. In Chapter 5, we perform a thorough computational comparison between Zhang et al.’s [50] message passing procedure for constrained graphical models and Gurobi’s branch-and-bound method for solving the portfolio problem under varying target returns, cardinality restrictions and asset pool sizes. Finally, in Chapter 6, we provide concluding remarks and directions for future work.

Chapter 2

Inference in Probabilistic Graphical Models

2.1 Probabilistic Graphical Models

The underlying structure behind an inference problem is a probabilistic graphical model. This thesis explores message passing algorithms designed for inference on the probabilistic graphical model known as a Markov Random Field (MRF). An MRF G = (V, E) is an undirected graph composed of a set of nodes V connected by edges E. Each node i ∈ V takes a value from one of k possible states in a set K, so the set K is discrete [2, 23]. Every single node forms a cluster, every pair of nodes connected by an edge forms a cluster and, more generally, any collection of nodes that depend on each other forms a cluster [2, 23]. A cluster is denoted c, and the set of all clusters is denoted C [2, 23]. This model is used to represent the joint probability distribution of a set of variables, where the nodes represent the random variables of the distribution [2, 23]. Potential functions are defined for each cluster and encode the relationship imposed upon the variables in that cluster [2, 23]. From an optimization viewpoint, nodes can be seen as the variables of the problem, while potential functions can be seen as terms contributing to the objective function [48]. For a graph G = (V, E) we define the energy function E(x) for a configuration of variables x as the negative sum of all the potential functions:

\[ E(x) = -\sum_{c \in C} \theta_c(x_c) \tag{2.1} \]

where θc is the potential function associated with a cluster c of variables and may take positive or negative values [2, 23]. The Boltzmann distribution gives the probability that a system is in a certain state given the system’s energy and is defined as follows:

\[ P(x) = \frac{1}{Z} \exp(-E(x)) \tag{2.2} \]

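where Z is a normalizing factor that ensures the probabilities of all states sum to 1. Expression (2.2) leads to an alternative representation of the probability distribution as a product of potential functions, where each factor in the product is understood as the exponentiated potential exp(θc(xc)), so that taking logarithms recovers the sum in (2.1):

\[ P(x) = \frac{1}{Z} \prod_{c \in C} \theta_c(x_c). \tag{2.3} \]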

This demonstrates that the MRF probability distribution decomposes into factors, one for each defined potential function [2, 23]. This key property will be used to motivate the discussion of message passing algorithms in section 2.2. Finding the most probable state of a system is then equivalent to maximizing the Boltzmann distribution, which is an optimization problem. Because Z is a constant and the exponential is monotonically increasing, combining (2.1) and (2.3) yields the following:

\[ \text{Maximize} \quad \sum_{c \in C} \theta_c(x_c). \tag{2.4} \]

This optimization problem is commonly known as the maximum-a-posteriori (MAP) problem [2, 23].

MAP problems are posed for inference problems over graphical models where the goal is to determine

the state of each node in the graph. MAP problems are frequently posed in computer vision [37, 15],

error correcting codes [48], and computational biology [47]. To solve the MAP problem in these fields,

methods known as message passing algorithms have been developed to exploit the inherent graphical

structure of the problem.
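As a small illustration of (2.1), (2.2) and (2.4), the following sketch enumerates every configuration of a toy MRF. The three-node chain, the potential values and all names (`theta_node`, `theta_edge`, `score`) are assumptions made for this example and are not taken from the thesis; brute-force enumeration is only feasible for very small graphs.

```python
import itertools
import math

# Hypothetical toy MRF: three binary nodes on a chain 0 - 1 - 2.
STATES = (0, 1)
NODES = (0, 1, 2)

# Illustrative potentials theta_c for the single-node and pairwise clusters.
theta_node = {0: [0.0, 1.0], 1: [0.5, 0.0], 2: [0.0, 0.2]}
agree = [[1.0, -1.0], [-1.0, 1.0]]  # rewards neighbours that share a state
theta_edge = {(0, 1): agree, (1, 2): agree}


def score(x):
    """Sum of all cluster potentials, i.e. the objective of (2.4)."""
    nodes = sum(theta_node[i][x[i]] for i in NODES)
    edges = sum(t[x[i]][x[j]] for (i, j), t in theta_edge.items())
    return nodes + edges


def energy(x):
    """Energy (2.1): the negative sum of the potentials."""
    return -score(x)


configs = list(itertools.product(STATES, repeat=len(NODES)))
Z = sum(math.exp(-energy(x)) for x in configs)           # normalizing factor
prob = {x: math.exp(-energy(x)) / Z for x in configs}    # Boltzmann distribution (2.2)

x_map = max(configs, key=score)                          # MAP assignment (2.4)
print(x_map, round(prob[x_map], 3))
```

The configuration that maximizes the sum of potentials is also the one with the highest Boltzmann probability, which is the equivalence the MAP formulation relies on.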

2.2 Message Passing Algorithms

Message passing algorithms are a class of algorithms that work on a graphical model by iteratively passing messages between the nodes of the graph. The marginal probability of a particular variable xi is obtained by summing the joint distribution over all variables other than xi:

\[ p(x_i) = \sum_{x_1} \cdots \sum_{x_{i-1}} \sum_{x_{i+1}} \cdots \sum_{x_N} P(X) \tag{2.5} \]

where the joint distribution is composed of N variables [2, 23] and X is a particular configuration of all nodes in the graph. Recalling (2.3), the expression for each marginal probability can be rewritten in terms of the potential functions on clusters:

\[ p(x_i) \propto \sum_{x_1} \cdots \sum_{x_{i-1}} \sum_{x_{i+1}} \cdots \sum_{x_N} \prod_{c \in C} \theta_c(x_c) \tag{2.6} \]

Note that summing over a particular variable effectively removes that variable from the expression, which allows p(xi) to be calculated from the information in the graph [2, 23]. In theory, p(xi) can be calculated in a brute force manner known as variable elimination [2, 23]. However, this is intractable when there are many clusters, and the optimal order in which to conduct the summations is unclear [2, 23]. Instead, the process of summing over a particular variable and removing it can be viewed as a message passing procedure [2, 23]. The message passed from variable j to variable i is the sum, over the states of variable j, of the product of the single-node potential of j, the pairwise potential of i and j, and the messages previously delivered to j:

\[ m_{j \to i}(x_i) = \sum_{x_j} \theta_j(x_j) \cdot \theta_{ij}(x_i, x_j) \cdot \prod_{u \in N(j) \setminus i} m_{u \to j}(x_j) \tag{2.7} \]

Here N(j)\i refers to all the nodes connected to xj except the one of current interest, xi, and m_{u→j}(xj) is the previously computed message sent from node xu to xj [2, 23]. θij(xi, xj) is the potential function associated with variables xi and xj, which together form a pairwise cluster. The previous messages delivered to xj are simply the potentials that result from summing out the nodes upstream of xj, so they depend only on xj; once (2.7) sums over xj, the outgoing message no longer contains the variable xj [2, 23]. In this way, computing all the messages entering variable xi gives the value of p(xi) [2, 23]. The computed marginal probability p is also referred to as the belief bi(xi) [2, 23]. In practice, a potential θ is initialized as the marginal probability for a node [48]. Adding messages to θ updates it to θ̃, and we can refer to θ̃ as the final belief of a node [48]. To emphasize, the belief bi(xi) is the updated marginal probability of a node xi after messages have been sent to it [2, 23]. Message passing methods mainly vary in how the beliefs are updated, which is equivalent to varying how the messages are sent; they are not limited to the form of (2.7) [2, 23]. In fact, the belief updates of (2.7) form the sum-product message passing algorithm [2, 23]. Changing the summation in (2.7) to a max operator gives rise to the max-product message passing algorithm, which will be discussed again in section 2.3.2 [2, 23]. In general, message passing algorithms work on the presented motivation of iteratively eliminating variables in the model to compute a marginal probability [2, 23]. At convergence, the configuration is decoded by choosing, for each node, the state that maximizes its final single-node belief:

\[ x_i^* = \arg\max_{x_i} b_i(x_i) \tag{2.8} \]

This can be interpreted as finding the most probable assignment for each node [2, 23]. Message passing began with Judea Pearl’s belief propagation algorithm for non-loopy graphs (1982) [31] and developed into variants for loopy graphs that solve the MAP problem approximately [17, 24, 40].
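The message recursion (2.7) and the decoding rule (2.8) can be sketched on a small non-loopy graph. The example below is a minimal illustration under stated assumptions: the three-node chain, the factor values and the names `psi_node`, `psi_edge`, `message` and `belief` are hypothetical, and the factors are treated as multiplicative, as in (2.3). It checks the sum-product beliefs against marginals computed by explicit summation, then swaps the sum for a max to decode an assignment.

```python
import itertools
import math

STATES = (0, 1)
# Chain 0 - 1 - 2 with hypothetical multiplicative factors.
psi_node = {0: [1.0, 2.0], 1: [1.5, 1.0], 2: [1.0, 1.2]}
psi_edge = {(0, 1): [[2.0, 0.5], [0.5, 2.0]], (1, 2): [[2.0, 0.5], [0.5, 2.0]]}
neighbours = {0: [1], 1: [0, 2], 2: [1]}


def edge_factor(i, j, xi, xj):
    """Look up the pairwise factor regardless of the order in which (i, j) is stored."""
    if (i, j) in psi_edge:
        return psi_edge[(i, j)][xi][xj]
    return psi_edge[(j, i)][xj][xi]


def message(j, i, reduce=sum):
    """Message from node j to node i as in (2.7); reduce=max gives max-product."""
    incoming = [
        math.prod(message(u, j, reduce)[xj] for u in neighbours[j] if u != i)
        for xj in STATES
    ]
    return [
        reduce(psi_node[j][xj] * edge_factor(i, j, xi, xj) * incoming[xj] for xj in STATES)
        for xi in STATES
    ]


def belief(i, reduce=sum):
    """Node factor times all incoming messages, normalized to sum to one."""
    raw = [
        psi_node[i][xi] * math.prod(message(j, i, reduce)[xi] for j in neighbours[i])
        for xi in STATES
    ]
    return [value / sum(raw) for value in raw]


def brute_force_marginal(i):
    """Marginal (2.5) by explicit summation over every joint configuration."""
    weights = [0.0 for _ in STATES]
    for x in itertools.product(STATES, repeat=len(psi_node)):
        w = math.prod(psi_node[n][x[n]] for n in psi_node)
        w *= math.prod(psi_edge[(a, b)][x[a]][x[b]] for (a, b) in psi_edge)
        weights[x[i]] += w
    return [w / sum(weights) for w in weights]


# Decode as in (2.8), using max-product beliefs.
decoded = tuple(max(STATES, key=lambda s: belief(i, max)[s]) for i in psi_node)
```

The recursion terminates only because the chain has no cycles; on a loopy graph the same recursive definition would not bottom out, which is the difficulty the next section addresses.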

2.3 Approximate Inference

In many practical applications, the graphical model on which inference is performed is loopy; that is, a message can cycle many times through the graph [2]. Consequently, convergence is not guaranteed, and the exact message passing scheme described in section 2.2 is not applicable [2]. As a solution, we can approximate the original graph by decomposing it into its non-loopy sub-parts and use exact message passing methods to solve the resulting sub-problems [2]. The class of message passing algorithms that takes this approach is known as linear programming relaxations [2]. Quadratic programming relaxations that keep the quadratic form of the original problem have also been developed but are less widely used than their linear programming counterparts [43, 44].

2.3.1 Linear Programming Relaxations

Linear programming relaxation approaches to MAP inference demonstrate the interesting overlap between traditional optimization topics and inference topics. The basic components of a linear programming relaxation approach are as follows:

• Construction of the MAP problem as an integer linear program

• Relaxation of the integrality constraints of the integer linear program

• Formation of the Lagrangian dual function

• Methods to solve the Lagrangian problem

We begin this discussion of linear programming approaches by deriving the dual of the MAP problem. This section draws on [17, 35, 40]. Recall the MAP assignment problem in (2.4), posed here for a pairwise MRF G = (V, E) where each xi can take values from a discrete set S:

\[ \text{(MAP)} \quad \underset{x_i \in S}{\text{Maximize}} \quad \sum_{i \in V} \theta_i(x_i) + \sum_{ij \in E} \theta_{ij}(x_i, x_j) \tag{2.9} \]

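This objective can be made linear through the introduction of binary indicator variables: µi(xi) for each node i ∈ V and each state xi ∈ S, where µi(xi) = 1 if node i is assigned to state xi and µi(xi) = 0 otherwise. We also introduce µij(xi, xj) for each edge connecting i and j and each specific assignment to the pair of states (xi, xj). Similarly, µij(xi, xj) = 1 if i is assigned to state xi and j is assigned to state xj, and µij(xi, xj) = 0 otherwise. The resulting integer linear program is as follows:

\begin{align}
\text{(MAP-ILP)} \quad \underset{\mu}{\text{Maximize}} \quad & \sum_{i \in V} \sum_{x_i \in S} \theta_i(x_i)\,\mu_i(x_i) + \sum_{ij \in E} \sum_{x_i \in S,\, x_j \in S} \theta_{ij}(x_i, x_j)\,\mu_{ij}(x_i, x_j) \tag{2.10} \\
\text{Subject to} \quad & \mu_i(x_i) \in \{0, 1\}, \quad \forall i \in V,\ x_i \in S, \tag{2.11} \\
& \sum_{x_i \in S} \mu_i(x_i) = 1, \quad \forall i \in V, \tag{2.12} \\
& \mu_{ij}(x_i, x_j) \in \{0, 1\}, \quad \forall i, j \in E,\ x_i, x_j \in S, \tag{2.13} \\
& \sum_{x_i \in S,\, x_j \in S} \mu_{ij}(x_i, x_j) = 1, \quad \forall i, j \in E, \tag{2.14} \\
& \sum_{x_i \in S} \mu_{ij}(x_i, x_j) = \mu_j(x_j), \quad \forall i, j \in E,\ x_j \in S, \tag{2.15} \\
& \sum_{x_j \in S} \mu_{ij}(x_i, x_j) = \mu_i(x_i), \quad \forall i, j \in E,\ x_i \in S. \tag{2.16}
\end{align}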

Constraints (2.11) and (2.13) enforce that the µ indicator variables are binary. Constraints (2.12) and

(2.14) enforce that each variable can only be assigned to one state, and similarly, every edge can only

be assigned to one pair of states. Constraints (2.15) and (2.16) enforce that the pair-wise assignments

are consistent with each other for each edge. Note that while we have made a change of variables, the

solution to this new problem is equivalent to the original problem (2.9). We now relax the integrality

constraints such that the indicator variables µ can take any value in the interval [0, 1]. This changes

equations (2.11) and (2.13) to the following:

µi(xi) ∈ [0, 1] ∀i ∈ V, xi ∈ S (2.17)

µij(xi, xj) ∈ [0, 1] ∀i, j ∈ E , xi, xj ∈ S (2.18)
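The change of variables from assignments x to indicator variables µ can be checked numerically. The sketch below is an illustration under assumptions: the three-node graph, the potential values and the helper names (`indicators`, `is_feasible`, `linear_objective`) are hypothetical. For every integral assignment it builds the indicators, verifies the normalization and consistency constraints (2.12) and (2.14) to (2.16), and confirms that the linear objective (2.10) equals the original objective (2.9).

```python
import itertools

STATES = (0, 1, 2)
NODES = (0, 1, 2)
EDGES = ((0, 1), (1, 2), (0, 2))

# Hypothetical potentials, chosen only for illustration.
theta_i = {i: {s: 0.1 * (i + 1) * s for s in STATES} for i in NODES}
theta_ij = {e: {(a, b): (1.0 if a != b else 0.0) for a in STATES for b in STATES}
            for e in EDGES}


def indicators(x):
    """Binary variables mu_i(x_i) and mu_ij(x_i, x_j) induced by an assignment x."""
    mu_i = {(i, s): int(x[i] == s) for i in NODES for s in STATES}
    mu_ij = {(e, a, b): int(x[e[0]] == a and x[e[1]] == b)
             for e in EDGES for a in STATES for b in STATES}
    return mu_i, mu_ij


def is_feasible(mu_i, mu_ij):
    """Check normalization (2.12), (2.14) and consistency (2.15), (2.16)."""
    ok = all(sum(mu_i[i, s] for s in STATES) == 1 for i in NODES)
    ok = ok and all(sum(mu_ij[e, a, b] for a in STATES for b in STATES) == 1
                    for e in EDGES)
    for e in EDGES:
        i, j = e
        ok = ok and all(sum(mu_ij[e, a, b] for a in STATES) == mu_i[j, b] for b in STATES)
        ok = ok and all(sum(mu_ij[e, a, b] for b in STATES) == mu_i[i, a] for a in STATES)
    return ok


def linear_objective(mu_i, mu_ij):
    """Objective (2.10), linear in the indicator variables."""
    nodes = sum(theta_i[i][s] * mu_i[i, s] for i in NODES for s in STATES)
    edges = sum(theta_ij[e][a, b] * mu_ij[e, a, b]
                for e in EDGES for a in STATES for b in STATES)
    return nodes + edges


def original_objective(x):
    """Objective (2.9), evaluated directly on the assignment."""
    return (sum(theta_i[i][x[i]] for i in NODES)
            + sum(theta_ij[e][x[e[0]], x[e[1]]] for e in EDGES))


assignments = list(itertools.product(STATES, repeat=len(NODES)))
```

Relaxing the indicators to the interval [0, 1] keeps every point produced by `indicators` feasible but also admits fractional points, which is why the relaxed problem can only have an equal or larger optimal value.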

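We call the MAP problem with these relaxed integrality constraints the MAP-LP problem. The remaining consistency constraints (2.15) and (2.16) can be incorporated into the objective by Lagrangian augmentation, with one multiplier term for each of the two constraints, while the normalization constraints (2.12) and (2.14) are enforced explicitly:

\begin{align}
L(\lambda, x) = {} & \sum_{i \in V} \sum_{x_i} \theta_i(x_i)\,\mu_i(x_i) + \sum_{ij \in E} \sum_{x_i, x_j} \theta_{ij}(x_i, x_j)\,\mu_{ij}(x_i, x_j) \notag \\
& + \sum_{ij \in E} \sum_{x_i} \lambda_{i \to ij}(x_i) \Big( \mu_i(x_i) - \sum_{x_j} \mu_{ij}(x_i, x_j) \Big) \notag \\
& + \sum_{ij \in E} \sum_{x_j} \lambda_{j \to ij}(x_j) \Big( \mu_j(x_j) - \sum_{x_i} \mu_{ij}(x_i, x_j) \Big) \tag{2.19}
\end{align}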

where λi→ij(xi) are the Lagrangian parameters associated with each of the consistency constraints (2.15) and (2.16). This Lagrangian is a relaxation of the relaxed integer problem. To summarize, with the Lagrangian (2.19) we currently have the following hierarchy between our problems for x a MAP assignment:

\[ L(\lambda, x) \geq z_{\text{MAP-LP}}(x) \geq z_{\text{MAP-ILP}}(x) = z_{\text{MAP}}(x) \tag{2.20} \]

where z is the optimal objective value of the respective optimization problem. We now wish to tighten this upper bound to arrive close to the original MAP objective function value. This can be done by finding the Lagrangian parameters that minimize the value of the Lagrangian. In traditional optimization, this problem is the dual problem:

\[ \text{(Dual)} \quad \underset{\lambda}{\text{Minimize}} \quad L(\lambda) \tag{2.21} \]

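Combining (2.19) and (2.21) with rearrangements gives rise to the following dual expression with clear sub-problems. To arrive at (2.22), the binary indicator variables µ are eliminated by performing a maximization over them, and N(i) denotes the nodes connected to node i:

\[ \sum_{i \in V} \max_{x_i} \Big[ \theta_i(x_i) + \sum_{j \in N(i)} \lambda_{i \to ij}(x_i) \Big] + \sum_{ij \in E} \max_{x_i, x_j} \Big[ \theta_{ij}(x_i, x_j) - \lambda_{i \to ij}(x_i) - \lambda_{j \to ij}(x_j) \Big] \tag{2.22} \]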

From (2.22) we can observe sub-problems associated with each single node and each edge. As the Lagrangian parameters work to enforce consistency between sub-problems, they represent messages sent between the sub-problems. Hence, they are written with subscripts indicating which sub-problems exchange each message. Messages are sent to update the beliefs. When the beliefs are optimal and the sub-problems are consistent, the Lagrangian parameters in (2.22) cancel out:

\[ \sum_{c \in C} \theta_c(x_c) = \sum_{c \in C} \tilde{\theta}_c(x_c) \tag{2.23} \]

where θc are the original potential functions and θ̃c are the updated potential functions. The value of θc changes to θ̃c based on the incoming messages it receives. We will now discuss how the messages (dual variables) are updated such that the dual problem (2.21) is solved and we arrive at consistent beliefs (2.23).
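The upper-bound property in (2.20) can be checked numerically for the decomposed dual (2.22): for any choice of the Lagrangian parameters, the dual value is at least the MAP value. The sketch below is illustrative only; the three-node cycle, the potential values and the names `dual`, `primal` and `random_lambda` are assumptions for this example, and the multipliers are drawn at random rather than optimized.

```python
import itertools
import random

STATES = (0, 1)
NODES = (0, 1, 2)
EDGES = ((0, 1), (1, 2), (0, 2))           # a loopy graph (hypothetical)
theta_i = {0: [0.0, 0.3], 1: [0.2, 0.0], 2: [0.0, 0.1]}
# Each edge rewards disagreement between its two endpoints.
theta_ij = {e: [[0.0, 1.0], [1.0, 0.0]] for e in EDGES}


def primal(x):
    """Objective (2.9) for an integral assignment x."""
    return (sum(theta_i[i][x[i]] for i in NODES)
            + sum(theta_ij[e][x[e[0]]][x[e[1]]] for e in EDGES))


def dual(lam):
    """Evaluate (2.22); lam[(e, i)][s] plays the role of lambda_{i->ij}(x_i = s)."""
    node_part = sum(
        max(theta_i[i][s] + sum(lam[(e, i)][s] for e in EDGES if i in e)
            for s in STATES)
        for i in NODES
    )
    edge_part = sum(
        max(theta_ij[e][a][b] - lam[(e, e[0])][a] - lam[(e, e[1])][b]
            for a in STATES for b in STATES)
        for e in EDGES
    )
    return node_part + edge_part


map_value = max(primal(x) for x in itertools.product(STATES, repeat=len(NODES)))

rng = random.Random(0)


def random_lambda():
    """Arbitrary multipliers; any real values give a valid upper bound."""
    return {(e, i): [rng.uniform(-2.0, 2.0) for _ in STATES] for e in EDGES for i in e}


bounds = [dual(random_lambda()) for _ in range(1000)]
zero_bound = dual({(e, i): [0.0 for _ in STATES] for e in EDGES for i in e})
```

Every entry of `bounds` is an upper bound on `map_value`, so the task of the message updates is to search for the multipliers that make this bound as small as possible.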

2.3.2 Max-Product Linear Programming Algorithm

The Max-Product Linear Programming Algorithm (MPLP) developed by Globerson and Jaakkola is a

message update method that results in a gradual decrease in the dual objective to solve (2.21). This

is done by a coordinate descent scheme [17, 35, 40]. We fix all the dual variables λ except for one

particular λij→i(xi) and solve for the λij→i(xi) that minimizes the dual (2.21). Naturally, this involves

solving the individual sub-problems under the fixed assignment. This is done in an exact manner with

the max-product algorithm described in section 2.2. Combining the coordinate descent scheme and

exact max-product algorithm leads to closed form expressions for the passed messages that can be used

efficiently. Pseudo-code for the MPLP algorithm is presented in Algorithm 1 [17].

We note although coordinate descent algorithms decrease the dual objective at every iteration, they

are not generally guaranteed to converge to the dual optimum. The reason is that although the dual

objective is convex, it is not necessarily strictly convex. This implies that the minimizing coordinate

value may not be unique. Thus at convergence MPLP may not find the dual optimal solution which is

a drawback of the method.

Data: θc(xc) ∀c ∈ C, ε
Result: Assignment xi ∀i ∈ V
Initialize messages λc→i(xi) = 0
while Dual Decrease ≥ ε do
    for c ∈ C do
        for k ∈ C\c do
            Send MPLP message updates λc→i(xi)
            Update θk(xk) to θ̃k(xk)
        end
    end
    Decode integer solution xi = argmax bi(xi)
    Compute primal ∑c∈C θc(xc)
    Compute dual ∑c∈C θ̃c(xc)
end

Algorithm 1: Coordinate descent message update procedure
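The coordinate descent loop of Algorithm 1 can be sketched for a pairwise model as follows. This is a minimal illustration, not the thesis implementation: the four-node chain, the potential values and the names `lam`, `belief` and `mplp_update` are assumptions, and the closed-form edge update is written in the block coordinate form commonly given for MPLP on pairwise clusters, in which both messages of an edge are updated together.

```python
import itertools

STATES = (0, 1)
NODES = (0, 1, 2, 3)
EDGES = ((0, 1), (1, 2), (2, 3))           # hypothetical chain
theta_i = {0: [0.0, 1.0], 1: [0.5, 0.0], 2: [0.0, 0.2], 3: [0.3, 0.0]}
theta_ij = {e: [[1.0, -1.0], [-1.0, 1.0]] for e in EDGES}

# lam[(e, i)][s]: message from edge e to its endpoint i, for state s.
lam = {(e, i): [0.0 for _ in STATES] for e in EDGES for i in e}


def belief(i, skip=None):
    """theta_i plus incoming messages, optionally excluding those from edge `skip`."""
    return [theta_i[i][s] + sum(lam[(e, i)][s] for e in EDGES if i in e and e != skip)
            for s in STATES]


def dual():
    """Dual objective (2.22) at the current messages."""
    node_part = sum(max(belief(i)) for i in NODES)
    edge_part = sum(
        max(theta_ij[e][a][b] - lam[(e, e[0])][a] - lam[(e, e[1])][b]
            for a in STATES for b in STATES)
        for e in EDGES
    )
    return node_part + edge_part


def mplp_update(e):
    """Minimize the dual over both messages of edge e, all others held fixed."""
    i, j = e
    bi, bj = belief(i, skip=e), belief(j, skip=e)
    new_i = [-0.5 * bi[a] + 0.5 * max(theta_ij[e][a][b] + bj[b] for b in STATES)
             for a in STATES]
    new_j = [-0.5 * bj[b] + 0.5 * max(theta_ij[e][a][b] + bi[a] for a in STATES)
             for b in STATES]
    lam[(e, i)], lam[(e, j)] = new_i, new_j


def primal(x):
    return (sum(theta_i[i][x[i]] for i in NODES)
            + sum(theta_ij[e][x[e[0]]][x[e[1]]] for e in EDGES))


history = [dual()]
for _ in range(50):
    for e in EDGES:
        mplp_update(e)
    history.append(dual())
    if history[-2] - history[-1] < 1e-9:   # stop once the dual no longer decreases
        break

# Decode as in (2.8) from the final single-node beliefs.
decoded = tuple(max(STATES, key=lambda s: belief(i)[s]) for i in NODES)
map_value = max(primal(x) for x in itertools.product(STATES, repeat=len(NODES)))
```

The list `history` records the dual value after each sweep; it never increases and never drops below `map_value`, while `primal(decoded)` gives the matching lower bound from the decoded assignment.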

2.3.3 Tightening LP Relaxations

The linear programming approach to MAP described above works under the assumption that the resulting linear program is tight when the integer constraints are relaxed as in (2.17) and (2.18) [17, 35, 40]. Unfortunately, this cannot be ensured for a general problem [36]. For a tight linear program, the integrality gap between the dual objective and the exact MAP solution is zero [36]. In practice, we can assess the tightness of our linear program by noting the resulting integrality gap. A proposed method to tighten the linear programming relaxation is the cluster pursuit method introduced by Sontag [36]. Recall that in the previous section the dual was decomposed into sub-problems involving only pairwise clusters. The overarching idea behind cluster pursuit is that adding higher-order clusters (e.g. clusters of three nodes) helps enforce local consistency between the sub-problems and thus leads to a tighter relaxation [36]. This is achieved by dynamically enforcing relations between the variables until the relaxed feasible region is equal to the integer feasible region [36]. It is expected that the integer feasible region can be approximated without having to enforce all the previously relaxed constraints [35]. Moreover, only clusters that are guaranteed to tighten the bound (lower the dual objective value) are added [36]. In practice, we predetermine a list of candidate clusters and compute a score for each based on how much it changes the bound. Clusters are ranked by score and added to a queue. Pseudo-code for tightening LP relaxations is presented in Algorithm 2 and is based on the method outlined by Sontag [36].

Data: θc(xc) ∀c ∈ C, ε1, ε2
Result: x∗i ∀i ∈ V
Initialize ij ∈ C
while Dual Decrease ≥ ε1 do
    for c ∈ C do
        Send message updates λc→i(xi)
    end
    Decode integer solution x̂i = argmax bi(xi)
    if Integrality Gap ≤ ε2 then
        x∗ = x̂
    else
        Add new higher order cluster c̃ with best score
    end
end

Algorithm 2: Linear Program Tightening Procedure
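The effect of adding a higher-order cluster can be seen on the smallest loopy example, a triangle whose edges all reward disagreement. The sketch below is an illustration under assumptions (the model and all names are hypothetical, and it does not implement Sontag's cluster scoring): it compares the exact MAP value, the bound from independent pairwise sub-problems, a fractional point showing that the pairwise relaxation cannot do better, and the bound obtained once the three edges are absorbed into a single triplet cluster.

```python
import itertools

STATES = (0, 1)
EDGES = ((0, 1), (1, 2), (0, 2))


def theta_edge(a, b):
    """Pairwise potential that rewards neighbouring nodes for disagreeing."""
    return 1.0 if a != b else 0.0


def primal(x):
    return sum(theta_edge(x[i], x[j]) for i, j in EDGES)


# Exact MAP value: with binary states at most two of the three edges can disagree.
map_value = max(primal(x) for x in itertools.product(STATES, repeat=3))

# Pairwise decomposition with zero messages: each edge is maximized on its own.
pairwise_bound = sum(
    max(theta_edge(a, b) for a in STATES for b in STATES) for _ in EDGES
)

# A fractional point of the relaxed problem: every node marginal is (0.5, 0.5)
# and every edge puts weight 0.5 on each of the two disagreeing state pairs.
mu_node = [0.5, 0.5]
mu_edge = {(a, b): (0.5 if a != b else 0.0) for a in STATES for b in STATES}
consistent = all(
    sum(mu_edge[a, b] for b in STATES) == mu_node[a]
    and sum(mu_edge[b, a] for b in STATES) == mu_node[a]
    for a in STATES
)
lp_value = sum(theta_edge(a, b) * mu_edge[a, b]
               for _ in EDGES for a in STATES for b in STATES)

# Tightened decomposition: one triplet cluster holding all three edge potentials,
# maximized jointly over (x0, x1, x2).
triplet_bound = max(
    sum(theta_edge(x[i], x[j]) for i, j in EDGES)
    for x in itertools.product(STATES, repeat=3)
)

integrality_gap_before = pairwise_bound - map_value
integrality_gap_after = triplet_bound - map_value
```

Because the fractional point is feasible for the relaxed constraints and attains the pairwise bound, no choice of messages can lower the pairwise dual further on this model; only the added triplet cluster closes the gap.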

2.4 Approximate Inference with Constraints

Our previous discussion of message passing has mainly focused on MRFs with solely single-node and pairwise clusters. However, in practice we may encounter a problem with higher-order clusters representing a relationship between all the nodes in that cluster. We call this a constrained MRF [2, 23]. This can be posed as the following combinatorial optimization problem with |K| constraints, indexed by k ∈ K:

\begin{align}
\text{Maximize} \quad & \sum_{c \in C} \theta_c(x_c) \tag{2.24} \\
\text{Subject to} \quad & \sum_{c \in C} \phi^k_c(x_c) \leq 0, \quad k \in K. \tag{2.25}
\end{align}

Naively, we can solve the above problem by incorporating the constraints (2.25) into the objective (2.24): configurations that violate a constraint are assigned a value of negative infinity, and configurations that satisfy it are assigned a value of zero [2, 23]. In this way, we are still trying to find the most probable assignment of x, and assignments that do not satisfy the constraints are simply given a value so low that they are never selected:

\begin{align}
\text{Maximize} \quad & \sum_{c \in C} \theta_c(x_c) + \sum_{k \in K} h_k(x_c) \tag{2.26} \\
\text{where} \quad & h_k(x_c) = \begin{cases} -\infty & \text{if } \sum_{c \in C} \phi^k_c(x_c) > 0, \\ 0 & \text{otherwise.} \end{cases} \tag{2.27}
\end{align}

This is not tractable for large clusters, as it requires the calculation of hk(xc) for all |S|^|c| possible configurations, where |S| is the number of states xi can take and |c| is the size of the cluster [2, 23]. Message passing methods for graphs with large clusters are only tractable for certain problem types [13]. For example, Tarlow et al. [39] demonstrated that message passing can be tractable for higher-order potentials concerned with constraints on cardinality and variable order. Duchi et al. [13] showed that message passing can be tractable for mutual exclusion constraints, where variables are paired together with no sharing allowed. Recently, methods have been developed to conduct inference on general constrained MRFs with large clusters [1, 26, 50]. Aguiar et al. [1] added a quadratic constraint violation penalty to the Lagrangian in (2.19). As the Lagrangian no longer decomposes into sub-problems because of the quadratic term, a different maximization procedure known as the Alternating Direction Method of Multipliers is used instead. Lim et al. [26] presented a method utilizing cutting planes. In the context of minimization, the problem constraint is incorporated into the Lagrangian function (2.19). At each iteration, the optimal maximizing Lagrangian parameter and MAP assignment are computed, and a cut is added to remove it as a solution. At convergence, we arrive at the solution that minimizes the Lagrangian, with all the maximizers having previously been cut out. Unfortunately, this method assumes exact solutions can be computed for the MAP problem at each iteration and is not generally tractable. The method presented by Zhang et al. [50] also incorporates a constraint violation penalty into the Lagrangian, but in an alternative way that still allows for a dual decomposition. Exact MAP solutions are also not needed at each iteration. Zhang et al.’s [50] method was shown to perform well in comparison to the methods developed by Aguiar et al. [1] and Lim et al. [26] in terms of convergence time and solution quality.
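The cost of the naive penalty formulation (2.26) and (2.27) can be made concrete with a single cardinality constraint spanning every node. The sketch below is illustrative and rests on assumptions: the ten binary nodes, the potential values, the limit of three and the names `phi`, `h` and `penalized` are all hypothetical. It enumerates every configuration of the large cluster, which is exactly the |S|^|c| cost that makes the approach intractable as the cluster grows.

```python
import itertools
import math

STATES = (0, 1)
N = 10          # number of nodes in the single large cluster (hypothetical)
LIMIT = 3       # cardinality limit: at most LIMIT nodes may take state 1

# Hypothetical single-node potentials for choosing state 1.
theta = [0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4, 0.05]


def phi(x):
    """Constraint function in the sense of (2.25): satisfied when phi(x) <= 0."""
    return sum(x) - LIMIT


def h(x):
    """Penalty (2.27): minus infinity for violating configurations, zero otherwise."""
    return -math.inf if phi(x) > 0 else 0.0


def penalized(x):
    """Objective (2.26) for this model."""
    return sum(t * xi for t, xi in zip(theta, x)) + h(x)


# Every one of the |S|**N configurations must be visited to evaluate h.
configs = list(itertools.product(STATES, repeat=N))
best = max(configs, key=penalized)
```

With ten binary nodes the enumeration covers 1024 configurations; each additional node doubles that count, which motivates the structured and Lagrangian approaches surveyed above.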

2.4.1 The Method of Zhang et al.

To avoid clusters with a large number of nodes, Zhang et al. [50] propose a message passing method that relaxes the integer problem in (2.24) to a linear problem while initially ignoring the constraints (2.25) that add large clusters. Constraints (2.25) are reformulated to span clusters with only one or two nodes [50]. The linear problem’s dual is then augmented with Lagrangian parameters γ incorporating the previously ignored constraints to arrive at the following new dual formulation:

\[ \underset{\gamma}{\text{Minimize}} \quad \sum_{c \in C} \max_{x_c} \Big[ \theta_c(x_c) - \sum_{k \in K} \gamma_k \phi^k_c(x_c) \Big] \tag{2.28} \]

The dual problem is solved for the optimal beliefs θ̃ and optimal Lagrangian parameters γ through two

methods:

1. Beliefs θ̃ are updated via the MPLP coordinate descent scheme described in section 2.3.2. This

ensures a dual decrease in each iteration.

2. Lagrangian parameters γ are updated by binary search. This ensures the solution is feasible.

As the MPLP scheme is already described in section 2.3.2, we now detail how binary search is used to ensure a feasible solution. In practice, we first use coordinate descent to update the beliefs θ̃, and then fix all γ except for a particular γk, over which we optimize [50]. At each iteration, we can decode the current solution for each sub-problem based on the currently computed beliefs and the current γ:

\[ \hat{x}_c = \arg\max_{x_c} \Big[ \tilde{\theta}_c(x_c) - \sum_{k \in K} \gamma_k \phi^k_c(x_c) \Big] \tag{2.29} \]

With this solution we verify whether the constraints are satisfied [50]. For each constraint k, if there is a violation we increase the value of the parameter γk; if there is no violation, we decrease its value [50]. We eventually find the minimum value of γk needed to ensure that constraint k is enforced. The algorithm terminates when we can no longer decrease the dual objective [50]. At this point, the solution is decoded as in (2.8) [50]. To emphasize, as we are solving a relaxation of the original integer problem, the solution is expected to be sub-optimal [50]. Hence, we can use the linear programming tightening technique discussed in section 2.3.3 to improve the solution [36]. We present the pseudo-code for Zhang et al.’s method in Algorithm 3 [50]:

Data: θc(xc) ∀c ∈ C, φkc ∀c ∈ C ∀k ∈ K, ε
Result: x∗i ∀i ∈ V
Initialize γ = 0, fmax = −∞
while Dual Decrease ≥ ε do
    for c ∈ C do
        Send MPLP message updates λc→i(xi)
    end
    for k ∈ K do
        Update γk via binary search
        Decode integer solution x̂i = argmax bi(xi)
        if x̂ is feasible and ∑c θc(x̂c) ≥ fmax then
            x∗ = x̂, fmax = ∑c θc(x̂c)
        end
    end
end

Algorithm 3: Message passing procedure for constrained MRFs
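The binary search step for a single Lagrangian parameter can be sketched in isolation. The example below is a deliberately simplified illustration and not the full procedure of Zhang et al.: it assumes a model with only single-node potentials (so the decoding in (2.29) is exact and no MPLP sweep is needed), one cardinality constraint, and hypothetical names and values throughout. The constraint is split across nodes as φ_i(x_i) = x_i − LIMIT/N, so that node i prefers state 1 exactly when θ_i − γ > 0.

```python
# Hypothetical single-node potentials for choosing state 1 (state 0 scores zero).
theta = [0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6]
LIMIT = 3       # cardinality constraint: sum(x) - LIMIT <= 0


def decode(gamma):
    """Per-node argmax of theta_i * x_i - gamma * x_i over x_i in {0, 1}, as in (2.29)."""
    return [1 if t - gamma > 0 else 0 for t in theta]


def violated(x):
    """True when the cardinality constraint is violated."""
    return sum(x) - LIMIT > 0


def smallest_feasible_gamma(tol=1e-6):
    """Bisection for the smallest gamma whose decoded solution is feasible."""
    lo, hi = 0.0, max(theta)       # at gamma = max(theta) no node is selected
    if not violated(decode(lo)):
        return lo                  # the constraint is inactive
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if violated(decode(mid)):
            lo = mid               # violation: increase gamma
        else:
            hi = mid               # no violation: decrease gamma
    return hi


gamma = smallest_feasible_gamma()
x_hat = decode(gamma)
```

In this toy setting the search settles just above the fourth-largest potential, so the decoded solution keeps the three most valuable nodes. In the constrained MRF setting the same update would be interleaved with message updates, since the beliefs change between searches.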

2.5 Message Passing Applied to Optimization Problems

Thus far in this chapter, we have described at length how the abstract MAP problem of finding the op-

timal variable assignment that maximizes the sum of potentials in a graph can be solved efficiently. The

MAP problem is a combinatorial optimization problem. In this way, other combinatorial optimization

problems have been reformulated as MAP problems and solved using the message passing techniques

described. We now discuss previous work connecting combinatorial optimization problems to their graphical model representations. We divide this work into two categories: approaches based on linear programming relaxations, and approaches where problem-specific message passing procedures could be derived due to the special nature of the problem’s resulting graphical structure.

• Lazic et al. [25] applied message passing to the facility location problem. In the facility location problem, one wishes to designate facilities as open or closed and assign customers to the open facilities. Lazic et al. present the problem as a probabilistic graphical model and compare solutions based on MPLP and max-sum message update procedures. It is concluded that MPLP provides good approximate solutions for this problem based on known exact results from benchmark data sets.

• Dhoot [12] applied message passing to optimize wind farm layouts in a grid. This problem is a

constrained binary quadratic integer problem. Linear programming relaxations and LP tightening

are used. Constraints were incorporated with a method similar to that of Aguair et al. [1]. Results

from comparing with an exact branch-and-bound solver show that message passing provides good

sub-optimal solutions and converges in a time significantly faster than branch-and-bound.

• In his paper that introduces the LP tightening procedure, Sontag tests his method on the protein

design problem [36]. Here the goal is to find a sequence of variable assignments representing

amino-acids that will be the most stable and minimize an energy function. With the cluster

tightening method, near-optimal and exact solutions were successfully recovered. It was reported

that this problem was too large to be handled by branch-and-bound algorithms.

• Yanover [46] [47] also explored the protein design problem along with the stereo vision prob-

lem. In the stereo vision problem, we are given two images and wish to find the disparity in

each pixel. This amounts to minimizing an energy function. The MAP problem is formulated and

solved using a linear programming approach. For both problems, message passing was used and

converged to a solution. Furthermore, standard solvers were unable to scale to solve the same large

problems.

• In Zhang et al.’s paper [50], which developed the constrained message passing scheme discussed in section 2.4.1, the quadratic knapsack problem with five constraints is tested. They observed near-optimal solutions characterized by small integrality gaps. Furthermore, they outperformed commercial solvers in terms of optimality gap and time.

• Sanghavi et al. [34] proposed an algorithm to solve the maximum weight independent set

problem using a variant of the max-product algorithm. The problem constraints are relaxed, forming a linear program, and the max-product algorithm is amended to solve it.

• Moallemi [28] developed a message passing procedure to solve the resource allocation prob-

lem. The problem of assigning resources to activities was visually represented with a graphical

model with which a problem specific message passing algorithm was developed and applied. Com-

pared with heuristic methods, message passing outperformed in terms of its optimality gap at

convergence, and was more stable than the other methods which exhibited more varied gaps.

• The traveling salesman problem is explored by Ravanbakhsh [32] [33] and is formulated as a

probabilistic graphical model. A message passing procedure unique to the graph is derived. Results

compared against the exact branch and bound method demonstrated its ability to arrive at near

optimal solutions.

We can observe two common themes throughout these investigations. First, message passing is

widely reported to arrive at near optimal solutions and outperform problem specific heuristic methods.

Second, message passing can tackle problems that standard commercial solvers, namely branch-and-

bound, cannot.

2.6 Discussion

In this chapter, we have discussed the theory behind message passing to demonstrate that the combi-

natorial MAP optimization problem can be efficiently solved for good sub-optimal solutions by linear

programming relaxation approaches. This is because the graphical model can be decomposed into easily

solvable sub-problems. Concerns about the tightness of the resulting relaxed problem can be addressed

by LP tightening schemes [35]. Furthermore, constraints involving many variables can successfully be

incorporated into the message passing procedure [50]. It is for these reasons that message passing has

been applied successfully to a broad array of optimization problems.


In short, message passing can be applied to an optimization problem by modeling it as an MRF and

reformulating the original objective as a MAP optimization problem. In the MAP form, the graphical

structure of the problem is explicitly highlighted, to which message passing algorithms can be applied

to exploit the structure. This motivates us to investigate a problem not yet explored with message passing in the literature: the portfolio optimization problem.

Chapter 3

Portfolio Optimization

When investing in various financial instruments such as bonds and stocks, one naturally wishes to

maximize the profit accrued by investing given some tolerance to a loss. Financial instruments used

for investments represent a legal agreement with monetary value. Thus, they may be traded amongst

investors as willed to achieve their profit goals. The process of determining how much to invest in a set of

assets is known as the portfolio selection problem. To motivate our discussion for portfolio optimization

we first introduce the Markowitz model which was the first portfolio selection model to incorporate the

mean and variance of a portfolio within a mathematical program.

3.1 Markowitz Portfolio Problem

For this thesis, we seek to solve the classical Markowitz portfolio problem with round-lot and cardinality

constraints. The Markowitz portfolio problem in its simplest form (3.2) - (3.5) seeks to find the ideal

portfolio weights from a selection of n assets that minimizes the portfolio’s variance [27]. Let ri be the

random return of asset i, µi the expected return of asset i and σij the covariance between assets i and

j. If we assign asset i a portfolio weight xi, the portfolio return is given by $r_p = \sum_{i=1}^{n} r_i x_i$ and the expected return of the portfolio is given by $\mu_p = \sum_{i=1}^{n} \mu_i x_i$ [27]. Using these values we can formulate an expression for the portfolio’s variance:

$$\sigma_p^2 = E[(r_p - \mu_p)^2] = \sum_{i=1}^{n} \sum_{j=1}^{n} \sigma_{ij} x_i x_j \qquad (3.1)$$
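As a small numerical illustration of the expected return and of (3.1), the following Python sketch evaluates both quantities for a two-asset portfolio. The expected returns, covariances and weights are made-up values chosen only for the example, and the function names are hypothetical.

```python
def expected_return(mu, x):
    """Expected portfolio return: sum_i mu_i * x_i."""
    return sum(m * xi for m, xi in zip(mu, x))


def portfolio_variance(sigma, x):
    """Portfolio variance as in (3.1): sum_i sum_j sigma_ij * x_i * x_j."""
    n = len(x)
    return sum(sigma[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


mu = [0.10, 0.05]                 # hypothetical expected returns
sigma = [[0.04, 0.01],
         [0.01, 0.09]]            # hypothetical covariance matrix
x = [0.5, 0.5]                    # equal weights summing to one

mu_p = expected_return(mu, x)         # 0.075 up to rounding
var_p = portfolio_variance(sigma, x)  # 0.0375 up to rounding
```

Note that the covariance term is counted twice in the double sum, once as σ12 and once as σ21, which is why the off-diagonal contribution is 2 · 0.01 · 0.5 · 0.5.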

Portfolio variance is of central interest as it gives a quantitative measure of the risk of a particular

investment [27]. The mathematical programming formulation to find the optimal investment weights

that will minimize the variance is given by Markowitz [27] as follows:

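$$
\begin{aligned}
\text{Minimize} \quad & \sum_{i=1}^{n} \sum_{j=1}^{n} \sigma_{ij} x_i x_j && (3.2) \\
\text{Subject to} \quad & \sum_{i=1}^{n} x_i = 1, && (3.3) \\
& \sum_{i=1}^{n} \mu_i x_i \geq R, && (3.4) \\
& x_i \geq 0, \quad \forall i = 1, \ldots, n. && (3.5)
\end{aligned}
$$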

Constraint (3.3) ensures that the portfolio weights, which represent the proportions of the total investment, sum to one. Constraint (3.4) forces the portfolio to achieve a minimum target expected return R. We allow for no short-selling (the selling of an asset we do not own) by restricting the decision variables to be non-negative in (3.5) [27]. This version of the Markowitz model is handled well by commercial solvers, particularly because the covariance matrix is always positive semi-definite, which makes the problem a convex quadratic program [10, 14, 49]. Variations on this problem include the addition of combinatorial constraints, such as round-lot and cardinality constraints, that are difficult to solve in practice. Although challenging, these constraints address real-life portfolio concerns.


3.2 Round-lot Constraint

Assets are often constrained to be traded in large batches or “lots”. The decision variable zi is now integer and reflects the number of lots of asset i bought. The variable xi for the proportion of the investment placed in asset i is now dependent on pi, the price of the asset, Mi, the number of assets in a lot, and V, the total investment budget [10, 14, 49]. This amounts to adding the following two constraints to the standard Markowitz problem:

$$x_i = \frac{p_i M_i z_i}{V}, \quad \forall i = 1, \ldots, n, \qquad (3.6)$$

$$z_i \in \mathbb{Z}_+, \quad \forall i = 1, \ldots, n. \qquad (3.7)$$
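The mapping in (3.6) from lot counts to portfolio weights can be sketched in a few lines of Python. The prices, lot sizes and budget below are hypothetical values chosen so that the weights sum exactly to one; the function name is illustrative.

```python
def lot_weights(z, price, lot_size, budget):
    """Portfolio weights from lot counts, following (3.6): x_i = p_i * M_i * z_i / V."""
    return [p * m * zi / budget for zi, p, m in zip(z, price, lot_size)]


price = [10, 25]        # hypothetical prices p_i
lot_size = [100, 40]    # hypothetical lot sizes M_i (assets per lot)
budget = 10000          # hypothetical total budget V
z = [5, 5]              # number of lots bought of each asset

x = lot_weights(z, price, lot_size, budget)  # [0.5, 0.5]
```

Each lot of either asset costs 1000 here, so only weights in multiples of 0.1 are attainable. This is the source of the combinatorial difficulty: the budget constraint (3.3) can only be met by particular integer combinations of lots.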

3.3 Cardinality Constraint

Cardinality constraints are used to limit the number of assets that can be invested in to a maximum of

K. For such a problem, we introduce a binary variable yi for each asset i representing whether the asset

is included in the portfolio or not [10, 14, 49]. This amounts to adding the following three constraints

to the standard portfolio optimization problem:

$$\sum_{i=1}^{n} y_i \leq K, \qquad (3.8)$$

$$L y_i \leq x_i \leq M y_i, \quad \forall i = 1, \ldots, n, \qquad (3.9)$$

$$y_i \in \{0, 1\}, \quad \forall i = 1, \ldots, n, \qquad (3.10)$$

where the purpose of constraint (3.9) is to ensure the choice of variable xi agrees with the decision yi of

whether the asset is included in the portfolio or not. In this way we take M to be a sufficiently large

number and L a sufficiently small number.
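The role of the linking constraint (3.9) can be checked numerically. The following Python sketch tests a candidate pair (x, y) against (3.8) - (3.10); the function name, the argument names lower and upper (standing in for L and M) and the example values are assumptions made only for illustration.

```python
def satisfies_cardinality(x, y, K, lower, upper):
    """Check (3.8)-(3.10) for weights x and inclusion indicators y."""
    if any(yi not in (0, 1) for yi in y):   # (3.10): y_i is binary
        return False
    if sum(y) > K:                          # (3.8): at most K assets held
        return False
    # (3.9): x_i = 0 when y_i = 0, and lower <= x_i <= upper when y_i = 1
    return all(lower * yi <= xi <= upper * yi for xi, yi in zip(x, y))


x = [0.6, 0.4, 0.0]
ok = satisfies_cardinality(x, [1, 1, 0], K=2, lower=0.05, upper=1.0)
mismatch = satisfies_cardinality(x, [1, 0, 0], K=2, lower=0.05, upper=1.0)
too_many = satisfies_cardinality(x, [1, 1, 0], K=1, lower=0.05, upper=1.0)
```

The second call fails because asset 2 carries weight 0.4 while its indicator is zero, and the third fails because two assets are held when only one is allowed.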


3.4 Markowitz Portfolio Problem with Round-lot and Cardinality Constraints

The goal of this thesis is to solve the Markowitz portfolio problem with round-lot and cardinality con-

straints. It is posed fully as follows:

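$$
\begin{aligned}
\text{Minimize} \quad & x^T Q x && (3.11) \\
\text{Subject to} \quad & \sum_{i=1}^{n} x_i = 1, && (3.12) \\
& \sum_{i=1}^{n} \mu_i x_i \geq R, && (3.13) \\
& \sum_{i=1}^{n} y_i \leq K, && (3.14) \\
& L y_i \leq x_i \leq M y_i, \quad \forall i = 1, \ldots, n, && (3.15) \\
& x_i = \frac{p_i M_i z_i}{V}, \quad \forall i = 1, \ldots, n, && (3.16) \\
& z_i \in \mathbb{Z}_+, \quad \forall i = 1, \ldots, n, && (3.17) \\
& y_i \in \{0, 1\}, \quad \forall i = 1, \ldots, n. && (3.18)
\end{aligned}
$$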

This is an integer quadratic programming problem which is NP-hard [10, 14, 49]. Problems that include

the cardinality and round-lot constraint simultaneously have been primarily addressed with heuristic

algorithms, branch-and-bound algorithms and Lagrangian relaxation approaches [3, 4, 7, 21].
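To make the combined problem concrete, the following Python sketch solves a tiny instance by exhaustive enumeration of the lot counts. This is not one of the methods cited above and is only feasible for very small n. It assumes integer prices, lot sizes and budget so that (3.12) can be checked exactly, and it treats yi as implied by zi > 0, which is consistent with (3.15) when L is no larger than the weight of a single lot. All data and names are hypothetical.

```python
from itertools import product


def brute_force_portfolio(Q, mu, price, lot_size, budget, R, K):
    """Enumerate lot counts z and return (variance, z, x) of the best feasible portfolio."""
    n = len(mu)
    lot_cost = [price[i] * lot_size[i] for i in range(n)]  # p_i * M_i as integers
    best = None
    for z in product(*(range(budget // c + 1) for c in lot_cost)):
        if sum(c * zi for c, zi in zip(lot_cost, z)) != budget:
            continue  # weights must sum to one, (3.12) with (3.16)
        if sum(1 for zi in z if zi > 0) > K:
            continue  # cardinality limit, (3.14)
        x = [c * zi / budget for c, zi in zip(lot_cost, z)]
        if sum(m * xi for m, xi in zip(mu, x)) < R:
            continue  # target return, (3.13)
        variance = sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
        if best is None or variance < best[0]:
            best = (variance, z, x)
    return best


Q = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]      # hypothetical covariance matrix
mu = [0.05, 0.08, 0.12]       # hypothetical expected returns
best = brute_force_portfolio(Q, mu, price=[10, 25, 50], lot_size=[25, 10, 5],
                             budget=1000, R=0.07, K=2)
```

For this instance every lot costs 250, so weights are multiples of 0.25, and the minimum-variance feasible portfolio holds two lots each of the first and third assets with a variance of 0.05. The number of candidate lot vectors grows exponentially with n, which is why the methods surveyed in this chapter are needed.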


3.5 Previous Work

We will now highlight some key approaches used to handle the cardinality and round-lot constraints

within a Markowitz framework. Literature reviewed considers both the cardinality and round-lot con-

straints applied simultaneously and separately.

• The portfolio selection problem with buy-in thresholds, cardinality constraints and round-lot constraints is addressed by Jobst et al. [21]. The integrality constraints of the problem are relaxed to form a continuous quadratic program. Solving this continuous program with existing interior methods provides a bound which can be used within a branch-and-bound search tree. To aid the solution, a heuristic is used that first solves the problem with only the cardinality constraint and then adds the remaining constraints to form a new problem over the chosen assets.

• Bonami et al. [4] amend the standard branch-and-bound algorithm with customized branching rules for the round-lot problem. At each node in the branch-and-bound tree, the integer variables whose optimal values in the current continuous relaxation are not integer are considered. Among these variables, the one for which restoring integrality increases the variance of the portfolio the most is branched on.

• The cardinality constrained portfolio problem is explored by Chang et al. [7] through the use of

heuristics. Genetic algorithms, tabu search and simulated annealing are applied as methods to

find near-optimal solutions.

• General quadratic integer programs with cardinality constraints are approached by Bienstock [3].

Here, the cardinality constraint is replaced by a tighter constraint which acts as a valid cut within

a branch-and-bound framework.

In all, it is observed that popular solution methodologies include customized branch-and-bound

techniques and evolutionary heuristic approaches. Notably, message passing methods have not been

applied to our desired portfolio problem. As a first step to approaching this problem, we explore message

passing applied to general quadratic integer problems in the next chapter.

Chapter 4

Quadratic Integer Optimization

Quadratic integer optimization is a special case of quadratic continuous optimization where decision

variables are limited to be integer. With the inclusion of integer decision variables, standard techniques

used in non-linear optimization such as the Karush-Kuhn-Tucker methods, Newton’s method and barrier methods no longer apply. Similarly, standard techniques from integer optimization are complicated

due to the quadratic nature of the objective function. Furthermore, integer quadratic programming is

well-known to be NP-Hard [8, 42].

The portfolio optimization problem presented in Chapter 3 is a quadratic integer problem (QIP). As

a first stepping stone, we wish to assess the performance of message passing techniques for solving an

unconstrained QIP. In this chapter, we represent an unconstrained binary quadratic problem as a prob-

abilistic graphical model and reformulate the QIP as a MAP inference problem. The QIP modeled as

a graph is then solved using the max-product linear programming algorithm (MPLP) augmented with

triplet cluster tightening. As our goal in Chapter 5 is to compare our message passing procedure for the

portfolio problem with a bench-mark branch-and-bound method, we choose in this chapter to compare

with branch-and-bound as well.


4.1 Introduction

For the purpose of this study, we consider the unconstrained binary quadratic problem posed as follows:

$$\text{Maximize} \quad x^T Q x \qquad (4.1)$$

$$\text{Subject to} \quad x \in \{0, 1\}^N, \qquad (4.2)$$

where Q is a real symmetric matrix [8, 42]. Although deceptively simple, this problem has attracted increasing interest in the field of combinatorial optimization due to its potential to represent a diverse set of problems and its computational challenge [5, 8, 42]. Some applications include facility location problems, resource allocation problems, clustering problems, set partitioning problems, assignment problems, and sequencing and ordering problems [5, 8, 42]. The unconstrained binary quadratic problem is NP-hard with heuristics

mainly used to produce solutions in a reasonable amount of time [8, 42]. Methods for solving quadratic

integer programs mainly fall into one of the following categories:

• Semidefinite programming: The quadratic integer program with a positive semi-definite matrix

Q can be approached through semidefinite programming. Here, a linear function is minimized

subject to the constraint that a combination of symmetric matrices is positive semi-definite. It can be used to find a bound on the objective value [6, 30].

• Greedy Heuristic Algorithms: At every decision stage, the best immediate choice is made with no foresight to the overall optimization problem [9].

• Lagrangian relaxations: Complicating constraints of the original problem can be incorporated

into a Lagrangian function to create an unconstrained or simplified version of the original problem.

Optimizing the Lagrangian will result in an upper bound on the original problem [16].

• Branch-and-Bound Algorithms: Variations of branch-and-bound are most common for solving

popular NP-hard optimization problems as they are an exact method and can provide optimality

guarantees on the generated solution. The space of candidate solutions is systematically searched

which offers an incumbent best feasible solution. Unexplored candidate space can be ruled out


by solving a linear programming relaxation of the problem that provides a bound on the optimal

solution. Unfortunately, branch-and-bound convergence is notoriously slow for large problems.

As we are comparing the message passing method to branch-and-bound it is discussed further in

section 4.2 [8] [42].

• Tabu and Neighborhood Search Algorithms: Starting with an initial solution, small changes are applied moving the initial solution to a “neighbor” solution if that “neighbor” is better [19].

• Evolutionary Algorithms: Solutions are found by following a process mimicking evolution. “Fit” solutions with good qualities are “bred” to produce “offspring” new solutions [7] [20].

• Simulated Annealing Algorithms: The algorithm begins with an initial solution and initial

probability of accepting a worse solution. As the solution space is searched, the probability of

accepting a worse solution is decreased [11].

As demonstrated earlier, graphical models have successfully been used to model problems where pair-

wise relationships between variables exist and for problems where the variables can only take values

from a finite list of possible states. We propose representing (4.1) - (4.2) as a Markov random field and

reformulating it as a MAP problem. This is because the quadratic nature of the objective function can

easily be represented as a graphical model with the variables as nodes and edges connecting every pair of

variables. We then will apply message passing with MPLP updates and a concurrent cluster tightening

method to solve (4.1) - (4.2) approximately. Specifically, we are interested in the quality of the solutions

returned by message passing, as well as the run time comparison with an exact method.

4.2 Branch and Bound Algorithm

The branch-and-bound algorithm is often used to solve quadratic integer problems by performing enu-

meration through a tree search [8, 42]. The algorithm starts by solving the relaxation of the original

problem where all integer variables are allowed to take continuous values [8, 42]. If the solution to this

first relaxation problem x∗ is integer, then x∗ is an optimal solution and the problem is solved [8, 42].

Otherwise, at least one component of x∗ has a non-integer value [8, 42]. We denote this component as xi. We then choose xi to “branch” on, which means two sub-problems are created [8, 42]. In one sub-problem, xi is bounded below by ⌈xi⌉ and in the other, xi is bounded above by ⌊xi⌋; for a binary variable this fixes xi to 1 and 0 respectively [8, 42]. These two sub-problems are placed in a list of open problems [8, 42]. At each iteration one of the open sub-problems is chosen and the continuous relaxation of that sub-problem is solved [8, 42]. The optimal value of the relaxation provides a bound on the best integer solution attainable within that sub-problem. The following possibilities can occur based on the solution to the relaxation:

• If the relaxation is infeasible, the node is not split into any more sub-problems. We refer to this node as pruned.

• If the relaxation solution is integer and better than the best known integer solution, we have a new

best integer solution.

• If the relaxation solution is not integer but better than the best known integer solution, we split

the node into two new sub-problems with branching on a variable taking non-integer value at the

relaxation solution.

• If the relaxation solution is not better than the best integer solution, the node is pruned.

By iterating the process, a search tree is created and the algorithm continues until the list of open

sub-problems is empty [8, 42]. Some key considerations when using the branch-and-bound procedure include deciding which variable to branch on, determining which sub-problem to solve next and how to introduce cutting planes to shrink the solution space [8, 42]. For branch selection strategy, one can use “strong branching” or “pseudo costs”, methods that estimate which branch will lead to the largest change in the linear programming relaxation [8, 42]. These methods rely on past empirical evidence we have about the problem. This is a drawback as branching is most effective at the beginning of the algorithm where we have less evidence collected to make informed decisions [8, 42]. For choosing sub-problems to process one can use a “best first” or “depth first” approach [8, 42]. In best first, problems that have the best linear relaxation bound are processed first; however, this method often does not find feasible solutions quickly as these are usually deep in the sub-problem tree [8, 42]. The depth first approach takes the deepest sub-problem to process next [8, 42]. Unfortunately this results in small changes within the linear programming relaxation, so the bound on the problem moves slowly [8, 42]. Cutting planes can be used to improve the bound of the solution by removing regions of the relaxed solution space that contain no feasible integer solutions [8, 42]. However, excessive cutting planes can increase the difficulty of solving the linear programming

relaxations [8, 42]. In all, branch-and-bound has proven itself to be a reliable and systematic method

of processing sub-problems to find good feasible solutions and bounds on the optimal value [8, 42].

However, employing branch-and-bound for integer programs involves careful decisions about the trade-

offs between different approaches based on the specific problem [8, 42]. Furthermore, branch-and-bound

is notoriously slow for large problems where the search space expands easily [8, 42].
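The branching and pruning logic described above can be sketched for the binary problem (4.1) - (4.2) as follows. To keep the example free of external solvers, the continuous relaxation is replaced by a much simpler optimistic bound, which is our own substitution and not what a commercial solver does: each free variable contributes at most its best possible gain given the variables fixed so far. Sub-problems are processed depth first, Q is assumed symmetric, and all names and data are hypothetical.

```python
def objective(Q, x):
    """Evaluate x^T Q x for a (partial or full) 0/1 assignment x."""
    d = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(d) for j in range(d))


def upper_bound(Q, fixed):
    """Optimistic bound on x^T Q x over all completions of the fixed prefix.

    Assumption: this simple bound stands in for the continuous relaxation.
    """
    n, d = len(Q), len(fixed)
    bound = objective(Q, fixed)
    for i in range(d, n):
        gain = Q[i][i] + 2 * sum(Q[i][j] * fixed[j] for j in range(d))
        gain += 2 * sum(max(0, Q[i][j]) for j in range(i + 1, n))
        bound += max(0, gain)  # a free variable is never forced to hurt
    return bound


def branch_and_bound(Q):
    """Depth-first branch-and-bound for max x^T Q x with x binary."""
    n = len(Q)
    best_value, best_x = float("-inf"), None
    open_nodes = [[]]  # each node is a prefix of fixed variables
    while open_nodes:
        fixed = open_nodes.pop()
        bound = upper_bound(Q, fixed)
        if bound <= best_value:
            continue  # prune: cannot beat the incumbent
        if len(fixed) == n:
            best_value, best_x = bound, fixed  # bound is exact at a leaf
            continue
        open_nodes.append(fixed + [0])  # branch x_i = 0
        open_nodes.append(fixed + [1])  # branch x_i = 1, explored first
    return best_value, best_x


Q = [[2, -3, 1, 0],
     [-3, 4, -2, 1],
     [1, -2, -1, 3],
     [0, 1, 3, -2]]
value, x = branch_and_bound(Q)
```

On this four-variable instance the search returns the optimal value 7 at x = [1, 0, 1, 1]. A tighter bound, such as the one obtained from a continuous relaxation, would prune more nodes at the cost of more work per node, which is the trade-off discussed above.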

4.3 Graphical Model Reformulation of the Unconstrained QIP

We consider a Markov random field G = (V, E) where clusters are composed of every single node i ∈ V and every pair of nodes connected by an edge ij ∈ E. The energy function for a graphical model is given by:

$$E(x) = \sum_{i \in \mathcal{V}} \theta_i(x_i) + \sum_{ij \in \mathcal{E}} \theta_{ij}(x_i, x_j) \qquad (4.3)$$

where θ is a potential function which encodes a relationship between the nodes it operates on. The

objective function in (4.1) can be brought into the above form through the following identification of

the potential functions:

$$\theta_i(x_i) = q_{ii} x_i, \quad \forall i \in \mathcal{V}, \qquad (4.4)$$

$$\theta_{ij}(x_i, x_j) = 2 q_{ij} x_i x_j, \quad \forall ij \in \mathcal{E}, \qquad (4.5)$$

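where qii are the diagonal terms of the matrix Q and qij are the off-diagonal terms. Since xi is binary, xi² = xi, so the diagonal contribution qii xi² of (4.1) reduces to qii xi in (4.4). The problem can then be reformulated as the following maximum-a-posteriori problem where we sum all the defined potential functions:

$$\text{Maximize} \quad \sum_{i \in \mathcal{V}} q_{ii} x_i + 2 \sum_{ij \in \mathcal{E}} q_{ij} x_i x_j \qquad (4.6)$$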
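The equivalence between the quadratic form (4.1) and the sum of potentials (4.6) can be verified numerically for binary assignments. The Python sketch below does so on a small symmetric matrix with made-up entries; it assumes a fully connected graph, so that every pair i < j is an edge. Function names are illustrative.

```python
from itertools import product


def quadratic_objective(Q, x):
    """Objective (4.1): x^T Q x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def map_objective(Q, x):
    """Objective (4.6): node potentials (4.4) plus edge potentials (4.5)."""
    n = len(x)
    node_terms = sum(Q[i][i] * x[i] for i in range(n))
    edge_terms = sum(2 * Q[i][j] * x[i] * x[j]
                     for i in range(n) for j in range(i + 1, n))
    return node_terms + edge_terms


Q = [[3, -1, 2],
     [-1, -4, 5],
     [2, 5, 1]]   # hypothetical symmetric matrix

# The two objectives agree on every binary assignment.
for x in product([0, 1], repeat=len(Q)):
    assert quadratic_objective(Q, x) == map_objective(Q, x)
```

The agreement relies on two facts: Q is symmetric, so each pair contributes 2 qij xi xj, and xi is binary, so xi² = xi. For non-binary integer variables the node potential would need to keep the squared term.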

To solve (4.6) we propose utilizing the MPLP linear programming approach described in section 2.3.2.

We choose MPLP as it has previously been used in the existing literature

to solve the MAP problem in other pair-wise Markov random fields with structure similar to (4.4) and

(4.5) [25, 35, 50]. To augment this implementation, we will also utilize a tightening scheme to ensure the

linear programming relaxation is tight. We choose the cluster tightening scheme proposed by Sontag [35]

as it was shown to provide good and sometimes exact solutions when used for pair-wise Markov random

fields with the MPLP algorithm in the context of stereo vision problems and protein design problems.

4.4 Experiments

Test problems are created by randomly generating a positive-definite matrix Q with entries taken from

a uniform distribution in [-10, 10]. We test under varying problem sizes with N ∈ {20, 30, . . . , 100} variables. As the goal of this chapter is to examine our message passing procedure later

used in Chapter 5, we are interested in positive-definite matrices as the covariance matrix for assets in

the portfolio problem is positive-definite.

We test the message passing algorithm with MPLP message updates and cluster tightening with the

graphical model defined in (4.4) - (4.5) and refer to it as MP. For cluster tightening we use the scheme

developed by Sontag [35]. Here, we allow message passing to run until the dual decrease is less than

1e-2. At that point, we add tightening clusters at every iteration until the dual decrease is less than 1e-6. Triplet clusters are chosen because, for large problems, searching for higher-order clusters is time consuming. The results of MP are compared with the results obtained by the default branch-and-bound

algorithm of Gurobi 7.05 solver with all settings set to their default options. Gurobi takes the integer

program defined in (4.1) - (4.2) as input. The MP and Gurobi algorithms were run on a 64-bit Lenovo


E460 workstation with Intel Processor i5-6200 2.30 GHz CPU and 4GB of RAM. All algorithms were

implemented using Python 2.7. The Python library PGMPY for probabilistic graphical models was used

to employ the MPLP updates and cluster tightening schemes [29].

Table 4.1 lists the convergence time of both algorithms (GB Conv. Time, MP Conv. Time), MP’s

optimality gap at convergence (MP Gap %) and the percent error between the MP and branch-and-

bound solution (Error %). Note branch-and-bound solved all instances to optimality so its optimality

gap at convergence is not shown. Furthermore, message passing’s convergence time includes the pre-

processing time for generating a graph and finding cluster tightening triplets. As branch-and-bound is

solving to optimality while MP is not, comparing the convergence times of both methods directly is

not of interest. In this way, Gurobi’s time to reach the same optimality gap as MP at convergence is

recorded as well (GB Time to Gap). 20 problems per problem size are tested, thus Table 4.1 lists the

arithmetic averages accumulated. Note that the arithmetic and geometric averages for Gurobi’s and

MP’s convergence time were similar.

Table 4.1: A comparison of results generated using message passing with MPLP updates and cluster tightening, and exact branch-and-bound

N     GB Conv. Time (s)   MP Conv. Time (s)   GB Time to Gap (s)   MP Gap (%)   Error (%)
20    0.01                21.37               0.00                 2.93         1.95
30    0.73                72.24               0.01                 2.52         1.19
40    0.82                93.83               0.01                 3.17         1.50
50    0.90                223.36              0.01                 2.80         1.67
60    1.94                577.31              0.01                 2.74         1.69
70    32.39               887.97              0.23                 2.78         1.42
80    6650.36             1502.90             4650.76              2.21         1.53
90    12288.70            6152.95             9212.20              3.06         1.51
100   22785.33            9673.90             17567.18             2.65         1.43

From Table 4.1 we observe that message passing with MPLP message updates and cluster tightening performs well in comparison with an exact method. Percent error compared to the exact solution is consistently below 2%. Furthermore, message passing is able to arrive at its final optimality gap faster than branch-and-bound for large problem sizes (N ≥ 80). It is observed that the average convergence time of branch-and-bound increases significantly when the problem size increases to 80 variables. This is because we now have an enlarged set of active unexplored nodes. As branch-and-bound is search based, this results in solving many LP relaxations at nodes that lead to very small decreases in the upper bound of our maximization problem. Plots demonstrating the optimality gap over time for both methods are shown in Figure 4.1 and Figure 4.2 respectively.

Figure 4.1: GB performance over time
Figure 4.2: MP performance over time

Presented respectively in Figure 4.1 and Figure 4.2 is the performance of Gurobi branch-and-bound

and message passing for a particular 80 variable test problem. In this case, Gurobi converges to its

optimal solution 659.12 at 4400 seconds while message passing reaches a 2.74 % optimality gap with an

incumbent solution at 652.27 at 420 seconds. After 420 seconds, the upper bound does not decrease

by more than 1e-6 and the algorithm is terminated. Branch-and-bound only reaches the equivalent 2.74% optimality gap at 2300 seconds. We observe that although branch-and-bound finds the optimal solution as the incumbent quickly, proving optimality of this solution is lengthy, with small changes in its upper bound over time. In contrast, message passing does not find the optimal solution initially and must

update its incumbent four times before arriving at its final solution.

Furthermore, we can observe the impact of adding triplet clusters to the algorithm with the drastic

decrease in the bound at 260 seconds. Note that although we observe message passing arriving at a 2.74

% optimality gap faster than Gurobi, it does not decrease this gap further over time. Hence message passing allows us to conclude at 420 seconds only that the optimal solution lies between a value of 670.62 and

652.27. In this way, message passing can be especially useful for finding a sub-optimal solution with

some guarantees on its solution quality quickly.
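The optimality gap quoted for this example can be reproduced with a short calculation. The sketch below assumes the gap is measured relative to the upper bound, that is (upper bound − incumbent) / upper bound, since this convention reproduces the reported 2.74% from the values 670.62 and 652.27. The percent error relative to the exact optimum 659.12 is computed under an analogous assumed formula. Both function names are illustrative.

```python
def optimality_gap_percent(upper_bound, incumbent):
    """Gap between dual bound and incumbent, relative to the bound (assumed convention)."""
    return 100.0 * (upper_bound - incumbent) / upper_bound


def percent_error(optimal, incumbent):
    """Shortfall of the incumbent relative to the exact optimum (assumed convention)."""
    return 100.0 * (optimal - incumbent) / optimal


gap = optimality_gap_percent(670.62, 652.27)  # about 2.74
error = percent_error(659.12, 652.27)         # about 1.04
```

The gap can be computed without knowing the exact optimum, which is what makes it useful as a quality guarantee in practice, whereas the percent error requires the exact solution from branch-and-bound.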


4.5 Discussion

The experiments of this section demonstrated that message passing is effective in finding good sub-

optimal solutions of the original problem. Furthermore, although the solutions are sub-optimal, the

cluster tightening technique enables us to find good upper and lower bounds faster than exact branch-and-bound methods for large problem sizes. We attribute this to the search-based nature of branch-and-bound.

Branch-and-bound finds the bound on the incumbent solution by solving a series of LP relaxations for

each node in its search path. Although techniques like strong-branching, pseudo-costs, depth first and

best first can aid in choosing a search direction that will lead to a good bound change, there is no

guarantee that a large bound change will actually occur. In contrast, in the cluster tightening method, we explicitly choose clusters we know will decrease the bound the most, making a good bound decrease guaranteed. Accepting these deficiencies in standard branch-and-bound, hybrid branch-and-bound using the message passing dual to find better bounds has also been proposed [38]. Although we have good bounds on the solution with message passing, the bounds are never narrowed to converge to the exact solution.

This can be attributed to two reasons:

• MPLP message updates are not guaranteed to find the optimal dual objective value. This is due

to the integer decoding method in (2.8) where it is possible to have ties with the belief values. In

practice, ties are usually broken randomly.

• We can also cite a lack of tightening clusters and the cluster update schedule. While we used the

cluster tightening schedule developed by Sontag [35], cluster update methods are still an active

area of investigation and arriving at an exact solution is not guaranteed for general problems.

We are now satisfied that message passing can be used to find a good sub-optimal solution of quadratic

integer programs. If practitioners are under time constraints and wish to find a sub-optimal solution

quickly with certain guarantees, message passing is an alternative method to existing heuristics which

provide no optimality guarantees and branch-and-bound which is slow to provide the same guarantees.

We move our focus to a quadratic integer program with constraints, namely the portfolio optimization

problem.

Chapter 5

Message Passing Applied to Portfolio Optimization

The portfolio optimization problem was introduced in Chapter 3 as an important tool for practitioners

wishing to choose investment weights in assets that will minimize the portfolio variance while ensuring

a target return. Two important extensions of this model representing real-life practical concerns were

introduced as the cardinality constraint for limiting the size of the portfolio and the round-lot constraint

for forcing assets to be traded in large batches. While representing these practical concerns within the

mathematical framework is simple, solving the resulting problem is not. The resulting problem is a

quadratic integer program which requires extensive computation by standard commercial solvers to

arrive at an optimal solution. Alternative methods approaching this problem in the literature include

the use of evolutionary heuristics, heuristics changing the branch-and-bound framework and Lagrangian

relaxations. In this chapter we will explore the use of message passing algorithms for solving the desired

portfolio optimization problem.


5.1 Graphical Model Reformulation of the Portfolio Problem

The portfolio optimization problem defined in (3.11) - (3.18) may be reformulated as a MAP

problem over a graph G = (V, E). Each node i ∈ V represents a single asset in our problem. Each edge

ij ∈ E represents the covariance relationship between the two nodes it connects.

We begin by analyzing the knapsack problem, a structurally similar problem to our portfolio opti-

mization problem. Zhang et al. [50] give a form for a reformulation of the quadratic knapsack problem.

In a quadratic knapsack problem, one has a selection of n items. Each item i has a value qii if chosen

and selecting a pair of items i and j gives a value qij . Each item i also has an associated weight wi.

Furthermore, we have a limit on the amount of weight we can hold, denoted b. We may also assign

multiple sets of weights and limits to arrive at multiple weight constraints. The goal of this optimization

problem is to determine which items to choose that will lead to the most value, while satisfying our

weight constraints. It is posed as a mathematical program with multiple sets of weights and limits

k ∈ K as follows [45]:

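\begin{align}
\text{Maximize} \quad & \sum_{i=1}^{n} \sum_{j=1}^{n} x_i q_{ij} x_j \tag{5.1} \\
\text{Subject to} \quad & \sum_{i=1}^{n} w_i^k x_i \le b^k, \quad \forall\, k \in K, \tag{5.2} \\
& x_i \in \{0, 1\}, \quad \forall\, i = 1, \dots, n, \tag{5.3}
\end{align}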

where the chosen items are represented by binary decision variables: xi = 1 if item i is chosen and

xi = 0 if it is not chosen. Zhang et al. [50] propose the following graphical model reformulation of the

knapsack problem by including potential functions that represent the objective (5.1) and extra constraint

functions representing (5.2). While (5.2) is a global constraint that would normally be represented as a

cluster of n variables, Zhang et al.’s formulation expresses the constraints through clusters of only single

nodes. The knapsack problem may be reformulated as a graphical model through potential functions as

follows:

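\begin{align}
\theta_i(x_i) &= q_{ii} x_i^2, \quad \forall\, i \in V, \tag{5.4} \\
\theta_{ij}(x_i, x_j) &= 2 q_{ij} x_i x_j, \quad \forall\, ij \in E, \tag{5.5} \\
\phi_i^k(x_i) &= w_i^k x_i - \frac{b^k}{|V|}, \quad \forall\, i \in V,\ \forall\, k \in K. \tag{5.6}
\end{align}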

Here we identify every variable as a node with a potential function given by (5.4). Every pair of nodes is

connected by an edge forming a cluster with a potential function given by (5.5). Furthermore, we define

an additional constraint function for each node represented by (5.6). We observe the newly formulated

constraint function allows us to decompose the previous global constraint into a constraint involving only

singular nodes. Recalling that the MAP problem is maximizing the sum of all the potential functions,

we can add the additional single node constraints into the MAP problem formulation:

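\begin{align}
\text{Maximize} \quad & \sum_{i \in V} q_{ii} x_i^2 + \sum_{ij \in E} 2 q_{ij} x_i x_j \tag{5.7} \\
\text{Subject to} \quad & \sum_{i \in V} \left( w_i^k x_i - \frac{b^k}{|V|} \right) \le 0, \quad \forall\, k \in K. \tag{5.8}
\end{align}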

Observe that the decomposition of the knapsack constraint into single node constraints (5.8) is still

equivalent to the original problem constraint of (5.2) [50]. In this way, the inference scheme developed

by Zhang et al. introduced in section 2.4.1 can be tractably applied to this constrained Markov random

field [50].
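To see why the single-node decomposition preserves the original constraint, the constraint functions (5.6) can be summed over all nodes. The short derivation below is only a worked check using the symbols already defined, with b^k denoting the limit of constraint k:

```latex
\begin{align*}
\sum_{i \in V} \phi_i^k(x_i)
  &= \sum_{i \in V} \left( w_i^k x_i - \frac{b^k}{|V|} \right)
   = \sum_{i \in V} w_i^k x_i - |V| \cdot \frac{b^k}{|V|}
   = \sum_{i \in V} w_i^k x_i - b^k, \\
\sum_{i \in V} \phi_i^k(x_i) \le 0
  &\iff \sum_{i \in V} w_i^k x_i \le b^k .
\end{align*}
```

The constant b^k/|V| is simply an equal share of the limit assigned to each node, so the shares add back up to b^k.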

Returning to the portfolio problem, we may identify the cardinality and return constraints as types

of knapsack constraints. We can then follow Zhang et al.'s [50] formulation presented above. For our

specific portfolio problem, it is presented in terms of unary and pair-wise potentials as follows:

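\begin{align}
\theta_i(x_i) &= -q_{ii} x_i^2 \tag{5.9} \\
\theta_{ij}(x_i, x_j) &= -2 q_{ij} x_i x_j \tag{5.10} \\
\phi_i^1(x_i) &= \frac{1}{|V|} - x_i \tag{5.11} \\
\phi_i^2(x_i) &= \frac{R}{|V|} - \mu_i x_i \tag{5.12} \\
\phi_i^3(x_i) &= \mathbb{1}_{x_i \neq 0} - \frac{K}{|V|} \tag{5.13} \\
x_i &\in \left\{ 0, \dots, \frac{p_i M z_i}{V}, \dots, 1 \right\} \tag{5.14} \\
z_i &\in \mathbb{Z}_{+} \tag{5.15}
\end{align}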

We decompose the original quadratic objective in (3.11) into its components involving unary terms and

pair-wise terms. These are represented by the potential functions in (5.9) and (5.10). The constraints

(3.12) - (3.14) can be expressed as additional single node potentials (5.11) - (5.13). Constraints (3.15) -

(3.18) are handled in the graphical model by pre-determining what values xi can take, as in (5.14) - (5.15).

We note that the round-lot problem lends itself well to MAP reformulations. Message passing algo-

rithms have been explored extensively for problems where xi is limited to taking values from a discrete

set. In the round-lot problem, xi represents the proportion of V we invest in asset i. The total budget

V we have to invest in imposes a natural limit on xi values. Specifically

\begin{equation}
x_i \in \left\{ 0, \frac{p_i M \cdot 1}{V}, \frac{p_i M \cdot 2}{V}, \dots, 1 \right\}. \tag{5.16}
\end{equation}

Therefore, the choice of parameters pi, M and V directly affects how many possible states xi can take.

For a MAP problem, an increase in the number of possible states of xi leads to an increase in the

complexity of the problem. In general, taking values for pi, M and V typical for portfolio applications

leads to problem complexities that can be addressed with this message passing method.
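As a rough illustration of how the parameters determine the state space, the sketch below enumerates the admissible weights in (5.16). The function name `round_lot_states` is hypothetical, integer inputs are assumed, and the example values (price 50, lot size 100, budget 100,000) are the ones reported for the experiments in Section 5.2.

```python
from fractions import Fraction


def round_lot_states(price, lot_size, budget):
    """Return the admissible weights x_i of (5.16) as exact fractions.

    Assumes integer price, lot size and budget (hypothetical helper).
    """
    step = Fraction(price * lot_size, budget)   # p_i * M / V, the weight of one lot
    max_lots = int(1 / step)                    # largest z_i with z_i * step <= 1
    return [z * step for z in range(max_lots + 1)]


states = round_lot_states(price=50, lot_size=100, budget=100_000)
print(len(states), float(states[1]), float(states[-1]))
```

With these values one lot corresponds to a weight of 0.05, so each node has 21 possible states. If the lot value does not divide the budget evenly, the last state produced by this sketch falls slightly below 1.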

5.1.1 Solving the maximum-a-posteriori problem

On the graph defined above in (5.9)-(5.15), the MAP problem is unconstrained:

\begin{equation}
\text{Maximize} \quad -\sum_{i \in V} q_{ii} x_i^2 - 2 \sum_{ij \in E} q_{ij} x_i x_j \tag{5.17}
\end{equation}

Following Zhang et al. [50], the LP relaxation of (5.17) is constructed along with the resulting dual

problem. Constraints are added directly to the dual resulting in the following dual problem:

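\begin{align}
\text{Minimize} \quad & \sum_{i \in V} \max_{x_i} \left[ \theta_i(x_i) - \gamma_1 \phi_i^1(x_i) - \gamma_2 \phi_i^2(x_i) - \gamma_3 \phi_i^3(x_i) \right] \tag{5.18} \\
& + \sum_{ij \in E} \max_{x_i, x_j} \left[ \theta_{ij}(x_i, x_j) \right]. \notag
\end{align}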

With the dual problem (5.18) we observe the dual decomposes into |V| + |E| subproblems centered

around every singular node and every pair-wise set of nodes. For n assets this implies solving n + n(n−1)/2

subproblems. Applying the method of Zhang et al. [50] to the dual in (5.18) amounts to finding the

optimal configuration for x that maximizes the MAP problem in (5.17) but still satisfies the imposed

global constraints. For our problem with negative potential values, the dual solution provides a lower

bound to the exact solution while the decoded integral solution provides the upper bound. Thus, even

though we uncover sub-optimal solutions, their quality dictated by the corresponding integrality gap

can be assessed. We next describe how the dual and integral solutions are found at each iteration.
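The bounding property of (5.18) can be checked by brute force on a tiny instance. The following is a minimal sketch, not the thesis implementation: the three-asset data, the three-state domain and all function names are hypothetical, and it assumes the convention that a constraint is satisfied when its single-node terms sum to at most zero. It evaluates (5.18) with the raw potentials (no MPLP reparameterization) and confirms that, for non-negative γ, the decomposed value is never below the best feasible value of (5.17).

```python
import itertools

# Hypothetical toy instance; none of these numbers come from the thesis.
Q = [[0.30, 0.05, 0.10],
     [0.05, 0.20, 0.02],
     [0.10, 0.02, 0.25]]          # covariance matrix
mu = [0.3, 0.5, 0.4]              # expected returns
R, K = 0.3, 2                     # target return and cardinality limit
states = [0.0, 0.5, 1.0]          # assumed discrete weights per asset
n = len(mu)
edges = list(itertools.combinations(range(n), 2))


def phi(i, s):
    """Single-node constraint terms (5.11)-(5.13) for asset i in state s."""
    return (1.0 / n - s, R / n - mu[i] * s, float(s != 0) - K / n)


def map_objective(x):
    """Objective (5.17): the negated portfolio variance."""
    unary = sum(-Q[i][i] * x[i] ** 2 for i in range(n))
    pairwise = sum(-2.0 * Q[i][j] * x[i] * x[j] for i, j in edges)
    return unary + pairwise


def is_feasible(x):
    """Assumed convention: each constraint's node terms sum to <= 0."""
    return all(sum(phi(i, x[i])[k] for i in range(n)) <= 1e-12 for k in range(3))


def dual_value(gamma):
    """Objective (5.18): independent maximizations per node and per edge."""
    node_part = sum(
        max(-Q[i][i] * s ** 2 - sum(g * p for g, p in zip(gamma, phi(i, s)))
            for s in states)
        for i in range(n)
    )
    edge_part = sum(
        max(-2.0 * Q[i][j] * a * b for a in states for b in states)
        for i, j in edges
    )
    return node_part + edge_part


best_primal = max(
    map_objective(x) for x in itertools.product(states, repeat=n) if is_feasible(x)
)
for gamma in [(0.0, 0.0, 0.0), (0.3, 0.1, 0.05), (1.0, 1.0, 1.0)]:
    print(gamma, dual_value(gamma) >= best_primal)
```

Because the potentials are negated variances, an upper bound on the MAP value corresponds to a lower bound on the portfolio variance, which is the sense in which the dual brackets the exact solution from below.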

The procedure for solving the dual problem and decoding an integer solution for the specified portfolio

problem is outlined in Algorithm 4 and is amended from the pseudo-code provided in Algorithm 3 [50].

Algorithm 4 begins by initializing the predefined graphical structure of the problem. The goal of each

iteration is to solve for the optimal beliefs θ̃ and constraint penalty γ that result in an increase of the

dual objective function (5.18). Finding the optimal beliefs is handled by the dual increase guarantee

of Globerson and Jaakkola's MPLP message passing scheme [17]. At the beginning of each iteration,

the beliefs are thus updated according to MPLP. We now search for γ that leads to an increase in the

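Data: $\theta_i(x_i)$, $\theta_{ij}(x_i, x_j)$, $\phi_i^1(x_i)$, $\phi_i^2(x_i)$, $\phi_i^3(x_i)$ for $i \in V$, $ij \in E$; $\gamma$
Result: $x^*$

Initialize $\tilde{\theta} = \theta$, $f_{max} = \infty$
while the dual increase is not less than 1e-6 do
    Update beliefs $\tilde{\theta}_i(x_i) + \gamma_1 \phi_i^1(x_i) + \gamma_2 \phi_i^2(x_i) + \gamma_3 \phi_i^3(x_i)$, $\tilde{\theta}_{ij}(x_i, x_j)$ for $i \in V$, $ij \in E$ by MPLP
    for k = 1, 2, 3 do
        Update $\gamma_k$ by binary search
        Decode integer solution $\hat{x}_i = \arg\max [\tilde{\theta}_i(x_i) + \gamma_1 \phi_i^1(x_i) + \gamma_2 \phi_i^2(x_i) + \gamma_3 \phi_i^3(x_i)]$
        if $\hat{x}$ is feasible and $\sum_{i \in V} \theta_i(\hat{x}_i) + \sum_{ij \in E} \theta_{ij}(\hat{x}_i, \hat{x}_j) < f_{max}$ then
            $x^* = \hat{x}$
            $f_{max} = \sum_{i \in V} \theta_i(\hat{x}_i) + \sum_{ij \in E} \theta_{ij}(\hat{x}_i, \hat{x}_j)$
        end
    end
end

Algorithm 4: Message passing procedure for the portfolio problem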

dual objective. To do this, a co-ordinate descent scheme is used with all beliefs and γ fixed except for a

particular γk. We evaluate whether under the current γ assignment, the solution produced by optimizing

the dual subproblem for each node and edge results in a feasible solution. If the solution is not feasible,

we increase γk, else if it is, we decrease γk. This binary search procedure will find a sequence of γ

parameters that gradually increase the dual bound while ensuring feasibility. With a current θ̃ and γ,

the integer solution is decoded by finding the value of each node that maximizes each single node belief.

From an inference viewpoint, this is equivalent to finding the most probable state of each node. We

compute the integer solution’s objective value and compare it to the current incumbent. If it is less than

the incumbent, the solution updates that incumbent. We next assess whether the dual increase is less

than 1e-6. If it is, we conclude the dual can no longer increase and thus is our dual optimal solution and

our incumbent is the integral optimal solution. If it is not, we conclude we have not found the optimal

beliefs and repeat the procedure, going back to updating the beliefs.
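The feasibility-driven update of a single γk can be sketched as a bisection. The code below is a simplified illustration rather than the procedure of Zhang et al. [50]: it considers only the return constraint, keeps the beliefs fixed at the raw unary potentials, and assumes that feasibility is monotone in the penalty and that the upper end of the search interval is feasible. All names and data are hypothetical.

```python
def decode(gamma, q_diag, mu, target, states):
    """Pick, for each node, the state maximizing theta_i(x) - gamma * phi_i(x)."""
    n = len(q_diag)
    return [
        max(states,
            key=lambda s, i=i: -q_diag[i] * s ** 2 - gamma * (target / n - mu[i] * s))
        for i in range(n)
    ]


def meets_return(x, mu, target):
    """Global return constraint: sum_i mu_i * x_i >= target."""
    return sum(m * xi for m, xi in zip(mu, x)) >= target


def bisect_penalty(q_diag, mu, target, states, lo=0.0, hi=10.0, tol=1e-6):
    """Smallest penalty (up to tol) whose decoded solution is feasible.

    Assumes decode(hi) is feasible and that feasibility is monotone in gamma.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if meets_return(decode(mid, q_diag, mu, target, states), mu, target):
            hi = mid      # feasible: try a smaller penalty
        else:
            lo = mid      # infeasible: increase the penalty
    return hi


q_diag = [0.30, 0.20, 0.25]      # hypothetical variances
mu = [0.3, 0.5, 0.4]             # hypothetical expected returns
states = [0.0, 0.5, 1.0]         # hypothetical discrete weights
gamma = bisect_penalty(q_diag, mu, target=0.3, states=states)
print(round(gamma, 4), decode(gamma, q_diag, mu, 0.3, states))
```

In the full algorithm this search is repeated for each of the three constraint parameters in turn, with the other two held fixed, which is what makes the scheme a co-ordinate descent.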

Concurrent with implementing the above algorithm for our problem, we employ the triplet cluster-

ing scheme described in section 2.3.3 to ensure the linear relaxation of the problem is tight [36]. In

practice, we let Algorithm 4 run until convergence of the dual objective. At this point, Algorithm 4 is

run again with triplet clusters added at every iteration until the dual converges again [36].

5.2 Experiments

We split our analysis into small problem sizes (n = 20, 30, 40 assets) and large problem sizes (n = 50, 60,

70, 80, 90, 100 assets) as larger problems are more computationally demanding and require a separate

analysis. Problems are created by randomly generating a covariance matrix Q with entries taken in (0,1).

The returns for each asset are randomly generated with values taken from a uniform distribution in (0,1).

For all problems, we hold the following parameters constant: an investment budget of 100,000, 100

assets in a lot and a price of 50 for each asset. Each problem instance is tested under various constraints

relating to the target return and cardinality constraints. We vary the target return to take values R =

0.1, 0.2 and 0.3. We vary the cardinality constraints to fix the maximum number of assets in a portfolio

to K = 10, 20, 30, 40 and 50.

We test the message passing algorithm developed by Zhang et al. [50] with the graphical model de-

fined in (5.9)-(5.15) and refer to it as MP. The results of MP are benchmarked against the results

obtained by the default branch-and-bound algorithm of Gurobi 7.05 solver with all settings set to their

default options. For Gurobi, a cutoff time-limit of 24 hours was imposed for converging to an exact

solution. The MP and Gurobi algorithms were run on a 64-bit Lenovo E460 workstation with Intel

Processor i5-6200 2.30 GHz CPU and 4GB of RAM. All algorithms were implemented using Python 2.7.

The Python library PGMPY for probabilistic graphical models was used to employ the MPLP updates

and cluster tightening schemes [29].

Table 5.1 lists the average convergence time for MP and Gurobi, and the average percent difference

between the MP approximate solution and Gurobi’s exact solution for small problems. Tables 5.2, 5.3

and 5.4 list the convergence time for MP and Gurobi, as well as the branch-and-bound gap if the

time limit was reached before convergence for large problem instances. Furthermore, message passing’s

convergence time includes the preprocessing time for generating a graph and finding cluster tightening

triplets. Also included is the relative percent difference between MP and Gurobi's solution at termination.

An asterisk * indicates that Gurobi reached its time limit and terminated before convergence. Note

Table 5.1: Average results for small problem sizes with varying return and cardinality

K = 10

R     n    MP Time (s)  Gurobi Time (s)  Difference (%)
0.1   20         34.87            30.51            3.28
0.1   30        139.23          1073.50            4.28
0.1   40        382.66         29462.43            3.24
0.2   20         33.78            30.01            2.57
0.2   30        159.64           883.71            3.38
0.2   40        329.98         20395.65            2.50
0.3   20         34.99            34.59            2.04
0.3   30        123.01           905.10            3.12
0.3   40        337.70         25612.90            1.87

K = 20

R     n    MP Time (s)  Gurobi Time (s)  Difference (%)
0.1   20         34.67             0.12            0.47
0.1   30        146.72           976.35            1.08
0.1   40        399.84         29845.10            2.32
0.2   20         37.85             0.09            0.80
0.2   30        131.85           764.47            0.93
0.2   40        313.77         18683.20            2.03
0.3   20         34.28            0.093            0.54
0.3   30        148.77           859.47            2.15
0.3   40        376.76         18607.80            2.38

we expect a small percent difference for instances with a high branch-and-bound gap as this solution is

sub-optimal.

Though MP returns sub-optimal solutions, observed percent differences are consistently within 5%.

We observe MP convergence time is dominated by the number of variables in the problem with

little relative effect when varying parameters like target return or cardinality. On the other hand, Gurobi

converges faster as the cardinality increases. This is expected as Gurobi takes advantage

of adding meaningful cuts to the enlarged feasible region. Along with cardinality, Gurobi is similarly

affected by problem size with an increase in convergence time as the number of variables increases and

the set of candidate nodes to explore is increased. From our results, Gurobi computation time exceeds

24 hours when considering more than 50 assets. Even at the 24 hour cut-off Gurobi’s branch-and-bound

gap is often far from optimality and consistently greater than 30% for instances with more than 80

variables. An exception to this occurs at the small problem case where n = 20, K = 20. Here the

cardinality constraint is redundant. Gurobi recognizes this to solve the problem within a fraction of a

second.

Given that a sub-optimal solution is obtained from Algorithm 4, we wish to be assured of its solution

quality. Tables 5.5, 5.6 and 5.7 compare the time it takes MP and Gurobi to provide a relative guarantee

on their incumbent solution quality. For each of the previous trials over n = 50, 60, 70, 80, 90 and 100

assets we re-list MP’s convergence time and include its integrality gap at convergence and Gurobi’s time

Table 5.2: Results for large problem sizes with varying return and cardinality K = 10, 20

K = 10

R     n    MP Time (s)  Gurobi Time (s)  Gap (%)  Difference (%)
0.1   50        721.58         85672.18     0.05            4.76
0.1   60       2617.14                *    47.61            1.66
0.1   70       4756.99                *    67.37            1.88
0.1   80       6680.18                *    75.88            1.25
0.1   90      10181.58                *    84.28            0.37
0.1   100     13119.33                *    96.29            0.44
0.2   50        736.93          85481.1     0.07            4.74
0.2   60       2931.05                *    43.93            2.77
0.2   70       4019.76                *    65.23            1.85
0.2   80       6633.04                *    79.57            0.91
0.2   90     100236.20                *    82.65            0.88
0.2   100     13560.13                *    97.81            0.53

K = 20

R     n    MP Time (s)  Gurobi Time (s)  Gap (%)  Difference (%)
0.1   50        739.62         81158.41     0.01            3.79
0.1   60       3078.61                *    25.12            2.87
0.1   70       4923.01                *    36.84            2.09
0.1   80       6201.97                *    39.80            2.59
0.1   90      10550.53                *    52.26            1.97
0.1   100     13906.98                *    71.00            0.62
0.2   50        754.85          76430.5     0.04            4.31
0.2   60       2457.85                *    26.30            2.62
0.2   70       4670.07                *    47.69            1.81
0.2   80       6668.46                *    58.77            1.45
0.2   90      10919.25                *    69.39            1.30
0.2   100     13159.07                *    76.12            0.80


Table 5.3: Results for large problem sizes with varying return and cardinality K = 30, 40

K = 30

R     n    MP Time (s)  Gurobi Time (s)  Gap (%)  Difference (%)
0.1   50        753.71          79771.7     0.07            3.28
0.1   60       2331.36                *    25.75            2.59
0.1   70       4560.15                *    32.87            2.05
0.1   80       6373.70                *    42.92            2.98
0.1   90      10580.72                *    44.24            2.49
0.1   100     13904.65                *    55.62            2.30

K = 40

R     n    MP Time (s)  Gurobi Time (s)  Gap (%)  Difference (%)
0.1   50        754.85          7974.26     0.08            3.63
0.1   60       2273.65                *    15.58            3.70
0.1   70       4912.78                *    24.41            3.91
0.1   80       6358.71                *    38.11            2.69
0.1   90      10225.33                *    47.63            2.04
0.1   100     13612.86                *    44.25            2.91



Table 5.4: Results for large problem sizes with varying return and cardinality K = 50

K = 50

R     n    MP Time (s)  Gurobi Time (s)  Gap (%)  Difference (%)
0.1   50        750.44         60368.08     0.06            3.73
0.1   60       2888.80                *     9.11            2.20
0.1   70       4009.16                *    28.10            2.03
0.1   80       6379.98                *    33.64            1.09
0.1   90      10112.13                *    39.29            1.16
0.1   100     13132.61                *    44.14            1.10
0.2   50        718.25          65569.4     0.08            3.97
0.2   60       2191.63                *     9.68            2.14
0.2   70       4117.97                *    29.82            2.29
0.2   80       6958.31                *    31.68            2.11
0.2   90      10657.93                *    42.54            1.53
0.2   100     13457.24                *    47.94            1.41
0.3   50        748.58          62853.8     0.09            3.26
0.3   60       2255.76                *     9.77            2.44
0.3   70       4075.90                *    24.62            2.83
0.3   80       6775.62                *    34.97            1.52
0.3   90      10036.42                *    47.89            1.66
0.3   100     13611.93                *    49.74            1.46

to arrive at the same integrality gap. From zD, the dual optimal objective obtained from (5.18), and zI, the

objective value of the decoded integer solution, we compute the integrality gap. We define the integrality

gap in the same way as Gurobi's "MIP Gap" value: $|z_D - z_I| \cdot 100 / z_I$. It is evident that MP finds tighter bounds and

can provide solution quality guarantees significantly faster than Gurobi, with Gurobi exceeding the time

limit for all problems with more than 50 assets.
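For concreteness, the gap computation can be written as a small helper. This is a sketch only: the function name and the example values are hypothetical, and the absolute value in the denominator follows Gurobi's documented MIP gap definition rather than anything stated in this chapter.

```python
def integrality_gap_percent(z_dual, z_int):
    """Relative gap |z_D - z_I| / |z_I|, expressed in percent."""
    if z_int == 0:
        raise ValueError("the relative gap is undefined when z_I is zero")
    return abs(z_dual - z_int) / abs(z_int) * 100.0


# Hypothetical objective values: a dual bound and a decoded integer solution.
print(integrality_gap_percent(z_dual=0.0987, z_int=0.1000))
```

A gap of zero would certify that the decoded integer solution is optimal. Any positive gap bounds how far the incumbent can be from the optimum.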

Figure 5.1 plots MP's and Gurobi's incumbent solutions, the MP dual, and Gurobi's

best bound over time. Here we observe that MP converges to its dual optimal value significantly faster

than it takes Gurobi to converge to the exact solution. This figure is based on an instance with 30 assets,

a portfolio target return of 0.1 and a cardinality of 20. As outlined in Zhang et al. [50] convergence

is defined when the dual decrease is less than 1e-6. This occurs at 130 seconds for an integrality gap

of 1.3% with the decoded integer solution. In contrast, Gurobi’s branch-and-bound lower bound moves

slowly and only reaches an equivalent gap at 450 seconds. This plot echoes what was observed for the

Table 5.5: Time comparison of MP and Gurobi to arrive at tight bounds for K = 10, 20

K = 10

R     n    MP Int Gap (%)  MP Time (s)  Gurobi Time (s)
0.1   50             3.23       721.58         75487.18
0.1   60             5.08      2617.14                *
0.1   70             4.19      4756.99                *
0.1   80             5.36      6680.18                *
0.1   90             5.54     10181.58                *
0.1   100            5.99     13119.33                *
0.2   50             3.04       736.93         71872.85
0.2   60             5.42      2931.05                *
0.2   70             5.79      4019.76                *
0.2   80             4.65      6633.04                *
0.2   90             5.89    100236.20                *
0.2   100            5.28     13560.13                *

K = 20

R     n    MP Int Gap (%)  MP Time (s)  Gurobi Time (s)
0.1   50             3.57       739.62         77788.29
0.1   60             4.55      3078.61                *
0.1   70             5.07      4923.01                *
0.1   80             5.20      6201.97                *
0.1   90             4.24     10550.53                *
0.1   100            5.61     13906.98                *
0.2   50             5.43       754.85        756430.53
0.2   60             4.86      2457.85                *
0.2   70             5.77      4670.07                *
0.2   80             5.91      6668.46                *
0.2   90             5.36     10919.25                *
0.2   100            4.63     13159.07                *


unconstrained quadratic integer programs in Chapter 4. Although message passing arrives at a tighter

gap faster, we do not converge to an exact solution, and at best can only give the bounds on the optimal

value. On the other hand, branch-and-bound does converge to an exact solution but tightening the

bounds around the solution is a slow process. In all, we can observe that message passing is a viable

alternative to practitioners who wish to arrive at a quick sub-optimal solution with certain optimality

guarantees about the solution quality.

Figure 5.1: Performance of MP and Branch-and-Bound Bounds

Table 5.6: Time comparison of MP and Gurobi to arrive at tight bounds for K = 30, 40

K = 30

R     n    MP Int Gap (%)  MP Time (s)  Gurobi Time (s)
0.1   50             3.41       753.71         79771.78
0.1   60             5.91      2331.36                *
0.1   70             5.02      4560.15                *
0.1   80             5.16      6373.70                *
0.1   90             4.13     10580.72                *
0.1   100            5.26     13904.65                *

K = 40

R     n    MP Int Gap (%)  MP Time (s)  Gurobi Time (s)
0.1   50             3.80       754.85         59074.31
0.1   60             4.29      2273.65                *
0.1   70             5.21      4912.78                *
0.1   80             4.26      6358.71                *
0.1   90             5.59     10225.33                *
0.1   100            5.64     13612.86                *



Table 5.7: Time comparison of MP and Gurobi to arrive at tight bounds for K = 50

K = 50

R     n    MP Int Gap (%)  MP Time (s)  Gurobi Time (s)
0.1   50             3.44       750.44         60368.08
0.1   60             5.95      2888.80                *
0.1   70             5.27      4009.16                *
0.1   80             5.56      6379.98                *
0.1   90             4.33     10112.13                *
0.1   100            4.69     13132.61                *
0.2   50             3.39       718.25         52242.30
0.2   60             5.93      2191.63                *
0.2   70             5.52      4117.97                *
0.2   80             4.26      6958.31                *
0.2   90             5.35     10657.93                *
0.2   100            5.51     13457.24                *
0.3   50             5.62       748.58         56687.96
0.3   60             4.37      2255.76                *
0.3   70             5.22      4075.90                *
0.3   80             5.43      6775.62                *
0.3   90             3.33     10036.42                *
0.3   100            5.24     13611.93                *

5.3 Discussion

We investigate the application of message passing and graphical reformulation techniques to portfolio

optimization problems with combinatorial constraints. Message passing has been heralded as a new

technique to solve certain hard optimization problems sub-optimally but relatively quickly. We find in

the case of the portfolio problem with round-lot and cardinality constraints message passing compares

favorably to a standard commercial solver and returns sub-optimal solutions within a 5% error. Further-

more, message passing quickly provides tight upper and lower bounds on the optimal value which are

stable over time. We note that these message passing techniques cannot replace optimization solvers and

are only efficient when the problem’s graphical representation can be decomposed into small clusters as

in the case presented in this chapter. Furthermore, the message passing scheme provides an incumbent

integer solution but does not converge to an exact solution. We cite three reasons for this:

• The MPLP message update scheme is not guaranteed to find the optimal dual objective value.

This was mentioned as a reason in Chapter 4 and is especially relevant here as variables are no

longer binary and can take values from multiple states. This complicates the integer decoding

scheme (2.8) as we can have multiple ties for optimal values. In practice, ties are usually broken

randomly.

• The binary search scheme for finding the γ constraint parameters is not guaranteed to find the

optimal ones. Even though we are guaranteed an improvement on the bound, this ends when

γ can no longer be updated. The inclusion of the three extra variables to optimize over may

be the reason this portfolio experiment observed higher message passing optimality gaps and a

higher percent error than the unconstrained quadratic problem in Chapter 4. The formulated

unconstrained quadratic integer problem did not have these constraint variable terms.

• We may also lack effective tightening clusters. Again, as mentioned in Chapter 4, optimal

cluster tightening schedules are an open area of investigation and we chose a schedule that was

demonstrated to work well in practice.

Chapter 6

Conclusion

6.1 Concluding Remarks

In this thesis, we model the portfolio optimization problem with cardinality and round-lot constraints

as a probabilistic graphical model by incorporating pair-wise covariance terms within the graph's edges

and singular variance terms at the graph’s nodes. The considered portfolio problem is a quadratic

integer problem which is NP-hard. This requires an exhaustive and computationally expensive branch-

and-bound algorithm to determine the optimal portfolio investment weights of our model. This thesis

demonstrates that the portfolio problem can be equivalently reformulated as a MAP inference problem

concerned with determining the most probable assignment of the graph’s random variables. Using mes-

sage passing algorithms built for MAP inference, we can find good sub-optimal solutions for a varying

number of returns, cardinalities and asset pool sizes.

This thesis conducts a thorough computational study to determine the effectiveness of the message passing

algorithm in comparison with the branch-and-bound algorithm for the portfolio problem. Branch-and-bound

is effective at finding optimal portfolios for small problem sizes, but as the size of the search tree increases,

proving optimality by tightening the bounds becomes challenging. In comparison, message passing produces

sub-optimal portfolios that are consistently within 5% of the exact branch-and-bound solution, and

can provide tighter bounds on the optimal value faster through the implementation of a cluster tightening

procedure. In all, message passing algorithms are a scalable alternative to computationally exhaustive

branch-and-bound methods, especially when one is interested in fast sub-optimal solutions with certain

optimality guarantees.

While probabilistic graphical models provide an intuitive and approachable way to capture the vari-

able dependencies of the problem, there are still limits to employing message passing as a general integer

programming solver. Mainly, as we rely on the popular linear relaxation approaches, tightness is not

guaranteed and we must utilize additional tightening approaches to arrive at good solutions. Further-

more, the dual solution is not guaranteed to be optimal so convergence to an exact solution is not

guaranteed either. We are also limited to problems where the objective and constraints can be easily

decomposed into potential functions on singular or pair-wise clusters of nodes.

6.2 Future Work

Based on the results observed from applying message passing to a constrained optimization problem,

the following recommendations can be made about future directions of research.

• Solving constrained Markov random fields representing global constraints on the objective function

is relatively new research with the method of Zhang et al. [50] published in 2017. As the method of

Zhang et al. relies on the ability for a constraint to be decomposed into singular or pair-wise sets

of nodes exclusively, research for incorporating more general constraints needs to be conducted.

Furthermore, research into finding an optimal constraint parameter γ must also be conducted so

that adding constraints does not decrease the tightness of the problem.

• As linear programming relaxation approaches to message passing rely on solving a relaxation of

the original problem, there is an opportunity to develop other more efficient tightening schemes

than cluster tightening. Furthermore, within cluster tightening, more research needs to be done

on optimal cluster tightening schedules and the size and quantity of clusters that should be added

at a time.

• In many message update rules, the optimal solution may not be decoded due to the fact that

ties can occur when decoding the integer solution by taking the maximum belief at each node. As

current implementations usually break ties by taking a random value, further research into creating

better tie-breaking methods must be conducted.

• As measured quantities usually contain some uncertainty, the study of robustness for portfolio

problems is especially prominent in the literature. While robust optimization algorithms are developed

for discrete and continuous problems, they have not been widely studied in the context of infer-

ence in graphical models. The underlying probability structure behind Markov random fields in

particular may be an interesting next step to study robust message passing algorithms.

Bibliography

[1] Aguiar P, Xing P, Figueiredo M, Smith A, Martins A (2011) An augmented lagrangian approach to

constrained MAP inference. International Conference on Machine Learning

[2] Bishop C (2006), Pattern recognition and machine learning. Springer

[3] Bienstock D (1996) Computational study of a family of mixed-integer quadratic programming prob-

lems. Mathematical Programming 74: 121 - 140

[4] Bonami P, Lejeune M (2009) An exact solution approach for portfolio optimization problems under

stochastic and integer constraints. Operations research 57(3): 650-670

[5] Boros E (2007) Local search heuristics for quadratic unconstrained binary optimization (QUBO).

Journal of heuristics 13(2): 99 - 132

[6] Vandenberghe L, Boyd S (1996). Semidefinite programming. SIAM Review 38(1): 49-95

[7] Chang T, Meade N, Beasley J, Sharaiha Y (2000) Heuristics for cardinality constrained portfolio

optimisation. Computers & operations research 27: 1271-1302

[8] Conforti M, Cornuejols G, Zambelli G (2014) Integer programming. Springer

[9] Cormen T, Stein C, Rivest R, Leiserson C (2001) Introduction to algorithms. McGraw-Hill Higher

Education

[10] Cornuejols G, Tutuncu R (2006). Optimization methods in finance: mathematics, finance and risk.

Cambridge University Press: 217 - 221

[11] Crama Y, Schyns M (2003) Simulated annealing for complex portfolio selection problems. European

Journal of Operational Research, 150(3):546-571

[12] Dhoot A (2016) Wind farm layout optimization using approximate inference in graphical models.

Dissertation, University of Toronto

[13] Duchi J, Tarlow D, Elidan G, Koller D (2006) Using combinatorial optimization within max-product

belief propagation. Conference on Neural Information Processing Systems

[14] Fabozzi J, Kolm P (2007) Robust portfolio optimization and management. John Wiley and Sons,

88-110

[15] Felzenszwalb P, Huttenlocher D (2006) Efficient belief propagation for early vision. International

journal of computer vision 70(1):41-54

[16] Geoffrion A (2010) Lagrangian relaxation for integer programming. Springer: 243-281

[17] Globerson A, Jaakkola T (2007) Fixing max-product: convergent message passing algorithms for

MAP LP-relaxations. Advances in Neural Information Processing Systems

[18] Inc. Gurobi Optimization (2017) Gurobi optimizer reference Manual

[19] Hansen P, Mladenovic N (2001) Variable neighborhood search: Principles and applications. European

Journal of Operational Research, 130(3):449-467

[20] Holland J (1992) Adaptation in natural and artificial Systems: An introductory analysis with

applications to biology, control and artificial intelligence. MIT Press

[21] Jobst N, Horniman M, Lucas C, Mitra G (2001) Computational aspects of alternative portfolio

selection models in the presence of discrete asset choice constraints. Journal of quantitative finance 1:

1-13

[22] Kochenberger G (2014) The unconstrained binary quadratic programming problem: a survey. J

Comb Optim 28:58-81

[23] Koller D (2009) Probabilistic Graphical Models, The MIT Press

[24] Kolmogorov V (2006) Convergent tree-reweighted message passing for energy minimization. Pattern

Analysis and Machine Intelligence, IEEE Transactions, 28(10): 1568-1583

[25] Lazic N, Frey B, P. Aarabi (2010) Solving the uncapacitated facility location problem using message

passing algorithms. International Conference on Artificial Intelligence and Statistics

[26] Lim Y, Jung K, Kohli P (2014) Efficient energy minimization for enforcing label statistics. IEEE

Transactions on Pattern Analysis and Machine Intelligence

[27] Markowitz HM (1952) Portfolio selection. Journal of Finance, 7: 77-91

[28] Moallemi C, Van Roy B (2011) Resource allocation via message passing. INFORMS journal on

Computing 23(2):205-219

[29] Panda A, Ankan A (2015) Mastering probabilistic graphical models with Python. Packt Publishing

[30] Park J, Boyd S (2017) A semidefinite programming method for integer convex quadratic minimiza-

tion. arXiv preprint arXiv:1703.07870

[31] Pearl J (1982) Reverend Bayes on inference engines: a distributed hierarchical approach. Proceed-

ings of the second national conference on artificial intelligence

[32] Ravanbakhsh S, Greiner R (2014) Perturbed message passing for constraint satisfaction problems.

arXiv:1401.6686.

[33] Ravanbakhsh S, Rabbany R, Greiner R (2014) Augmentative message passing for traveling salesman

problem and graph partitioning. arXiv:1406.0941

[34] Sanghavi S, Malioutov D, and Willsky A (2008) Linear programming analysis of loopy belief prop-

agation for weighted matching. Advances in neural information processing systems

[35] Sontag D (2010) Approximate inference in graphical models using LP relaxations. Dissertation,

Massachusetts Institute of Technology

[36] Sontag, D (2008) Tightening LP Relaxations for MAP using Message-Passing. Conference in Un-

certainty in Artificial Intelligence, 2008

[37] Sun J, Zheng N and Shum H (2003) Stereo matching using belief propagation . Pattern Analysis

and Machine Intelligence, IEEE Transactions 25(7): 787-800

[38] Sun M, Telaprolu M, Lee H, Savarese S (2012) Efficient and exact MAP-MRF inference using branch

and bound . IEEE conference on Computer Vision and Pattern Recognition: 1616-1623

[39] Tarlow D, Givoni E, Zemel S (2010) HOP-MAP: Efficient message passing with high order potentials.

In AISTATS, 812-819

[40] Wainwright M, Jaakkola T, Willsky A (2005) MAP estimation via agreement on trees: message-

passing and linear programming. Information Theory, IEEE Transactions 11:3697-3717

[41] Weigt M and Zhou H (2006), Message passing for vertex covers. Physical Review E 74(4)

[42] Wolsey L (1998) Integer Programming. Wiley

[43] Kumar A (2011) Message-passing algorithms for quadratic programming formulations of MAP es-

timation. 2011 Conference on Uncertainty in Artificial Intelligence: 428-435

[44] Ravikumar P (2006) Quadratic programming relaxations for metric labeling and markov random

field map estimation. Proceedings of the 23rd international conference on Machine learning: 734 - 744

[45] Wang H, Kochenberger G, Glover F (2012) A computational study on the quadratic knapsack

problem with multiple constraints. Computers & Operations Research 39(1):3-11

[46] Yanover C, Meltzer T, Weiss Y (2006) Linear programming relaxations and belief propagation–an

empirical study. The Journal of Machine Learning Research 7:1887-1907

[47] Yanover C, Weiss Y (2002) Approximate inference and protein-folding. Advances in neural infor-

mation processing systems

[48] Yedidia J (2011) Message-passing algorithms for inference and optimization. Journal of Statistical

Physics 145(4):860-890

[49] Zenios (2007). Practical financial optimization: decision making for financial engineers. Wiley

[50] Zhang Z, Shi Q, McAuley J, Wei W, Zhang Y, Yao R, van den Hengel A (2017) Solving con-

strained combinatorial optimization problems via MAP inference without high-order penalties. AAAI

Conference on Artificial Intelligence
