State tracking of linear ensembles via optimal mass transport

Yongxin Chen and Johan Karlsson

Abstract— We consider the problem of tracking an ensemble of indistinguishable agents with linear dynamics based only on output measurements. In this setting, the dynamics of the agents can be modeled by distribution flows in the state space, and the measurements correspond to distributions in the output space. In this paper we formulate the corresponding state estimation problem using optimal mass transport theory with prior linear dynamics, and the optimal solution gives an estimate of the state trajectories of the ensemble. For general distributions of systems this can be formulated as a convex optimization problem which is computationally feasible when the number of state dimensions is low. In the case where the marginal distributions are Gaussian, the problem is reformulated as a semidefinite program and can be solved efficiently for tracking systems with a large number of states.

I. INTRODUCTION

The optimal mass transport theory provides a geometric framework for mapping one distribution to another in a way that minimizes the total transport cost [20]. It has been used in many contexts, traditionally in economics and logistics [12], and more recently in imaging and machine learning [2], [11], [14], [16] as well as in systems and control [6], [7], [10]. When the transport cost is quadratic, the problem may be formulated as a fluid dynamics problem [1]. It can also be viewed as an optimal control problem for the density of particles that obey the dynamics ẋ(t) = u(t) [5]. The subsequent paper [8] introduced a natural generalization of this problem in which the underlying linear dynamics of the particles become ẋ(t) = Ax(t) + Bu(t).

In this work we consider the extension of this framework to the case where full state information is not available. In particular, we consider tracking problems where we seek to estimate the states of several identical and indistinguishable systems from only their joint outputs. This is also known as state estimation of ensembles, see [22], [21]. One of the main obstacles is that it is not known which output is generated by which subsystem, hence an association problem has to be solved. A brute force approach would result in a combinatorial problem. However, by formulating this as an optimal transport problem, we obtain a convex problem where the number of variables grows only linearly in the number of stages. Our formulation not only allows

*This work was in part supported by the Swedish Research Council (VR) grant 2014-5870 and the Swedish Foundation for Strategic Research (SSF) grant AM13-0049.

Y. Chen is with the Department of Electrical and Computer Engineering, Iowa State University, email: [email protected]. J. Karlsson is with the Department of Mathematics, KTH Royal Institute of Technology, email: [email protected].

for tracking a finite number of particles, but also applies to the more general problem of tracking distributions. We also consider the case when the underlying distributions are Gaussian. In this case the number of variables can be reduced significantly and the problem can be solved efficiently with a large number of state dimensions.

At a high level, we are developing a framework to smoothly interpolate a sequence of probability densities, akin to smoothly interpolating several points in Euclidean space. Indeed, the cubic spline interpolation of several points has a variational formulation in the flavor of optimal control [15]. The cubic spline counterpart in the space of distributions has recently been studied in [4], [3]. Our work can be viewed as a generalization of these, where more general underlying linear dynamics, instead of a simple second-order integrator, are considered.

The rest of the paper is structured as follows. Section II is a brief introduction to optimal mass transport theory. In Section III, we formulate the tracking problem using optimal mass transport theory. The case where the densities are assumed Gaussian is discussed in Section IV. Finally, we present several numerical examples in Section V to illustrate our framework.

II. BACKGROUND ON OPTIMAL MASS TRANSPORT

Monge's original formulation of optimal mass transport is as follows (see, e.g., [20]). Consider two nonnegative distributions, ρ0 and ρ1, of the same mass, defined on a compact set X ⊂ R^n. The optimal mass transport problem seeks a transport function f : X → X that minimizes the transportation cost

∫_X c(x, f(x)) ρ0(x) dx

over all mass preserving maps from ρ0 to ρ1, namely,

∫_{x∈U} ρ1(x) dx = ∫_{f(x)∈U} ρ0(x) dx   for all U ⊂ X,   (1)

which is often denoted f#ρ0 = ρ1. The function c(x0, x1) : X × X → R+ is a cost function that describes the cost of transporting a unit mass from x0 to x1. Monge's problem is usually difficult to solve due to the nontrivial constraint f#ρ0 = ρ1, and the minimum may not exist. To overcome these difficulties, Kantorovich proposed the linear programming relaxation

min_{π∈Π(ρ0,ρ1)} ∫_{X×X} c(x0, x1) dπ(x0, x1)   (2)

arXiv:1803.02296v1 [cs.SY] 6 Mar 2018


where Π(ρ0, ρ1) denotes the set of all joint distributions between ρ0 and ρ1. In fact, when ρ0 and ρ1 are absolutely continuous, the two formulations are equivalent.

When the cost function is quadratic, i.e., c(x0, x1) = ‖x0 − x1‖²₂, the optimal mass transport problem can be set up as an optimal control problem in fluid dynamics [1]

min_{u,ρ} ∫_0^1 ∫_{x∈X} ‖u(t, x)‖² ρ(t, x) dx dt

subject to ∂ρ/∂t + ∇·(uρ) = 0,
           ρk = ρ(k, ·), for k = 0, 1.

This can be interpreted as an optimization problem where the mass distribution is represented by infinitesimal particles, each carrying a cost corresponding to the optimal control problem

min_u ∫_0^1 ‖u(t)‖² dt

subject to ẋ(t) = u(t),
           x(0) = x0 and x(1) = x1,

where x0, x1 are the initial and final positions of the particle, respectively. Hence, choosing the quadratic cost in the optimal mass transport problem can be seen as assuming the underlying dynamics to be ẋ(t) = u(t).

In [8] this was generalized by replacing the cost function with one that reflects the deviation from the trajectories of the underlying system dynamics. It is associated with the linear dynamics

ẋ(t) = Ax(t) + Bu(t)   (3)

and the optimal control problem

min_u ∫_0^1 ‖u(t)‖² dt

subject to ẋ(t) = Ax(t) + Bu(t),
           x(0) = x0 and x(1) = x1.

The cost is then given by

c(x0, x1) = (x1 − Φx0)^T Q (x1 − Φx0),   (4)

where Φ = e^A, Q = M10^{−1}, and

M10 = ∫_0^1 e^{A(1−τ)} B B^T e^{A^T(1−τ)} dτ

is the controllability Gramian. Note that it reduces to the standard cost c(x0, x1) = ‖x0 − x1‖² when A = 0 and B = I.
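For concreteness, the quantities in (4) can be evaluated numerically. The following Python sketch (our illustration with our own function names; the paper's experiments used Matlab/CVX) computes Φ and Q by simple trapezoidal quadrature of the Gramian:

```python
import numpy as np
from scipy.linalg import expm

def transport_cost_params(A, B, n_quad=2001):
    """Compute Phi = e^A and Q = M10^{-1}, where M10 is the
    controllability Gramian over the unit interval, as in (4)."""
    taus = np.linspace(0.0, 1.0, n_quad)
    # integrand e^{A(1-tau)} B B^T e^{A^T(1-tau)} on a uniform grid
    vals = np.array([expm(A * (1 - t)) @ B @ B.T @ expm(A.T * (1 - t))
                     for t in taus])
    # trapezoidal quadrature weights
    w = np.full(n_quad, 1.0)
    w[0] = w[-1] = 0.5
    M10 = (vals * w[:, None, None]).sum(axis=0) / (n_quad - 1)
    return expm(A), np.linalg.inv(M10)

def transport_cost(x0, x1, Phi, Q):
    """Cost c(x0, x1) = (x1 - Phi x0)^T Q (x1 - Phi x0)."""
    d = x1 - Phi @ x0
    return float(d @ Q @ d)
```

For A = 0 and B = I the Gramian is the identity, so the cost reduces to ‖x0 − x1‖², in line with the remark above.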

III. TRACKING WITH OPTIMAL MASS TRANSPORT FROM OUTPUT MEASUREMENTS

We next extend this connection between optimal transport and optimal control to systems with output measurements. To this end, assume that the underlying dynamics and output measurements correspond to the linear system

ẋ(t) = Ax(t) + Bu(t)   (5a)
y(t) = Cx(t)           (5b)

where A ∈ R^{n×n}, B ∈ R^{n×p}, C ∈ R^{m×n} and (A, B) is a controllable pair. We seek to track the time varying distribution ρt, where each particle abides by (5a), based on the output distributions ρ̄k = C#ρk at the times k = 0, 1, . . . , T. Note that only the distribution of the output is available. We do not have access to information about each particle, i.e., the particles are indistinguishable. For example, with a finite ensemble (x1(t), . . . , xN(t)), then ρt = Σ_{i=1}^N δ(x − xi(t)) ∈ M+(R^n) for t ∈ [0, T] and the outputs are ρ̄k = Σ_{i=1}^N δ(y − Cxi(k)) ∈ M+(R^m) for k = 0, 1, . . . , T, where δ is the Dirac delta function and M+(R^n) is the set of nonnegative measures on R^n. For identifiability of this problem, see [22].

We propose to model this tracking problem as the following optimal mass transport problem. We seek a flow of nonnegative measures ρ : t → M+(X) that minimizes

min_{u,ρ} ∫_0^T ∫_{x∈X} ‖u(t, x)‖² ρ(t, x) dx dt   (6a)

subject to ∂ρ/∂t + ∇·((Ax + Bu)ρ) = 0   (6b)
           ρ̄k = C#ρ(k, ·), for k = 0, 1, . . . , T.   (6c)

Reformulating this using the Kantorovich formulation of the optimal transport problem, we arrive at the linear programming problem

min_{ρk∈M+(X), πk∈M+(X²)} Σ_{k=0}^{T−1} ∫_{(xk,xk+1)∈X²} c(xk, xk+1) dπk(xk, xk+1)   (7a)

subject to ∫_{xk+1∈X} dπk(xk, xk+1) = dρk(xk)   (7b)
           ∫_{xk∈X} dπk(xk, xk+1) = dρk+1(xk+1)   (7c)
           ρ̄k = C#ρk, for k = 0, 1, . . . , T,   (7d)

where the cost function c is given by (4). This optimization problem has a dual formulation in terms of continuous test functions on C(X), denoted Cont(C(X)).

Theorem 1: The optimization problem (7) is the dual of

max_{φk∈Cont(C(X)), k=0,1,...,T} Σ_{k=0}^T ∫_{C(X)} φk(yk) dρ̄k(yk)   (8a)

subject to Σ_{k=0}^T φk(Cxk) ≤ Σ_{k=1}^T c(xk−1, xk)   (8b)
           for all xk ∈ R^n, k = 0, . . . , T,

and the duality gap is zero.

Proof: See the appendix for a proof.


The proof is derived by noting that (7) is a multimarginal optimal mass transport problem [18] with a cost that separates into T terms, each of which only depends on two of the dimensions.

Both formulations (7) and (8) are linear programming problems that can be discretized and solved using general purpose solvers if the number of states is small. The number of variables grows linearly in T but exponentially in the number of state dimensions, and thus they suffer from the curse of dimensionality. An alternative approach would be to test all associations between measurements and particles, but this would require a combinatorial search that grows exponentially in T. Yet another possible approach for addressing this problem is to use heuristics along the lines of K-means clustering, as proposed in [21].
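To make the discretization concrete, here is a minimal sketch (ours, not the authors' implementation) of a single two-marginal stage of (7) on a grid, posed as a plain linear program and solved with SciPy:

```python
import numpy as np
from scipy.optimize import linprog

# Grid support and marginals for one transport stage (toy data).
x = np.array([0.0, 1.0, 2.0])          # common support grid
a = np.array([0.5, 0.5, 0.0])          # source marginal (rho_k)
b = np.array([0.0, 0.5, 0.5])          # target marginal (rho_{k+1})
Cost = (x[:, None] - x[None, :]) ** 2  # quadratic cost c(x_i, x_j)

n = len(x)
# Linear constraints on vec(pi): row sums equal a, column sums equal b.
A_rows = np.kron(np.eye(n), np.ones(n))
A_cols = np.kron(np.ones(n), np.eye(n))
A_eq = np.vstack([A_rows, A_cols])
b_eq = np.concatenate([a, b])
res = linprog(Cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
pi = res.x.reshape(n, n)  # optimal transport plan
```

Here the optimal plan simply shifts each half unit of mass one grid step to the right, at total cost 1. The multimarginal problem (7) couples T such stages through the shared marginals ρk.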

When the number of states is large, we may therefore want to restrict the distributions to certain classes. One such class of particular interest is the Gaussian distributions.

IV. GAUSSIAN CASES

In this section, we focus on the case when all the marginal distributions are Gaussian. We assume that, for all k = 0, 1, . . . , T, the k-th marginal ρ̄k of the measurement is a Gaussian distribution with mean µk and covariance Σ̄k. By linearity, the output density tracking problem can be divided into two parts: interpolating the means {µk} and interpolating the covariances {Σ̄k}.

Interpolating the means is equivalent to solving (6) for a single particle. It reduces to the optimal control problem

min_u ∫_0^T ‖u(t)‖² dt   (9a)

subject to ẋ(t) = Ax(t) + Bu(t), 0 ≤ t ≤ T   (9b)
           Cx(k) = µk, k = 0, . . . , T.   (9c)

By introducing a Lagrange multiplier λ(·), it is easy to see that the optimal control is of the form u(t) = B^T λ(t), with λ satisfying the dual dynamics λ̇(t) = −A^T λ(t) on each interval t ∈ (k, k + 1). For each interval, if we fix x(k) and x(k + 1), then we can obtain a closed-form expression for the optimal cost, which is

(x(k + 1) − Φx(k))^T Q (x(k + 1) − Φx(k)).

Therefore, a strategy for solving (9) is to first minimize over u for fixed x(0), . . . , x(T) and then minimize the result over x(0), . . . , x(T).
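The second step of this strategy is an equality-constrained quadratic program in the stacked states. A possible numpy sketch (our own construction, taking Φ, Q, C and the means µk as inputs) solves it via the KKT system:

```python
import numpy as np

def interpolate_means(Phi, Q, C, mus):
    """Minimize sum_k (x_{k+1} - Phi x_k)^T Q (x_{k+1} - Phi x_k)
    over x_0, ..., x_T subject to C x_k = mu_k (cf. (9))."""
    T = len(mus) - 1
    n = Phi.shape[0]
    N = n * (T + 1)
    H = np.zeros((N, N))  # stage costs as a quadratic form X^T H X
    for k in range(T):
        i, j = k * n, (k + 1) * n
        H[i:i+n, i:i+n] += Phi.T @ Q @ Phi
        H[j:j+n, j:j+n] += Q
        H[i:i+n, j:j+n] += -Phi.T @ Q
        H[j:j+n, i:i+n] += -Q @ Phi
    E = np.kron(np.eye(T + 1), C)   # stacked constraints E X = d
    d = np.concatenate(mus)
    m = E.shape[0]
    # KKT conditions: 2 H X + E^T lam = 0, E X = d
    KKT = np.block([[2 * H, E.T], [E, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(N), d])
    X = np.linalg.solve(KKT, rhs)[:N]
    return X.reshape(T + 1, n)
```

By construction the returned trajectory satisfies the output constraints Cx(k) = µk exactly, while the unobserved state components are filled in by the dynamics-weighted cost.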

The covariance part is solved using semidefinite programming (SDP). We first minimize the cost with fixed states x(0), x(1), . . . , x(T) and then minimize the resulting cost over x(k), k = 0, 1, . . . , T, subject to the constraint that Cx(k) is a zero-mean Gaussian random variable with covariance Σ̄k. For fixed x(k), k = 0, 1, . . . , T, the minimum of the cost is given by the quadratic form

Σ_{k=0}^{T−1} c(x(k), x(k + 1)) = Σ_{k=0}^{T−1} (x(k + 1) − Φx(k))^T Q (x(k + 1) − Φx(k)).

We then minimize this cost subject to the distribution constraint on the output, which reads

min E{Σ_{k=0}^{T−1} c(x(k), x(k + 1))}   (10a)

subject to y(k) = Cx(k) ∼ N(0, Σ̄k), k = 0, 1, . . . , T.   (10b)

We note that in the state space, the problem can be viewed as T separate optimal transport problems. However, these problems are coupled through the constraints on the output. Since the cost function is quadratic, it is not difficult to show that the solution remains Gaussian. Thus, the cost becomes

Σ_{k=0}^{T−1} Tr(Q Σk+1 + Φ^T Q Φ Σk − 2QΦ Sk,k+1),

where Sk,k+1 = E{x(k) x(k + 1)^T}.
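The trace expression follows from E{(x(k+1) − Φx(k))^T Q (x(k+1) − Φx(k))} = Tr(Q · E{(x(k+1) − Φx(k))(x(k+1) − Φx(k))^T}). A quick numerical sanity check of this identity (our own, with arbitrary test matrices) in Python:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
Phi = rng.standard_normal((n, n))
Q = np.eye(n)  # any symmetric PSD weight works; identity for simplicity
# Build a random valid joint covariance of (x0, x1).
L = rng.standard_normal((2 * n, 2 * n))
Joint = L @ L.T
Sigma0, Sigma1 = Joint[:n, :n], Joint[n:, n:]
S01 = Joint[:n, n:]  # cross-covariance S_{k,k+1} = E{x0 x1^T}
# Monte Carlo estimate of E{(x1 - Phi x0)^T Q (x1 - Phi x0)}.
z = rng.multivariate_normal(np.zeros(2 * n), Joint, size=200000)
x0, x1 = z[:, :n], z[:, n:]
d = x1 - x0 @ Phi.T
mc = np.einsum('ij,jk,ik->i', d, Q, d).mean()
exact = np.trace(Q @ Sigma1 + Phi.T @ Q @ Phi @ Sigma0 - 2 * Q @ Phi @ S01)
```

The Monte Carlo average agrees with the trace formula up to sampling error.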

Theorem 2: The density tracking problem (6) for Gaussian marginals with covariances {Σ̄0, Σ̄1, . . . , Σ̄T} has the SDP formulation

min_{Σ,S} Σ_{k=0}^{T−1} Tr(Q Σk+1 + Φ^T Q Φ Σk − 2QΦ Sk,k+1)   (11a)

subject to [ Σk        Sk,k+1
             Sk,k+1^T  Σk+1 ] ⪰ 0, k = 0, . . . , T − 1   (11b)

           C Σk C^T = Σ̄k, k = 0, 1, . . . , T.   (11c)

We remark that the constraint (11b) suffices to guarantee a well-defined covariance matrix for the random vector (x0, x1, . . . , xT)^T. This can be proven constructively using graphical models, see [4].

Alternatively, to solve (10), one can choose to minimize the cost over fixed y(k), k = 0, 1, . . . , T first, and then minimize the result over the distribution of y(k). Let cy(y(0), y(1), . . . , y(T)) be the minimum of the cost for fixed outputs. More precisely,

cy(y(0), . . . , y(T)) = min_x Σ_{k=0}^{T−1} c(x(k), x(k + 1))

subject to y(k) = Cx(k), k = 0, 1, . . . , T.

Since c(·, ·) is quadratic, cy is a quadratic function of y = [y(0), y(1), . . . , y(T)]^T. Furthermore, it has the form

cy(y) = y^T R y

for some positive semidefinite matrix R. Thus, problem (10) becomes

min_y E{Tr(R y y^T)}   (12a)

subject to y(k) ∼ N(0, Σ̄k), k = 0, 1, . . . , T,   (12b)


Fig. 1: Ensemble outputs in example with N = 7 systems. System dynamics given by (14) and σ = 0.1. Blue trajectory: example of a system trajectory with σ = 0.

or equivalently,

min_{Σy⪰0} Tr(R Σy)   (13a)

subject to Σy(k, k) = Σ̄k, k = 0, 1, . . . , T.   (13b)

Remark 3: The two SDP formulations (11) and (13) represent two different ways of viewing the density tracking problem. In (11) we lift the marginal distributions to the state space and solve T separate optimal transport problems. In contrast, (13) approaches the problem directly by considering the joint distribution of y(0), y(1), . . . , y(T). In terms of complexity, (11) grows linearly in the number of time points T, while (13) grows as T^6 in the worst case. Therefore, (11) is preferable for computational purposes.

After obtaining the marginal covariances {Σk} for the state variables, we can recover Σ(t), k ≤ t ≤ k + 1, from Σk and Σk+1 in closed form for each k = 0, . . . , T − 1, using optimal mass transport theory over linear dynamics [8]. It follows that the trajectory of the covariances of the output is given by Σ̄(t) = C Σ(t) C^T. Finally, letting µ(t) = x(t) be the solution to (9), we obtain our optimal density flow as a flow of Gaussian distributions with mean µ(t) and covariance Σ(t).

V. EXAMPLES

Two examples are provided to illustrate our framework. In the first example, we illustrate the tracking of a finite number of particles, and in the second one, we consider a Gaussian distribution tracking problem. The implementations are made in Matlab using CVX, a package for specifying and solving convex programs [13], [9].

A. Tracking of particles

We illustrate the tracking of a series of systems with given system dynamics. Consider the tracking of N systems with

Fig. 2: Example with N = 7 systems to be tracked. System dynamics: (14). Noise level: σ = 0.1. Available measurement points (x). True system states (solid). Estimated system states (dashed). Upper figure: state x1. Lower figure: state x2.

Fig. 3: Example with N = 5 systems to be tracked. Noise level: σ = 0.5. Available measurement points (x). True system states (solid). Estimated system states (dashed). Upper figure: state x1. Lower figure: state x2.

oscillatory dynamics

dx(t) = Ax(t) dt + σ dw(t)
y(t) = Cx(t)

where the state dynamics is given by

A = [ 0  1
     −1  0 ],  C = [ 1  0 ]   (14)

and where dw is normalized white Gaussian noise. We seek to recover the full state information of the systems based only on the unordered outputs, observed at the time points t = 0, 1, . . . , 5. The problem is solved in the two following examples using the formulation (7), where the state space is discretized. The discretization of the state space at time k is

Fig. 4: Interpolation of covariances for (a) T = 5 and (b) T = 10.

given by

{(x1, x2) | x1 ∈ supp(ρ̄k), x2 ∈ linspace([−7, 7], 150)},

where linspace([a, b], N) denotes the set of N uniformly spaced points in the interval [a, b].

For the first example we let the number of particles be N = 7 and the noise level σ = 0.1. Figure 1 shows the output measurements and an example of a noiseless trajectory corresponding to (14). We seek to group the measurements according to which system they belong to. Figure 2 shows the reconstruction based on the optimization problem (7), and the state estimates correspond well to the true states. Even though there are a few errors in the associations, these occur between points that are spaced closely together and do not seem to impact the state estimation significantly. Next, we consider an example with N = 5 particles and noise level σ = 0.5. Figure 3 shows the reconstruction based on the optimization problem (7). Even though the noise level is fairly large, we are able to achieve good reconstructions of the states.

B. Tracking of Gaussian distributions

We consider a dynamical system consisting of a simple, possibly high dimensional, second-order integrator. The dynamics are governed by

A = [ 0  I
      0  0 ],  B = [ 0
                     I ],  C = [ I  0 ],

where I is an identity matrix of appropriate size n. When the dimension of the output is n = 1, we randomly generate several covariances and interpolate them using (11). The results are depicted in Figure 4 for T = 5, 10. It can be observed that the interpolated curves are smooth. Similar results hold in the high dimensional setting. Figure 5 depicts several Gaussian distributions with different means and covariances that we want to track. The tracking result is shown in Figure 6, which is a natural and smooth interpolation of the observations.

VI. CONCLUSIONS

A framework for tracking the states of indistinguishable particles with linear dynamics using output measurements is presented. The measurements are the distributions of the output

Fig. 5: Marginal distributions

Fig. 6: Interpolation of covariances: T = 3

at several time points. Our framework relies on a recent development of optimal mass transport theory with prior dynamics [8]. In the special case with Gaussian marginals, our problem has an SDP formulation and can be solved efficiently. Developing fast algorithms for the general case will be a future research topic.

APPENDIX

A. Proof of Theorem 1

The Lagrangian of problem (8), where the constraint (8b) is relaxed, is given by

L({φk}_{k=0}^T, M) = Σ_{k=0}^T ∫_{C(X)} φk(yk) dρ̄k(yk)
                     + ∫_{X^{T+1}} ( Σ_{k=1}^T c(xk−1, xk) − Σ_{k=0}^T φk(Cxk) ) dM(x)


where x = (x0, . . . , xT) ∈ X^{T+1} [17, Chapter 8]. Note that since the constraint (8b) is defined by continuous functions, the dual variable is a nonnegative measure M ∈ M+(X^{T+1}). Denote the one-dimensional and two-dimensional marginals of M by

dρk(xk) = ∫_{x0,...,xk−1,xk+1,...,xT∈X} dM(x),   (15a)

dπk(xk, xk+1) = ∫_{x0,...,xk−1,xk+2,...,xT∈X} dM(x).   (15b)

For a fixed M, the maximum of L({φk}_{k=0}^T, M) with respect to φk is finite only if

∫_{C(X)} φk(yk) dρ̄k(yk) = ∫_X φk(Cxk) dρk(xk)

for any continuous function φk : C(X) → R. From the Riesz representation theorem (see, e.g., [17]) and the definition (1) of mass preserving maps, this is equivalent to C#ρk = ρ̄k. Also, using (15b), the remaining second term of L({φk}_{k=0}^T, M) can be written as

Σ_{k=0}^{T−1} ∫_{(xk,xk+1)∈X×X} c(xk, xk+1) dπk(xk, xk+1).   (16)

Next, note that for any M ∈ M+(X^{T+1}) the marginals (15) satisfy the constraints (7b) and (7c) in the optimization problem (7). Conversely, we would like to show that for any set of marginals satisfying the constraints (7b)-(7c) for k = 0, 1, . . . , T − 1 there is a distribution M ∈ M+(X^{T+1}) that satisfies (15). Thus, let the one-dimensional marginals ρk(xk) and two-dimensional marginals πk(xk, xk+1) satisfy (7b) and (7c) for k = 0, 1, . . . , T − 1. From equations (7b)-(7c) it follows that the total mass ρk(X) =: κ is the same for all k = 0, 1, . . . , T. Normalizing the total mass to 1, we may consider the Markov chain with transition probabilities from time k to k + 1 specified by the joint distribution πk(xk, xk+1)/κ. Consequently, if we let M/κ be the measure corresponding to the joint probabilities of the Markov chain, then M satisfies (15) (see, e.g., [19]).

To summarize, we have shown that the dual functional of (8) is given by

max_{{φk}_{k=0}^T} L({φk}_{k=0}^T, M) = { (16)  if dρk satisfies (7d),
                                          ∞     otherwise,

where dπk are the marginals given by (15b). The functions dρk and dπk are marginals (15) for some M ∈ M+(X^{T+1}) if and only if they satisfy the constraints (7b)-(7c); hence the dual of (8) is given by (7). Furthermore, since (8) has a feasible interior point, the duality gap is zero [17, p. 217]. □

REFERENCES

[1] J.-D. Benamou and Y. Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik, 84(3):375–393, 2000.

[2] J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré. Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015.

[3] J.-D. Benamou, T. Gallouët, and F.-X. Vialard. Second order models for optimal transport and cubic splines on the Wasserstein space. arXiv preprint arXiv:1801.04144, 2018.

[4] Y. Chen, G. Conforti, and T.T. Georgiou. Measure-valued spline curves: an optimal transport viewpoint. arXiv preprint arXiv:1801.03186, 2018.

[5] Y. Chen, T.T. Georgiou, and M. Pavon. On the relation between optimal transport and Schrödinger bridges: A stochastic control viewpoint. Journal of Optimization Theory and Applications, 169(2):671–691, 2016.

[6] Y. Chen, T.T. Georgiou, and M. Pavon. Optimal steering of a linear stochastic system to a final probability distribution, Part I. IEEE Trans. on Automatic Control, 61(5):1158–1169, 2016.

[7] Y. Chen, T.T. Georgiou, and M. Pavon. Optimal steering of a linear stochastic system to a final probability distribution, Part III. arXiv:1608.03622, IEEE Trans. on Automatic Control, to appear, 2017.

[8] Y. Chen, T.T. Georgiou, and M. Pavon. Optimal transport over a linear dynamical system. IEEE Transactions on Automatic Control, 62(5):2137–2152, 2017.

[9] CVX Research, Inc. CVX: Matlab software for disciplined convex programming, version 2.0. http://cvxr.com/cvx, August 2012.

[10] F. Elvander, A. Jakobsson, and J. Karlsson. Interpolation and extrapolation of Toeplitz matrices via optimal mass transport. arXiv preprint arXiv:1711.03890, 2017.

[11] B. Engquist and B.D. Froese. Application of the Wasserstein metric to seismic signals. Communications in Mathematical Sciences, 12(5), 2014.

[12] A. Galichon. Optimal Transport Methods in Economics. Princeton University Press, 2016.

[13] M.C. Grant and S.P. Boyd. Graph implementations for nonsmooth convex programs. In Recent Advances in Learning and Control, pages 95–110. Springer, 2008.

[14] S. Haker, L. Zhu, A. Tannenbaum, and S. Angenent. Optimal mass transport for registration and warping. International Journal of Computer Vision, 60(3):225–240, 2004.

[15] J.C. Holladay. A smoothest curve approximation. Mathematical Tables and Other Aids to Computation, 11(60):233–243, 1957.

[16] J. Karlsson and A. Ringh. Generalized Sinkhorn iterations for regularizing inverse problems using optimal mass transport. SIAM Journal on Imaging Sciences, 10(4):1935–1962, 2017.

[17] D.G. Luenberger. Optimization by Vector Space Methods. John Wiley & Sons, 1997.

[18] B. Pass. Uniqueness and Monge solutions in the multimarginal optimal transportation problem. SIAM Journal on Mathematical Analysis, 43(6):2758–2775, 2011.

[19] M. Sharpe. General Theory of Markov Processes, volume 133. Academic Press, 1988.

[20] C. Villani. Topics in Optimal Transportation, volume 58. Graduate Studies in Mathematics, AMS, 2003.

[21] S. Zeng, H. Ishii, and F. Allgöwer. Sampled observability and state estimation of linear discrete ensembles. IEEE Transactions on Automatic Control, 62(5):2406–2418, 2017.

[22] S. Zeng, S. Waldherr, C. Ebenbauer, and F. Allgöwer. Ensemble observability of linear systems. IEEE Transactions on Automatic Control, 61(6):1452–1465, 2016.