∗† ∗§ arXiv:2109.07820v1 [math.OC] 16 Sep 2021

arX

iv:2

109.

0782

0v1

[m

ath.

OC

] 1

6 Se

p 20

21

Formulation of branched transport as geometry optimization

Julius Lohmann∗† Bernhard Schmitzer‡ Benedikt Wirth∗§

September 17, 2021

Abstract

The branched transport problem, a popular recent variant of optimal transport, is a non-convex and non-smoothvariational problem on Radon measures. The so-called urban planning problem, on the contrary, is a shape opti-mization problem that seeks the optimal geometry of a street or pipe network. We show that the branched transportproblem with concave cost function is equivalent to a generalized version of the urban planning problem. Apartfrom unifying these two different models used in the literature, another advantage of the urban planning formulationfor branched transport is that it provides a more transparent interpretation of the overall cost by separation into atransport (Wasserstein-1-distance) and a network maintenance term, and it splits the problem into the actual trans-portation task and a geometry optimization.

Keywords: optimal transport, optimal networks, branched transport, urban planning, Wasserstein distance, geo-metric measure theory

Contents

1 Introduction 2

1.1 Generalized branched transport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Generalized urban planning problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.3 Summary of results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.4 General notation and definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 Wasserstein distance with generalized urban metric as min-cost flow 11

2.1 Properties of the path length L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.2 Properties of the generalized urban metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182.3 Existence of measurable optimal path selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212.4 Wasserstein distance with generalized urban metric as Beckmann problem . . . . . . . . . . . . . . . . . . . . . . . 23

3 Bilevel formulation of the branched transport problem 28

3.1 Version of the branched transport problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293.2 Branched transport problem as generalized urban planning problem . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4 Acknowledgements 37

∗Institute for Numerical and Applied Mathematics, University of Münster, Einsteinstraße 62, 48149 Münster, Germany†[email protected]‡Institute for Computer Science, University of Göttingen, Goldschmidtstraße 7, 37077 Göttingen, Germany. [email protected]§[email protected]

1

http://arxiv.org/abs/2109.07820v1

1 Introduction

Branched transport and urban planning are distinct models developed during the past two decades that both describetransportation networks; the textbooks by Bernot et al. [BCM09] and by Buttazzo et al. [But+09] are devoted to eithermodel and provide a good starting point into the literature.The main motivation for branched transport is a variational explanation of the high complexity and ramification foundin many natural transportation systems such as river networks, vascular anatomy (like the blood vessel or the bronchialsystem) or botanical structures (like roots or leaf venation). The model is based on the assumption that the (biologicalor energetic) cost incurred by the transport is subadditive in the transported mass so that it is cost-efficient to mergeoriginally separate material flows into few large material flows. This tendency of flow-merging then automatically leadsto network-like material streams with many branchings.Urban planning on the other hand was devised as an optimal control or shape optimization problem. Here one optimizesthe street layout or the public transport routes in order to allow efficient commuting of the population between theirhomes and their workplaces. The cost of a street or public transport network then is composed of its maintenance cost(in the original model simply the total network length) and the cost of the population for commuting (measured as theoptimal transport or Wasserstein-1-distance between the distributions of homes and workplaces in a metric that dependson the street network).Even though both model formulations are fundamentally different (branched transport is a non-convex optimizationproblem over 1-currents, while urban planning can be seen as a bilevel shape optimization problem) the resulting networkstructures behave in a phenomenologically similar way. In [BW16] it was then shown that the (original) urban planningproblem can equivalently be formulated as a specific branched transport problem. The aim of the current work is togreatly generalize this result: We will introduce a natural generalization of the urban planning problem (of which theoriginal urban planning problem is a specific case) and then show that every branched transport problem with concavetransportation cost is equivalent to a generalized urban planning problem and vice versa. In particular, optimizers ofone problem induce optimizers of the other.We think that the equivalence between both models is not just useful because it unites different strands of literature. Italso has implications for the modelling and the numerics of such problems. As for the modelling, the urban planningformulation clearly separates two different contributions to the overall cost: the cost for the actual transportation as wellas the cost for building and maintaining the transport network. This is not only easier to interpret than the lumpedcost of branched transport, it also allows to consider (potentially more realistic) variants in which transportation andmaintenance cost are payed by different parties (such as commuters and transport companies), leading to games betweendifferent players. As for numerics, there exist phase field approximations of branched transport [CFM19; FDW20;Wir19] that can now be applied to solve urban planning problems numerically. Similarly, a bilevel optimization seems anattractive alternative numerical approach (though not yet implemented for such problems to the best of our knowledge)which now becomes available also for branched transport.In the remainder of the introduction we briefly state the branched transport and the urban planning model as well asour main results. In section 2 we then analyse the Wasserstein distance with respect to the so-called urban metric, a(pseudo-)metric that depends on a street or transport network and that occurs in urban planning. In particular, wewill prove properties of this urban metric and derive an equivalent Beckmann formulation. Finally, in section 3 theequivalence between the branched transport and the urban planning problem is shown.

1.1 Generalized branched transport

There are various ways to describe branched transport, in particular a Eulerian formulation due to Xia [Xia03], whichuses vector-valued Radon measures or 1-currents on Rn, and a Lagrangian formulation due to Maddalena, Solimini andMorel [MSM03] based on so-called irrigation patterns. We here only present the former (irrigation patterns will be

2

introduced later in section 3.1).The cost for moving an amount of mass m per unit distance will be described by a transportation cost τ .

Definition 1.1.1 (Transportation cost). A transportation cost is a non-decreasing concave function τ : [0,∞) →[0,∞) with τ(0) = 0.

The monotonicity of τ as well as τ(0) = 0 are natural requirements for a cost. The concavity could in principle be relaxedto the weaker condition of subadditivity,

τ(m1 +m2) ≤ τ(m1) + τ(m2),

which encodes an efficiency gain if mass is transported in bulk. Different examples for τ are presented in figure 1.Originally only τ(m) = mα for α ∈ (0, 1) was used but was generalized to the above in [BW18]. The borderline choiceτ(m) = m does not exhibit any preference for transport in bulk and is known to lead to classical Wasserstein-1 transport.The material flows from a source distribution µ+ to a sink distribution µ− (without loss of generality probability measures)are described by so-called mass fluxes.

Definition 1.1.2 (Polyhedral mass flux and branched transport cost). Assume that µ+ and µ− are finite sums ofweighted Dirac measures, i.e.,

µ+ =

M∑

i=1

fiδxi and µ− =

N∑

j=1

gjδyj ,

where fi, gj ∈ [0, 1] satisfy∑

i fi =∑

j gj = 1 and xi, yj ∈ Rn. A polyhedral mass flux between µ+ and µ− is avector-valued Radon measure F ∈ Mn(Rn) which satisfies div(F) = µ+ − µ− in the distributional sense and can beexpressed as

F =∑

e

me~eH1 e,

where the sum is over finitely many edges e = xe + [0, 1](ye − xe) ⊂ Rn with orientation ~e = (ye − xe)/|ye − xe|, thecoefficients me are real weights, and H1 e is the one-dimensional Hausdorff measure restricted to e. The branched

transport cost of F with respect to a transportation cost τ is defined as

J τ,µ+,µ− [F ] =∑

e

τ(me)H1(e).

A polyhedral mass flux can equivalently be represented as a weighted directed graph with edges e and weights me. Thecondition div(F) = µ+ − µ− encodes Kirchhoff’s law of mass preservation: Let v be any vertex of the weighted directedgraph associated with F such that v is not contained in supp(µ+) ∪ supp(µ−). Then the condition implies

∑

ein

mein =∑

eout

meout,

where we sum over all incoming edges ein and outgoing edges eout at v.Using the idea of (discrete) polyhedral mass fluxes we can pass to the continuous case using weak-∗ convergence.

Definition 1.1.3 (Mass flux, approximating graph sequence and branched transport cost). A vector-valued Radonmeasure F ∈ Mn(Rn) is called mass flux between two probability measures µ+ and µ− on Rn if there exist twosequences of probability measures µk+, µ

k− and a sequence of polyhedral mass fluxes Fk with div(Fk) = µk+ − µk− such that

Fk∗− F and µk±

∗− µ±, where

∗− indicates the weak-∗ convergence in duality with continuous functions. The sequence

3

(Fk, µk+, µk−) is called approximating graph sequence, and we write (Fk, µk+, µ

k−)

∗− (F , µ+, µ−). If F is a mass flux,

then the branched transport cost of F is defined as

J τ,µ+,µ− [F ] = inf

lim infk

J τ,µk+,µ

k− [Fk]

∣

∣

∣

∣

(Fk, µk+, µ

k−)

∗− (F , µ+, µ−)

.

The branched transport problem seeks the optimal mass fluxes between µ+ and µ−.

Definition 1.1.4 (Branched transport problem). The branched transport problem is given by

infJ τ,µ+,µ− [F ] | F ∈ Mn(Rn), div(F) = µ+ − µ−.

A minimizer is known to exist in case the problem is finite, which is true under mild growth conditions on τ [BW18].

1.2 Generalized urban planning problem

The urban planning problem was proposed by Brancolini and Buttazzo [BB05] as well as Buttazzo, Pratelli, Soliminiand Stepanov [But+09]. We directly state our generalization and relate it to the original model afterwards. The basicidea is to optimize a street network, which is represented by a set S ⊂ Rn as well as a function b : S → [0,∞) thatdescribes how costly it is to travel along each part of the network (one may view it as the inverse road quality). Thecost for travelling outside the network S is assumed to be a fixed constant a ∈ [0,∞] per distance. Given such a streetnetwork, the cost for travelling from x to y is described by the urban metric.

Definition 1.2.1 (Generalized urban metric). Let S ⊂ Rn be countably 1-rectifiable and Borel measurable, b : S → [0,∞)lower semi-continuous and a ∈ [0,∞] with b ≤ a on S. The associated generalized urban metric is defined as

dS,a,b(x, y) = infγ∈Γxy

∫

γ([0,1])∩S

b dH1 + aH1(γ([0, 1]) \ S),

where Γxy denotes the set of all Lipschitz paths γ : [0, 1] → Rn with γ(0) = x and γ(1) = y.

Without any further restrictions, dS,a,b is actually only a pseudometric since positive definiteness and finiteness cannotbe guaranteed.We will call the function b the friction coefficient of the road or pipe network since it obviously describes how difficultmotion on the network is. Of course, the meaning is the same as the previously mentioned inverse road quality.Now let µ+ and µ− be probability measures on Rn describing the initial and final distribution of a quantity that is tobe transported (or of homes and workplaces). The total cost for transporting µ+ onto µ− is given by the Wassersteindistance with respect to dS,a,b.

Definition 1.2.2 (Wasserstein distance, transport plans). Let S, a, b be as in definition 1.2.1. The Wasserstein

distance between µ+ and µ− with respect to dS,a,b is defined as

WdS,a,b(µ+, µ−) = inf

π

∫

Rn×Rn

dS,a,b dπ,

where the infimum is taken over all probability measures π on Rn×Rn with π(B×Rn) = µ+(B) and π(Rn×B) = µ−(B)for all Borel sets B ∈ B(Rn). Any such measure is called a transport plan. The set of transport plans is denoted byΠ(µ+, µ−).

4

Given a transport plan π, the quantity π(A,B) indicates how much mass is transported from A ⊂ Rn to B ⊂ Rn sothat the Wasserstein distance is nothing else than the accumulated travel cost of all mass particles. This Wassersteindistance will form part of the urban planning cost, the other part comes from the maintenance of the network (S, b). Themaintenance cost of a unit street segment of inverse quality b shall be described by c(b) for a function c : [0,∞) → [0,∞].Since maintenance cost naturally increases with road quality, c shall be non-increasing.

Definition 1.2.3 (Generalized urban planning cost). Given a non-increasing maintenance cost c : [0,∞) → [0,∞], set

a = inf c−1(0).

The generalized urban planning cost of a street network (S, b) (as in definition 1.2.1) is given by

Uc,µ+,µ− [S, b] =WdS,a,b(µ+, µ−) +

∫

S

c(b) dH1.

Note that c(b) is Borel measurable as a composition of Borel measurable functions. Recall that the function b describesthe cost for travelling on the network while the constant a is the cost for travelling outside the network. Hence we musthave c(a) = 0 as there is no road to be maintained, which explains the relation a = inf c−1(0). The urban planningproblem now seeks the optimal street network.

Definition 1.2.4 (Generalized urban planning problem). The generalized urban planning problem is given by

inf Uc,µ+,µ− [S, b] |S ⊂ Rn countably 1-rectifiable and Borel measurable, b : S → [0, a] lower semi-continuous .

The generalized urban planning problem can be seen as a bilevel optimization problem, where the outer problem optimizesthe shape S and friction coefficient b of the network and the inner one solves the optimal transport problem.The original urban planning problem from [BB05; But+09] is obtained by the specific choice

c(b) =

∞ if b < b

c if b ≤ b < a

0 if a ≤ b

for fixed parameters a, b and c. In that model only a single type of roads is built, namely roads with friction coefficientb: A better quality is impossible due to infinite maintenance cost, and there is no gain in using worse streets as theirmaintenance costs the same.

1.3 Summary of results

Our main results are

• a Beckmann-type formulation of the Wasserstein distance WdS,a,b(theorem 1.3.2) and

• the urban planning formulation of the branched transport problem (theorem 1.3.4).

In the following we briefly state and discuss both results as well as a few auxiliary results of independent interest. Letµ+ and µ− be probability measures on Rn with bounded supports, without loss of generality contained in C = [−1, 1]n.The Wasserstein-1-distance W1(µ+, µ−) = minπ∈Π(µ+,µ−)

∫

C×C|x − y| dπ(x, y) between µ+ and µ− with respect to the

Euclidean (or a similarly smooth geodesic) metric is known to equal the minimum cost of a material flux from the sourceµ+ to the sink µ− [San15, Thm. 4.6],

W1(µ+, µ−) = minG

|G|(C),

5

ca−b

c

m

τ(m) urban planning cost min(am, bm+ c)

Wasserstein cost bmdiscrete cost c · 1(0,∞)(m)branched transport cost mα

b a

c

b

ε(b)

Figure 1: Classical examples for the transportation cost τ(m) and the corresponding main-tenance cost ε(b) = (−τ)∗(−b). The discrete cost corresponds to the Steinerproblem of finding an optimal network with minimal total length.

where the minimum is taken over all Rn-valued Radon measures G ∈ Mn(C) that satisfy div(G) = µ+ − µ− and|G|(C) denotes the total variation or total mass of G. This minimum cost flow problem is also known as Beckmannformulation. The proof essentially consists of two applications of standard convex Fenchel–Rockafellar duality (the firstdualization yields the so-called Kantorovich–Rubinstein formula, from which the second dualization derives the Beckmannformulation). We show that an analogous formulation holds for the Wasserstein distance WdS,a,b

(µ+, µ−) with respectto our urban metric dS,a,b. To avoid pathological situations in which the street network connects any two points atarbitrarily small cost, we assume the following.

Assumption 1.3.1. For a given pair (S, b) denote the part of the network with friction coefficient no larger than λ by

Sλ = z ∈ S | b(z) ≤ λ.

We assume that Sλ has finite Hausdorff measure, H1(Sλ) <∞, for all λ ∈ [0, a).

Theorem 1.3.2 (Beckmann-type formulation of WdS,a,b(µ+, µ−)). Let S ⊂ C countably 1-rectifiable and Borel measur-

able, a ∈ [0,∞] and b : S → [0,∞) lower semi-continuous with b ≤ a on S. Suppose that assumption 1.3.1 is satisfied.Then we have

WdS,a,b(µ+, µ−) = inf

ξ,F⊥

∫

S

b|ξ| dH1 + a|F⊥|(C),

where the infimum is taken over ξ ∈ L1(H1 S;Rn) and F⊥ ∈ Mn(C) with F⊥ S = 0 and div(ξH1 S+F⊥) = µ+−µ−.

While the result is not unexpected (the flux G here takes the form ξH1 S+F⊥), its proof is quite technical and substan-tially more involved than for the Euclidean Wasserstein-1-distance. Indeed, in the Kantorovich–Rubinstein formula onetypically needs to jump back and forth (using density arguments) between Lipschitz functions with respect to the metricand differentiable functions whose gradient is bounded in terms of the Lipschitz constant and the local metric. However,for discontinuous metrics dS,a,b and in particular for a = ∞ this becomes difficult. Instead, it turns out easier to provethe equality directly, without passing to an intermediate dual problem, by contructing a minimizer of one problem froma minimizer of the other. In essence, if WdS,a,b

(µ+, µ−) < ∞ and π ∈ Π(µ+, µ−) is a minimizer (which will exist byproposition 2.4.4), then under assumption 1.3.1 an optimal mass flux for the Beckmann problem can be defined as Fπ,ρwith

〈ϕ,Fπ,ρ〉 =

∫

Θ

∫

[0,1]

ϕ(γ) · γ dLd(ρ#π)(γ) for all ϕ ∈ C(C;Rn),

where Θ is a space of (equivalence classes of) Lipschitz paths (it will be defined in section 2.3) and ρ#π denotes thepush-forward of π under ρ : C × C → Θ, which assigns to a pair of points a shortest connecting path (its existence

6

and Borel measurability will be shown in proposition 2.3.5). Conversely, if the Beckmann problem is finite, then underassumption 1.3.1 there exists a minimizer G = ξH1 S + F⊥ which can be associated with a mass flux measure η on Θmoving µ+ onto µ− (cf. definition 2.4.1) via

〈ϕ,G〉 =

∫

Θ

∫

[0,1]

ϕ(γ) · γ dLdη(γ) for all ϕ ∈ C(C;Rn).

This G then induces an optimal transport plan by

〈ϕ, πG〉 =

∫

Θ

ϕ(γ(0), γ(1)) dη(γ) for all ϕ ∈ C(C × C).

As for the second main result, we show that any branched transport problem with transportation cost τ and source andsink µ+ and µ− can equivalently be written as a generalized urban planning problem with a particular maintenance cost.

Definition 1.3.3 (Maintenance cost associated with τ). Let τ : [0,∞) → [0,∞) be a transportation cost. We extend τto a function on R via τ(m) = −∞ for all m < 0. The associated maintenance cost ist defined by ε(b) = (−τ)∗(−b) =supm≥0 τ(m)− bm for any b ∈ R.

By definition ε equals +∞ on (−∞, 0) and is decreasing by the properties of τ . We use the maintenace cost c = ε indefinition 1.2.3. The constant a = inf ε−1(0) then equals the right derivative of τ in 0, a = τ ′(0). Examples are providedin figure 1.

Theorem 1.3.4 (Bilevel formulation of the branched transport problem with concave transportation cost τ). Thebranched transport problem can equivalently be written as urban planning problem,

infF

J τ,µ+,µ− [F ] = infS,b

Uε,µ+,µ− [S, b],

where the infima are taken over F ∈ Mn(C) with div(F) = µ+ − µ−, countably 1-rectifiable and Borel measurable S ⊂ Cand lower semi-continuous functions b : S → [0, a].

In fact, we do not only show equality of the infima, but from each admissible F we construct an admissible pair (S, b)with nongreater cost and vice versa so that optimizers of one problem induce optimizers of the other. In more detail, let(S, b) be admissible for the urban planning problem with Uε,µ+,µ− [S, b] < ∞. The latter implies that assumption 1.3.1is automatically satisfied and a minimizer π of WdS,τ′(0),b

(µ+, µ−) exists (which we will show in proposition 2.4.4). Themass flux Fπ,ρ from above then can be shown to satisfy

J τ,µ+,µ− [Fπ,ρ] ≤ Uε,µ+,µ− [S, b].

Conversely, if G is admissible for the branched transport problem, then there exists a mass flux F (induced by removingdivergence-free parts of G) with

J τ,µ+,µ− [F ] ≤ J τ,µ+,µ− [G],

which can be written as F = ξH1 S +F⊥ with ξ ∈ L1(H1 S;Rn) and F⊥ ∈ Mn(C). Further, as we will show, ξ canbe represented such that the street network (S, b = −max(∂(−τ)(|ξ|))) is admissible for the urban planning problem and

Uε,µ+,µ− [S, b] ≤ J τ,µ+,µ− [F ].

Especially the proof of theorem 1.3.2 requires a number of lower semi-continuity results for path lengths and relatedfunctionals which are also of their own interest, so we list some of them below. We assume to be given S ⊂ C countably1-rectifiable and Borel measurable, a ∈ [0,∞] and b : S → [0, a] lower semi-continuous such that assumption 1.3.1holds. Consider a sequence γj : [0, 1] → C of Lipschitz paths with uniformly bounded Lipschitz constant that convergesuniformly to some γ : [0, 1] → C. Then the following holds.

7

• A version of Gołąb’s theorem holds (see proposition 2.1.5): For all Lebesgue-measurable sets T ⊂ [0, 1] one has

H1(γ(T )) ≤ lim infj

H1(γj(T )).

• If the γj have constant speed, the path length associated with dS,a,b is lower semi-continuous (see theorem 2.1.1),

LS,a,b(γ) ≤ lim infj

LS,a,b(γj) for LS,a,b(γ) =

∫

γ−1(S)

b(γ)|γ|dL+ a

∫

[0,1]\γ−1(S)

|γ|dL.

• If LS,a,b(γj) is uniformly bounded and a = ∞, then for each δ > 0 there exists a λ ∈ [0,∞) such that

H1(γ \ Sλ) ≤ δ

(see proposition 2.1.8). In particular, we have H1(γ \ S) = 0.

• For all x, y ∈ C there exists an injective Lipschitz path ψ : [0, 1] → C with ψ(0) = x, ψ(1) = y and LS,a,b(ψ) =dS,a,b(x, y). Moreover, ψ may be chosen to have a constant speed, bounded in terms of dS,a,b(x, y) and the Hausdorffmeasure of subsets of S (see proposition 2.2.2). If all γj are such optimal paths with LS,a,b(γj) = dS,a,b(γj(0), γj(1))uniformly bounded, then also γ is optimal with LS,a,b(γ) = dS,a,b(γ(0), γ(1)) (see proposition 2.2.6).

• In fact, dS,a,b = L ρ for a Borel measurable path selection ρ : C × C → Θ (see proposition 2.3.5; the topology onthe space Θ of paths will be specified in section 2.3).

• The urban metric dS,a,b is lower semi-continuous and for a <∞ even continuous (see proposition 2.2.3).

1.4 General notation and definitions

Throughout the article, we will use the following notation and definitions.

• I = [0, 1] denotes the unit interval. We will use this notation if I represents the domain of a path.

• Sn−1 denotes the unit sphere.

• C denotes the hypercube [−1, 1]n.

• We write Br(x) for the open Euclidean ball with radius r > 0 and center x ∈ Rn.

• Lk denotes the k-dimensional Lebesgue measure. We write L = L1.

• Hk indicates the k-dimensional Hausdorff measure.

• Let A be a topological space. We write B(A) for the σ-algebra of Borel subsets of A.

• Assume that (Ω,A, µ) is a measure space. We write L1(µ;Rn) for the Lebesgue space of equivalence classes ofA-B(Rn)-measurable functions f : Ω → Rn with

∫

Ω |f | dµ <∞, where two such functions belong to the same classif they coincide µ-almost everywhere. For σ-finite µ this definition corresponds to the quotient of the Lebesguespace L1(µ,R

n) defined in [Fed69, § 2.4.12] by the subspace f | f = 0µ-almost everywhere.

• Let µ : A → X be a map on a σ-algebra A to some set X (e.g., a scalar- or vector-valued measure). For any A ∈ Awe define the restriction µ A : A → X of µ to A by

(µ A)(B) = µ(A ∩B) for all B ∈ A.

8

• A set S ⊂ Rn is said to be countably k-rectifiable (following [Fed69, p. 251]) if it is the countable union of k-rectifiablesets. More precisely,

S =

∞⋃

i=1

fi(Ai),

where Ai ⊂ Rk is bounded and fi : Ai → Rn Lipschitz continuous. If S is countably k-rectifiable and Hk-measurable, then we can apply [Fed69, Lem. 3.2.18] which yields the existence of bi-Lipschitz functions gi : Ci → Swith Ci ⊂ Rk compact, Ti = gi(Ci) pairwise disjoint and

S = T0 ∪∞⋃

i=1

Ti

with Hk(T0) = 0. The sequence

SN =N⋃

i=1

Ti

will be called an approximating sequence for S.

• Mk(A) = F : B(A) → Rk σ-additive denotes the set of Rk-valued Radon measures on a Polish space A. Notethat every F ∈ Mk(A) is automatically regular and of bounded variation (cf. [Els18, p. 343] and [Lan69, XI, 4.5.,Thm. 8]). More specifically, the total variation measure |F| is regular and satisfies |F|(A) < ∞. We indicate theweak-∗ convergence of Radon measures by ∗

. The measure F ∈ Mk(A) is called Hl-diffuse if F(B) = 0 for allB ∈ B(A) with Hl(B) <∞ [Šil08, p. 2].

• For any closed subset A ⊂ Rn we write DMn(A) = F ∈ Mn(A) | div(F) ∈ M1(A), where div denotes thedistributional divergence. These vector-valued Radon measures were termed divergence measure vector fields in[Šil08, p. 2].

• Θ∗k(µ, .) denotes the upper k-dimensional density of a Radon measure µ : B(Rn) → [0,∞) [Sim14, p. 13]. It is forevery x ∈ Rn given by

Θ∗k(µ, x) = lim suprց0

µ(Br(x))

rkωk,

where ωk denotes the volume of the k-dimensional unit ball.

• The pushforward f#µ of a measure µ on X under a measurable map f : X → Y is the measure defined byf#µ(A) = µ(f−1(A)) for all measurable subsets A ⊂ Y .

• pi : A1 × . . .×Ak → Ai abbreviates the projection on the i-th component.

• We write the arc length of a Lipschitz path γ : [t1, t2] → C as len(γ) =∫

[t1,t2]|γ| dL and denote the Lipschitz

constant by Lip(γ) = supt6=t |γ(t)− γ(t)|/|t− t|.

• Γ denotes the set of all Lipschitz paths mapping I onto C. We write Γxy = f ∈ Γ | f(0) = x, f(1) = y for x, y ∈ C.Further, for x, y ∈ C and C > 0 let

ΓC = f ∈ Γ |Lip(f) ≤ C and ΓxyC = f ∈ ΓC | f(0) = x, f(1) = y.

9

• For any Lipschitz path γ : [t1, t2] → C we write md(γ, t0) for the metric differential of γ at t0 ∈ (t1, t2) [Kir94,p. 115], which can be applied to u ∈ R by

md(γ, t0)(u) = limhց0

|γ(t0 + hu)− γ(t0)|

h

if the limit exists. Further, the metric derivative [AGS08, p. 24] of γ at t0 is given by

|γ′|(t0) = limh→0

|γ(t0 + h)− γ(t0)|

|h|

if this limit exists. Note that by Rademacher’s theorem γ(t0) exists for L-almost all t0, and for those t0 we have

|γ(t0)| = |γ′|(t0) = md(γ, t0)(1).

• We will frequently identify the image of a path γ : I → C with its parameterization, i.e., we simply write γ insteadof γ(I) when no confusion is possible (for instance when we integrate over γ(I)).

• The Euclidean distance between two sets A,B ⊂ Rn is denoted

dist(A,B) = infx∈A,y∈B

|x− y|.

We write diam(A) for the diameter of A, i.e.,

diam(A) = supx,y∈A

|x− y|.

Moreover, dH(A,B) denotes the Hausdorff-distance between A and B, given by

dH(A,B) = max

(

supx∈A

dist(x,B), supy∈B

dist(A, y))

,

where we use the notation dist(x,B) = dist(B, x) = dist(x, B).

• For any set A we write 1A for the characteristic function,

1A(x) =

1 if x ∈ A,

0 else.

• For x, y ∈ C we define [x, y] as the line segment x + t(y − x) | t ∈ I. The sets (x, y], [x, y) and (x, y) are definedsimilarly, e.g., (x, y] = [x, y] \ x.

• For any function f : X → V with values in some normed vector space (V, ‖.‖) and A ⊂ X we write

|f |∞,A = supx∈A

‖f(x)‖.

• A sequence x : N →M of elements in some set M will be indicated by the notation (xi) ⊂M with xi = x(i). If xiactually stems from a subset Mi we instead speak of a sequence xi ∈Mi.

• If a sequence of Lipschitz paths (γj) ⊂ Γ converges uniformly to some γ, i.e., |γj − γ|∞,I → 0, we write γj ⇒ γ.

10

2 Wasserstein distance with generalized urban metric as min-cost flow

In this section we prove theorem 1.3.2, a Beckmann-type formula for the Wasserstein distance from definition 1.2.2between two probability measures µ+, µ−. Due to simple domain rescaling arguments we may assume µ+, µ− to besupported in C without loss of generality (cf. [BW18, Lem. 2.4]). Throughout the section we will fix S ⊂ C countably1-rectifiable and Borel measurable as well as a ∈ [0,∞] and b : S → [0, a] lower semi-continuous. Therefore we maydenote the (pseudo-)metric dS,a,b from definition 1.2.1 simply by d (while in section 3 we will return to the notation dS,a,bas (S, b) varies in the urban planning problem). Recall that S can be seen as a transportation network with a frictioncoefficient b describing the necessary effort to move on the network, while motion outside the network is penalized bythe parameter a. To simplify notation, we extend b to C \ S with value a so that we may write

d(x, y) = dS,a,b(x, y) = infγ∈Γxy

∫

γ

b dH1.

A related important quantity is the length of a path, which in contrast to d measures the travel distance with multiplicity.

Definition 2.0.1 (Path length associated with d). Let S, a, b as above. For Lipschitz paths γ : [t1, t2] → C we write thecost for travelling along γ as

L(γ) =

∫

[t1,t2]

b(γ)|γ| dL.

In section 2.1 we will show that L is lower semi-continuous in a certain sense. In section 2.2 we will then prove theintuitive statement d(x, y) = infγ∈Γxy L(γ) and exploit the lower semi-continuity of L to show that there exists a minimizerγopt ∈ Γxy. Additionally, we will prove further properties of d such as lower semi-continuity. A key consequence will bethe existence of a Borel measurable path map ρ with d(x, y) = L(ρ(x, y)) for all x, y ∈ C in section 2.3. Section 2.4 thenprovides the proof of theorem 1.3.2.Throughout the section assumption 1.3.1 will be made frequent use of. It is natural with regard to the urban planningproblem. Indeed, for any pair (S, b) with finite urban planning cost Uε,µ+,µ− [S, b] it holds

∞ > Uε,µ+,µ− [S, b] ≥

∫

S

ε(b) dH1 ≥

∫

Sλ

ε(b) dH1 ≥ ε(λ)H1(Sλ)

for every λ < a, while ε(λ) > 0 due to the relation a = inf ε−1(0).Let us now briefly collect some basic statements which will predominantly be used in section 2.

Lemma 2.0.2 (Lower semi-continuity of Lip). Assume that (γj) ⊂ Γ is a sequence such that γj ⇒ γ : I → C. Then weobtain

Lip(γ) ≤ lim infj

Lip(γj).

Proof. For all t1, t2 ∈ I we have

|γ(t1)− γ(t2)| ≤ |γ(t1)− γj(t1)|+ Lip(γj)|t1 − t2|+ |γj(t2)− γ(t2)|

and thus|γ(t1)− γ(t2)| ≤ |t1 − t2| lim inf

jLip(γj).

The next remark gives a relation between the measures of the image and the preimage of Lipschitz paths.

11

Remark 2.0.3 (H1-measure under Lipschitz paths). For any γ ∈ Γ and L-measurable A ⊂ I we have (cf. [Mat95,Thm. 7.5])

H1(γ(A)) ≤

∫

A

|γ| dL ≤ Lip(γ)L(A).

If γ is injective and has constant speed, then we directly get [Fed69, pp. 241-244]

H1(γ(A)) = Lip(γ)L(A).

The content of the next remark follows directly from [MM73, Thm. 2]. Recall that H0 is the counting measure.

Remark 2.0.4 (Area formula). Let γ ∈ Γ and f : γ → R Borel measurable. For any L-measurable A ⊂ [0, 1] we have

∫

γ(A)

f(x)H0(γ−1(x) ∩ A) dH1(x) =

∫

A

f(γ)|γ| dL

if one of the two sides is well-defined. For H1(γ(A)) = 0 we obtain γ = 0 L-almost everywhere in A. Moreover, forinjective γ we obtain

∫

γ(A)

f(x) dH1(x) =

∫

A

f(γ)|γ| dL, in particular H1(γ(A)) =

∫

A

|γ| dL.

We finally remind the reader of the following compactness result.

Remark 2.0.5 (Arzelà–Ascoli theorem). Let C > 0 and (γj) ⊂ ΓC be a sequence. The γj are uniformly equicontinuousby Rademacher’s theorem,

Lip(γj) = ess supI

|γj | ≤ C,

and they are pointwise bounded. The Arzelà–Ascoli Theorem thus implies γj ⇒ γ up to a subsequence. Additionally, bylemma 2.0.2 we have

Lip(γ) ≤ lim infj

Lip(γj) ≤ C

L-almost everywhere and thus γ ∈ ΓC.

2.1 Properties of the path length L

The following statement will be the main result of this section.

Theorem 2.1.1 (Lower semi-continuity property of L). Let assumption 1.3.1 be satisfied and (γj) ⊂ ΓC be a sequenceof paths with constant speed such that γj ⇒ γ. Then we have

L(γ) ≤ lim infj

L(γj).

We first prove a version of Gołąb’s theorem for images under Lipschitz paths following the proof of [PS13, Thm. 3.3].The next lemma will be helpful.

Lemma 2.1.2. Let J ⊂ I be an interval and γ ∈ Γ. Then for H1-almost all x ∈ γ(J) there exists some δ > 0 and afunction x : [−δ, δ] → γ(J) such that

1. | ˙x| = 1 L-almost everywhere on [−δ, δ],

12

2. ˙x(0) exists and x(0) = x,

3. for all ε > 0 there is some r > 0 such that

|t1 − t2| − rε ≤ |x(t1)− x(t2)|

for all t1, t2 ∈ [−r, r].

Proof. Let η : [0, l] → C be a reparameterization of γ by arc length, thus l = len(γ). By [Kir94, Thm. 2] there exists aL-null set N1 ⊂ [0, l] such that for all t ∈ [0, l] \N1 the metric differential md(η, t) is a seminorm on R and

|η(t1)− η(t2)| − md(η, t)(t1 − t2) = o(|t1 − t|+ |t2 − t|)

for all t1, t2 ∈ [0, l]. Moreover, by Rademacher’s theorem there is some L-null set N2 ⊂ [0, l] such that η exists on[0, l] \N2. N = N1 ∪N2 ∪0, l clearly satisfies H1(η(N)) ≤ Lip(η)L(N) = 0. Fix any x ∈ γ(J) \ η(N) and tx ∈ [0, l] \Nwith η(tx) = x. Choose δ > 0 sufficiently small such that [tx − δ, tx + δ] ⊂ [0, l] and define x : [−δ, δ] → γ(J) byx(t) = η(tx + t). The first two statements follow from the properties of η. Moreover, by the above identity we observe

|x(t1)− x(t2)| − md(x, 0)(t1 − t2) = |η(tx + t1)− η(tx + t2)| − md(η, tx)(t1 − t2) = o(|t1|+ |t2|)

for all t1, t2 ∈ [−δ, δ]. Thus, for every ε > 0 there exists some r > 0 such that

|x(t1)− x(t2)| ≥ md(x, 0)(t1 − t2)−ε

2(|t1|+ |t2|)

for all t1, t2 ∈ [−r, r]. Using md(x, 0)(t1 − t2) = |t1 − t2||η(tx)| = |t1 − t2| we get

|x(t1)− x(t2)| ≥ −ε

2(|t1|+ |t2|) + |t1 − t2| ≥ −εr + |t1 − t2|.

Next, we use the idea in [PS13, Thm. 3.3] to prove a version Gołąb’s theorem for Lipschitz images of finitely manyrelatively open intervals in I (lemma 2.1.3). The result for general sets will then follow immediately by the regularity ofthe Lebesgue measure (proposition 2.1.5).

Lemma 2.1.3 (Version of Gołąb’s theorem). Let (γj) ⊂ ΓC with γj ⇒ γ ∈ ΓC . Further, let O1, . . . , Ok ⊂ I be relativelyopen intervals. Then we have

H1(γ(O1 ∪ . . . ∪Ok)) ≤ lim infj

H1(γj(O1 ∪ . . . ∪Ok)).

Remark 2.1.4 (Straight limit path). If γ maps onto a straight line ℓ = span(p) for some p ∈ Sn−1, then it is easy tosee that

H1(γj(O1 ∪ . . . ∪Ok)) ≥ H1(projℓ(γj(O1 ∪ . . . ∪Ok))) → H1(γ(O1 ∪ . . . ∪Ok)),

where projℓ denotes the orthogonal projection onto ℓ.

Proof of lemma 2.1.3. We follow the proof of [PS13, Thm. 3.3]. It is easy to see that H1(γ(O1 ∪ . . .∪Ok)) = H1(γ(O1 ∪. . . ∪ Ok)). Thus, we can replace O1 ∪ . . . ∪ Ok by a union of closed intervals J = J1 ∪ . . . ∪ Jl ⊂ I. Furthermore,without loss of generality we may assume diam(γ(Ji)) > 0 for i = 1, . . . , l since by ignoring a Ji with diam(γ(Ji)) = 0we only decrease the right-hand side of the inequality to be proved, while the left-hand side stays unchanged. Define asequence of Radon measures by µj(B) = H1(γj(J) ∩ B) for B ∈ B(Rn). Clearly, the µj are uniformly bounded by C,and the Banach–Alaoglu theorem implies µj

∗ µ up to a subsequence. For H1-almost all x ∈ γ(J) we can choose x

13

as in lemma 2.1.2. Fix any such x, and for every ε > 0 let r = r(ε) > 0 as in the third point of lemma 2.1.2. The setCj = γj(J) ∩ Br(x) is compact, and dist(x(t), Cj) ≤ rε for all t ∈ [−r, r] and j sufficiently large due to the uniformconvergence of the γj . Letting r < mini diam(γ(Ji))/2 we have Cj ∩ ∂Br(x) 6= ∅ for large j. We can now apply [PS13,Lem. 3.2] which yields H1(Cj) ≥ 2r − 9rε. We next use the Portmanteau Theorem to get the desired result. We have

µ(Br(x)) ≥ lim supj

µj(Br(x)) = lim supj

H1(Cj) ≥ 2r − 9rε.

By r = r(ε) → 0 for ε→ 0 we thus obtain

Θ∗1(µ, x) = limε→0

µ(Br(x))

2r≥ 1.

This holds for H1-almost every x ∈ γ(J). Hence, we end up with

H1(γ(J)) ≤ µ(γ(J)) ≤ µ(Rn) ≤ lim infj

µj(Rn) = lim inf

jH1(γj(J)),

where we used [Sim14, Ch. 1, Thm. 3.3] in the first inequality.

Proposition 2.1.5 (Gołąb’s theorem for images of Lipschitz paths). Let (γj) ⊂ ΓC with γj ⇒ γ. Then we have

H1(γ(T )) ≤ lim infj

H1(γj(T ))

for all L-measurable sets T ⊂ I.

Proof. Let ε > 0. By the regularity of L we can choose some relatively open set O ⊂ I such that T ⊂ O andL(O \T ) < ε/(2C). Write O as a countable union of relatively open and connected sets Oi ⊂ I and choose N sufficientlylarge such that O = O1 ∪ . . . ∪ON satisfies L(O \ O) < ε/(2C). This procedure is possible due to the σ-continuity of L.By remark 2.0.3 we have

H1(γj(O)) ≤ H1(γj(O)) ≤ H1(γj(T )) +H1(γj(O \ T )) ≤ H1(γj(T )) + Lip(γj)L(O \ T ) ≤ H1(γj(T )) + ε/2

and therefore using lemma 2.1.3 (and again remark 2.0.3)

lim infj

H1(γj(T )) ≥ H1(γ(O))−ε/2 ≥ H1(γ(O))−H1(γ(O\O))−ε/2 ≥ H1(γ(T ))−Lip(γ)L(O\O)−ε/2 ≥ H1(γ(T ))−ε.

The result now follows from the arbitrariness of ε.

We can now prove our main result for this section, the lower semi-continuity property of L .

Proof of theorem 2.1.1. By lemma 2.0.2 we have |γ| ≤ Lip(γ) ≤ lim infj Lip(γj) = lim infj |γj | L-almost everywhere.Furthermore, the lower semi-continuity of b on S and b ≤ a imply b(γ(t)) ≤ lim infj b(γj(t)) for all t ∈ γ−1(S). Thus,with Fatou’s lemma we obtain

∫

γ−1(S)

b(γ)|γ| dL ≤

∫

γ−1(S)

lim infj

b(γj) lim infj

|γj | dL ≤ lim infj

∫

γ−1(S)

b(γj)|γj | dL.

Hence we haveL(γ) ≤ lim inf

j

∫

γ−1(S)

b(γj)|γj | dL+

∫

γ−1(C\S)

b(γ)|γ| dL

14

so that it suffices to show∫

T

b(γ)|γ| dL ≤ lim infj

∫

T

b(γj)|γj | dL

for T = γ−1(C \ S) ∩ γ exists and γ 6= 0. Assume to the contrary that

lim infj

∫

T

b(γj)|γj | dL <

∫

T

b(γ)|γ| dL =

∫

T

a|γ| dL,

where by restricting to a subsequence we may assume the limit inferior to actually be a limit.We first show that in this inequality we may actually replace T with a subset A ⊂ T on which γ is injective. Indeed,if a = ∞ (thus the right-hand side is infinite and the left-hand side finite) we may simply pick A = t ∈ T | γ(t) /∈γ(T ∩ [0, t)) and obtain

lim infj

∫

A

b(γj)|γj | dL ≤ lim infj

∫

T

b(γj)|γj | dL <

∫

T

a|γ| dL = ∞ =

∫

A

a|γ| dL,

where the last equality follows from∫

A |γ| dL > 0 (otherwise 0 =∫

A |γ| dL = H1(γ(A)) = H1(γ(T )) due to γ(A) = γ(T ),which by remark 2.0.4 contradicts γ 6= 0 on T ). If a <∞, on the other hand, the functions fj = b(γj)|γj |1T are essentiallybounded by aC, and thus fj

∗ f ∈ L∞(I) for some subsequence using the Banach–Alaoglu theorem. More precisely,

for each g ∈ L1(I) we have∫

T

b(γj)|γj |g dL →

∫

T

fg dL, in particular∫

T

b(γj)|γj | dL →

∫

T

f dL.

By our assumption, T = t ∈ T | f(t) < a|γ(t)| has positive Lebesgue measure. Likewise, A = t ∈ T | γ(t) /∈ γ(T∩[0, t))has positive Lebesgue measure (again, otherwise 0 =

∫

A|γ| dL = H1(γ(A)) = H1(γ(T )), contradicting γ 6= 0 on T ).

Thereforelimj

∫

A

b(γj)|γj | dL =

∫

A

f dL <

∫

A

a|γ| dL,

as desired.We will now derive a contradiction. Let us set

λ0 = limj

∫

A

b(γj)|γj |dL/

∫

A

|γ| dL.

Since λ0 < a by assumption, we can pick another λ1 ∈ (λ0, a). For all δ > 0 it holds (remark 2.0.4)

λ1

∫

A∩γ−1j (C\Sλ1)

|γj | dL ≤

∫

A

b(γj)|γj | dL < λ0

∫

A

|γ| dL+ δ = λ0H1(γ(A)) + δ

for j large enough. Furthermore, by our version of Gołąb’s theorem (proposition 2.1.5) we have

H1(γ(A)) ≤ lim infj

H1(γj(A)).

Now choose δ, ε > 0 with (λ1 − λ0)H1(γ(A))− δ > ελ1 so that (using remark 2.0.3)

H1(γj(A) ∩ Sλ1) = H1(γj(A))−H1(γj(A) \ Sλ1)

≥ H1(γj(A)) −

∫

A∩γ−1j (C\Sλ1)

|γj | dL ≥ H1(γ(A)) −λ0λ1

H1(γ(A))−δ

λ1> ε

15

for all j large enough. On the other hand, using the regularity of L we find compact sets K ⊂ A and Kj ⊂ K ∩γ−1j (Sλ1)

such thatL(A \K) <

ε

4Cand L((K ∩ γ−1

j (Sλ1)) \Kj) <ε

4C.

We have γj(Kj) ⊂ Sλ1 and γ(K)∩S = ∅ and therefore γj(Kj)∩γ(K) = ∅ for all j. Let dj = dist(γj(Kj), γ(K)). By theuniform convergence of the γj to γ we can pick a subsequence such that dH(γj , γ) < dj−1 for all j. Hence, the γj(Kj)are pairwise disjoint and thus by assumption 1.3.1

∞ > H1(Sλ1) ≥∑

j

H1(γj(Kj)),

which implies H1(γj(Kj)) → 0. Finally, we get

H1(γj(A) ∩ Sλ1) ≤ H1(γj(Kj)) +H1(γj(K \Kj) ∩ Sλ1) +H1(γj(A \K) ∩ Sλ1)

≤ H1(γj(Kj)) + Lip(γj)L((K ∩ γ−1j (Sλ1 )) \Kj) + Lip(γj)L(A \K) ≤ H1(γj(Kj)) + ε/2

and consequentlylim inf

jH1(γj(A) ∩ Sλ1) ≤ ε/2,

which is the desired contradiction.

In the case a = ∞ a simpler proof is actually possible: The main idea is to show that lim infj L(γj) < ∞ impliesH1(γ \ S) = 0 and therefore

L(γ) =

∫

I

b(γ)|γ| dL =

∫

γ−1(S)

b(γ)|γ| dL.

We will then use b(γ) ≤ lim infj b(γj) on S, which is true due to the lower semi-continuity of b on S, to get the desiredresult. To prove that lim infj L(γj) <∞ implies H1(γ \ S) = 0 we need the following two results.

Lemma 2.1.6 (Curves intersect S). Let a = ∞. If (γj) ⊂ Γ is a sequence with L(γj) uniformly bounded, then for eachδ > 0 there exists some λ ∈ [0,∞) such that

H1(γj \ Sλ) ≤ δ for all j.

Proof. For fixed δ > 0 and λ ∈ (0,∞) sufficiently large we have λδ ≥ L(γj) for all j. Additionally, by remark 2.0.4 weget

λH1(γj \ Sλ) ≤

∫

γj\Sλ

b dH1 ≤

∫

γ−1j (C\Sλ)

b(γj)|γj | dL ≤ L(γj) ≤ λδ.

Proposition 2.1.7 (Symmetric difference with limit path). Let assumption 1.3.1 be satisfied, a = ∞ and (γj) ⊂ ΓCwith L(γj) uniformly bounded. If γj ⇒ γ ∈ ΓC , then for any closed interval J ⊂ I we have

H1(γj(J) \ γ(J)) → 0 and H1(γ(J) \ γj(J)) → 0.

Proof. We prove the result for J = I, the general case then simply follows from considering reparameterizations ofγj(J), γ(J) as paths in ΓC . We begin with the first limit, H1(γj \ γ) → 0. Define Aj = t ∈ I | γj(t) /∈ γ. For acontradiction, we assume that H1(γj(Aj)) > δ (along a subsequence) for some δ > 0. The set Aj = γ−1

j (Rn \ γ) is open

16

in I. Hence, there exist closed sets Bj ⊂ Aj such that H1(γj(Aj \Bj)) < δ/2 (using that the arc length of γj is boundedand the σ-continuity of H1) and thus H1(γj(Bj)) > δ/2. By choice of the Bj we have

dj = dist(γj(Bj), γ) > 0.

By γj ⇒ γ we can assume that dH(γj+1, γ) < dj for all j by restricting to a subsequence. Thus, we get γj(Bj)∩γk(Bk) = ∅for j 6= k by construction. Invoking lemma 2.1.6 there exists some λ ∈ [0,∞) such that H1(γj \ Sλ) < δ/4 for all j.Hence, we have

H1(γj(Bj) ∩ Sλ) = H1(γj(Bj))−H1(γj(Bj) \ Sλ) > δ/4.

This yields the desired contradiction,

∞ > H1(Sλ) ≥ H1

(

˙⋃

jγj(Bj) ∩ Sλ

)

=∑

j

H1(γj(Bj) ∩ Sλ) >∑

j

δ/4 = ∞.

As for the second limit, we note

lim supj

H1(γ \ γj) = lim supj

(

H1(γ) +H1(γj \ γ)−H1(γj))

= H1(γ)− lim infj

H1(γj) ≤ 0,

where the inequality holds by Gołąb’s theorem (see for instance [But+09, Thm. 3.2] or our version proposition 2.1.5).

Proposition 2.1.8 (Limit of paths with uniformly bounded costs). Let assumption 1.3.1 be satisfied, a = ∞ and(γj) ⊂ ΓC with L(γj) uniformly bounded. Assume that γj ⇒ γ ∈ ΓC . Then for each δ > 0 there exists a λ ∈ [0,∞) suchthat

H1(γ \ Sλ) ≤ δ.

In particular, we have H1(γ \ S) = 0.

Proof. Given δ > 0, take λ ∈ [0,∞) from lemma 2.1.6, then

H1(γ \ Sλ) ≤ H1(γ \ γj) +H1(γj \ Sλ) ≤ H1(γ \ γj) + δ.

The limit j → ∞ together with proposition 2.1.7 now implies the result.

Note that lemma 2.1.6 and propositions 2.1.7 and 2.1.8 do not hold for a <∞. We can now give an alternative proof oftheorem 2.1.1 for the case a = ∞.

Alternative proof of theorem 2.1.1 for a = ∞. If lim infj L(γj) = ∞, then there is nothing to show. Hence, we canassume lim infj L(γj) < ∞, and it is enough to prove the claim for a subsequence such that lim infj L(γj) = limj L(γj),which means that L(γj) is uniformly bounded. Proposition 2.1.8 implies H1(γ \ S) = 0. We now invoke remark 2.0.4and get γ = 0 L-almost everywhere on γ−1(C \ S). This yields the desired result,

L(γ) =

∫

γ−1(S)

b(γ)|γ| dL ≤

∫

γ−1(S)

lim infj

b(γj) lim infj

|γj | dL ≤ lim infj

∫

γ−1(S)

b(γj)|γj | dL ≤ lim infj

L(γj)

using Fatou’s lemma and the lower semi-continuity of b on S as well as |γ| ≤ lim infj |γj | L-almost everywhere.

17

2.2 Properties of the generalized urban metric

We now derive properties of the generalized urban metric based on the previous analysis of the path length. The followingresult is due to the fact that d(x, y) can be written as an infimum over injective paths.

Lemma 2.2.1 (Alternative formula for d). We have

d(x, y) = infγ∈Γxy

L(γ)

for all x, y ∈ C.

Proof. Clearly, the claim is true for x = y. Furthermore, by remark 2.0.4 for any γ ∈ Γxy we have

d(x, y) ≤

∫

γ

b dH1 ≤

∫

I

b(γ)|γ| dL = L(γ)

so that the claim holds as well for d(x, y) = ∞. Thus, we can assume d(x, y) <∞ and x 6= y. Let γ ∈ Γxy such that∫

γ

b dH1 <∞.

By [Fal86, Lem. 3.1] there exists a continuous injection ψ : I → C such that ψ ⊂ γ and ψ(0) = x, ψ(1) = y. Obviously,the arc length of ψ is bounded by Lip(γ). Hence, we can assume that ψ is Lipschitz continuous. Finally, by the injectivityand remark 2.0.4

L(ψ) =

∫

ψ

b dH1 ≤

∫

γ

b dH1.

The next statement shows that d(x, y) = infγ∈Γxy L(γ) admits a minimizer γxy which satisfies len(γxy) ≤ C1 +C2d(x, y)with constants C1, C2 > 0 that do not depend on x, y. We will need this result in section 2.3 to show the existence of anoptimal measurable path selection.

Proposition 2.2.2 (Existence and arc length of minimizer for d(x, y)). Let assumption 1.3.1 be satisfied. For x, y ∈ Cthe problem d(x, y) = infγ∈Γxy L(γ) has a minimizer if d(x, y) is finite. Moreover, at least one minimizer ψ is injectiveand satisfies

len(ψ) ≤

H1(S1) + d(x, y) if a = ∞,

H1(Sa/2) +2ad(x, y) if a <∞.

Proof. We proceed by the direct method in the calculus of variations. Let (γj) ⊂ Γxy be a sequence with L(γj) ցinfγ∈Γxy L(γ). By [Fal86, Lem. 3.1] we can assume that each γj is injective (this does not increase L(γj)). If a = ∞, then

len(γj) = H1(γj ∩ S1) +H1(γj \ S1) ≤ H1(S1) +

∫

γj\S1

b dH1 ≤ H1(S1) + L(γj).

For the case a <∞ we get

len(γj) = H1(γj ∩ Sa/2) +H1(γj \ Sa/2) ≤ H1(Sa/2) +2

a

∫

γj\Sa/2

b dH1 ≤ H1(Sa/2) +2

aL(γj).

Thus, the lengths of the γj are uniformly bounded by some C = C(a), and we can reparameterize the γj such that eachγj has constant speed at most C. We further have γj ⇒ γ for some subsequence (see remark 2.0.5) and thus, using thelower semi-continuity property of L from theorem 2.1.1,

L(γ) ≤ lim infj

L(γj) = d(x, y),

18

which shows the optimality of γ. By the same argument as above we can choose an appropriate ψ ∈ Γxy which satisfiesthe desired properties. More precisely, we replace γ by an injective path and apply the same estimates as for the γj .

The next result proves that d is lower semi-continuous, which is important to make sure that the corresponding Wasser-stein distance has a minimizer (see proposition 2.4.4 later).

Proposition 2.2.3 (d lower semi-continuous). If a <∞, then d is continuous. If a = ∞, then d is lower semi-continuousunder assumption 1.3.1.

Proof. Let ((xj , yj)) ⊂ C × C be a sequence with (xj , yj) → (x, y) ∈ C × C. For a <∞ the triangle inequality implies

lim supj

d(xj , yj) ≤ lim supj

d(x, y)+a|x−xj |+a|y−yj| = d(x, y) ≤ lim infj

a|xj−x|+d(xj , yj)+a|yj−y| = lim infj

d(xj , yj).

To show the lower semi-continuity for the case a = ∞ we suppose that lim infj d(xj , yj) < ∞ (since otherwise there isnothing to show) and extract a subsequence with lim infj d(xj , yj) = limj d(xj , yj). Using proposition 2.2.2 there existsa sequence ψj ∈ Γxjyj with L(ψj) = d(xj , yj) and len(ψj) uniformly bounded. By remark 2.0.5 we can further supposethat the ψj have constant speed and ψj ⇒ ψ for some ψ ∈ Γxy. Application of theorem 2.1.1 yields

d(x, y) ≤ L(ψ) ≤ lim infj

L(ψj) = lim infj

d(xj , yj).

Clearly, the inequality then also holds for the entire sequence.

As the following two examples illustrate, the conditions are sharp.

Example 2.2.4 (d in general not upper semi-continuous if a = ∞). Let a = ∞. Assume that S is given by a linesegment [x, y] ⊂ C, b ≡ 1 on S and (xj , yj) ∈ C × C is any sequence with (xj , yj) → (x, y) such that xj , yj /∈ [x, y] (seefigure 2a). Then we have

d(x, y) = H1([x, y]) <∞ = lim supj

d(xj , yj).

Thus, d is not upper semi-continuous in (x, y). This property may even be violated if the optimal paths between xj andyj lie entirely on S (see figure 2b): Set

S = [x, y] ∪∞⋃

j=2

[x, xj ],

where xj is a sequence with xj /∈ [x, y] and xj → x. Moreover, suppose that b|[x,y] = 1 and that b is constant on each(x, xj ] with b|(x,xj]H

1((x, xj ]) = j. Then we obtain

d(x, y) = H1([x, y]) <∞ = limj

H1([x, y]) + j = limjd(xj , y).

Example 2.2.5 (d in general not lower semi-continuous without assumption 1.3.1). Assume that a = ∞, n = 2. Letb ≡ 1 on

S =⋃

j

1/j × [0, 1]

(see figure 2c) so that assumption 1.3.1 is violated. We have (xj , yj) = ((1/j, 0), (1/j, 1)) → ((0, 0), (0, 1)) = (x, y), but

d(xj , yj) ≡ 1 <∞ = d(x, y),

because every path between x and y intersects a set of positive H1-measure in C \ S.

19

x

y

xj

yj

(a) (S, b) = ([x, y], 1).

x

y

x2

x3. ..

(b) b|(x,xj ]H1((x, xj ]) = j.

x

y

xj

yj

(c) d(x, y) = ∞ > 1 ≡ d(xj , yj).

Figure 2: Sketches for examples 2.2.4 and 2.2.5.

Our final result in this section is that limit paths of sequences of optimal paths are again optimal. It will imply closednessof a certain subset E ⊂ C×C×paths needed to prove the existence of a measurable path selection later in corollary 2.3.4.

Proposition 2.2.6 (Optimal path limit). Let assumption 1.3.1 be satisfied and ((xj , yj)) ⊂ C × C be a sequence suchthat d(xj , yj) is uniformly bounded. Further, let γj ∈ Γ

xjyjC such that L(γj) = d(xj , yj) (cf. proposition 2.2.2) and assume

that each γj has constant speed. Suppose that γj ⇒ γ ∈ ΓxyC . Then L(γ) = d(x, y).

Proof. Case a <∞: The claim follows directly from theorem 2.1.1 and the triangle inequality,

d(x, y) ≤ L(γ) ≤ lim infj

L(γj) = lim infj

d(xj , yj) ≤ lim infj

d(x, y) + a|x− xj |+ a|y − yj | = d(x, y).

Case a = ∞: We can assume that the arc length of γ is positive (otherwise the optimality is obvious) and lim infj L(γj) =limj L(γj) by restricting to a subsequence. Application of theorem 2.1.1 yields d(x, y) ≤ L(γ) ≤ limj L(γj) <∞. Hence,d(x, y) is finite, and by proposition 2.2.2 there exists γ ∈ Γxy with L(γ) = d(x, y). We assume for a contradiction thatL(γ) < L(γ). Let δ > 0 be arbitrary and pick 0 < α < β < 1 such that

0 < L(γ|[0,α]) < δ/6 and 0 < L(γ|[β,1]) < δ/6.

Furthermore, let j be sufficiently large such that

L(γ|[α,β]) ≤ L(γj|[α,β]) + δ/3 as well as γ([0, α]) ∩ γj([0, α]) 6= ∅ and γ([β, 1]) ∩ γj([β, 1]) 6= ∅,

which is possible by the lower semi-continuity of L from theorem 2.1.1 and by the vanishing symmetric differencebetween γ and the sequence γj due to proposition 2.1.7. Let p ∈ γ([0, α]) ∩ γj([0, α]), q ∈ γ([β, 1]) ∩ γj([β, 1]), andtp ∈ [0, α], tq ∈ [β, 1] such that γj(tp) = p and γj(tq) = q. Then we can estimate (using that γj |[tp,tq ] is an optimal pathwith respect to L connecting p and q):

L(γ) = L(γ|[0,α]) + L(γ|[α,β]) + L(γ|[β,1]) ≤ L(γj |[α,β]) +2δ

3≤ L(γj |[tp,tq ]) +

2δ

3= d(p, q) +

2δ

3.

We further haved(p, q) ≤ d(p, x) + d(x, y) + d(y, q) ≤ L(γ|[0,α]) + L(γ) + L(γ|[β,1]) ≤ L(γ) +

δ

3

and therefore L(γ) ≤ L(γ) + δ. This is in contradiction to L(γ) < L(γ) (δ > 0 was arbitrary).

20

2.3 Existence of measurable optimal path selection

In this section we will prove a selection result: Given any x, y ∈ C we can select a path ρ(x, y) with d(x, y) = L(ρ(x, y))such that the resulting map ρ is Borel measurable. To this end we apply a measurable selection theorem from [BP73].Since L is invariant with respect to curve reparameterization, we first define an equivalence relation on Γ by

γ1 ∼ γ2 if and only if dΘ(γ1, γ2) = 0,

wheredΘ(γ1, γ2) = inf |γ1 − γ2 ϕ|∞,I |ϕ : I → I increasing and bijective

and equivalence classes will be denoted by [·]∼. Then dΘ is a metric [But+09, p. 7] on

Θ = [γ]∼ | γ ∈ Γ.

Remark 2.3.1 (Θ not complete). The space (Θ, dΘ) is separable, which follows from the fact that every continuousfunction I → R can be approximated in the uniform norm by a polynomial with rational coefficients. Unfortunately, it isnot complete. A counterexample is given by the Hilbert curve, which is a space-filling and thus not Lipschitz continuouspath mapping I onto [0, 1]2. While it does not lie in Γ, it can be approximated in dΘ by Lipschitz paths (cf. constructionin [And09, Fig. 2]).

For C > 0 and x, y ∈ C define

ΘC = θ ∈ Θ | len(θ) ≤ C and ΘxyC = θ ∈ ΘC | θ(0) = x, θ(1) = y.

Lemma 2.3.2 (ΘC complete). For all C > 0 and x, y ∈ C the metric spaces ΘC and ΘxyC (equipped with dΘ) arecomplete.

Proof. Let C > 0. It suffices to prove the claim for ΘC . Assume that (θj) ⊂ ΘC is a Cauchy sequence. Let γj ∈ θj bethe sequence of representations with constant speed and therefore (γj) ⊂ ΓC . There exists a subsequence (γj′ ) ⊂ (γj)with γj′ ⇒ γ for some γ ∈ ΓC by remark 2.0.5. Hence, we have dΘ(θj′ , θ) → 0 with γ ∈ θ ∈ ΘC . By the assumptionthat θj is a Cauchy sequence we must have dΘ(θj , θ) → 0 for the whole sequence.

We want to apply the following measurable selection statement to prove the existence of a Borel measurable path selectionρ : C × C → Θ such that d = L ρ. Note that L(θ) is well-defined for any θ ∈ Θ, because every representative of θtraverses in the same way.

Proposition 2.3.3 ([BP73, Thm. 1]). Assume that U and V are separable and complete metric spaces and E ⊂ U × Vis Borel measurable. If for each u ∈ U the section Eu = v ∈ V | (u, v) ∈ E is σ-compact, then the projection of E ontoU , denoted by projU (E), is Borel measurable and there exists a Borel-selection S ⊂ E of E, i.e.,

• S is Borel measurable,

• projU (E) = projU (S),

• there is a Borel measurable function ρ : projU (E) → V which is uniquely defined by

(u, ρ(u)) ∈ S.

21

For the rest of this section we writeTλ = (x, y) ∈ C × C | d(x, y) ≤ λ

for λ ∈ [0,∞). Those sets are closed, because d is lower semi-continuous by proposition 2.2.3. Further, we define

Cλ =

H1(S1) + λ if a = ∞,

H1(Sa/2) +2aλ if a <∞

(1)

for all such λ (cf. proposition 2.2.2). A direct consequence of proposition 2.3.3 is the following statement.

Corollary 2.3.4 (Measurable bounded optimal path map). Let assumption 1.3.1 be satisfied. For each λ ∈ [0,∞) thereexists a Borel measurable map ρλ : Tλ → ΘCλ

such that

ρλ(x, y) ∈ θ ∈ ΘxyCλ|L(θ) = d(x, y) for all (x, y) ∈ Tλ.

Proof. Fix λ ∈ [0,∞) and define complete and separable metric spaces by U = C × C and V = ΘCλ(see lemma 2.3.2).

Furthermore, letE =

((x, y), θ) ∈ U × V∣

∣ (x, y) ∈ Tλ, θ ∈ ΘxyCλ, L(θ) = d(x, y)

.

We show that E is Borel measurable. Actually, we prove that E is closed. Let (((xj , yj), θj)) ⊂ E be a sequence suchthat (xj , yj) → (x, y) in C × C and θj → θ with respect to dΘ. We have d(x, y) ≤ λ by the lower semi-continuityof d (proposition 2.2.3). By len(θj) ≤ Cλ we can represent the θj by paths with constant speed (γj) ⊂ ΓCλ

. Usingremark 2.0.5 we have γj′ ⇒ γ ∈ ΓxyCλ

for some subsequence (γj′) ⊂ (γj). This implies dΘ(γj′ , γ) → 0 which yields γ ∈ θ.Hence, θ is optimal by proposition 2.2.6 on the optimal path limit. This shows that E is closed. Now consider thesection E(x,y) = θ | ((x, y), θ) ∈ E for (x, y) ∈ C × C. We claim that E(x,y) is compact. If d(x, y) > λ, then we haveE(x,y) = ∅. Therefore, we can assume that d(x, y) ≤ λ and pick any sequence (θj) ⊂ E(x,y). Again, the lengths of theθj are uniformly bounded by Cλ and we can represent by paths with constant speed and extract a subsequence whichconverges uniformly to some θ ∈ ΘxyCλ

. By proposition 2.2.6 θ is optimal and thus θ ∈ E(x,y). Hence, E(x,y) is compact.By the statement about minimizers for d (proposition 2.2.2) we obtain projU (E) = Tλ. Finally, we apply the previousmeasurable selection theorem (proposition 2.3.3) to get the desired result.

Using an appropriate partition of C × C we can now show our main result for this section.

Proposition 2.3.5 (Measurable optimal path map). Let assumption 1.3.1 be satisfied. There exists a Borel measurablemap

ρ : C × C → Θ

such that for all (x, y) ∈ C × C we have

L(ρ(x, y)) = d(x, y) and ρ(x, y) ∈ ΘxyCd(x,y)+1, i.e., len(ρ(x, y)) ≤ Cd(x,y)+1.

Proof. For each λ ∈ [0,∞) there is a Borel measurable function ρλ : Tλ → ΘCλby corollary 2.3.4. Clearly, ρλ is Borel

measurable as a function ρλ : Tλ → Θ, because ΘCλis complete (and therefore closed) by lemma 2.3.2. We define a

mapping ρ : d <∞ → Θ. Consider the following partition of d <∞:

P1 = T1 and Pi+1 = Ti+1 \ Ti for i ∈ N.

Then Pi is Borel measurable for all i by the Borel measurability of all Ti. For (x, y) ∈ Pi for some i let

ρ(x, y) = ρi(x, y).

22

We have len(ρi(x, y)) ≤ Ci ≤ Cd(x,y)+1 by definition of Pi and corollary 2.3.4. Furthermore, for B ∈ B(Θ) we get

ρ−1(B) =⋃

i

ρ−1i (B) ∩ Pi,

which is Borel measurable. Finally, ρ can be continued to a Borel measurable function ρ : C × C → Θ. Let θxy ∈ Θ bethe straight line connection from x to y and set

ρ(x, y) =

ρ(x, y) if d(x, y) <∞,

θxy else.

Since ρ equals the Borel measurable ρ on the Borel set d <∞ =⋃

i Pi and ρ is continuous and thus Borel measurableon the complement d = ∞, the map ρ is Borel measurable on all of C × C.

2.4 Wasserstein distance with generalized urban metric as Beckmann problem

In this section we prove theorem 1.3.2. We will use the idea that a mass flux can be seen as a measure on paths, whichis more accurately defined as follows (see [But+09, Def. 2.5]).

Definition 2.4.1 (Mass flux measure). Any measure η : (Θ,B(Θ)) → [0,∞) is called mass flux measure (recall thedefinition of Θ at the beginning of section 2.3). Further, η moves µ+ onto µ− if

µ+(B) = η(θ ∈ Θ | θ(0) ∈ B) and µ−(B) = η(θ ∈ Θ | θ(1) ∈ B)

for all Borel sets B ∈ B(C), thus (µ+, µ−) is the pushforward of η under the map θ 7→ (θ(0), θ(1)).

To translate back and forth between mass flux measures and mass fluxes we will need the following type of measures.

Lemma 2.4.2 (Line integral measure). Let γ ∈ Γ be injective and define the Radon measure Fγ by

〈ϕ,Fγ〉 =

∫

I

ϕ(γ) · γ dL for all ϕ ∈ C(C;Rn).

Then we have |Fγ | = H1 γ or equivalently

〈ϕ, |Fγ |〉 =

∫

I

ϕ(γ)|γ| dL for all ϕ ∈ C(C).

Proof. Without loss of generality we can replace I by [0, len(γ)] and assume that γ is parameterized by arc length. Usingthe assumption that γ is injective and the last formula in remark 2.0.4 we get

(H1 γ)(B) = H1(γ(γ−1(B))) =

∫

γ−1(B)

|γ| dL = L(γ−1(B)) = (γ#(L [0, len(γ)]))(B)

for all B ∈ B(C). Hence, we have H1 γ = γ#(L [0, len(γ)]). Now define vγ(x) = γ(γ−1(x)) ∈ Sn−1 for H1-almost allx ∈ γ. For ϕ ∈ C(C;Rn) we obtain

〈ϕ,Fγ〉 =

∫

[0,len(γ)]

ϕ(γ) · γ dL =

∫

[0,len(γ)]

ϕ(γ) · vγ(γ) dL =

∫

C

ϕ · vγ d(γ#(L [0, len(γ)])) =∫

C

ϕ · vγ dH1 γ.

This yields Fγ = vγH1 γ and therefore |Fγ | = |vγ |H1 γ = H1 γ. In particular, by remark 2.0.4 we get

〈ϕ, |Fγ |〉 =

∫

C

ϕdH1 γ =

∫

[0,len(γ)]

ϕ(γ)|γ| dL

for ϕ ∈ C(C).

23

We can now prove theorem 1.3.2. First, we show that the minimum Beckman cost is no smaller than the Wassersteindistance: From an admissible mass flux for the Beckmann problem we substract the part with vanishing divergenceand represent the remainder by a mass flux measure [Smi93, Thm. C]. We then transform that mass flux measure intoan admissible transport plan with no larger energy (as in [Bra05, Def. 3.4.9]). For the reverse inequality we pick anadmissible optimal transport plan and push forward under the measurable path selection from proposition 2.3.5 to geta mass flux measure, which in turn induces a mass flux.

Proof of theorem 1.3.2. Wd(µ+, µ−) ≤ infξ,F⊥

∫

C b d|ξH1 S + F⊥|: We can assume that the Beckmann problem is finite.

Thus, there exist ξ ∈ L1(H1 S;Rn) and F⊥ ∈ Mn(C) such that F⊥ S = 0, div(ξH1 S + F⊥) = µ+ − µ− and∫

C

b d|ξH1 S + F⊥| <∞.

We use the same idea as in the proof of [BW18, Prop. 4.1] to replace the mass flux ξH1 S+F⊥ by a mass flux measure onΘ. By [Smi93, Thm. C] we have ξH1 S+F⊥ = F+G with div(F) = 0, div(G) = µ+−µ− and |ξH1 S+F⊥| = |F |+ |G|.Hence, we get |G| ≤ |ξH1 S + F⊥| and thus

∫

C

b d|G| ≤

∫

C

b d|ξH1 S + F⊥|.

Again by [Smi93, Thm. C] we get that G can be associated with a mass flux measure η on Θ moving µ+ onto µ−, whichis supported on loop-free paths [Smi93, (1.14)], i.e.,

∫

C

ϕ · dG =

∫

Θ

〈ϕ,Fγ〉dη(γ) =

∫

Θ

∫

I

ϕ(γ) · γ dLdη(γ)

for all ϕ ∈ C(C;Rn), using the Radon measure Fγ from lemma 2.4.2 (note that for simplicity we identify paths γ withtheir equivalence classes [γ]∼ ∈ Θ). By [Smi93, (1.10)] we have

|G|(B) =

∫

Θ

|Fγ |(B) dη(γ) (2)

for all B ∈ B(C). As in [Bra05, Def. 3.4.9] we define a transport plan π ∈ Π(µ+, µ−) (recall definition 1.2.2) by∫

C×C

ϕdπ =

∫

Θ

ϕ(γ(0), γ(1)) dη(γ)

for ϕ ∈ C(C × C). We show∫

d dπ ≤∫

b d|G| to get the desired result. First, we prove∫

C

b d|G| =

∫

Θ

∫

I

b(γ)|γ| dLdη(γ).

Let ϕ ∈ C(C) with ϕ ≥ 0. By construction of the Lebesgue integral there exist simple functions ϕi =∑N(i)j=1 c

ij1Bi

j: C →

[0,∞) with ϕi ր ϕ pointwise, where cij ≥ 0 and Bij ∈ B(C). Using the monotone convergence theorem and the fact that|Fγ |(Bij) = H1(γ ∩Bij) (lemma 2.4.2) we get

∫

C

ϕd|G| = limi

N(i)∑

j=1

cij |G|(Bij) = lim

i

N(i)∑

j=1

cij

∫

Θ

|Fγ |(Bij) dη(γ) = lim

i

N(i)∑

j=1

cij

∫

Θ

∫

I

1Bij(γ)|γ| dLdη(γ)

= limi

∫

Θ

∫

I

ϕi(γ)|γ| dLdη(γ) =

∫

Θ

∫

I

ϕ(γ)|γ| dLdη(γ).

24

The same argumentation shows the formula for general ϕ ∈ C(C) via decomposition into positive and negative part.To show the formula

∫

b d|G| =∫ ∫

b(γ)|γ| dLdη(γ) we now distinguish two cases. For the case a < ∞ let SN be anapproximating sequence for S (recall the definition from section 1.4). The difference Z = S \

⋃

N SN satisfies H1(Z) = 0.

We can thus assume S =⋃

N SN , because the divergence constraint stays satisfied if we neglect the null set Z. We now

introduce lower semi-continuous approximations (recall that SN is closed) of b by

bN =

b on SN ,

a else.

Each bN can be approximated by (Lipschitz) continuous functions fNi : C → [0,∞) with fNi ≤ fNi+1 and fNi → bNpointwise for i → ∞ [San15, Box 1.5], for instance using the Moreau envelope. Therefore, the monotone convergencetheorem yields

∫

C

bN d|G| =

∫

Θ

∫

I

bN (γ)|γ| dLdη(γ)

and thus the desired formula for b using again the monotone convergence theorem as bN ց b pointwise. For the casea = ∞ we cannot apply monotone convergence, because the above integrals may not be finite and bN is decreasing inN . Due to

∫

b d|G| <∞ we must have G = ξH1 S. The function b is lower semi-continuous on S, and thus there existLipschitz functions bj : S → [0,∞) with bj ր b pointwise on S (again, see e.g. [San15, Box 1.5]). We can now continuouslyextend each bj to C [Sim14, Ch. 2, Thm. 1.2]. Further, we have |G|(C \ S) = 0 and thus |Fγ |(C \ S) = 0 for η-almost allγ ∈ Θ by equation (2) (again we identify paths with their equivalence classes in Θ). Hence we obtain H1(γ \ S) = 0 forη-almost all γ ∈ Θ using lemma 2.4.2. By monotone convergence and the formula

∫

ϕd|G| =∫ ∫

ϕ(γ)|γ| dLdη(γ) forcontinuous ϕ we get

∫

C

b d|G| =

∫

S

b d|G| = limj

∫

S

bj d|G| = limj

∫

C

bj d|G| = limj

∫

Θ

∫

I

bj(γ)|γ| dLdη(γ)

= limj

∫

Θ

∫

I

1S(γ)bj(γ)|γ| dLdη(γ) =

∫

Θ

∫

I

1S(γ)b(γ)|γ| dLdη(γ) =

∫

Θ

∫

I


We can now finish the proof. By proposition 2.2.3 d is lower semi-continuous. Thus, again using Lipschitz approximationsand the monotone convergence theorem, we obtain

∫

C×C

d(x, y) dπ(x, y) =

∫

Θ

d(γ(0), γ(1)) dη(γ) ≤

∫

Θ

∫

I

b(γ)|γ| dLdη(γ) =

∫

C

b d|G|.

Wd(µ+, µ−) ≥ infξ,F⊥

∫

C b d|ξH1 S + F⊥|: Assume that there exists a transport plan π ∈ Π(µ+, µ−) such that

∫

C×C

d(x, y) dπ(x, y) <∞.

We consider the push-forward η of π under the Borel measurable function ρ : C × C → Θ from proposition 2.3.5 whichmaps onto optimal paths with respect to d. We get

∫

C×C

d(x, y) dπ(x, y) =

∫

C×C

L(ρ(x, y)) dπ(x, y) =

∫

Θ

L(γ) dη(γ) =

∫

Θ

∫

I


This motivates the definition of the functional F via

〈ϕ,F〉 =

∫

Θ

∫

I

ϕ(γ) · γ dLdη(γ) for all ϕ ∈ C(C;Rn).

25

Clearly, F is linear. Its continuity follows from proposition 2.3.5 and the definition of Cλ (equation (1) in section 2.3),

|〈ϕ,F〉| ≤ |ϕ|∞,C

∫

Θ

∫

I

|γ| dLdη(γ) = |ϕ|∞,C

∫

C×C

len(ρ(x, y)) dπ(x, y) ≤ |ϕ|∞,C

∫

C×C

Cd(x,y)+1 dπ(x, y)

= |ϕ|∞,C ·

∫

C×Cd(x, y) dπ(x, y) +H1(S1) + 1 if a = ∞,

2a

∫

C×C d(x, y) dπ(x, y) +H1(Sa/2) +2a if a <∞.

Note that the constant on the right-hand side is finite due to assumption 1.3.1 and the choice of π. In particular, F is aRadon measure and moreover a mass flux between µ+ and µ−, since for all ϕ ∈ C1(C) we have

〈ϕ, div(F)〉 = −

∫

C

∇ϕ · dF = −

∫

Θ

∫

I

(∇ϕ)(γ) · γ dLdη(γ) = −

∫

Θ

∫

I

d

dt(ϕ γ) dLdη(γ)

=

∫

Θ

[ϕ(γ(0))− ϕ(γ(1))] d(ρ#π)(γ) =

∫

C

ϕd[(p1)#π − (p2)#π] = 〈ϕ, µ+ − µ−〉

using the fundamental theorem of calculus in the fourth equality. We now show∫

Θ

∫

I

ϕ(γ)|γ| dLdη(γ) ≥

∫

C

ϕd|F|

for all ϕ ∈ C(C) with ϕ ≥ 0. Let f denote the Radon–Nikodym derivative of F ∈ Mn(C) with repect to |F|, i.e.,∫

C

ϕ · dF =

∫

C

ϕ · f d|F|

for all ϕ ∈ C(C;Rn). By [San15, Box 4.2] we have |f | = 1 |F|-almost everywhere on C. Additionally, using f ∈ L1(|F|;Rn)there exists1 a sequence (fk) ⊂ C(C;Rn) such that

∫

C

|f − fk| d|F| → 0.

By restricting to a subsequence we have fk → f pointwise |F|-almost everywhere on C. Further, we can suppose that|fk| ≤ 1, because the continuous functions

fk =

fk if |fk| ≤ 1,1

|fk|fk else

are better approximations of f (|F|-almost everywhere we have f ∈ Sn−1). Hence we get∫

Θ

∫

I


∫

Θ

∫

I

ϕ(γ)|fk(γ)||γ| dLdη(γ) ≥

∫

Θ

∫

I

(ϕfk)(γ) · γ dLdη(γ) =

∫

C

ϕfk · dF =

∫

C

ϕfk · f d|F|

for all ϕ ∈ C(C) with ϕ ≥ 0. Moreover, |F|-almost everywhere on C we have ϕfk · f → ϕf2 = ϕ as well as |ϕfk · f | ≤ ϕ.Thus, by Lebesgue’s dominated convergence theorem we obtain

∫

Θ

∫

I


∫

C

ϕd|F|.

1It is straightforwad to prove this using the regularity of |F| and Lusin’s theorem (which is applicable due to [Fel81, Theorem & Remark]).

26

...µ+ = δx µ− = δy

Figure 3: Sketch for example 2.4.3. If a > b ≡ const. > 0 on S and H1(S) = ∞, then it mayhappen that there does not exist an optimal mass flux for the Beckmann problem.

By [Šil08, Thm. 3.1] we get F = ξH1 S + F⊥ for some ξ ∈ L1(H1 S;Rn) and F⊥ ∈ Mn(C) with F⊥ S = 0. Thesame argument as in the first half of the proof (assuming that S is the set-theoretic limit of an approximating sequenceSN , defining the bN , exploiting monotone convergence et cetera) shows

∫

Θ

∫

I

b(γ)|γ| dLdη(γ) ≥

∫

C

b d|F|.

Hence we get∫

C×C

d(x, y) dπ(x, y) =

∫

Θ

∫

I

b(γ)|γ| dLdη(γ) ≥

∫

C

b d|F|.

We close this section with a brief discussion of existence of minimizers for the Wasserstein and the Beckmann problem.First note that without assumption 1.3.1 an optimal mass flux for the Beckmann problem may not exist as the nextexample shows.

Example 2.4.3 (Non-existence of optimal mass flux). Let n = 2 and set x = (0, 0), y = (1, 0). Assume that b ≡ 1 on

S =⋃

j

([x, zj ] ∪ [zj, y]),

where zj = (1/2, 1/j) (see figure 3). Clearly, assumption 1.3.1 is not satisfied. Note that [x, y] ∩ S = x, y. Ifµ+ = δx and µ− = δy, then an optimal transport plan for the Wasserstein distance is clearly given by π = δ(x,y). Wehave d(x, y) = infΓxy L = 1, where a sequence of minimizing paths is given by the injective paths γj that parameterize[x, zj ] ∪ [zj, y]. The γj induce a minimizing sequence Fj = ξjH1 ([x, zj ] ∪ [zj, y]) for the Beckmann problem. Morespecifically, we have

ξj =

(zj − x)/|zj − x| H1-a.e. on [x, zj ],

(y − zj)/|y − zj | H1-a.e. on [zj, y].

Nevertheless, for all a > 1 = b there does not exist an optimal mass flux for the Beckmann problem. In other words,there is no optimal path between x and y with respect to d.

We used assumption 1.3.1 to prove that d is lower semi-continuous for a = ∞ in proposition 2.2.3. From this propertywe get the existence of optimal transport plans ([San15, Thm. 1.5]).

27

Proposition 2.4.4 (Existence of optimal transport plan). Let assumption 1.3.1 be satisfied or a <∞. Then there existsan optimal transport plan π ∈ Π(µ+, µ−) such that

Wd(µ+, µ−) =

∫

C×C

d(x, y) dπ(x, y).

Since in the proof of theorem 1.3.2, from any transport plan we constructed a mass flux with no larger cost, thisimmediately implies the existence of an optimal mass flux for the Beckmann problem.

Corollary 2.4.5 (Existence of optimal mass flux). Let assumption 1.3.1 be satisfied. Then there exists an optimal massflux F = ξH1 S+F⊥ with ξ ∈ L1(H1 S;Rn) and F⊥ ∈ Mn(C) such that F⊥ S = 0 and div(ξH1 S+F⊥) = µ+−µ−

as well as

Wd(µ+, µ−) =

∫

S

b|ξ| dH1 + a|F⊥|(C).

3 Bilevel formulation of the branched transport problem

Let µ+, µ− be given probability measures on B(C). In this section we will prove theorem 1.3.4: The branched transportproblem (definitions 1.1.3 and 1.1.4) of finding an optimal mass flux from µ+ to µ− with respect to a (concave) trans-portation cost τ can be equivalently written as a generalized version of the urban planning problem (definitions 1.2.3and 1.2.4). We briefly recapitulate the setting from section 1. In the urban planning problem one optimizes over count-ably 1-rectifiable and Borel measurable networks S ⊂ C and lower semi-continuous friction coefficients b : S → [0,∞)representing a street or pipe network,

infS,b

Uc,µ+,µ− [S, b] = infS,b

WdS,a,b(µ+, µ−) +

∫

S

c(b) dH1.

The optimization depends on a fixed, decreasing maintenance cost c : R → [0,∞], and the cost for motion outside thenetwork is defined by a = inf c−1(0). In the branched transport problem on the other hand the transportation cost isa concave function τ : [0,∞) → [0,∞) with τ(0) = 0, and one looks for an optimal mass flux F ∈ DMn(Rn) withdiv(F) = µ+ − µ−,

infF

J τ,µ+,µ− [F ],

where J τ,µ+,µ− is defined via relaxation of a discrete energy. We extend τ to a function R → [−∞,∞) by settingτ(m) = −∞ for m < 0. Moreover, we set

τ ′(0) = limmց0

τ(m)

m∈ [0,∞].

We use the convex conjugate of −τ to define a maintenance cost for our generalized urban planning problem which willbe shown to be equivalent to the branched transport problem for τ ,

ε(v) = (−τ)∗(−v) = supm≥0

τ(m)−mv for v ∈ R.

We observe that by definitionτ ′(0) = inf ε−1(0) = a.

The actual statement of theorem 1.3.4 to be shown in this section is

infF

J τ,µ+,µ− [F ] = infS,b

Uε,µ+,µ− [S, b].

28

In section 3.1 we will formulate an appropriate version of the branched transport problem that will later naturallylead to our Beckmann formulation of the Wasserstein distance from theorem 1.3.2. In section 3.2 we then establishthe equivalence between the branched transport and the urban planning problem, and we discuss the relation betweenminimizers of each.

3.1 Version of the branched transport problem

In this section we introduce the reformulation of the branched transport problem from [BW18, Prop. 2.3] as a generalizedGilbert energy in order to prepare the equivalence proof in section 3.2. Further, we highlight some properties of thevariables appearing in the reformulation. We first note that it suffices to concentrate on mass fluxes with support in C(in fact, one may replace C by the convex hull of supp(µ+) ∪ supp(µ−), see [BCM09, Lem. 5.15]).

Lemma 3.1.1 ([BW18, Def. 2.2 & Lem. 2.4]). We have

infF∈DMn(Rn)

J τ,µ+,µ− [F ] = infF∈DMn(C)

J τ,µ+,µ− [F ].

We will in this section work with the following expression for the branched transport cost.

Proposition 3.1.2 ([BW18, Prop. 2.32 and its proof]). Every F ∈ DMn(C) satisfies J τ,µ+,µ− [F ] <∞ if and only if

• div(F) = µ+ − µ−,

• F = ξH1 S + F⊥ with countably 1-rectifiable S ⊂ C, ξ : S → Rn H1 S-measurable (tangent to S H1-almosteverywhere) and F⊥ singular with respect to H1 R for any countably 1-rectifiable set R ⊂ C.

Assume that J τ,µ+,µ− [F ] <∞ and F = ξH1 S + F⊥ as above. Then the branched transport cost of F is given by

J τ,µ+,µ− [F ] =

∫

S

τ(|ξ|) dH1 + τ ′(0)|F⊥|(C).

Moreover, we can always choose S = Θ∗1(|F|, .) > 0 and F⊥ = F (C \ S).

Remark 3.1.3 (S is Borel). The set S = Θ∗1(|F|, .) > 0 is Borel measurable by [Edg95, Prop. 1.1].

Example 3.1.4 (S not closed in general). For polyhedral mass fluxes (that are supported on finitely many line segments)the set S can clearly be chosen to be closed. In general this is not the case. A simple example is given by τ(m) = m,µ+ =

∑

x∈Q∩[0,1] ϕ(x)δ(x,0) and µ− =∑

x∈Q∩[0,1] ϕ(x)δ(x,1), where ϕ : Q ∩ [0, 1] → [0, 1] satisfies∑

x ϕ(x) = 1 (see

figure 4).

Using [Šil08, Thm. 3.1] it is easy to see that the following properties of ξ and F⊥ hold true.

Corollary 3.1.5 (Integrability of mass density and property of diffuse part). Assume that F ∈ DMn(C) satisfiesJ τ,µ+,µ− [F ] <∞ and write F = ξH1 S +F⊥ as in proposition 3.1.2. Then the function ξ is integrable with respect toH1 S and F⊥ S = 0. Those properties are independent of the triple (ξ, S,F⊥).

Proof. By [Šil08, Thm. 3.1] every F ∈ DMn(C) can be written as F = ϑH1 M + G + ψLn C with M ⊂ C countably1-rectifiable, ϑ ∈ L1(H1 M ;Rn) tangent to M H1-almost everywhere, G ∈ Mn(C) H1-diffuse and Ln-singular as wellas ψ ∈ L1(Ln C;Rn). If J τ,µ+,µ− [F ] <∞, we have

F = ξH1 S + F⊥ = ϑH1 M + G + ψLn C

29

ϕ(x)

S = (Q ∩ [0, 1])× [0, 1]x

Figure 4: Sketch for example 3.1.4.

with S, ξ,F⊥ as in proposition 3.1.2. If n > 1, we get (G+ψLn C) S = 0. Using this and F⊥ ⊥ H1 S (proposition 3.1.2)we obtain F⊥ S = 0 and ξH1 S = ϑH1 (M ∩ S), which yields ξ ∈ L1(H1 S;Rn). For the case n = 1 we can writeξH1 S + F⊥ = ζH1 C with the (H1 C)-integrable function

ζ =

ϑ+ ψ on M ,

ψ on C \M .

The same argument as for the case n > 1 then yields ξ ∈ L1(H1 S;Rn) and F⊥ S = 0.

For the next result we will need the notion of irrigation patterns (see [MSM03; BCM05; MS13]), which are an alternativeway to describe mass fluxes involving a time dependency. Broadly speaking, a mass flux between µ+ and µ− can beseen as a superposition of particle trajectories. The so-called standard space ([0, 1],B([0, 1]),L [0, 1]) can be used as aparameterization of all particles [Roy88, Ch. 15, Thm. 16].

Definition 3.1.6 (Reference space, irrigation pattern between µ+, µ−, total mass flux through x). We define a mapχ : [0, 1]× I → C such that χ(p, t) describes the position of particle p at time t.

• The reference space for particles is the measure space ([0, 1],B([0, 1]),L [0, 1]).

• An irrigation pattern is a Borel measurable map χ : [0, 1]× I → C such that χ(p, .) is absolutely continuous forL-almost all p ∈ [0, 1].

Let χ be an irrigation pattern.

• We say that χ is an irrigation pattern between the probability measures µ+ and µ− if and only if

µ+(B) = L(p ∈ [0, 1] |χ(p, 0) ∈ B) and µ−(B) = L(p ∈ [0, 1] |χ(p, 1) ∈ B)

for all B ∈ B(C).

• For x ∈ C we set [x]χ = p ∈ [0, 1] |x ∈ χ(p, I). The total mass flux through x is defined by mχ(x) = L([x]χ).

The following summarizing statement will be used in section 3.2 to explicitly construct a lower semi-continuous frictioncoefficient b : S → [0,∞) based on x 7→ |ξ(x)|. Note that the first two points follow directly from proposition 3.1.2,remark 3.1.3, and corollary 3.1.5.

30

Corollary 3.1.7 (Other properties of mass density). Assume that there is a mass flux G ∈ DMn(C) with J τ,µ+,µ− [G] <∞. Then there exists some F ∈ DMn(C) with J τ,µ+,µ− [F ] ≤ J τ,µ+,µ− [G] and decomposition F = ξH1 S + F⊥ as inproposition 3.1.2 such that

• S is countably 1-rectifiable and Borel measurable,

• ξ ∈ L1(H1 S;Rn) and F⊥ S = 0,

• we can choose a representative of ξ such that |ξ| is bounded by the total mass µ+(C) = 1 on S, S ∋ x 7→ |ξ(x)| isupper semi-continuous, |ξ| ≥ m = x ∈ S | |ξ(x)| ≥ m is closed in C and H1(|ξ| ≥ m) <∞ for every m > 0,

• we can write S =⋃

m>0|ξ| ≥ m.

Proof. By proposition 3.1.2, remark 3.1.3, and corollary 3.1.5 we can write G = ξGH1 S + G⊥ with S countably 1-

rectifiable and Borel measurable, ξG ∈ L1(H1 S;Rn) and G⊥ ∈ Mn(C) with G⊥ S = 0. As in the proof of the firstinequality of theorem 1.3.2 in section 2.4 we use the idea in the proof of [BW18, Prop. 4.1]; we briefly recapitulate thesteps: By [Smi93, Thm. C] we have G = F0 + F , where div(F0) = 0 and F can be decomposed into simple orientedcurves of finite length, i.e., into measures of type Fγ from lemma 2.4.2 (see also [Smi93, Exm. 1]). The measure F canbe decomposed as F = ξH1 S +F⊥ ∈ DMn(C) with |ξ| ≤ |ξG | and |F⊥|(C) ≤ |G⊥|(C) and satisfies div(F) = µ+ − µ−

as well as J τ,µ+,µ− [F ] ≤ J τ,µ+,µ− [G]. Furthermore, F can be associated with a mass flux measure η on Θ moving µ+

onto µ− (recall definition 2.4.1) by [Smi93, Thm. C]. More precisely, we have∫

C

ϕ · dF =

∫

Θ

∫

I

ϕ(γ(t)) · γ(t) dL(t) dη(γ) for all ϕ ∈ C(C;Rn) and∫

C

ψ d|F| =

∫

Θ

∫

I

ψ(γ(t))| ˙γ(t)| dL(t) dη(γ) for all ψ ∈ C(C).

Using Skorohod’s theorem [Bil99, Thm. 6.7] there is an irrigation pattern χF between µ+ and µ− which induces η,∫

C

ϕ · dF =

∫

[0,1]

∫

I

ϕ(χF (p, t)) · χF (p, t) dL(t) dL(p) for all ϕ ∈ C(C;Rn) and

∫

C

ψ d|F| =

∫

[0,1]

∫

I

ψ(χF (p, t))|χF (p, t)| dL(t) dL(p) for all ψ ∈ C(C).

Here χF denotes the derivative with respect to the second argument (which exists L I-almost everywhere). By [BW18,Prop. 4.2] we may assume S = mχF

> 0. Additionally, mχF(x) = |ξ(x)| for H1-almost every x ∈ S by the proof of

[BW18, Prop. 4.1]. This shows the desired formula for S by changing ξ such that mχF= |ξ| on S. More specifically,

represent ξ by

ξ =

ξ on |ξ| = mχF,

(mχF, 0, . . . , 0)T on |ξ| 6= mχF

.

Now fix m > 0 and let (xi) ⊂ |ξ| ≥ m be a sequence with xi → x ∈ C. By [BCM09, Lem. 3.25] the functionC ∋ y 7→ mχF

(y) is upper semi-continuous. This implies mχF(x) ≥ lim supimχF

(xi) = lim supi |ξ(xi)| ≥ m. Inparticular, we have mχF

(x) > 0 and thus x ∈ S, which implies |ξ(x)| = mχF(x) ≥ m. This proves the closedness of

|ξ| ≥ m. Moreover, |ξ| is upper semi-continuous by |ξ| = mχFon S. The boundedness of |ξ| follows from

|ξ| = mχF= L([.]χF

) ≤ 1.

31

Finally, we have

∞ > J τ,µ+,µ− [F ] =

∫

S

τ(|ξ|) dH1 + τ ′(0)|F⊥|(C) ≥

∫

|ξ|≥m

τ(|ξ|) dH1 ≥

∫

|ξ|≥m

τ(m) dH1 = τ(m)H1(|ξ| ≥ m)

and thus H1(|ξ| ≥ m) <∞ for all m > 0.

We end this subsection by reformulating the branched transport problem such that the variables to be optimized are asin the setting of the Beckmann problem from theorem 1.3.2.

Lemma 3.1.8 (Version of the branched transport problem). The branched transport problem can be written as

infF∈DMn(C)

J τ,µ+,µ− [F ] = infS,ξ,F⊥

∫

S

τ(|ξ|) dH1 + τ ′(0)|F⊥|(C)

with S ⊂ C countably 1-rectifiable and Borel measurable, ξ ∈ L1(H1 S;Rn) and F⊥ ∈ Mn(C) with F⊥ S = 0 anddiv(ξH1 S + F⊥) = µ+ − µ−.

Proof. By lemma 3.1.1, proposition 3.1.2, corollary 3.1.5, and remark 3.1.3 the right-hand side is automatically smallerthan or equal to the left-hand side. For the reverse inequality, let S, ξ,F⊥ satisfy the stated properties. We can assumethat div(ξH1 S +F⊥) = µ+ − µ− (otherwise the inequality is obvious). By [Šil08, Thm. 3.1] we have ξH1 S +F⊥ =ϑH1 M + G with M countably 1-rectifiable, ϑ ∈ L1(H1 M ;Rn) tangent to M H1 M -almost everywhere and GH1-diffuse. This is an admissible decomposition in the sense of proposition 3.1.2. We have ξH1 S = ϑH1 (M ∩ S)and ξ = ϑ H1-almost everywhere on M ∩ S as well as ξ = 0 H1-almost everywhere on S \M . Additionally, we getF⊥ = F⊥ (C \ S) = ϑH1 (M \ S) + G (C \ S). To conclude, we estimate

J τ,µ+,µ− [ϑH1 M + G] =

∫

M

τ(|ϑ|) dH1 + τ ′(0)|G|(C) =

∫

M∩S

τ(|ξ|) dH1 +

∫

M\S

τ(|ϑ|) dH1 + τ ′(0)|G|(C)

≤

∫

M∩S

τ(|ξ|) dH1 + τ ′(0)

∫

M\S

|ϑ| dH1 + τ ′(0)|G|(C)

=

∫

S

τ(|ξ|) dH1 + τ ′(0)|ϑH1 (M \ S) + G|(C)

=

∫

S

τ(|ξ|) dH1 + τ ′(0)|F⊥|(C).

3.2 Branched transport problem as generalized urban planning problem

In this section we finally prove theorem 1.3.4, essentially by constructing a minimizer for one problem from one of theother. We will also discuss a few examples illustrating the relation between the minimizers. Throughout we assume that(τ, µ+, µ−) is a triple for the branched transport problem and the corresponding maintenance cost ε is defined via τ asin definition 1.3.3. Let τ+(0) = limmց0 τ(m). We define

τ (m) =

τ(m) if m ≤ 0,

τ(m)− τ+(0) elsefor m ∈ R and ε(v) = (−τ )∗(−v) for v ∈ R.

We will need the following properties of τ .

32

Lemma 3.2.1 (Properties of τ ). The transportation cost τ is right-continuous in 0. Furthermore, we have ε = ε− τ+(0)and

∫

S

τ(|ξ|) dH1 =

∫

S

τ(|ξ|) dH1 + τ+(0)H1(|ξ| > 0)

for all ξ ∈ L1(H1 S;Rn).

Proof. The first property follows by definition. Let v ∈ R and assume that ε(v) > 0. We have

ε(v) = supm>0

τ (m)−mv = supm>0

τ(m)− τ+(0)−mv = supm∈R

τ(m)−mv − τ+(0) = ε(v)− τ+(0).

If ε(v) = 0, then we get v ≥ τ ′(0) and therefore ε(v) = τ+(0). Finally, we observe that∫

S

τ(|ξ|) dH1 =

∫

|ξ|>0

τ(|ξ|) dH1 =

∫

|ξ|>0

τ (|ξ|) + τ+(0) dH1 =

∫

|ξ|>0

τ (|ξ|) dH1 + τ+(0)H1(|ξ| > 0).

For the inequality inf J τ,µ+,µ− ≥ inf Uε,µ+,µ− we will use the following standard composition property which we statewithout proof.

Lemma 3.2.2 (Composition of semi-continuous functions). Let (X, dX) be a metric space and A ⊂ R. If g : X → A isupper semi-continuous and f : A→ R lower semi-continuous and decreasing, then f g is lower semi-continuous.

Proof of theorem 1.3.4. inf J τ,µ+,µ− ≤ inf Uε,µ+,µ− : Assume that

Uε,µ+,µ− [S, b] <∞

for some admissible pair (S, b). As discussed in the introduction of section 2, this automatically implies the validity ofassumption 1.3.1. Thus by corollary 2.4.5 we have

WdS,τ′(0),b(µ+, µ−) =

∫

S

b|ξ| dH1 + τ ′(0)|F⊥|(C)

for some ξ ∈ L1(H1 S;Rn) and F⊥ ∈ Mn(C) with F⊥ S = 0. We show that J τ,µ+,µ− [F ] ≤ Uε,µ+,µ− [S, b] forF = ξH1 S + F⊥. If τ is right-continuous in 0, then −τ is lower semi-continuous and convex and thus equals itsbiconjugate [Rin18, Prop. 2.28]. This yields

τ(m) = −(−τ)∗∗(m) = −

(

supv∈R

vm− (−τ)∗(v)

)

= −

(

supv∈R

−vm− ε(v)

)

= −ε∗(−m)

and thus

Uε,µ+,µ− [S, b] =

∫

S

b|ξ| dH1 + τ ′(0)|F⊥|(C) +

∫

S

ε(b) dH1 ≥

∫

S

infv∈R

|ξ|v + ε(v) dH1 + τ ′(0)|F⊥|(C)

= −

∫

S

ε∗(−|ξ|) dH1 + τ ′(0)|F⊥|(C) =

∫

S

τ(|ξ|) dH1 + τ ′(0)|F⊥|(C) = J τ,µ+,µ− [F ]

by proposition 3.1.2. If τ is not right-continuous, then we have τ ′(0) = ∞ and therefore F⊥ = 0. Moreover, usinglemma 3.2.1 and the previous estimate we obtain

Uε,µ+,µ− [S, b] =

∫

S

b|ξ| dH1 +

∫

S

ε(b) dH1 + τ+(0)H1(S) ≥

∫

S

τ (|ξ|) dH1 + τ+(0)H1(S)

≥

∫

S

τ (|ξ|) dH1 + τ+(0)H1(|ξ| > 0) =

∫

S

τ(|ξ|) dH1 = J τ,µ+,µ− [F ].

33

inf J τ,µ+,µ− ≥ inf Uε,µ+,µ− : Assume that there exists some G ∈ DMn(C) with J τ,µ+,µ− [G] <∞. Let F = ξH1 S+F⊥

be as in corollary 3.1.7 (F can be constructed by removing divergence-free parts of G). The function S ∋ x 7→ g(x) = |ξ(x)|is upper semi-continuous by corollary 3.1.7. Thus, |ξ| = 0 ⊂ S is Borel measurable and we can assume without loss ofgenerality that S = |ξ| > 0. For m > 0 we define

f(m) = −max(∂(−τ)(m)),

which is well-defined, because the subdifferential is closed. Furthermore, f is decreasing and lower semi-continuous on(0,∞) by construction. Using lemma 3.2.2 the function b : S → [0,∞) defined by

b(x) = f(g(x)) = −max(∂(−τ)(|ξ(x)|))

is lower semi-continuous on S. Additionally, we have ε(b) = τ(|ξ|) − |ξ|b by definition. Therefore, by proposition 3.1.2we get

∞ > J τ,µ+,µ− [G] ≥ J τ,µ+,µ− [F ] =

∫

S

τ(|ξ|) dH1 + τ ′(0)|F⊥|(C) =

∫

S

b|ξ| dH1 + τ ′(0)|F⊥|(C) +

∫

S

ε(b) dH1.

Thus we must have∫

Sε(b) dH1 < ∞ which shows that assumption 1.3.1 is satisfied. Hence we can apply theorem 1.3.2

and continue the estimation,

J τ,µ+,µ− [G] ≥

∫

S

b|ξ| dH1 + τ ′(0)|F⊥|(C) +

∫

S

ε(b) dH1 ≥WdS,τ′(0),b(µ+, µ−) +

∫

S

ε(b) dH1 = Uε,µ+,µ− [S, b].

In the remainder of the section we discuss a few consequences of the proof. In [BW18, Thm. 2.10] it is shown that thegeneralized branched transport problem either has a minimizer or is infeasible, i.e., there is no mass flux of finite energytransporting µ+ to µ−. (Note that under additional growth conditions on τ near 0 one can always obtain existence of aminimizer independent of µ+ and µ− [BW18, Cor. 2.20].) Since in the proof of theorem 1.3.4 we constructed from eachfeasible candidate for one problem a feasible candidate for the other, this immediately implies the following.

Corollary 3.2.3 (Existence of optimizers). The generalized branched transport problem and the associated generalizedurban planning problem either both admit a minimizer or are both infeasible.

While for τ ′(0) = ∞ the optimal mass flux of the generalized branched transport problem is known to be rectifiable[Whi99, Thm. 7.1], one gets an even stronger result if τ is not right-continuous in 0.

Remark 3.2.4 (Finite network length). If τ is not right-continuous in 0, then any street network (S, b) with finite urbanplanning cost satisfies H1(S) <∞. Indeed, from definition 1.3.3 of ε we obtain ε ≥ τ+(0) so that

∞ > Uε,µ+,µ− [S, b] ≥

∫

S

ε(b) dH1 ≥ τ+(0)H1(S).

Finally, let us briefly discuss and illustrate the relation between minimizers of the branched transport and the urbanplanning problem. A natural question is whether they are in one-to-one correspondence. However, this is not to beexpected for the following reason. In our equivalence proof, the central step to switch between the mass flux and thefriction coefficient as variables was the relation

τ(|ξ|) = b|ξ|+ ε(b),

which we exploited to construct one variable from the other. Note that τ(|ξ|) ≤ b|ξ| + ε(b) is nothing else than theFenchel–Young inequality, and if both ξ and b should be optimal one needs to have equality. However, this equality only

34

Fm

m 1/2−m

1/2−m m

x1

y1

x2

y2

Figure 5: Sketch illustrating example 3.2.5. The mass flux Fm is optimal for all m ∈ [0, 1/2].

yields a one-to-one relation between |ξ| and b if both τ and ε are differentiable or equivalently strictly convex. If, however,τ has a kink at |ξ|, then there exist multiple b satisfying the equality. Likewise, if ε has a kink at b, then there existmultiple solutions |ξ|. Consequently, a single minimizer of the branched transport problem will sometimes correspondto multiple minimizers of the urban planning problem and vice versa. We close this section by illustrating this fact withthree examples. The first example is standard in classical optimal transport theory and illustrates that there may bemultiple optimal mass fluxes, while the optimal friction coefficient is unique. In fact, the classical Wasserstein cost is arather degenerate case of branched transport for which infε < ∞ = τ ′(0) (see figure 1) so that the friction coefficient(which has to lie in between both values) is uniquely fixed a priori. We provide a less degenerate example directly after,in which the uniqueness of the friction coefficient comes from a kink in ε.

Example 3.2.5 (Infinitely many optimal mass fluxes, but unique friction coefficient). Let τ(m) = m and consider

µ+ = 12δx1 +

12δx2 and µ− = 1

2δy1 +12δy2

for x1 = (0, 0), x2 = (2, 0), y1 = (1, 1), y2 = (1,−1) (see figure 5). Then the optimal solution for the urban planningproblem is given by b = 1 H1-almost everywhere (S arbitrary). Since this choice of b is the only feasible one, it is unique.Nevertheless, there exist infinitely many solutions to the branched transport problem. Each one is associated with anoptimal transport plan for the Wasserstein-1-distance between µ+ and µ−, where the family of optimal transport planscan be parameterized by

m = π(x1 × y1) ∈ [0, 12 ].

Example 3.2.6 (ε has a kink). Consider the initial and final mass

µ+ = 25δ(0,0) +

15δ(1,−ℓ) +

25δ(2,0) and µ− = 2

5δ(0,1) +15δ(1,1+ℓ) +

25δ(2,1)

with parameter ℓ > 0. If τ(35 ) < τ(15 ) + τ(25 ) and ℓ is chosen sufficiently large we obtain two symmetric solutions for anoptimal mass flux F , no matter how τ looks like: The mass from (1,−ℓ) will be jointly transported with either the massfrom (0, 0) or the mass from (2, 0) (see figure 6a left). Moreover, one can argue that for large enough ℓ the mass from(1,−ℓ) will in fact in one case be transported through (0, 0) as well as (0, 1) and in the other case through (2, 0) as wellas (2, 1): Indeed, if the mass from (0, 0) and (−1, ℓ) would combine in a point pℓ 6= (0, 0), then the so-called momentumconservation (a local optimality condition at triple junctions [Xia04, Prop. 4.5]) reads

τ(15 + 25 ) · (0, 1) = τ(25 ) · vℓ + τ(15 ) · wℓ with vℓ =

pℓ|pℓ|

and wℓ =pℓ − (1,−ℓ)

|pℓ − (1,−ℓ)|.

35

(0, 0) (2, 0)

(1,−ℓ)

(0, 1) (2, 1)

(1, 1 + ℓ)

(a) 2 possible mass fluxes

15

25

35

m

τ(m)

(b) τ differentiable

12

1 b

ε(b)

(c) ε with kink at b = 12

Figure 6: Sketch illustrating example 3.2.6. Due to a kink in ε there are 2 possible optimalmass fluxes for ℓ sufficiently large.

For ℓ → ∞ the angle between vℓ and wℓ would go to zero which would violate this equality. Triple junctions near theother points can be excluded analogously. Now even though there are two solutions to the branched transport problem,if τ is such that ε has a kink in the right place or equivalently if τ is affine at least on [ 25 ,

35 ], the corresponding urban

planning problem has a unique solution. For instance, define a differentiable transportation cost (see figure 6b middle)by

τ(m) =

m if m ∈ [0, 15 ],25 − 5

4 (m− 35 )

2 if m ∈ (15 ,25 ],

320 + 1

2m if m ∈ (25 ,35 ],

− 1120 +

√

m+ 25 if m > 3

5 .

The corresponding maintenance cost (figure 6c right) is given by

ε(b) =

14b −

1120 + 2

5b if b ∈ (0, 12 ],15b

2 − 35b +

25 if b ∈ (12 , 1],

0 if b > 1.

Note that ε has a kink at b = 12 with left derivative − 3

5 and right derivative − 25 . The solution (S, b) to the urban planning

problem is uniquely determined in the following sense: Ignoring sets where b = τ ′(0) = 1 (which one may always dowithout loss of generality) and identifying b in the H1-almost everywhere sense, the optimal pair (S, b) is uniquely givenby

S = [(0, 0), (0, 1)] ∪ [(2, 0), (2, 1)] and b ≡ 12 .

The final example illustrates the reverse situation: The transportation cost τ exhibits a nondifferentiability so that toone ξ there may correspond multiple b. As a result there will be a unique optimal mass flux for the branched transportproblem, but multiple optimal solutions of the urban planning problem.

Example 3.2.7 (τ has a kink). Assume that µ+ = δx and µ− = δy with x 6= y (see figure 7 left). Then the solutionof the branched transport problem is unique (independent of τ) and given by Fopt = ~eH1 e, where e = [x, y] and

36

x

y

(a) optimal mass flux independent of τ

1

1

m

τ(m)

(b) −∂(−τ )(1) = [0, 1]

1

1

b

ε(b)

(c) b+ ε(b) ≡ 1 for b ∈ [0, 1]

Figure 7: Sketch illustrating example 3.2.7. A kink in τ leads to multiple optimal frictioncoefficients.

~e = (y − x)/|y − x|. Further, we have J τ,µ+,µ− [Fopt] = τ(1)H1(e). Now set S = e and let τ have a kink, for instanceτ(m) = min(m, 1). Then for every spatially constant (and in fact even non-constant) b ∈ [0, 1] we obtain

Uε,µ+,µ− [S, b] =Wde,1,b(δx, δy) +

∫

e

ε(b) dH1 = bH1(e) + ε(b)H1(e) = H1(e) = J τ,µ+,µ− [Fopt]

(see figure 7 right) so that there exist infinitely many optimal friction coefficients b. Note that in this example ε isdifferentiable below a = τ ′(0).

4 Acknowledgements

This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under thepriority program SPP 1962, grant WI 4654/1-1, and under Germany’s Excellence Strategy EXC 2044 – 390685587,Mathematics Münster: Dynamics–Geometry–Structure. B.W.’s and J.L.’s research was supported by the Alfried KruppPrize for Young University Teachers awarded by the Alfried Krupp von Bohlen und Halbach-Stiftung. The publicationwas supported by the Open Access Publication Fund of the University of Münster.

References

[AGS08] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savare. Gradient Flows. In Metric Spaces and in the Space ofProbability Measures. Lectures in Mathematics. ETH Zürich. Birkhäuser Basel, 2008.

[And09] Simon Anders. “Visualization of genomic data with the Hilbert curve”. In: Bioinformatics 25.10 (2009),pp. 1231–1235.

[BB05] Alessio Brancolini and Giuseppe Buttazzo. “Optimal networks for mass transportation problems”. In: ESAIM:Control, Optimisation and Calculus of Variations 11.1 (2005), pp. 88–101.

[BCM05] Marc Bernot, Vicent Caselles, and Jean-Michel Morel. “Traffic Plans”. In: Publicacions Matemàtiques 49.2(2005), pp. 417–451.

[BCM09] Marc Bernot, Vicent Caselles, and Jean-Michel Morel. Optimal Transportation Networks. 1955th ed. Lecturenotes in Mathematics. Berlin: Springer-Verlag, 2009.

[Bil99] Patrick Billingsley. Convergence of probability measures. 2nd ed. Wiley Series in Probability and Statistics.New York: John Wiley & Sons, Inc., 1999.

37

[BP73] Lawrence D. Brown and Roger Purves. “Measurable Selections of Extrema”. In: Annals of Statistics 1.5 (1973),pp. 902–912.

[Bra05] Alessio Brancolini. Optimization problems for transportation networks. PhD thesis. 2005. url: https://www.uni-muenster.de/AMM/num/wirth/people/Brancolini/thesis.pdf.

[Bri96] Egbert Brieskorn. Felix Hausdorff zum Gedächtnis - Band I. 1st ed. Wiesbaden: Vieweg+Teubner Verlag,1996.

[But+09] Giuseppe Buttazzo, Aldo Pratelli, Eugene Stepanov, and Sergio Solimini. Optimal Urban Networks via MassTransportation. 1st ed. Berlin Heidelberg: Springer-Verlag, 2009.

[BW16] Alessio Brancolini and Benedikt Wirth. “Equivalent formulations for the branched transport and urban plan-ning problems”. In: J. Math. Pures Appl. (9) 106.4 (2016), pp. 695–724.

[BW18] Alessio Brancolini and Benedikt Wirth. “General transport problems with branched minimizers as functionalsof 1-currents with prescribed boundary”. In: Calc. Var. Partial Differential Equations 57.3 (2018), Paper No.82, 39.

[CFM19] Antonin Chambolle, Luca Alberto Davide Ferrari, and Benoit Merlet. “A phase-field approximation of theSteiner problem in dimension two”. In: Adv. Calc. Var. 12.2 (2019), pp. 157–179.

[Edg95] Gerald A. Edgar. “Fine variation and fractal measures”. In: Real Analysis Exchange 20.1 (1995), pp. 256–280.

[Els18] Jürgen Elstrodt. Maß- und Integrationstheorie. 8th ed. Berlin, Heidelberg: Springer Spektrum, 2018.

[Fal86] Kenneth John Falconer. The geometry of fractal sets. 1st ed. Cambridge: Cambridge University Press, 1986.

[FDW20] Luca Ferrari, Carolin Dirks, and Benedikt Wirth. “Phase field approximations of branched transportationproblems”. In: Calc. Var. Partial Differential Equations 59.1 (2020), Paper No. 37, 34.

[Fed69] Herbert Federer. Geometric measure theory. 153rd ed. Die Grundlehren der mathematischen Wissenschaften.New York: Springer-Verlag, 1969.

[Fel81] Marcus B. Feldman. “A Proof of Lusin’s Theorem”. In: The American Mathematical Monthly 88.3 (1981),pp. 191–192.

[Kir94] Bernd Kirchheim. “Rectifiable Metric Spaces: Local Structure and Regularity of the Hausdorff Measure”. In:Proceedings of the American Mathematical Society 121.1 (1994), pp. 113–123.

[Lan69] Serge Lang. Real Analysis. Addison-Wesley Series in Mathematics. Reading MA: Addison-Wesley, 1969.

[Mat95] Pertti Mattila. Geometry of Sets and Measures in Euclidean Spaces: Fractals and Rectifiability. 1st ed. Cam-bridge: Cambridge University Press, 1995.

[MM73] M. Marcus and V. J. Mizel. “Transformations by functions in Sobolev spaces and lower semicontinuity forparametric variational problems”. In: Bulletin of the American Mathematical Society 79.4 (1973), pp. 790–795.

[MS13] Francesco Maddalena and Sergio Solimini. “Synchronic and Asynchronic Descriptions of Irrigation Problems”.In: Advanced Nonlinear Studies 13 (2013), pp. 583–623.

[MSM03] Francesco Maddalena, Sergio Solimini, and Jean-Michel Morel. “A variational model of irrigation patterns”.In: Interfaces and Free Boundaries 5.4 (2003), pp. 391–415.

[PS13] Emanuele Paolini and Eugene Stepanov. “Existence and regularity results for the Steiner problem”. In: Cal-culus of Variations and Partial Differential Equations 46 (2013), pp. 837–860.

[Rin18] Filip Rindler. Calculus of variations. 1st ed. Basel: Springer International Publishing, 2018.

[Roy88] Halsey L. Royden. Real analysis. 3rd ed. New York: Macmillan Publishing Company, 1988.

38

https://www.uni-muenster.de/AMM/num/wirth/people/Brancolini/thesis.pdf

https://www.uni-muenster.de/AMM/num/wirth/people/Brancolini/thesis.pdf

[San15] Filippo Santambrogio. Optimal Transport for Applied Mathematicians. Calculus of Variations, PDEs, andModeling. 1st ed. Basel: Birkhäuser Verlag, 2015.

[Šil08] Miroslav Šilhavý. Divergence measure vectorfields: their structure and the divergence theorem. In: Mathe-matical Modelling of Bodies with Complicated Bulk and Boundary Behavior. 20th ed. Napoli: Aracne, 2008,pp. 217–237.

[Sim14] Leon M. Simon. Introduction to Geometric Measure Theory. Lecture notes. Stanford, 2014. url: https://web.stanford.edu/class/math285/ts-gmt.pdf.

[Smi93] Stanislav Konstantinovich Smirnov. “Decomposition of solenoidal vector charges into elementary solenoids,and the structure of normal one-dimensional flows”. In: Algebra i Analiz 5.4 (1993), pp. 206–238.

[Whi99] Brian White. “Rectifiability of Flat Chains”. In: Annals of Mathematics 150.1 (1999), pp. 165–184.

[Wir19] Benedikt Wirth. “Phase field models for two-dimensional branched transportation problems”. In: Calc. Var.Partial Differential Equations 58.5 (2019), Paper No. 164, 31.

[Xia03] Qinglan Xia. “Optimal paths related to transport problems”. In: Communications in Contemporary Mathe-matics 5.2 (2003), pp. 251–279.

[Xia04] Qinglan Xia. “Interior regularity of optimal transport paths”. In: Calc. Var. Partial Differential Equations20.3 (2004), pp. 283–299.

39

https://web.stanford.edu/class/math285/ts-gmt.pdf

https://web.stanford.edu/class/math285/ts-gmt.pdf

Documents

∗† ∗§ arXiv:2109.07820v1 [math.OC] 16 Sep 2021