
Maximum Entropy Principle with General Deviation Measures

Bogdan Grechuk, Stevens Institute of Technology, Hoboken, NJ

email: [email protected]

Anton Molyboha, Stevens Institute of Technology, Hoboken, NJ

email: [email protected]

Michael Zabarankin, Stevens Institute of Technology, Hoboken, NJ

email: [email protected] http://personal.stevens.edu/~mzabaran

An approach to the Shannon and Rényi entropy maximization problems with constraints on the mean and a law invariant deviation measure for a random variable has been developed. The approach is based on the representation of law invariant deviation measures through corresponding convex compact sets of nonnegative concave functions. A solution to the problem has been shown to have an alpha-concave distribution (log-concave for Shannon entropy), for which, in the case of comonotone deviation measures, an explicit formula has been obtained. As an illustration, the problem has been solved for several deviation measures, including mean absolute deviation (MAD), conditional value-at-risk (CVaR) deviation, and mixed CVaR-deviation. Also, it has been shown that the maximum entropy principle establishes a one-to-one correspondence between the class of alpha-concave distributions and the class of comonotone deviation measures. This fact has been used to solve the inverse problem of finding a corresponding comonotone deviation measure for a given alpha-concave distribution.

Key words: Deviation measures; Maximum entropy principle; Shannon entropy; Rényi entropy

MSC2000 Subject Classification: Primary: 94A17; Secondary: 46N10, 91B44

OR/MS subject classification: Primary: Probability: Entropy; Secondary: Decision Analysis: Risk

1. Introduction. The principle of entropy maximization is widely used in a variety of applications ranging from statistical thermodynamics and quantum mechanics to decision theory and financial engineering. The principle was first introduced by Jaynes [12, 13] and is intended for finding the least-informative probability distribution¹ given any available information about the distribution. A classical example of an application of this principle is to find the maximally noncommittal probability distribution of a random variable (r.v.) given its first two moments, or, equivalently, mean and standard deviation. It is well known that if the Shannon differential entropy is chosen as a measure of uncertainty and is maximized subject to these constraints, then the distribution in question is the normal one, with the mean and standard deviation given by the corresponding constraints.

In application to decision and finance theories, the principle is extensively used for stock and option pricing through estimating corresponding probability distributions, as well as for investigating agents' risk preferences. For example, Cozzolino and Zahner [7] derived the maximum-entropy distribution for the future market price of a stock under the assumption that the expectation and variance² of the price are known, while Thomas [29] considered the maximum-entropy principle in application to decision making under uncertainty for the oil spill abatement planning problem with discrete distributions and linear constraints. Also, the principle with the Rényi entropy, which is a generalization of the Shannon entropy, was applied to option pricing [5] and was investigated under constraints on covariance [14]. For option pricing with the maximum-entropy principle, see also Stutzer [28] and Buchen and Kelly [4]. For the application of generalized relative entropy to statistical learning of risk preferences, the reader may refer to [11].

A recently emerged theory of general deviation measures, developed by Rockafellar et al. [20, 21], generalizes the notion of standard deviation and provides an alternative way to measure "nonconstancy" in an r.v. In general, these measures are no longer symmetric, i.e., in contrast to standard deviation, they do not penalize the ups and downs of an r.v. equally, which is a desirable property in applications such as portfolio optimization, actuarial science, etc.

¹Jaynes [13] calls it the least biased or maximally noncommittal probability distribution with regard to missing information.
²Variance is derived under the assumption that the price is a stochastic process with stationary and independent increments.


Examples of deviation measures include standard deviation, lower and upper semideviations, mean absolute deviation, median absolute deviation, Conditional Value-at-Risk (CVaR) deviation, mixed CVaR-deviation, and worst-case mixed-CVaR deviation; see [21] for other examples. The aforementioned measures are law invariant, i.e., they depend only on the distributions of r.v.s. Rockafellar et al. [21] investigated the relationship between deviation and risk measures and showed that deviation measures provide significant customization in expressing an agent's risk preferences. In application to portfolio optimization, Rockafellar et al. generalized a number of results originally stated for standard deviation, including Markowitz's portfolio selection problem [22, 23], in which standard deviation was replaced by an arbitrary deviation measure, the One Fund theorem [22, 23], the Capital Asset Pricing Model (CAPM) [22], and conditions on market equilibrium with investors using a variety of deviation measures [24]. Also, the role of deviation measures in generalized linear regression was investigated in [25]. These developments raise a question: If a particular deviation measure reflects an agent's risk preferences, what is the least informative distribution for which only the mean and the corresponding deviation are available? Answering this question is the main motivation for this paper.

In this work, we consider the class of law invariant deviation measures and show that they can be represented in a quantile-based form on an atomless probability space over a corresponding set of nonnegative concave functions. We show that for an arbitrary law invariant deviation measure this set is convex and compact, and that in the case of comonotone deviation measures it consists of a single element. Using these results, we determine which class of r.v.s maximizes the Shannon differential entropy with given constraints on the mean and a law invariant deviation measure and show that for every law invariant deviation measure the solution to the corresponding entropy maximization problem has a log-concave distribution. Moreover, for the case of a comonotone deviation measure, an explicit formula for the optimal distribution is obtained.

Having the whole class of deviation measures at our disposal, we also address the inverse problem: Given a distribution for an r.v., determine a deviation measure that corresponds to this distribution through the entropy maximization principle. We show that the corresponding deviation measure can be constructed if and only if the given distribution is log-concave. For the case of a log-concave distribution, we prove that several law invariant deviation measures may correspond to the same distribution through the maximum-entropy principle; however, only one of them is comonotone. In addition, an explicit formula for generating this comonotone deviation measure for a given log-concave distribution is provided. Solving the inverse problem paves the way for "restoring" an agent's risk preferences encoded in a corresponding deviation measure through estimating appropriate probability distributions from historical data.

All the obtained results are extended to the entropy maximization problem with the Rényi entropy. Although the Shannon entropy remains a preferred choice as a measure of information, the Rényi entropy, and particularly the Rényi quadratic entropy, has recently gained in popularity due to its computational efficiency in applications. In recent years, there have been extensive studies of Rényi entropy maximization with constraints on variance, covariance, and p-moment [6, 14, 17]. Distributions that maximize the Rényi entropy subject to constraints on a deviation measure are not necessarily log-concave and, as a result, are more appropriate for estimating future returns of financial assets with heavy tails, e.g., returns of stocks, indices, etc.

The paper is organized into five sections. Section 2 reviews the main properties of deviation measures and represents law invariant deviation measures in a quantile-based form over a corresponding set of nonnegative concave functions. Section 3 proves that this set is compact and establishes several corollaries of this fact. Section 4 considers Shannon and Rényi differential entropy maximization problems subject to constraints on the mean and a law invariant deviation measure and solves the direct and inverse problems. Section 5 concludes this work.

2. Law invariant deviation measures. This section reviews the main properties of deviation measures and derives several representations for law invariant deviation measures.

Let (Ω, M, P) be a probability space, where Ω denotes the designated space of future states ω, M is a field of sets in Ω, and P is a probability measure on (Ω, M). A random variable (r.v.) is any measurable function from Ω to R. In this paper, we restrict our attention to r.v.s from the spaces L^p(Ω) = L^p(Ω, M, P), p ∈ [1,∞], with norms ||X||_p = (E[|X|^p])^{1/p} for p < ∞ and ||X||_∞ = ess sup |X|. We also introduce the space L_F(Ω) ⊂ L^∞(Ω) of r.v.s that assume only a finite number of values. For an r.v. X, we denote by F_X(x), f_X(x), and q_X(α) = inf{x | F_X(x) > α} its cumulative distribution function, probability density function (PDF), and quantile function, respectively. Throughout the paper, we assume that the probability space Ω is atomless, i.e., there exists an r.v. with a continuous cumulative distribution function. This assumption implies the existence of r.v.s on Ω with all possible³ distribution functions (see, e.g., [10]).

General deviation measures, introduced by Rockafellar et al. in [20, 21], are defined as follows.

Definition 2.1 (deviation measures). A deviation measure⁴ is any functional D : L^p(Ω) → [0,∞] satisfying

(D1) D(X) = 0 for constant X, but D(X) > 0 otherwise (nonnegativity),

(D2) D(λX) = λD(X) for all X and all λ > 0 (positive homogeneity),

(D3) D(X + Y) ≤ D(X) + D(Y) for all X and Y (subadditivity),

(D4) the set {X ∈ L^p(Ω) | D(X) ≤ c} is closed for all c < ∞ (lower semicontinuity).

Axioms D2 and D3 imply convexity, and axioms D1–D3 have the consequence, shown in [21], that

D(X + C) = D(X) for all constants C (insensitivity to constant shift).  (1)

A deviation measure is called symmetric if axiom D2 extends also to λ < 0 as D(λX) = |λ|D(X), λ ∈ R. A deviation measure D is called law invariant if D(X₁) = D(X₂) for any two r.v.s X₁ and X₂ having the same distribution function on R, and D is called proper if D(X) < ∞ for some nonconstant X.

The most well-known examples of deviation measures are:

(i) standard deviation σ(X) = √(E[X − EX]²);

(ii) lower and upper semideviations σ−(X) = √(E([X − EX]−)²) and σ+(X) = √(E([X − EX]+)²), respectively, where [X]− = max{0, −X} and [X]+ = max{0, X};

(iii) mean absolute deviation MAD(X) = E|X − EX|;

(iv) conditional value-at-risk (CVaR) deviation, defined for any α ∈ (0,1) by

CVaR∆_α(X) ≡ EX − (1/α) ∫₀^α q_X(β) dβ.  (2)

All these deviation measures are law invariant. Whereas standard deviation, semideviations, and mean absolute deviation are well-known measures of deviation, CVaR-deviation was introduced by Rockafellar et al. [21] as a "deviation analog" of conditional value-at-risk, which is widely used in financial applications as a coherent measure of risk [1]. For a detailed discussion and other examples, see [21].
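To make the four examples concrete, the sketch below (our own illustration, not from the paper; it assumes NumPy and a standard normal Monte Carlo sample) estimates (i)–(iv) empirically, approximating the integral in (2) by an average of the lowest order statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # illustrative sample of X ~ N(0, 1)

m = x.mean()
sigma = np.sqrt(np.mean((x - m) ** 2))                      # (i) standard deviation
sigma_minus = np.sqrt(np.mean(np.maximum(m - x, 0) ** 2))   # (ii) lower semideviation
mad = np.mean(np.abs(x - m))                                # (iii) mean absolute deviation

def cvar_deviation(sample, alpha):
    # (iv) CVaR-deviation (2): EX - (1/alpha) * int_0^alpha q_X(beta) dbeta,
    # with q_X replaced by the empirical quantile (sorted sample)
    q = np.sort(sample)
    k = int(alpha * len(q))
    return sample.mean() - q[:k].mean()

print(sigma, sigma_minus, mad, cvar_deviation(x, 0.05))
```

For N(0,1), these estimates should be close to 1, 1/√2 ≈ 0.707, √(2/π) ≈ 0.798, and φ(Φ⁻¹(0.05))/0.05 ≈ 2.06, respectively.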

An r.v. X is said to dominate Y with respect to concave ordering, written X ≽_c Y, if EX = EY and ∫_{−∞}^x F_X(t) dt ≤ ∫_{−∞}^x F_Y(t) dt for all x ∈ R; see [9]. The result of Dana [9, Theorem 4.1] implies that on an atomless probability space a deviation measure is law invariant if and only if it is consistent with concave ordering, i.e., X ≽_c Y implies D(X) ≤ D(Y).

The conjugate function of a deviation measure D is associated with a risk envelope; see [20, 21].

Definition 2.2 (risk envelope). A risk envelope is a subset 𝒬 of L^q(Ω), where 1/p + 1/q = 1, which satisfies the following axioms:

(Q1) 𝒬 is a convex, closed set containing 1 (the constant r.v.),

(Q2) EQ = 1 for every Q ∈ 𝒬,

(Q3) for every nonconstant X ∈ L^p(Ω), there is a Q ∈ 𝒬 such that E[XQ] < EX.

³For any nondecreasing right-continuous function F : R → [0,1] with lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1, there exists an r.v. X : Ω → R whose cumulative distribution function is F(x).
⁴The axioms are those in [25]. In [20, 21], deviation measures are defined on L²(Ω), and originally axiom D4 was not included in the definition. Deviation measures satisfying D4 were called lower-semicontinuous deviation measures.


As shown in [20, 21], in the case of p = q = 2, there is a one-to-one correspondence between deviation measures and risk envelopes:

𝒬 = {Q ∈ L²(Ω) | E[X(1−Q)] ≤ D(X) ∀X},   D(X) = sup_{1−Q∈𝒬} E[XQ].  (3)

In particular, standard deviation σ corresponds to 𝒬 = {Q | σ(1−Q) ≤ 1, EQ = 1}; see [21].

In fact, the representation (3) holds true for all deviation measures for p ∈ [1,∞), and for deviation measures satisfying the Fatou property⁵ for p = ∞ (see, e.g., [26]). Jouini et al. [15, Theorem 2.2] proved that on an atomless probability space every law invariant functional satisfying D2–D4 has the Fatou property, which extends the representation (3) to all p ∈ [1,∞].

For law invariant deviation measures, the relationship (3) implies the following representation.

Proposition 2.1 Every law invariant deviation measure D : L^p(Ω) → [0,∞] can be represented in the form

D(X) = sup_{1−Q∈𝒬} ∫₀¹ q_Q(α) q_X(α) dα,  (4)

where 𝒬 is the corresponding risk envelope.

Proof. We write X ∼ Y if X and Y have the same distribution function. We have

D(X) = sup_{Y: Y∼X} D(Y) = sup_{Y: Y∼X} [ sup_{Q: 1−Q∈𝒬} E[QY] ] = sup_{Q: 1−Q∈𝒬} [ sup_{Y: Y∼X} E[QY] ] = sup_{1−Q∈𝒬} ∫₀¹ q_X(α) q_Q(α) dα,

where the first equality holds thanks to the law invariance of D, the second equality follows from (3), and the last one follows from [10, Lemma 4.55].⁶ □

Next, we derive several representations for law invariant deviation measures, which will be used for solving the entropy maximization problem with constraints on the mean and a law invariant deviation measure.

Proposition 2.2 For a functional D : L^p(Ω) → [0,∞], the following are equivalent:

(a) D is a law invariant deviation measure;

(b) D(X) = sup_{φ∈Λ} ∫₀¹ φ(α) q_X(α) dα,  (5)

where Λ is a collection of nondecreasing functions φ(α) ∈ L^q(0,1), 1/p + 1/q = 1, such that ∫₀¹ φ(α) dα = 0, containing at least one nonzero element;

(c) D(X) = sup_{φ∈Λ} ∫₀¹ CVaR∆_α(X) d(ψ(α)), where d(ψ(α)) = α d(φ(α));  (6)

(d) D(X) = sup_{g∈G} ∫₀¹ g(α) d(q_X(α)),  (7)

where G is a collection of positive concave functions g : (0,1) → R.

⁵A functional D satisfies the Fatou property if D(X) ≤ lim inf_{n→∞} D(X_n) for any bounded sequence X_n with X_n → X a.s.
⁶Lemma 4.55 in [10] is proved for the case X ∈ L^∞(Ω), Q ∈ L¹(Ω). However, it can be readily extended to the general case X ∈ L^p(Ω), Q ∈ L^q(Ω) with 1/p + 1/q = 1.


Proof. Proposition 2.1 implies that every law invariant deviation measure can be represented in the form (5) with Λ = {φ(α) | φ(α) = q_Q(α), 1−Q ∈ 𝒬}. Thus, (a) implies (b).

Now let D(X) be given by (6). For every nondecreasing φ(α), we have d(ψ(α)) = α d(φ(α)) ≥ 0. Thus, the properties D1–D4 for D and the law invariance of D follow from the corresponding properties and law invariance of CVaR∆_α. Consequently, (c) implies (a).

The proofs of (b) → (c) and (b) ↔ (d) reduce to integrating (5) by parts and are presented in Appendix A. □

Definition 2.3 A collection G of nonnegative concave functions g : (0,1) → R for which (7) holds will be called a g-envelope of a law invariant deviation measure D.

It follows from the proof of Proposition 2.2 that if D can be represented in the form (5) with some Λ, then it can be represented in the form (7) with the g-envelope G = {g | g(α) = −∫₀^α φ(u) du, φ ∈ Λ}.

Remark 2.1 The representation (6) is similar to the well-known Kusuoka representation [16] for coherent risk measures.⁷ However, in contrast to the latter, it is insensitive to constant shift (1).

Remark 2.2 The representation (7) is equivalent to

D(X) = sup_{g∈G} ∫_{−∞}^{∞} g(F_X(x)) dx,  (8)

under the assumption that g(0) = g(1) = 0, which extends the integration interval from (ess inf X, ess sup X) to R. The reader can verify (8) by substituting α = F_X(x) into (7).
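As a quick numerical illustration of (8) (a hedged sketch of ours, assuming SciPy and X ~ N(0,1); the kernel g below is the CVaR kernel from formula (12) in Example 2.1), the integral of g(F_X(x)) over R reproduces CVaR∆_α(X), whose closed form for the standard normal is φ(Φ⁻¹(α))/α:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

alpha = 0.1
g = lambda b: (1 / alpha - 1) * b if b <= alpha else 1 - b   # kernel (12)

# (8): D(X) = integral over R of g(F_X(x)) dx, here for X ~ N(0, 1)
val = quad(lambda x: g(norm.cdf(x)), -np.inf, np.inf)[0]
print(val, norm.pdf(norm.ppf(alpha)) / alpha)   # both approx 1.7550
```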

Next, we prove that deviation measures having a single-element g-envelope G are comonotone (comonotonically additive).

Definition 2.4 Two r.v.s X : Ω → R and Y : Ω → R are said to be comonotone if there exists a set A ⊆ Ω such that P[A] = 1 and

(X(ω₁) − X(ω₂))(Y(ω₁) − Y(ω₂)) ≥ 0  ∀ω₁, ω₂ ∈ A.

Definition 2.5 A deviation measure D : L^p(Ω) → [0,∞] is called comonotone if for any two comonotone r.v.s X ∈ L^p(Ω) and Y ∈ L^p(Ω), we have

D(X + Y) = D(X) + D(Y).  (9)

To derive a representation for comonotone law invariant deviation measures, we need to show that a law invariant deviation measure is uniquely defined by its restriction to the space of r.v.s that assume only a finite number of values.

Proposition 2.3 Let D₁(X) and D₂(X) be two law invariant deviation measures such that D₁(X) = D₂(X) for every X ∈ L_F(Ω). Then D₁(X) = D₂(X) for every X ∈ L^p(Ω).

Proof. We prove by contradiction. Suppose there exists an r.v. X ∈ L^p(Ω) such that D₁(X) > D₂(X). Fix n ∈ N. For i = −2^{2n}, ..., −1, 0, 1, ..., 2^{2n} − 1, denote Ω_i = {ω ∈ Ω | i/2^n ≤ X(ω) < (i+1)/2^n}, and let Ω₊ = {ω ∈ Ω | X(ω) ≥ 2^n} and Ω₋ = {ω ∈ Ω | X(ω) < −2^n}. Then F_n = {Ω_i | −2^{2n} ≤ i ≤ 2^{2n} − 1} ∪ {Ω₊} ∪ {Ω₋} is a partition of Ω, and for X_n = E[X | F_n], we have X_n ≽_c X, whence D₂(X_n) ≤ D₂(X). Because X_n ∈ L_F(Ω), we have D₁(X_n) = D₂(X_n) ≤ D₂(X) for every n ∈ N.

Let Ω* = Ω₊ ∪ Ω₋. Then Ω* = ∅ for p = ∞ and n > log₂ ||X||_∞, and thus ess sup |X − X_n| ≤ 2^{−n}. For p < ∞, we have

∫_Ω |X − X_n|^p dω = ∫_{Ω*} |X − X_n|^p dω + ∫_{Ω\Ω*} |X − X_n|^p dω ≤ ∫_{Ω*} |2X|^p dω + 2^{−pn}.

⁷A coherent risk measure [1] is a functional R : L^p(Ω) → (−∞,∞] satisfying D2 and D3 together with R(X + C) = R(X) − C for all X and constant C, and R(X) ≤ 0 for all X ≥ 0.


Consequently, X_n → X in L^p(Ω) as n → ∞, and the lower semicontinuity of D₁ implies D₁(X) ≤ D₂(X), which contradicts our initial assumption. By similar reasoning, we exclude the case D₁(X) < D₂(X). □

Observe that, in general, in an infinite-dimensional space, two lower-semicontinuous convex positively homogeneous functionals assuming the same values on a dense subset may not be equal (see [3]).

For comonotone law invariant deviation measures the following representation holds.

Proposition 2.4 A functional D : L^p(Ω) → [0,∞] is a proper comonotone law invariant deviation measure if and only if it can be represented in the form

D(X) = ∫₀¹ g(α) d(q_X(α)),  (10)

for some positive concave function g : (0,1) → R.

Proof. By Proposition 2.2, the functional (10) is a law invariant deviation measure, which is also comonotone, because q_{X+Y}(α) = q_X(α) + q_Y(α) for comonotone r.v.s X and Y (see, e.g., [10, Lemma 4.84]).

To prove necessity, we show that every proper comonotone law invariant deviation measure D can be represented in the form (10) with g(α) = D(X_α), α ∈ (0,1), where {X_α} is a collection of comonotone r.v.s given by⁸

X_α = −1 with probability p = α, and X_α = 0 with probability p = 1 − α.  (11)

First, we prove that g(α) is a concave function on (0,1). Indeed, for every 0 < α₁ < α₂ < α₃ < 1 and λ ∈ (0,1) such that α₂ = λα₁ + (1−λ)α₃, we have λX_{α₁} + (1−λ)X_{α₃} ≽_c X_{α₂}, whence D(X_{α₂}) ≥ D(λX_{α₁} + (1−λ)X_{α₃}). Because D is comonotone, D(λX_{α₁} + (1−λ)X_{α₃}) = λD(X_{α₁}) + (1−λ)D(X_{α₃}). This implies g(α₂) ≥ λg(α₁) + (1−λ)g(α₃), and consequently, g is concave.

Thus, through (10), the function g(α) defines some comonotone law invariant deviation measure D′. Next, we establish that D′(X) = D(X) for every X ∈ L^p(Ω). In fact, the equality holds for every X_α. Further, if an r.v. X ∈ L_F(Ω) takes values a₁ < a₂ < ... < a_n with probabilities p₁, p₂, ..., p_n, respectively, then X has the same distribution as a_n + (a_n − a_{n−1})X_{q_{n−1}} + ... + (a₂ − a₁)X_{q₁}, where q_i = Σ_{j=1}^{i} p_j, i = 1, ..., n−1. Because all the r.v.s X_{q_i}, i = 1, ..., n−1, are comonotone, the comonotonicity and law invariance of D′ and D imply that D′(X) = D(X) for all X ∈ L_F(Ω). It remains to apply Proposition 2.3. □

Example 2.1 CVaR-deviation (2) is a comonotone deviation measure and can be represented in the form (10) with

g(β) = (1/α − 1)β for β ≤ α, and g(β) = 1 − β for β ≥ α.  (12)

Detail. Rockafellar et al. [21] showed that CVaR∆_α is a law invariant deviation measure, and the comonotonicity of CVaR∆_α follows from the comonotonicity of the quantile function. By virtue of Proposition 2.4, CVaR∆_α can be represented in the form (10) with g(β) = CVaR∆_α(X_β), where X_β is given by (11).

Remark 2.3 The deviation measure D(X) = ess sup X − ess inf X is comonotone and can be represented in the form (7) with G containing the single element g(α) ≡ 1. However, it cannot be represented in the form (5) with any single-element collection Λ.

⁸To construct such a collection, one can choose an r.v. U uniformly distributed on [0,1] and take X_α = −1 whenever U < α and X_α = 0 whenever U ≥ α.


3. Maximal g-envelope. The representation (7) is a surjective mapping of g-envelopes G ⊂ 𝒢 onto the set of deviation measures,⁹ where 𝒢 is the set of all nonnegative concave functions g : (0,1) → [0,∞). A one-to-one correspondence can be established between law invariant deviation measures and maximal g-envelopes, defined below.

⁹Different sets G in (7) can produce the same deviation measure.

Definition 3.1 Let D be a law invariant deviation measure. Its g-envelope G_M is called the maximal g-envelope if there is no g-envelope G of D such that G_M ⊂ G.

The next proposition characterizes the maximal g-envelope G_M of an arbitrary law invariant deviation measure.

Proposition 3.1 Let D be a law invariant deviation measure. Then it has exactly one maximal g-envelope G_M, which is a convex set given by

G_M = {g(α) ∈ 𝒢 | ∫₀¹ g(α) d(q_X(α)) ≤ D(X) ∀X ∈ L_F(Ω)}.  (13)

For an arbitrary g-envelope G ⊂ 𝒢 of D, we have G ⊆ G_M.

Proof. We first establish that the set

G′_M = {g(α) ∈ 𝒢 | ∫₀¹ g(α) d(q_X(α)) ≤ D(X) ∀X ∈ L^p(Ω)}  (14)

is a maximal g-envelope of D.

Let G be an arbitrary g-envelope of the deviation measure D. Then for every g(α) ∈ G, we have

∫₀¹ g(α) d(q_X(α)) ≤ D(X)  ∀X ∈ L^p(Ω).

Consequently, G ⊆ G′_M. This implies

D(X) = sup_{g∈G} ∫₀¹ g(α) d(q_X(α)) ≤ sup_{g∈G′_M} ∫₀¹ g(α) d(q_X(α)) ≤ D(X),

and thus equality holds, which means that G′_M is a g-envelope of D. The convexity of G′_M follows from (14), and the maximality of G′_M is guaranteed by the inclusion G ⊆ G′_M for an arbitrary g-envelope G of D.

It is left to prove that G′_M = G_M. Obviously, G′_M ⊆ G_M. According to Proposition 2.2, the set G_M is a g-envelope for some law invariant deviation measure D̃. Then for every X ∈ L_F(Ω), the inequality D̃(X) ≤ D(X) follows from (13), and D̃(X) ≥ D(X) follows from the fact that G′_M ⊆ G_M. Thus, D̃(X) = D(X) for every X ∈ L_F(Ω), and by Proposition 2.3, we have D̃(X) = D(X) for every X ∈ L^p(Ω). Consequently, G_M is a g-envelope of D, whence G_M ⊆ G′_M, and the proof is finished. □

The relation (13), along with (7), introduces a one-to-one correspondence between law invariant deviation measures and their maximal g-envelopes.

Example 3.1 Mean absolute deviation MAD(X) = E|X − EX| can be represented in the form (7) with the maximal g-envelope given by

G_M(MAD) = {g(α) ∈ 𝒢 | g(0+) = g(1−) = 0, g′(0+) − g′(1−) ≤ 2}.  (15)

Detail. As shown in [21], MAD is a law invariant deviation measure. Its maximal g-envelope G_M(MAD) is given by (13). We need to prove that G_M(MAD) = A, where A ⊂ 𝒢 is the right-hand side of (15). For a sequence of r.v.s X_n, n ≥ 2, given by

X_n = −n with probability p = 1/n, 0 with probability p = 1 − 2/n, and n with probability p = 1/n,


we have MAD(X_n) = 2 and

lim_{n→∞} ∫₀¹ g(α) d(q_{X_n}(α)) = g′(0+) − g′(1−) + lim_{n→∞} n(g(0+) + g(1−)).

Thus, (13) implies that g(0+) = g(1−) = 0 and g′(0+) − g′(1−) ≤ 2 for every g ∈ G_M(MAD), i.e., G_M(MAD) ⊆ A.

To prove the reverse inclusion, we show that

∫₀¹ g(α) d(q_X(α)) ≤ MAD(X)  ∀X ∈ L_F(Ω), ∀g(α) ∈ A.  (16)

For every g(α) ∈ A, concavity implies g(α) ≤ g′(0+)α and g(α) ≤ −g′(1−)(1−α). Thus, for every y ∈ [−g′(1−), 2 − g′(0+)], g(α) is pointwise dominated by

g_y(α) = (2−y)α for α ≤ y/2, and g_y(α) = y(1−α) for α ≥ y/2.

Consequently, we need to prove (16) only for g(·) = g_y(·), y ∈ (0,2). Clearly, we can assume that EX = 0, and then, integrating by parts, we obtain ∫₀¹ g_y(α) d(q_X(α)) = −2 ∫₀^{y/2} q_X(α) dα ≤ E|X| = MAD(X), whence G_M(MAD) ⊇ A.

The following proposition establishes some properties of G_M.

Proposition 3.2 Let D be a proper law invariant deviation measure and let G_M be its maximal g-envelope. Then:

(a) the functions g ∈ G_M are uniformly bounded;

(b) G_M is a compact set in the topology induced by pointwise convergence.

Proof. We begin with proving (a). Because D is a proper deviation measure, there exists a nonconstant r.v. X ∈ L^p(Ω) such that D(X) < ∞. For nonconstant X, there exist a and b such that 0 < a < b < 1 and q_X(a) < q_X(b). Then, for every g(α) ∈ G_M, we have

D(X) ≥ ∫₀¹ g(α) d(q_X(α)) ≥ ∫_a^b g(α) d(q_X(α)) ≥ min_{α∈[a,b]} g(α) · (q_X(b) − q_X(a)).

Consequently, min_{α∈[a,b]} g(α) is bounded from above by some constant M independent of g, and, as a result, g(α₀) ≤ M for some α₀ ∈ [a,b]. Concavity of g implies g(α) ≤ g(α₀) α/α₀ ≤ M α/α₀ for α ≥ α₀ and g(α) ≤ g(α₀)(1−α)/(1−α₀) ≤ M(1−α)/(1−α₀) for α ≤ α₀. Finally, g(α) ≤ max{M/α₀, M/(1−α₀)} ≤ max{M/a, M/(1−b)} for all α.

To prove (b), we first establish that G_M is a closed set with respect to pointwise convergence. Let a sequence g_n(α) ∈ G_M converge pointwise to some limit g(α). Then, obviously, the limit function g(α) is nonnegative and concave, and thus g(α) ∈ 𝒢. The fact that, for any X ∈ L_F(Ω),

∫₀¹ g_n(α) d(q_X(α)) → ∫₀¹ g(α) d(q_X(α)) as n → ∞  (17)

follows from the dominated convergence theorem and statement (a). This, along with Proposition 3.1, implies that g(α) ∈ G_M, and consequently, G_M is closed with respect to pointwise convergence.

Now the compactness of G_M follows from Tychonoff's product theorem (see, e.g., [30, Theorem 17.8]), which states that the product of any collection of compact topological spaces is compact in the product topology (the topology induced by pointwise convergence). Indeed, for any C > 0, the set of all functions from [0,1] to [0,C] is the product of a continuum of closed intervals [0,C], which are compact; therefore, this set is compact with respect to the product topology. By virtue of (a), G_M is a subset of this set for some C, and because it is closed, it is compact. □

The compactness of G_M is critical for establishing the existence of solutions to optimization problems over G_M. In particular, we state the following result.

Proposition 3.3 Let D be a proper law invariant deviation measure and let G_M be its maximal g-envelope. Then for every bounded r.v. X₀, there exists g₀(α) ∈ G_M (a g-identifier of X₀) such that

D(X₀) = ∫₀¹ g₀(α) d(q_{X₀}(α)).


Proof. Because G_M is compact, we only need to show that for every bounded r.v. X₀, the functional

S(g) = ∫₀¹ g(α) d(q_{X₀}(α))

is continuous on G_M with respect to pointwise convergence. Then we can apply Weierstrass's theorem (see, e.g., [2]) to conclude that the maximum in (7) is attained.

Let {g_n(α)} ⊆ G_M be a sequence converging pointwise to some limit g*(α). By Proposition 3.2(a), the g_n(α) are uniformly bounded by some constant C. Because ∫₀¹ C d(q_{X₀}(α)) < ∞ for X₀ ∈ L^∞(Ω), we can apply the dominated convergence theorem, which yields lim_{n→∞} S(g_n) = S(g*), and the proof is finished. □

We say that g₁ ∈ 𝒢 dominates g₂ ∈ 𝒢 and write g₁ ≽ g₂ if g₁(α) ≥ g₂(α) for all α. A set G ⊂ 𝒢 is called dominance closed if g₁ ∈ G implies g₂ ∈ G whenever g₁ ≽ g₂. The following proposition provides another characterization of the maximal g-envelope.

Proposition 3.4 A set G ⊂ 𝒢 containing at least one nonzero element is a maximal g-envelope of some deviation measure if and only if it is convex, dominance closed, and closed with respect to pointwise convergence.

Proof. Let D be a deviation measure with a g-envelope G. If G is the maximal g-envelope of D, then (13) implies that it is convex and dominance closed, and its closedness with respect to pointwise convergence follows from Proposition 3.2. Let us prove the converse, i.e., that if G is convex, closed, and dominance closed, then G = G_M, where G_M is the maximal g-envelope of D.

Obviously, G ⊆ G_M, and we only need to prove that g* ∈ G for every g* ∈ G_M. Let k ∈ N, n = 2^k − 1, and a_i = i/2^k, i = 1, ..., n. Because G is convex and closed, the set B = {b = (b₁, ..., b_n) | ∃g ∈ G : b_i ≤ g(a_i), i = 1, ..., n} is also convex and closed in R^n. Thus, if (g*(a₁), ..., g*(a_n)) ∉ B, then, by the separation principle, Σ_{i=1}^{n} λ_i g*(a_i) ≥ ε + Σ_{i=1}^{n} λ_i b_i for some ε > 0, some λ = (λ₁, ..., λ_n), and every b ∈ B. Because (0, ..., 0, b_i = C, 0, ..., 0) ∈ B for every i = 1, ..., n and every C < 0, we have Σ_{i=1}^{n} λ_i g*(a_i) ≥ ε + λ_i C, and consequently λ_i ≥ 0. Let X be an r.v. assuming the values 0, λ₁, λ₁ + λ₂, ..., Σ_{i=1}^{n} λ_i with equal probabilities. Then ∫₀¹ g*(α) d(q_X(α)) = Σ_{i=1}^{n} λ_i g*(a_i) ≥ ε + Σ_{i=1}^{n} λ_i g(a_i) = ε + ∫₀¹ g(α) d(q_X(α)) for every g ∈ G, which implies ∫₀¹ g*(α) d(q_X(α)) ≥ ε + D(X). However, this contradicts g* ∈ G_M. Thus, (g*(a₁), ..., g*(a_n)) ∈ B.

Let g_k(α) be the piecewise-linear function with n + 1 = 2^k pieces, with "vertexes" g_k(a_i) = g*(a_i), i = 1, ..., n, and with g_k(0+) = g_k(1−) = 0. Because G is dominance closed, (g*(a₁), ..., g*(a_n)) ∈ B implies g_k(α) ∈ G. Then {g_k(α)}_{k∈N} is a monotonically increasing sequence of nonnegative functions, and lim_{k→∞} g_k(α) = g*(α) pointwise. Thus, g*(α) ∈ G, whence G = G_M, and the proof is finished. □

As a corollary of Proposition 3.4, the relationships (13) and (7) introduce a one-to-one correspondence between law invariant deviation measures and convex, closed, dominance closed sets of nonnegative concave functions g : (0,1) → [0,∞).

The following example presents the maximal g-envelope of a comonotone deviation measure.

Example 3.2 Let D be a proper comonotone law invariant deviation measure. Then its maximal g-envelope has the form G_M = {h ∈ 𝒢 | g ≽ h}, where the function g is given by (10).

Detail. By Proposition 2.4, the set {h ∈ 𝒢 | g ≽ h} is a g-envelope of D. Because it is convex, closed, and dominance closed, by Proposition 3.4, it is a maximal g-envelope.

4. Deviation measures and entropy maximization. This section investigates Shannon and Rényi entropy maximization problems with constraints on the mean and a law invariant deviation measure.


4.1 Problem formulation. Let 𝒳 ⊂ L¹(Ω) be the set of all r.v.s having continuous PDFs.¹⁰ Then for an arbitrary X ∈ 𝒳, the Shannon differential entropy S(X) (see [27]) is defined by

S(X) = −∫_{−∞}^{+∞} f_X(x) ln f_X(x) dx = −∫_{−∞}^{+∞} F′_X(x) ln F′_X(x) dx,  (18)

where f_X(x) is the PDF of X, whereas the Rényi differential entropy H_β(X), a generalization of the Shannon differential entropy, is introduced by¹¹

H_β(X) = (1/(1−β)) ln ∫_{−∞}^{+∞} (f_X(x))^β dx,  β > 1/2, β ≠ 1,  (19)

see [19]. When β → 1, H_β(X) converges to S(X). By convention, let

H₁(X) = S(X).  (20)

The entropy maximization problem with constraints on the mean and a proper law invariant deviation measure is formulated as

max_{X∈𝒳} S(X)  s.t. EX = µ, D(X) ≤ d,  (21)

max_{X∈𝒳} H_β(X)  s.t. EX = µ, D(X) ≤ d,  (22)

where β > 1/2, and the constants µ and d > 0 are given. Because H_β(kX) = H_β(X) + ln k, k > 0, the constraint D(X) ≤ d in (22) is always active.

In Shannon entropy maximization (21), Boltzmann’s theorem [8, Theorem 11.1.1] plays a central role.

Proposition 4.1 (Boltzmann's theorem) Let V ⊆ R be a closed subset and let h₁, ..., h_n be measurable functions. Also, let B be the set of all continuous r.v.s X with support V (i.e., those whose PDFs are zero outside of V) satisfying the conditions

E[h_j(X)] = a_j,  j = 1, ..., n,  (23)

where a₁, ..., a_n are given. If there is an r.v. in B whose PDF is positive everywhere in V, and if there exists a Shannon maximum-entropy distribution in B, then its PDF f_X(x) is determined by

f_X(x) = c exp( Σ_{j=1}^{n} λ_j h_j(x) )  ∀x ∈ V,  (24)

where the constants c and λ_j are determined from (23) and the condition that the integral of f_X(x) over V is 1.

If both constraints in (21) can be expressed in the form (23), then a solution to (21) is given by (24).

Example 4.1 (Shannon Entropy Maximization with Standard Deviation) For standard deviation D(X) = σ(X) = √(E[X − EX]²), the constraints EX = µ and σ(X) = d in (21) can be represented in the form (23) with V = (−∞,∞), h₁(X) = X, a₁ = µ, h₂(X) = (X − µ)², and a₂ = d². In this case, a solution to (21) is the normal distribution N(µ, d²).

Example 4.2 (Shannon Entropy Maximization with Lower Semideviation) For lower semideviation D(X) = σ−(X) = √(E([X − EX]−)²), the constraints EX = µ and σ−(X) = d in (21) correspond to (23) with V = (−∞,∞), h₁(X) = X, a₁ = µ, h₂(X) = [X − µ]²−, and a₂ = d². In this case, a solution to (21) is determined by

f_X(x) = c exp(λ₁x + λ₂[x − µ]²−)  ∀x ∈ R.  (25)

In particular, if µ = 0 and d = 1, then c ≈ 0.260713, λ₁ ≈ −0.638833, and λ₂ = −0.5.

¹⁰The entropy maximization problem is formulated on the space L¹(Ω), which includes L^p(Ω) for all p ∈ [1,∞].
¹¹The Rényi differential entropy is defined for β > 0, β ≠ 1. However, we restrict our attention to the case β > 1/2 to guarantee that the maximum-entropy distribution belongs to L¹(Ω).


Detail. The formula (25) follows from Boltzmann's theorem, and the constants c, λ₁, and λ₂ are found from the conditions ∫_{−∞}^{∞} f_X(x) dx = 1, ∫_{−∞}^{∞} x f_X(x) dx = µ, and ∫_{−∞}^{µ} (x − µ)² f_X(x) dx = d². Because the first and third integrals cannot be expressed in terms of elementary functions, the constants are found numerically.

Example 4.3 (Shannon Entropy Maximization with Lower Range Deviation) For lower range deviation D(X) = EX − ess inf X (see [21]), the constraints EX = µ and EX − ess inf X = d in (21) are equivalent to (23) with V = (µ − d, ∞), h₁(X) = X, and a₁ = µ. In this case, a solution to (21) is the shifted exponential distribution with f_X(x) = (1/d) exp(−(x − µ + d)/d), x ≥ µ − d.

The solution of the Rényi entropy maximization problem (22) with standard deviation D(X) = σ(X) is also well known (see, e.g., [6, 14]).

Example 4.4 (Rényi Entropy Maximization with Standard Deviation) For standard deviation D(X) = σ(X) = √(E[X − EX]²), β ≠ 1, and for µ = 0, d = 1, a solution to (22) has the PDF¹²

f_X(x) = A[1 − ((β−1)/(3β−1)) x²]₊^{1/(β−1)}  ∀x ∈ R,  (26)

where [x]₊ = max{x, 0}, and the constant A is defined by

A = Γ(1/(1−β)) √((1−β)/(3β−1)) / (Γ(1/(1−β) − 1/2) √π)  for β < 1,
A = Γ(β/(β−1) + 1/2) √((β−1)/(3β−1)) / (Γ(β/(β−1)) √π)  for β > 1.  (27)

Detail. This result is a particular case of [14, Proposition 1.3] with n = 1.

If the constraint D(X) = d can be expressed in the form (23), a distribution maximizing the Rényi entropy in (22) for β ≠ 1 can be represented in a form similar to (24) in Boltzmann's theorem (see Appendix B).

Example 4.5 (Rényi Entropy Maximization with Lower Semideviation) For lower semideviation D(X) = σ−(X) = √(E([X − EX]−)²) and β ≠ 1, a solution to (22) has the PDF

f_X(x) = c[1 + λ₁x + λ₂[x − µ]²−]₊^{1/(β−1)}  ∀x ∈ R,  (28)

where c, λ₁, and λ₂ are found numerically from the conditions

∫_{−∞}^{∞} f_X(x) dx = 1,  ∫_{−∞}^{∞} x f_X(x) dx = µ,  ∫_{−∞}^{µ} (x − µ)² f_X(x) dx = d².

In particular, for µ = 0, d = 1, these coefficients are shown in Figure 1.

Figure 1: Coefficients c, λ₁, and λ₂ in (28) as functions of β.

However, not for every deviation measure can the constraint D(X) = d in (22) be represented in the form (23). A simple necessary condition can be formulated in terms of mixtures.

Definition 4.1 Given r.v.s X and Y and a number λ ∈ [0,1], an r.v. Z with the cumulative distribution function F_Z(z) = λF_X(z) + (1−λ)F_Y(z) is called the λ-mixture of X and Y. We write Z = λX ⊕ (1−λ)Y.

Proposition 4.2 Let D be a deviation measure such that the set P_D = {X : EX = 0, D(X) = 1} can be expressed in the form (23). Then λX ⊕ (1−λ)Y ∈ P_D for X, Y ∈ P_D and any λ ∈ [0,1].

¹²In fact, (26) remains correct for the more general case β > 1/3. For β ≤ 1/3, the solution does not exist; see [6].


Proof. Let X, Y ∈ P_D and let P_D be represented in the form (23). Then E[h_j(X)] = a_j and E[h_j(Y)] = a_j, j = 1, ..., n, and for the λ-mixture Z = λX ⊕ (1−λ)Y, we have

E[h_j(Z)] = ∫_{−∞}^{+∞} h_j(z) d(λF_X(z) + (1−λ)F_Y(z)) = λ ∫_{−∞}^{+∞} h_j(z) d(F_X(z)) + (1−λ) ∫_{−∞}^{+∞} h_j(z) d(F_Y(z)) = λE[h_j(X)] + (1−λ)E[h_j(Y)] = a_j,

which implies Z ∈ P_D. □

Example 4.6 For CVaR-deviation (2), the necessary condition established in Proposition 4.2 does not hold, and consequently, the set {X : EX = 0, CVaR∆_α(X) = 1} cannot be expressed in the form (23).

Detail. Let two r.v.s X and Y be defined by Pr{X = −3} = 1/4 and Pr{X = 1} = 3/4, and by Pr{Y = −1} = 3/4 and Pr{Y = 3} = 1/4, respectively. Then their 1/2-mixture Z is determined by Pr{Z = −3} = 1/8, Pr{Z = −1} = 3/8, Pr{Z = 1} = 3/8, and Pr{Z = 3} = 1/8. We have EX = EY = 0 and CVaR∆_{1/2}(X) = CVaR∆_{1/2}(Y) = 1, but CVaR∆_{1/2}(Z) = 3/2 ≠ 1. Counterexamples for arbitrary α can be constructed similarly.

As in the case of CVaR-deviation, the necessary condition in Proposition 4.2 is not satisfied for the mixed CVaR-deviation (see [21]), which is a comonotone deviation measure defined by

D(X) = ∫₀¹ CVaR∆_α(X) dm(α),  (29)

where m is a weighting measure on (0,1) (nonnegative with total measure 1). Thus, in general, the problem (22) cannot be solved by direct application of Boltzmann's theorem, even in the case of Shannon entropy maximization (β = 1).

Lutwak et al. [17] used relative entropy to find the Rényi maximum-entropy distribution for an r.v. with a given p-th moment. The next section couples this approach with the representation (7) to characterize the maximum-entropy distribution in (22) for an arbitrary law invariant deviation measure.

4.2 Characterization of the maximum-entropy distribution. This section investigates the problem (22) for an arbitrary law invariant deviation measure D.

The Rényi entropy (19) can be equivalently represented through the quantile function q_X(α). For X ∈ 𝒳, the quantile q_X(α) is the inverse function of F_X(x) and is differentiable almost everywhere. Substituting α = F_X(x) into (19), we have dα = f_X(x) dx and x = q_X(α). By the inverse function theorem, f_X(x) = 1/q′_X(F_X(x)), and the Rényi differential entropy takes the form

H_β(X) = (1/(1−β)) ln ∫₀¹ (q′_X(α))^{1−β} dα,  β > 1/2, β ≠ 1,  (30)

and similarly

H₁(X) = S(X) = ∫₀¹ ln q′_X(α) dα.  (31)

Because H_β(X + C) = H_β(X) for any r.v. X and constant C, and H_β(kX) = H_β(X) + ln k for any X and k > 0, an r.v. X* solves (22) if and only if (X* − µ)/d solves the problem

max_{X∈𝒳} H_β(X)  s.t. EX = 0, D(X) = 1.  (32)

We use an approach similar to that of Lutwak et al. [17] and introduce

N_β[X,Y] = [ (∫₀¹ q′_X(α)^{−β} q′_Y(α) dα)^{1/β} (∫₀¹ q′_X(α)^{1−β} dα)^{1/(1−β)} ] / (∫₀¹ q′_Y(α)^{1−β} dα)^{1/(β(1−β))},  β > 1/2, β ≠ 1,  (33)


and

N₁[X,Y] = ( ∫₀¹ q′_X(α)^{−1} q′_Y(α) dα ) e^{H₁(X) − H₁(Y)},  (34)

provided that all the integrals exist. Here ln N_β[X,Y] is a version of the relative β-Rényi entropy which, in contrast to that of Lutwak et al. [17], uses quantiles instead of probability density functions.

Proposition 4.3 N_β[X,Y] ≥ 1 for every X ∈ 𝒳 and Y ∈ 𝒳.

Proof. Let β = 1; then

ln N₁[X,Y] = ln ∫₀¹ (q′_Y(α)/q′_X(α)) dα − ∫₀¹ ln (q′_Y(α)/q′_X(α)) dα ≥ 0,

where the equality follows from (34) and (31), and the inequality is Jensen's. For β ≠ 1, the proof follows from Hölder's inequality. Let β < 1, f = q′_Y(α), and g = q′_X(α); then

∫₀¹ f^{1−β} dα = ∫₀¹ (g^{−β}f)^{1−β} g^{(1−β)β} dα ≤ ( ∫₀¹ g^{−β}f dα )^{1−β} ( ∫₀¹ g^{1−β} dα )^{β},

which implies N_β[X,Y]^{β(1−β)} ≥ 1. Finally, let β > 1, f = q′_Y(α)^{−1}, and g = q′_X(α)^{−1}; then

∫₀¹ g^{β−1} dα = ∫₀¹ (g^{β}f^{−1})^{(β−1)/β} (f^{β−1})^{1/β} dα ≤ ( ∫₀¹ g^{β}f^{−1} dα )^{(β−1)/β} ( ∫₀¹ f^{β−1} dα )^{1/β},

which implies N_β[X,Y]^{1−β} ≤ 1. Consequently, in both cases, N_β[X,Y] ≥ 1. □

The next result is auxiliary and addresses existence of the integrals in (33).

Proposition 4.4 Let g : (0,1) → R be a positive concave function. Then

∫₀¹ g(α)^u dα < ∞ for any constant u > −1,  (35)

and the indefinite integral

h(α) = ∫ g(α)^u dα ∈ L¹(0,1) for any constant u > −2.  (36)

Proof. For u ≥ 0, g(α)^u is a bounded continuous function of α on (0,1), and thus both (35) and (36) hold. Now let u < 0. The concavity and positiveness of g(α) imply g(α) ≥ (1−2α)g(0+) + 2αg(1/2) ≥ 2αg(1/2) for α ∈ (0,1/2], and similarly, g(α) ≥ 2(1−α)g(1/2) + (1 − 2(1−α))g(1−) ≥ 2(1−α)g(1/2) for α ∈ [1/2,1). Consequently, g(α) ≥ g₀(α) = 2 min{α, 1−α} g(1/2) for all α, and thus

∫₀¹ g(α)^u dα ≤ 2 ∫₀^{1/2} (2αg(1/2))^u dα < ∞ for 0 > u > −1,

which proves (35) and (36) for u > −1. For u ∈ (−2,−1], we obtain

|h(β)| ≤ |C| + | ∫_{1/2}^{β} g(α)^u dα | ≤ |C| + | ∫_{1/2}^{β} (g₀(α))^u dα | ≡ |C| + I_u(β),

where C is a constant. Observe that I_u(β) = O(β^{u+1}) as β → 0 and I_u(β) = O((1−β)^{u+1}) as β → 1 for u ∈ (−2,−1). Also, I_u(β) grows logarithmically as β → 0 or β → 1 for u = −1. Consequently, in both cases, I_u(β) ∈ L¹(0,1), and (36) follows. □

Next, Proposition 4.3 is applied to characterize a solution to problem (32).

Proposition 4.5 Let D be a law invariant deviation measure with maximal g-envelope G_M. Let g ∈ G_M, and let X be an r.v. such that q′_X(α) = C_g g(α)^{−1/β}, where C_g = 1/∫₀¹ g(u)^{1−1/β} du. Then e^{H_β(Y)} ≤ e^{H_β(X)} D(Y) for every r.v. Y ∈ 𝒳.


Proof. Because (35) holds, C_g is finite and positive for β > 1/2. Expressing g(α) through q′_X(α), we obtain g(α) = C(q′_X(α))^{−β} with C = C_g^β = 1/∫₀¹ q′_X(α)^{1−β} dα, and from g ∈ G_M, we have

D(Y) ≥ ∫₀¹ g(α) q′_Y(α) dα = ( ∫₀¹ q′_X(α)^{1−β} dα )^{−1} ∫₀¹ q′_X(α)^{−β} q′_Y(α) dα = e^{H_β(Y) − H_β(X)} N_β[X,Y]^β ≥ e^{H_β(Y) − H_β(X)},

where the second equality follows from (33) and (30), and the last inequality follows from Proposition 4.3. □

Proposition 4.5 implies that for the specified r.v. X, we have H_β(X) ≥ H_β(Y) whenever D(Y) ≤ 1. Thus, if D(X) ≤ 1 for some g ∈ G_M, then X solves (32). For a comonotone deviation measure D, this leads to the following result.

Proposition 4.6 Let D be a proper comonotone law invariant deviation measure. Then an r.v. X solves problem (32) if and only if EX = 0 and

q′_X(α) = C_g g(α)^{−1/β},  C_g = 1/∫₀¹ g(u)^{1−1/β} du,  (37)

where the function g is given by (10).

Proof. With the representation (10), D(X) = ∫₀¹ g(α) q′_X(α) dα = C_g ∫₀¹ g(α)^{1−1/β} dα = 1. By Proposition 4.5, H_β(X) ≥ H_β(Y) whenever D(Y) ≤ 1, and thus X is a solution to (32). Because the Rényi differential entropy is a strictly concave functional of q′_X, X is unique. □

The quantile function q_X(α) of the r.v. X that solves (32) with an arbitrary comonotone deviation measure D is found from (37):

q_X(α) = C_g ∫ g(α)^{−1/β} dα + C,  (38)

where C_g is determined in (37) and the integration constant C is chosen to satisfy the constraint EX = 0. For β > 1/2, we have −1/β > −2, and in view of (36), we obtain q_X(α) ∈ L¹(0,1). This is the reason why (30) is defined for β > 1/2.

Next we characterize the class of distributions that solve (32) with a comonotone deviation measure.

Definition 4.2 A PDF f_X(x) of an r.v. X is called α-concave if

• f_X(x)^α / α is a concave function on the support of f_X(x) for α ≠ 0;

• ln f_X(x) is a concave function on the support of f_X(x) for α = 0 (such PDFs are also called log-concave).

Proposition 4.7

(a) For a proper comonotone law invariant deviation measure D, the unique PDF f_X(x) that solves (32) is (β−1)-concave.

(b) For a given r.v. X with a (β−1)-concave PDF f_X(x), there exists a unique comonotone deviation measure D such that f_X(x) solves (32) with D. This deviation measure is given by

D(Y) = ∫₀¹ g(α) d(q_Y(α)),  (39)

where g(α) = C(q′_X(α))^{−β} and C = 1/∫₀¹ q′_X(α)^{1−β} dα.

Proof. The existence and uniqueness in (a) follow from Proposition 4.6. Let us prove that the solution has a (β−1)-concave PDF. Because g(α) is concave, (37) implies that h(α) = q′_X(α)^{−β} is a concave function. This holds if and only if the derivative h′(α) = −β(q′_X(α))^{−β−1} q′′_X(α) exists almost everywhere and is decreasing. Differentiating the equality F_X(q_X(α)) = α, we obtain q′_X(α) = 1/f_X(q_X(α)) and q′′_X(α) = −q′_X(α) f′_X(q_X(α))/f²_X(q_X(α)) = −f′_X(q_X(α))/f³_X(q_X(α)). Thus, h′(α) is decreasing if and only if (f_X(x))^{β+1} f′_X(x)/f³_X(x) is decreasing. The last expression is the derivative of f_X(x)^{β−1}/(β−1) on the support of f_X(x) for β ≠ 1 and the derivative of ln f_X(x) for β = 1. This proves that f_X(x) is (β−1)-concave.

The proof of (b) is straightforward. For any X with a (β−1)-concave PDF, g(α) = C(q′_X(α))^{−β} is concave and positive, and by Proposition 2.4, D(Y) = ∫₀¹ g(α) d(q_Y(α)) is a comonotone deviation measure. □

Now the problem (32) is investigated for an arbitrary (not necessarily comonotone) deviation measure D. We prove that there exists g(α) ∈ G_M such that D(X) ≤ 1 for the r.v. X defined in Proposition 4.5. This g(α) solves the optimization problem in the next proposition.

Proposition 4.8 Let D be a proper law invariant deviation measure, and let G_M be its maximal g-envelope. For g ∈ G_M, let

W_γ(g) = (1/(1−γ)) ln ∫₀¹ g(α)^{1−γ} dα for γ ∈ (0,2), γ ≠ 1, and W_γ(g) = ∫₀¹ ln g(α) dα for γ = 1.  (40)

Then the optimization problem

max_{g∈G_M} W_γ(g),  γ ∈ (0,2),  (41)

has a unique solution g*(α) ∈ G_M.

Proof. The functional W_γ(g) is finite by Proposition 4.4 (see formula (35)). It is upper semicontinuous (with respect to pointwise convergence) for γ ≥ 1 by Fatou's lemma and is continuous for γ < 1 by Proposition 3.2(a) and the dominated convergence theorem. The existence of a solution follows from the compactness of G_M (see Proposition 3.2(b)) and Weierstrass's theorem, and the uniqueness of the solution is the result of the convexity of G_M and the strict concavity of W_γ(g). □

The function g*(α) described in Proposition 4.8 has the following properties.

Proposition 4.9 Let D be a proper law invariant deviation measure and let G_M be its maximal g-envelope. Also, let γ ∈ (0,2), and let g*(α) ∈ G_M solve (41). Then for any nonzero g(α) ∈ G_M, we have

∫₀¹ g*(α)/g(α)^γ dα ≥ ∫₀¹ g(α)/g(α)^γ dα  (42)

and

∫₀¹ g*(α)/g*(α)^γ dα ≥ ∫₀¹ g(α)/g*(α)^γ dα.  (43)

Proof. We begin with proving (42). Let γ = 1 then

0 ≤∫ 1

0

ln g?(α)dα−∫ 1

0

ln g(α)dα =∫ 1

0

lng?(α)g(α)

dα ≤ ln∫ 1

0

g?(α)g(α)

dα ∀g(α) ∈ GM ,

where the last inequality is Jensen’s one, and (42) follows. For γ 6= 1, by definition of g?(α), we have

11− γ

ln∫ 1

0

g(α)1−γdα ≤ 11− γ

ln∫ 1

0

g?(α)1−γdα ∀ g(α) ∈ GM . (44)

Let dν(α) = g(α)1−γdα be a nonnegative measure on (0, 1), then∫ 1

0dν(α) is finite in view of (35)

(see Proposition 4.4), and therefore, dm(α) = dν(α)∫ 10 dν(α)

is a probability measure on (0, 1). Then (42) isequivalent to ∫ 1

0

g?(α)g(α)

dm(α) ≥ 1. (45)

Using Jensen's inequality, we obtain

( ∫₀¹ (g⋆(α)/g(α)) dm(α) )^{1−γ} ≤ ∫₀¹ (g⋆(α)/g(α))^{1−γ} dm(α) = ∫₀¹ g⋆(α)^{1−γ} dα / ∫₀¹ dν(α) ≤ 1  for γ > 1

and

( ∫₀¹ (g⋆(α)/g(α)) dm(α) )^{1−γ} ≥ ∫₀¹ (g⋆(α)/g(α))^{1−γ} dm(α) = ∫₀¹ g⋆(α)^{1−γ} dα / ∫₀¹ dν(α) ≥ 1  for 0 < γ < 1,

which results in (45) (in the two lines above, the last inequality follows from (44)).

Now we show (43). The integral J = ∫₀¹ g⋆(α)^{1−γ} dα is finite in view of (35), and for each λ ∈ (0, 1), we have

∫₀¹ g⋆(α)/(λg(α) + (1 − λ)g⋆(α))^γ dα ≤ ∫₀¹ g⋆(α)/((1 − λ)g⋆(α))^γ dα = (1 − λ)^{−γ} J.  (46)

Because G_M is convex, g_λ(α) = λg(α) + (1 − λ)g⋆(α) ∈ G_M, and it follows from (42) that

∫₀¹ g⋆(α)/g_λ(α)^γ dα ≥ ∫₀¹ (λg(α) + (1 − λ)g⋆(α))/g_λ(α)^γ dα.

For λ ≠ 0, this implies

∫₀¹ g⋆(α)/g_λ(α)^γ dα ≥ ∫₀¹ g(α)/g_λ(α)^γ dα.  (47)

Combining (47) and (46), we obtain

(1 − λ)^{−γ} J ≥ ∫₀¹ g(α)/g_λ(α)^γ dα.

Because g(α) g_λ(α)^{−γ} → g(α) g⋆(α)^{−γ} pointwise as λ → 0, using Fatou's lemma, we have

J = lim_{λ→0} ( (1 − λ)^{−γ} J ) ≥ liminf_{λ→0} ∫₀¹ g(α)/g_λ(α)^γ dα ≥ ∫₀¹ lim_{λ→0} g(α)/g_λ(α)^γ dα = ∫₀¹ g(α)/g⋆(α)^γ dα,

which proves (43). □

The next proposition characterizes solutions to the problem (32) with an arbitrary law invariant deviation measure.

Proposition 4.10 Let D be a proper law invariant deviation measure and let G_M be its maximal g-envelope. Also, let g⋆(α) ∈ G_M solve (41) for γ = 1/β. Then an r.v. X solves problem (32) if and only if EX = 0 and

q′_X(α) = C g⋆(α)^{−1/β},  C = 1 / ∫₀¹ g⋆(u)^{1−1/β} du.  (48)

Proof. For X satisfying (48) and γ = 1/β, we have

D(X) = sup_{g(α)∈G_M} ∫₀¹ g(α) d(q_X(α)) = sup_{g(α)∈G_M} ∫₀¹ C g(α)/g⋆(α)^γ dα = C ∫₀¹ g⋆(α)^{1−γ} dα = 1,

where the third equality follows from (43). Thus, the constraints in (32) are satisfied, and Proposition 4.5 guarantees that X is a solution to (32). The uniqueness of X follows from the strict concavity of H_β. □

Consequently, to solve (32) with a law invariant deviation measure for which the constraint D(X) = 1 cannot be expressed in the form (23), we suggest the following approach:

(i) Given law invariant D, find a solution g⋆(α) to the problem (41). In the case when D is comonotone, g⋆(α) coincides with g(α) in (10).

(ii) Find the quantile function of X as the antiderivative of (48) such that EX = 0.

The solution to the initial problem (22) is then X⋆ = μ + d · X. A numerical sketch of these two steps is given after Remark 4.1.

Remark 4.1 Compared to solving (22) directly, this approach has an obvious advantage for comonotone deviation measures: step (i) requires no optimization, since g⋆(α) is given explicitly by (10).
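To make steps (i) and (ii) concrete, here is a minimal numerical sketch (ours, not part of the original derivation) for the Shannon case β = 1, where (48) reduces to q′_X(α) = 1/g⋆(α) with C = 1. The generator g(α) = α(1 − α) is a hypothetical choice used purely for illustration; any positive concave function on (0, 1) defines a comonotone deviation measure (cf. Proposition 2.4 and Appendix A).

    import numpy as np
    from scipy.integrate import cumulative_trapezoid

    # Hypothetical generator of a comonotone deviation measure (illustration
    # only): any positive concave g on (0, 1) qualifies.
    g = lambda a: a * (1.0 - a)

    # Step (ii) for beta = 1: q'(alpha) = 1/g(alpha); integrate on a grid and
    # shift the antiderivative so that EX = int_0^1 q(alpha) dalpha = 0.
    a = np.linspace(1e-6, 1.0 - 1e-6, 1_000_001)
    q = cumulative_trapezoid(1.0 / g(a), a, initial=0.0)
    q -= np.trapz(q, a)

    # For this g, the antiderivative of 1/g is ln(alpha/(1 - alpha)), so the
    # maximum-entropy distribution is logistic; check away from the endpoints.
    m = (a > 0.01) & (a < 0.99)
    print(np.abs(q - np.log(a / (1.0 - a)))[m].max())

For a comonotone D this is the entire computation; for a general law invariant D, step (i) additionally requires maximizing (41) over the maximal g-envelope G_M.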

Proposition 4.10 is central to characterizing the class of maximum-entropy distributions with given mean and deviation D.


Proposition 4.11 Let D be a proper law invariant deviation measure. Then the problem (32) with D has a unique solution f_X(x), which is (β − 1)-concave (log-concave for Shannon entropy). Conversely, for an arbitrary r.v. X with EX = 0 and a (β − 1)-concave PDF f_X(x), there exist infinitely many deviation measures such that the solution to the corresponding problem (32) is f_X(x). Exactly one of these deviation measures is comonotone and is given by (39).

Proof. The proof follows from Propositions 4.7 and 4.10. □

4.3 Examples. The results obtained in the preceding section are illustrated for entropy maximization with the full-range deviation, CVaR-deviation, mixed CVaR-deviation, and mean absolute deviation (MAD).

First, maximum-entropy distributions for CVaR-deviation and mixed CVaR-deviation are derived for the Shannon entropy maximization problem (21) with μ = 0 and d = 1:

max_{X∈X} S(X)  s.t.  EX = 0, D(X) = 1.  (49)

Example 4.7 (Shannon Entropy Maximization with CVaR-Deviation) A solution to (49) with CVaR-deviation (2), which is comonotone, has the PDF

f_X(x) = (1 − α) exp( ((1 − α)/α)(x − (2α − 1)/(1 − α)) )  for x ≤ (2α − 1)/(1 − α),
f_X(x) = (1 − α) exp( −(x − (2α − 1)/(1 − α)) )  for x ≥ (2α − 1)/(1 − α).  (50)

Figure 2 shows f_X(x) for α = 0.01, 0.3, 0.5, 0.7, 0.8, and 0.9.

For this f_X(x), a deviation measure, restored by (39) with β = 1, is CVaR∆_α(X).

Figure 2: The PDF f_X(x) (see (50)) that solves the entropy maximization problem (49) with CVaR-deviation for α = 0.01, 0.3, 0.5, 0.7, 0.8, and 0.9.

Detail. CVaR∆_α can be represented in the form (10) with g(β) given by (12); see Example 2.1. If X solves (49) with CVaR-deviation, then by Proposition 4.6, we have

q′_X(β) = 1/g(β) = (α/(1 − α)) (1/β)  for β ≤ α,  and  q′_X(β) = 1/(1 − β)  for β ≥ α,

and for the quantile function of X, we obtain

q_X(β) = ∫ dβ/g(β) = q₀ + (α/(1 − α)) ln(β/α)  for β ≤ α,  and  q_X(β) = q₀ − ln((1 − β)/(1 − α))  for β ≥ α,

where the integration constant q₀ = (2α − 1)/(1 − α) is found from the condition EX = ∫₀¹ q_X(β) dβ = 0, and consequently,

q_X(β) = (2α − 1)/(1 − α) + (α/(1 − α)) ln(β/α)  for β ≤ α,  and  q_X(β) = (2α − 1)/(1 − α) − ln((1 − β)/(1 − α))  for β ≥ α.

Finally, (50) is found as the derivative of the inverse function for q_X(β).
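As an independent numerical check (ours, not in the paper), the derived q_X(β) can be verified to satisfy both constraints of (49): zero mean, and unit CVaR-deviation computed via the identity −α CVaR∆_α(X) = ∫₀^α (q_X(t) − EX) dt, which appears as (57) in Appendix A.

    import numpy as np

    a = 0.3                                   # illustrative CVaR level alpha
    q0 = (2*a - 1) / (1 - a)                  # integration constant from EX = 0

    def qX(b):                                # the quantile function derived above
        return np.where(b <= a,
                        q0 + a/(1 - a)*np.log(b/a),
                        q0 - np.log((1 - b)/(1 - a)))

    b = np.linspace(1e-9, 1 - 1e-9, 2_000_001)
    print(np.trapz(qX(b), b))                 # EX: approximately 0
    t = np.linspace(1e-9, a, 1_000_001)
    print(-np.trapz(qX(t), t) / a)            # CVaR-deviation: approximately 1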

Example 4.8 (Shannon Entropy Maximization with Mixed CVaR-Deviation) An r.v. X solves (49) with mixed CVaR-deviation (29), which is a comonotone deviation measure, if and only if EX = 0 and

q′_X(β) = ( m(0, β] + β ∫_{β+}^{1−} dm(α)/α − β )^{−1}.  (51)

For this distribution, a deviation measure, restored by (39) with β = 1, is mixed CVaR-deviation.


Detail. According to Proposition 4.6, X solves (49) if and only if EX = 0 and q′_X(β) = 1/g(β), where g(β) is defined by (10) and can be calculated as g(β) = D(X_β) with X_β given by (11). Substituting (10) with (12) (see Example 2.1) into (29), we obtain

g(β) = ∫₀¹ CVaR∆_α(X_β) dm(α) = ∫₀^β (1 − β) dm(α) + ∫_{β+}^1 ((1/α) − 1) β dm(α)
     = m(0, β] − β( m(0, β] + m(β, 1) ) + β ∫_{β+}^{1−} dm(α)/α,

which proves (51).
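For a discrete mixing measure m, the expression (51) is easy to evaluate. The following sketch (our illustration; the two-point mixture is an arbitrary choice) confirms numerically that the closed form agrees with computing g(β) = ∫₀¹ CVaR∆_α(X_β) dm(α) term by term.

    import numpy as np

    # Hypothetical two-point mixing measure m = 0.4*delta_{0.25} + 0.6*delta_{0.75}.
    a_pts = np.array([0.25, 0.75])
    w = np.array([0.4, 0.6])

    def g_direct(beta):   # g(beta) = sum_i w_i * CVaR-deviation_{a_i}(X_beta)
        terms = np.where(a_pts <= beta, 1 - beta, (1/a_pts - 1)*beta)
        return np.dot(w, terms)

    def g_closed(beta):   # the closed form behind (51)
        mass = w[a_pts <= beta].sum()                        # m(0, beta]
        tail = (w[a_pts > beta] / a_pts[a_pts > beta]).sum() # int dm(alpha)/alpha
        return mass - beta + beta*tail

    for b in (0.1, 0.5, 0.9):
        print(g_direct(b), g_closed(b))       # the two expressions agree

The maximum-entropy quantile function then follows from q′_X(β) = 1/g(β), exactly as in Example 4.7.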

The next example presents a solution to the Rényi entropy maximization problem (32) with CVaR-deviation.

Example 4.9 (Rényi Entropy Maximization with CVaR-Deviation) A solution to (32) with CVaR-deviation (2), which is comonotone, for β ≠ 1 has the PDF

f_X(x) = (β/(2β − 1)) ((1 − α)/α^{1/β}) ( ((β − 1)/(2β − 1)) ((1 − α)/α^{1/β}) x + (α + β − 1)/((2β − 1) α^{1/β}) )^{1/(β−1)}  for x ≤ (2α − 1)/(1 − α), x ∈ V,

f_X(x) = (β/(2β − 1)) ((1 − α)/(1 − α)^{1/β}) ( −((β − 1)/(2β − 1)) ((1 − α)/(1 − α)^{1/β}) x + (β − α)/((2β − 1)(1 − α)^{1/β}) )^{1/(β−1)}  for x ≥ (2α − 1)/(1 − α), x ∈ V,

f_X(x) = 0  for x ∉ V,  (52)

where V = (−∞, ∞) for β < 1 and V = [ −(α + β − 1)/((β − 1)(1 − α)), (β − α)/((β − 1)(1 − α)) ] for β > 1. Figure 3 shows the function f_X(x) for various α and β.

For this PDF, a deviation measure, restored by (39) with β ≠ 1, is CVaR∆_α(X).

Detail. CVaR∆_α can be represented in the form (10) with g given by (12) (see Example 2.1). If X solves (32) with D(X) = CVaR∆_α(X), then by Proposition 4.6, q′_X(u) = C_g g(u)^{−γ}, where γ = 1/β and C_g = 1/∫₀¹ g(u)^{1−γ} du = (2 − γ)/(1 − α)^{1−γ}. Thus, we have

q′_X(u) = C_g/g(u)^γ = ((2 − γ)/(1 − α)) (α/u)^γ  for u ≤ α,  and  q′_X(u) = ((2 − γ)/(1 − α)^{1−γ}) (1 − u)^{−γ}  for u ≥ α.

Then the quantile function of X takes the form

q_X(u) = C + ∫_α^u q′_X(λ) dλ = C + ((2 − γ)/(1 − γ)) (α^γ/(1 − α)) u^{1−γ} − ((2 − γ)/(1 − γ)) (α/(1 − α))  for u ≤ α,

q_X(u) = C − ((2 − γ)/(1 − γ)) (1 − u)^{1−γ}/(1 − α)^{1−γ} + (2 − γ)/(1 − γ)  for u ≥ α,

where the integration constant C = (2α − 1)/(1 − α) is found from EX = ∫₀¹ q_X(u) du = 0. Finally, (52) is obtained as the derivative of the inverse function for q_X(u).

Figure 3: The PDF f_X(x) (see (52)) that solves the Rényi entropy maximization problem (32) with CVaR-deviation for α = 0.01, 0.3, 0.5, 0.7, 0.8, and 0.9 in four cases: (a) β = 0.6, (b) β = 1.5, (c) β = 2, and (d) β = 3.

The next example presents the maximum-entropy distribution in (32) for the full-range deviation.


Example 4.10 (Rényi Entropy Maximization with Full-Range Deviation) A solution to (32) with the full-range deviation D(X) = ess sup X − ess inf X has the uniform distribution on (−1/2, 1/2). If, given this distribution, a deviation measure is restored by (39), then it is the full-range deviation.

Detail. D(X) = ess sup X − ess inf X is comonotone and can be represented in the form (10) with g(α) ≡ 1. Thus, for the solution X⋆ to (32), Proposition 4.6 implies that q′_{X⋆}(α) ≡ 1, or q_{X⋆}(α) = α + C. The condition EX⋆ = 0 yields C = −1/2, and consequently, X⋆ has the uniform distribution on (−1/2, 1/2).

Next, two examples present solutions to the Shannon and Rényi entropy maximization problems with MAD.

Example 4.11 (Shannon Entropy Maximization with Mean Absolute Deviation) A solution to (49) with MAD(X) = E|X − EX| has the PDF

f_X(x) = (1/2) e^x  for x ≤ 0,  and  f_X(x) = (1/2) e^{−x}  for x ≥ 0.  (53)

Detail. The formula (53) follows from Boltzmann's theorem, because the constraints EX = 0 and MAD(X) = 1 can be represented in the form (23) with h₁(x) = x, a₁ = 0, h₂(x) = |x|, and a₂ = 1.

As an illustration of the developed approach, we also prove (53) using Proposition 4.10. Because the logarithm is a monotonic function, a solution to (41) with γ = 1 and G_M given by (15) can be represented in the form

g_x(β) = (2 − x)β  for β ≤ x/2,  and  g_x(β) = x(1 − β)  for β ≥ x/2,

for some x ∈ (0, 2). Since ∫₀¹ ln g_x(β) dβ = ln x + ln(1 − x/2) − 1, the maximum is attained at x = 1, and thus,

g⋆(β) = β  for β ≤ 1/2,  and  g⋆(β) = 1 − β  for β ≥ 1/2.

A solution to (49) is given by q′_X(β) = 1/g⋆(β) (see Proposition 4.10), and the PDF (53) is the derivative of the inverse function for q_X(β).

Example 4.12 (Rényi Entropy Maximization with Mean Absolute Deviation) A solution to (32) with MAD(X) = E|X − EX| for β ≠ 1 has the PDF

f_X(x) = (β/(4β − 2)) ( ((β − 1)/(2β − 1)) x + 1 )^{1/(β−1)}  for x ≤ 0, x ∈ V,
f_X(x) = (β/(4β − 2)) ( −((β − 1)/(2β − 1)) x + 1 )^{1/(β−1)}  for x ≥ 0, x ∈ V,
f_X(x) = 0  for x ∉ V,  (54)

where V = (−∞, ∞) for β < 1 and V = [ −(2β − 1)/(β − 1), (2β − 1)/(β − 1) ] for β > 1.

Detail. First, we solve (41) with G_M given by (15). Because h(y) = y^{1−γ}/(1 − γ) is a monotonic function, (41) attains its maximum at one of the functions

g_x(u) = (2 − x)u  for u ≤ x/2,  and  g_x(u) = x(1 − u)  for u ≥ x/2,

for some x ∈ (0, 2). Because

∫₀¹ g_x(u)^{1−γ}/(1 − γ) du = ( (2 − x)^{1−γ} x^{2−γ} + x^{1−γ} (2 − x)^{2−γ} ) / ((2 − γ)(1 − γ) 2^{2−γ}) = (x(2 − x))^{1−γ} / ((2 − γ)(1 − γ) 2^{1−γ}),

the maximum is attained at x = 1, and we have

g⋆(u) = u  for u ≤ 1/2,  and  g⋆(u) = 1 − u  for u ≥ 1/2.


By Proposition 4.10, a solution X to (32) is such that EX = 0 and q′_X(u) = C_g g⋆(u)^{−γ}, where γ = 1/β and C_g = 1/∫₀¹ g⋆(u)^{1−γ} du = (2 − γ) 2^{1−γ}. Then the quantile function of X takes the form

q_X(u) = C + ∫_{1/2}^u q′_X(λ) dλ = C + ((2 − γ)/(1 − γ)) (2u)^{1−γ} − (2 − γ)/(1 − γ)  for u ≤ 1/2,
q_X(u) = C − ((2 − γ)/(1 − γ)) (2(1 − u))^{1−γ} + (2 − γ)/(1 − γ)  for u ≥ 1/2,

where the integration constant C = 0 is found from the condition EX = ∫₀¹ q_X(u) du = 0. Finally, (54) is obtained as the derivative of the inverse function for q_X(u).

Remark 4.2 Applying (39) to the PDF (54), we obtain the median absolute deviation D(X) = CVaR∆_{1/2}(X). This illustrates the fact that different deviation measures may lead to the same optimal PDF in (32). Among all these measures, the formula (39) provides only the comonotone one.

In particular, this fact suggests a surjective mapping of all deviation measures to the class of comonotone deviation measures. Example 4.1 shows that the solution to the Shannon entropy maximization problem (49) with standard deviation is the standard normal distribution N(0, 1). We consider the inverse problem: What comonotone deviation measure corresponds to N(0, 1) through the maximum-entropy principle?

Example 4.13 (Maximum-Entropy Inverse Problem for N(0, 1)) A comonotone deviation measure D that produces N(0, 1) as the solution to the maximum-entropy problem (49) is given by (10) with

g(α) = (1/√(2π)) exp( −(1/2)(Φ^{−1}(α))² ),  (55)

where Φ^{−1}(α) is the inverse of the cumulative distribution function of N(0, 1).

Detail. For an r.v. X ∼ N(0, 1), we have q_X(α) = Φ^{−1}(α) and q′_X(α) = 1/Φ′(Φ^{−1}(α)) = √(2π) exp( (1/2)(Φ^{−1}(α))² ). Substituting this q′_X(α) into (39) with β = 1, we obtain (55).

Thus, we conclude that standard deviation corresponds to the comonotone deviation measure D in Example 4.13 through the maximum-entropy principle.

Similarly, we address the inverse problem with the Rényi entropy for N(0, 1).

Example 4.14 (Inverse Problem with the Rényi Entropy for N(0, 1)) A comonotone deviation measure producing N(0, 1) as a solution to the Rényi entropy maximization problem (32) with β < 1 is given by (10) with

g(α) = √(β/(2π)) exp( −(β/2)(Φ^{−1}(α))² ),  (56)

where Φ^{−1}(α) is the inverse of the cumulative distribution function of N(0, 1).

Detail. Because N(0, 1) is log-concave, it is (β − 1)-concave for β < 1. For an r.v. X ∼ N(0, 1), we have q′_X(α) = √(2π) exp( (1/2)(Φ^{−1}(α))² ); see Example 4.13. Thus, according to Proposition 4.7(b), a comonotone deviation measure that produces N(0, 1) as an outcome of (32) is given by (10) with

g(α) = C (q′_X(α))^{−β} = C (2π)^{−β/2} exp( −(β/2)(Φ^{−1}(α))² ),  where C = 1 / ∫₀¹ q′_X(α)^{1−β} dα.

With the substitution α = Φ(x), we obtain C = √β (2π)^{(β−1)/2}, and consequently, (56) follows.

The next example highlights practical aspects of the maximum-entropy principle with general deviation measures. It solves the inverse problem: given historical data for a stock's rate of return, estimate the probability distribution of the rate of return and find a deviation measure that produces that distribution through the maximum-entropy principle. We rely on the belief that the risk preferences of investors are fully reflected in the stock's expected rate of return and some deviation measure. In fact, this belief extends Markowitz's mean-variance approach [18], according to which all investors are concerned only with the mean and variance (or, equivalently, standard deviation) of stocks' rates of return. Solving the inverse problem is based on the fact that the Shannon maximum-entropy principle establishes a one-to-one correspondence between the class of log-concave PDFs and the class of comonotone deviation measures (see Proposition 4.7, item (b), case β = 1).

Figure 4: (a) the empirical distribution of the monthly historical rates of return for the Bank of America Corporation stock and its log-concave approximation; (b) the function g(α) for the deviation measure (39) (β = 1) restored through the maximum-entropy principle for the log-concave distribution in (a).

Example 4.15 (Restored Deviation Measure) Using the monthly historical rates of return for the Bank of America Corporation stock for the last eight years, we approximate the empirical distribution of the rate of return by a log-concave distribution (the logarithm of the empirical distribution is convexified, and the approximating distribution is determined as the normalized exponential function of the convexified distribution) and then restore the comonotone deviation measure using (39) with β = 1 for the approximating distribution. The deviation measure is given by (10), where g(α) is calculated numerically and is shown in Figure 4.

Concluding the section, we reexamine Example 4.15 with the Rényi entropy. As in Example 4.15, solving the inverse problem is based on the fact that for any fixed β > 1/2, the Rényi entropy maximization problem (22) establishes a one-to-one correspondence between the class of (β − 1)-concave distributions and the class of comonotone deviation measures. If for a given β we denote this class of (β − 1)-concave distributions by C_β, then C_{β₂} ⊆ C_{β₁} for any β₁ < β₂. Because we restrict β to be greater than 1/2, the set C_{1/2} is the largest among those with β ≥ 1/2.

Figure 5: (a) the empirical distribution of the monthly historical rates of return for the Bank of America Corporation stock and its approximation; (b) the function g(α) for the deviation measure (39) restored through the maximum-entropy principle with the Rényi entropy (β → 1/2) for the approximating distribution in (a).

Example 4.16 Using the same historical data for the rate of return for the Bank of America Corporation stock as in Example 4.15, we approximate the empirical distribution of the rate of return by the distribution with the PDF f_X(x) such that (f_X(x))^{β−1}/(β − 1) is a concave function for β → 1/2 (in this example, we choose β = 0.5 + 10⁻⁹) and then restore a comonotone deviation measure using (39) with the approximating distribution. The deviation measure is given by (10), where g(α) is calculated numerically for β → 1/2 and is shown in Figure 5.

Comparing Figures 4 and 5, we conclude that although the approximating distributions are sufficiently close, the corresponding functions g(α) differ significantly. This observation suggests that the choice of β in the Rényi entropy has a strong impact on the restored deviation measure and, consequently, on the agent's risk preferences associated with that deviation measure.
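Example 4.15 describes the approximation only briefly; the sketch below (ours) implements one plausible reading of it for the Shannon case β = 1, with a synthetic normal sample standing in for the Bank of America return data. The least concave majorant of the log-histogram plays the role of the "convexified" logarithm.

    import numpy as np

    # Hypothetical monthly return sample; Example 4.15 uses actual BoA data.
    r = np.random.default_rng(0).normal(0.005, 0.08, 96)

    # 1) Empirical density on a grid and its logarithm.
    edges = np.linspace(r.min() - 0.02, r.max() + 0.02, 41)
    hist, _ = np.histogram(r, bins=edges, density=True)
    xc = 0.5*(edges[1:] + edges[:-1])
    logf = np.log(np.maximum(hist, 1e-12))

    # 2) Log-concave approximation: least concave majorant (upper convex hull).
    def concave_majorant(x, y):
        hull = [0]
        for i in range(1, len(x)):
            while len(hull) >= 2 and \
                  (y[i] - y[hull[-2]])*(x[hull[-1]] - x[hull[-2]]) >= \
                  (y[hull[-1]] - y[hull[-2]])*(x[i] - x[hull[-2]]):
                hull.pop()
            hull.append(i)
        return np.interp(x, x[hull], y[hull])

    f_apx = np.exp(concave_majorant(xc, logf))
    f_apx /= np.trapz(f_apx, xc)              # normalized log-concave density

    # 3) Restore g by (39) with beta = 1: g(alpha) = 1/q'(alpha) = f(q(alpha)).
    F = np.cumsum(f_apx)*(xc[1] - xc[0])      # CDF on the grid
    alpha = np.linspace(0.01, 0.99, 99)
    q = np.interp(alpha, F, xc)               # quantile function
    g = np.interp(q, xc, f_apx)               # the restored generator g(alpha)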

5. Conclusions. This work has formulated the problem of Shannon and Rényi entropy maximization with a constraint on a general deviation measure introduced by Rockafellar et al. and has generalized the recent results on Rényi entropy maximization with constraints on standard deviation and pth moment. It has also developed a new representation for deviation measures (Proposition 2.2(d)) that played a pivotal role in adapting existing entropy-maximization approaches to solving the formulated problem. The chain of intermediate propositions and auxiliary results has culminated in Proposition 4.10. As an illustration, new maximum-entropy distributions for the Shannon and Rényi entropies, in particular with conditional value-at-risk deviation, have been obtained. As another major contribution, this work has solved the inverse entropy-maximization problem: finding a deviation measure that corresponds to a given probability distribution through the maximum-entropy principle. This problem finds applications in financial engineering and risk analysis; in particular, it could be used for restoring the risk preferences of an agent from the historical rates of return of the agent's financial instruments.

Appendix A. Proof of Proposition 2.2. Let us show that (b) → (c). It follows from (2) that

−α CVaR∆_α(X) = ∫₀^α (q_X(t) − EX) dt,  (57)

which, along with the property ∫₀¹ φ(α) dα = 0, reduces the integral in (5) to

I = ∫₀¹ φ(α) q_X(α) dα = ∫₀¹ φ(α)(q_X(α) − EX) dα = ∫₀¹ φ(α) d(−α CVaR∆_α(X)).

Integrating the last integral by parts, we obtain

I = ( −α CVaR∆_α(X) φ(α) ) |₀¹ + ∫₀¹ α CVaR∆_α(X) d(φ(α)).  (58)

It is left to prove that the first term in (58) vanishes. First, we show that

lim_{α→0} ( −α CVaR∆_α(X) φ(α) ) = 0.  (59)

With (57) and the fact that for sufficiently small α, the function |φ| monotonically decreases on (0, α), we obtain

|α CVaR∆_α(X) φ(α)| = | φ(α) ∫₀^α (q_X(t) − EX) dt | ≤ ∫₀^α |φ(α)(q_X(t) − EX)| dt ≤ ∫₀^α |φ(t)(q_X(t) − EX)| dt.

Because q_X(t) ∈ L^p(0, 1) and φ(t) ∈ L^q(0, 1), the integral ∫₀¹ |φ(t)||q_X(t) − EX| dt is finite. Consequently, the fact that ∫₀^α |φ(t)(q_X(t) − EX)| dt → 0 as α → 0 can be shown by Lebesgue's dominated convergence theorem. This proves (59).

Similarly, we can show that lim_{α→1} ( −α CVaR∆_α(X) φ(α) ) = 0. Consequently, the representations (5) and (6) are equivalent.

Now, we show that (b) → (d). For every nonzero φ(α) ∈ Λ, the function g(α) = −∫₀^α φ(t) dt is positive and concave and satisfies g(0) = g(1) = 0. Integrating the original integral I by parts, we obtain

I = ∫₀¹ φ(α) q_X(α) dα = ( −g(α) q_X(α) ) |₀¹ + ∫₀¹ g(α) d(q_X(α)).  (60)

It is left to prove that the first term in (60) vanishes. Indeed, if q_X(α) → −∞ as α → 0, then for sufficiently small α, we have |g(α) q_X(α)| ≤ |q_X(α)| ∫₀^α |φ(t)| dt ≤ ∫₀^α |q_X(t)||φ(t)| dt → 0 as α → 0 (the last integral vanishes because q_X(t) ∈ L^p(0, 1) and φ(t) ∈ L^q(0, 1)). Similarly, |g(α) q_X(α)| → 0 as α → 1. This proves that (b) → (d).

Finally, we show that (d) → (a), i.e., for every collection G of positive concave functions g : (0, 1) → R, the functional (7) is a law invariant deviation measure. Because the axioms D1–D4 are preserved under the supremum operation, it suffices to establish (d) → (a) for D_g(X) = ∫₀¹ g(α) d(q_X(α)) with a positive concave function g(α).

First, we assume that g(α) is a piecewise-linear concave function with a finite number of linear pieces such that g(α) > 0 for α ∈ (0, 1) and g(0+) = g(1−) = 0. Denote a = g′(0+), b = g′(1−), and φ(α) = −g′(α) where the derivative exists. Then, integrating ∫₀¹ g(α) d(q_X(α)) by parts, we obtain

D_g(X) = ∫₀¹ g(α) d(q_X(α)) = b lim_{α→1} ((1 − α) q_X(α)) − a lim_{α→0} (α q_X(α)) + ∫₀¹ φ(α) q_X(α) dα.  (61)

Because q_X(α) ∈ L^p(0, 1) ⊂ L^1(0, 1), both limits in (61) vanish (as those in (60)). Because, in addition, φ(α) is a nondecreasing nonzero function, φ(α) ∈ L^∞(0, 1) ⊂ L^q(0, 1), and ∫₀¹ φ(α) dα = 0, it follows from the part (b) → (a) that the functional D_g(X) is a law invariant deviation measure.


Now let g ∈ G be an arbitrary nonzero function, and let g_n(α) for every n ∈ N be the piecewise-linear function with 2ⁿ pieces such that g_n(0+) = g_n(1−) = 0 and g_n(i/2ⁿ) = g(i/2ⁿ) for i = 1, …, 2ⁿ − 1. Then {g_n(α)}_{n∈N} is a monotonically increasing sequence of nonnegative functions, and lim_{n→∞} g_n(α) = g(α) pointwise. It follows from the monotone convergence theorem that

∫₀¹ g(α) d(q_X(α)) = lim_{n→∞} ∫₀¹ g_n(α) d(q_X(α)) = sup_{n∈N} ∫₀¹ g_n(α) d(q_X(α)).  (62)

Because the axioms D1–D4 are preserved under the supremum operation, the functional (62) is a law invariant deviation measure and, consequently, so is (7).

Appendix B. Version of Boltzmann's Theorem for the Rényi Entropy. If the constraint D(X) = d can be expressed in the form (23), a distribution maximizing the Rényi entropy in (22) for β ≠ 1 can be represented in a form similar to (24) in Boltzmann's theorem.

Let V ⊆ R be a closed subset, and let h₁, …, h_n be measurable functions. Also, let B be the set of all continuous r.v.s X with the support V (i.e., those whose PDFs are zero outside of V) satisfying the conditions

E(h_j(X)) = a_j,  j = 1, …, n,  (63)

where a₁, …, a_n are given.

A general formulation of the Rényi entropy maximization problem subject to (63) is given by

max ∫_V (f_X(x))^β dx  if β < 1,  or  min ∫_V (f_X(x))^β dx  if β > 1,
s.t.  ∫_V h_j(x) f_X(x) dx = a_j,  j = 0, …, n,
      f_X(x) ≥ 0,

where h₀(x) ≡ 1 and a₀ = 1.

With Lagrange multipliers λ₀, …, λ_n and μ(x), the Lagrangian for this problem takes the form

L = ∫_V [ (f_X(x))^β + Σ_{j=0}^n λ_j h_j(x) f_X(x) + μ(x) f_X(x) ] dx − Σ_{j=0}^n λ_j a_j,

and the necessary optimality conditions are determined by

β (f_X(x))^{β−1} + Σ_{j=0}^n λ_j h_j(x) + μ(x) = 0

with the complementarity conditions μ(x) f_X(x) = 0 and μ(x) ≥ 0 for β < 1 (μ(x) ≤ 0 for β > 1), whence

f_X(x) = ( −(1/β) Σ_{j=0}^n λ_j h_j(x) )^{1/(β−1)}  if Σ_{j=0}^n λ_j h_j(x) ≤ 0 or β < 1,
f_X(x) = 0  if Σ_{j=0}^n λ_j h_j(x) > 0 and β > 1,

or, equivalently,

f_X(x) = [ −(1/β) Σ_{j=0}^n λ_j h_j(x) ]₊^{1/(β−1)}.

In particular, for standard lower semideviation σ₋, the constraints EX = μ and σ₋(X) = d correspond to (63) with V = (−∞, ∞), h₁(X) = X, a₁ = μ, h₂(X) = [X − μ]²₋, and a₂ = d². In this case, a solution to (22) is determined by (28).
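As a numerical illustration (ours) of this optimality form, the Lagrange multipliers can be recovered for the MAD constraints of Example 4.12 with β = 2: by symmetry λ₁ = 0, and matching the closed form (54), f_X(x) = (1/3)(1 − |x|/3)₊, gives λ₀ = −2/3 and λ₂ = 2/9.

    import numpy as np
    from scipy.optimize import fsolve

    beta = 2.0                                # illustrative Renyi order, beta > 1
    p = 1.0/(beta - 1.0)
    x = np.linspace(-10, 10, 400_001)

    def f(lam):                               # the Boltzmann-type form, lambda_1 = 0
        l0, l2 = lam
        return np.maximum(-(l0 + l2*np.abs(x))/beta, 0.0)**p

    def residuals(lam):                       # enforce int f = 1 and E|X| = 1 (MAD)
        fx = f(lam)
        return [np.trapz(fx, x) - 1.0, np.trapz(np.abs(x)*fx, x) - 1.0]

    l0, l2 = fsolve(residuals, x0=[-0.7, 0.2])
    print(l0, l2)                             # approximately -2/3 and 2/9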

Acknowledgments. The authors are grateful to the anonymous referees for their valuable commentsand suggestions, which helped to improve the quality of the paper.

References

[1] P. Artzner, F. Delbaen, J.-M. Eber, D. Heath, Coherent Measures of Risk, Mathematical Finance 9 (1999), 203–227.

[2] J. Bialas, Y. Nakamura, The Theorem of Weierstrass, Formalized Mathematics 5(3) (1996), 353–359.


[3] J. Benoist, A. Daniilidis, Coincidence Theorems for Convex Functions, Journal of Convex Analysis 9(1) (2002), 259–268.

[4] P. Buchen, M. Kelly, The Maximum Entropy Distribution of an Asset Inferred from Option Prices, Journal of Financial and Quantitative Analysis 31(1) (1996), 143–159.

[5] D. C. Brody, I. R. C. Buckley, I. C. Constantinou, Option Price Calibration from Rényi Entropy, Physics Letters A 366 (2007), 298–307.

[6] J. Costa, A. Hero, C. Vignat, On solutions to multivariate maximum-entropy problems, Lecture Notes in Computer Science (A. Rangarajan, M. Figueiredo, J. Zerubia, eds.), Springer-Verlag, Berlin, 2683, 2003, pp. 211–228.

[7] J. M. Cozzolino, M. J. Zahner, The Maximum-Entropy Distribution of the Future Market Price of a Stock, Operations Research 21(6) (1973), 1200–1211.

[8] T. M. Cover, J. A. Thomas, Elements of Information Theory, 1st ed., Wiley, New York, 1991.

[9] R.-A. Dana, A representation result for concave Schur-concave functions, Mathematical Finance 15(4) (2005), 613–634.

[10] H. Föllmer, A. Schied, Stochastic Finance: An Introduction in Discrete Time, 2nd ed., de Gruyter, Berlin, 2004.

[11] C. Friedman, J. Huang, S. Sandow, A Utility-Based Approach to Some Information Measures, Entropy 9 (2007), 1–6.

[12] E. T. Jaynes, Prior Probabilities, IEEE Transactions on Systems Science and Cybernetics 4 (1968), 227–251.

[13] E. T. Jaynes, Information Theory and Statistical Mechanics, Physical Review 106(4) (1957), 620–630.

[14] O. Johnson, C. Vignat, Some results concerning maximum Rényi entropy distributions, Annales de l'Institut Henri Poincaré (B) Probability and Statistics 43(3) (2007), 339–351.

[15] E. Jouini, W. Schachermayer, N. Touzi, Law invariant risk measures have the Fatou Property, Advances in Mathematical Economics 9 (2006), 49–71.

[16] S. Kusuoka, On law invariant coherent risk measures, Advances in Mathematical Economics 3 (2001), 83–95.

[17] E. Lutwak, D. Yang, G. Zhang, Cramér–Rao and moment-entropy inequalities for Rényi entropy and generalized Fisher information, IEEE Transactions on Information Theory 51 (2005), 473–478.

[18] H. M. Markowitz, Portfolio selection, The Journal of Finance 7(1) (1952), 77–91.

[19] A. Rényi, On measures of information and entropy, Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, University of California Press, Berkeley, CA (1961), 547–561.

[20] R. T. Rockafellar, S. Uryasev, M. Zabarankin, Deviation Measures in Risk Analysis and Optimization, Research Report 2002-7, Dept. of Industrial and Systems Engineering, University of Florida, Gainesville, 2002.

[21] R. T. Rockafellar, S. Uryasev, M. Zabarankin, Generalized deviations in risk analysis, Finance and Stochastics 10(1) (2006), 51–74.

[22] R. T. Rockafellar, S. Uryasev, M. Zabarankin, Optimality Conditions in Portfolio Analysis with General Deviation Measures, Mathematical Programming 108(2–3) (2006), 515–540.

[23] R. T. Rockafellar, S. Uryasev, M. Zabarankin, Master Funds in Portfolio Analysis with General Deviation Measures, The Journal of Banking and Finance 30(2) (2006), 743–777.

[24] R. T. Rockafellar, S. Uryasev, M. Zabarankin, Equilibrium with Investors Using a Diversity of Deviation Measures, The Journal of Banking and Finance 31(11) (2007), 3251–3268.

[25] R. T. Rockafellar, S. Uryasev, M. Zabarankin, Risk Tuning with Generalized Linear Regression, Mathematics of Operations Research 33(3) (2008), 712–729.

[26] B. Rudloff, Hedging in incomplete markets and testing compound hypotheses via convex duality, Dissertation, Martin-Luther University, Halle-Wittenberg, Germany, 2006.

[27] C. E. Shannon, A mathematical theory of communication, Bell System Technical Journal 27 (1948), 379–423, 623–656.

[28] M. Stutzer, A Simple Nonparametric Approach to Derivative Security Valuation, Journal of Finance 51(5) (1996), 1633–1652.

[29] M. U. Thomas, A Generalized Maximum Entropy Principle, Operations Research 27(6) (1979), 1188–1196.

[30] S. Willard, General Topology, Courier Dover Publications, Mineola, NY, 2004.