
1042 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 19, NO. 5, SEPTEMBER/OCTOBER 1989

The Generalized Maximum Entropy Principle H. K. KESAVAN AND J. N. KAPUR

Abstract—Generalizations of the well-known maximum entropy principle (MEP) by E. T. Jaynes, and the minimum discrimination information principle (MDIP) by Kullback, are described. The generalization has been achieved by enunciating the entropy maximization postulate and examining its consequences. The inverse principles which are inherent in the MEP and MDIP are made quite explicit in the new methodology. Several examples are given to illustrate the power and scope of the generalized maximum entropy principle that follows from the entropy maximization postulate.

I. INTRODUCTION

During the past three decades, the maximum entropy principle (MEP) of Jaynes [3] has been invoked in the solution of a wide array of probabilistic systems. The principle draws together concepts from information theory, statistical inference, optimization, and, last but not least, a precise knowledge of the partial information one has about a probabilistic system in terms of a set of statistical moments. The chief assertion of the MEP is that the most unbiased probability distribution is the maximum entropy distribution satisfying the constraints; that is, it is the distribution obtained by maximizing the entropy measure (usually Shannon's) subject to the given constraints, a process that resorts to the familiar method of constrained maximization using Lagrange multipliers. The MEP has been hailed as a unifying concept both because of its philosophical foundations and because of its success in practical implementation. From its initial success in statistical mechanics and thermodynamics, the range of applications has cut across several disciplines.

The generalized maximum entropy principle (GMEP) [11], which is the subject matter of this paper, is a generalization of the MEP in the sense that the latter forms an important constituent of it. Furthermore, it is the MEP that provides the requisite background for the formulation of this new principle. Given the three probabilistic entities, namely, the entropy measure, the set of moment constraints, and the probability distribution, the MEP provides a methodology for identifying the most unbiased probability distribution based on a knowledge of the first two

Manuscript received September 11, 1988; revised March 3, 1989. This work was supported by the Natural Sciences and Engineering Research Council of Canada.

H. K. Kesavan is with the Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada N2L 3G1.

J. N. Kapur is with the Indian National Science Academy, Kanpur, India.

IEEE Log Number 8929618.

entities. As stated earlier, the identification is based on the principle of maximization of the entropy measure subject to the given constraints. The inherent logic of this procedure is aptly viewed as a generalization of Laplace's principle of insufficient reason, which states that the uniform distribution is the most unbiased distribution when one does not have any prior knowledge about a probabilistic event.

The GMEP addresses itself to the determination of any one of the three entities when the remaining two are specified. Again, the accent is placed on the most unbiased determination, which is precisely what lends uniqueness to the choice from amongst a multitude of possibilities. The philosophical underpinning of the GMEP rests on the entropy maximization postulate (EMP), which states that it is the maximum information-theoretic entropy that is always the controlling quantity with respect to the states of the three mutually coupled probabilistic entities. The GMEP then spells out deductive procedures for the determination of the unspecified entity when the other two are specified. This new formalism is extracted from the basic blueprint of the formalism of the MEP by elevating entropy maximization to the level of a postulate.

For practical applications, it is convenient to have a conceptual framework that can also accommodate a fourth probabilistic entity, namely, a priori distributions. This is accomplished by considering the Kullback-Leibler [14] measure of directed divergence or cross-entropy. From this vantage point, the MEP can be neatly reinterpreted in terms of the minimization of a probabilistic distance of a probability distribution from the uniform distribution. A more general statement can be made in terms of Kullback's [13] minimum discrimination information principle (MDIP), which comprehends, in addition to the three probabilistic entities that come under the purview of the MEP, the fourth entity, namely, the a priori probability distribution. The MDIP is also called Kullback's minimum cross-entropy principle. We can again invoke the GMEP for the determination of any single entity when the remaining three are specified. This deductive procedure is very similar to the earlier formalism where only three entities were involved.

Furthermore, it is theoretically possible [11] to transform the a priori probability distribution into an additional moment constraint and thus establish the intimate connection between the two principles based on maximum entropy and minimum discrimination information. From this



new perspective, the precedence of one principle over the other in terms of being a more general one becomes a moot point.

We briefly review the principles of ME and MDI and give illustrative examples. With this background, we state the entropy maximization postulate and make the seven constituents of the GMEP quite explicit. This is followed by a discussion of the two formalisms that extend the scope of the MEP and MDI to their full range of possibilities, so as to allow the determination of any specific probabilistic entity under the conditions mentioned earlier. All these are extensively illustrated by examples taken from different fields. The paper concludes by asserting that the GMEP does indeed offer a fresh perspective in the study of probabilistic systems, much beyond what is implied in the MEP and MDI, by bringing the inverse principles into sharp focus. That the latter principles do have practical implications is highlighted through examples.

II. THE MAXIMUM ENTROPY PRINCIPLE (JAYNES' FORMALISM)

The principle is about the determination of the most unbiased probability distribution when the given entities are Shannon's entropy function $-\sum_{i=1}^{n} p_i \ln p_i$ and some simple linear moment constraints. As for the former, we make particular use of the fact that it is a concave function and consequently possesses a global maximum. Furthermore, the maximization process always leads to a set of probabilities that never assume negative values.

The problem is to maximize the entropy function
$$ -\sum_{i=1}^{n} p_i \ln p_i \qquad (1) $$
subject to the constraints
$$ \sum_{i=1}^{n} p_i = 1, \qquad \sum_{i=1}^{n} p_i\, g_r(x_i) = a_r, \qquad r = 1,2,\ldots,m. \qquad (2) $$
The Lagrangian is
$$ L = -\sum_{i=1}^{n} p_i \ln p_i - (\lambda_0 - 1)\Big(\sum_{i=1}^{n} p_i - 1\Big) - \sum_{r=1}^{m}\lambda_r\Big(\sum_{i=1}^{n} p_i\, g_r(x_i) - a_r\Big). \qquad (3) $$
Maximizing L, we get
$$ \ln p_i = -\lambda_0 - \lambda_1 g_1(x_i) - \cdots - \lambda_m g_m(x_i) \qquad (4) $$
so that
$$ p_i = \exp[-\lambda_0 - \lambda_1 g_1(x_i) - \cdots - \lambda_m g_m(x_i)]. \qquad (5) $$
In order to determine the Lagrange multipliers $\lambda_0, \lambda_1,\ldots,\lambda_m$, we substitute (5) into (2) to get
$$ e^{\lambda_0} = \sum_{i=1}^{n}\exp[-\lambda_1 g_1(x_i) - \lambda_2 g_2(x_i) - \cdots - \lambda_m g_m(x_i)] \qquad (6) $$
and
$$ a_r\, e^{\lambda_0} = \sum_{i=1}^{n} g_r(x_i)\exp[-\lambda_1 g_1(x_i) - \cdots - \lambda_m g_m(x_i)], \qquad r = 1,2,\ldots,m. \qquad (7) $$
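As a numerical aside (not part of the original formalism), the multipliers in (6)-(7) can be computed by minimizing the convex dual function $\ln\sum_i\exp[-\sum_r\lambda_r g_r(x_i)] + \sum_r\lambda_r a_r$, whose gradient vanishes exactly when the constraints (2) are satisfied. The sketch below assumes NumPy and SciPy; the six-point support and the single prescribed mean are illustrative choices of ours, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative support and constraint (not from the paper): x_i = 1,...,6,
# a single moment function g_1(x) = x, and prescribed mean a_1 = 4.5.
x = np.arange(1.0, 7.0)
G = np.vstack([x])                 # rows are the moment functions g_r(x_i)
a = np.array([4.5])                # prescribed moment values a_r

def dual(lam):
    # Convex dual of (1)-(2): ln sum_i exp(-sum_r lam_r g_r(x_i)) + lam . a
    return np.log(np.exp(-lam @ G).sum()) + lam @ a

lam = minimize(dual, x0=np.zeros(len(a))).x

p = np.exp(-lam @ G)
p /= p.sum()                       # maximum entropy distribution, cf. (5)
print(np.round(p, 4), p @ x)       # the moment E[x] reproduces 4.5
```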

For the case of a continuous random variate, the maximum entropy principle requires: maximize
$$ -\int_{a}^{b} f(x)\ln f(x)\,dx \qquad (8) $$
subject to
$$ \int_{a}^{b} f(x)\,dx = 1, \qquad \int_{a}^{b} f(x)\,g_r(x)\,dx = a_r, \qquad r = 1,2,\ldots,m. \qquad (9) $$

The results of the discrete version can be extended to this case.

A. Formalism of the Minimum Discrimination Information Principle (MDI)

Kullback and Leibler [14] introduced the measure

$$ D(P:Q) = \sum_{i=1}^{n} p_i \ln\frac{p_i}{q_i} \qquad (10) $$

in order to discriminate the probability distribution P from Q. This is always greater than or equal to zero and has a global minimum value of zero when the two distributions are identical. When Q is the uniform distribution U, that is, when $q_i = 1/n$ for each i, then

$$ D(P:U) = \sum_{i=1}^{n} p_i\ln(n\,p_i) = \ln n + \sum_{i=1}^{n} p_i\ln p_i = H(U) - H(P), \qquad (11) $$

where H(U) is the entropy associated with the uniform distribution, that is, the maximum value ln n of the entropy, and H(P) is Shannon's entropy measure for P.

Minimizing D(P:U) would entail maximizing H(P), which becomes synonymous with the MEP. But this new measure affords another interpretation of the MEP: the ME principle seeks to determine that distribution P, out of those that satisfy the constraints, for which D(P:U) is a minimum. In other words, one is seeking that distribution that satisfies the constraints and is closest to the uniform distribution.

Kullback's [13] MDI principle generalizes this concept. It seeks to minimize the directed divergence D(P:Q), which means it seeks to determine the distribution P that satisfies all the constraints and is closest to a given distribution Q.

1044 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 19, NO. 5, SEPTEMBER/OCTOBER 1989

We want to minimize $\sum_{i=1}^{n} p_i\ln(p_i/q_i)$ subject to the constraints in (2). The Lagrangian is
$$ L = \sum_{i=1}^{n} p_i\ln\frac{p_i}{q_i} + (\lambda_0 - 1)\Big(\sum_{i=1}^{n} p_i - 1\Big) + \sum_{r=1}^{m}\lambda_r\Big(\sum_{i=1}^{n} p_i\, g_r(x_i) - a_r\Big), $$
and minimizing it gives
$$ p_i = q_i\exp[-\lambda_0 - \lambda_1 g_1(x_i) - \cdots - \lambda_m g_m(x_i)]. $$
For a continuous random variate with a priori density q(x), the corresponding result is
$$ f(x) = q(x)\exp[-\lambda_0 - \lambda_1 g_1(x) - \cdots - \lambda_m g_m(x)], $$
with the Lagrange multipliers determined from the normalization condition and
$$ \int q(x)\,g_r(x)\exp[-\lambda_0 - \lambda_1 g_1(x) - \cdots - \lambda_m g_m(x)]\,dx = a_r, \qquad r = 1,2,\ldots,m. \qquad (18) $$
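A parallel numerical sketch for the MDI solution just derived: the multipliers can again be found from a convex dual, now weighted by the a priori distribution Q. The prior (a truncated Poisson-like q) and the prescribed mean below are our illustrative assumptions; NumPy and SciPy are assumed.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

# Illustrative setup (ours): support i = 0..20, prior q_i proportional to a
# Poisson(3) pmf, and a single constraint E[i] = 5.
i = np.arange(21)
q = poisson.pmf(i, 3.0)
q /= q.sum()
G = np.vstack([i.astype(float)])
a = np.array([5.0])

def dual(lam):
    # Dual potential for minimum cross-entropy relative to the prior q:
    # ln sum_i q_i exp(-sum_r lam_r g_r(x_i)) + sum_r lam_r a_r
    return np.log((q * np.exp(-lam @ G)).sum()) + lam @ a

lam = minimize(dual, x0=np.zeros(len(a))).x
p = q * np.exp(-lam @ G)
p /= p.sum()                       # p_i proportional to q_i exp(-lam . g(x_i))

print("E[i] =", p @ i)             # roughly 5, the prescribed moment
print("D(P:Q) =", (p * np.log(p / q)).sum())
```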

III. THE GENERALIZED MAXIMUM ENTROPY PRINCIPLE (GMEP)

The MEP addresses itself to the determination of the most unbiased probability distribution proceeding from Shannon's entropy measure and a given set of constraints. The same problem can be viewed in an expanded framework where the MDI principle is invoked, because it is then possible to include a fourth probabilistic entity, namely an a priori probability distribution Q.

We now present two formalisms, one in the MEP version and another in the MDIP version, which constitute their generalizations. In the MEP version, the formalism renders it possible to determine any one of the probabilistic quantities when the other two are specified. As such there are

two inverse problems associated with it in addition to the direct principle of ME. In the MDI version, the formalism paves the way for the determination of any single entity when the remaining three are specified. Here we can identify three inverse principles in addition to the direct principle of MDI. In total, then, there are two direct principles and five associated inverse principles, making seven principles in all that are encompassed by the GMEP. The origin of the unfolding of the seven principles is the entropy maximization postulate.

A. Entropy Maximization Postulate

The maximum information-theoretic entropy (or minimum cross-entropy) is always the controlling quantity with respect to all the probabilistic entities that appear in the MEP or MDIP formalisms.

In other words, when a probabilistic system is formulated in terms of the information-theoretic frameworks of either the MEP or the MDI, it is the maximum entropy or minimum cross-entropy that is the sovereign quantity that simultaneously ensures the most unbiased states of all the probabilistic entities. As a matter of clarification, it should be emphasized that our generalized MEP model does not restrict the measure of entropy to Shannon's only, as in Jaynes' MEP, or to the Kullback-Leibler measure only, as in Kullback's MDIP. This latitude is quite essential for the GMEP.

This postulate is certainly true of the direct principle of the MEP when it seeks out the maximum entropy distribution out of all the possible distributions. The philosophical foundation for it, however, arose out of reasons other than the explicit recognition of the entropy maximization postulate. Also, the latter postulate has certainly been valid for the direct principle of MDI, which is anchored in the notion of the distance of a probability distribution P from Q. The basis for the entropy maximization postulate can be viewed as purely axiomatic, although this "leap of faith" has arisen from a critical observation of the applications of the ME and MDI principles to examples from a wide variety of disciplines.

IV. GMEP (THE ME VERSION)

Let $\phi(\cdot)$ be a convex function, and let
$$ H(P) = -\sum_{i=1}^{n}\phi(p_i) \qquad (19) $$
be the measure of entropy. Let the constraints be
$$ \sum_{i=1}^{n} p_i = 1 \quad\text{and}\quad \sum_{i=1}^{n} p_i\, g_r(x_i) = a_r, \qquad r = 1,2,\ldots,m. \qquad (20) $$

Using the method of Lagrange multipliers, we maximize (19) subject to the (m + 1) constraints in (20), and get an expression for the first derivative of $\phi(\cdot)$ as
$$ \phi'(p_i) = \lambda_0 + \lambda_1 g_1(x_i) + \lambda_2 g_2(x_i) + \cdots + \lambda_m g_m(x_i). \qquad (21) $$

A. The Direct Principle

Given the entropy measure $\phi(\cdot)$ and the constraint mean values of $g_1(x_i), g_2(x_i),\ldots,g_m(x_i)$, we wish to determine the probability distribution that maximizes the entropy measure.

Using (21) to substitute into (20), we can solve for the (m + 1) Lagrange multipliers, which in turn yield the probabilities $p_i$ (see (5)).

B. The First Inverse Problem (Determination of Constraints)

Given the entropy measure $\phi(\cdot)$ and the probability distribution $p_i$, determine one or more probability constraints that yield the given probability distribution when the entropy measure is maximized subject to these constraints.

Since we know $\phi(p_i)$, we also know $\phi'(p_i)$, and hence the RHS of (21) can be determined. This will allow us to identify the functions $g_1(x_i),\ldots,g_m(x_i)$ by matching terms, and thus a most unbiased set of constraints (20).

C. The Second Inverse Problem (Determination of the Entropy Measure)

Given the constraints $g_1(x_i), g_2(x_i),\ldots,g_m(x_i)$ and the probability distribution $p_i$, determine the most unbiased entropy measure that, when maximized subject to the given constraints, yields the given probability distribution.

We substitute the given values into (21) and get a differential equation that can be solved for $\phi(\cdot)$. Once $\phi(\cdot)$ is known, we can determine the entropy measure
$$ H(P) = -\sum_{i=1}^{n}\phi(p_i). $$

We now illustrate the MEP version of the generalized maximum entropy principle on the basis of an example.

Example 1: Let
$$ \phi(p_i) = -p_i\ln p_i, \qquad (22) $$
in which the summation from 1 to n gives the entropy measure. Also, let the constraints be
$$ \sum_{i=1}^{n} p_i = 1 \quad\text{and}\quad \sum_{i=1}^{n} p_i x_i = \bar{x}, \qquad (23) $$
and
$$ p_i = \exp[-\beta x_i] \Big/ \sum_{i=1}^{n}\exp[-\beta x_i] \qquad (24) $$
is taken as the probability distribution. The Lagrange multiplier $\beta$ can be found from
$$ \bar{x} = \sum_{i=1}^{n} x_i\exp[-\beta x_i] \Big/ \sum_{i=1}^{n}\exp[-\beta x_i]. \qquad (25) $$
On the basis of the formalism presented earlier, we wish to demonstrate solutions to the one direct and two inverse problems for this specific example.

D. Solution of the Direct Problem

Given (22) and (23), using Lagrange's method,
$$ -(1 + \ln p_i) = \lambda_0 + \lambda_1 x_i \qquad (26) $$
so that
$$ p_i = \exp[-\lambda_1 x_i]\big/\exp[1 + \lambda_0]. \qquad (27) $$
Substituting the first constraint,
$$ \exp[1 + \lambda_0] = \sum_{i=1}^{n}\exp[-\lambda_1 x_i], \qquad (28) $$
where $\lambda_1$ can be found by substituting into the second constraint,
$$ \bar{x} = \sum_{i=1}^{n} x_i\exp[-\lambda_1 x_i] \Big/ \sum_{i=1}^{n}\exp[-\lambda_1 x_i]. \qquad (29) $$

E. Determination of Constraints (The First Inverse Problem)

The constraints are determined from a knowledge of (22) and (24). Substituting into (21), we get
$$ \lambda_0 + \lambda_1 g_1(x_i) + \cdots + \lambda_m g_m(x_i) = -(1 + \ln p_i) = -1 + \ln\sum_{r=1}^{n}\exp[-\beta x_r] + \beta x_i. \qquad (30) $$
Equating terms, we get
$$ \lambda_1 = \beta \quad\text{and}\quad g_1(x_i) = x_i. $$
Hence, the constraints are
$$ \sum_{i=1}^{n} p_i = 1 \quad\text{and}\quad \sum_{i=1}^{n} p_i x_i = \bar{x}. $$

F. Determination of the Entropy Measure (Second Inverse Problem)

Consider the differential equation
$$ \phi'(p_i) = \lambda_0 + \lambda_1 x_i, $$
and note that, from (24), $x_i$ is a linear function of $\ln p_i$, so that
$$ \phi'(p_i) = a + b\ln p_i. $$
Hence
$$ \phi(p_i) = a\,p_i + b\,p_i\ln p_i - b\,p_i + c \qquad (31) $$
and
$$ H(P) = -\sum_{i=1}^{n}\phi(p_i) = -(a - b + cn) - b\sum_{i=1}^{n} p_i\ln p_i. $$
Ignoring the constant and the positive factor b, we get the entropy measure
$$ H(P) = -\sum_{i=1}^{n} p_i\ln p_i. $$
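The first inverse problem of Example 1 can also be checked numerically: starting from a distribution of the form (24), the quantity $-(1+\ln p_i)$ in (30) is matched against an affine function of $x_i$, which recovers $\lambda_1 = \beta$ and $g_1(x) = x$. The value of $\beta$ and the support below are illustrative assumptions of ours; NumPy is assumed.

```python
import numpy as np

# Numerical check of the first inverse problem in Example 1 (our sketch):
# take a distribution of the form (24) with an illustrative beta = 0.3 on
# x = 1..6, compute -(1 + ln p_i) as in (30), and recover the characterizing
# moment function by matching against an affine function of x_i.
beta = 0.3
x = np.arange(1.0, 7.0)
p = np.exp(-beta * x)
p /= p.sum()

rhs = -(1.0 + np.log(p))             # equals lambda_0 + lambda_1 * g_1(x_i)
lam1, lam0 = np.polyfit(x, rhs, 1)   # affine fit: rhs = lam0 + lam1 * x_i

print("recovered lambda_1 =", lam1, "  beta =", beta)   # they coincide
# Hence g_1(x_i) = x_i, and the most unbiased constraint is sum p_i x_i = x_bar.
```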


V. GENERALIZED MAXIMUM ENTROPY PRINCIPLE (MDI VERSION)

Let $\phi(\cdot)$ be a convex function and let
$$ D(P:Q) = \sum_{i=1}^{n} q_i\,\phi(p_i/q_i) \qquad (33) $$
be the measure of directed divergence. Let the constraints be
$$ \sum_{i=1}^{n} p_i = 1, \qquad \sum_{i=1}^{n} p_i\, g_r(x_i) = a_r, \qquad r = 1,2,\ldots,m. \qquad (34) $$
Minimizing (33) subject to (34), we get
$$ \phi'(p_i/q_i) = -(\lambda_0 - 1) - \lambda_1 g_1(x_i) - \lambda_2 g_2(x_i) - \cdots - \lambda_m g_m(x_i). \qquad (35a) $$
For the Kullback-Leibler measure this becomes
$$ \ln(p_i/q_i) = -\lambda_0 - \lambda_1 g_1(x_i) - \lambda_2 g_2(x_i) - \cdots - \lambda_m g_m(x_i). \qquad (35b) $$

A. The Direct Problem (Determination of Probability Distribution)

If $q_i$ and $g_1(x_i), g_2(x_i),\ldots,g_m(x_i)$ are known, (35a) determines $p_1, p_2,\ldots,p_n$.

B. First Inverse Problem (Determination of the Constraints)

If the $p_i$'s, the $q_i$'s, and $\phi(\cdot)$ are known, (35a) determines the constraint functions $g_1(\cdot), g_2(\cdot),\ldots,g_m(\cdot)$.

C. Determination of the Divergence Measure (Second Inverse Problem)

If the $p_i$'s, the $q_i$'s, and the $g_r(x_i)$ are known, (35a) determines $\phi(\cdot)$ and as such determines the divergence measure D(P:Q).

D. Determination of a priori Distributions (Third Inverse Problem)

Finally, if the $p_i$'s, the $g_r$'s, and $\phi(\cdot)$ are known, (35a) determines the $q_i$'s.

Example 2: It is given that the a posteriori probability distribution is
$$ p_i = e^{-m}\,\frac{m^i}{i!}, \qquad i = 0,1,2,\ldots \qquad (36) $$
The constraints are
$$ \sum_{i=0}^{\infty} p_i = 1, \qquad \sum_{i=0}^{\infty} i\,p_i = m. \qquad (37) $$
The measure of directed divergence is
$$ D(P:Q) = \sum_{i=0}^{\infty} p_i\ln\frac{p_i}{q_i}. \qquad (38) $$
The a priori probability distribution is
$$ q_i = e^{-m_0}\,\frac{m_0^i}{i!}, \qquad i = 0,1,2,\ldots \qquad (39) $$

It has to be shown that if any three of the aforementioned are given, then the fourth is the most unbiased one. That is, the fourth is such that the observed probability distribution is a minimum discrimination information probability distribution (MDIPD).

E. Solution

1) Direct problem: Given (37), (38), and (39), it is to be shown that (36) is the most unbiased probability distribution.

Minimizing (38) subject to constraints (37), and using (39), we get

$$ p_i = q_i\, a\, b^i = e^{-m_0}\,\frac{m_0^i}{i!}\, a\, b^i, \qquad (40) $$
where a and b are obtained by using (37), so that
$$ e^{-m_0}\,a\sum_{i=0}^{\infty}\frac{(b m_0)^i}{i!} = 1, \qquad e^{-m_0}\,a\sum_{i=0}^{\infty} i\,\frac{(b m_0)^i}{i!} = m, \qquad (41) $$
giving
$$ a = e^{m_0 - m} \quad\text{and}\quad b = m/m_0. \qquad (42) $$
Using (40) and (42),
$$ p_i = e^{-m_0}\,\frac{m_0^i}{i!}\, e^{m_0 - m}\Big(\frac{m}{m_0}\Big)^i = e^{-m}\,\frac{m^i}{i!}, \qquad (43) $$

showing that if we are given (37), (38) and (39), then (36) gives the MDIPD.
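A quick numerical confirmation of (40)-(43), under illustrative values of m and $m_0$ chosen by us (SciPy's Poisson pmf is assumed): tilting the prior (39) by $a\,b^i$ with a and b as in (42) reproduces the Poisson distribution (36).

```python
import numpy as np
from scipy.stats import poisson

# Numerical check of (40)-(43) with illustrative means chosen by us.
m, m0 = 4.0, 2.5
i = np.arange(30)

q = poisson.pmf(i, m0)               # a priori distribution (39)
a, b = np.exp(m0 - m), m / m0        # eq. (42)
p = q * a * b**i                     # eq. (40)

assert np.allclose(p, poisson.pmf(i, m))   # eq. (43): p is Poisson with mean m
print("E[i] =", (p * i).sum())             # ~m, apart from truncation of the tail
```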

2) First inverse problem: Given (36), (38), and (39), it has to be shown that the constraints (37) will make (36) the MDIPD.

We can write (36) as
$$ \ln\frac{p_i}{q_i} = (m_0 - m) + i\ln\frac{m}{m_0} \qquad (44) $$
$$ \;\; = -\lambda_0 - \lambda_1 i, \qquad (45) $$
where
$$ \lambda_0 = m - m_0, \qquad \lambda_1 = \ln\frac{m_0}{m}. $$

Comparing with (35b), we find that $g_1(x) = x$, so that the most unbiased constraints are
$$ \sum_{i=0}^{\infty} p_i = 1, \qquad \sum_{i=0}^{\infty} i\,p_i = m. \qquad (46) $$

3) Second inverse problem: Given (36), (37) and (39), it has to be shown that (38) is the measure of directed divergence whose minimization will lead to (36) as the MDIPD.

Let the desired measure of directed divergence be $\sum_{i=0}^{\infty} q_i\,\phi(p_i/q_i)$; then, minimizing it subject to (37) and using (39), we get
$$ \phi'(p_i/q_i) = -\lambda_0 - \lambda_1 i. \qquad (47) $$
Also, from (36) and (39),
$$ \ln\frac{p_i}{q_i} = (m_0 - m) + i\ln\frac{m}{m_0}. \qquad (48) $$
From (47) and (48), we get
$$ \phi'(p_i/q_i) = A + B\ln\frac{p_i}{q_i}, \qquad (49) $$
so that
$$ \phi(u) = A\,u + B\,u\ln u - B\,u + C \qquad (50) $$
and
$$ \sum_{i=0}^{\infty} q_i\,\phi\Big(\frac{p_i}{q_i}\Big) = A\sum_{i=0}^{\infty} q_i\frac{p_i}{q_i} + B\Big[\sum_{i=0}^{\infty} q_i\frac{p_i}{q_i}\ln\frac{p_i}{q_i} - \sum_{i=0}^{\infty} q_i\frac{p_i}{q_i}\Big] + C\sum_{i=0}^{\infty} q_i = A + B\sum_{i=0}^{\infty} p_i\ln\frac{p_i}{q_i} - B + C. \qquad (51) $$

This gives $\sum_{i=0}^{\infty} p_i\ln(p_i/q_i)$ as the most unbiased measure of directed divergence. That is, it is the measure that gives rise to (36) as the MDIPD.

4) Third inverse problem: Given (36), (37), and (38), it has to be shown that (39) is the most unbiased a priori probability distribution which leads to (36) as the MDIPD.

By minimizing (38) subject to (37), we get

$$ p_i = q_i\, a\, b^i \quad\text{or}\quad e^{-m}\,\frac{m^i}{i!} = q_i\, a\, b^i, \qquad (52) $$
giving, since Q must be a probability distribution,
$$ q_i = \frac{1}{a}\, e^{-m}\,\frac{(m/b)^i}{i!} = e^{-m_0}\,\frac{m_0^i}{i!}, \qquad m_0 = \frac{m}{b}. \qquad (53) $$

This shows that (39) gives the most unbiased a priori probability distribution, that is, it is the a priori probability distribution that will make (36) the MDIPD. It is noted that the a priori distribution is not unique, since $m_0$ is arbitrary. Thus any member of the Poisson family of distributions can be regarded as the most unbiased a priori probability distribution, since every member leads to (36) as the MDIPD.

VI. UNIQUENESS OF SOLUTIONS ASSOCIATED WITH THE GMEP

Whenever a solution of a problem of this nature is discussed, the discussion will be complete only when we examine the questions of existence and uniqueness. These inquiries arise naturally in the context of the seven problems that are enumerated in connection with the generalized maximum entropy principle. Before we proceed with applications of the GMEP in the next section, a brief preamble about the existence and uniqueness of the solutions is in order.

The two direct problems do have unique solutions because of the following observations. The entropy function H(P) is a concave function, and when it is maximized subject to linear constraints, a local maximum is also a global maximum. Similarly, the measure of directed divergence D(P:Q) is a convex function of both P and Q, and when it is minimized under linear constraints, its local minimum is also a global minimum.

However, for the existence of a solution, the following conditions must be met.

1) The set of constraints must be consistent; otherwise there will be no probability distribution satisfying the constraints. Consequently, in such cases, the problem of determining a probability distribution with maximum entropy or minimum directed divergence does not arise.
2) The probabilities $p_1, p_2,\ldots,p_n$ must all be nonnegative. The concavity of H(P) and the convexity of D(P:Q) do not automatically ensure the nonnegativity condition. The latter can only be ensured by a suitable choice of the measures H(P) and D(P:Q), or else the nonnegativity constraints have to be imposed as additional constraints.
3) The series and integrals that are encountered in the solution process must be convergent.

In the solution of the MEP and the MDIP, existence of solutions is in general guaranteed, and hence not too much attention is paid to the questions of existence and uniqueness. Unfortunately, we cannot assume the same in connection with the inverse problems. It is well known that even when solutions of direct problems exist and are unique, the solutions of the inverse problems may not exist, and even if they do exist, they may not be unique. In general, even when a solution of the inverse problem is not unique, it belongs to a unique family of solutions. Theorems covering these theoretical questions have been proved elsewhere [12].

In the following sections, we shall illustrate the solutions of inverse problems by means of examples. In each case we shall find only one solution.

However, when we obtain a most unbiased set of moment functions $g_0(x), g_1(x), g_2(x),\ldots,g_m(x)$ for a solution, we must also admit as a solution any alternative set $h_0(x), h_1(x), h_2(x),\ldots,h_m(x)$ of m + 1 linearly independent functions of $g_0(x), g_1(x),\ldots,g_m(x)$. Similarly, when we obtain a most unbiased measure of entropy H(P) for the solution of an inverse problem, for the constraints just discussed we also admit the alternative measure

$$ a\,H(P) + \sum_{r=0}^{m} c_r\sum_{i=1}^{n} p_i\, g_r(x_i), $$
where a is greater than zero and $c_0, c_1,\ldots,c_m$ are arbitrary constants.

VII. APPLICATIONS OF GMEP

The direct principles due to Jaynes and Kullback, namely the MEP and MDIP, respectively, which are important constituents of the GMEP, are well established and have been applied in a whole range of disciplines. To mention only a few, there are several useful applications of


TABLE I
Range | Density Function | Characterizing Moments

these principles in statistical mechanics, thermodynamics, marketing, transportation, statistics, operations research, queueing theory, business, economics, pattern recognition, image processing, spectral analysis, computerized tomography, flexible manufacturing systems, and the life and medical sciences. Some surveys of these applications are already available in Kapur [8] and Kapur and Kesavan [11]. However, the inverse principles associated with the GMEP are also of significance. The next two sections deal with two important inverse principles: first, the determination of the most unbiased set of constraints and, second, the determination of the most unbiased entropy measures. Discussion will center on a number of examples to illustrate the scope of the inverse principles.

A. Finding the Most Unbiased Constraints

1) Characterizing moments of probability distributions:

Almost all commonly encountered probability distributions, univariate or multivariate, discrete or continuous, can be obtained as maximum-entropy distributions subject to some simple moments being prescribed. In this discussion, we implicitly assume that they refer to Shannon's entropy measure. Thus, if
$$ \ln f(x) = \lambda_0 + \lambda_1 g_1(x) + \lambda_2 g_2(x) + \cdots + \lambda_m g_m(x), $$
then the characterizing moments are
$$ E(g_1(x)),\ E(g_2(x)),\ \ldots,\ E(g_m(x)). $$
Similarly, if the probability distribution is
$$ \ln p_i = \lambda_0 + \lambda_1 g_1(x_i) + \lambda_2 g_2(x_i) + \cdots + \lambda_m g_m(x_i), \qquad i = 1,2,\ldots,n, $$
the corresponding characterizing moments are
$$ E(g_1(x_i)),\ E(g_2(x_i)),\ \ldots,\ E(g_m(x_i)). $$
Out of all distributions having prescribed values for these moments, the given distribution is the maximum entropy distribution. This gives Table I.

Thus, out of all distributions over the range $(-\infty,\infty)$ with prescribed values m and $\sigma^2$ for the mean and variance, the normal distribution is the distribution of maximum entropy. Accordingly, the characterizing moments for a normal distribution are the mean and variance; for the gamma distribution, the characterizing moments are the arithmetic and geometric means.
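The first of these observations can be checked numerically: maximizing entropy on a fine grid with E(x) and E(x squared) prescribed reproduces the normal density. The grid, the prescribed mean and variance, and the use of SciPy's optimizer are our illustrative assumptions, not part of the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Grid-based illustration (ours): with E[x] and E[x^2] prescribed, the
# maximum entropy density is normal.  The grid stands in for (-inf, inf).
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
mean, var = 1.0, 2.0
G = np.vstack([x, x**2])
a = np.array([mean, var + mean**2])          # E[x] and E[x^2]

def dual(lam):
    w = np.exp(-lam @ G)
    Z = w.sum() * dx
    grad = a - (G * w).sum(axis=1) * dx / Z  # a_r - E_f[g_r]
    return np.log(Z) + lam @ a, grad

lam = minimize(dual, x0=np.zeros(2), jac=True).x
f = np.exp(-lam @ G)
f /= f.sum() * dx                            # maximum entropy density on the grid

gauss = np.exp(-(x - mean)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
print("max |f - normal| =", np.abs(f - gauss).max())   # ~0 up to discretization
```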

In Pearson's theory of estimation, only the first four algebraic moments are made use of. If, instead, the characterizing moments suggested by the inverse principle were made use of, the results would have been akin to Fisher's [2] method of maximum likelihood. We can conclude by observing that in fitting distributions to given data, the characterizing moments rather than the algebraic moments should be used, a perspective that is reinforced by the inverse principle.

2) Pareto's law of income distribution and the utility function: According to this law [5], the probability that the income of a person lies between $x - \tfrac{1}{2}dx$ and $x + \tfrac{1}{2}dx$ is $f(x)\,dx$, where
$$ f(x) = \frac{\theta}{B}\Big(\frac{B}{x}\Big)^{\theta+1}, \qquad x > B, $$
so that $\ln f(x) = \ln\theta + \theta\ln B - (\theta + 1)\ln x$. The most unbiased constraint corresponding to this distribution is
$$ \int_{B}^{\infty} f(x)\ln x\,dx = \text{constant}. $$

Thus, the constraint is on the geometric rather than on the arithmetic mean of incomes. Since in practice people are more interested in the utility value of income rather than in its monetary value, the average utility constraint

$$ \int_{B}^{\infty} f(x)\,U(x)\,dx = \text{constant} $$

is likely to be the valid one. This suggests that in a society in which Pareto's law of income operates, the individuals are likely to be risk averse and opt for the logarithmic utility function. We have used the first inverse principle to gain an insight into the utility function of individuals. Also, we can observe that corresponding to every law of income distribution, there exists a utility function and vice versa.
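A grid-based sketch of this income example, under assumptions of ours (the value of B, the prescribed geometric mean, the truncation of the range, and the use of NumPy/SciPy): with only $E(\ln x)$ prescribed over $(B,\infty)$, the maximum entropy density comes out proportional to $x^{-(\theta+1)}$, that is, of the Pareto form.

```python
import numpy as np
from scipy.optimize import minimize

# Grid-based sketch (our assumptions: B = 1, prescribed E[ln x] = ln B + 0.5,
# truncation of the range at 200).  The max-ent density is proportional to
# x**(-lam), i.e., of the Pareto form with theta = lam - 1.
B = 1.0
x = np.linspace(B, 200.0, 20000)
dx = x[1] - x[0]
target = np.log(B) + 0.5                   # prescribed value of E[ln x]

def dual(lam):
    return np.log(np.exp(-lam[0] * np.log(x)).sum() * dx) + lam[0] * target

lam = minimize(dual, x0=np.array([2.0])).x[0]
f = np.exp(-lam * np.log(x))
f /= f.sum() * dx                          # max-ent density on the grid

theta = lam - 1.0
pareto = theta * B**theta * x**(-(theta + 1.0))
print("lam =", lam, " max|f - Pareto| =", np.abs(f - pareto).max())
```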

3) Earthquake frequency-magnitude relation: Purcaru [15] gave the empirical relation
$$ \ln N(x) = a - bx + K\ln(c - x), \qquad x < c, $$
where N(x) is the number of earthquakes of intensity greater than or equal to x within a region in a certain period of time, and a, b, c, K are regional seismic parameters to be determined empirically.

If T is the total number of earthquakes in this period in the region, then N(x)/T can be regarded as the probability of an earthquake of intensity greater than or equal to x. If f(x) is the probability density function and F(x) is the cumulative distribution function, then $N(x)/T = 1 - F(x)$, so that
$$ f(x) = \frac{1}{T}\exp[a - bx + K\ln(c - x)]\Big[b + \frac{K}{c - x}\Big] $$
or
$$ f(x) = \exp\Big[\Big(a + \ln\frac{b}{T}\Big) - bx + (K - 1)\ln(c - x) + \ln\Big(c + \frac{K}{b} - x\Big)\Big]. $$


According to the inverse principle, f ( x ) can be a maxi- mum entropy density function if the constraints are

$$ \int f(x)\,dx = 1, \qquad \int x\,f(x)\,dx = A, \qquad \int f(x)\ln(c - x)\,dx = \ln B, $$
and
$$ \int f(x)\ln\Big(c + \frac{K}{b} - x\Big)\,dx = \ln C. $$
In other words, the seismic characteristics of the region are expressed by three mean values:
1) the mean value of the earthquake intensity;
2) the geometric mean of the deficit of this intensity from a value c; and
3) the geometric mean of the deficit of this intensity from another value c + K/b.
Thus, knowing the probability distribution and regarding it as a maximum-entropy distribution, we can proceed to determine the seismic characteristics of the region. If these can be correlated with some physical characteristics of the region, one can expect to gain an insight into the earthquake frequency-magnitude relation through the first inverse principle.

4) Relation between prices and attractiveness indices of brands of a product: Let $m_i$, $a_i$, and $p_i$ be the market share, attractiveness index, and price of the ith brand of a product ($i = 1,2,\ldots,n$). If the budget for the purchase of the product is prescribed, we maximize $-\sum_{i=1}^{n} m_i\ln m_i$ subject to $\sum_{i=1}^{n} m_i = 1$ and $\sum_{i=1}^{n} m_i p_i = \text{constant}$. The resulting distribution is $m_i = \alpha\,b^{p_i}$. However, it is observed that the market share is proportional to some power k of the attractiveness index of a brand, $m_i \propto a_i^{k}$, so that $b^{p_i} \propto a_i^{k}$, or $p_i \propto \ln a_i$; i.e., the price should be proportional to the logarithm of the attractiveness index. Thus the appropriate constraint is
$$ \sum_{i=1}^{n} m_i\ln a_i = \text{constant} $$

that says the average of the logarithm of the attractiveness index is a characteristic of the market. Again, this is an insight gleaned by the application of the inverse principle.

5) Closed queueing networks: This theory has important applications in the study of computer networks as well as in flexible manufacturing systems.

Let $n_1, n_2,\ldots,n_m$ be the numbers of jobs in the m queues of the network. It has been shown, by assuming Poisson arrivals and exponential service-time distributions, that
$$ p(n_1, n_2,\ldots,n_m) = A\, x_1^{n_1} x_2^{n_2}\cdots x_m^{n_m}, $$
where $x_1, x_2,\ldots,x_m$ depend on the arrival and service rates and A is a normalizing constant. This product-form probability distribution matches observations, whereas there are discrepancies in the assumed arrival and service-time distributions. Consequently, attempts have been made to identify other arrival and service distributions that could explain the right product-form probability distribution. In other words, these investigations were intended to explain the "robustness" of the product-form distribution [10].

From the theory underlying the inverse principle, we consider the product-form distribution as the maximum entropy distribution and attempt to determine the matching characterizing constraints. We get
$$ \ln p(n_1, n_2,\ldots,n_m) = \ln A + n_1\ln x_1 + n_2\ln x_2 + \cdots + n_m\ln x_m, $$
so that the most unbiased constraints are
$$ E(n_1) = L_1,\ E(n_2) = L_2,\ \ldots,\ E(n_m) = L_m. $$

This means that the mean queue lengths $L_1, L_2,\ldots,L_m$ must be prescribed, and these are easily observed specifications.

The important conclusion of this example from the viewpoint of the inverse principle is that in queueing theory we should try to base as many of our results as possible on observations of the queues themselves rather than on observations about arrival and service distributions.
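The observation can be illustrated numerically for a single queue (a sketch under our own assumptions: a truncated support and an illustrative mean queue length L = 3; NumPy/SciPy assumed): maximizing entropy with only E(n) = L prescribed yields the geometric distribution, the building block of the product form.

```python
import numpy as np
from scipy.optimize import minimize

# Single-queue sketch (our assumptions: truncation at N, mean length L = 3):
# maximum entropy over the queue length n with only E[n] = L prescribed
# gives the geometric distribution p(n) = (1 - rho) * rho**n.
N, L = 200, 3.0
n = np.arange(N + 1, dtype=float)

def dual(lam):
    return np.log(np.exp(-lam[0] * n).sum()) + lam[0] * L

lam = minimize(dual, x0=np.array([0.1])).x[0]
p = np.exp(-lam * n)
p /= p.sum()

rho = L / (1.0 + L)                        # geometric parameter with mean L
geom = (1.0 - rho) * rho**n
print("mean =", p @ n, " max|p - geom| =", np.abs(p - geom).max())
```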

6) Summary of applications of the first inverse principle: From a discussion of the foregoing examples, we can summarize as follows.
1) Probabilistic systems are governed by probabilistic laws.
2) These laws are invariably stated in terms of their probabilistic constraints.
3) These constraints have to be determined in order to understand the behavior of the systems.
4) Almost every probability distribution is a maximum entropy distribution for some characterizing moments.
5) These can be determined by using the first inverse principle.
6) The characterizing moments have to be finally interpreted in terms of the specific system in which the observed probability distribution has arisen.

VIII. DETERMINATION OF THE MOST UNBIASED MEASURES OF ENTROPY

A. General Comments

Shannon's measure of entropy has dominated the discussion of almost all applications of Jaynes' maximum entropy principle, so much so that an application of the MEP is inextricably linked with this measure.

Undoubtedly, Shannon's measure is a very natural one and has been found useful in a number of applications. However, the second inverse principle pertaining to entropy measures affords a wider freedom of choice, as dictated by the triangular relationship that exists between the entropy measure, the constraints, and the probability distribution under the controlling influence of the entropy maximization postulate. This freedom of choice of different entropy measures is precisely what is required in explaining the different models that appear in the study of a problem. For instance, it is well known that there exist many models of population growth, of innovation diffusion, of spread of epidemics, of queueing systems, etc. A variety of models implies a variety of measures of entropy.


Insistence on Shannon's measure of entropy in all cases would mean the use of biased constraints as distinguished from characterizing constraints. The former set will invariably be more complicated than the latter.

B. Most Unbiased Estimate for a Missing Value

An experimenter makes (n + 1) observations, of which $x_1, x_2,\ldots,x_n$ are available and x is missing. We have to find an estimate of x on the basis of the n values $x_1, x_2,\ldots,x_n$ and on no other knowledge. Accordingly, with
$$ T = x_1 + x_2 + \cdots + x_n, $$
we form the probability distribution
$$ \frac{x_1}{T+x},\ \frac{x_2}{T+x},\ \ldots,\ \frac{x_n}{T+x},\ \frac{x}{T+x}, $$
and maximize its Shannon's measure of entropy
$$ -\sum_{i=1}^{n}\frac{x_i}{T+x}\ln\frac{x_i}{T+x} - \frac{x}{T+x}\ln\frac{x}{T+x} $$
to get the estimate
$$ \hat{x} = \big(x_1^{x_1} x_2^{x_2}\cdots x_n^{x_n}\big)^{1/T}. $$

But this estimate is not considered natural, since it is always greater than the arithmetic mean $\bar{x}$, which is viewed as a more natural estimate. Therefore we proceed to identify that entropy measure which, when maximized, will lead to $\bar{x}$ as the estimate. This inquiry is in accordance with the second inverse principle. Let $-\sum\phi(p_i)$, where $\phi(\cdot)$ is a convex function, be the measure of entropy. Maximizing it with respect to x, we get
$$ \sum_{i=1}^{n} x_i\,\phi'\Big(\frac{x_i}{T+x}\Big) = T\,\phi'\Big(\frac{x}{T+x}\Big). $$
The previous equation has to be satisfied by $x = \bar{x}$, so that we get a functional equation to solve for $\phi'(\cdot)$. It is easily verified that the solution is $\phi'(x) = -1/x$, so that
$$ \phi(x) = -\ln x \quad\text{and}\quad H = \sum_{i}\ln p_i, $$
where the sum runs over all the probabilities of the distribution formed above.

H is a concave function, but it is always negative. This does not matter, since we are here maximizing H and, consequently, are interested only in the relative values of entropy. This entropy measure can be used when each $x_i > 0$, a requisite condition even for the maximization of Shannon's entropy. The continuous version of this entropy measure, namely
$$ \int \ln f(x)\,dx, $$

has been extensively used in nonlinear spectral analysis and image processing [4].
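The contrast between the two estimates can be checked directly; the observed values below are illustrative and the code assumes NumPy/SciPy. Maximizing Shannon's entropy of the shares yields the weighted geometric mean $(\prod x_i^{x_i})^{1/T}$, whereas maximizing $\sum\ln p_i$ yields the arithmetic mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Our numerical check: the shares are x_i/(T+x) and x/(T+x).  Maximizing
# Shannon's entropy gives the weighted geometric mean; maximizing the
# measure sum ln p_i gives the arithmetic mean.  The x_i are illustrative.
xs = np.array([2.0, 3.0, 7.0, 8.0])
T = xs.sum()

def shares(x):
    return np.append(xs, x) / (T + x)

def neg_shannon(x):
    p = shares(x)
    return (p * np.log(p)).sum()

def neg_log_sum(x):
    return -np.log(shares(x)).sum()

est_shannon = minimize_scalar(neg_shannon, bounds=(1e-6, 100.0), method='bounded').x
est_logsum = minimize_scalar(neg_log_sum, bounds=(1e-6, 100.0), method='bounded').x

print(est_shannon, np.prod(xs**xs)**(1.0 / T))   # weighted geometric mean
print(est_logsum, xs.mean())                     # arithmetic mean
```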

C. Entropy Measure for Maximum Likelihood Principle

Let $f(x,\theta)$ be the density function for a random variate x. Let $x_1, x_2,\ldots,x_n$ be a random sample from this population. It is desired to obtain an estimate $\hat{\theta}$ for $\theta$ in terms of $x_1, x_2,\ldots,x_n$.

Fisher [2] has suggested that we choose that $\theta$ which maximizes the likelihood function
$$ L(x_1, x_2,\ldots,x_n;\theta) = f(x_1,\theta)\,f(x_2,\theta)\cdots f(x_n,\theta). $$

In terms of our present discussion bearing on the second inverse principle, we proceed to determine a measure of entropy whose maximization will yield the same result as the maximization of L:

$$ H = \sum_{i=1}^{n}\ln f(x_i,\theta), $$

which is the same measure as was obtained in the previous example.

In general, in the use of the MEP, we seek to determine $p_1, p_2,\ldots,p_n$. In this problem, we know only the functional form f of the probabilities, and the unknown quantity is $\theta$. Accordingly, we choose $\theta$ in such a way as to maximize the measure of entropy.

In light of this discussion, we have established the connection between Fisher's well-established principle of maximum likelihood and the second inverse principle of our model. It suggests that we use this new measure of entropy rather than Shannon's, since it is the most unbiased measure for the given problem.
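A small sketch of this connection, for an exponential density $f(x,\theta) = \theta e^{-\theta x}$ chosen by us for illustration (the data values are also illustrative): maximizing $H = \sum_i\ln f(x_i,\theta)$ over $\theta$ reproduces the maximum likelihood estimate $\hat\theta = 1/\bar{x}$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative sketch (ours): assumed exponential density
# f(x, theta) = theta * exp(-theta * x).
x = np.array([0.8, 1.3, 2.1, 0.4, 1.7])

def neg_H(theta):
    # H(theta) = sum_i ln f(x_i, theta) = n ln(theta) - theta * sum x_i
    return -(np.log(theta) - theta * x).sum()

theta_hat = minimize_scalar(neg_H, bounds=(1e-6, 50.0), method='bounded').x
print(theta_hat, 1.0 / x.mean())     # the two estimates coincide
```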

D. Logistic Law of Population Growth [2]

Let N(t) be the population at time t. If we maximize the entropy measure
$$ -\int N(t)\ln N(t)\,dt $$
subject to
$$ \int N(t)\,dt = K_1 \qquad \text{(total population constraint)} $$
$$ \int t\,N(t)\,dt = K_2 \qquad \text{(average age constraint)} $$
we get
$$ N(t) = N(0)\,e^{\lambda t}. $$

This is the Malthus law of population growth, but this cannot be valid for all time since it will lead to an infinite population size.

Now suppose that the population follows the logistic law of growth [7]

$$ \frac{dN}{dt} = N(t)\big(a - bN(t)\big). $$

We wish to determine the corresponding entropy measure for the given constraints, guided by the second inverse principle.

If
$$ S = -\int \phi[N(t)]\,dt, $$


where $\phi(\cdot)$ is a convex function, then $\phi'[N(t)] = a + ct$, and
$$ \phi''[N(t)]\,\frac{dN}{dt} = c. $$
This will be the same as the logistic law if
$$ \phi''(N) = \frac{K}{N(a - bN)}. $$
This leads to the entropy measure
$$ H = -\int N(t)\ln N(t)\,dt - \frac{1}{b}\int\big(a - bN(t)\big)\ln\big(a - bN(t)\big)\,dt, $$
which corresponds to Kapur's [6], [9] measure of entropy. Since the logistic model implies an upper limit a/b for the population, we need a measure of entropy that also allows an upper limit a/b for N(t), and this is exactly what this new measure of entropy provides.

E. Bass Model of Innovation Diffusion

Let f(t) be the proportion of all potential firms that have adopted a new technology until time t. For this, Bass [1] gave the following innovation diffusion model:
$$ \frac{df}{dt} = \big(p + qf(t)\big)\big(1 - f(t)\big). $$
We wish to determine the entropy measure
$$ -\int \phi[f(t)]\,dt $$
for the given constraints
$$ \int f(t)\,dt = K_1 \quad\text{and}\quad \int t\,f(t)\,dt = K_2. $$
Maximizing the entropy measure subject to the constraints, we obtain
$$ \phi'[f(t)] = a + bt, \qquad \phi''[f(t)]\,\frac{df}{dt} = b. $$
Comparing with the Bass model, we get
$$ \phi''(f) = \frac{b\,(p + q)}{(p + qf)(1 - f)}. $$
Integrating,
$$ \frac{1}{b}\,\phi'(f) = \ln\frac{p + qf}{1 - f}. $$
Therefore the corresponding entropy measure is
$$ S = -\int\big[1 - f(t)\big]\ln\big[1 - f(t)\big]\,dt - \frac{1}{q}\int\big[p + qf(t)\big]\ln\big[p + qf(t)\big]\,dt. $$

Similarly, for every innovation diffusion model, we can determine a corresponding entropy measure. Had we insisted on using Shannon's measure only, in the particular model we discussed, we would have been led only to the unrealistic exponential growth model.

F. Bose-Einstein and Fermi-Dirac Distributions

Kapur [6] was the first to arrive at the entropy measures suggested by the second inverse principle although the latter was not explicitly formulated at the time.

Let $p_1, p_2,\ldots,p_n$ be the probabilities of a particle being in the n energy states with energies $\epsilon_1, \epsilon_2,\ldots,\epsilon_n$. For the constraints $\sum_{i=1}^{n} p_i = 1$ and $\sum_{i=1}^{n} p_i\epsilon_i = \bar{\epsilon}$, maximization of Shannon's measure $-\sum_{i=1}^{n} p_i\ln p_i$ would lead to the Maxwell-Boltzmann distribution
$$ p_i = \frac{e^{-\beta\epsilon_i}}{\sum_{i=1}^{n} e^{-\beta\epsilon_i}}, $$
where $\beta$ is determined by
$$ \frac{\sum_{i=1}^{n}\epsilon_i\, e^{-\beta\epsilon_i}}{\sum_{i=1}^{n} e^{-\beta\epsilon_i}} = \bar{\epsilon}. $$

We wish to determine those measures of entropy by maximizing which, subject to the same constraints, we can obtain the Bose-Einstein and Fermi-Dirac distributions of the form
$$ p_i = \frac{1}{e^{\lambda+\mu\epsilon_i} \pm 1}, $$
where the constants $\lambda$ and $\mu$ are determined by using
$$ \sum_{i=1}^{n}\frac{1}{e^{\lambda+\mu\epsilon_i} \pm 1} = 1 \quad\text{and}\quad \sum_{i=1}^{n}\frac{\epsilon_i}{e^{\lambda+\mu\epsilon_i} \pm 1} = \bar{\epsilon}. $$
It can be shown that if
$$ p_i = \frac{1}{e^{\lambda+\mu\epsilon_i} - a}, $$
the corresponding unbiased entropy measure is
$$ -\sum_{i=1}^{n} p_i\ln p_i + \frac{1}{a}\sum_{i=1}^{n}(1 + a\,p_i)\ln(1 + a\,p_i), \qquad a \ge -1, $$
with a = 1 giving the Bose-Einstein and a = -1 the Fermi-Dirac distribution. These measures of entropy have later been found to be applicable in other problems also.
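The stated correspondence can be checked numerically (a sketch under our own illustrative energies, mean energy, and a = 1; SciPy's SLSQP solver is assumed): maximizing the measure above subject to the two constraints makes $\ln[(1+ap_i)/p_i]$ affine in $\epsilon_i$, which is equivalent to $p_i = 1/(e^{\lambda+\mu\epsilon_i} - a)$.

```python
import numpy as np
from scipy.optimize import minimize

# Our numerical check with illustrative energies, mean energy, and a = 1:
# maximize H_a(P) = -sum p ln p + (1/a) sum (1 + a p) ln(1 + a p)
# subject to sum p = 1 and sum p*eps = eps_bar, then verify that
# ln((1 + a*p_i)/p_i) is affine in eps_i, i.e. p_i = 1/(exp(lam + mu*eps_i) - a).
eps = np.linspace(0.5, 3.0, 6)
eps_bar, a = 1.4, 1.0

def neg_H(p):
    return (p * np.log(p)).sum() - ((1 + a * p) * np.log(1 + a * p)).sum() / a

cons = [{'type': 'eq', 'fun': lambda p: p.sum() - 1.0},
        {'type': 'eq', 'fun': lambda p: p @ eps - eps_bar}]
p0 = np.full(len(eps), 1.0 / len(eps))
p = minimize(neg_H, p0, method='SLSQP', constraints=cons,
             bounds=[(1e-9, 1.0)] * len(eps)).x

lhs = np.log((1 + a * p) / p)              # should equal lam + mu * eps_i
mu, lam = np.polyfit(eps, lhs, 1)
print("affine-fit residual:", np.abs(lhs - (lam + mu * eps)).max())
print("max|p - 1/(exp(lam+mu*eps) - a)|:",
      np.abs(p - 1.0 / (np.exp(lam + mu * eps) - a)).max())
```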

IX. CONCLUSION

Jaynes' maximum entropy principle (MEP) and Kullback's minimum discrimination information principle (MDI), which we have referred to as the direct principles in the broader context of the generalized maximum entropy principle (GMEP), have found useful applications in


a wide variety of problems transcending all disciplinary boundaries. But there are other applications, as illustrated by several examples in this paper, where the focus is on the determination of the remaining probabilistic entities. The first inverse problem is addressed to the determination of a set of unbiased constraints. The second inverse problem focuses on determining the most unbiased entropy measure when the other two probabilistic entities are given. The deductive methodologies which emanate from the GMEP are rendered possible by our entropy maximization postulate. The methodologies underlying the two inverse principles are by no means trivial, but they do provide a way of tackling the problems that come under their domain. The fresh perspective provided by the GMEP offers scope for the rigorous development of methodologies for the solution of a wider class of problems than is now possible with the MEP and MDI.

REFERENCES

[1] F. M. Bass, "A new product growth model for consumer durables," Management Sci., vol. 15, pp. 215-227, 1969.
[2] R. A. Fisher, "On the mathematical foundations of theoretical statistics," Phil. Trans. Roy. Soc., vol. 222(A), pp. 309-368, 1921.
[3] E. T. Jaynes, "Information theory and statistical mechanics," Physical Review, vol. 106, pp. 620-630, 1957.
[4] R. W. Johnson and J. E. Shore, "Which is the better entropy expression for speech processing: log S or S log S?," IEEE Trans. Acoust. Speech Signal Processing, vol. 32, no. 1, pp. 129-137, 1984.
[5] N. C. Kakwani, Income Inequality and Poverty. London: Oxford University Press, 1980.
[6] J. N. Kapur, "Measures of uncertainty, mathematical programming and physics," J. Ind. Soc. Agri. Stat., vol. 24, pp. 41-66, 1972.
[7] -, "Derivation of logistic law of population growth from maximum entropy principle," Nat. Acad. Sci. Lett., vol. 6, no. 12, pp. 429-433, 1983.
[8] -, "Twenty-five years of maximum entropy," J. Math. Phys. Sci., vol. 17, no. 2, pp. 103-156, 1983.
[9] -, "Four families of measures of entropy," Ind. J. Pure Appl. Math., vol. 17, no. 4, pp. 429-449, 1986.
[10] J. N. Kapur and V. Kumar, "A new derivation of the product-form probability distribution," Ind. J. Manag. Syst., vol. 1, no. 3, pp. 109-118, 1985.
[11] J. N. Kapur and H. K. Kesavan, Generalized Maximum Entropy Principle (with Applications). Waterloo, Canada: Sandford Educational Press, 1987.
[12] -, "On the families of solutions of inverse maximum entropy and minimum cross entropy problems," Internal report, University of Waterloo, Waterloo, ON, Canada, 1988.
[13] S. Kullback, Information Theory and Statistics. New York: Wiley, 1959.
[14] S. Kullback and R. A. Leibler, "On information and sufficiency," Ann. Math. Stat., vol. 22, pp. 79-86, 1951.
[15] C. Purcaru, "A new magnitude-frequency relation for earthquakes and a classification of relation types," Geoph. J. Roy. Astr. Soc., vol. 42, pp. 61-79, 1975.
[16] C. E. Shannon, "A mathematical theory of communication," Bell System Tech. J., vol. 27, pp. 379-423, 623-659, 1948.

H. K. Kesavan graduated from the University of Illinois, Urbana, and Michigan State University, East Lansing, with the M.S. and Ph.D. degrees in electrical engineering, respectively.

His administrative duties have included Chairman of the Department of Electrical Engineering at the University of Waterloo, Waterloo, ON, Canada, Chairman of the Department of Electrical Engineering, I.I.T., Kanpur, India, and Chairman of the Department of Systems Design Engineering at Waterloo. He is currently a Professor at the University of Waterloo.

J. N. Kapur is presently a Professor Emeritus at Jawaharlal Nehru University, Delhi, India. He has, amongst other important posts, served as Chairman of the Department of Mathematics at IIT, Kanpur, and as Vice-Chancellor of Meerut University, India. He has also served as President of the Indian Mathematical Society. He is extremely active in the furtherance of mathematical education in India in various important capacities, and has published extensively in the mathematical literature.