
1924 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. 6, NOVEMBER 1997

A Framework for Linear Information Inequalities
Raymond W. Yeung, Senior Member, IEEE

Abstract—We present a framework for information inequalities, namely, inequalities involving only Shannon's information measures, for discrete random variables. A region in IR^{2^n-1}, denoted by Γ*, is identified to be the origin of all information inequalities involving n random variables in the sense that all such inequalities are partial characterizations of Γ*. A product from this framework is a simple calculus for verifying all unconstrained and constrained linear information identities and inequalities which can be proved by conventional techniques. These include all information identities and inequalities of such types in the literature. As a consequence of this work, most identities and inequalities involving a definite number of random variables can now be verified by a software called ITIP which is available on the World Wide Web. Our work suggests the possibility of the existence of information inequalities which cannot be proved by conventional techniques. We also point out the relation between Γ* and some important problems in probability theory and information theory.

Index Terms—Entropy, I-Measure, information identities, information inequalities, mutual information.

I. INTRODUCTION

SHANNON'S information measures refer to entropies, conditional entropies, mutual informations, and conditional mutual informations. For information inequalities, we refer to those involving only Shannon's information measures for discrete random variables. These inequalities play a central role in converse coding theorems for problems in information theory with discrete alphabets. This paper is devoted to a systematic study of these inequalities. We begin our discussion by examining the two examples below, which exemplify what we call the "conventional" approach to proving such inequalities.

Example 1: This is a version of the well-known data processing theorem. Let X, Y, and Z be random variables such that X - Y - Z form a Markov chain. Then

I(X;Z) ≤ I(X,Y;Z) = I(Y;Z) + I(X;Z|Y) = I(Y;Z).

In the above, the second equality follows from the Markov condition, while the inequality follows because I(Y;Z|X) is always nonnegative.

Manuscript received August 10, 1995; revised February 10, 1997. The material in this paper was presented in part at the 1996 IEEE Information Theory Workshop, Haifa, Israel, June 9–13, 1996.

The author is with the Department of Information Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong.

Publisher Item Identifier S 0018-9448(97)06816-8.

Example 2: For any random variables X, Y, and Z,

H(X,Y) - 1.04 H(Y) + 0.7 I(Y; X,Z) + 0.04 H(Y|Z) ≥ 0.

The inequalities invoked in a conventional proof of this statement follow from the nonnegativity of three of Shannon's information measures.

In the conventional approach, we invoke certain elementary identities and inequalities in the intermediate steps of a proof. Frequently invoked are identities such as

I(X; Y,Z) = I(X;Y) + I(X;Z|Y)

and inequalities such as

I(X;Z|Y) ≥ 0, with I(X;Z|Y) = 0 if X - Y - Z form a Markov chain.

Proving an identity or an inequality using the conventional approach can be quite tricky, because it may not be easy to see which elementary identity or inequality should be invoked next. For certain problems, like Example 1, we may rely on our insight to see how we should proceed in the proof. But of course, most of our insight in problems is developed from hindsight. For other problems like, or even more complicated than, Example 2 (which involves only three random variables), it may not be easy at all to work them out by brute force.

The proof of information inequalities can be facilitated by the use of information diagrams¹ [25]. However, the use of such diagrams becomes very difficult when the number of random variables is more than four.

¹ It was called an I-diagram in [25], but we prefer to call it an information diagram to avoid confusion with an eye diagram in communication theory.


In the conventional approach, elementary identities and inequalities are invoked in a sequential manner. In the new framework that we shall develop in this paper, all identities and inequalities are considered simultaneously.

Before we proceed any further, we would like to make a few remarks. Let f and g be any expressions depending only on Shannon's information measures. We shall call them information expressions, and specifically linear information expressions if they are linear combinations of Shannon's information measures. Likewise, we shall call inequalities involving only Shannon's information measures information inequalities. Now f ≥ g if and only if f - g ≥ 0. Therefore, if for any expression we can determine whether it is always nonnegative, then we can determine whether any particular inequality always holds. We note that f = g if and only if f ≥ g and g ≥ f. Therefore, it suffices to study inequalities.

The rest of the paper is organized as follows. In the next section, we first give a brief review of the I-Measure [25], on which a few proofs will be based. In Section III, we introduce the canonical form of an information expression and discuss its uniqueness. We also define a region called Γ* which is central to the discussion in this paper. In Section IV, we present a simple calculus for verifying information identities and inequalities which can be proved by conventional techniques. In Section V, we further elaborate the significance of Γ* by pointing out its relations with some important problems in probability theory and information theory. Concluding remarks are given in Section VI.

II. REVIEW OF THE THEORY OF THE I-MEASURE

In this section, we give a review of the main results regarding the I-Measure. For a detailed discussion of the I-Measure, we refer the reader to [25]. Further results on the I-Measure can be found in [7].

Let X_1, X_2, ···, X_n be jointly distributed discrete random variables, and let X̃_i be a set variable corresponding to the random variable X_i. Define the universal set Ω to be the union of the X̃_i and let F_n be the σ-field generated by X̃_1, X̃_2, ···, X̃_n. The atoms of F_n have the form ∩_i Y_i, where Y_i is either X̃_i or its complement. Let A be the set of all atoms of F_n except for the intersection of all the complements, which is empty by construction because

Ω = X̃_1 ∪ X̃_2 ∪ ··· ∪ X̃_n.    (1)

Note that |A| = 2^n - 1. To simplify notations, we shall use X̃_G to denote ∪_{i∈G} X̃_i and X_G to denote (X_i, i ∈ G) for nonempty G ⊆ N_n, where N_n = {1, 2, ···, n}. It was shown in [25] that there exists a unique signed measure µ* on F_n

which is consistent with all Shannon's information measures via the following formal substitution of symbols:

H, I → µ*,    ,  → ∪,    ;  → ∩,    |  → −

(− denotes set difference), i.e., for any (not necessarily disjoint) nonempty G, G' ⊆ N_n and any G'' ⊆ N_n,

µ*(X̃_G ∩ X̃_{G'} − X̃_{G''}) = I(X_G; X_{G'} | X_{G''}).    (2)

When G'' = ∅, we interpret (2) as

µ*(X̃_G ∩ X̃_{G'}) = I(X_G; X_{G'}).    (3)

When G = G', (2) becomes

µ*(X̃_G − X̃_{G''}) = H(X_G | X_{G''}).    (4)

When G = G' and G'' = ∅, (2) becomes

µ*(X̃_G) = H(X_G).    (5)

Thus (2) covers all the cases of Shannon's information measures.

Let

B = {X̃_G : X̃_G = ∪_{i∈G} X̃_i for some nonempty G ⊆ N_n}.    (6)

Note that |B| = |A| = 2^n − 1. Let k = 2^n − 1. Define arbitrary one-to-one enumerations A_1, A_2, ···, A_k of the atoms in A and B_1, B_2, ···, B_k of the sets in B, and let

u = (µ(A_1), µ(A_2), ···, µ(A_k))^T    (7)

h = (µ(B_1), µ(B_2), ···, µ(B_k))^T    (8)

where µ is any (signed) measure on F_n. Then

h = C u,    (9)

where C is a unique k × k matrix (independent of µ) with

C_{ij} = 1 if A_j ⊂ B_i, and C_{ij} = 0 otherwise.    (10)

An important characteristic of C is that it is invertible [25], so we can write

u = C⁻¹ h.    (11)

In other words, µ* is completely specified by the set of values µ*(X̃_G) = H(X_G), G nonempty, namely, all the joint entropies involving X_1, X_2, ···, X_n, and it follows from (5) that µ* is the unique measure on F_n which is consistent with all Shannon's information measures. Note that in general µ* is not nonnegative. However, if X_1, X_2, ···, X_n form a Markov chain, µ* is always nonnegative [7].
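As a small numerical illustration of (9)–(11) (a sketch in Python assuming NumPy is available; the variable names are chosen here for illustration and are not from [25] or [27]), one can build the 0/1 matrix relating the joint entropies to the atom measures for n = 3 and recover µ* for three random variables in which X_1 and X_2 are independent fair bits and X_3 is their modulo-2 sum; the atom corresponding to the intersection of all three set variables receives measure −1, showing that µ* need not be nonnegative.

from itertools import combinations

import numpy as np

n = 3
# Nonempty subsets of {1, ..., n}: row B of the matrix corresponds to the
# joint entropy H(X_B), column A to the atom whose "positive" set variables
# are exactly those indexed by A.
sets = [frozenset(c) for r in range(1, n + 1)
        for c in combinations(range(1, n + 1), r)]
# H(X_B) is the sum of mu* over the atoms whose positive index set meets B.
C = np.array([[1.0 if A & B else 0.0 for A in sets] for B in sets])

# Joint entropies (in bits) for X_1, X_2 independent fair bits and
# X_3 = X_1 XOR X_2: each single entropy is 1, every other joint entropy is 2.
h = np.array([1.0 if len(B) == 1 else 2.0 for B in sets])

mu = np.linalg.solve(C, h)              # the I-Measure on the atoms, cf. (11)
for A, m in zip(sets, mu):
    print(sorted(A), round(m, 6))       # the atom for {1, 2, 3} gets -1.0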

As a consequence of the theory of the I-Measure, the information diagram was introduced as a tool to visualize the relationship among information measures [25]. Applications of information diagrams can be found in [7], [25], and [26].


III. THE CANONICAL FORM

In the rest of the paper, we shall assume that X_1, X_2, ···, X_n are the random variables involved in our discussion.

We observe that conditional entropies, mutual informations, and conditional mutual informations can be expressed as a linear combination of joint entropies by using the following identity:

I(X_G; X_{G'} | X_{G''}) = H(X_{G∪G''}) + H(X_{G'∪G''}) − H(X_{G∪G'∪G''}) − H(X_{G''}),    (12)

where G and G' are nonempty subsets of N_n, G'' ⊆ N_n, and H(X_∅) is taken to be 0. Thus any information expression can be expressed in terms of the joint entropies. We call this the canonical form of an information expression.
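For instance, (12) specializes to the familiar identities

I(X_1; X_2 | X_3) = H(X_1, X_3) + H(X_2, X_3) − H(X_1, X_2, X_3) − H(X_3),
I(X_1; X_2) = H(X_1) + H(X_2) − H(X_1, X_2),
H(X_1 | X_2) = H(X_1, X_2) − H(X_2),

each of which is a linear combination of joint entropies.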

Now for any X_1, X_2, ···, X_n, their joint entropies correspond to a vector h in IR^{2^n-1}, where we regard the 2^n − 1 joint entropies as the coordinates of h. On the other hand, a vector h in IR^{2^n-1} is said to be constructible if there exist random variables X_1, X_2, ···, X_n whose joint entropies are given by h. We are then motivated to define

Γ* = {h ∈ IR^{2^n-1} : h is constructible}.

As we shall see, Γ* not only gives a complete characterization of all information inequalities, but it also is closely related to some important problems in probability theory and information theory. Thus a complete characterization of Γ* is of fundamental importance. To our knowledge, there has not been such a characterization in the literature (see Section V).

Now every information expression can be expressed in canonical form. A basic question to ask is in what sense the canonical form is unique. Toward this end, we shall first establish the following theorem.

Theorem 1: Let f: IR^{2^n-1} → IR be measurable such that {h : f(h) = 0} has zero Lebesgue measure. Then f cannot be identically zero on Γ*.

We shall need the following lemma, which is immediate from the discussion in [26, Sec. 6]. The proof is omitted here.

Lemma 1: Let

Ψ* = {u ∈ IR^{2^n-1} : u = C⁻¹ h for some constructible h}

(cf. (11)). Then the first quadrant of IR^{2^n-1} is a subset of Ψ*.

Proof of Theorem 1: If Γ* has positive Lebesgue measure, then since {h : f(h) = 0} has zero Lebesgue measure, and hence Γ* ∩ {h : f(h) = 0} has zero Lebesgue measure, the set Γ* ∩ {h : f(h) ≠ 0} has positive Lebesgue measure. Then Γ* cannot be a subset of {h : f(h) = 0}, which implies that f cannot be identically zero on Γ*. Thus it suffices to prove that Γ* has positive Lebesgue measure. Using the above Lemma, we see that the first quadrant of IR^{2^n-1}, which has positive Lebesgue measure, is a subset of Ψ*. Therefore Ψ* has positive Lebesgue measure. Since Γ* is an invertible linear transformation of Ψ* (cf. (9)), its Lebesgue measure must also be positive. This proves the theorem.

The uniqueness of the canonical form for very general classes of information expressions follows from this theorem. For example, suppose f and g are two polynomials of the joint entropies such that f(h) = g(h) for all h ∈ Γ*. Let e = f − g. If e is not the zero function, then {h : e(h) = 0} has zero Lebesgue measure. By the theorem, e cannot be identically zero on Γ*, which is a contradiction. Therefore e is the zero function, i.e., f = g. Thus we see that the canonical form is unique for polynomial information expressions. We note that the uniqueness of the canonical form for linear information expressions has been discussed in [4] and [2, p. 51, Theorem 3.6].

The importance of the canonical form will become clear in the next section. An application of the canonical form to recognizing the symmetry of an information expression will be discussed in Appendix II-A. We note that any invertible linear transformation of the joint entropies can be used for the purpose of defining the canonical form. Nevertheless, the current definition of the canonical form has the advantage that if one set of random variables is a subset of another, then the joint entropies involving the random variables in the first set form a subset of the joint entropies involving the random variables in the second.

IV. A CALCULUS FOR VERIFYING LINEAR IDENTITIES AND INEQUALITIES

In this section, we shall develop a simple calculus for verifying all linear information identities and inequalities involving a definite number of random variables which can be proved by conventional techniques. All identities and inequalities in this section are assumed to be linear unless otherwise specified. Although our discussion will primarily be on linear identities and inequalities (possibly with linear constraints), our approach can be extended naturally to nonlinear cases. For nonlinear cases, the amount of computation required is larger. The question of what linear combinations of entropies are always nonnegative was first raised by Han [5].

A. Unconstrained Identities

Due to the uniqueness of the canonical form for linear information expressions as discussed in the preceding section, it is easy to check whether two expressions f and g are identical. All we need to do is to express f − g in canonical form. If all the coefficients are zero, then f and g are identical; otherwise they are not.

B. Unconstrained Inequalities

Since all information expressions can be expressed in canonical form, we shall only consider inequalities in this form. The following is a simple yet fundamental observation which apparently has not been discussed in the literature.

For any information expression f, f ≥ 0 always holds if and only if Γ* ⊂ {h : f(h) ≥ 0}.

This observation, which follows immediately from the definition of Γ*, gives a complete characterization of all unconstrained inequalities (not necessarily linear) in terms of Γ*. From this point of view, an unconstrained inequality is simply a partial characterization of Γ*.

The nonnegativity of all Shannon's information measures forms a set of inequalities which we shall refer to as the basic inequalities. We observe that in the conventional approach to proving information inequalities, whenever we establish an inequality in an intermediate step, we invoke one of the basic inequalities. Therefore, all information inequalities and conditional information identities which can be proved by conventional techniques are consequences of the basic inequalities. These inequalities, however, are not nonredundant. For example, H(X_1|X_2) ≥ 0 and I(X_1;X_2) ≥ 0, which are both basic inequalities of the random variables X_1 and X_2, imply

H(X_1) ≥ 0,

again a basic inequality of X_1 and X_2.

We shall be dealing with linear combinations whose coefficients are nonnegative. We call such linear combinations nonnegative linear combinations. We observe that any Shannon's information measure can be expressed as a nonnegative linear combination of the following two elemental forms of Shannon's information measures:

i)  H(X_i | X_{N_n−{i}});
ii) I(X_i; X_j | X_K), where i ≠ j and K ⊆ N_n − {i, j}.

This can be done by successive (if necessary) applications of the following identities:

(13)

(14)

(15)

(16)

(17)

(18)

(Note that all the coefficients in the above identities are nonnegative.) It is easy to check that the total number of Shannon's information measures of the two elemental forms is equal to

m = n + C(n,2) · 2^{n-2}.    (19)

(For example, for n = 3 there are 3 + 3·2 = 9 elemental forms, and for n = 4 there are 4 + 6·4 = 28.) The nonnegativity of the two elemental forms of Shannon's information measures forms a proper subset of the set of basic inequalities. We call the inequalities in this smaller set the elemental inequalities. They are equivalent to the basic inequalities because each basic inequality which is not an elemental inequality can be obtained by adding a certain set of elemental inequalities in view of (13)–(18). The minimality of the elemental inequalities is proved in Appendix I.

If the elemental inequalities are expressed in canonical form, then they become linear inequalities in IR^{2^n-1}. Denote this set of inequalities by Gh ≥ 0, where G is an m × (2^n − 1) matrix, and define

Γ = {h ∈ IR^{2^n-1} : Gh ≥ 0}.    (20)

Since the elemental inequalities are satisfied by any h ∈ Γ*, we have Γ* ⊂ Γ. Therefore, if

Γ ⊂ {h : f(h) ≥ 0},

then

Γ* ⊂ {h : f(h) ≥ 0},

i.e., f ≥ 0 always holds.

Let e_j, 1 ≤ j ≤ 2^n − 1, be the column (2^n − 1)-vector whose jth component is equal to 1 and all the other components are equal to 0. Since a joint entropy can be expressed as a nonnegative linear combination of the two elemental forms of Shannon's information measures, each e_j^T can be expressed as a nonnegative linear combination of the rows of G. This implies that Γ is a pyramid in the positive quadrant.

Let b be any column (2^n − 1)-vector. Then b^T h, a linear combination of joint entropies, is always nonnegative if

Γ ⊂ {h : b^T h ≥ 0}.

This is equivalent to saying that the minimum of the problem

(Primal)  Minimize b^T h, subject to Gh ≥ 0 and h ≥ 0

is zero. (The constraint h ≥ 0 is redundant because Γ lies in the positive quadrant, but it puts the problem in the standard form of [19].) Since h = 0 gives b^T h = 0 (h = 0 is the only corner of Γ), all we need to do is to apply the optimality test of the simplex method [19] to check whether the point h = 0 is optimal.

We can obtain further insight into the problem from the Duality Theorem in linear programming [19]. The dual of the above linear programming problem is

(Dual)  Maximize 0^T y, subject to y^T G ≤ b^T and y ≥ 0,

where y is a column m-vector. By the Duality Theorem, the maximum of the dual problem is also zero. Since the cost function in the dual problem is zero, the maximum of the dual problem is zero if and only if the feasible region

S = {y : y^T G ≤ b^T and y ≥ 0}    (21)

is nonempty.

Theorem 2: S is nonempty if and only if b^T = r^T G for some r ≥ 0, where r is a column m-vector, i.e., b^T h is a nonnegative linear combination of the rows of Gh.

Proof: The "if" part is immediate: if b^T = r^T G with r ≥ 0, then y = r belongs to S. For the "only if" part, let

D = {b : b^T = r^T G for some r ≥ 0}.    (22)

Suppose y ∈ S, so that b^T − y^T G ≥ 0 componentwise. Then we can write

b^T = y^T G + Σ_j d_j e_j^T,    (23)

where the d_j are nonnegative. Since each e_j^T can be expressed as a nonnegative linear combination of the rows of G, b^T can also be expressed as a nonnegative linear combination of the rows of G. By (22), this implies b ∈ D, i.e., b^T = r^T G for some r ≥ 0.

Thus b^T h ≥ 0 always holds (subject to Gh ≥ 0) if and only if it is a nonnegative linear combination of the elemental inequalities (in canonical form).


We now summarize the results in this section. For information expressions f and g, let b^T h be the canonical form of f − g, and consider minimizing the cost function b^T h subject to the elemental inequalities. Then apply the optimality test of the simplex method to the point h = 0. If h = 0 is optimal, then f ≥ g always holds. If not, then f ≥ g may or may not always hold; if it always holds, it is not implied by the elemental inequalities. In other words, it cannot be proved by conventional techniques, namely, by invoking the elemental inequalities.
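The procedure summarized above is straightforward to mechanize. The following sketch (in Python with NumPy and SciPy; it illustrates the method of this section, is not the ITIP software of [27], and uses function names chosen here for illustration) builds the elemental inequalities in canonical form and checks, via the dual characterization of Theorem 2, whether a given linear combination of joint entropies is a nonnegative linear combination of them, possibly together with linear constraints Qh = 0 as in the next subsection.

from itertools import combinations

import numpy as np
from scipy.optimize import linprog


def subsets(n):
    # All nonempty subsets of {1, ..., n} in a fixed order; they index the
    # coordinates of h, i.e., the joint entropies H(X_B).
    return [frozenset(c) for r in range(1, n + 1)
            for c in combinations(range(1, n + 1), r)]


def H(B, n):
    # Canonical-form coefficient vector of the joint entropy H(X_B).
    v = np.zeros(2 ** n - 1)
    if B:
        v[subsets(n).index(frozenset(B))] = 1.0
    return v


def I(A, B, K, n):
    # Coefficient vector of I(X_A; X_B | X_K) via identity (12); K may be empty.
    A, B, K = set(A), set(B), set(K)
    return H(A | K, n) + H(B | K, n) - H(A | B | K, n) - H(K, n)


def elemental_matrix(n):
    # The matrix G of (20): one row per elemental inequality, in canonical form.
    N = set(range(1, n + 1))
    rows = [H(N, n) - H(N - {i}, n) for i in N]          # H(X_i | X_{N - i})
    for i, j in combinations(sorted(N), 2):              # I(X_i; X_j | X_K)
        rest = sorted(N - {i, j})
        for r in range(len(rest) + 1):
            for K in combinations(rest, r):
                rows.append(I({i}, {j}, K, n))
    return np.array(rows)


def provable(b, n, constraints=()):
    # True iff b^T h >= 0 is implied by the elemental inequalities together
    # with the constraints Q h = 0 (rows of `constraints`), i.e., iff
    #     b = G^T y + Q^T z  for some y >= 0 and unconstrained z,
    # a linear-programming feasibility problem (cf. Theorem 2).
    G = elemental_matrix(n)
    Q = np.array(constraints, dtype=float).reshape(-1, 2 ** n - 1)
    A_eq = np.hstack([G.T, Q.T])
    bounds = [(0, None)] * G.shape[0] + [(None, None)] * Q.shape[0]
    res = linprog(np.zeros(A_eq.shape[1]), A_eq=A_eq, b_eq=b,
                  bounds=bounds, method="highs")
    return res.status == 0          # 0 means a feasible point was found


if __name__ == "__main__":
    n = 3
    # H(X_1,X_2) + H(X_2,X_3) - H(X_1,X_2,X_3) - H(X_2) >= 0 is provable:
    # it is I(X_1; X_3 | X_2) >= 0, itself an elemental inequality.
    b = H({1, 2}, n) + H({2, 3}, n) - H({1, 2, 3}, n) - H({2}, n)
    print(provable(b, n))           # True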

Han has previously studied unconstrained information inequalities involving three random variables [5], as well as information inequalities which are symmetrical in all the random variables involved [6], and explicit characterizations of such inequalities were obtained. A discussion of these results is found in Appendix II.

C. Constrained Inequalities

Linear constraints on X_1, X_2, ···, X_n arise frequently in information theory. Some examples are

1) X_1, X_2, and X_3 are mutually independent if and only if H(X_1, X_2, X_3) = H(X_1) + H(X_2) + H(X_3).
2) X_1, X_2, and X_3 are pairwise-independent if and only if I(X_1;X_2) = I(X_2;X_3) = I(X_1;X_3) = 0.
3) X_1 is a function of X_2 if and only if H(X_1 | X_2) = 0.
4) X_1 - X_2 - X_3 - X_4 form a Markov chain if and only if I(X_1; X_3 | X_2) = 0 and I(X_1, X_2; X_4 | X_3) = 0.

In order to facilitate our discussion, we now introduce an alternative set of notations for the information measures of X_1, X_2, ···, X_n. We do not distinguish elements and singletons of N_n, and we write unions of subsets of N_n as juxtapositions. For any nonempty G ⊆ N_n, we use H_G to denote H(X_G), i.e., the joint entropy of the random variables with indices in G (refer to Section II for the definition of X_G), and we use the corresponding shorthand for (conditional) mutual informations to simplify notations.

In general, a constraint is given by a subset Φ of IR^{2^n-1}. For instance, for the last example above,

Φ = {h ∈ IR^{2^n-1} : I(X_1; X_3 | X_2) = 0 and I(X_1, X_2; X_4 | X_3) = 0},

where the two conditional mutual informations are understood as linear functions of h via (12). When Φ = IR^{2^n-1}, there is no constraint. (In fact, there is no constraint if Γ* ⊂ Φ.) Parallel to our discussion in the preceding subsection, we have the following more general observation:

Under the constraint Φ, for any information expression f, f ≥ 0 always holds if and only if Γ* ∩ Φ ⊂ {h : f(h) ≥ 0}.

Again, this gives a complete characterization of all constrained inequalities in terms of Γ*. Thus Γ* is in fact the origin of all constrained inequalities, with unconstrained inequalities being a special case. In this and the next subsection, however, we shall confine our discussion to the linear case.

When Φ is a subspace of IR^{2^n-1}, we can easily modify the method in the last subsection by taking advantage of the linear structure of the problem. Let the constraints on X_1, X_2, ···, X_n be given by

Qh = 0,    (24)

where Q is a q × (2^n − 1) matrix (i.e., there are q constraints). Following our discussion in the last subsection, a linear combination of joint entropies b^T h is always nonnegative under the constraint if the minimum of the problem

Minimize b^T h, subject to Gh ≥ 0 and Qh = 0

is zero.

Let r be the rank of Q. Since h is in the null space of Q, we can write

h = B h̃,    (25)

where B is a (2^n − 1) × (2^n − 1 − r) matrix whose columns form a basis of the orthogonal complement of the row space of Q, and h̃ is a column (2^n − 1 − r)-vector. Then the elemental inequalities can be expressed as

G B h̃ ≥ 0,    (26)

and in terms of h̃, Γ becomes

Γ̃ = {h̃ : G B h̃ ≥ 0},    (27)

which is a pyramid in IR^{2^n-1-r} (but not necessarily in the positive quadrant). Likewise, b^T h can be expressed as b^T B h̃.

With the constraints and all expressions in terms of h̃, b^T h is always nonnegative under the constraint if the minimum of the problem

Minimize b^T B h̃, subject to G B h̃ ≥ 0

is zero. Again, since h̃ = 0 gives b^T B h̃ = 0 (h̃ = 0 is the only corner of Γ̃), all we need to do is to apply the optimality test of the simplex method to check whether the point h̃ = 0 is optimal.

By imposing the constraints in (24), the number of elemental inequalities remains the same, while the dimension of the problem decreases from 2^n − 1 to 2^n − 1 − r. Again from the Duality Theorem, we see that b^T h is always nonnegative under the constraint if

b^T B = y^T G B for some y ≥ 0,

where y is a column m-vector, i.e., b^T B h̃ is a nonnegative linear combination of the elemental inequalities (in terms of h̃).

We now summarize the results in this section. Let the constraints be given as in (24). For expressions f and g, let b^T h be the canonical form of f − g. Then let b^T B h̃ be the cost function subject to the elemental inequalities (in terms of h̃), and apply the optimality test to the point h̃ = 0. If h̃ = 0 is optimal, then f ≥ g always holds under the constraints; otherwise it may or may not always hold. If it always holds, it is not implied by the elemental inequalities. In other words, it cannot be proved by conventional techniques.
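As a small illustration of the constrained procedure, Example 1 of Section I can be checked with the helpers H, I, and provable from the sketch at the end of Section IV-B (again only a sketch, with the labeling X = X_1, Y = X_2, Z = X_3 chosen here):

n = 3                                                # X = X_1, Y = X_2, Z = X_3
b = I({2}, {3}, set(), n) - I({1}, {3}, set(), n)    # I(Y;Z) - I(X;Z)
markov = [I({1}, {3}, {2}, n)]                       # the constraint I(X;Z|Y) = 0
print(provable(b, n, constraints=markov))            # True
print(provable(b, n))                                # False without the constraint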


D. Constrained Identities

We impose the constraints in (24) as in the last subsection. As we have pointed out at the beginning of the paper, two information expressions f and g are identical if and only if f ≥ g and g ≥ f always hold. Thus we can apply the method in the last subsection to verify all constrained identities that can be proved by conventional techniques.

When X_1, X_2, ···, X_n are unconstrained, the uniqueness of the canonical form for linear information expressions asserts that b^T h is identically zero if and only if b = 0. However, when the constraints in (24) on X_1, X_2, ···, X_n are imposed, the vanishing of b^T h on Φ does not imply b = 0. We give a simple example to illustrate this point. Suppose n = 2 and we impose a single linear constraint, so that every information expression can be expressed in terms of two of the joint entropies. Now consider an expression of the form in (28) whose coefficients are nonzero. From the elemental inequalities, we have the inequalities (29) and (30), which together with the constraint imply that the expression in (28) is identically zero on Φ.

We now discuss a special application of the method described in this subsection.

Let us consider the following problem, which is typical in probability theory. Suppose we are given that two triples of the random variables form Markov chains and that a certain pair of the random variables are independent, and we ask whether another pair of the random variables are always independent. This problem can be formulated in information-theoretic terms, with the given conditions represented by setting the corresponding conditional mutual informations and the mutual information to zero, and we want to know whether these constraints imply that the mutual information between the other pair also vanishes.

Problems of this kind can be handled by the method described in this subsection. Our method can prove any independence relation which can be proved by conventional information-theoretic techniques. The advantage of using an information-theoretic formulation of the problem is that we can avoid manipulating the joint distribution directly, which is awkward [8], if not difficult.

It may be difficult to devise a calculus to handle independence relations of random variables in a general setting,² because an independence relation is "discrete" in the sense that it is either true or false. On the other hand, the problem becomes a continuous one if it is formulated in information-theoretic terms (because mutual informations are continuous functionals), and continuous problems are in general less difficult to handle. From this point of view, the problem of determining independence of random variables is a discrete problem embedded in a continuous problem.

² A calculus for independence relations has been devised by Massey [9] for the special case when the random variables have a causal interpretation.

V. FURTHER DISCUSSION ON Γ*

We have seen that Γ* ⊂ Γ, but it is not clear whether Γ* = Γ. If so, Γ* is a pyramid, and hence all information inequalities are completely characterized by the elemental inequalities. In the following, we shall use the notations Γ_n and Γ*_n when we refer to Γ and Γ* for a specific n.

For n = 2, in I-Measure notations the elemental inequalities are

µ*(X̃_1 − X̃_2) ≥ 0,  µ*(X̃_2 − X̃_1) ≥ 0,  and  µ*(X̃_1 ∩ X̃_2) ≥ 0,

i.e., they state precisely that µ* is nonnegative on all three atoms. It then follows from Lemma 1 that Γ*_2 = Γ_2.

Inspired by the current work, the characterization of Γ*_3 and Γ*_4 has recently been investigated by Zhang and Yeung. They have found that Γ*_3 ≠ Γ_3 (therefore Γ*_n ≠ Γ_n in general), but the closure of Γ*_3 is equal to Γ_3 [29]. This implies that all unconstrained (linear or nonlinear) inequalities involving three random variables are consequences of the elemental inequalities of the same set of random variables. However, it is not clear whether the same is true for all constrained inequalities. They also have discovered the following conditional inequality involving four random variables which is not implied by the elemental inequalities:

If and , then

If, in addition,

and

the above inequality implies that

This is a conditional independence relation which is not implied by the elemental inequalities. However, whether the closure of Γ*_4 equals Γ_4 remained an open problem. Subsequently, they have determined that the closure of Γ*_4 is not equal to Γ_4 by discovering the following unconstrained inequality involving four random variables which is not implied by the elemental inequalities of the same set of random variables [30]:

The existence of the above two inequalities indicates that there may be a lot of information inequalities yet to be discovered. Since most converse coding theorems are proved by means of information inequalities, it is plausible that some of these inequalities yet to be discovered are needed to settle certain open problems in information theory.

In the remainder of the section, we shall further elaborate on the significance of Γ* by pointing out its relations with some important problems in probability theory and information theory.


A. Conditional Independence Relations

For any fixed number of random variables, a basic question is what sets of conditional independence relations are possible. In the recent work of Matus and Studeny [17], this problem is formulated as follows. Recall that N_n = {1, 2, ···, n}, and let C(n) be the family of all couples ⟨ij|K⟩, where K ⊆ N_n and ij is the union of two, not necessarily different, singletons i and j of N_n − K. Having a system of random variables X = (X_i, i ∈ N_n), with subsystems X_K = (X_i, i ∈ K), K ⊆ N_n, we introduce the notation

[X] = {⟨ij|K⟩ ∈ C(n) : X_i ⊥ X_j | X_K},

where X_i ⊥ X_j | X_K is the abbreviation of the statement "X_i is conditionally independent of X_j given X_K." For i = j, X_i ⊥ X_i | X_K means X_i is determined by X_K. The subsystem X_∅ is presumed to be constant. A subfamily L ⊆ C(n) is called probabilistically (p-) representable if there exists a system X, called a p-representation of L, such that L = [X]. The problem is to characterize the class of all p-representable relations. Note that this problem is more general than the application discussed in Section IV-D.

Now X_i ⊥ X_j | X_K is equivalent to I(X_i; X_j | X_K) = 0 (with I(X_i; X_i | X_K) read as H(X_i | X_K)). If I(X_i; X_j | X_K) is not of elemental form, then it can be written as a nonnegative combination of the corresponding elemental forms of Shannon's information measures. We observe that I(X_i; X_j | X_K) = 0 if and only if each of the corresponding elemental forms of Shannon's information measures vanishes, and that an elemental form of Shannon's information measure vanishes if and only if the corresponding conditional independence relation holds. Thus it is actually unnecessary to consider such couples separately, because they are determined by the other conditional independence relations.

Let us now look at some examples for n = 3. As pointed out in the last paragraph, the couples whose corresponding information measures are not of elemental form are redundant and can be omitted. One can construct a system X of three random variables, with suitable nondegeneracy and dependence properties, whose set of conditional independence relations is exactly a prescribed subfamily; such a subfamily is therefore p-representable. On the other hand, another subfamily is not p-representable, because two of the couples in it imply a third couple which does not belong to it.

The recent studies on the problem of conditional independence relations were launched by a seminal paper by Dawid [3], in which he proposed four axioms as heuristic properties of conditional independence. In information-theoretic terms, these four axioms can be summarized by the following statement:

I(X; Y, Z | W) = 0 if and only if I(X; Y | W) = 0 and I(X; Z | Y, W) = 0.

Subsequent work on this subject has been done by Pearl and his collaborators in the 1980's, and their work is summarized in the book by Pearl [18]. Their work has mainly been motivated by the study of the logic of integrity constraints from databases. Pearl conjectured that Dawid's four axioms completely characterize the conditional independence structure of any joint distribution. This conjecture, however, was refuted by the work of Studeny [20]. Since then, Matus and Studeny have written a series of papers on this problem [10]–[17], [20]–[24]. So far, they have solved the problem for three random variables, but the problem for four random variables remains open.

The relation between this problem and Γ* is the following. Suppose we want to determine whether a subfamily L of C(n) is p-representable. Now each couple ⟨ij|K⟩ corresponds to setting I(X_i; X_j | X_K) to zero as a function of h. Note that {h : I(X_i; X_j | X_K) = 0} is a hyperplane containing the origin in IR^{2^n-1}. Thus L is p-representable if and only if there exists an h in Γ* lying on the hyperplanes corresponding to the couples in L but not on the hyperplane corresponding to any couple not in L. Therefore, the problem of conditional independence relations is a subproblem of the problem of characterizing Γ*.

B. Optimization of Information Quantities

Consider minimizing an information quantity of X_1, X_2, ···, X_n given a collection of linear constraints on their Shannon's information measures ((31)–(33)). This problem is equivalent to the following minimization problem:

Minimize the quantity, written in canonical form, subject to the given constraints expressed in terms of the joint entropies ((34)–(36)) and

h ∈ Γ*.    (37)

As no characterization of Γ* is available, this minimization problem cannot be solved. Nevertheless, since Γ* ⊂ Γ, if we replace Γ* by Γ in the above minimization problem, it becomes a linear programming problem which renders a lower bound on the solution.
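As a toy instance of this relaxation (reusing elemental_matrix and H from the sketch in Section IV-B; the numbers here are illustrative only), one can lower-bound H(X_1, X_2) over Γ_2 given H(X_1) = H(X_2) = 1; the linear program returns 1.

import numpy as np
from scipy.optimize import linprog

n = 2
G = elemental_matrix(n)
c = H({1, 2}, n)                                  # objective: H(X_1, X_2)
A_eq = np.vstack([H({1}, n), H({2}, n)])          # constraints H(X_1) = 1, H(X_2) = 1
res = linprog(c, A_ub=-G, b_ub=np.zeros(G.shape[0]),
              A_eq=A_eq, b_eq=np.array([1.0, 1.0]),
              bounds=[(None, None)] * (2 ** n - 1), method="highs")
print(res.fun)                                    # 1.0, the LP lower bound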

C. Multiuser Information Theory

Fig. 1. A multiterminal source coding problem.

The framework for information inequalities developed in this paper provides new tools for problems in multiuser information theory. Consider the source coding problem in Fig. 1, in which two source random variables are given, and the blocks on the left and right are encoders and decoders, respectively. Three further random variables are the outputs of the corresponding encoders, and we are interested in the admissible region of the triple of coding rates. Evidently, the entropies of the encoder outputs give the numbers of bits needed for the encoders. From the encoding and decoding requirements, we immediately have a collection of conditional entropies and conditional mutual informations which are equal to zero. Now there are five random variables involved in this problem. Then the intersection of Γ* (for five random variables) and the set containing all h satisfying these constraints is the set of all possible vectors of the joint entropies involving the five random variables, given that they satisfy the encoding and decoding requirements of the problem as well as the constraints on the joint entropies involving the two source random variables. The admissible region is then given as the projection of this set on the coordinates corresponding to the encoder outputs. In the same spirit as that in the last subsection, an explicit outer bound of the admissible region is given by replacing Γ* by Γ.

We refer to an outer bound such as this as an LP (linear programming) bound. This is a new tool for proving converse coding theorems for problems in multiuser information theory. The LP bound has already found applications in the recent work of Yeung and Zhang [28] on a new class of multiterminal source coding problems. We expect that this approach will have impact on other problems in multiuser information theory.

VI. CONCLUDING REMARKS

We have identified the region Γ* as the origin of all information inequalities. Our work suggests the possibility of the existence of information inequalities which cannot be proved by conventional techniques, and this has been confirmed by the recent results of Zhang and Yeung [29], [30].

A product from the framework we have developed is a simple calculus for verifying all linear information inequalities involving a definite number of random variables, possibly with linear constraints, which can be proved by conventional techniques; these include all inequalities of such type in the literature. Based on this calculus, a software running on MATLAB called ITIP (Information-Theoretic Inequality Prover) has been developed by Yeung and Yan [27], and it is available on the World Wide Web. The following session from ITIP contains verifications of Examples 1 and 2, respectively, in Section I.

>> ITIP('I(Y; Z) >= I(X; Z)', 'I(X; Z|Y) = 0')
True
>> ITIP('H(X,Y) - 1.04 H(Y) + 0.7 I(Y; X,Z) + 0.04 H(Y|Z) >= 0')
True
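For comparison, the second query above can be reproduced with the sketch from Section IV-B (a sketch only, not ITIP; the labeling X = X_1, Y = X_2, Z = X_3 is chosen here):

n = 3                                             # X = X_1, Y = X_2, Z = X_3
b = (H({1, 2}, n) - 1.04 * H({2}, n)
     + 0.7 * I({2}, {1, 3}, set(), n)
     + 0.04 * (H({2, 3}, n) - H({3}, n)))         # the expression checked above
print(provable(b, n))                             # True, agreeing with ITIP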

We see from (19) that the amount of computation required is moderate when n is small. Our work gives a partial answer to Han's question of what linear combinations of entropies are always nonnegative [5]. A complete answer to this question is impossible without further characterization of Γ*.

The characterization of Γ* is a very fundamental problem in information theory. However, in view of the difficulty of some special cases of this problem [15], [17], [29], [30], it is not very hopeful that this problem can be solved completely in the near future. Nevertheless, partial characterizations of Γ* may lead to the discovery of some new inequalities which make the solutions of certain open problems in information theory possible.

APPENDIX I
MINIMALITY OF THE ELEMENTAL INEQUALITIES

The elemental inequalities in set-theoretic notations have one of the following two forms:

1) µ(X̃_i − X̃_{N_n−{i}}) ≥ 0;
2) µ(X̃_i ∩ X̃_j − X̃_K) ≥ 0, where i ≠ j and K ⊆ N_n − {i, j}.

They will be referred to as type 1 and type 2 inequalities, respectively.

We are to show that all the elemental inequalities are nonredundant, i.e., none of them is implied by the others. For a type 1 inequality

µ(X̃_i − X̃_{N_n−{i}}) ≥ 0,    (38)

since it is the only elemental inequality which involves the atom X̃_i − X̃_{N_n−{i}}, it is clearly not implied by the other elemental inequalities. Therefore, we only need to show that all type 2 inequalities are nonredundant. To show that a type 2 inequality is nonredundant, it suffices to show that there exists a measure µ on F_n which satisfies all the other elemental inequalities except for that one.

We shall show that the type 2 inequality

µ(X̃_i ∩ X̃_j − X̃_K) ≥ 0    (39)

is nonredundant. To facilitate our discussion, we denoteby and we let

be the atoms in , where

(40)

We first consider the case when , i.e.,. We construct a measure by

ifotherwise

(41)

where . In other words, is the only


atom with measure ; all other atoms have measure. Thenis trivially true. It is also trivial to

check that for any

(42)

and for any such that and

(43)

if . On the other hand, if is a propersubset of , then contains at leasttwo atoms, and therefore,

(44)

This completes the proof that the type 2 inequality in (39) is nonredundant in this case.

We now consider the case when , or. We construct a measure as follows. For

the atoms in , let

.(45)

For , if is odd, it is referred to as an odd atom of , and if is even, it is referred to as an even atom of . For any atom , we let

(46)

This completes the construction of µ. We first prove that

(47)

Consider

where the last equality follows from the binomial formula

(48)

for . This proves (47).

Next we prove that µ satisfies all type 1 inequalities. We note that for any , the atom is not in . Thus

(49)

It remains to prove that µ satisfies all type 2 inequalities except for (39), i.e., for any such that and

(50)

Consider

(51)

The nonnegativity of the second term above follows from (46). For the first term, is nonempty if and only if

and (52)

If this condition is not satisfied, then the first term in (51) becomes , and (50) follows immediately.

Let us assume that the condition in (52) is satisfied. Then by simple counting, we see that the number of atoms in

is equal to , where

For example, for , there are atoms in

namely,

where or for . We check that

We first consider the case when , i.e.,

Then

contains exactly one atom. If this atom is an even atom of , then the first term in (51) is either or (cf. (45)), and (50) follows immediately. If this atom is an odd atom of , then the first term in (51) is equal to . This happens if and only if and have one common element, which implies that

is nonempty. Therefore, the second term in (51) is at least , and hence (50) follows.

Finally, we consider the case when . Using the binomial formula in (48), we see that the number of odd atoms and even atoms of in

are the same. Therefore, the first term in (51) is equal to if

and is equal to otherwise. The former is true if and only if , which implies that


is nonempty, or that the second term is at least . Thus in either case (50) is true. This completes the proof that (39) is nonredundant.

APPENDIX II
SOME SPECIAL FORMS OF UNCONSTRAINED INFORMATION INEQUALITIES

In this appendix, we shall discuss some special forms of unconstrained linear information inequalities previously investigated by Han [5], [6]. Explicit necessary and sufficient conditions for these inequalities to always hold have been obtained. The relation between these inequalities and the results in the current paper will also be discussed.

A. Symmetrical Information Inequalities

An information expression is said to be symmetrical if it is identical under every permutation among X_1, X_2, ···, X_n. For example, for n = 2, the expression H(X_1) + H(X_2) is symmetrical. This can be seen by permuting X_1 and X_2 symbolically in the expression. Now let us consider the expression H(X_1) + H(X_2 | X_1). If we replace X_1 and X_2 by each other, the expression becomes H(X_2) + H(X_1 | X_2), which is symbolically different from the original expression. However, both expressions are identical to H(X_1, X_2). Therefore, the two expressions are in fact identical, and the expression H(X_1) + H(X_2 | X_1) is actually symmetrical although it is not readily recognized symbolically.

The symmetry of an information expression in general cannot be recognized symbolically. However, it is readily recognized symbolically if the expression is in canonical form. This is due to the uniqueness of the canonical form as discussed in Section III.

Consider a linear symmetrical information expression f (in canonical form). As seen in Section IV-B, f can be expressed as a linear combination of the two elemental forms of Shannon's information measures. It was shown in [5] that every symmetrical expression can be written in the form

f = Σ_{k=0}^{n-1} a_k u_k,

where

u_0 = Σ_i H(X_i | X_{N_n−{i}})

and, for 1 ≤ k ≤ n − 1,

u_k = Σ I(X_i; X_j | X_K),

the sum being taken over all i ≠ j and all K ⊆ N_n − {i, j} with |K| = k − 1. Note that u_0 is the sum of all Shannon's information measures of the first elemental form, and for 1 ≤ k ≤ n − 1, u_k is the sum of all Shannon's information measures of the second elemental form conditioning on k − 1 random variables.

It follows trivially from the elemental inequalities that a_k ≥ 0 for all k is a sufficient condition for f ≥ 0 to always hold. The necessity of this condition can be seen by noting the existence, for each k, of random variables X_1, X_2, ···, X_n such that u_k > 0 and u_{k'} = 0 for all k' ≠ k. This implies that all unconstrained linear symmetrical information inequalities are consequences of the elemental inequalities. We refer the reader to [5] for a more detailed discussion of symmetrical information inequalities.

B. Information Inequalities Involving Three Random Variables

Consider n = 3. Let t be a certain vector obtained from the vector h of joint entropies by an invertible linear transformation. Since t is an invertible linear transformation of h, every linear information expression can be written as c^T t for some vector c. It was shown in [6] that c^T t ≥ 0 always holds if and only if the conditions in (53) on c are satisfied.

In terms of t, the elemental inequalities can be expressed as a system of linear inequalities E t ≥ 0. From the discussion in Section IV-B, we see that c^T t ≥ 0 always holds if and only if c^T is a nonnegative linear combination of the rows of E. We leave it as an exercise for the reader to show that c^T is a nonnegative linear combination of the rows of E if and only if the conditions in (53) are satisfied. Therefore, all unconditional linear inequalities involving three random variables are consequences of the elemental inequalities. This result also implies that Γ_3 is the smallest pyramid containing Γ*_3.


ACKNOWLEDGMENT

The author wishes to acknowledge the help of a few individuals during the preparation of this paper. They include I. Csiszar, B. Hajek, F. Matus, Y.-O. Yan, E.-h. Yang, and Z. Zhang.

REFERENCES

[1] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.

[2] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.

[3] A. P. Dawid, "Conditional independence in statistical theory (with discussion)," J. Roy. Statist. Soc., Ser. B, vol. 41, pp. 1–31, 1979.

[4] T. S. Han, "Linear dependence structure of the entropy space," Inform. Contr., vol. 29, pp. 337–368, 1975.

[5] ——, "Nonnegative entropy measures of multivariate symmetric correlations," Inform. Contr., vol. 36, pp. 133–156, 1978.

[6] ——, "A uniqueness of Shannon's information distance and related nonnegativity problems," J. Combin., Inform. Syst. Sci., vol. 6, no. 4, pp. 320–331, 1981.

[7] T. Kawabata and R. W. Yeung, "The structure of the I-Measure of a Markov chain," IEEE Trans. Inform. Theory, vol. 38, pp. 1146–1149, May 1992.

[8] J. L. Massey, "Determining the independence of random variables," in 1995 IEEE Int. Symp. on Information Theory (Whistler, BC, Canada, Sept. 17–22, 1995).

[9] ——, "Causal interpretations of random variables," in 1995 IEEE Int. Symp. on Information Theory (special session in honor of Mark Pinsker on the occasion of his 70th birthday) (Whistler, BC, Canada, Sept. 17–22, 1995).

[10] F. Matus, "Abstract functional dependency structures," Theor. Comput. Sci., vol. 81, pp. 117–126, 1991.

[11] ——, "On equivalence of Markov properties over undirected graphs," J. Appl. Probab., vol. 29, pp. 745–749, 1992.

[12] ——, "Ascending and descending conditional independence relations," in Trans. 11th Prague Conf. on Information Theory, Statistical Decision Functions and Random Processes (Academia, Prague, 1992), vol. B, pp. 181–200.

[13] ——, "Probabilistic conditional independence structures and matroid theory: Background," Int. J. General Syst., vol. 22, pp. 185–196, 1994.

[14] ——, "Extreme convex set functions with many nonnegative differences," Discr. Math., vol. 135, pp. 177–191, 1994.

[15] ——, "Conditional independence among four random variables II," Combin., Prob. Comput., to be published.

[16] ——, "Conditional independence structures examined via minors," Ann. Math. Artificial Intell., submitted for publication.

[17] F. Matus and M. Studeny, "Conditional independence among four random variables I," Combin., Prob. Comput., to be published.

[18] J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann, 1988.

[19] G. Strang, Linear Algebra and Its Applications, 2nd ed. New York: Academic, 1980.

[20] M. Studeny, "Attempts at axiomatic description of conditional independence," in Proc. Work. on Uncertainty Processing in Expert Systems, supplement to Kybernetika, vol. 25, nos. 1–3, pp. 65–72, 1989.

[21] ——, "Multiinformation and the problem of characterization of conditional independence relations," Probl. Contr. and Inform. Theory, vol. 18, pp. 3–16, 1989.

[22] ——, "Conditional independence relations have no finite complete characterization," in Trans. 11th Prague Conf. on Information Theory, Statistical Decision Functions and Random Processes (Academia, Prague, 1992), vol. B, pp. 377–396.

[23] ——, "Structural semigraphoids," Int. J. Gen. Syst., submitted for publication.

[24] ——, "Descriptions of structures of stochastic independence by means of faces and imsets (in three parts)," Int. J. Gen. Syst., submitted for publication.

[25] R. W. Yeung, "A new outlook on Shannon's information measures," IEEE Trans. Inform. Theory, vol. 37, pp. 466–474, May 1991.

[26] ——, "Multilevel diversity coding with distortion," IEEE Trans. Inform. Theory, vol. 41, pp. 412–422, Mar. 1995.

[27] R. W. Yeung and Y.-O. Yan, ITIP, [Online] Available WWW: http://www.ie.cuhk.edu.hk/~ITIP.

[28] R. W. Yeung and Z. Zhang, "Multilevel distributed source coding," in 1997 IEEE Int. Symp. on Information Theory (Ulm, Germany, June 1997), p. 276.

[29] Z. Zhang and R. W. Yeung, "A non-Shannon type conditional information inequality," this issue, pp. 1982–1986.

[30] ——, "On the characterization of entropy function via information inequalities," to be published in IEEE Trans. Inform. Theory.