
Information Theory: Conditional Entropy
Thomas Tiahrt, MA, PhD
CSC492 – Advanced Text Analytics

Hello, and welcome to CSC492 – Advanced Text Analytics. We continue our overview of information theory by returning to Conditional Entropy.

Conditional Probability Distributions

Recall the definitions from our previous session on Joint Entropy. We will be using these same definitions here in this session.

First up is the conditional probability of s sub j given t sub k. In this case the numerator is the joint probability of s sub j and t sub k, and the denominator is the marginal probability distribution P sub capital T evaluated at the t sub k entry. In this calculation t sub k is constant throughout; it is a single value. The same is true of s sub j: it is a single value, and it will not change during the summation that takes place in the denominator. That summation does use all of the members of set S; that is, the s sub i in the denominator cover all the elements of S.
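As a sketch of the definition just described (the slide's own notation is not reproduced in this transcript, so this is a reconstruction from the narration, with P_T denoting the marginal distribution of T):

P(s_j \mid t_k) \;=\; \frac{P(s_j, t_k)}{P_T(t_k)} \;=\; \frac{P(s_j, t_k)}{\sum_{i} P(s_i, t_k)}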

Let's look at two examples. Here the joint probability of s sub 1 and t sub 1 is one tenth, and the joint probability of s sub 2 and t sub 1 is three tenths. When we calculate the result for the first pair, s sub 1 given t sub 1, we see that it ends up being one fourth. When we calculate the second pair, s sub 2 given t sub 1, we see that it is three fourths. Also note that when we add them up we get 1.
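Worked through with the numbers from the narration (assuming S contains just s sub 1 and s sub 2 in this example, so the marginal of t sub 1 is the sum of these two joint probabilities):

P_T(t_1) = P(s_1, t_1) + P(s_2, t_1) = \tfrac{1}{10} + \tfrac{3}{10} = \tfrac{4}{10}

P(s_1 \mid t_1) = \frac{1/10}{4/10} = \tfrac{1}{4}, \qquad P(s_2 \mid t_1) = \frac{3/10}{4/10} = \tfrac{3}{4}, \qquad \tfrac{1}{4} + \tfrac{3}{4} = 1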

Keep in mind the two probabilities just computed for the given t sub k. Added together they form the conditional probability distribution over S given that t sub k, which was t sub 1 in that case. That sum is 1, and it must be 1. Here we show that relationship formally. Do not neglect to observe that the summation in the denominator includes the entire set S. To emphasize the difference between the denominator and the numerator we use a different index, namely i.
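Formally, the relationship reads as follows (again a reconstruction from the narration, with i as the summation index in the denominator and j indexing the terms being summed):

\sum_{j} P(s_j \mid t_k) \;=\; \sum_{j} \frac{P(s_j, t_k)}{\sum_{i} P(s_i, t_k)} \;=\; \frac{\sum_{j} P(s_j, t_k)}{\sum_{i} P(s_i, t_k)} \;=\; 1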


Now that we have the conditional probability given t sub k in hand, we can apply it to the conditional entropy given t sub k.
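The conditional entropy given a single t sub k can be sketched as follows (assuming base-2 logarithms, which the transcript does not state explicitly):

H(S \mid t_k) \;=\; -\sum_{j} P(s_j \mid t_k)\,\log_2 P(s_j \mid t_k)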


Here we finally arrive at the Conditional Entropy Given T. We retain the weighted sum for the same reason that we used it in the Conditional Probability Distribution on S given T in equation 11, on the slide just before this one. That is, we take the weighted average of the conditional entropies on set S given t sub k over all t sub k that are elements of set T. The conditional entropy here is composed of those weighted terms so that our computation of the entropy parallels the conditional probability.
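The weighted average the narration describes, written out as a sketch (reconstructed from the narration rather than copied from the slide):

H(S \mid T) \;=\; \sum_{k} P_T(t_k)\, H(S \mid t_k) \;=\; -\sum_{k} P_T(t_k) \sum_{j} P(s_j \mid t_k)\,\log_2 P(s_j \mid t_k)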


We can simplify our calculation by observing that, using equations 10 and 11, we can perform the substitution shown in the middle of the slide as equation 12. When we use that substitution we arrive at equation 13.
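The substitution appears to be the product rule linking joint and conditional probability; a plausible reading of equations 12 and 13, inferred from the narration, is:

P(s_j, t_k) \;=\; P(s_j \mid t_k)\, P_T(t_k) \qquad \text{(equation 12, as described)}

H(S \mid T) \;=\; -\sum_{k}\sum_{j} P(s_j, t_k)\,\log_2 P(s_j \mid t_k) \qquad \text{(equation 13, as described)}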

Recall our definition of entropy when P is a joint probability distribution function over the Cartesian product of set S and set T; that is equation 14 here. We first substitute equation 12 to rewrite the logarithm in terms of a conditional probability. Then we separate the given portion from the rest of the calculation; that is, we factor it out.

Next we rewrite the conditional probability given T in the form of the joint probability of s sub j and t sub k. Finally we replace the summations with their entropy notations. Note that the same process is applicable if we are given S instead of being given T.
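Putting these steps together, the derivation plausibly runs as follows, with equation 14 as the joint entropy definition and base-2 logarithms assumed:

\begin{align*}
H(S, T) &= -\sum_{j}\sum_{k} P(s_j, t_k)\,\log_2 P(s_j, t_k) \\
        &= -\sum_{j}\sum_{k} P(s_j, t_k)\,\bigl[\log_2 P_T(t_k) + \log_2 P(s_j \mid t_k)\bigr] \\
        &= -\sum_{k} P_T(t_k)\,\log_2 P_T(t_k) \;-\; \sum_{j}\sum_{k} P(s_j, t_k)\,\log_2 P(s_j \mid t_k) \\
        &= H(T) + H(S \mid T)
\end{align*}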

Independence and Conditional Entropy

Finally, we also want to keep in mind the role of independence in conditional entropy: when S and T are independent, knowing T tells us nothing about S, and the conditional entropy of S given T is simply the entropy of S.

References

Sources:
Foundations of Statistical Natural Language Processing, by Christopher Manning and Hinrich Schütze, The MIT Press.
Fundamentals of Information Theory and Coding Design, by Roberto Togneri and Christopher J. S. deSilva, Chapman & Hall / CRC.

We have come to the end of the Conditional Entropy slide show.

End of the Slides

This ends our Conditional Entropy slide sequence.