
Page 1

Probability and Information

Copyright, 1996 © Dale Carnegie & Associates, Inc.

A brief review

Page 2

Probability

Probability provides a way of summarizing uncertainty that comes from our laziness and ignorance - how wonderful it is!

Probability as a degree of belief in the truth of a sentence: 1 means true, 0 means false, and 0 < P < 1 indicates an intermediate degree of belief in the truth of the sentence.

Degree of truth (fuzzy logic) vs. degree of belief (probability).

Page 3

All probability statements must indicate the evidence with respect to which the probability is being assessed.

Prior or unconditional probability: assessed before any evidence is obtained

Posterior or conditional probability: assessed given some evidence

Page 4

Basic probability notation

Prior probability of a proposition: P(Sunny); of a random variable value: P(Weather = Sunny)

Each random variable has a domain, e.g. Weather has domain <Sunny, Cloudy, Rain, Snow>

Probability distribution: P(Weather) = <0.7, 0.2, 0.08, 0.02>

A random variable is not a number; a number may be obtained by observing an RV.

A random variable can be continuous or discrete.
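
As a minimal sketch (Python; the dictionary name is mine, not from the slides), the distribution P(Weather) above can be stored as a mapping from domain values to probabilities, and observing the random variable corresponds to drawing one value from it:

import random

# P(Weather) over the domain <Sunny, Cloudy, Rain, Snow>, as on this slide
p_weather = {"Sunny": 0.7, "Cloudy": 0.2, "Rain": 0.08, "Snow": 0.02}
assert abs(sum(p_weather.values()) - 1.0) < 1e-9     # a distribution sums to 1

# "Observing" the random variable yields one value from its domain
observation = random.choices(list(p_weather), weights=list(p_weather.values()))[0]
print(observation)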

Page 5

Conditional Probability

Definition: P(A|B) = P(A^B)/P(B), defined when P(B) > 0

Product rule: P(A^B) = P(A|B)P(B)

Probabilistic inference does not work like logical inference: P(A|B) is not a rule "if B then conclude A with some probability"; it applies only when B is all that is known.
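
A tiny numeric illustration of the definition and the product rule, using made-up values for P(B) and P(A^B) (these numbers are purely illustrative):

p_b = 0.5                                  # hypothetical P(B)
p_a_and_b = 0.2                            # hypothetical P(A^B)
p_a_given_b = p_a_and_b / p_b              # definition: P(A|B) = P(A^B)/P(B) -> 0.4
assert abs(p_a_given_b * p_b - p_a_and_b) < 1e-12    # product rule: P(A^B) = P(A|B)P(B)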

Page 6

The axioms of probability

All probabilities are between 0 and 1: 0 <= P(A) <= 1

Necessarily true (valid) propositions have probability 1; necessarily false (unsatisfiable) propositions have probability 0

The probability of a disjunction: P(A v B) = P(A) + P(B) - P(A^B)
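
A quick numeric check of the disjunction rule, using made-up probabilities chosen only for illustration:

p_a, p_b, p_a_and_b = 0.3, 0.4, 0.1        # hypothetical values, not from the slides
p_a_or_b = p_a + p_b - p_a_and_b           # P(A v B) = P(A) + P(B) - P(A^B) -> 0.6
assert 0.0 <= p_a_or_b <= 1.0              # consistent with the first axiom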

Page 7

The joint probability distribution

The joint distribution completely specifies probability assignments to all propositions in the domain.

A probabilistic model consists of a set of random variables (X1, ..., Xn).

An atomic event is an assignment of particular values to all the variables.

Marginalization rule for RVs Y and Z: P(Y) = Σz P(Y, z), summing over all values z of Z.

Let's see an example next.
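
A minimal sketch of marginalization in Python, over a small joint distribution with made-up values (the variable and value names are hypothetical):

# A joint distribution over two RVs, keyed by (y, z) value pairs (hypothetical numbers)
joint = {("y1", "z1"): 0.3, ("y1", "z2"): 0.2, ("y2", "z1"): 0.1, ("y2", "z2"): 0.4}

def marginal_y(joint):
    """Marginalization rule: P(Y) = sum over z of P(Y, z)."""
    p_y = {}
    for (y, z), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    return p_y

print(marginal_y(joint))    # {'y1': 0.5, 'y2': 0.5}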

Page 8

Joint Probability

An example with two Boolean variables:

              Toothache    !Toothache
  Cavity         0.04         0.01
  !Cavity        0.06         0.89

Observations: the four atomic events are mutually exclusive and collectively exhaustive (the entries sum to 1).

What are

P(Cavity) =
P(Cavity V Toothache) =
P(Cavity ^ Toothache) =
P(Cavity | Toothache) =
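
As a sketch of how these quantities can be read off the joint table above (Python; variable names are my own, but the numbers are exactly those in the table):

# Joint distribution from the table, keyed by (Cavity, Toothache) truth values
joint = {
    (True,  True):  0.04,   # Cavity ^ Toothache
    (True,  False): 0.01,   # Cavity ^ !Toothache
    (False, True):  0.06,   # !Cavity ^ Toothache
    (False, False): 0.89,   # !Cavity ^ !Toothache
}

p_cavity = sum(p for (c, t), p in joint.items() if c)                    # 0.04 + 0.01 = 0.05
p_toothache = sum(p for (c, t), p in joint.items() if t)                 # 0.04 + 0.06 = 0.10
p_cavity_or_toothache = sum(p for (c, t), p in joint.items() if c or t)  # 0.11
p_cavity_and_toothache = joint[(True, True)]                             # 0.04
p_cavity_given_toothache = p_cavity_and_toothache / p_toothache          # 0.04 / 0.10 = 0.4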

Page 9

Bayes’ rule

Deriving the rule from the product rule: P(B|A) = P(A|B)P(B)/P(A)

P(A) can be viewed as a normalization factor that makes P(B|A) + P(!B|A) = 1, with P(A) = P(A|B)P(B) + P(A|!B)P(!B)

A more general case: P(X|Y) = P(Y|X)P(X)/P(Y)

Bayes' rule conditionalized on background evidence E: P(X|Y,E) = P(Y|X,E)P(X|E)/P(Y|E)
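
A minimal sketch of the rule and its normalization factor, using made-up numbers (the 0.01 / 0.9 / 0.05 values are purely illustrative, not from the slides):

p_b = 0.01                   # hypothetical prior P(B)
p_a_given_b = 0.9            # hypothetical P(A|B)
p_a_given_not_b = 0.05       # hypothetical P(A|!B)

# Normalization: P(A) = P(A|B)P(B) + P(A|!B)P(!B)
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# Bayes' rule: P(B|A) = P(A|B)P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a
p_not_b_given_a = p_a_given_not_b * (1 - p_b) / p_a

assert abs(p_b_given_a + p_not_b_given_a - 1.0) < 1e-12   # posteriors sum to 1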

Page 10

Independence

Independent events A, B: P(B|A) = P(B), P(A|B) = P(A), P(A,B) = P(A)P(B)

Conditional independence: P(X|Y,Z) = P(X|Z), i.e., given Z, X and Y are independent.
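
A small sketch of what independence means numerically, with made-up marginals: the joint below is constructed as the product of its marginals, so the identities on this slide hold by design:

p_a, p_b = 0.3, 0.6                        # hypothetical marginals P(A), P(B)
joint = {(a, b): (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
         for a in (True, False) for b in (True, False)}

p_ab = joint[(True, True)]                              # P(A, B)
p_b_marginal = sum(p for (a, b), p in joint.items() if b)
assert abs(p_ab - p_a * p_b) < 1e-12                    # P(A, B) = P(A) P(B)
assert abs(p_ab / p_b_marginal - p_a) < 1e-12           # P(A | B) = P(A)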

Page 11

Entropy

Entropy measures the homogeneity/purity of a set of examples: 0 for a pure (single-class) set, maximal for an evenly mixed set.

Or, viewed as information content: the less you need to know to determine the class of a new case, the more information you already have.

With two classes (P, N) in S, with p and n instances respectively, let t = p + n and view [p, n] as the class distribution of S. Then

Entropy(S) = - (p/t) log2 (p/t) - (n/t) log2 (n/t)

E.g., p = 9, n = 5: Entropy(S) = Entropy([9,5]) = - (9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.940

E.g., Entropy([14,0]) = 0; Entropy([7,7]) = 1
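
A short Python sketch of this formula (the function name is mine); it reproduces the three values quoted above:

import math

def entropy(counts):
    """Entropy of a class-count list such as [p, n]; 0*log2(0) terms are skipped."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 3))    # 0.94  (the 0.940 on this slide)
print(entropy([14, 0]))             # 0.0
print(entropy([7, 7]))              # 1.0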

Page 12

Entropy curve

For p/(p+n) between 0 and 1, the 2-class entropy is:

0 when p/(p+n) is 0

1 when p/(p+n) is 0.5

0 when p/(p+n) is 1

monotonically increasing between 0 and 0.5, and monotonically decreasing between 0.5 and 1

Read as a transmission cost, entropy is the number of bits per example needed to send the class: 1 bit when the classes are evenly mixed, and essentially 0 bits when the data is pure.

[Figure: 2-class entropy plotted against p/(p+n); the curve rises from 0 to its peak of 1 at p/(p+n) = 0.5 and falls back to 0 at p/(p+n) = 1.]
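
A minimal sketch that traces this curve numerically (function name is mine, grid points chosen only for illustration):

import math

def two_class_entropy(q):
    """Entropy of a 2-class distribution with positive-class fraction q = p/(p+n)."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

for q in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(q, round(two_class_entropy(q), 3))
# Peaks at 1.0 when q = 0.5 and falls to 0 at q = 0 and q = 1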