
Page 1: bayesian-classifiers2

CLASSIFICATION: Bayesian Classifiers

Uses Bayes’ (Thomas Bayes, 1701-1761) Theorem to build probabilistic models of relationships between attributes and classes

Statistical principle for combining prior class knowledge with new evidence from data

Multiple implementations: Naïve Bayes, Bayesian networks

Page 2: bayesian-classifiers2

CLASSIFICATION: Bayesian Classifiers

Requires the concept of conditional probability, which measures the probability of an event given that (by evidence or information) another event has occurred

Notation: P(A|B) = probability of A given knowledge that B occurred

P(A|B) = P(A∩B)/P(B), if P(B) ≠ 0; equivalently, P(A∩B) = P(A|B)P(B)
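As a quick numeric check of this formula (an illustration added here, not from the slides), consider one roll of a fair six-sided die, with A = "roll is a six" and B = "roll is even":

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die
outcomes = set(range(1, 7))
A = {6}          # event A: the roll is a six
B = {2, 4, 6}    # event B: the roll is even

def prob(event):
    """Probability of an event under the uniform distribution on outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

# P(A|B) = P(A ∩ B) / P(B) = (1/6) / (3/6) = 1/3
print(prob(A & B) / prob(B))  # 1/3
```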

Page 3: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Conditional Probability

Example: Suppose 1% of a specific population has a form of cancer. A new diagnostic test:

produces correct positive results for those with the cancer 99% of the time

produces correct negative results for those without the cancer 98% of the time

P(cancer) = 0.01, P(positive test | cancer) = 0.99, P(negative test | no cancer) = 0.98

Page 4: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Conditional Probability

Example: But what if you tested positive? What is the probability that you actually have cancer?

Bayes’ Theorem “reverses” the process to provide us with an answer.

Page 5: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Bayes’ Theorem

P(B|A) = P(B∩A)/P(A), if P(A) ≠ 0
       = P(A∩B)/P(A)
       = P(A|B)P(B)/P(A)

Application to our example:

P(cancer | test positive) = P(test positive | cancer) × P(cancer) / P(test positive)
= (0.99 × 0.01) / (0.99 × 0.01 + 0.02 × 0.99) = 0.0099 / 0.0297 ≈ 0.33

So even after a positive test, the probability of actually having cancer is only about one in three.
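A minimal sketch of this arithmetic in Python; the false-positive rate P(positive | no cancer) = 1 − 0.98 = 0.02 comes from the test's 98% specificity stated earlier:

```python
# Bayes' Theorem applied to the cancer screening example
p_cancer = 0.01                            # prior: 1% of the population
p_pos_given_cancer = 0.99                  # sensitivity
p_pos_given_no_cancer = 1 - 0.98           # 1 - specificity = 0.02

# Evidence: total probability of a positive test
p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_no_cancer * (1 - p_cancer))

# P(cancer | positive) = P(positive | cancer) * P(cancer) / P(positive)
print(p_pos_given_cancer * p_cancer / p_pos)  # 0.3333...
```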

Page 6: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Bayes’ Theorem

Probability tree for the example:

Cancer (0.01): Test positive (0.99), Test negative (0.01)
No cancer (0.99): Test positive (0.02), Test negative (0.98)

Page 7: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Naïve Bayes’

Bayes’ Theorem interpretation:

P(class C | F1, F2, …, Fn) = P(class C) × P(F1, F2, …, Fn | C) / P(F1, F2, …, Fn)

posterior = prior × likelihood / evidence

Page 8: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Naïve Bayes’

Key concepts:
The denominator is independent of class C, so it is effectively constant
The numerator is equivalent to the joint probability model P(C, F1, F2, …, Fn)
Naïve conditional independence assumptions give:

P(C | F1, F2, …, Fn) ∝ P(C) P(F1|C) P(F2|C) ⋯ P(Fn|C)
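A minimal sketch of this factorization for categorical features, assuming probabilities are estimated by simple counting; the tiny training set and feature names are invented for illustration:

```python
from collections import Counter
import math

# (features, class label) pairs; data invented purely for illustration
training = [
    ({"outlook": "sunny", "windy": "no"},  "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "sunny", "windy": "no"},  "play"),
]
class_counts = Counter(label for _, label in training)

def likelihood(feature, value, label):
    """Empirical P(feature = value | class = label) from counts."""
    in_class = [f for f, l in training if l == label]
    return sum(f[feature] == value for f in in_class) / len(in_class)

def score(sample, label):
    """Unnormalized posterior: P(C) * prod_i P(F_i | C)."""
    prior = class_counts[label] / len(training)
    return prior * math.prod(likelihood(f, v, label)
                             for f, v in sample.items())

sample = {"outlook": "sunny", "windy": "yes"}
print(max(class_counts, key=lambda c: score(sample, c)))  # "stay"
```

In practice one adds smoothing (e.g., Laplace) so that a feature value never seen with a class does not zero out the whole product.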

Page 9: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Naïve Bayes’

Multiple distributional assumptions are possible for the per-feature likelihoods P(Fi|C): Gaussian, Multinomial, Bernoulli
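For reference, scikit-learn (assumed available; the slides themselves do not mention it) ships one Naïve Bayes variant per assumption. A sketch on synthetic random data, just to show which variant matches which feature type:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)                 # two arbitrary classes

X_cont = rng.normal(size=(100, 3))               # continuous -> Gaussian
X_counts = rng.integers(0, 10, size=(100, 3))    # counts     -> Multinomial
X_binary = rng.integers(0, 2, size=(100, 3))     # binary     -> Bernoulli

for model, X in [(GaussianNB(), X_cont),
                 (MultinomialNB(), X_counts),
                 (BernoulliNB(), X_binary)]:
    model.fit(X, y)
    print(type(model).__name__, model.predict(X[:1]))
```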

Page 10: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Naïve Bayes’ Example

Training set (example from Wikipedia):

Sex     Height (ft)  Weight (lbs)  Foot size (in)
male    6.00         180           12
male    5.92         190           11
male    5.58         170           12
male    5.92         165           10
female  5.00         100            6
female  5.50         150            8
female  5.42         130            7
female  5.75         150            9

Page 11: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Naïve Bayes’ Example

Assumptions:
Continuous data, modeled with a Gaussian (Normal) distribution
Equal priors: P(male) = P(female) = 0.5

Page 12: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Naïve Bayes’ Example

Classifier generated from training set (sample means and unbiased sample variances):

Sex     Height mean/var     Weight mean/var     Foot size mean/var
male    5.855 / 3.5033e-2   176.25 / 1.2292e2   11.25 / 9.1667e-1
female  5.4175 / 9.7225e-2  132.50 / 5.5833e2    7.50 / 1.6667

Page 13: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Naïve Bayes’ Example

Test sample: height = 6 ft, weight = 130 lbs, foot size = 8 in; sex unknown

Page 14: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Naïve Bayes’ Example

Calculate posterior probabilities for both genders:

posterior(male) = P(male) P(height|male) P(weight|male) P(foot size|male) / evidence

posterior(female) = P(female) P(height|female) P(weight|female) P(foot size|female) / evidence

The evidence is the same constant in both expressions, so we ignore the denominators

Page 15: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Naïve Bayes’ Example

Calculations for male:
P(male) = 0.5 (assumed)
P(height|male) ≈ 1.5789 (a Gaussian density, so values above 1 are possible)
P(weight|male) ≈ 5.9881e-6
P(foot size|male) ≈ 1.3112e-3

Posterior numerator (male) ≈ 6.1984e-9

Page 16: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Naïve Bayes’ Example

Calculations for female:
P(female) = 0.5 (assumed)
P(height|female) ≈ 2.2346e-1
P(weight|female) ≈ 1.6789e-2
P(foot size|female) ≈ 2.8669e-1

Posterior numerator (female) ≈ 5.3778e-4

Page 17: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Naïve Bayes’ Example

Conclusion: The posterior numerator is (significantly) greater for the female classification than for the male, so classify the sample as female

Page 18: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Naïve Bayes’ Example

Note: We did not calculate P(evidence) [the normalizing constant] since it is not needed, but we could:

P(evidence) = P(male) P(height|male) P(weight|male) P(foot size|male)
            + P(female) P(height|female) P(weight|female) P(foot size|female)
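A sketch that reproduces the whole worked example end to end, including the evidence term; the training data are taken from the Wikipedia article the slides cite:

```python
import math

# Wikipedia's height/weight/foot-size training set
data = {
    "male":   {"height": [6.00, 5.92, 5.58, 5.92],
               "weight": [180, 190, 170, 165],
               "foot":   [12, 11, 12, 10]},
    "female": {"height": [5.00, 5.50, 5.42, 5.75],
               "weight": [100, 150, 130, 150],
               "foot":   [6, 8, 7, 9]},
}
sample = {"height": 6.0, "weight": 130.0, "foot": 8.0}
prior = {"male": 0.5, "female": 0.5}      # assumed, as on the slides

def gaussian(x, values):
    """Gaussian density using the sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Posterior numerators: prior times the product of per-feature densities
numerator = {c: prior[c] * math.prod(gaussian(sample[f], data[c][f])
                                     for f in sample)
             for c in data}
evidence = sum(numerator.values())        # the normalizing constant P(evidence)

for c, num in numerator.items():
    print(f"{c}: numerator={num:.4e}, posterior={num / evidence:.4e}")
print("classified as:", max(numerator, key=numerator.get))  # female
```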

Page 19: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Bayesian Networks

Judea Pearl (UCLA Computer Science, Cognitive Systems Lab): one of the pioneers of Bayesian Networks

Author: Probabilistic Reasoning in Intelligent Systems, 1988

Father of journalist Daniel Pearl, who was kidnapped and murdered in Pakistan in 2002 by Al-Qaeda

Page 20: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Bayesian Networks

Probabilistic graphical model

Represents random variables and conditional dependencies using a directed acyclic graph (DAG)

Nodes of graph represent random variables

Page 21: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Bayesian Networks

Edges of graph represent conditional dependencies

Unconnected nodes conditionally independent of each other

Does not require all attributes to be conditionally independent

Page 22: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Bayesian Networks

Probability table associates each node with its immediate parent nodes:
If node X has no immediate parents, the table contains only the prior probability P(X)
If one parent Y, the table contains P(X|Y)
If multiple parents {Y1, Y2, ⋯, Yn}, the table contains P(X|Y1, Y2, ⋯, Yn)

Page 23: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Bayesian Networks

Page 24: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Bayesian Networks

Model encodes the relevant probabilities, from which probabilistic inferences can then be calculated

Joint probability: P(G, S, R) = P(R) P(S|R) P(G|S, R)

G = “Grass wet”, S = “Sprinkler on”, R = “Raining”

Page 25: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Bayesian Networks

We can then calculate, for example, the probability that it is raining given that the grass is wet: P(R = T | G = T)

Page 26: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Bayesian Networks

That is:

P(R = T | G = T) = Σ_S P(G = T, S, R = T) / Σ_{S,R} P(G = T, S, R)
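A sketch of this inference in Python. The conditional probability tables below are the ones from Wikipedia's rain/sprinkler/grass example; the slides' own tables appear only in a figure, so these exact values are an assumption:

```python
from itertools import product

P_R = {True: 0.20, False: 0.80}                    # P(Raining)
P_S = {True: 0.01, False: 0.40}                    # P(Sprinkler on | Raining = key)
P_G = {(True, True): 0.99, (True, False): 0.90,    # P(Grass wet | S, R)
       (False, True): 0.80, (False, False): 0.00}

def joint(g, s, r):
    """P(G, S, R) = P(R) P(S|R) P(G|S, R), per the network factorization."""
    p = P_R[r] * (P_S[r] if s else 1 - P_S[r])
    return p * (P_G[(s, r)] if g else 1 - P_G[(s, r)])

# P(R=T | G=T) = sum_S P(G=T, S, R=T) / sum_{S,R} P(G=T, S, R)
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(True, s, r) for s, r in product((True, False), repeat=2))
print(num / den)   # ~0.3577: wet grass makes rain about 36% likely
```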

Page 27: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Bayesian Networks

Building the model:
Create the network structure (graph)
Determine the probability values of the tables

Simplest case: network defined by user

Most real-world cases: defining the network by hand is too complex, so use machine learning (many algorithms)

Page 28: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Bayesian Networks

Algorithms built into Weka:
User-defined network
Conditional independence tests
Genetic search
Hill climber
K2
Simulated annealing
Maximum weight spanning tree
Tabu search

Page 29: bayesian-classifiers2

BAYESIAN CLASSIFIERS: Bayesian Networks

Many other implementations are available online, e.g., BNT (Bayes Net Toolbox), a Matlab toolbox by Kevin Murphy, University of British Columbia:

http://www.cs.ubc.ca/~murphyk/Software/