Bayesian Networks
Martin Bachler (martin.bachler@igi.tugraz.at)
MLA - VO, 06.12.2005
Overview
• "Microsoft's competitive advantage lies in its expertise in Bayesian networks" (Bill Gates, quoted in LA Times, 1996)
Overview
• (Recap of) Definitions
• Naive Bayes
  – Performance/optimality?
  – How important is independence?
  – Linearity?
• Bayesian networks
Definitions
• Conditional probability:

  P(A|B) = P(A,B) / P(B)
  P(B|A) = P(A,B) / P(A)

• Bayes' theorem:

  P(A|B)·P(B) = P(A,B) = P(B|A)·P(A)

  ⇒ P(A|B) = P(B|A)·P(A) / P(B)
Definitions

• Bayes' theorem:

  P(A|B) = P(B|A)·P(A) / P(B)

  – P(B|A) … likelihood
  – P(A) … prior probability
  – P(B) … normalization term
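A quick numeric illustration (the numbers are made up for this recap, not from the slides): with prior $P(A)=0.01$, likelihood $P(B|A)=0.9$ and $P(B|\neg A)=0.1$,

\[
P(B) = P(B|A)P(A) + P(B|\neg A)P(\neg A) = 0.9\cdot 0.01 + 0.1\cdot 0.99 = 0.108,
\]
\[
P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} = \frac{0.009}{0.108} \approx 0.083 .
\]

Even a strong likelihood ratio is pulled down by a small prior; this is exactly what the normalization term accounts for.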
Definitions
• Classification problem
  – Input space X = X1 × X2 × … × Xn
  – Output space Y = {0,1}
  – Target concept C: X → Y
  – Hypothesis space H
• Bayesian way of classifying an instance x = (x1,…,xn):

  h(x1,…,xn) = argmax_{c∈Y} P(c | x1,…,xn)
             = argmax_{c∈Y} P(x1,…,xn | c)·P(c) / P(x1,…,xn)
             = argmax_{c∈Y} P(x1,…,xn | c)·P(c)
Definitions
• Theoretically OPTIMAL!

  h(x1,…,xn) = argmax_{c∈Y} P(x1,…,xn | c)·P(c)

• For large n, the estimation of P(x1,…,xn | c) is very hard!
• ⇒ Assumption: pairwise conditional independence between the input variables given C:

  P(xi, xj | C) = P(xi | C)·P(xj | C)   for i, j = 1,…,n; i ≠ j
Overview
• (Recap of) Definitions
• Naive Bayes
  – Performance/optimality?
  – How important is independence?
  – Linearity?
• Bayesian networks
Naive Bayes
Under this assumption,

  P(xi, xj | C) = P(xi | C)·P(xj | C),   i, j = 1,…,n; i ≠ j,

the class-conditional probability factorizes,

  P(x1, x2, …, xn | C) = P(x1 | C)·P(x2 | C)·…·P(xn | C) = ∏_{i=1}^{n} P(xi | C),

and the classifier becomes

  h(x1,…,xn) = argmax_{c∈C} ∏_{i=1}^{n} P(xi | c)·P(c)
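A minimal Python sketch of this decision rule (my own illustration, not part of the original slides; binary features, probabilities estimated by simple counting):

import numpy as np

def train_nb(X, y):
    """Estimate P(c) and P(xi=1|c) by frequency counts (binary features)."""
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    cond = {c: X[y == c].mean(axis=0) for c in classes}  # cond[c][i] = P(xi=1|c)
    return classes, priors, cond

def predict_nb(x, classes, priors, cond):
    """h(x) = argmax_c P(c) * prod_i P(xi|c)."""
    best, best_score = None, -1.0
    for c in classes:
        p1 = cond[c]
        lik = np.prod(np.where(x == 1, p1, 1.0 - p1))
        if priors[c] * lik > best_score:
            best, best_score = c, priors[c] * lik
    return best

# A dataset consistent with the worked example on the next slide:
X = np.array([[1, 1], [0, 1], [1, 0], [0, 0]])
y = np.array([1, 1, 1, 0])
classes, priors, cond = train_nb(X, y)
print([predict_nb(np.array(q), classes, priors, cond)
       for q in [(1, 1), (1, 0), (0, 1), (0, 0)]])   # -> [1, 1, 1, 0]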
Example
Training data (target concept C = x1 ∨ x2):

  x1  x2  C
  1   1   1
  0   1   1
  1   0   1
  …   …   …

Estimated probabilities:

  P(C=1) = 3/4,  P(C=0) = 1/4
  P(x1=1|C=1) = 2/3,  P(x1=1|C=0) = 0
  P(x2=1|C=1) = 2/3,  P(x2=1|C=0) = 0

Classification with h(x1,…,xn) = argmax_{c∈C} ∏_{i=1}^{n} P(xi|c)·P(c):

  h(1,1) = argmax[ P(x1=1|C=1)·P(x2=1|C=1)·P(C=1),
                   P(x1=1|C=0)·P(x2=1|C=0)·P(C=0) ] = 1
  h(1,0) = argmax[…, …] = 1
  h(0,1) = argmax[…, …] = 1
  h(0,0) = argmax[…, …] = 0
Naive Bayes - Independence
• The independence assumption is very strict!
• For most practical problems it is blatantly wrong! (It does not even hold in the previous example! …see later)

⇒ Is naive Bayes a rather "academic" algorithm?
Naive Bayes - Independence
• For which problems is naive Bayes optimal? (Let's assume for the moment that we can perfectly estimate all necessary probabilities.)
• Guess: for problems for which the independence assumption holds.
• Let's check… (empirically and theoretically)
Independence - Example
The concept C = x1 ∨ x2:

  x1  x2  C   P(x1,x2|C)   P(x1|C)·P(x2|C)   P(x1|C)   P(x2|C)
  0   0   0   1            1                 1         1
  0   0   1   0            1/9               1/3       1/3
  0   1   0   0            0                 1         0
  0   1   1   1/3          2/9               1/3       2/3
  1   0   0   0            0                 0         1
  1   0   1   1/3          2/9               2/3       1/3
  1   1   0   0            0                 0         0
  1   1   1   1/3          4/9               2/3       2/3

(P(xi|C) is evaluated at the row's values; e.g. in row 4, P(x1|C) = P(x1=0|C=1) = 1/3.)
The independence assumption is clearly violated: P(x1,x2|C) ≠ P(x1|C)·P(x2|C).

Independence - Example: C = x1 ∨ x2

[Figure: decision regions of naive Bayes for C = x1 ∨ x2]
Independence - Example
The concept C = x1 ⊕ x2:

  x1  x2  C   P(x1,x2|C)   P(x1|C)·P(x2|C)   P(x1|C)   P(x2|C)
  0   0   0   1/2          1/4               1/2       1/2
  0   0   1   0            1/4               1/2       1/2
  0   1   0   0            1/4               1/2       1/2
  0   1   1   1/2          1/4               1/2       1/2
  1   0   0   0            1/4               1/2       1/2
  1   0   1   1/2          1/4               1/2       1/2
  1   1   0   1/2          1/4               1/2       1/2
  1   1   1   0            1/4               1/2       1/2

Here the factorized estimate P(x1|C)·P(x2|C) = 1/4 is identical for every instance and class, so naive Bayes cannot separate the two classes at all.

Independence - Example: C = x1 ⊕ x2

[Figure: decision regions of naive Bayes for C = x1 ⊕ x2]
Naive Bayes - Independence

[1] Domingos, Pazzani: Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier, 1996
Naive Bayes - Independence
• The degree of dependence between two attributes given the class can be measured by the conditional mutual information:

  D(xi, xj | C) = H(xi | C) + H(xj | C) − H(xi, xj | C)
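A small Python sketch computing this quantity (my own illustration; as a check, the OR example above with P(C=1) = 3/4 gives a clearly positive value):

import numpy as np

def cond_mutual_info(joint):
    """D(X,Y|C) = H(X|C) + H(Y|C) - H(X,Y|C) in bits.
    joint[x, y, c] is the full joint distribution P(x, y, c)."""
    def H(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))
    total = 0.0
    for c in range(joint.shape[2]):
        pc = joint[:, :, c].sum()
        if pc == 0:
            continue
        pxy = joint[:, :, c] / pc   # P(x, y | c)
        total += pc * (H(pxy.sum(axis=1)) + H(pxy.sum(axis=0)) - H(pxy.ravel()))
    return total

# OR concept, P(C=1) = 3/4 as in the earlier worked example:
joint = np.zeros((2, 2, 2))
joint[0, 0, 0] = 1 / 4
for a, b in [(0, 1), (1, 0), (1, 1)]:
    joint[a, b, 1] = (1 / 3) * (3 / 4)
print(cond_mutual_info(joint))   # ~0.19 bits > 0: x1, x2 dependent given C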
Naive Bayes - Independence
• For which problems is naive Bayes optimal?
• Guess: for problems for which the independence assumption holds.
• Empirical answer: not really…
• Theoretical answer?
Naive Bayes - optimality
• Example: 3 features x1, x2, x3
• P(c=0) = P(c=1)
• x1, x3 independent; x2 = x1 (totally dependent)

⇒ optimal classification:

  h_opt = sgn( P(x1|1)·P(x3|1) − P(x1|0)·P(x3|0) )
        = sgn( P(1|x1)·P(1|x3) − P(0|x1)·P(0|x3) )

naive Bayes (the duplicated feature x2 = x1 is counted twice):

  h_nb = sgn( P(x1|1)²·P(x3|1) − P(x1|0)²·P(x3|0) )
       = sgn( P(1|x1)²·P(1|x3) − P(0|x1)²·P(0|x3) )

[1] Domingos, Pazzani: Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier, 1996
Naive Bayes - optimality
• Let p = P(1|x1), q = P(1|x3)
• optimal:      h_opt = sgn( p·q − (1−p)·(1−q) )
• naive Bayes:  h_nb = sgn( p²·q − (1−p)²·(1−q) )

[Figure in the (p, q) plane; labels: "independence assumption holds", "optimal and naive classifier disagree only here"]
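To get a feel for how rarely the two rules disagree, one can sample the (p, q) unit square (my own quick check, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
p, q = rng.uniform(size=(2, 1_000_000))

h_opt = np.sign(p * q - (1 - p) * (1 - q))
h_nb = np.sign(p ** 2 * q - (1 - p) ** 2 * (1 - q))

# Fraction of the unit square on which naive Bayes deviates
# from the optimal rule despite the duplicated feature:
print(np.mean(h_opt != h_nb))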
Naive Bayes - optimality
• In general: instance x = (x1,…,xn)
• Let

  p = P(1 | x)
  r = P(1) / P(x) · ∏_{i=1}^{n} P(xi | 1)
  s = P(0) / P(x) · ∏_{i=1}^{n} P(xi | 0)

  (p is the true posterior of class 1; r and s are the naive Bayes estimates of the posteriors of classes 1 and 0.)

Theorem 1: The naive Bayesian classifier is optimal for x iff

  ( p ≥ 1/2 ∧ r ≥ s )  ∨  ( p ≤ 1/2 ∧ r ≤ s )
Naive Bayes - optimality
[Figure: the region of optimality of naive Bayes; the independence assumption holds only in a small part of it]
Naive Bayes - optimality
• This is a criterion for local optimality (for a single instance).
• What about global optimality?

Theorem 2: The naive Bayesian classifier is globally optimal for a dataset S iff for every x ∈ S:

  ( p_x ≥ 1/2 ∧ r_x ≥ s_x )  ∨  ( p_x ≤ 1/2 ∧ r_x ≤ s_x )
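Both conditions translate directly into code (a plain transcription, with (p, r, s) as defined above):

def nb_locally_optimal(p, r, s):
    """Theorem 1: naive Bayes is optimal for an instance x iff
    (p >= 1/2 and r >= s) or (p <= 1/2 and r <= s)."""
    return (p >= 0.5 and r >= s) or (p <= 0.5 and r <= s)

def nb_globally_optimal(instances):
    """Theorem 2: globally optimal for S iff locally optimal for every x in S.
    instances: iterable of (p_x, r_x, s_x) triples."""
    return all(nb_locally_optimal(p, r, s) for p, r, s in instances)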
Naive Bayes - optimality
• What is the reason for this?
  – The difference between classification and probability (distribution) estimation:
    for classification, a perfect estimation of the probabilities is not important, as long as for each instance the maximum estimate corresponds to the maximum true probability.
• Problem with this result: how can global optimality (i.e. optimality for all instances) be verified?
Naive Bayes - optimality
• For which problems is naive Bayes optimal?
• Guess: for problems for which the independence assumption holds.
• Empirical answer: not really…
• Theoretical answer no. 1: for all problems for which Theorem 2 holds.
Naive Bayes - linearity
• Another question: how does naive Bayes' hypothesis depend on the input variables?
• Consider the simple case of binary variables only…
• It can be shown (e.g. [2]) that in binary domains naive Bayes is LINEAR in the input variables!

[2] Duda, Hart: Pattern Classification and Scene Analysis, Wiley, 1973
Naive Bayes - linearity
• Proof…
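The slide leaves the proof open; a sketch of the standard log-odds argument (notation mine): for $x_i \in \{0,1\}$ one can write $\log P(x_i\,|\,c) = x_i \log P(x_i{=}1\,|\,c) + (1-x_i)\log P(x_i{=}0\,|\,c)$, so

\[
\log\frac{P(1)\prod_i P(x_i\,|\,1)}{P(0)\prod_i P(x_i\,|\,0)}
= b + \sum_{i=1}^{n} w_i\, x_i ,
\]
with
\[
w_i = \log\frac{P(x_i{=}1\,|\,1)\,P(x_i{=}0\,|\,0)}{P(x_i{=}0\,|\,1)\,P(x_i{=}1\,|\,0)},
\qquad
b = \log\frac{P(1)}{P(0)} + \sum_{i=1}^{n}\log\frac{P(x_i{=}0\,|\,1)}{P(x_i{=}0\,|\,0)} .
\]

Naive Bayes predicts class 1 exactly when this affine function of x is positive, i.e. its decision boundary is a hyperplane.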
Naive Bayes – linearity - examples
[Figure: decision boundaries of naive Bayes and of a perceptron on example data]
Naive Bayes – linearity - examples

[Figure: further examples]
Naive Bayes - linearity
• For boolean domains, naive Bayes' hypothesis is a linear hyperplane!
⇒ It can only be globally optimal for linearly separable problems!
BUT: it is not optimal for all linearly separable problems! (e.g. not for certain m-out-of-n concepts)
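A small experiment (my own sketch) that illustrates the last point: compute naive Bayes exactly under the uniform distribution on {0,1}^n for the (linearly separable) m-out-of-n concepts and report where it fails to reproduce them. (Ties in the argmax are resolved towards class 0 here.)

from itertools import product

def nb_matches_m_of_n(m, n):
    """Exact naive Bayes for C(x) = [sum(x) >= m] under the uniform
    distribution on {0,1}^n; True iff it reproduces the concept."""
    points = list(product([0, 1], repeat=n))
    pos = [x for x in points if sum(x) >= m]
    neg = [x for x in points if sum(x) < m]
    p_c1, p_c0 = len(pos) / len(points), len(neg) / len(points)
    p1 = [sum(x[i] for x in pos) / len(pos) for i in range(n)]  # P(xi=1|1)
    p0 = [sum(x[i] for x in neg) / len(neg) for i in range(n)]  # P(xi=1|0)
    for x in points:
        s1, s0 = p_c1, p_c0
        for i in range(n):
            s1 *= p1[i] if x[i] else 1 - p1[i]
            s0 *= p0[i] if x[i] else 1 - p0[i]
        if (s1 > s0) != (sum(x) >= m):
            return False
    return True

for n in range(2, 9):
    for m in range(1, n + 1):
        if not nb_matches_m_of_n(m, n):
            print(f"naive Bayes fails on {m}-out-of-{n}")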
Naive Bayes - optimality
• For which problems is naive Bayes optimal?
• Guess: for problems for which the independence assumption holds.
• Empirical answer: not really…
• Theoretical answer no. 1: for all problems for which Theorem 2 holds.
• Theoretical answer no. 2: for a (large) subset of the set of linearly separable problems.
Naive Bayes - optimality
[Figure: the class of concepts for which naive Bayes is optimal, inside the class of concepts for which the perceptron is optimal]
Overview
• (Recap of) Definitions
• Naive Bayes
  – Performance/optimality?
  – How important is independence?
  – Linearity?
• Bayesian networks
Bayesian networks
• The class of problems for which naive Bayes is optimal is quite small…
• Idea: relax the independence assumption to obtain a more general classifier, i.e. model conditional dependencies between the variables.
• There are different techniques for this (e.g. hidden variables, …).
• The most established one: Bayesian networks.
Bayesian networks
• Bayesian network:
  – a tool for representing statistical dependencies between a set of random variables
  – an acyclic directed graph
  – one vertex for each variable
  – for each pair of statistically dependent variables there is an edge in the graph between the corresponding vertices
  – variables (vertices) that are not connected are independent!
  – each vertex has a table of local probability distributions
Bayesian networks
• Each variable depends only on its parents in the network!

[Figure: example network with class y and variables x1,…,x5; y points to every xi, and additionally x2 → x1, x3 → x4 and x4 → x5; the "parents" of x4 are Pa4 = {y, x3}]

  P(xi | xl, Pai) = P(xi | Pai)   for all xl ∈ {x1,…,xn} \ ({xi} ∪ Pai)
Bayesian networks
Bayesian network based classifier:

  h(x1,…,xn) = argmax_{c∈C} ∏_{i=1}^{n} P(xi | c, Pai) · P(c)

For the example network above:

  h(x1,…,xn) = argmax_y [ P(x1|x2,y) · P(x2|y) · P(x3|y) · P(x4|x3,y) · P(x5|x4,y) · P(y) ]
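A minimal sketch of this scoring rule in Python (my own illustration; the CPT layout is an assumption, not from the slides):

def bn_classify(x, classes, prior, parents, cpt):
    """x: dict var -> value; parents: dict var -> tuple of parent vars
    (class excluded); cpt[var][(c, parent_values, own_value)] gives
    P(own_value | c, parent_values)."""
    best, best_score = None, -1.0
    for c in classes:
        score = prior[c]
        for var, pa in parents.items():
            pa_vals = tuple(x[p] for p in pa)
            score *= cpt[var][(c, pa_vals, x[var])]
        if score > best_score:
            best, best_score = c, score
    return best

# Structure of the example network above:
parents = {"x1": ("x2",), "x2": (), "x3": (), "x4": ("x3",), "x5": ("x4",)}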
Bayesian networks
• In the case of boolean attributes this is again linear, but not in the input variables:

  h(x) = argmax_{c∈Y} ∏_{i=1}^{n} P(xi | c, Pai) · P(c)

• It is linear in product features:

  h(x) = sgn( ∑_{i=1}^{n} wi · [ xi · Pai¹ · … · Pai^mi ] + b ),

  where Pai¹,…,Pai^mi denote the (boolean) parents of xi.
Bayesian networks
• The difficulty here is to estimate the correct network structure (and the probability parameters) from the training data!
• For general Bayesian networks this problem is NP-hard!
• There exist numerous heuristics for learning Bayesian networks from data!
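One classic such heuristic, as an illustration (my own sketch; the slides do not name a specific method): the Chow-Liu algorithm restricts the structure to a tree and connects the variable pairs with the highest mutual information by a maximum-weight spanning tree.

import numpy as np
from itertools import combinations

def mutual_info(a, b):
    """Empirical mutual information (nats) between two binary columns."""
    mi = 0.0
    for va in (0, 1):
        for vb in (0, 1):
            pab = np.mean((a == va) & (b == vb))
            if pab > 0:
                mi += pab * np.log(pab / (np.mean(a == va) * np.mean(b == vb)))
    return mi

def chow_liu_edges(X):
    """Maximum-weight spanning tree over pairwise MI (Kruskal + union-find)."""
    n = X.shape[1]
    weights = sorted(((mutual_info(X[:, i], X[:, j]), i, j)
                      for i, j in combinations(range(n), 2)), reverse=True)
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    edges = []
    for _, i, j in weights:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.append((i, j))
    return edges  # undirected tree edges; orient away from a chosen root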
References

[1] Domingos, Pazzani: Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier. Proc. 13th International Conference on Machine Learning (ICML), 1996.
[2] Duda, Hart: Pattern Classification and Scene Analysis. Wiley, 1973.