
COSC343: Artificial Intelligence

Lecture 5: Bayesian Reasoning

Lech Szymanski

Dept. of Computer Science, University of Otago

In today’s lecture

• Conditional probability and independence
• Curse of dimensionality
• Bayes’ rule
• Naive Bayes Classifier

Recap: a simple medical example

Consider a medical scenario, with 3 Boolean variables

• cavity (does the patient have a cavity or not?)
• toothache (does the patient have a toothache or not?)
• catch (does the dentist’s probe catch on the patient’s tooth?)

Here’s an example probability model: the joint probability distribution

            toothache          ¬toothache
            catch    ¬catch    catch    ¬catch
cavity      .108     .012      .072     .008
¬cavity     .016     .064      .144     .576

Conditional probabilities

Probability of an event given information about another event:

$P(a \mid b) = \frac{P(a \wedge b)}{P(b)}$

Here $P(a \mid b)$ is the probability of a given b: the bar "$\mid$" reads "given" and "$\wedge$" reads "and".

Computing conditional probabilities

Assume we’re given the following joint probability distribution

            toothache          ¬toothache
            catch    ¬catch    catch    ¬catch
cavity      .108     .012      .072     .008
¬cavity     .016     .064      .144     .576

and learn that a patient has a toothache.

E.g. how do we calculate a conditional probability such as $P(\text{cavity} \mid \text{toothache})$?
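One way to answer this kind of question is to sum the matching entries of the joint table and divide. The sketch below assumes the query is P(cavity | toothache) and that the unlabelled table columns are toothache / ¬toothache and catch / ¬catch, as in the standard dentist example. The dictionary layout and the helper name `prob` are illustrative choices, not part of the lecture.

```python
# Joint distribution keyed by (cavity, toothache, catch) truth values.
# Which column is negated is an assumption about the table layout.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}


def prob(event):
    """Sum the joint entries for which event(cavity, toothache, catch) holds."""
    return sum(p for world, p in joint.items() if event(*world))


p_toothache = prob(lambda cavity, toothache, catch: toothache)
p_cavity_and_toothache = prob(lambda cavity, toothache, catch: cavity and toothache)

# Definition of conditional probability: P(a | b) = P(a and b) / P(b)
p_cavity_given_toothache = p_cavity_and_toothache / p_toothache
print(round(p_cavity_given_toothache, 3))  # 0.6
```

Under these assumptions, learning about the toothache raises the probability of a cavity from the prior 0.2 to 0.6.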

Conditional probability of whole distributions

Conditional probabilities can also be computed for whole distributions.

For instance, the conditional probability distribution of Cavity and Catch given toothache shows the probability of every combination of values of Cavity and Catch given toothache.

            toothache          ¬toothache
            catch    ¬catch    catch    ¬catch
cavity      .108     .012      .072     .008
¬cavity     .016     .064      .144     .576

$P(\text{Cavity}, \text{Catch} \mid \text{toothache})$, where $P(\text{toothache}) = .108 + .012 + .016 + .064 = .2$:

            catch      ¬catch
cavity      .108/.2    .012/.2
¬cavity     .016/.2    .064/.2
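A short sketch of the same normalisation in Python: take the toothache half of the joint table and divide every entry by P(toothache). The string keys are illustrative, and the four values are taken from the toothache columns of the joint table, on the assumption that those are the first two columns.

```python
# Entries of the joint table where toothache is true,
# keyed by (Cavity value, Catch value).
joint_toothache = {
    ("cavity", "catch"): 0.108,
    ("cavity", "no catch"): 0.012,
    ("no cavity", "catch"): 0.016,
    ("no cavity", "no catch"): 0.064,
}

# P(toothache) is the sum over the hidden variables Cavity and Catch.
p_toothache = sum(joint_toothache.values())

# P(Cavity, Catch | toothache): each entry divided by P(toothache).
conditional = {k: p / p_toothache for k, p in joint_toothache.items()}

for key, p in conditional.items():
    print(key, round(p, 2))
```

The four conditional values come out as .54, .06, .08 and .32, which sum to 1 as a distribution must.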

Complexity of reasoning from a full joint distribution

Inference from full joint distributions is all about summing over hidden variables.

• If there are $n$ Boolean variables in the table, it takes $O(2^n)$ time to compute a probability
• Moreover, the space complexity of the table itself is $O(2^n)$
• This means as $n$ increases, it becomes infeasible even to store the full joint distribution
• And remember: someone has to build the full joint distribution from empirical data!

In practice, probabilistic reasoning tasks often involve thousands of random variables. Obviously, a more efficient reasoning technique is necessary.
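To make the exponential cost concrete, here is a minimal sketch of inference by enumeration over a full joint table of n Boolean variables. The uniform table, the function name `enumerate_query` and the evidence format are hypothetical, chosen only so that the example is self-contained.

```python
from itertools import product


def enumerate_query(joint, n, evidence):
    """Sum the joint entries consistent with evidence {variable index: value}.

    Returns the probability and the number of table entries visited.
    """
    total, visited = 0.0, 0
    for world in product([True, False], repeat=n):
        visited += 1
        if all(world[i] == value for i, value in evidence.items()):
            total += joint[world]
    return total, visited


n = 10
# Hypothetical uniform joint over n Boolean variables (2**n entries).
joint = {world: 1 / 2**n for world in product([True, False], repeat=n)}

p, visited = enumerate_query(joint, n, {0: True})
print(len(joint), visited, p)  # 1024 1024 0.5
```

Each extra Boolean variable doubles both the size of the table and the number of entries visited by a query.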

Curse of dimensionality

When the dimensionality of a problem increases, the volume of the space becomes so large that you need exponentially more points to sample the space at the same density. This phenomenon is sometimes referred to as the curse of dimensionality.

https://haifengl.wordpress.com/2016/02/29/there-is-no-big-data-in-machine-learning/

Independence

• If $P(A \wedge B) = P(A)\,P(B)$ then it is said that A and B are independent

If A and B are independent, then it follows that:

$P(A \mid B) = P(A)$

and

$P(B \mid A) = P(B)$

Huge reductions in complexity can be made if we know that some variables in our domain are independent of others.

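An example of independence

Say we add a variable Weather to our dentist model, with four values: sunny, cloudy, rainy, snowy.

• Weather and teeth don’t affect each other
• To compute the probability of a proposition involving weather and teeth, we can just multiply the results found in the two distributions

Given:

$P(\text{Weather} \mid \text{Toothache}, \text{Catch}, \text{Cavity}) = P(\text{Weather})$

it follows that:

$P(\text{Toothache}, \text{Catch}, \text{Cavity}, \text{Weather}) = P(\text{Toothache}, \text{Catch}, \text{Cavity})\,P(\text{Weather})$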

Independence

If we wanted to create a full joint distribution, we could do so by multiplying the appropriate probabilities from the two distributions

• Instead of one 4-dimensional table with 2 × 2 × 2 × 4 = 32 entries, we have two tables of 3 and 1 dimensions with a total of 2 × 2 × 2 + 4 = 12 entries

            toothache          ¬toothache
            catch    ¬catch    catch    ¬catch
cavity      .108     .012      .072     .008
¬cavity     .016     .064      .144     .576

sunny     .6
cloudy    .29
rainy     .1
snowy     .01
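The multiplication can be sketched in a few lines of Python. The weather values used here (.6, .29, .1, .01) are an assumption, chosen so that the distribution sums to 1. The key layout (cavity, toothache, catch) is also an illustrative choice.

```python
# P(Cavity, Toothache, Catch), keyed by (cavity, toothache, catch).
teeth = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}
# P(Weather); values assumed so that they sum to 1.
weather = {"sunny": 0.6, "cloudy": 0.29, "rainy": 0.1, "snowy": 0.01}

# Independence lets us build every entry of the full joint as a product.
full_joint = {
    (*t, w): p_t * p_w
    for t, p_t in teeth.items()
    for w, p_w in weather.items()
}

print(len(full_joint), len(teeth) + len(weather))  # 32 12
print(round(full_joint[(True, True, True, "cloudy")], 5))  # 0.03132
```

Only the 12 stored numbers are needed. Any of the 32 joint entries can be recomputed on demand.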

Conditional independence

Absolute independence is powerful, but rare.

• Can we factor out further?
• For instance, in dentistry, most variables are related, but not directly.

What to do?

A very useful concept is conditional independence

• For instance, having a cavity
  • changes the probability of a toothache
  • and changes the probability of a catch
  • but these two effects are separate!

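Conditional independence

Knowing the value of Toothache certainly gives us information about the value of Catch.

• But only indirectly, as they are both linked to the variable Cavity.

If I already knew the value of Cavity, then learning about Catch won’t tell me anything else about the value of Toothache.

Therefore:

$P(\text{Toothache} \mid \text{Catch}, \text{Cavity}) = P(\text{Toothache} \mid \text{Cavity})$ (conditional independence)

Combined with the definition of conditional probability, this gives

$P(\text{Toothache}, \text{Catch} \mid \text{Cavity}) = P(\text{Toothache} \mid \text{Cavity})\,P(\text{Catch} \mid \text{Cavity})$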
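The dentist table can be used to check this numerically. The sketch below assumes the usual column layout (toothache / ¬toothache, catch / ¬catch). The helpers `prob` and `cond` are illustrative names. It compares P(toothache, catch | Cavity) with P(toothache | Cavity) P(catch | Cavity) for both values of Cavity, and also shows that Toothache and Catch are not independent without the conditioning.

```python
# Joint distribution keyed by (cavity, toothache, catch).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}


def prob(event):
    """P(event): sum of the joint entries where event holds."""
    return sum(p for world, p in joint.items() if event(*world))


def cond(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda *w: event(*w) and given(*w)) / prob(given)


results = []
for cav in (True, False):
    given = lambda c, t, k, cav=cav: c == cav
    lhs = cond(lambda c, t, k: t and k, given)
    rhs = cond(lambda c, t, k: t, given) * cond(lambda c, t, k: k, given)
    results.append((cav, lhs, rhs))
    print(cav, round(lhs, 4), round(rhs, 4))

# Without conditioning on Cavity, the product rule does not hold.
p_both = prob(lambda c, t, k: t and k)
p_product = prob(lambda c, t, k: t) * prob(lambda c, t, k: k)
print(round(p_both, 3), round(p_product, 3))
```

Given Cavity the two sides agree (.54 when there is a cavity, .02 when there is not), whereas the unconditional values .124 and .068 differ.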

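Bayes’ rule

From the definition of conditional probability

$P(a \mid b) = \frac{P(a \wedge b)}{P(b)}$ and $P(b \mid a) = \frac{P(a \wedge b)}{P(a)}$

it follows that

$P(a \wedge b) = P(a \mid b)\,P(b) = P(b \mid a)\,P(a)$

and so

$P(b \mid a) = \frac{P(a \mid b)\,P(b)}{P(a)}$ (Bayes’ Rule)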
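As a quick numerical check, Bayes’ rule in the form P(b | a) = P(a | b) P(b) / P(a) can be applied to the dentist example with a = toothache and b = cavity. The numbers are read off the joint table, assuming its first two columns are the toothache columns. The function name `bayes` is illustrative.

```python
def bayes(p_a_given_b, p_b, p_a):
    """Bayes' rule: P(b | a) = P(a | b) * P(b) / P(a)."""
    return p_a_given_b * p_b / p_a


p_cavity = 0.108 + 0.012 + 0.072 + 0.008       # P(cavity) = 0.2
p_toothache = 0.108 + 0.012 + 0.016 + 0.064    # P(toothache) = 0.2
p_toothache_given_cavity = (0.108 + 0.012) / p_cavity  # 0.6

p_cavity_given_toothache = bayes(p_toothache_given_cavity, p_cavity, p_toothache)
print(round(p_cavity_given_toothache, 3))  # 0.6
```

The result matches the value obtained directly from the definition of conditional probability, (.108 + .012) / .2 = .6.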

A problem

Information from previous patients:

Chills (C)   Running nose (R)   Headache (H)   Fever (Fr)   Flu
y            n                  mild           y            n
y            y                  no             n            y
y            n                  strong         y            y
n            y                  mild           y            y
n            n                  no             n            n
n            y                  strong         y            y
n            y                  strong         n            n
y            y                  mild           n            y

How to diagnose a person with the following symptoms?

Chills (C)   Running nose (R)   Headache (H)   Fever (Fr)   Flu
y            n                  mild           n            ?

Compute

$P(\text{Flu}=y \mid C=y, R=n, H=\text{mild}, \text{Fr}=n)$

and

$P(\text{Flu}=n \mid C=y, R=n, H=\text{mild}, \text{Fr}=n)$

and see which one is more likely. In both expressions Flu is the cause and the symptoms are its effects.

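What’s more likely?

Is $P(\text{Flu}=y \mid C=y, R=n, H=\text{mild}, \text{Fr}=n)$ larger or smaller than $P(\text{Flu}=n \mid C=y, R=n, H=\text{mild}, \text{Fr}=n)$? Both have the form $P(\text{cause} \mid \text{effects})$.

It’s hard to estimate $P(\text{cause} \mid \text{effects})$:

• You have to have information on many groups of people, one for each possible combination of symptoms, and test each person in each group for flu

It’s far easier to estimate $P(\text{effects} \mid \text{cause})$:

• You take a sample of only two groups of people, those with flu and those without, and test them for symptoms

From Bayes’ rule

$P(\text{cause} \mid \text{effects}) = \frac{P(\text{effects} \mid \text{cause})\,P(\text{cause})}{P(\text{effects})}$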

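Invert the problem

Let $s$ stand for the observed symptoms $C=y, R=n, H=\text{mild}, \text{Fr}=n$. By Bayes’ rule, comparing

$P(\text{Flu}=y \mid s)$ with $P(\text{Flu}=n \mid s)$

is the same as comparing

$\frac{P(s \mid \text{Flu}=y)\,P(\text{Flu}=y)}{P(s)}$ with $\frac{P(s \mid \text{Flu}=n)\,P(\text{Flu}=n)}{P(s)}$

Need probability distributions $P(C, R, H, \text{Fr} \mid \text{Flu})$ (effects given cause) and $P(\text{Flu})$ (cause).

Normalising denominator $P(s)$ is the same on both sides, and so plays no part in the comparison.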

Gathering prior probabilities from training set

Of the 8 previous patients in the training set, 5 had flu and 3 did not, so

$P(\text{Flu})$:   y 5/8   n 3/8

Gathering conditional probabilities from training set

• $P(C, R, H, \text{Fr} \mid \text{Flu})$: joint probability distribution has 4 dimensions and 2 × 2 × 3 × 2 = 24 combinations of symptoms
• $P(C \mid \text{Flu})$ has 1 dimension and 2 combinations of symptoms
• $P(R \mid \text{Flu})$ has 1 dimension and 2 combinations of symptoms
• $P(\text{Fr} \mid \text{Flu})$ has 1 dimension and 2 combinations of symptoms
• $P(H \mid \text{Flu})$ has 1 dimension and 3 combinations of symptoms
• If the symptoms were independent events, then from independence we would have

$P(C, R, H, \text{Fr} \mid \text{Flu}) = P(C \mid \text{Flu})\,P(R \mid \text{Flu})\,P(H \mid \text{Flu})\,P(\text{Fr} \mid \text{Flu})$

Let’s just assume the symptoms are independent! (Naive Bayes)

Gathering conditional probabilities from training set

Counting over the 8 previous patients in the training set gives:

P(Flu)            y: 5/8      n: 3/8
P(C | Flu=y)      y: 3/5      n: 2/5
P(C | Flu=n)      y: 1/3      n: 2/3
P(R | Flu=y)      y: 4/5      n: 1/5
P(R | Flu=n)      y: 1/3      n: 2/3
P(Fr | Flu=y)     y: 3/5      n: 2/5
P(Fr | Flu=n)     y: 1/3      n: 2/3
P(H | Flu=y)      mild: 2/5   no: 1/5   strong: 2/5
P(H | Flu=n)      mild: 1/3   no: 1/3   strong: 1/3
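These tables can be reproduced by counting over the training set. The following is a minimal sketch. The tuple layout, the feature labels and the dictionary names `prior` and `likelihood` are illustrative choices rather than anything prescribed by the lecture.

```python
from collections import Counter
from fractions import Fraction

# (chills, running nose, headache, fever, flu) for the eight previous patients
patients = [
    ("y", "n", "mild",   "y", "n"),
    ("y", "y", "no",     "n", "y"),
    ("y", "n", "strong", "y", "y"),
    ("n", "y", "mild",   "y", "y"),
    ("n", "n", "no",     "n", "n"),
    ("n", "y", "strong", "y", "y"),
    ("n", "y", "strong", "n", "n"),
    ("y", "y", "mild",   "n", "y"),
]
features = ["C", "R", "H", "Fr"]

# Prior P(Flu): fraction of patients in each class.
flu_counts = Counter(row[-1] for row in patients)
prior = {flu: Fraction(n, len(patients)) for flu, n in flu_counts.items()}

# likelihood[(feature, value, flu)] = P(feature = value | Flu = flu).
# Combinations never seen in the data are simply absent from the dictionary.
pair_counts = Counter(
    (name, row[i], row[-1]) for row in patients for i, name in enumerate(features)
)
likelihood = {
    key: Fraction(n, flu_counts[key[2]]) for key, n in pair_counts.items()
}

print(prior["y"], prior["n"])  # 5/8 3/8
print(likelihood[("H", "mild", "y")], likelihood[("R", "n", "y")])  # 2/5 1/5
```

Using `Fraction` keeps the estimates exact, so they can be compared directly with the fractions on the slide.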

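Naive Bayes Classifier

And so, for the person with the symptoms:

C   R   H      Fr   Flu
y   n   mild   n    ?

Probability of flu given symptoms:

- From Bayes’ rule

$P(\text{Flu}=y \mid C=y, R=n, H=\text{mild}, \text{Fr}=n) = \frac{P(C=y, R=n, H=\text{mild}, \text{Fr}=n \mid \text{Flu}=y)\,P(\text{Flu}=y)}{P(C=y, R=n, H=\text{mild}, \text{Fr}=n)}$

- From the assumption of independence

$= \frac{P(C=y \mid \text{Flu}=y)\,P(R=n \mid \text{Flu}=y)\,P(H=\text{mild} \mid \text{Flu}=y)\,P(\text{Fr}=n \mid \text{Flu}=y)\,P(\text{Flu}=y)}{P(C=y, R=n, H=\text{mild}, \text{Fr}=n)}$

- From the probability distributions derived from the training data

$= \frac{\frac{3}{5} \cdot \frac{1}{5} \cdot \frac{2}{5} \cdot \frac{2}{5} \cdot \frac{5}{8}}{P(C=y, R=n, H=\text{mild}, \text{Fr}=n)} = \frac{0.012}{P(C=y, R=n, H=\text{mild}, \text{Fr}=n)}$

The same steps for no flu give

$P(\text{Flu}=n \mid C=y, R=n, H=\text{mild}, \text{Fr}=n) = \frac{\frac{1}{3} \cdot \frac{2}{3} \cdot \frac{1}{3} \cdot \frac{2}{3} \cdot \frac{3}{8}}{P(C=y, R=n, H=\text{mild}, \text{Fr}=n)} \approx \frac{0.0185}{P(C=y, R=n, H=\text{mild}, \text{Fr}=n)}$

Since 0.0185 > 0.012, the Naive Bayes Classifier deems this person more likely not to have the flu!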
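The comparison can be reproduced with a few lines of Python. This is a sketch that hard-codes the probabilities for the observed values C=y, R=n, H=mild, Fr=n as counted from the training table. The variable names are illustrative.

```python
from fractions import Fraction as F
from math import prod

# Prior P(Flu) from the training set.
prior = {"y": F(5, 8), "n": F(3, 8)}

# P(symptom value | Flu) for the observed values, in the order C=y, R=n, H=mild, Fr=n.
likelihood = {
    "y": [F(3, 5), F(1, 5), F(2, 5), F(2, 5)],
    "n": [F(1, 3), F(2, 3), F(1, 3), F(2, 3)],
}

# Unnormalised scores: P(Flu) times the product of the per-symptom likelihoods.
score = {flu: prior[flu] * prod(likelihood[flu]) for flu in prior}

# Dividing by the sum restores the normalising denominator P(symptoms).
posterior = {flu: s / sum(score.values()) for flu, s in score.items()}

prediction = max(score, key=score.get)
print(score["y"], score["n"])  # 3/250 1/54
print(prediction)  # n
```

The scores are 3/250 = 0.012 for flu and 1/54 ≈ 0.0185 for no flu, so the prediction is no flu. After normalisation the posteriors are 81/206 ≈ 0.39 and 125/206 ≈ 0.61.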

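Naive Bayes Classifier

In general, decide flu if

$P(\text{Flu}=y)\,P(C \mid \text{Flu}=y)\,P(R \mid \text{Flu}=y)\,P(H \mid \text{Flu}=y)\,P(\text{Fr} \mid \text{Flu}=y) > P(\text{Flu}=n)\,P(C \mid \text{Flu}=n)\,P(R \mid \text{Flu}=n)\,P(H \mid \text{Flu}=n)\,P(\text{Fr} \mid \text{Flu}=n)$

and no flu otherwise, where $C, R, \text{Fr} \in \{y, n\}$ and $H \in \{\text{no}, \text{mild}, \text{strong}\}$.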

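Summary and reading

• Conditional probabilities model changes in uncertainty given some information
• Independence simplifies computation
• Bayes’ rule: possible to compute $P(\text{cause} \mid \text{effects})$ from $P(\text{effects} \mid \text{cause})$
• Naive Bayes: assume the input variables are independent, so that $P(\text{effects} \mid \text{cause}) = \prod_i P(\text{effect}_i \mid \text{cause})$

Reading for the lecture: AIMA Chapter 13 Sections 3-5

Reading for next lecture: AIMA Chapter 18 Section 3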