Exploiting Common SubRelations: Learning One Belief Net for Many Classification Tasks

R Greiner, Wei Zhou, University of Alberta

Page 1: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Exploiting Common SubRelations: Learning One Belief Net for Many Classification Tasks

R Greiner, Wei Zhou University of Alberta

Page 2: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Situation

CHALLENGE: Need to learn k classifiers:
Cancer, from medical symptoms
Meningitis, from medical symptoms
Hepatitis, from medical symptoms
…

Option 1: Learn k different classifier systems {S_Cancer, S_Menin, …, S_k},
then use S_i to deal with the i-th "query class"… but…
this must re-learn the inter-relations among factors and symptoms
that are common to all k classifiers.

Page 3: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Common Interrelationships

[Figure: two belief-net fragments, one per class variable (Cancer, Menin), sharing the same underlying inter-relationships]

Page 4: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Use Common Structure!

CHALLENGE: Need to learn k classifiers:
Cancer, from medical symptoms
Meningitis, from symptoms
Hepatitis, from symptoms
…

Option 2: Learn 1 "structure" S of relationships,
then use S to address all k classification tasks.

Actual Approach: Learn 1 Bayesian Belief Net,
inter-relating the info for all k types of queries.

Page 5: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Outline

Motivation: handle multiple class variables

Framework: formal model; belief nets as (multi)classifiers

Results: theoretical analysis; algorithms (Likelihood vs Conditional Likelihood);
empirical comparison (1 structure vs k structures; LL vs LCL)

Contributions

Page 6: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Training Data (labeled, partially-specified tuples):

Cancer  Menin  Gender  Age  Smoke  Height  Btest
T              F       35   T
F              M       25                6'
       T       F                                t
       F                    T            5'3"   t

Training Data -> MC-Learner -> MC

MC answers new queries ("?" marks the class variable):

Cancer  Menin  Gender  Age  Smoke  Height  Btest
?              M       18   T                   f
       ?       M                         5'0"   f

->

Cancer  Menin  Gender  Age  Smoke  Height  Btest
T              M       18   T                   f
       F       M                         5'0"   f

Page 7: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Multi-Classifier I/O

Given "query": "class variable" Q and "evidence" E=e
Return value Q = q

Cancer=?, given Gender=F, Age=35, Smoke=t?
-> Cancer = Yes

Cancer  Menin  Gender  Age  Smoke  Height  Btest
?              F       35   T

->

Cancer  Menin  Gender  Age  Smoke  Height  Btest
Yes            F       35   T

Page 8: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

MultiClassifier

Like standard classifiers, it can deal with different evidence variables E and different evidence values e.

Unlike standard classifiers, it can also deal with different class variables Q:

MC(Cancer; Gender=M, Age=25, Height=6') = No
MC(Meningitis; Gender=F, BloodTest=t) = Severe

Able to "answer queries" = classify new unlabeled tuples:
given "Q=?, given E=e", return "q".

Page 9: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

MC-Learner's I/O

Input: a set of "queries" (labeled, partially-specified tuples)
≈ input to standard (partial-data) learners

Output: a MultiClassifier

Query var Q    Evidence vars E
Cancer = t     Gender=F, Age=35, Smoke=t
Cancer = f     Gender=M, Age=25, Height=6'
Menin  = t     Gender=F, Btest=t

Page 10: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Error Measure

"Labeled query": <[Q, E=e], q>

Query distribution: Prob([Q, E=e] asked)
…can be uncorrelated with the "tuple distribution"

MultiClassifier MC returns MC(Q, E=e) = q'

Classification Error of MC:

  CE(MC) = Σ_<[Q,E=e],q> Prob([Q, E=e] asked) · [| MC(Q, E=e) ≠ q |]

where [| a ≠ b |] = 1 if a ≠ b, and 0 otherwise  ("0/1" error)
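As a concrete rendering of this measure, here is a minimal Python sketch (ours, not the authors' code) that estimates CE from a sample of labeled queries; `mc` and the query-record layout are illustrative assumptions:

```python
# Estimate the 0/1 classification error CE(MC) from labeled queries.
# Each labeled query is (Q, evidence, q_true); if the sample is drawn from
# the query distribution, this average estimates the CE sum above.

def classification_error(mc, labeled_queries):
    """Fraction of labeled queries [Q, E=e] where MC(Q, E=e) != q."""
    errors = sum(1 for (Q, evidence, q_true) in labeled_queries
                 if mc(Q, evidence) != q_true)
    return errors / len(labeled_queries)
```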

Page 11: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Learner's Task

Given:
a space of "MultiClassifiers" { MC_i }
a sample of labeled queries, drawn from the "query distribution"

Cancer  Menin  Gender  Age  Smoke  Height  Btest
T              F       35   T
F              M       25                6'
       T       F                                t

Find MC* = argmin_{MC_i} { CE(MC_i) }, the multiclassifier with minimal error over the query distribution.

Page 12: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Outline

Motivation: handle multiple class variables

Framework: formal model; belief nets as (multi)classifiers

Results: theoretical analysis; algorithms (Likelihood vs Conditional Likelihood);
empirical comparison (1 structure vs k structures; LL vs LCL)

Contributions

Page 13: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Simple Belief Net

H -> B,  H -> J

P(J | H, B=0) = P(J | H, B=1)  ∀ J, H  ==>  P(J | H, B) = P(J | H)
J is INDEPENDENT of B, once we know H.  Don't need a B -> J arc!

P(H=1) = 0.05

h   P(B=1 | H=h)        h   P(J=1 | H=h)
1   0.95                1   0.8
0   0.03                0   0.3

Page 14: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Example of a Belief Net

Simple belief net:  H -> B,  H -> J

P(H=0) = 0.95,  P(H=1) = 0.05

h   P(B=0 | H=h)   P(B=1 | H=h)
0   0.97           0.03
1   0.05           0.95

h   b   P(J=0 | h,b)   P(J=1 | h,b)
1   1   0.2            0.8
1   0   0.2            0.8
0   1   0.7            0.3
0   0   0.7            0.3

Node ~ variable
Link ~ "causal dependency"
"CPTable" ~ P(child | parents)

Page 15: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Encoding Causal Links (cont'd)

H -> B,  H -> J

P(J | H, B=0) = P(J | H, B=1)  ∀ J, H  ==>  P(J | H, B) = P(J | H)
J is INDEPENDENT of B, once we know H.  Don't need a B -> J arc!

P(H=1) = 0.05

h   P(B=1 | H=h)
1   0.95
0   0.03

h   b   P(J=1 | h, b)
1   1   0.8
1   0   0.8
0   1   0.3
0   0   0.3

Page 16: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Encoding Causal Links (cont'd)

Since P(J=1 | h, b) does not depend on b, the four-row CPtable collapses:

h   P(J=1 | H=h)
1   0.8
0   0.3

(together with P(H=1) = 0.05 and the P(B=1 | H=h) table, as before)

Page 17: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Encoding Causal Links (cont'd)

Final reduced net:  H -> B,  H -> J

P(H=1) = 0.05

h   P(B=1 | H=h)        h   P(J=1 | H=h)
1   0.95                1   0.8
0   0.03                0   0.3

Page 18: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Include Only Causal Links

Sufficient belief net:  H -> B,  H -> J

Requires only:
P(H=1) known
P(J=1 | H=h) known
P(B=1 | H=h) known
(Only 5 parameters, not 7)

P(H=1) = 0.05

h   P(B=1 | H=h)        h   P(J=1 | H=h)
1   0.95                1   0.8
0   0.03                0   0.3

Hence, e.g.:
P(H=1 | J=0, B=1) ∝ P(H=1) · P(J=0 | H=1) · P(B=1 | H=1)
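To make the 5-parameter claim tangible, here is a minimal Python sketch (ours, not the authors') that computes P(H=1 | J=0, B=1) by brute-force enumeration from exactly those 5 numbers:

```python
# Posterior by enumeration over the 5-parameter net H -> B, H -> J
# (CPtable values taken from the slide above).

p_h1 = 0.05                        # P(H=1)
p_b1 = {1: 0.95, 0: 0.03}          # P(B=1 | H=h)
p_j1 = {1: 0.8,  0: 0.3}           # P(J=1 | H=h)

def joint(h, j, b):
    """P(H=h, J=j, B=b), factored along the two causal links."""
    ph = p_h1 if h == 1 else 1 - p_h1
    pj = p_j1[h] if j == 1 else 1 - p_j1[h]
    pb = p_b1[h] if b == 1 else 1 - p_b1[h]
    return ph * pj * pb

# P(H=1 | J=0, B=1) = P(H=1, J=0, B=1) / P(J=0, B=1)
num = joint(1, 0, 1)
den = joint(1, 0, 1) + joint(0, 0, 1)
print(num / den)    # 0.0095 / (0.0095 + 0.01995) ~= 0.32
```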

Page 19: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

BeliefNet as (Multi)Classifier

For query [Q, E=e], the BN returns a distribution over the values of Q:

P_BN(Q=q1 | E=e), P_BN(Q=q2 | E=e), …, P_BN(Q=qm | E=e)

(Multi)Classifier:  MC_BN(Q, E=e) = argmax_{qi} { P_BN(Q=qi | E=e) }
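In code, the classifier is just an argmax over that returned distribution; a minimal sketch, assuming a hypothetical `posterior(Q, q, evidence)` inference routine (e.g., enumeration as above):

```python
# MC_BN(Q, E=e) = argmax_q P_BN(Q=q | E=e): the BN used as a (multi)classifier.
# Note the same net answers queries for ANY class variable Q, given any evidence.

def mc_bn(posterior, Q, evidence, values):
    return max(values, key=lambda q: posterior(Q, q, evidence))

# e.g. mc_bn(posterior, "Cancer",
#            {"Gender": "F", "Age": 35, "Smoke": True}, values=["Yes", "No"])
```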

Page 20: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Learning Belief Nets

Belief Net = <G, Θ>
G = directed acyclic graph ("structure": what's related to what)
Θ = "parameters": strength of the connections (the CPtables)

Learning a belief net <G, Θ> from "data":
1. Learn the structure G
2. Find the parameters Θ that are best for G

Our focus: #2 (parameters); "best" = minimal CE error.

Page 21: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Learning BN Multi-Classifier

Given: structure G + labeled queries

Cancer  Menin  Gender  Age  Smoke  Height  Btest
T              F       35   T
F              M       25                6'
       T       F                                t
       F                    T            5'3"   t

Goal: find the CPtables that minimize the CE error:

  Θ* = argmin_Θ { Σ_<[Q,E=e],q> Prob([Q, E=e] asked) · [| MC_<G,Θ>(Q, E=e) ≠ q |] }

Page 22: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Issues

Q1: How many labeled queries are required?

Q2: How hard is learning, given distributional info?

Q3: What is the best algorithm for learning…
… a Belief Net?
… a Belief Net Classifier?
… a Belief Net MultiClassifier?

Page 23: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Q1, Q2: Theoretical Results

PAC(ε, δ)-learn CPtables:
Given the BN structure, find CPtables whose CE-error is, with probability 1-δ, within ε of optimal.

Sample Complexity: a BN structure with N variables and K CPtable entries, for ε, δ > 0, needs a sample of

  M(ε, δ) = O( (K²/ε²) · log(K·N/δ) )

labeled queries.

Computational Complexity: NP-hard to find CPtables with minimal CE error (or even within ε of minimal, for any ε = O(1/N)) from labeled queries… even from a known structure!

Page 24: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Use Conditional Likelihood

Goal: minimize "classification error",
based on a training sample of <[Qi, Ei=ei], qi*>.

The sample typically includes only high-probability queries [Q, E=e],
and only the most likely answers to those queries:
q* = argmax_q { P(Q=q | E=e) }

So: Maximize Conditional Likelihood

  LCL_D(Θ) = Σ_<q*,e> ∈ D  log P_Θ(Q=q* | E=e)

Alas: NP-hard… and not the standard model?
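For concreteness, a minimal sketch (illustrative, not the authors' code) of evaluating LCL_D(Θ), assuming a hypothetical `posterior` that computes P_Θ(Q=q | E=e) by BN inference:

```python
import math

# LCL_D(theta) = sum over labeled queries of log P_theta(Q=q* | E=e).
# `posterior(Q, q, evidence)` is an assumed BN-inference callable.

def lcl(posterior, labeled_queries):
    return sum(math.log(posterior(Q, q_star, evidence))
               for (Q, evidence, q_star) in labeled_queries)
```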

Page 25: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Gradient Descent Alg: ILQ

How to change the CPtable entry θ_{c|f} = B(C=c | F=f),
given a datum "[Q=q, E=e]"?

[Figure: net fragment with node C, parents F1, F2; CPtable P(C | f1, f2) containing θ_{c|f}; query node Q; evidence nodes E]

E.g., datum:  Cancer=Yes, given Gender=F, Age=35, Smoke=T

Descend along the derivative:

  ∂ LCL(q|e) / ∂ θ_{c|f} = (1/θ_{c|f}) · [ B(c, f | q, e) - B(c, f | e) ]

+ sum over the queries "[Q=q, E=e]", conjugate gradient, …
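A sketch of this derivative in code (hypothetical names: `B` stands for posterior computation in the current net; the assignment dicts are illustrative, not the authors' implementation):

```python
# d LCL(q|e) / d theta_{c|f} = (1/theta_{c|f}) * [ B(c,f | q,e) - B(c,f | e) ]
# `B(c, f, evidence=...)` is an assumed routine returning the net's posterior
# probability that C=c and F=f hold, given the evidence assignment.

def dlcl_dtheta(B, c, f, q_assignment, e_assignment, theta_cf):
    with_label = B(c, f, evidence={**e_assignment, **q_assignment})
    without    = B(c, f, evidence=e_assignment)
    return (with_label - without) / theta_cf
```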

Page 26: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Better Algorithm: ILQ

Constrained optimization: the θ's must satisfy
(θ_{c|f} ≥ 0,  θ_{c=0|f} + θ_{c=1|f} = 1)

New parameterization β_{c|f}:

  θ_{c|f} = e^{β_{c|f}} / Σ_{c'} e^{β_{c'|f}}

For each "row" r_j, set β_{c0|rj} = 0 for one value c0.

  ∂ LCL(q|e) / ∂ β_{c|f} = [ B(c,f | q,e) - θ_{c|f} · B(f | q,e) ] - [ B(c,f | e) - θ_{c|f} · B(f | e) ]

[Figure: the same net fragment as before: node C with parents F1, F2; CPtable entry θ_{c|f}; query Q; evidence E]
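A minimal sketch of this reparameterization (illustrative, not the authors' code): gradient steps on the unconstrained β's always yield a legal CPtable row:

```python
import math

# theta_{c|f} = exp(beta_{c|f}) / sum_{c'} exp(beta_{c'|f}):
# automatically >= 0 and sums to 1, so no constraints are needed on beta.

def row_thetas(beta_row):
    z = sum(math.exp(b) for b in beta_row.values())
    return {c: math.exp(b) / z for c, b in beta_row.items()}

beta = {"c0": 0.0, "c1": 1.2}    # gauge: one beta per row pinned at 0
print(row_thetas(beta))          # a valid CPtable row
```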

Page 27: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Q3: How to Learn a BN MultiClassifier?

Approach 1: Minimize error ≈ Maximize Conditional Likelihood
(In)complete data: ILQ

Approach 2: Fit to data ≈ Maximize Likelihood
Complete data: Observed Frequency Estimate (OFE)
Incomplete data: EM / APN

Page 28: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Empirical Studies

Two different objectives, so 2 learning algorithms:
Maximize Conditional Likelihood: ILQ
Maximize Likelihood: APN

Two different approaches to multiple classes:
1 copy of the structure vs k copies of the structure; k naive-Bayes

Several "datasets": Alarm, Insurance, …

Error: "0/1" CE;  MSE(Θ) = Σ_i [ P_true(qi|ei) - P_Θ(qi|ei) ]²
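The MSE measure above is a one-liner in code; a sketch (illustrative names) assuming paired lists of true and learned conditional probabilities for the sampled queries:

```python
# MSE(theta) = sum_i [ P_true(q_i|e_i) - P_theta(q_i|e_i) ]^2
def mse(p_true, p_theta):
    return sum((t - p) ** 2 for t, p in zip(p_true, p_theta))
```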

Page 29: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

1 vs k Structures

[Figure: the same labeled-query table, used two ways]

1 structure: a single net containing both Cancer and Menin; all labeled queries train it.

k structures: one net per class variable; each copy is trained only on the queries labeling its own class variable (the Cancer rows vs the Menin rows):

Cancer  Menin  Gender  Age  Smoke  Height  Btest
T              F       35   T
F              M       25                6'
       T       F                                t
       F                    T            5'3"   t

Page 30: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Empirical Study I: Alarm

Alarm belief net: 37 variables, 46 links, 505 parameters

Page 31: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Query Distribution

[HC'91] says, typically:
8 vars Q ⊆ N appear as query
16 vars E ⊆ N appear as evidence

Select Q ∈ Q uniformly
Use the same set of 7 evidence variables E ⊆ E
Assign the value e for E based on P_alarm(E=e)
Find the "value" v based on P_alarm(Q=v | E=e)

Each run uses m such queries, m = 5, 10, …, 100, …

Page 32: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Results (Alarm; ILQ; Small Sample)

[Plots: CE and MSE vs sample size]

Page 33: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Results (Alarm; ILQ; Large Sample)

[Plots: CE and MSE vs sample size]

Page 34: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Comments on Alarm Results

For small sample size:
"ILQ, 1 structure" better than "ILQ, k structures"

For large sample size:
"ILQ, 1 structure" ≈ "ILQ, k structures"
ILQ-k has more parameters to fit, but… lots of data

APN is ok, but much slower (did not converge within bounds)

Page 35: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Empirical Study II: Insurance

Insurance belief net (simplified version):
27 variables (3 query, 8 evidence), 560 parameters

Distribution: select 1 query variable randomly from the 3; use all 8 evidence variables; …

Page 36: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Results (Insurance; ILQ)

[Plots: CE and MSE]

Page 37: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Summary of Results

Learning Θ for a given structure, to minimize CE_D(Θ) or MSE_D(Θ):

Correct structure:
Small number of samples: ILQ-1 (APN-1) wins (over ILQ-k, APN-k)
Large number of samples: ILQ-k ≈ ILQ-1 win (over APN-1, APN-k)

Incorrect structure (naive-Bayes): ILQ wins

Page 38: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Future Work

Best algorithm for learning an optimal BN?
Actually optimize CE-error (not LCL)
Learn STRUCTURE as well as CPtables
Special cases where ILQ is efficient (complete data?)

Other "learning environments":
other prior knowledge; Query Forms; Explicitly-Labeled Queries

Better understanding of sample complexity, without the "…" restriction

Page 39: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Related Work

Like (ML) classification, but…
probabilities, not discrete labels
different class variables, different evidence sets… see Caruana

"Learning to Reason" [KR'95]:
"do well on tasks that will be encountered"
… but a different performance system

Sample complexity [FY, Hoeffgen]: … different learning model

Computational complexity [Kilian/Naor 95]: NP-hard to find ANY distribution with minimal L1-error wrt unconditional queries for a BN; here, L2 and conditional

Page 40: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Take-Home Messages

To maximize performance:
use Conditional Likelihood (ILQ), not Likelihood (APN/EM, OFE)
especially if the structure is wrong, the sample is small, …
… controversial…

To deal with MultiClassifiers:
use 1 structure, not k
if small sample: 1 structure gives better performance
if large sample: same performance, … but 1 structure is smaller
… yes, of course…

Relation to Attribute vs Relation:
not "1 example for many classes of queries",
but "1 example for 1 class of queries, BUT IN ONE COMMON STRUCTURE"

Page 41: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Contributions

Appropriate model for learning:
extends standard learning environments
labeled queries, with different class variables

Sample complexity: need "few" labeled queries

Computational complexity: NP-hard, so an effective gradient-descent algorithm

Empirical evidence: works well!
http://www.cs.ualberta.ca/~greiner/BN-results.html

Learn a MultiClassifier that works well in practice.

Page 42: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Questions?

LCL vs LL: does the difference matter?
ILQ vs APN
Query Forms

See also http://www.cs.ualberta.ca/~greiner/BN-results.html

Page 43: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Learning Model

Most belief-net learners try to maximize LIKELIHOOD:

  LL_D(Θ) = Σ_{x ∈ D} log P_Θ(x)

… as their goal is "fit to data" D.

Our goal is different: we want to minimize error over the distribution of queries.

If never asked "What is p(jaun | btest-)?",
we don't care if BN(jaun | btest-) ≠ p(jaun | btest-).

Page 44: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Different Optimization

  LL_D(Θ) = Σ_<q*,e> ∈ D log P_Θ(Q=q* | E=e) + Σ_<q*,e> ∈ D log P_Θ(E=e)
          = LCL_D(Θ) + Σ_<q*,e> ∈ D log P_Θ(E=e)

As Σ_<q*,e> ∈ D log P_Θ(E=e) is non-trivial,

  Θ_LL = argmax_Θ { LL_D(Θ) }  ≠  Θ_LCL = argmax_Θ { LCL_D(Θ) }

≈ Discriminant analysis: Maximize Overall Likelihood vs Minimize Predictive Error

Finding Θ_LCL is NP-hard, so… ILQ
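The decomposition above is just the chain rule for logs; a self-contained numeric check (ours, reusing the slide's H -> B, H -> J net and its CPtables):

```python
import math

# log P(q*, e) = log P(q* | e) + log P(e); summed over D this gives
# LL_D = LCL_D + sum log P(E=e), as on the slide.

p_h1 = 0.05
p_b1 = {1: 0.95, 0: 0.03}   # P(B=1 | H=h)
p_j1 = {1: 0.8,  0: 0.3}    # P(J=1 | H=h)

def joint(h, j, b):
    ph = p_h1 if h == 1 else 1 - p_h1
    pj = p_j1[h] if j == 1 else 1 - p_j1[h]
    pb = p_b1[h] if b == 1 else 1 - p_b1[h]
    return ph * pj * pb

# One labeled query: Q = J with q* = 0, evidence E = {B=1}.
p_qe = sum(joint(h, 0, 1) for h in (0, 1))                   # P(J=0, B=1)
p_e  = sum(joint(h, j, 1) for h in (0, 1) for j in (0, 1))   # P(B=1)
assert abs(math.log(p_qe) - (math.log(p_qe / p_e) + math.log(p_e))) < 1e-12
```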

Page 45: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Why Alternative Model?

A belief net is both…
a representation for a distribution
a system for answering queries

Suppose the BN must answer: "What is p(hep | jaun, btest-)?"
but not: "What is p(jaun | btest-)?"

So… the BN is good if
  BN(hep | jaun, btest-) = p(hep | jaun, btest-)
even if
  BN(jaun | btest-) ≠ p(jaun | btest-)

Page 46: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Query Distr vs Tuple Distr

Distribution over tuples, p(·):
p(hep, jaun, btest-, …) = 0.07
p(flu, cough, ~headache, …) = 0.43

Distribution over queries, sq(q) = Prob(q asked):
ask "What is p(hep | jaun, btest-)?" 30%
ask "What is p(flu | cough, ~headache)?" 22%

These can be uncorrelated.
E.g., Prob[asking about Cancer] = sq("cancer") = 100%,
even if Pr[Cancer] = p(cancer) = 0.

Page 47: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Query Distr ≠ Tuple Distr

Suppose a GP asks all ADULT FEMALE patients: "Pregnant?"

Pregnant  Adult  Gender
   +        +      F
   -        +      F
   +        +      F

Data: P(Preg | Adult, Gender=F) = 2/3.
Is this really the TUPLE distribution?
P(Gender=F) = 1?

NO: it only reflects the questions asked!
It provides info re: P(Preg | Adult=+, Gender=F), but NOT about P(Adult), …

Page 48: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Query Distr ⊥ Tuple Distr

The query probability Prob([Q, E=e] asked) is independent of the tuple probability.
(Note: the value q* of a labeled query <[Q, E=e], q*> IS based on P(Q=q | E=e).)

sq ≠ P(Q=q, E=e):
• one could always ask about a 0-probability situation
• always ask "[Pregnant=t, Gender=Male]": sq(Pregnant=t; Gender=Male) = 1, but P(Pregnant=t, Gender=Male) = 0

sq ≠ P(E=e): sq(Q; E=ei) can differ from P(E=ei):
• P(Gender=Female) = P(Gender=Male), yet sq(Pregnant; Gender=Female) ≠ sq(Pregnant; Gender=Male)

Page 49: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Does it matter?

If all queries involve the SAME query variable, it is ok to pretend sq(·) ~ p(·),
as no one ever asks about the EVIDENCE DISTRIBUTION.

E.g., in:

Pregnant  Adult  Gender
   +        +      F
   -        +      F
   +        +      F

as no one asks "What is P(Gender)?", it doesn't matter…

But it is problematic in a MultiClassifier, if there are other queries, e.g., sq(Gender; ·).

Page 50: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

ILQ (cond. likelihood) vs APN (likelihood)

Wrong structure: ILQ better than APN/EM. Experiments:
• artificial data
• using naive Bayes (UCI)

Correct structure: ILQ often better than OFE, APN/EM. Experiments…

≈ Discriminant analysis: Maximize Overall Likelihood vs Minimize Predictive Error

Page 51: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Wrong Structure I

m-th target distribution: "TAN" with E1 -> E2 -> … -> Em

[Figure: TAN target (Q over E1 … Ek, with a chain over E1 … Em) vs the wrong structure, naive Bayes (Q -> each Ei)]

Results… (k=5, m=0..4)

Page 52: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Results: Wrong Structure

[Plots: CE and MSE]

Page 53: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Wrong Structure II

Learn naive Bayes for REAL-world datasets: Chess (FLARE, DNA)

[Plots: CE and MSE]

Page 54: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Correct Structure

If the structure is correct, ILQ and OFE / (APN, EM) should all converge to optimal.

Which is more efficient? Depends…

Page 55: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

"Correct" Structure

Fill in the parameters for the CORRECT STRUCTURE, for REAL-world datasets: Chess (FLARE, DNA).
Structure learned using PowerConstructor.

[Plot: CE]

Page 56: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Summary of Results

MSE results: vs OFE if the data are complete; vs APN/EM if incomplete.

Dataset   ILQ      OFE      EM       APN
Flare     0.1756   0.198
DNA       0.0489   0.0557
Chess     0.0558   0.1423
Vote      0.0345            0.1057   0.1057

Page 57: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Query Forms

MD asks "What is P(D | A, B)?" 20% of the time:
sq(D=t | A, B) = 0.2
sq(D=t; A, B) = Σ_i sq(D=t; A=a_i, B=b_i)

Challenge #1: the subdistribution
sq(D; A=a_i, B=b_i) = sq(D; A, B) · Prob(A=a_i, B=b_i | "asked a D|A,B question")
Perhaps it is uniform? …or = P(A=a_i, B=b_i)? NO!
sq(Pregnant | Gender) = 1.0; is sq(Preg | Gend=M) = sq(Preg | Gend=F)??

Challenge #2: need 2^k labels!
… but in the UQT model, perhaps not needed…
In UQT, may need to SHRINK the network!
… but query FORMS may be sufficient!

Page 58: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Efficiency of ILQ

For each query p(q|e), for each θ_{c|f}:

  ∂ LCL(q|e) / ∂ β_{c|f} = [ B(c,f | q,e) - θ_{c|f} · B(f | q,e) ] - [ B(c,f | e) - θ_{c|f} · B(f | e) ]

If <q, e> is d-separated from <c, f>:
  ∂ LCL(q|e) / ∂ θ_{c|f} = 0, so skip it!
Saves 10-90% of the work!

Current timing (PIII-500):
ALARM: 100 millisec/query (each iteration)
INSURANCE: 30 millisec/query (each iteration)
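A sketch of that skip in code (all names hypothetical: `d_separated` is an assumed graph test, `dlcl_dbeta` an assumed per-entry derivative; illustrative, not the authors' implementation):

```python
# Accumulate the LCL gradient over all queries, skipping CPtable entries
# (C=c | F=f) whose derivative is provably 0 because the entry's family
# is d-separated from the query/evidence variables -- the slide's trick.

def lcl_gradient(net, queries, entries, d_separated, dlcl_dbeta):
    """entries: list of (key, family), where key names theta_{c|f} and
    family = {C} | parents(C)."""
    grad = {key: 0.0 for key, _ in entries}
    for q_var, q_val, evidence in queries:
        involved = {q_var} | set(evidence)
        for key, family in entries:
            if d_separated(net, family, involved):
                continue   # derivative is 0: skip (saves 10-90% of the work)
            grad[key] += dlcl_dbeta(net, key, q_var, q_val, evidence)
    return grad
```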

Page 59: Exploiting Common SubRelations: Learning  One  Belief Net for Many Classification Tasks

Results (Alarm; ILQ-1 / APN-1; Large Sample)

[Plots: CE and MSE]