
Likelihood Computations Using Value Abstraction

Application in Genetic Linkage Analysis

Nir Friedman (Hebrew University), Dan Geiger (Technion), Noam Lotner (Hebrew University)

Likelihood Computation

Given a Bayesian network $\langle G, \theta \rangle$ and evidence e, compute P(e)

Sum over all possible values of unobserved variables:

$P(e) = \sum_{v \in Val(X_1, \ldots, X_n)} P(e, v)$

Example network: Bet1 → Win1 ← Die, with evidence e = { Win1 = true }
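A minimal Python sketch of this brute-force sum for the Bet1/Die/Win1 example. The priors (P(Bet1 = odd) = 2/3, a fair die) and the deterministic win rule are assumptions made for illustration; the slides do not specify them.

```python
from fractions import Fraction
from itertools import product

# Assumed toy parameters (not given in the slides).
P_BET1 = {"odd": Fraction(2, 3), "even": Fraction(1, 3)}
P_DIE = {d: Fraction(1, 6) for d in range(1, 7)}


def p_win1(win, bet, die):
    """P(Win1 = win | Bet1 = bet, Die = die): the bet wins iff it names the die's parity."""
    wins = (die % 2 == 1) == (bet == "odd")
    return 1 if win == wins else 0


def likelihood(win1=True):
    """P(e): sum the joint over every value of the unobserved variables."""
    return sum(
        P_BET1[b] * P_DIE[d] * p_win1(win1, b, d)
        for b, d in product(P_BET1, P_DIE)
    )


print(likelihood())  # 1/2
```

The sum visits all 12 joint values of (Bet1, Die); this is the cost that value abstraction aims to reduce.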

The Basic Concept

Network: Bet1 → Win1 ← Die, with Val(Bet1) = {odd, even} and e = { Win1 = true }

P(e, Die=1) = P(e, Die=3) = P(e, Die=5) and P(e, Die=2) = P(e, Die=4) = P(e, Die=6)

The exact value of Die need not be known to calculate the exact likelihood

Group values, calculate once for each group
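The grouping idea can be sketched as follows: evaluate P(e, Die = d) once per group and multiply by the group size. The priors and the win rule are assumed for illustration, and weighting by group size relies on the assumed fair die.

```python
from fractions import Fraction

# Assumed toy parameters (not given in the slides).
P_BET1 = {"odd": Fraction(2, 3), "even": Fraction(1, 3)}
P_DIE = Fraction(1, 6)  # fair die
GROUPS = [(1, 3, 5), (2, 4, 6)]


def joint_e_die(die):
    """P(Win1 = true, Die = die), with Bet1 summed out."""
    return sum(
        P_BET1[bet] * P_DIE
        for bet in P_BET1
        if (die % 2 == 1) == (bet == "odd")
    )


def likelihood_full():
    """Six evaluations: one per die value."""
    return sum(joint_e_die(d) for group in GROUPS for d in group)


def likelihood_grouped():
    """Two evaluations: one representative per group, weighted by group size."""
    return sum(len(group) * joint_e_die(group[0]) for group in GROUPS)


print(likelihood_full(), likelihood_grouped())  # 1/2 1/2
```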

Value Abstraction

A partition of a variable's domain

Example: Val(Die) = {1, 2, 3, 4, 5, 6} is partitioned into Val(Die^a) = { {1,4}, {2,5,6}, {3} }

Define: $P(X^a = v^a) = \sum_{v \in Val(X),\, v \in v^a} P(X = v)$
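As a small illustration of the definition, the distribution of an abstracted variable can be computed by summing the original probabilities within each block. The function name and the fair-die prior are hypothetical; the partition is the one from the slide.

```python
from fractions import Fraction


def abstract_distribution(prior, partition):
    """Return P(X^a = v^a) as the sum of P(X = v) over the values v in block v^a."""
    blocks = [frozenset(block) for block in partition]
    covered = sorted(v for block in blocks for v in block)
    if covered != sorted(prior):
        raise ValueError("blocks must cover every value exactly once")
    return {block: sum(prior[v] for v in block) for block in blocks}


# Assumed fair die; partition taken from the slide.
fair_die = {d: Fraction(1, 6) for d in range(1, 7)}
die_a = abstract_distribution(fair_die, [{1, 4}, {2, 5, 6}, {3}])
print(die_a[frozenset({2, 5, 6})])  # 1/2
```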

Safe Value Abstraction

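An abstraction is safe w.r.t. evidence e if, for all values v, v' in the same block $v^a$: $P(e \mid X = v) = P(e \mid X = v')$

Preserves likelihood information

Example: with e = {Win1=true}, Val(Die) = {1, ..., 6} is safely abstracted to Val(Die^a) = { {1,3,5}, {2,4,6} }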
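A hedged sketch of a safety check for a single variable, reading the condition as "all values in a block give the same P(e | X = v)". The Bet1 prior and the win rule are assumptions for illustration, and `is_safe` is a hypothetical helper name.

```python
from fractions import Fraction

# Assumed prior over Bet1 (not given in the slides).
P_BET1 = {"odd": Fraction(2, 3), "even": Fraction(1, 3)}


def p_e_given_die(die):
    """P(Win1 = true | Die = die), with Bet1 summed out."""
    return sum(p for bet, p in P_BET1.items() if (die % 2 == 1) == (bet == "odd"))


def is_safe(partition, cond_likelihood):
    """True if every block contains only values with equal P(e | X = v)."""
    return all(
        len({cond_likelihood(v) for v in block}) == 1
        for block in partition
    )


print(is_safe([{1, 3, 5}, {2, 4, 6}], p_e_given_die))   # True
print(is_safe([{1, 4}, {2, 5, 6}, {3}], p_e_given_die))  # False
```

The second partition fails because, for example, block {1,4} mixes an odd and an even value, which have different conditional likelihoods under the assumed prior.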

Safe Value Abstraction

Extended network: Bet1 → Win1 ← Die → Win2 ← Bet2, with Val(Bet1) = {odd, even} and Val(Bet2) = {1-2, 3-6}

With e = {Win1=true}, a safe abstraction for Val(Die) is { {1,3,5}, {2,4,6} }

With e = {Win1=true, Win2=true}, that abstraction is no longer safe: need to refine

Refinement: { {1}, {2}, {3,5}, {4,6} }
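To see why the two-bet evidence forces a finer partition, the sketch below groups die values by their conditional likelihood P(e | Die = d). The priors over Bet1 and Bet2 and the deterministic win rules are assumptions chosen for illustration.

```python
from fractions import Fraction

# Assumed priors (not given in the slides).
P_BET1 = {"odd": Fraction(2, 3), "even": Fraction(1, 3)}
P_BET2 = {"1-2": Fraction(1, 4), "3-6": Fraction(3, 4)}


def p_win1(die):
    """P(Win1 = true | Die = die): the parity bet must match."""
    return P_BET1["odd"] if die % 2 == 1 else P_BET1["even"]


def p_win2(die):
    """P(Win2 = true | Die = die): the range bet must match."""
    return P_BET2["1-2"] if die <= 2 else P_BET2["3-6"]


def coarsest_safe_partition(values, cond_likelihood):
    """Group values that share the same P(e | X = v)."""
    groups = {}
    for v in values:
        groups.setdefault(cond_likelihood(v), []).append(v)
    return sorted(groups.values())


die = range(1, 7)
print(coarsest_safe_partition(die, p_win1))
# [[1, 3, 5], [2, 4, 6]]
print(coarsest_safe_partition(die, lambda d: p_win1(d) * p_win2(d)))
# [[1], [2], [3, 5], [4, 6]]
```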

Cautious Value Abstraction

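Network: Bet1 → Win1 ← Die → Win2 ← Bet2, with Val(Bet1) = {odd, even} and Val(Bet2) = {1-2, 3-6}

Win1=true implies the partition { {1,3,5}, {2,4,6} } of Val(Die)

Win2=true implies the partition { {1,2}, {3-6} } of Val(Die)

Maximal abstraction - a tight refinement: { {1}, {2}, {3,5}, {4,6} }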
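A tight refinement can be computed as the coarsest common refinement of the given partitions: two values stay together only if every partition puts them in the same block. The helper below is a sketch with a hypothetical name, applied to the two partitions of Val(Die) from the slide.

```python
def tight_refinement(values, partitions):
    """Coarsest partition that refines every partition in `partitions`.

    Assumes each partition covers every value exactly once.
    """
    groups = {}
    for v in values:
        # Signature: index of the block containing v in each partition.
        signature = tuple(
            [i for i, block in enumerate(partition) if v in block][0]
            for partition in partitions
        )
        groups.setdefault(signature, []).append(v)
    return sorted(groups.values())


from_win1 = [{1, 3, 5}, {2, 4, 6}]
from_win2 = [{1, 2}, {3, 4, 5, 6}]
print(tight_refinement(range(1, 7), [from_win1, from_win2]))
# [[1], [2], [3, 5], [4, 6]]
```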

Abstracting a Bayesian Net

An abstraction of X_i implies a partition of Pa_i's values

Abstract each variable after its children are abstracted, using a tight refinement of all partitions implied by the children

Initialization: Abstract observed variables

For each variable:

1. Calculate maximal abstraction

2. Propagate to parents

Output - $G^a$ such that $P(e \mid G, \theta) = P(e \mid G^a, \theta^a)$

Linear in the number of variables and the size of the network representation
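One possible reading of "an abstraction of X_i implies a partition of Pa_i's values" is sketched below: two parent configurations fall in the same block when they induce the same distribution over the child's abstract values. This is an interpretation for illustration, not the authors' stated procedure; the function name, the CPT callback, and the deterministic win rule are assumptions.

```python
from itertools import product


def implied_parent_partition(parent_configs, child_partition, cpt):
    """Group parent configurations by their distribution over the child's blocks.

    cpt(child_value, config) returns P(child = child_value | parents = config).
    """
    groups = {}
    for config in parent_configs:
        signature = tuple(
            sum(cpt(c, config) for c in block) for block in child_partition
        )
        groups.setdefault(signature, []).append(config)
    return list(groups.values())


def win1_cpt(win, config):
    """Assumed deterministic CPT: Win1 is true iff Bet1 names Die's parity."""
    die, bet = config
    wins = (die % 2 == 1) == (bet == "odd")
    return 1 if win == wins else 0


configs = list(product(range(1, 7), ["odd", "even"]))
# Observed Win1 is abstracted into the blocks {true} and {false}.
win_block, loss_block = implied_parent_partition(configs, [[True], [False]], win1_cpt)
print(win_block)
# [(1, 'odd'), (2, 'even'), (3, 'odd'), (4, 'even'), (5, 'odd'), (6, 'even')]
```

The 12 joint configurations of (Die, Bet1) collapse into a "win" block and a "loss" block, which can then be propagated to Die and Bet1 individually.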

The Application-Genetic Linkage Analysis

The goal - Find the location of a disease (target) gene on a chromosome relative to some other (known) locations

[Figure: map of human chromosome 16, with known loci marked]

Recombination Fraction

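[Figure: two loci A and B on a chromosome, shown with and without a crossover between them]

P(crossover between A and B) = $\theta$

P(no crossover between A and B) = $1 - \theta$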

The data - pedigrees

Linkage Analysis

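[Figure: pedigree with individuals 1-6; one node carries the label 2/3]

Converting the pedigree to a Bayesian net

The Probabilistic Model

One locus:

[Figure: Bayesian network for the pedigree with individuals 1-6]

Orange - genotype nodes

Blue - phenotype nodes

Red - selector nodes (these represent linkage)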

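The Probabilistic Model

More than 1 locus:

[Figure: the pedigree network over individuals 1-6 is replicated for Locus #1 and Locus #2; selector nodes s1 and s2 at the two loci are connected]

$P(s_1 = s_2) = 1 - \theta$

$P(s_1 \neq s_2) = \theta$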
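The selector coupling between adjacent loci can be sketched as a two-state transition: the selector keeps its value with probability 1 - θ and flips with probability θ. The function names, the 0/1 encoding of selector values, and the uniform probability for the first locus are assumptions for illustration.

```python
def selector_transition(s_next, s_prev, theta):
    """P(s_next | s_prev): stay with probability 1 - theta, flip with theta."""
    return 1 - theta if s_next == s_prev else theta


def selector_path_probability(selectors, thetas, p_first=0.5):
    """Probability of a selector sequence along consecutive loci.

    thetas[i] is the recombination fraction between locus i and locus i + 1.
    p_first is the assumed probability of the first selector value.
    """
    if len(thetas) != len(selectors) - 1:
        raise ValueError("need one recombination fraction per adjacent pair")
    prob = p_first
    for prev, nxt, theta in zip(selectors, selectors[1:], thetas):
        prob *= selector_transition(nxt, prev, theta)
    return prob


# Three loci: no recombination between the first two, one between the last two.
print(selector_path_probability([0, 0, 1], [0.1, 0.2]))
```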

Experimental Evaluation

90 pedigrees (5-200 individuals) from 10 studies

Total of 280 linkage analysis problems

Varied number of loci

[Figure: two log-scale scatter plots of abstracted vs. original size, one for network size and one for clique-tree size; points are marked by # loci: 1, 2, 3, 4]

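Abstracting Multiple Variables

[Figure: network Bet1 → Win1 ← Die → Win2 ← Bet2. Val(Die) = {1, ..., 6} is abstracted to Die^a = { {1,3,5}, {2,4,6} } and Bet1 to Bet1^a = {odd, even}; the joint values (Die, Bet1) = (1,o), ..., (6,o), (1,e), ..., (6,e) are grouped into "win" and "loss"]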

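Clique-Tree Elimination

Clique tree: C1 = {V,Y}, C2 = {X,V,Z}, C3 = {X,V,U}, C4 = {X,W}; C1 and C2 send messages to C3, and C3 sends a message to C4

$m_{13}(v) = \sum_{y} f_1(v, y)$

$m_{23}(x, v) = \sum_{z} f_2(x, v, z)$

$m_{34}(x) = \sum_{u, v} f_3(x, v, u)\, m_{13}(v)\, m_{23}(x, v)$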
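The message computations can be checked on a toy instance. In the sketch below all variables are binary and the factor tables f1 to f4 are arbitrary made-up functions (assumptions, not taken from the slides); the point is only that eliminating through messages gives the same total as summing the full product.

```python
from itertools import product

VALS = (0, 1)  # assumed binary domains


# Arbitrary illustrative factors, one per clique.
def f1(v, y):
    return 1 + v + 2 * y


def f2(x, v, z):
    return 1 + x * v + z


def f3(x, v, u):
    return 2 + x + v * u


def f4(x, w):
    return 1 + 3 * x * w


def m13(v):
    """Message C1 -> C3: sum out y."""
    return sum(f1(v, y) for y in VALS)


def m23(x, v):
    """Message C2 -> C3: sum out z."""
    return sum(f2(x, v, z) for z in VALS)


def m34(x):
    """Message C3 -> C4: multiply incoming messages with f3, sum out u and v."""
    return sum(f3(x, v, u) * m13(v) * m23(x, v) for u, v in product(VALS, VALS))


def total_by_messages():
    return sum(m34(x) * f4(x, w) for x, w in product(VALS, VALS))


def total_brute_force():
    return sum(
        f1(v, y) * f2(x, v, z) * f3(x, v, u) * f4(x, w)
        for x, v, u, w, y, z in product(VALS, repeat=6)
    )


print(total_by_messages() == total_brute_force())  # True
```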

Message-Specific Abstraction

Clique tree: C1 = {V,Y}, C2 = {X,V,Z}, C3 = {X,V,U}, C4 = {X,W}

$m_{34}(x) = \sum_{u, v} f_3(x, v, u)\, m_{13}(v)\, m_{23}(x, v)$

An abstraction is safe for message m if $m(x) = m(x')$ whenever $x^a = x'^a$

Given safe abstractions for f3, m13, m23 - construct a safe abstraction for m34

Refinement ↔ multiplication

Projection ↔ summation

Use dynamic programming to efficiently compute a safe abstraction for the whole tree
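A rough sketch of the two operations, under the reading that a safe abstraction for a message groups arguments on which the message takes the same value. The functions `g` and `h` stand in for incoming messages and are made up for illustration; the helper names are hypothetical.

```python
from itertools import product


def partition_by_value(domain, fn):
    """Group points of the domain on which fn takes the same value."""
    groups = {}
    for point in domain:
        groups.setdefault(fn(*point), []).append(point)
    return list(groups.values())


def refine(domain, fns):
    """Common refinement: points stay together only if every fn agrees on them."""
    groups = {}
    for point in domain:
        groups.setdefault(tuple(fn(*point) for fn in fns), []).append(point)
    return list(groups.values())


# Made-up factors over (x, v) with x in {0, 1, 2} and v in {0, 1}.
def g(x, v):
    return min(x, 1)


def h(x, v):
    return v


def outgoing(x):
    """Summing out v projects the product g * h onto x."""
    return sum(g(x, v) * h(x, v) for v in (0, 1))


domain_xv = list(product((0, 1, 2), (0, 1)))
blocks = refine(domain_xv, [g, h])

# Multiplication: the product is constant on every block of the refinement.
assert all(len({g(*p) * h(*p) for p in block}) == 1 for block in blocks)

# Summation: the outgoing message needs only a partition of x.
print(partition_by_value([(0,), (1,), (2,)], outgoing))
# [[(0,)], [(1,), (2,)]]
```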

Experimental Evaluation

How much more can we save?

[Figure: log-scale scatter plot of abstracted clique-tree size vs. abstracted network size; points are marked by # loci: 1, 2, 3, 4]

[Figure: log-scale scatter plot of clique-tree size ratio vs. abstracted network size]

Total Reduction

[Figure: log-scale scatter plot of clique-size ratio (orig/abs) against problem size (#individuals x #genotypes)]

Summary

Safe abstraction w.r.t. specific evidence

An algorithm to reduce problem complexity

Linear in net representation

Independent of inference procedure

Motivated by VITESSE[ ]

Further reductions with inference procedure known

Caveats

As costly as inference

Cost is amortized when used for e.g. parameter estimation

Representation of abstractions