Learning – EM in The ABO locus Tutorial #9 © Ilan Gronau. Based on original slides of Ydo Wexler...


Based on original slides of Ydo Wexler & Dan Geiger

Genotype statistics

Mendelian genetics:
• locus – a particular location on a chromosome (genome)
- Each locus has two copies – alleles (one paternal and one maternal)
- Each copy has several relevant states – genotypes
• The locus genotype is determined by the combined genotypes of both copies.
• The locus genotype yields the phenotype (physical features).

We wish to estimate the distribution of all possible genotypes.

Suppose we randomly sample N individuals and count the number $N_{s,t}$ of individuals with genotype s/t.

The MLE is given by: $\hat{\theta}_{s,t} = \frac{N_{s,t}}{N}$

• Sampling genotypes is costly.
• Sampling phenotypes is cheap.
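The genotype MLE above is just a relative frequency. A minimal sketch in Python, assuming genotypes are recorded as strings such as "a/o" (the sample below is hypothetical, for illustration only):

```python
from collections import Counter

def genotype_mle(samples):
    """MLE of genotype frequencies: theta_{s,t} = N_{s,t} / N.

    `samples` is a list of unordered genotypes such as "a/o";
    "o/a" and "a/o" are treated as the same genotype.
    """
    n = len(samples)
    # Normalize each genotype so its two alleles appear in sorted order.
    counts = Counter("/".join(sorted(g.split("/"))) for g in samples)
    return {g: c / n for g, c in counts.items()}

# Hypothetical sample of genotyped individuals (illustration only).
sample = ["a/o", "o/o", "b/o", "o/a", "a/b", "o/o", "b/b", "o/o"]
print(genotype_mle(sample))
```

This requires knowing each individual's genotype, which is exactly the costly measurement.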

The ABO locus

• The ABO locus determines blood type.

• It has six possible genotypes {a/a, a/o, b/o, b/b, a/b, o/o}.

• They lead to four possible phenotypes: {A, B, AB, O}

We wish to estimate the proportions of the 6 genotypes in a population.

- Sampling a genotype: sequence a genomic region.

- Sampling a phenotype: check for the presence of antibodies (a simple blood test).

Problem: the phenotype doesn't always reveal the genotype (phenotype A can come from a/a or a/o, and phenotype B from b/b or b/o).

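The probabilistic model: the genotypes of the two allele copies are distributed i.i.d. with probabilities $\theta_a, \theta_b, \theta_o$, and they determine the probabilities of the locus genotypes (the factor 2 accounts for the two possible orders of a heterozygous pair):

• $\Pr[a/b] = 2\theta_a\theta_b$ ; $\Pr[a/o] = 2\theta_a\theta_o$ ; $\Pr[b/o] = 2\theta_b\theta_o$

• $\Pr[a/a] = \theta_a^2$ ; $\Pr[b/b] = \theta_b^2$ ; $\Pr[o/o] = \theta_o^2$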

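This implies probabilities for the phenotypes:

• $\Pr[P=A \mid \Theta] = \Pr[a/a] + \Pr[a/o] = \theta_a^2 + 2\theta_a\theta_o$

• $\Pr[P=B \mid \Theta] = \Pr[b/b] + \Pr[b/o] = \theta_b^2 + 2\theta_b\theta_o$

• $\Pr[P=AB \mid \Theta] = \Pr[a/b] = 2\theta_a\theta_b$

• $\Pr[P=O \mid \Theta] = \Pr[o/o] = \theta_o^2$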

These genotype probabilities assume Hardy-Weinberg equilibrium.

Θ – the model parameter set: $\Theta = \{\theta_a, \theta_b, \theta_o\}$
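The phenotype probabilities can be sketched as a small Python function. The parameter values used below are an arbitrary example, and `phenotype_probs` is an illustrative name:

```python
def phenotype_probs(theta_a, theta_b, theta_o):
    """Phenotype probabilities under Hardy-Weinberg equilibrium."""
    return {
        "A": theta_a ** 2 + 2 * theta_a * theta_o,   # a/a + a/o
        "B": theta_b ** 2 + 2 * theta_b * theta_o,   # b/b + b/o
        "AB": 2 * theta_a * theta_b,                 # a/b
        "O": theta_o ** 2,                           # o/o
    }

probs = phenotype_probs(0.2, 0.2, 0.6)
print(probs)
# The four values sum to (theta_a + theta_b + theta_o)^2 = 1.
print(sum(probs.values()))
```

Summing to 1 is a quick sanity check: the six genotype probabilities are the terms of $(\theta_a + \theta_b + \theta_o)^2$.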

Likelihood of phenotype data

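Given a population phenotype sample:

Data = {B, A, B, B, O, A, B, A, O, B, AB}

(3 × A, 5 × B, 1 × AB, 2 × O), the likelihood of our parameter set $\Theta = \{\theta_a, \theta_b, \theta_o\}$ is:

$\Pr[\text{Data} \mid \Theta] = \Pr[A \mid \Theta]^3 \cdot \Pr[B \mid \Theta]^5 \cdot \Pr[AB \mid \Theta] \cdot \Pr[O \mid \Theta]^2 = (\theta_a^2 + 2\theta_a\theta_o)^3 \, (\theta_b^2 + 2\theta_b\theta_o)^5 \, (2\theta_a\theta_b) \, (\theta_o^2)^2$

• The maximizer of this function is the MLE.

We use EM to obtain it.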
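A minimal sketch of evaluating this likelihood in Python. It works with the log-likelihood (a common choice, since products of many probabilities underflow for large samples), and the Θ passed in is an arbitrary example:

```python
import math
from collections import Counter

def phenotype_probs(ta, tb, to):
    """Hardy-Weinberg phenotype probabilities for Theta = (ta, tb, to)."""
    return {"A": ta**2 + 2 * ta * to, "B": tb**2 + 2 * tb * to,
            "AB": 2 * ta * tb, "O": to**2}

def log_likelihood(data, theta):
    """log Pr[Data | Theta], treating the observations as i.i.d."""
    probs = phenotype_probs(*theta)
    # Group identical phenotypes: each contributes count * log(probability).
    return sum(n * math.log(probs[p]) for p, n in Counter(data).items())

data = ["B", "A", "B", "B", "O", "A", "B", "A", "O", "B", "AB"]
print(Counter(data))
print(log_likelihood(data, (0.2, 0.2, 0.6)))
```

Only the phenotype counts enter the likelihood, which is why they are the sufficient statistics.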

The EM algorithm

The setting for the algorithm:

• Our data is a series of outcomes of experiments.

• Each experiment is conducted identically and independently.

• The outcome of an experiment is a function of the values selected for a set of discrete random variables $X_1, \dots, X_n$.

• The actual values selected for $X_1, \dots, X_n$ may be hidden from us.

We wish to find the MLE of the probability distributions of $X_1, \dots, X_n$.

Examples:

1. Genotyping in the ABO locus:
• Single hidden variable X – the genotype of a single allele (a, b, or o)

• Model parameters – $\Theta = \{\theta_a, \theta_b, \theta_o\}$

2. Hidden Markov Models:

• Two hidden variables $T_s$, $E_s$ for every state s ($E_s$ chooses the emitted signal; $T_s$ chooses the next state)

• Model parameters – transition and emission probabilities.

The EM algorithm

Start with some set of parameters Θ.

Iterate until convergence:

• E-step:

Calculate the expected count for every possible result of every hidden variable in the model, as implied by the data and Θ.

• M-step:

For every hidden variable:

- Use the expected counts as statistics to yield Θ' = MLE(data, Θ), and continue with Θ ← Θ'.


In our example:

• Single hidden variable X – the genotype of a single allele (a, b, or o)

E-step: count the expected number of a, b, o alleles in the population (total number of counts: 2n, since each of the n individuals carries two alleles).

M-step: set $\theta'_a = \#a/2n$ ; $\theta'_b = \#b/2n$ ; $\theta'_o = \#o/2n$.

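E-step calculations – gene counting

For each phenotype (the observed outcome of an "experiment"), the possible genotypes (results of the hidden variables), their probabilities given the phenotype, and their gene counts:

• A: a/a w.p. $\frac{\theta_a^2}{\theta_a^2 + 2\theta_a\theta_o}$ (2 a) ; a/o w.p. $\frac{2\theta_a\theta_o}{\theta_a^2 + 2\theta_a\theta_o}$ (1 a, 1 o)

• B: b/b w.p. $\frac{\theta_b^2}{\theta_b^2 + 2\theta_b\theta_o}$ (2 b) ; b/o w.p. $\frac{2\theta_b\theta_o}{\theta_b^2 + 2\theta_b\theta_o}$ (1 b, 1 o)

• AB: a/b w.p. 1 (1 a, 1 b)

• O: o/o w.p. 1 (2 o)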
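A sketch of this table in Python: the posterior probability of each genotype compatible with a phenotype, and the resulting expected gene count for one individual. The function names are illustrative, not from the original slides:

```python
def genotype_posteriors(ta, tb, to):
    """Pr[genotype | phenotype, Theta] for each ABO phenotype."""
    p_A = ta**2 + 2 * ta * to
    p_B = tb**2 + 2 * tb * to
    return {
        "A": {"a/a": ta**2 / p_A, "a/o": 2 * ta * to / p_A},
        "B": {"b/b": tb**2 / p_B, "b/o": 2 * tb * to / p_B},
        "AB": {"a/b": 1.0},
        "O": {"o/o": 1.0},
    }

def expected_gene_counts(phenotype, theta):
    """Expected number of a, b, o alleles carried by one individual."""
    counts = {"a": 0.0, "b": 0.0, "o": 0.0}
    for genotype, p in genotype_posteriors(*theta)[phenotype].items():
        # Each allele in the genotype adds the genotype's posterior probability.
        for allele in genotype.split("/"):
            counts[allele] += p
    return counts

print(expected_gene_counts("A", (0.2, 0.2, 0.6)))
```

Multiplying these per-individual counts by the number of people with each phenotype and summing gives the E-step totals.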

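A numeric example

Data:
type   #people
A      100
B      200
AB     50
O      50
(n = 400 individuals, so 2n = 800 alleles)

Sufficient statistics: $n_A, n_B, n_{AB}, n_O$

We start with an initial guess: $\Theta_0 = \{0.2, 0.2, 0.6\}$

A numeric example – execution of EM

1st iteration: $\Theta_0 = \{0.2, 0.2, 0.6\}$

E-step:

$E[\#a] = 100 \cdot \frac{2(0.6+0.2)}{2 \cdot 0.6 + 0.2} + 200 \cdot 0 + 50 \cdot 1 + 50 \cdot 0 = 164\tfrac{2}{7} \approx 164.29$

$E[\#b] = 100 \cdot 0 + 200 \cdot \frac{2(0.6+0.2)}{2 \cdot 0.6 + 0.2} + 50 \cdot 1 + 50 \cdot 0 = 278\tfrac{4}{7} \approx 278.57$

$E[\#o] = 100 \cdot \frac{2 \cdot 0.6}{2 \cdot 0.6 + 0.2} + 200 \cdot \frac{2 \cdot 0.6}{2 \cdot 0.6 + 0.2} + 50 \cdot 0 + 50 \cdot 2 = 357\tfrac{1}{7} \approx 357.14$

Total: 800 = 2n

M-step:

$\theta'_a = 164\tfrac{2}{7}/800 \approx 0.205$ ; $\theta'_b = 278\tfrac{4}{7}/800 \approx 0.348$ ; $\theta'_o = 357\tfrac{1}{7}/800 \approx 0.446$

$\Theta_1 = \{0.205, 0.348, 0.447\}$ (rounded so that the three values sum to 1)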
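The first iteration can be checked with exact rational arithmetic. This sketch assumes the phenotype counts of this example ($n_A = 100$, $n_B = 200$, $n_{AB} = 50$, $n_O = 50$); `em_iteration` is an illustrative name:

```python
from fractions import Fraction as F

def em_iteration(theta, n_A, n_B, n_AB, n_O):
    """One gene-counting EM iteration for the ABO locus (exact arithmetic)."""
    ta, tb, to = theta
    n = n_A + n_B + n_AB + n_O
    # E-step: expected allele counts given the phenotype counts and Theta.
    e_a = n_A * 2 * (to + ta) / (2 * to + ta) + n_AB
    e_b = n_B * 2 * (to + tb) / (2 * to + tb) + n_AB
    e_o = n_A * 2 * to / (2 * to + ta) + n_B * 2 * to / (2 * to + tb) + 2 * n_O
    # M-step: normalize by the total number of alleles, 2n.
    return (e_a, e_b, e_o), (e_a / (2 * n), e_b / (2 * n), e_o / (2 * n))

theta0 = (F(2, 10), F(2, 10), F(6, 10))
counts, theta1 = em_iteration(theta0, 100, 200, 50, 50)
print(counts)                        # exact expected counts, as fractions
print([float(t) for t in theta1])    # ≈ 0.2054, 0.3482, 0.4464
```

The exact counts are 1150/7, 1950/7 and 2500/7, that is $164\tfrac{2}{7}$, $278\tfrac{4}{7}$ and $357\tfrac{1}{7}$, and they sum to exactly 800.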

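A numeric example – execution of EM (continued)

2nd iteration: $\Theta_1 = \{0.205, 0.348, 0.447\}$

E-step:

$E[\#a] = 100 \cdot \frac{2(0.447+0.205)}{2 \cdot 0.447 + 0.205} + 200 \cdot 0 + 50 \cdot 1 + 50 \cdot 0 \approx 168.65$

$E[\#b] = 100 \cdot 0 + 200 \cdot \frac{2(0.447+0.348)}{2 \cdot 0.447 + 0.348} + 50 \cdot 1 + 50 \cdot 0 \approx 306.04$

$E[\#o] = 100 \cdot \frac{2 \cdot 0.447}{2 \cdot 0.447 + 0.205} + 200 \cdot \frac{2 \cdot 0.447}{2 \cdot 0.447 + 0.348} + 50 \cdot 0 + 50 \cdot 2 \approx 325.31$

Total: 800 = 2n

M-step:

$\theta'_a = 168.65/800 \approx 0.211$ ; $\theta'_b = 306.04/800 \approx 0.383$ ; $\theta'_o = 325.31/800 \approx 0.407$

$\Theta_2 = \{0.211, 0.383, 0.406\}$ (rounded so that the three values sum to 1)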

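E-step:

$E[\#a] = \frac{2(\theta_o + \theta_a)}{2\theta_o + \theta_a} n_A + 1 \cdot n_{AB}$

$E[\#b] = \frac{2(\theta_o + \theta_b)}{2\theta_o + \theta_b} n_B + 1 \cdot n_{AB}$

$E[\#o] = \frac{2\theta_o}{2\theta_o + \theta_a} n_A + \frac{2\theta_o}{2\theta_o + \theta_b} n_B + 2 n_O$

Sufficient statistics – $n_A, n_B, n_{AB}, n_O$

M-step: $\theta'_a = \frac{E[\#a]}{2n}$ ; $\theta'_b = \frac{E[\#b]}{2n}$ ; $\theta'_o = \frac{E[\#o]}{2n}$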

EM algorithm for the ABO locus - summary

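Iteration update formula:

$\theta'_a = \frac{1}{2n}\left(\frac{2(\theta_o + \theta_a)}{2\theta_o + \theta_a} n_A + n_{AB}\right)$

$\theta'_b = \frac{1}{2n}\left(\frac{2(\theta_o + \theta_b)}{2\theta_o + \theta_b} n_B + n_{AB}\right)$

$\theta'_o = \frac{1}{2n}\left(\frac{2\theta_o}{2\theta_o + \theta_a} n_A + \frac{2\theta_o}{2\theta_o + \theta_b} n_B + 2 n_O\right)$

Sufficient statistics – $n_A, n_B, n_{AB}, n_O$, and $n = n_A + n_B + n_{AB} + n_O$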
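A minimal sketch of the full loop in Python, iterating the update formula until no parameter changes by more than a tolerance. The phenotype counts ($n_A = 100$, $n_B = 200$, $n_{AB} = 50$, $n_O = 50$), the starting point $\Theta_0 = \{0.2, 0.2, 0.6\}$ and the stopping rule (`tol`, `max_iter`) are assumptions for illustration; `abo_em` is a hypothetical name:

```python
def abo_em(n_A, n_B, n_AB, n_O, theta0=(0.2, 0.2, 0.6), tol=1e-10, max_iter=1000):
    """Iterate the ABO update formula; return the final Theta and all iterates."""
    n = n_A + n_B + n_AB + n_O
    ta, tb, to = theta0
    history = [(ta, tb, to)]
    for _ in range(max_iter):
        # All three updates use the current (old) values of ta, tb, to.
        new_a = (n_A * 2 * (to + ta) / (2 * to + ta) + n_AB) / (2 * n)
        new_b = (n_B * 2 * (to + tb) / (2 * to + tb) + n_AB) / (2 * n)
        new_o = (n_A * 2 * to / (2 * to + ta)
                 + n_B * 2 * to / (2 * to + tb) + 2 * n_O) / (2 * n)
        delta = max(abs(new_a - ta), abs(new_b - tb), abs(new_o - to))
        ta, tb, to = new_a, new_b, new_o
        history.append((ta, tb, to))
        if delta < tol:  # stop once no parameter moves by more than tol
            break
    return (ta, tb, to), history

theta, history = abo_em(100, 200, 50, 50)
print(f"{len(history) - 1} iterations, Theta = {theta}")
```

Stopping on parameter change is one common choice; monitoring the change in log-likelihood is another. Note that the updates always sum to 1, because the expected counts always add up to 2n.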

EM algorithm – ABO example

[Figure: data (type, #people) and the parameter estimates plotted against the learning iteration]

Good convergence (maybe)
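One way to sanity-check convergence is to track the log-likelihood of the phenotype counts across iterations. EM never decreases it, although a non-decreasing likelihood alone does not guarantee a global maximum. This sketch assumes the same example counts, the same initial guess and 20 iterations (an arbitrary choice); `log_lik` and `em_step` are illustrative names:

```python
import math

def log_lik(theta, n_A, n_B, n_AB, n_O):
    """Log-likelihood of the phenotype counts under Theta."""
    ta, tb, to = theta
    return (n_A * math.log(ta**2 + 2 * ta * to)
            + n_B * math.log(tb**2 + 2 * tb * to)
            + n_AB * math.log(2 * ta * tb)
            + n_O * math.log(to**2))

def em_step(theta, n_A, n_B, n_AB, n_O):
    """One gene-counting EM update (expected allele counts / 2n)."""
    ta, tb, to = theta
    two_n = 2 * (n_A + n_B + n_AB + n_O)
    return ((n_A * 2 * (to + ta) / (2 * to + ta) + n_AB) / two_n,
            (n_B * 2 * (to + tb) / (2 * to + tb) + n_AB) / two_n,
            (n_A * 2 * to / (2 * to + ta)
             + n_B * 2 * to / (2 * to + tb) + 2 * n_O) / two_n)

counts = (100, 200, 50, 50)   # assumed example: n_A, n_B, n_AB, n_O
theta = (0.2, 0.2, 0.6)       # initial guess
lls = [log_lik(theta, *counts)]
for _ in range(20):
    theta = em_step(theta, *counts)
    lls.append(log_lik(theta, *counts))

# EM should never decrease the likelihood (allowing for rounding noise).
print(all(b >= a - 1e-9 for a, b in zip(lls, lls[1:])))
```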

Alternative solution

Alternative view:

• Single hidden variable X' – the genotype of the maternal allele (a, b, or o)

E-step: count the expected number of maternal a, b, o alleles in the population (total number of counts: n).

M-step: set $\theta'_a = \#a/n$ ; $\theta'_b = \#b/n$ ; $\theta'_o = \#o/n$.

Initial view:
• Single hidden variable X – the genotype of a single allele (a, b, or o)

E-step: count the expected number of a, b, o alleles in the population (total number of counts: 2n).

M-step: set $\theta'_a = \#a/2n$ ; $\theta'_b = \#b/2n$ ; $\theta'_o = \#o/2n$.

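E-step calculations – gene counting (maternal allele)

For each phenotype (the observed outcome of an "experiment"), the expected count of each maternal allele (the result of the hidden variable):

• A: a: $\frac{\theta_a + \theta_o}{\theta_a + 2\theta_o}$ ; b: 0 ; o: $\frac{\theta_o}{\theta_a + 2\theta_o}$

• B: a: 0 ; b: $\frac{\theta_b + \theta_o}{\theta_b + 2\theta_o}$ ; o: $\frac{\theta_o}{\theta_b + 2\theta_o}$

• AB: a: 1/2 ; b: 1/2 ; o: 0

• O: a: 0 ; b: 0 ; o: 1

Exactly ½ of what we got by gene counting.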

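Iteration update formula:

$\theta'_a = \frac{1}{n}\left(\frac{\theta_o + \theta_a}{2\theta_o + \theta_a} n_A + \frac{1}{2} n_{AB}\right)$

$\theta'_b = \frac{1}{n}\left(\frac{\theta_o + \theta_b}{2\theta_o + \theta_b} n_B + \frac{1}{2} n_{AB}\right)$

$\theta'_o = \frac{1}{n}\left(\frac{\theta_o}{2\theta_o + \theta_a} n_A + \frac{\theta_o}{2\theta_o + \theta_b} n_B + n_O\right)$

Sufficient statistics – $n_A, n_B, n_{AB}, n_O$, and $n = n_A + n_B + n_{AB} + n_O$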
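The two views give the same update: every maternal-allele expectation is exactly half the corresponding gene count, and the denominator n is half of 2n. A sketch comparing the two update formulas numerically; the phenotype counts and Θ values are arbitrary examples and the function names are illustrative:

```python
def gene_counting_update(theta, n_A, n_B, n_AB, n_O):
    """Initial view: expected allele counts divided by 2n."""
    ta, tb, to = theta
    n = n_A + n_B + n_AB + n_O
    return ((n_A * 2 * (to + ta) / (2 * to + ta) + n_AB) / (2 * n),
            (n_B * 2 * (to + tb) / (2 * to + tb) + n_AB) / (2 * n),
            (n_A * 2 * to / (2 * to + ta)
             + n_B * 2 * to / (2 * to + tb) + 2 * n_O) / (2 * n))

def maternal_allele_update(theta, n_A, n_B, n_AB, n_O):
    """Alternative view: expected maternal-allele counts divided by n."""
    ta, tb, to = theta
    n = n_A + n_B + n_AB + n_O
    return ((n_A * (to + ta) / (2 * to + ta) + n_AB / 2) / n,
            (n_B * (to + tb) / (2 * to + tb) + n_AB / 2) / n,
            (n_A * to / (2 * to + ta) + n_B * to / (2 * to + tb) + n_O) / n)

theta = (0.2, 0.2, 0.6)
print(gene_counting_update(theta, 100, 200, 50, 50))
print(maternal_allele_update(theta, 100, 200, 50, 50))
```

Since the updates coincide, both views produce the same sequence of iterates from the same starting point.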
