Recitation on EM
Slides taken from: http://www.cs.ucsb.edu/~ambuj/Courses/bioinformatics/EM.pdf
Computational Genomics, Recitation #6

All EM questions are in the format:

1. Write the likelihood function.
2. Write the Q function.
3. Derive the update rule.
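For reference, the three steps can be written in generic notation. The symbols below ($x$ for the observed data, $z$ for the unobserved data, $\theta$ for the parameters) are assumed here for illustration and do not come from the slides:

```latex
\begin{align*}
L(\theta) &= P(x \mid \theta) = \sum_{z} P(x, z \mid \theta) \\
Q(\theta \mid \theta^{(t)}) &= \mathbb{E}_{z \sim P(z \mid x, \theta^{(t)})}\big[\log P(x, z \mid \theta)\big] \\
\theta^{(t+1)} &= \arg\max_{\theta} Q(\theta \mid \theta^{(t)})
\end{align*}
```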

Estimation problems

What is the unobserved data in this case?

EM question

• Let $G = (G_1, \dots, G_n)$ be $n$ contiguous DNA regions representing genes. For each $G_i$ we define $P_i$ as the mRNA concentration of the gene, normalized so that $\sum_{i=1}^{n} P_i = 1$. $P = (P_1, \dots, P_n)$ can then be interpreted as the normalized expression levels of the regions in $G$.

• Our model assumes that each read is generated by randomly picking a region $G_i$ from $G$ according to the distribution $P$ and then copying that region. The copying process is error-prone, so a read may differ from its source region. The process is repeated until we have a set of $m$ reads $R = (r_1, \dots, r_m)$.

• For each region $G_i$ and read $r_j$, we have a probability $p_{ij} = P(r_j \mid G_i)$: the probability of observing $r_j$ given that the read originated from gene $G_i$. In practice, for each read $r_j$, this probability is close to zero for all but a few regions.
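The generative process described in this question can be sketched in Python. The slides do not specify an error model, so the sketch assumes one purely for illustration: every base is independently substituted with probability `eps`, reads have the same length as their source region, and the region strings, `sample_reads`, and `read_given_region` are hypothetical names and toy data.

```python
import random

def sample_reads(regions, P, m, eps, seed=0):
    """Draw m reads: pick a region index according to P, then copy it with errors."""
    rng = random.Random(seed)
    reads, sources = [], []
    for _ in range(m):
        # Pick the source region i with probability P[i].
        i = rng.choices(range(len(regions)), weights=P, k=1)[0]
        # Copy the region; each base is replaced by a different base w.p. eps.
        read = "".join(
            rng.choice([b for b in "ACGT" if b != base]) if rng.random() < eps else base
            for base in regions[i]
        )
        reads.append(read)
        sources.append(i)
    return reads, sources

def read_given_region(read, region, eps):
    """p_ij = P(r_j | G_i) under the assumed per-base substitution model."""
    if len(read) != len(region):
        return 0.0
    prob = 1.0
    for a, b in zip(read, region):
        prob *= (1 - eps) if a == b else eps / 3
    return prob

regions = ["ACGTAC", "ACGTTT", "GGGCCC"]  # toy stand-ins for G_1..G_n
P = [0.5, 0.3, 0.2]                       # expression levels, sum to 1
reads, sources = sample_reads(regions, P, m=5, eps=0.05)
# p[i][j] holds P(r_j | G_i); the source indices are hidden from the estimator.
p = [[read_given_region(r, g, 0.05) for r in reads] for g in regions]
```

In the estimation problem only `reads` (and therefore `p`) is observed; `sources` plays the role of the unobserved data.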

Likelihood function

• Write the likelihood of observing the m reads.
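One way to write it, assuming the reads are generated independently and summing over the unknown source region of each read, with $p_{ij} = P(r_j \mid G_i)$. This is a reconstruction from the model description, not the slide's own answer:

```latex
L(P) = P(r_1, \dots, r_m \mid P) = \prod_{j=1}^{m} \sum_{i=1}^{n} P_i \, p_{ij}
```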

Q function

• Write the $Q(P \mid P^{(t)})$ term.
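A possible form, under the same independence assumption. The hidden variable $z_j$ (the index of the region that produced read $r_j$) and the posterior weights $w_{ij}^{(t)}$ are notation introduced here for illustration, with $p_{ij} = P(r_j \mid G_i)$:

```latex
\begin{align*}
w_{ij}^{(t)} &= P(z_j = i \mid r_j, P^{(t)}) = \frac{P_i^{(t)} \, p_{ij}}{\sum_{k=1}^{n} P_k^{(t)} \, p_{kj}} \\
Q(P \mid P^{(t)}) &= \sum_{j=1}^{m} \sum_{i=1}^{n} w_{ij}^{(t)} \left[ \log P_i + \log p_{ij} \right]
\end{align*}
```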

M-step

• Write the M-step using the argmax function.
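A sketch of the M-step, where $w_{ij}^{(t)}$ denotes the posterior probability under $P^{(t)}$ that read $r_j$ came from region $G_i$ (notation assumed here, not from the slides). The $\log p_{ij}$ terms of $Q$ do not depend on $P$, so they drop out of the maximization:

```latex
P^{(t+1)} = \arg\max_{P:\ \sum_i P_i = 1} Q(P \mid P^{(t)})
          = \arg\max_{P:\ \sum_i P_i = 1} \sum_{i=1}^{n} \Big( \sum_{j=1}^{m} w_{ij}^{(t)} \Big) \log P_i
```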

Update rule

• From the M-step expression in (c), infer the update rule for $P$.

When we maximize $\sum_i a_i \log P_i$ over $P$ subject to $\sum_i P_i = 1$, the maximum is attained at $P_i = a_i / \sum_k a_k$.
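Applying this fact to the read model gives one plausible update rule. Here $a_i$ is taken to be the total posterior weight of region $G_i$, where $w_{ij}^{(t)}$ (notation assumed here) is the posterior probability under $P^{(t)}$ that read $r_j$ came from $G_i$ and $p_{ij} = P(r_j \mid G_i)$:

```latex
\begin{align*}
a_i &= \sum_{j=1}^{m} w_{ij}^{(t)}, \qquad
\sum_{i=1}^{n} a_i = \sum_{j=1}^{m} \sum_{i=1}^{n} w_{ij}^{(t)} = m \\
P_i^{(t+1)} &= \frac{a_i}{\sum_{k} a_k}
            = \frac{1}{m} \sum_{j=1}^{m} \frac{P_i^{(t)} \, p_{ij}}{\sum_{k=1}^{n} P_k^{(t)} \, p_{kj}}
\end{align*}
```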

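The whole procedure can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own solution: it assumes the update sets each $P_i$ to the average posterior probability that a read came from region $G_i$, starts from a uniform $P$, and uses a made-up matrix `p_toy` with `p[i][j]` standing for $P(r_j \mid G_i)$. The function names are hypothetical.

```python
import math

def em_expression(p, n_iter=200, tol=1e-10):
    """Estimate P from p[i][j] = P(r_j | G_i) by EM, starting from a uniform P."""
    n, m = len(p), len(p[0])
    P = [1.0 / n] * n
    for _ in range(n_iter):
        # E-step: w[i][j] = posterior probability that read j came from region i.
        col_sums = [sum(P[i] * p[i][j] for i in range(n)) for j in range(m)]
        w = [[P[i] * p[i][j] / col_sums[j] for j in range(m)] for i in range(n)]
        # M-step: new P_i is the average posterior weight of region i over reads.
        new_P = [sum(w[i]) / m for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(new_P, P)) < tol
        P = new_P
        if converged:
            break
    return P

def log_likelihood(p, P):
    """log L(P) = sum over reads j of log(sum over regions i of P_i * p_ij)."""
    n, m = len(p), len(p[0])
    return sum(math.log(sum(P[i] * p[i][j] for i in range(n))) for j in range(m))

# Toy input (assumed values): 2 regions, 4 reads; every read must have
# nonzero probability under at least one region.
p_toy = [
    [0.9, 0.8, 0.3, 0.01],
    [0.01, 0.1, 0.3, 0.9],
]
P_hat = em_expression(p_toy)
```

When every read is compatible with exactly one region, the estimate reduces to the fraction of reads assigned to each region. Comparing `log_likelihood(p_toy, P_hat)` with the value at the uniform starting point is a quick sanity check, since EM never decreases the likelihood.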