
Parameter Estimation using likelihood functions Tutorial #1


Page 2: Parameter Estimation using likelihood functions Tutorial #1

Parameter Estimation using likelihood functions

Tutorial #1

This class has been cut and slightly edited from Nir Friedman's full course of 12 lectures, which is available at www.cs.huji.ac.il/~pmai. Changes made by Dan Geiger and Ydo Wexler.

Page 3: Parameter Estimation using likelihood functions Tutorial #1

Example: Binomial Experiment

Consider a thumbtack. When tossed, it can land in one of two positions: Head or Tail.

We denote by θ the (unknown) probability P(H).

Estimation task: Given a sequence of toss samples x[1], x[2], …, x[M] we want to estimate the probabilities P(H) = θ and P(T) = 1 − θ.

Page 4: Parameter Estimation using likelihood functions Tutorial #1

Statistical Parameter Fitting

Consider instances x[1], x[2], …, x[M] such that:

  • The set of values that x can take is known
  • Each is sampled from the same distribution
  • Each is sampled independently of the rest

Such samples are called i.i.d. (independent and identically distributed).

The task is to find a vector of parameters Θ that generated the given data. This parameter vector can then be used to predict future data.

Page 5: Parameter Estimation using likelihood functions Tutorial #1

The Likelihood Function

How good is a particular θ? It depends on how likely it is to generate the observed data:

L(θ : D) = P(D | θ) = ∏_m P(x[m] | θ)

The likelihood for the sequence H, T, T, H, H is:

L(θ : D) = θ · (1 − θ) · (1 − θ) · θ · θ = θ³ (1 − θ)²

[Plot: L(θ : D) as a function of θ over [0, 1]]
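As a small sketch (not from the slides), the likelihood of the sequence H, T, T, H, H can be evaluated numerically on a grid of θ values:

```python
# Minimal sketch: L(theta) for the observed sequence H, T, T, H, H,
# which equals theta^3 * (1 - theta)^2.

def likelihood(theta, sequence):
    """Product of per-toss probabilities: theta for H, (1 - theta) for T."""
    L = 1.0
    for toss in sequence:
        L *= theta if toss == "H" else (1.0 - theta)
    return L

sequence = ["H", "T", "T", "H", "H"]
for theta in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
    print(f"theta={theta:.1f}  L={likelihood(theta, sequence):.4f}")
```

Evaluating on a grid like this traces out the curve in the plot: the likelihood is 0 at θ = 0 and θ = 1 and peaks in between.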

Page 6: Parameter Estimation using likelihood functions Tutorial #1

Sufficient Statistics

To compute the likelihood in the thumbtack example we only require N_H and N_T (the number of heads and the number of tails):

L(θ : D) = θ^(N_H) (1 − θ)^(N_T)

N_H and N_T are sufficient statistics for the binomial distribution.

Page 7: Parameter Estimation using likelihood functions Tutorial #1

Sufficient Statistics (cont.)

A sufficient statistic is a function of the data that summarizes the relevant information for the likelihood.

Formally, s(D) is a sufficient statistic if for any two datasets D and D′:

s(D) = s(D′)  ⇒  L_D(θ) = L_D′(θ)

[Diagram: many datasets mapping to the same statistic s(D)]

Page 8: Parameter Estimation using likelihood functions Tutorial #1

Maximum Likelihood Estimation

MLE Principle: Choose parameters that maximize the likelihood function.

  • This is one of the most commonly used estimators in statistics
  • It is intuitively appealing
  • One usually maximizes the log-likelihood function, defined as l_D(θ) = log_e L_D(θ)

Page 9: Parameter Estimation using likelihood functions Tutorial #1

Example: MLE in Binomial Data

Applying the MLE principle, we maximize the log-likelihood

l_D(θ) = N_H log θ + N_T log(1 − θ)

Setting the derivative N_H/θ − N_T/(1 − θ) to zero gives the MLE estimate

θ̂ = N_H / (N_H + N_T)

(which coincides with what one would expect).

Example: (N_H, N_T) = (3, 2); the MLE estimate is 3/5 = 0.6.

[Plot: L(θ : D) over θ ∈ [0, 1], peaking at θ = 0.6]
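The closed-form estimate above amounts to one line of code; a minimal sketch:

```python
def mle_binomial(n_heads, n_tails):
    """MLE for theta = P(H): the empirical fraction of heads."""
    return n_heads / (n_heads + n_tails)

print(mle_binomial(3, 2))  # 0.6, matching the (N_H, N_T) = (3, 2) example
```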

Page 10: Parameter Estimation using likelihood functions Tutorial #1

From Binomial to Multinomial

Now suppose X can take the values 1, 2, …, K (for example, a die has 6 sides). We want to learn the parameters θ_1, θ_2, …, θ_K.

Sufficient statistics: N_1, N_2, …, N_K - the number of times each outcome is observed.

Likelihood function:

L(θ : D) = ∏_{k=1}^{K} θ_k^(N_k)

MLE:

θ̂_k = N_k / N,  where N = N_1 + … + N_K
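The multinomial MLE is again just the empirical frequencies. A sketch using a hypothetical sequence of ten die rolls (the data are made up for illustration):

```python
from collections import Counter

def mle_multinomial(rolls, outcomes):
    """MLE for a multinomial: theta_hat_k = N_k / N (empirical frequencies)."""
    counts = Counter(rolls)
    n = len(rolls)
    return {k: counts[k] / n for k in outcomes}

rolls = [6, 3, 6, 1, 6, 2, 6, 4, 6, 5]        # ten hypothetical die rolls
theta_hat = mle_multinomial(rolls, range(1, 7))
print(theta_hat[6])  # 0.5: six was observed 5 times out of 10
```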

Page 11: Parameter Estimation using likelihood functions Tutorial #1

Example: Multinomial

Let x = x_1 x_2 … x_n be a protein sequence. We want to learn the parameters q_1, q_2, …, q_20 corresponding to the frequencies of the 20 amino acids.

Sufficient statistics: N_1, N_2, …, N_20 - the number of times each amino acid is observed in the sequence.

Likelihood function:

L(q : D) = ∏_{k=1}^{20} q_k^(N_k)

MLE:

q̂_k = N_k / n

Page 12: Parameter Estimation using likelihood functions Tutorial #1

Is MLE all we need?

Suppose that after 10 observations, ML estimates P(H) = 0.7 for the thumbtack. Would you bet on heads for the next toss?

Suppose now that after 10 observations, ML estimates P(H) = 0.7 for a coin. Would you place the same bet?

Solution: the Bayesian approach, which incorporates your subjective prior knowledge. E.g., you may know a priori that some amino acids have the same frequencies and some have low frequencies. How would one use this information?

Page 13: Parameter Estimation using likelihood functions Tutorial #1

Bayes’ rule

Bayes’ rule:

P(y | x) = P(x | y) P(y) / P(x)

where

P(x) = Σ_y P(x | y) P(y)

It holds because P(x, y) = P(x | y) P(y), and therefore

P(x) = Σ_y P(x, y) = Σ_y P(x | y) P(y)
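A quick numerical sanity check of the rule, using a small made-up joint distribution P(x, y) (the numbers are arbitrary, chosen only to sum to 1):

```python
# Hypothetical toy joint distribution P(x, y), used only to verify Bayes' rule.
joint = {("a", "h"): 0.10, ("a", "t"): 0.30,
         ("b", "h"): 0.20, ("b", "t"): 0.40}

def p_x(x):
    """Marginal P(x) = sum_y P(x, y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def p_y(y):
    """Marginal P(y) = sum_x P(x, y)."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def p_x_given_y(x, y):
    return joint[(x, y)] / p_y(y)

# Bayes' rule: P(y | x) = P(x | y) P(y) / P(x)
via_bayes = p_x_given_y("a", "h") * p_y("h") / p_x("a")
direct    = joint[("a", "h")] / p_x("a")      # P(y | x) from the joint directly
print(via_bayes, direct)                      # the two agree
```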

Page 14: Parameter Estimation using likelihood functions Tutorial #1

Example: Dishonest Casino

A casino uses two kinds of dice:

  • 99% are fair
  • 1% are loaded: 6 comes up 50% of the time

We pick a die at random and roll it 3 times. We get 3 consecutive sixes.

What is the probability the die is loaded?

Page 15: Parameter Estimation using likelihood functions Tutorial #1

Dishonest Casino (cont.)

The solution is based on Bayes’ rule and the fact that while P(loaded | 3 sixes) is not known, the other three terms in Bayes’ rule are known, namely:

P(3 sixes | loaded) = (0.5)³
P(loaded) = 0.01
P(3 sixes) = P(3 sixes | loaded) P(loaded) + P(3 sixes | not loaded) (1 − P(loaded))

P(loaded | 3 sixes) = P(3 sixes | loaded) P(loaded) / P(3 sixes)

Page 16: Parameter Estimation using likelihood functions Tutorial #1

Dishonest Casino (cont.)

P(loaded | 3 sixes) = P(3 sixes | loaded) P(loaded) / P(3 sixes)
                    = P(3 sixes | loaded) P(loaded) / [P(3 sixes | loaded) P(loaded) + P(3 sixes | fair) P(fair)]
                    = (0.5)³ · 0.01 / [(0.5)³ · 0.01 + (1/6)³ · 0.99]
                    ≈ 0.21
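Plugging the slide's numbers into Bayes' rule, as a runnable check:

```python
# Dishonest casino: posterior probability the die is loaded given 3 sixes.
p_loaded = 0.01
p_3sixes_given_loaded = 0.5 ** 3       # loaded die: 6 comes up 50% of the time
p_3sixes_given_fair = (1 / 6) ** 3     # fair die

p_3sixes = (p_3sixes_given_loaded * p_loaded
            + p_3sixes_given_fair * (1 - p_loaded))
posterior = p_3sixes_given_loaded * p_loaded / p_3sixes
print(round(posterior, 2))  # 0.21
```

So even after three sixes in a row, the die is still most likely fair: the 1% prior on loaded dice dominates.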

Page 17: Parameter Estimation using likelihood functions Tutorial #1

Biological Example: Proteins

Extracellular proteins have a slightly different amino acid composition than intracellular proteins. From a large enough protein database (SWISS-PROT), we can get the following:

  • p(ext) - the probability that any new sequence is extracellular
  • p(int) - the probability that any new sequence is intracellular
  • q_{a_i}^{ext} = p(a_i | ext) - the frequency of amino acid a_i for extracellular proteins
  • q_{a_i}^{int} = p(a_i | int) - the frequency of amino acid a_i for intracellular proteins

Page 18: Parameter Estimation using likelihood functions Tutorial #1

Biological Example: Proteins (cont.)

What is the probability that a given new protein sequence x = x_1 x_2 … x_n is extracellular?

Assuming that every sequence is either extracellular or intracellular (but not both), we can write p(ext) = 1 − p(int). Thus,

p(x) = p(ext) p(x | ext) + p(int) p(x | int)

Page 19: Parameter Estimation using likelihood functions Tutorial #1

Biological Example: Proteins (cont.)

Assuming the positions are independent given the class, we get

p(x | ext) = ∏_i p(x_i | ext),  p(x | int) = ∏_i p(x_i | int)

By Bayes’ theorem,

P(ext | x) = p(ext) p(x | ext) / p(x)
           = p(ext) ∏_i p(x_i | ext) / [p(ext) ∏_i p(x_i | ext) + p(int) ∏_i p(x_i | int)]

  • The probabilities p(int), p(ext) are called the prior probabilities.
  • The probability P(ext | x) is called the posterior probability.
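The classification rule above can be sketched in a few lines. The priors and per-residue frequency tables below are invented for illustration (and cover only three amino acids, not the real 20); real values would come from a database such as SWISS-PROT. Working in log space avoids numeric underflow on long sequences:

```python
import math

# Hypothetical priors and frequencies, made up purely for illustration.
p_ext, p_int = 0.3, 0.7
q_ext = {"A": 0.10, "C": 0.05, "G": 0.08}   # p(a_i | ext), toy table
q_int = {"A": 0.07, "C": 0.09, "G": 0.08}   # p(a_i | int), toy table

def posterior_ext(x):
    """P(ext | x) via Bayes' theorem, computed in log space."""
    log_ext = math.log(p_ext) + sum(math.log(q_ext[a]) for a in x)
    log_int = math.log(p_int) + sum(math.log(q_int[a]) for a in x)
    m = max(log_ext, log_int)               # subtract max for numerical stability
    e, i = math.exp(log_ext - m), math.exp(log_int - m)
    return e / (e + i)

print(posterior_ext("ACG"))
```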