Final Project: meet with me over the next week or so to discuss possibilities.
What is Likelihood?
prob(data | parm): the probability of some data given the known parameters of some model
L(data | parm): the likelihood of known data given particular candidate parameters of the model

know the parameters → predict some outcome in future data
observe some data → estimate the parameters that maximize the likelihood of the observed data
Imagine an unfair coin that gives heads with probability p = .6 and tails with probability q = 1 − p = .4. What is the probability of getting x heads in N flips?
Prob(x \mid p) = \binom{N}{x} p^x (1-p)^{N-x} = \frac{N!}{x!(N-x)!}\, p^x (1-p)^{N-x}
The Binomial Distribution f(x; p) gives the probability of observing x “successes” in N trials of a Bernoulli process with success probability p: p^x (1-p)^{N-x} is the probability of getting any particular combination of x heads and N − x tails, and \binom{N}{x} is the number of ways of getting x heads and N − x tails.
Imagine we flip a coin 10 times and get 4 heads (N = 10, x = 4). What is the maximum likelihood estimate of p?
L(x \mid p) = \mathrm{Prob}(x \mid p) = \binom{N}{x} p^x (1-p)^{N-x} = \frac{N!}{x!(N-x)!}\, p^x (1-p)^{N-x}
Now x (and N) are fixed; we want to find the value of p that maximizes the likelihood L of the data.
Matlab week5.m
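week5.m presumably walks through this numerically in MATLAB; here is a comparable minimal sketch in Python (not the course code — the grid resolution and brute-force search are my own choices):

```python
from math import comb

# Brute-force the MLE for the coin example: N = 10 flips, x = 4 heads.
def binom_likelihood(p, x=4, N=10):
    """L(x | p) = C(N, x) * p^x * (1 - p)^(N - x)."""
    return comb(N, x) * p ** x * (1 - p) ** (N - x)

# Evaluate L over a grid of candidate p values and pick the maximizer.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=binom_likelihood)
print(p_hat)  # the likelihood peaks at p = x/N = 0.4
```

The grid search finds the same answer the calculus derivation below gives in closed form.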
Imagine we flip a coin 10 times and get 4 heads (N = 10, x = 4). What is the maximum likelihood estimate of p? USING CALCULUS

L(x \mid p) = \mathrm{Prob}(x \mid p) = \binom{N}{x} p^x (1-p)^{N-x}

\ln L = \ln\binom{N}{x} + x \ln p + (N - x)\ln(1 - p)

\frac{d \ln L}{dp} = x\left(\frac{1}{p}\right) + (N - x)\left(\frac{1}{1-p}\right)(-1) = 0

x(1-p) - (N-x)p = 0
x - xp - Np + xp = 0

p = \frac{x}{N}
Now let’s try finding maximum likelihood parameters for a distribution you’ve never seen before. It’s called the Lamron Distribution:

\mathrm{Prob}(x \mid \alpha, \beta^2) = \frac{1}{\beta\sqrt{2\pi}} \exp\left(-\frac{(x-\alpha)^2}{2\beta^2}\right)

Typically, we would want to know the probability of observing x (actually a range of x) given α and β.
let us assume instead that we have some observed data x1,x2,x3…xn and we want to find the maximum likelihood estimates for α and β
\mathrm{Prob}(x_1, x_2, \ldots \mid \alpha, \beta^2) = \prod_i \frac{1}{\beta\sqrt{2\pi}} \exp\left(-\frac{(x_i-\alpha)^2}{2\beta^2}\right)
In other words, what values of the parameters (α and β) make that observed set of data most likely?
Matlab week5.m
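Again, week5.m demonstrates this in MATLAB; a hypothetical Python sketch of the same idea — grid-searching candidate (α, β) pairs to maximize the Lamron log-likelihood of some simulated data (the data, true parameter values, and grid are invented for illustration):

```python
import math
import random

# Simulated "observed" data; the true alpha = 3.0 and beta = 1.5 are
# arbitrary choices for this illustration.
random.seed(1)
data = [random.gauss(3.0, 1.5) for _ in range(500)]

def log_likelihood(alpha, beta, xs):
    """Sum of log Lamron densities (log of the product over observations)."""
    return sum(-math.log(beta * math.sqrt(2 * math.pi))
               - (x - alpha) ** 2 / (2 * beta ** 2) for x in xs)

# Crude grid search over candidate (alpha, beta) pairs.
best = max(((a / 10, b / 10) for a in range(10, 60) for b in range(5, 30)),
           key=lambda ab: log_likelihood(ab[0], ab[1], data))
print(best)  # should land near (3.0, 1.5)
```

Working with the log-likelihood avoids numerical underflow from multiplying 500 small densities together.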
Of course, this is just the Normal Distribution
\mathrm{Prob}(x \mid \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
That’s the likelihood of observing one data point x. What about the likelihood of observing x1, x2, x3, …
L(x \mid \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
Recall that if p(x) is the probability of observing x, and if x1, x2, x3 are independent and identically distributed, then p(x1, x2, x3) = p(x1)p(x2)p(x3)
L(x_1 x_2 \ldots x_n \mid \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)
What is lnL?
\ln L = \sum_{i=1}^{n} \left[-\frac{1}{2}\ln(2\pi) - \ln\sigma - \frac{(x_i-\mu)^2}{2\sigma^2}\right]
What is the derivative wrt μ?
\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0 \quad\Rightarrow\quad \hat{\mu} = \frac{\sum_{i=1}^{n} x_i}{n}
What is the derivative wrt σ?
\frac{\partial \ln L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i - \mu)^2 = 0 \quad\Rightarrow\quad \hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \hat{\mu})^2}{n}}
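These closed-form estimates can be sanity-checked numerically. A minimal Python sketch, assuming simulated data (the true μ = 10, σ = 2, and sample size are arbitrary choices):

```python
import math
import random
import statistics

# Simulate data from a normal distribution with known parameters.
random.seed(7)
data = [random.gauss(10.0, 2.0) for _ in range(1000)]
n = len(data)

# Closed-form maximum likelihood estimates derived above.
mu_hat = sum(data) / n                                         # sample mean
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)

# mu_hat is the sample mean; sigma_hat divides by n (not n - 1), so it
# matches the *population* standard deviation, statistics.pstdev.
assert math.isclose(mu_hat, statistics.fmean(data))
assert math.isclose(sigma_hat, statistics.pstdev(data))
print(mu_hat, sigma_hat)
```

Note that the ML estimate of σ divides by n, which is why it is slightly smaller than the familiar unbiased (n − 1) sample standard deviation.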
Consider the Exponential Distribution
A Poisson process is a “memoryless” process that generates random events over time. It’s “memoryless” in the sense that whether an event occurs in the next time instant does not depend on how long ago the last event occurred. Examples: spikes in neurons, radioactive decay.

The exponential distribution gives the distribution of times between events. The Poisson distribution gives the distribution of the number of events within a given period of time. They’re complementary.
f(t \mid \lambda) = \lambda \exp(-\lambda t)
f(t; λ) gives the probability density of the time t between events given the rate λ. We want to figure out the rate λ given some observed data (t1, t2, …, tn).
L(t \mid \lambda) = \lambda \exp(-\lambda t)

L(t_1, t_2, \ldots, t_n \mid \lambda) = \prod_{i=1}^{n} \lambda \exp(-\lambda t_i) = \lambda^n \exp\left(-\lambda \sum_{i=1}^{n} t_i\right)

\ln L = n \ln\lambda - \lambda \sum_{i=1}^{n} t_i

\frac{\partial \ln L}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} t_i = 0 \quad\Rightarrow\quad \hat{\lambda} = \frac{n}{\sum_{i=1}^{n} t_i}
i.e., the average rate
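A quick Python sketch of this estimator on simulated inter-event times (the true rate of 1.5 events per unit time is an arbitrary choice for illustration):

```python
import random

# Simulate inter-event times from a Poisson process with a known rate,
# then recover the rate with the ML estimate derived above.
random.seed(0)
true_rate = 1.5
times = [random.expovariate(true_rate) for _ in range(10_000)]

lam_hat = len(times) / sum(times)   # lambda_hat = n / sum(t_i) = 1 / mean(t)
print(lam_hat)
```

The estimate is the number of events divided by the total elapsed time — exactly the average rate.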
Think about a categorization (or identification) experiment
stimulus Si is categorized with response A or B
What is the data?
        A    B
S1     27   13
S2     22   18
 ⋮
Sk      8   32
What’s the likelihood of the parameters of the GCM (Generalized Context Model) given this data?
L(f_{A|S_1}, f_{B|S_1}, f_{A|S_2}, f_{B|S_2}, \ldots, f_{A|S_k}, f_{B|S_k} \mid \text{parameters}) = \prod_{i=1}^{k} L(f_{A|S_i}, f_{B|S_i} \mid \text{parameters})

= \prod_{i=1}^{k} \binom{N_{S_i}}{f_{A|S_i}} p_{A|S_i}^{\,f_{A|S_i}} (1 - p_{A|S_i})^{N_{S_i} - f_{A|S_i}} = \prod_{i=1}^{k} \binom{f_{A|S_i} + f_{B|S_i}}{f_{A|S_i}\; f_{B|S_i}} p_{A|S_i}^{\,f_{A|S_i}} p_{B|S_i}^{\,f_{B|S_i}}
Think about a categorization (or identification) experiment:
stimulus Si is categorized with response A, B, or C; or stimulus Si is identified with the response associated with stimulus 1, 2, …, n
What is the data?
        A    B    C
Si     34   13    3

When presented with Si, there are 3 possible discrete outcomes – calling it A, B, or C.
Imagine that we knew perfectly the mechanism that was driving people’s categorization responses. This mechanism specifies p(A|Si), p(B|Si), and p(C|Si). If we had N = 50 presentations of stimulus Si, we could figure out the probability of observing xA = 34 A responses, xB = 13 B responses, and xC = 3 C responses.
\mathrm{Prob}(x_A, x_B, x_C \mid p(A|S_i), p(B|S_i), p(C|S_i)) = \binom{N}{x_A\; x_B\; x_C} p(A|S_i)^{x_A}\, p(B|S_i)^{x_B}\, p(C|S_i)^{x_C}
But we don’t know the model – that’s what we want to discover. But we do know the data.
L(x_A, x_B, x_C \mid p(A|S_i), p(B|S_i), p(C|S_i)) = \binom{N}{x_A\; x_B\; x_C} p(A|S_i)^{x_A}\, p(B|S_i)^{x_B}\, p(C|S_i)^{x_C}
where do these probabilities come from? they’re from the model we’re trying to test
For the GCM:

P(A \mid S_i) = \frac{b_A \sum_{j \in A} s_{ij}}{b_A \sum_{j \in A} s_{ij} + b_B \sum_{j \in B} s_{ij}}

s_{ij} = \exp\left(-c \sum_m w_m \left|x_{im} - x_{jm}\right|\right)
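A sketch of how these two GCM equations turn into response probabilities, in Python; the exemplar coordinates, weights, biases, and sensitivity below are invented for illustration, not taken from any real experiment:

```python
import math

# Hypothetical 2-D exemplars for categories A and B.
exemplars = {"A": [(1.0, 1.2), (0.8, 1.5)], "B": [(3.0, 2.8), (3.2, 2.5)]}
b = {"A": 1.0, "B": 1.0}          # response biases
c, w = 2.0, (0.5, 0.5)            # sensitivity and dimension weights

def similarity(stim, ex):
    """s_ij = exp(-c * sum_m w_m |x_im - x_jm|): city-block distance."""
    d = sum(wm * abs(si - ei) for wm, si, ei in zip(w, stim, ex))
    return math.exp(-c * d)

def p_response(stim, resp):
    """P(resp|stim): biased summed similarity, normalized over categories."""
    num = {k: b[k] * sum(similarity(stim, e) for e in exemplars[k])
           for k in exemplars}
    return num[resp] / sum(num.values())

stim = (1.1, 1.3)                  # a stimulus near the category A exemplars
pA = p_response(stim, "A")
print(pA)                          # high, since the stimulus resembles A
```

These model-derived probabilities are what get plugged into the multinomial likelihood above.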
L(x_A, x_B, x_C \mid b_A, b_B, b_C, c, w_1, w_2) = \binom{N}{x_A\; x_B\; x_C} p(A|S_i)^{x_A}\, p(B|S_i)^{x_B}\, p(C|S_i)^{x_C}
        A    B    C
S1     34   13    3
S2     34   13    3
 ⋮
Sn     34   13    3
L(\text{data} \mid b_A, b_B, b_C, c, w_1, w_2) = \prod_{i=1}^{n} \binom{N_i}{x_{iA}\; x_{iB}\; x_{iC}} p(A|S_i)^{x_{iA}}\, p(B|S_i)^{x_{iB}}\, p(C|S_i)^{x_{iC}}
\ln L = \sum_i \ln N_i! - \sum_i \sum_j \ln x_{ij}! + \sum_i \sum_j x_{ij} \ln P(R_j \mid S_i)
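That log-likelihood is computed more stably with the log-gamma function than with raw factorials. A Python sketch for a single stimulus, using the counts from the slide (the model probabilities here are hypothetical placeholders):

```python
import math

def multinomial_lnL(counts, probs):
    """ln L = ln N! - sum_j ln x_j! + sum_j x_j ln p_j, using
    lgamma(k + 1) = ln k! to stay stable for large counts."""
    N = sum(counts)
    total = math.lgamma(N + 1)
    for x, p in zip(counts, probs):
        total += -math.lgamma(x + 1) + x * math.log(p)
    return total

# Counts from the slide (xA = 34, xB = 13, xC = 3); the probabilities
# would come from the model being tested.
lnL = multinomial_lnL([34, 13, 3], [0.68, 0.26, 0.06])
print(lnL)
```

Summing this quantity over stimuli gives the full ln L above.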
Wichmann & Hill fitting psychometric functions (supplemental reading)
Are the dots moving right or left? [plot: accuracy vs. motion coherence]
Psychophysical functions: fitting them using maximum likelihood methods
Imagine a Psychophysical Experiment [plot: accuracy vs. motion coherence]
Why would you fit a Psychophysical Experiment? To find what level of motion coherence gives 75% accuracy. [plot: accuracy vs. motion coherence, 75% threshold marked]
Why would you fit a Psychophysical Experiment? To find the slope of the psychometric function at 75% (or 50% for some applications). [plots: accuracy vs. motion coherence]

Why would you fit a Psychophysical Experiment? To compare conditions, e.g., luminance #1 vs. luminance #2. [plots: accuracy vs. motion coherence for each condition]
How would you do it using maximum likelihoods? [plots: accuracy vs. motion coherence]
L(\text{data} \mid \text{params}) = \prod_{i=1}^{m} \binom{N_i}{n_{i,\mathrm{COR}}\; n_{i,\mathrm{INC}}} p(\mathrm{COR}|S_i)^{n_{i,\mathrm{COR}}}\, p(\mathrm{INC}|S_i)^{n_{i,\mathrm{INC}}}
what function defines these?
L(\text{data} \mid \alpha, \beta, \gamma, \lambda) = \prod_{i=1}^{m} \binom{N_i}{n_{i,\mathrm{COR}}\; n_{i,\mathrm{INC}}} p(\mathrm{COR}|S_i)^{n_{i,\mathrm{COR}}}\, p(\mathrm{INC}|S_i)^{n_{i,\mathrm{INC}}}

L(\text{data} \mid \alpha, \beta, \gamma, \lambda) = \prod_{i=1}^{m} \binom{N_i}{n_{i,\mathrm{COR}}} p(\mathrm{COR}|S_i)^{n_{i,\mathrm{COR}}}\, (1 - p(\mathrm{COR}|S_i))^{N_i - n_{i,\mathrm{COR}}}

p(\mathrm{COR} \mid S_i) = \Psi(x; \alpha, \beta, \gamma, \lambda) = \gamma + (1 - \gamma - \lambda)\, F(x; \alpha, \beta)
Weibull:   F(x; \alpha, \beta) = 1 - \exp\left(-(x/\beta)^{\alpha}\right)

Logistic:   F(x; \alpha, \beta) = \frac{1}{1 + \exp(-(x-\alpha)/\beta)}

Normal:   F(x; \alpha, \beta) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x-\alpha}{\beta\sqrt{2}}\right)\right]

Gumbel:   F(x; \alpha, \beta) = 1 - \exp\left(-\exp\left((x-\alpha)/\beta\right)\right)
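Putting the pieces together: a minimal Python sketch that fits α and β of a logistic-based Ψ by maximizing the binomial log-likelihood over a grid. The coherence levels, trial counts, and true parameters are invented for illustration; a real fit would use a proper optimizer, as Wichmann & Hill describe:

```python
import math

def F_logistic(x, alpha, beta):
    """One possible choice of F: the logistic function."""
    return 1.0 / (1.0 + math.exp(-(x - alpha) / beta))

def psi(x, alpha, beta, gamma=0.5, lam=0.02):
    """psi = gamma + (1 - gamma - lambda) * F; gamma = 0.5 suits a 2AFC task."""
    return gamma + (1.0 - gamma - lam) * F_logistic(x, alpha, beta)

# Hypothetical experiment: coherence levels, 100 trials per level, and
# correct counts generated from a known (alpha = 0.3, beta = 0.1) function.
levels = [0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
N = 100
n_cor = [round(N * psi(x, 0.3, 0.1)) for x in levels]

def lnL(alpha, beta):
    """Binomial log-likelihood of the correct counts (constant terms dropped)."""
    total = 0.0
    for x, n in zip(levels, n_cor):
        p = psi(x, alpha, beta)
        total += n * math.log(p) + (N - n) * math.log(1.0 - p)
    return total

# Brute-force grid search over (alpha, beta); a real fit would optimize.
a_hat, b_hat = max(((a / 100, b / 100)
                    for a in range(5, 80) for b in range(2, 40)),
                   key=lambda ab: lnL(*ab))
print(a_hat, b_hat)  # should recover values near (0.3, 0.1)
```

The binomial coefficients are constant in (α, β), so they can be dropped from the log-likelihood without changing where the maximum lies.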
see week5_psychometric_function.m