Final Project: meet with me over the next couple of weeks to discuss possibilities.


READ FOR NEXT WEEK: Zhang, W., & Luck, S.J. (2008). Discrete fixed-resolution representations in visual working memory. Nature, 453, 233-235, along with the Lewandowsky & Farrell chapters.

Using Models to Test Hypotheses

[Figure: model-comparison diagram. The Prototype Model and the Exemplar Model are compared as nonnested models; each is compared against the Mixture Model as nested models. A Saturated Model sits above them all, and a Null Model sits below.]

Some issues regarding fit measures

•  speed of computation
•  consistency
   –  does the measure allow you to recover the true parameters?
   –  all the measures we've discussed are consistent
•  efficiency
   –  minimum variance of the parameter estimates?
   –  SSE and %Var are inefficient
   –  lnL, χ², and weighted SSE are efficient
•  permit statistical tests


Compare the Exemplar Model against the Mixture Model as nested models, with two reference points:

Saturated Model: this has to be as good as any model can do … it has one free parameter for each free data point.

Null Model: this is not as bad as any model could do, but it's a good floor. Clearly, any candidate model needs to fit better than the Null Model.

Logically, the Mixture Model MUST fit at least as well as the Exemplar Model … so the question is: does the Exemplar Model fit significantly worse?

Does a model have to account for all the variability in the observed data? Does it need to fit as well as the saturated model? That is perhaps the ultimate goal (and with no free parameters, to boot). But accounting for some of the variability is what it means to have a theory: you explain some of the variability, better models explain MORE of the variability, and some of the variability could simply be noise (of various sorts).

See Dell, G.S., Schwartz, M.F., Martin, N., Saffran, E.M., & Gagnon, D.A. (2000). The role of computational models in neuropsychological investigations of language: Reply to Ruml and Caramazza (2000). Psychological Review, 107, 635-645.

Fit the exemplar model. Fit the mixture model. Does the exemplar model fit significantly worse than the mixture model? This tests the hypothesis of whether people need to abstract a prototype on top of remembering specific exemplars.

Likelihood (L) and log likelihood (ln L), instead of SSE or r².

We will MAXIMIZE likelihood: "Maximum Likelihood Parameter Estimation."

ln L_F : ln L of the full model (e.g., the mixture model)
ln L_R : ln L of the restricted model (e.g., the exemplar model)
ln L_F − ln L_R : the difference between the fit of the full model and the fit of the restricted model
2 × [ln L_F − ln L_R] : we need to multiply by 2 (because God said so)

G² = 2 × [ln L_F − ln L_R]

This is the log likelihood ratio statistic, distributed as χ² with df = NparmsF − NparmsR.

If that statistic exceeds the critical χ² with the specified df (at the selected alpha level), then the restricted model fits significantly worse than the general (full) model.

EXAMPLE

ln L_F = −263.12   ln L_R = −293.82

G² = 2 × [−263.12 − (−293.82)] = 61.40

df = 240 − 45 = 195

critical χ²(df = 195, α = .05) = 228.6

The restricted model DOES NOT fit significantly worse …

Likelihood Ratio Testing

G² = 2 × ln(L_F / L_R) = 2 × [ln L_F − ln L_R]
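Putting the pieces together, here is a minimal Matlab sketch of the likelihood ratio test using the numbers from the example above (the parameter counts 240 and 45 come from that example; chi2inv requires the Statistics Toolbox):

% Likelihood ratio (G^2) test, using the example values above
lnL_F = -263.12;      % ln L of the full (mixture) model
lnL_R = -293.82;      % ln L of the restricted (exemplar) model
nparms_F = 240;       % free parameters in the full model
nparms_R = 45;        % free parameters in the restricted model

G2   = 2 * (lnL_F - lnL_R);     % log likelihood ratio statistic
df   = nparms_F - nparms_R;     % degrees of freedom
crit = chi2inv(0.95, df);       % critical chi-square at alpha = .05

fprintf('G2 = %.2f, df = %d, critical chi2 = %.1f\n', G2, df, crit);
if G2 > crit
    disp('The restricted model fits significantly worse.');
else
    disp('The restricted model does NOT fit significantly worse.');
end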


What is Likelihood? I’ll start with just giving you the equations … more later (so you can do the homework)

ln L

We will stay concrete for now. There are n stimuli and m responses. For identification, each stimulus has a unique response. For categorization, groups of stimuli can have the same (category) response.

NOTE: I will be giving you a maximum likelihood equation that can be used with this kind of choice data. Maximum likelihood methods are extremely general (and can get rather complicated); they are not only used for evaluating computational models, but are also used widely in statistics.

In order to use the following form of the maximum likelihood statistic, we need to have the data in the form of response frequencies, not response probabilities (that limitation is only true for this example).

f_ij : observed frequency with which stimulus i is given response j
N_i : number of presentations of stimulus i
P(R_j | S_i) : predicted probability with which stimulus i is given response j

The likelihood of the full data set is a product of multinomial probabilities, one per stimulus:

L = ∏_i [ N_i! / (f_i1! f_i2! ··· f_im!) ] × P(R_1|S_i)^f_i1 × P(R_2|S_i)^f_i2 × ··· × P(R_m|S_i)^f_im

Taking logs turns the products into sums (note that ln ∏_i N_i! = Σ_i ln N_i!):

ln L = Σ_i ln N_i! − Σ_i Σ_j ln f_ij! + Σ_i Σ_j f_ij ln P(R_j | S_i)
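As a concrete illustration, here is a minimal Matlab sketch of that ln L computation. The frequency and prediction matrices are made-up numbers for illustration; gammaln(n+1) equals ln(n!) and avoids overflow for large frequencies:

% Multinomial log likelihood for choice data
% freq(i,j) = observed frequency of response j to stimulus i (hypothetical)
% pred(i,j) = model-predicted P(R_j | S_i), rows summing to 1 (hypothetical)
freq = [40 10;  8 42];               % two stimuli, two responses
pred = [0.75 0.25; 0.20 0.80];       % predictions from some model

Ni  = sum(freq, 2);                  % N_i: presentations of stimulus i
lnL = sum(gammaln(Ni + 1)) ...       % sum_i ln(N_i!)
    - sum(gammaln(freq(:) + 1)) ...  % sum_i sum_j ln(f_ij!)
    + sum(freq(:) .* log(pred(:)));  % sum_i sum_j f_ij ln P(R_j | S_i)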

Why are we taking logs?

log(a × b) = log(a) + log(b)
log(a / b) = log(a) − log(b)
log(a^p) = p × log(a)
e^(log(f(x))) = f(x)

log(f(x)) is a "monotonic function," so max[f(x)] occurs at the same value as max[log(f(x))].
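A quick numerical illustration of that monotonicity point (my own sketch, using an arbitrary positive function rather than anything from the slides):

% The log is monotonic, so f and log(f) peak at the same place
x = 0.01:0.01:5;
f = x.^2 .* exp(-x);                 % an arbitrary positive function
[~, i1] = max(f);
[~, i2] = max(log(f));
fprintf('argmax of f: %.2f   argmax of log(f): %.2f\n', x(i1), x(i2));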


You want to MAXIMIZE the lnL … which is the same as MINIMIZING the –lnL …

First, let’s talk about probability

Prob(data | parm) : the probability of some data given the parameters of some model

knowing the parameters → predict some outcome

Imagine an unfair coin that gives heads with probability p = .6 and tails with probability q = 1 − p = .4. Obviously, the probability of getting a head is .6.

What is the probability of getting two heads on two flips? Coin flips are independent, and

p(event A AND event B) = p(event A) × p(event B) if A and B are INDEPENDENT

so: p(head) × p(head) = .6 × .6 = .36

What is the probability of getting one head and one tail on two flips of the same coin (p = .6, q = 1 − p = .4)?

  p(head) × p(tail) = .6 × .4 = .24
+ p(tail) × p(head) = .4 × .6 = .24
                              = .48

What is the probability of getting x heads on N flips?

Prob(x | p) = (N choose x) p^x (1 − p)^(N−x) = [N! / (x! (N − x)!)] p^x (1 − p)^(N−x)

Binomial Distribution: f(x; p) gives the probability of observing x "successes" for a Bernoulli process with success probability p.

Matlab Example
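The original example was shown in class; a minimal sketch of what it might have looked like is below (binopdf requires the Statistics Toolbox, and the explicit formula with nchoosek gives the same values):

% Binomial probabilities for the unfair coin (p = .6, N = 10 flips)
p = 0.6;  N = 10;
x = 0:N;                                          % possible numbers of heads
probToolbox = binopdf(x, N, p);                   % built-in binomial pmf
probByHand  = arrayfun(@(k) nchoosek(N, k) * p^k * (1 - p)^(N - k), x);
disp([x' probToolbox' probByHand'])               % the two columns agree
bar(x, probToolbox)                               % picture of the distribution
xlabel('number of heads, x'); ylabel('Prob(x | p = .6)')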

Coin flips have only two outcomes (heads or tails); that is, a flip of the coin can have only two mutually exclusive events … What about a situation with more than just two possible outcomes? What if an event has three possible outcomes? Or four?

Multinomial Distribution: the probability of outcome 1 is p1, the probability of outcome 2 is p2, and the probability of outcome 3 is p3. We want to know the probability of observing x1 events with outcome 1, x2 events with outcome 2, and x3 events with outcome 3:

Prob(x1, x2, x3 | p1, p2, p3) = [N! / (x1! x2! x3!)] p1^x1 p2^x2 p3^x3

More generally, for m possible outcomes:

Prob(x1, x2, …, xm | p1, p2, …, pm) = [N! / (x1! x2! ··· xm!)] p1^x1 p2^x2 ··· pm^xm

Matlab Example
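Again, the in-class example isn't reproduced here; a minimal sketch with made-up numbers is below (mnpdf requires the Statistics Toolbox):

% Multinomial probability for a three-outcome event (hypothetical numbers)
probs  = [0.5 0.3 0.2];              % p1, p2, p3
counts = [4 3 3];                    % x1, x2, x3 (N = 10 events)
N = sum(counts);

probToolbox = mnpdf(counts, probs);  % built-in multinomial pmf
probByHand  = factorial(N) / prod(factorial(counts)) * prod(probs.^counts);
fprintf('multinomial probability = %.4f (by hand: %.4f)\n', probToolbox, probByHand)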

What is Likelihood?

prob(data | parm) : the probability of some data given the known parameters of some model

L(data | parm) : the likelihood of known data given particular candidate parameters of the model

knowing the parameters → predict some outcome

observing some data → estimate the parameters that maximize the likelihood of the data

Imagine we flip a coin 10 times and get 4 heads (N = 10, x = 4). What is the maximum likelihood estimate of p?

L(x | p) = Prob(x | p) = (N choose x) p^x (1 − p)^(N−x) = [N! / (x! (N − x)!)] p^x (1 − p)^(N−x)

Now x (and N) are fixed; we want to find the value of p that maximizes the likelihood L of the data.

Imagine we flip a coin 10 times and get 4 heads (N = 10, x = 4). What is the maximum likelihood estimate of p? USING CALCULUS:

L(x | p) = Prob(x | p) = (N choose x) p^x (1 − p)^(N−x) = [N! / (x! (N − x)!)] p^x (1 − p)^(N−x)

ln L = ln(N choose x) + x ln p + (N − x) ln(1 − p)

d ln L / dp = 0 + x (1/p) + (N − x) (1/(1 − p)) (−1) = 0

x (1 − p) − (N − x) p = 0
x − xp − Np + xp = 0
p = x / N

So the maximum likelihood estimate is p = x/N = 4/10 = .4.
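As a numerical check on the calculus (my own sketch, not from the lecture), you can evaluate ln L over a grid of candidate p values, or hand −ln L to a general-purpose minimizer such as fminsearch; both recover p = x/N = .4:

% Numerical maximum likelihood estimate for the coin example (N = 10, x = 4)
N = 10;  x = 4;
p = 0.001:0.001:0.999;                                 % candidate values of p
lnL = log(nchoosek(N, x)) + x*log(p) + (N - x)*log(1 - p);
[~, idx] = max(lnL);
fprintf('grid estimate of p = %.3f (analytic x/N = %.3f)\n', p(idx), x/N);

% Same estimate by minimizing -lnL (the constant binomial term is dropped)
negLnL = @(q) -(x*log(q) + (N - x)*log(1 - q));
pHat = fminsearch(negLnL, 0.5);
fprintf('fminsearch estimate of p = %.3f\n', pHat);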