Some dates

check out “outline version 3.0.pdf”
• Return reviews to reviewees (use track changes and “comments”) by 20 Sep; send also to Lee Hsiang
• Revised version to Lee Hsiang by 27 Sep
• Writing assignment (Sepkoski and 10 commandments): ASAP
• All R assignments in one annotated R file by 20 Sep noon; results (including plots), a description of the data you downloaded, and interpretation in a separate PDF file.

All models are wrong but some are useful.

-Box 1976 J. Am. Stat. Assoc. 71:791-799

Introduction to likelihood, AIC and model selection

Models

• Model
  – an idea (formalized or not) about how the world (or a part of it) works.
  – may be conceptual, verbal or mathematical.
  – competing hypotheses can be represented as different models.
  – mathematical models can be developed to project the consequences of hypotheses
• Parameter – a true characteristic of the “population” of interest
• Estimator – an equation or process used to produce a parameter estimate from observed data

Illustrating likelihood (Steve Wang’s e.g.)

                       Weather (unknown)
Clothing (observed)    Cold    Hot    Total
Jacket                 0.8     0.1    0.9
T-shirt                0.2     0.9    1.1
Total                  1.0     1.0

Rows: statement about an unknown and unobserved state: Likelihood(weather | clothing)

Columns: state of observed data: Probability(clothing | weather)
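
A small sketch of the table in R may make the row/column distinction concrete. The numbers are the ones in the table; the object names (`prob`, `col_totals`, and so on) are just labels chosen here.

```r
# P(clothing | weather) from the table: one column per weather state
prob <- matrix(c(0.8, 0.2,   # Cold: jacket, T-shirt
                 0.1, 0.9),  # Hot:  jacket, T-shirt
               nrow = 2,
               dimnames = list(clothing = c("Jacket", "Tshirt"),
                               weather  = c("Cold", "Hot")))

# Each column is a probability distribution over clothing, so it sums to 1
col_totals <- colSums(prob)

# Each row is a likelihood function over weather; it need not sum to 1
row_totals <- rowSums(prob)

# Having observed a jacket, pick the weather with the highest likelihood
best_given_jacket <- names(which.max(prob["Jacket", ]))

print(col_totals)
print(row_totals)
print(best_given_jacket)
```

Reading down a column asks "what might I observe, given the weather?"; reading across a row asks "which weather best explains what I observed?".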

Illustration of MLE

• (Steve Wang’s example from 2010)
• Sampled 297 gastropod shells from mid Palaeocene Alabama. 138 had drill holes.
• Of all gastropods in that locality (including those not sampled or not preserved), what proportion was drilled? Let this = p
• Let n = sampled gastropods, let x = drilled of those sampled
• $L(p) = P(x \mid p)$

The likelihood of a parameter value is the probability of the observed data (x) given that parameter value

Binomial distribution

• A sampled gastropod can be in one of two states: drilled or not drilled.
• Hence these data can be modeled using a binomial distribution
• Repeat n trials under the same conditions (= picking up gastropods)
• Each trial has two outcomes, drilled or NOT drilled
• The probability of being drilled is p and NOT drilled is 1-p (if A is an event, then $P(\text{not } A) = 1 - P(A)$)
• The trials are independent (because one gastropod being drilled has nothing to do with another gastropod being drilled…. Think about this one)
• $P(x \mid p) = \binom{n}{x} p^{x} (1-p)^{n-x}$
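
The binomial probability of x drilled shells out of n, choose(n, x) * p^x * (1-p)^(n-x), can be checked against R's built-in `dbinom`. This is a minimal sketch; the trial value p = 0.5 is an arbitrary choice made here for illustration.

```r
n <- 297   # gastropods sampled
x <- 138   # of which drilled
p <- 0.5   # an arbitrary trial value of p, chosen only for illustration

# Binomial probability written out term by term
by_hand <- choose(n, x) * p^x * (1 - p)^(n - x)

# The same quantity from R's built-in binomial density
built_in <- dbinom(x, size = n, prob = p)

print(c(by_hand = by_hand, built_in = built_in))
```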

Binomial distribution

[Figure: two plots. One shows the likelihood L(p) against p, for p from 0.35 to 0.60; the other shows the probability P(x | p) against x, for x from 110 to 170.]

n = 297 (total gastropods)
x = 138 (drilled)
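
Two curves of this kind could be drawn roughly as follows. This is a sketch, not the code behind the slide: the value of p held fixed for the P(x | p) curve is not stated, so x/n is assumed here, and the grid spacing is arbitrary.

```r
n <- 297
x <- 138

# Likelihood curve: data fixed at x = 138, parameter p varies
p_grid <- seq(0.35, 0.60, by = 0.001)
lik <- dbinom(x, size = n, prob = p_grid)

# Probability curve: parameter fixed, data x varies
# (p fixed at x/n here; this is an assumption)
x_grid <- 110:170
prob_x <- dbinom(x_grid, size = n, prob = x / n)

p_at_max <- p_grid[which.max(lik)]     # grid value of p with the highest likelihood
x_at_max <- x_grid[which.max(prob_x)]  # most probable x under the fixed p

plot(p_grid, lik, type = "l", xlab = "p", ylab = "L(p)")
plot(x_grid, prob_x, type = "h", xlab = "x", ylab = "P(x | p)")
```

The same function, `dbinom`, produces both curves; what changes is which argument is held fixed.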

Find MLE

• Find the value of p that maximizes L(p)
• $L(p) = \binom{n}{x} p^{x} (1-p)^{n-x}$
• $\log L(p) = \log \binom{n}{x} + x \log p + (n-x) \log(1-p)$
• Solve for the p at which d/dp log L(p) is zero to get x/n
• This is a trivial example with a closed-form estimate; most problems have none, and we have to use computer-intensive ways to “search” for the maximum

[Figure: log L(p) plotted against p, for p from 0.35 to 0.60]
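
As a sketch of what "searching" for the maximum looks like, the one-parameter binomial case can be handed to R's `optimize` and compared with the closed-form answer x/n. The search interval (0.001, 0.999) is an assumption made here to keep p away from 0 and 1.

```r
n <- 297
x <- 138

# Log-likelihood of p for the binomial model
loglik <- function(p) dbinom(x, size = n, prob = p, log = TRUE)

# Numerical search for the maximum
fit <- optimize(loglik, interval = c(0.001, 0.999), maximum = TRUE)

p_numeric <- fit$maximum   # found by search
p_closed  <- x / n         # closed-form MLE

print(c(numeric = p_numeric, closed_form = p_closed))
```

The two values should agree to within the optimizer's tolerance; for models with several parameters, `optim` plays the same role.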

Examples of Paleo Maximum Likelihood uses

• Foote (2003). "Origination and extinction through the Phanerozoic: A new approach." Journal of Geology 111(2): 125-148.

• Hunt (2007). "The relative importance of directional change, random walks, and stasis in the evolution of fossil lineages." Proceedings of the National Academy of Sciences of the United States of America 104(47): 18404-18408.

• Wagner (2000). "Likelihood tests of hypothesized durations: determining and accommodating biasing factors." Paleobiology 26(3): 431-449.

• Liow, Fortelius et al. (2008). "Higher origination and extinction rates in larger mammals." Proceedings of the National Academy of Sciences of the United States of America 105(16): 6097-6102.

• And many, many more

Simple Likelihood Ratio example

• What can we do with a likelihood?
• Compare two models using likelihood
  – by using a likelihood ratio; of limited use, like hypothesis testing, but sometimes useful
• 779 ancestor-descendant mammal pairs in NAm (Cenozoic)
• 442 of the descendants are larger than their ancestors (56.7%)
• Based on these data, is there a general size increase in mammals in NAm over time or not?

Likelihood Ratio Test (LRT)

• used to evaluate the difference between nested models
• One model is considered nested in another if the first model can be generated by imposing restrictions on the parameters of the second
• 442 of the descendants are larger than their ancestors (56.7%)
• $\Lambda = \dfrac{L(p = 0.5)}{L(p = 0.567)}$
• R class exercise: hint, the function you need to write the binomial coefficient $\binom{n}{x}$ is called choose.
• Write this equation using R: $L(p) = \binom{n}{x} p^{x} (1-p)^{n-x}$

# class exercise: mammal body size example from Wang 2010
a = choose(779, 442) * 0.5^442 * (1 - 0.5)^(779 - 442)      # likelihood under p = 0.5
b = choose(779, 442) * 0.567^442 * (1 - 0.567)^(779 - 442)  # likelihood under p = 0.567
lambda = a / b    # likelihood ratio
-2 * log(lambda)  # test statistic

Test statistic is $-2\log(\Lambda)$ (it approximately follows a chi-squared distribution, here with 1 degree of freedom, so we can use that)
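
A sketch of how the test statistic turns into a p-value, assuming Lambda is the likelihood under p = 0.5 divided by the likelihood at the MLE p = 442/779. It works on the log scale with `dbinom(..., log = TRUE)` rather than multiplying very large and very small numbers; the one degree of freedom reflects the single parameter (p) that the null fixes.

```r
n <- 779   # ancestor-descendant pairs
x <- 442   # pairs in which the descendant is larger

# Log-likelihoods under the null (p = 0.5) and at the MLE (p = x/n)
ll_null <- dbinom(x, size = n, prob = 0.5, log = TRUE)
ll_mle  <- dbinom(x, size = n, prob = x / n, log = TRUE)

# -2 log(Lambda), with Lambda = L(null) / L(MLE)
lrt_stat <- -2 * (ll_null - ll_mle)

# The alternative has one more free parameter than the null, so df = 1
p_value <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)

print(c(statistic = lrt_stat, p_value = p_value))
```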

H testing, L ratios, Model comparison

• Hypothesis testing compares 2 models, one null and one alternative (L ratio test as well).

• Sometimes the null makes sense, but often it does not, hence the only thing we are finding out with a significance test is whether we have enough data to show that null is false.

• Why is previous example ok?

Valid inference

Fisher 1922: valid inference =
1. model specification
2. estimation of model parameters
3. estimation of precision
• In much of science, neither the model parameters nor the model is known!
• Hence the problem with step 1: model formulation and selection are missing!
• We need to be able to formulate models and select among them, but how?

Kullback-Leibler Distance Illustration (basis for AIC)

[Figure: TRUTH with Models 1 to 4 arranged around it, each at its own K-L distance from TRUTH (KL dist 1 to KL dist 4)]
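
To make the picture concrete, here is a toy calculation of K-L distances, using the usual definition: the sum over x of f(x) log(f(x) / g(x)), with f the truth and g a candidate model. Everything in it is hypothetical, including the "truth" (a binomial with p = 0.46) and the four candidate values of p. In real problems truth is unknown, which is why AIC targets relative rather than absolute distance.

```r
# Hypothetical "truth": number of drilled shells out of 20, with p = 0.46
x <- 0:20
truth <- dbinom(x, size = 20, prob = 0.46)

# Four hypothetical candidate models, differing only in their value of p
candidate_p <- c(model1 = 0.30, model2 = 0.45, model3 = 0.50, model4 = 0.70)

# K-L distance from truth f to model g: sum over x of f(x) * log(f(x) / g(x))
kl_distance <- sapply(candidate_p, function(p) {
  model <- dbinom(x, size = 20, prob = p)
  sum(truth * log(truth / model))
})

closest <- names(which.min(kl_distance))   # model with the smallest K-L distance

print(round(kl_distance, 4))
print(closest)
```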

In practice

• Make a set of candidate models: think long and hard about “realistic” and non-trivial models (~4-20)
• Write them down and identify (or make) a global model
• A global model is one where all variables thought to be important are included
• If the global model fits the data adequately, the selected model that is more parsimonious will also fit the data (this is an empirical result)
• Note: if a really good model is not in the candidate set, we’re screwed
• Compare the models (model comparison)
• There are many information criteria “out there”
• Here we argue for the AIC (Akaike Information Criterion)

Akaike Information Criterion

• AIC is based on the Kullback-Leibler distance (solid theoretical basis)
• Does not assume that “truth” is in the candidate set of models.
• Measures relative distance to truth
• AIC chooses the candidate model with the smallest expected K-L distance.
• Balance between narrowing distance to truth and precision of estimates
• Can be calculated using either least squares regression (caveats) or, better, likelihood

$\mathrm{AIC} = -2 \log L + 2K$

[Figure: the two terms of the equation are annotated “Distance between estimate and truth” and “Spread of estimates”]
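
A sketch of the calculation AIC = -2 log L + 2K, where L is the maximized likelihood and K the number of estimated parameters, checked against R's built-in `AIC`. The data are simulated here purely for illustration, and the variable names are placeholders.

```r
set.seed(1)
# Simulated data, for illustration only
time <- 1:30
occurrences <- 5 + 0.8 * time + rnorm(30, sd = 3)

fit <- lm(occurrences ~ time)

ll <- logLik(fit)       # maximized log-likelihood
K  <- attr(ll, "df")    # estimated parameters: intercept, slope, residual variance
aic_by_hand <- -2 * as.numeric(ll) + 2 * K

print(c(by_hand = aic_by_hand, built_in = AIC(fit)))
```

Note that K counts the residual variance as well as the regression coefficients, which is easy to overlook when computing AIC by hand.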

Akaike Information Criterion

• Balance model fit with estimator precision

• Models with small AIC values are preferred

• However, it is the differences between the AICs of different models that are crucial

[Figure: Models 1 to 4 compared by ΔAIC]

ΔAIC

• ΔAIC = AIC - min(AIC)
• ΔAIC measures the difference between the AIC of the model being considered and the AIC of the model with the lowest AIC in the candidate set of models
• Information lost when the given model is used instead of the top model
• Absolute AIC values have little meaning; they mostly reflect sample size
• A rule of thumb is
  – models with ΔAICc ≤ 2 have substantial support and should be used for making inferences.
  – models with ΔAICc of about 4 to 7 have considerably less support
  – models with ΔAICc > 10 have essentially no support
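
In R the difference from the best model is a one-line subtraction. The AIC values below are made up for illustration and do not come from any data set in this lecture.

```r
# Hypothetical AIC values for four candidate models
aic <- c(model1 = 210.4, model2 = 204.1, model3 = 205.3, model4 = 219.8)

# Difference between each model's AIC and the smallest AIC in the set
delta_aic <- aic - min(aic)

print(sort(delta_aic))
```

With these made-up numbers, model2 and model3 would count as having substantial support under the rule of thumb, model1 considerably less, and model4 essentially none.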

AIC model weights

• Model weights can be calculated that are ad-hoc measures of support for each model in the candidate set
• Uses:
  – getting model-averaged estimates
  – evaluating the importance of specific factors (by summing across models with the overlapping variables)
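
The slide text does not give the formula, so the sketch below assumes the usual definition from Burnham and Anderson (2002): the weight of model i is exp(-ΔAIC_i / 2) divided by the sum of exp(-ΔAIC_j / 2) over all models j in the candidate set. The ΔAIC values are hypothetical.

```r
# Hypothetical delta-AIC values for four candidate models
delta_aic <- c(model1 = 6.3, model2 = 0, model3 = 1.2, model4 = 15.7)

# Akaike weight: relative likelihood exp(-delta/2), normalised to sum to 1
rel_lik <- exp(-delta_aic / 2)
weights <- rel_lik / sum(rel_lik)

print(round(weights, 3))
```

Because the weights sum to 1 over the candidate set, adding or removing a model changes every weight; they are only meaningful relative to that set.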

Adjusted AICs

• Adjusted AIC for amount of data versus number of parameters being estimated (small-sample adjustment): $\mathrm{AIC}_c = \mathrm{AIC} + \dfrac{2K(K+1)}{n-K-1}$
• Adjusted AIC for overdispersion/poor model fit (count data): $\mathrm{QAIC} = -[2 \log L / \hat{c}] + 2K$
• Both: $\mathrm{QAIC}_c = \mathrm{QAIC} + \dfrac{2K(K+1)}{n-K-1}$

Here n is the sample size, K the number of estimated parameters and $\hat{c}$ the estimated variance inflation factor.
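
A minimal sketch of the small-sample adjustment, AICc = AIC + 2K(K + 1) / (n - K - 1), with n the sample size and K the number of estimated parameters. The helper name `aicc` is made up here (base R has no function of that name), and the data are simulated for illustration.

```r
# Small-sample correction: AICc = AIC + 2K(K + 1) / (n - K - 1)
aicc <- function(fit) {
  K <- attr(logLik(fit), "df")   # number of estimated parameters
  n <- nobs(fit)                 # sample size
  AIC(fit) + 2 * K * (K + 1) / (n - K - 1)
}

set.seed(2)
time <- 1:15                                  # simulated data, illustration only
occurrences <- 10 + 0.5 * time + rnorm(15, sd = 2)
fit <- lm(occurrences ~ time)

print(c(AIC = AIC(fit), AICc = aicc(fit)))
```

The correction shrinks towards zero as n grows relative to K, so AICc and AIC agree for large samples.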

• R demo

References

• Readings:
  – Gary White’s lecture notes on model selection
  – Johnson and Omland, TREE 2004 (an easy-to-read overview of model selection written for biologists, but should be understandable to paleobiologists)
• Burnham and Anderson 2002, Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach
  – This is “the” book on model selection. The material for this lecture came largely out of Chapters 1-3

Continued R assignment

• Using the same data you downloaded, tabulate the number of occurrences per unit time (of your choice) and plot that (either for the entire taxonomic group, or split up into genera or species, depending on how much data you happen to have), using an R script
• Fit models of occurrences ~ time as you see fit after plotting out the data. We showed some non-linear models in class, but you might like some linear models as well; check out http://stat.ethz.ch/R-manual/R-devel/library/stats/html/lm.html
• Write a summary of your observations
• This is more a simple exercise in “semi-canned” model fitting, just so you feel comfortable with models and comparing them using AIC. All the maximum likelihood is done for you in R, but hopefully you understand the principle from the short lecture. There are many model fitting and optimization “machines” in R.
• In preparation for next lecture, download (only runs in Windows): http://warnercnr.colostate.edu/~gwhite/mark/mark.htm#Introduction
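
The model-fitting step of the assignment might look something like the sketch below. It uses simulated numbers as a stand-in for your own table of occurrences per unit time, and the three candidate models are only examples; which models are sensible depends on what your plot shows.

```r
set.seed(3)
# Simulated stand-in for "occurrences per unit time"; replace with your own table
time <- 1:40
occurrences <- 20 + 3 * time - 0.06 * time^2 + rnorm(40, sd = 4)

plot(time, occurrences)

# Candidate models of occurrences ~ time
m_const  <- lm(occurrences ~ 1)
m_linear <- lm(occurrences ~ time)
m_quad   <- lm(occurrences ~ time + I(time^2))

# Compare with AIC; smaller is better, and the differences are what matter
comparison <- AIC(m_const, m_linear, m_quad)
comparison$delta <- comparison$AIC - min(comparison$AIC)

print(comparison[order(comparison$delta), ])
```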