
Probability Distributions and Dataset Properties

Lecture 2

Likelihood Methods in Forest Ecology

October 9th–20th, 2006

Statistical Inference

Data

Scientific Model (scientific hypothesis)

Probability Model (statistical hypothesis)

Inference

Parametric perspective on inference

Data

Scientific Model (hypothesis test, often with linear models)

Probability Model (typically normal)

Inference

Likelihood perspective on inference

Data

Scientific Model (hypothesis)

Probability Model

Inference

An example...

The Data: xi = measurements of DBH on 50 trees; yi = measurements of crown radius on those trees

The Scientific Model: yi = α + β·xi + εi (a linear relationship, with two parameters (α, β) and an error term ε (the residuals))

The Probability Model: ε is normally distributed, with E[ε] = 0 and variance estimated from the observed variance of the residuals...

Data

Scientific Model (hypothesis)

Probability Model

Inference

The triangle of statistical inference: Model

• Models clarify our understanding of nature.

• They help us understand the importance (or unimportance) of individual processes and mechanisms.

• Since they are not hypotheses, they can never be “correct”.

• We don’t “reject” models; we assess their validity.

• We establish what’s “true” by establishing which model the data support.

The triangle of statistical inference:Probability distributions

• Data are never “clean”.

• Most models are deterministic: they describe the average behavior of a system, but not the noise or variability. To compare models with data, we need a statistical model that describes the variability.

• We must understand the processes giving rise to variability in order to select the probability density function (error structure) that correctly describes that variability or noise.

[Figure: scatterplot of crown radius (m, 0–6) against DBH (cm, 0–50).]

The Data: xi = measurements of DBH on 50 trees yi = measurements of crown radius on those trees

The Scientific Model: yi = α + β·DBHi + εi

The Probability Model: ε is normally distributed.

Data

Scientific Model

Probability Model

Inference

An example: Can we predict crown radius using tree diameter?

[Figure: histogram of crown radius (x-axis: crown radius, roughly 1.6–5.9 m; y-axis: frequency).]

Why do we care about probability?

• Probability is the foundation of the theory of statistics.

• It provides a description of uncertainty (error):

– Measurement error

– Process error

• It is needed to understand likelihood theory, which is required for:

– Estimating model parameters.

– Model selection (which hypothesis do the data support?).

Error (noise, variability) is your friend!

• Classical statistics are built around the assumption that the variability is normally distributed.

• But…normality is in fact rare in ecology.

• Non-normality is an opportunity to:

– Represent variability in a more realistic way.

– Gain insights into the process of interest.

The likelihood framework

Ask biological question

Collect data

Ecological Model (model the signal)

Probability Model (model the noise)

Estimate parameters

Estimate support regions

Model selection

Answer questions

(Bolker, Notes)

Probability Concepts

• An experiment is an operation with uncertain outcome.

• A sample space is a set of all possible outcomes of an experiment.

• An event is a particular outcome of an experiment, a subset of the sample space.

Random Variables

• A random variable is a function that assigns a numeric value to every outcome of an experiment (event) or sample. For instance:

Event: Tree → Random variable: Growth = f(DBH, light, soil…)

Function: formula expressing a relationship between two variables.

All PDFs are functions, BUT NOT all functions are PDFs.

Functions and probability density functions

Functions = Scientific Model: e.g., Crown radius = f(DBH). (We will talk about this later.)

PDFs: used to model the noise, y − f(DBH).

Probability Density Functions: properties

• A function that assigns probabilities to ALL the possible values of a random variable (X).

• Properties: 0 ≤ f(x) ≤ 1 and Σx∈S f(x) = 1.

[Figure: a probability density function f(x) plotted against x.]

Probability Density Functions: Expectations

• The expectation of a random variable X is the weighted average of the possible values that X can take, each value weighted by the probability that X assumes it.

• Analogous to “center of gravity”. First moment.

For a discrete random variable X with probabilities p(x) ≥ 0 over values x1, …, xN:

E[X] = Σi xi · p(xi)

Example: X takes the values −1, 0, 1, 2 with p(−1) = 0.10, p(0) = 0.25, p(1) = 0.30, p(2) = 0.35, so

E[X] = (−1)(0.10) + (0)(0.25) + (1)(0.30) + (2)(0.35) = 0.90
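The worked sum above can be checked in a few lines of Python; a minimal sketch using the values and probabilities from the slide:

```python
# E[X] = sum of x_i * p(x_i): the discrete expectation from the slide.
xs = [-1, 0, 1, 2]
ps = [0.10, 0.25, 0.30, 0.35]

assert abs(sum(ps) - 1.0) < 1e-12  # the probabilities sum to one

e_x = sum(x * p for x, p in zip(xs, ps))                 # first moment
var_x = sum((x - e_x) ** 2 * p for x, p in zip(xs, ps))  # spread around E[X]

assert abs(e_x - 0.90) < 1e-12
assert abs(var_x - 0.99) < 1e-12
```

The same loop also yields the variance (the second central moment, introduced on the next slide).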

Probability Density Functions: Variance

• The variance of a random variable reflects the spread of X values around the expected value.

• Second moment of a distribution.

Var[X] = E[(X − E[X])²] = E[X²] − (E[X])²

Probability Distributions

• A function that assigns probabilities to the possible values of a random variable (X).

• They come in two flavors:

DISCRETE: outcomes are a set of discrete possibilities, such as integers (e.g., counts).

CONTINUOUS: a probability distribution over a continuous range (the real numbers, or the non-negative real numbers).

[Figure: bar plot of probability (0–0.2) against event x = 0–20.]

Probability Mass Functions

For a discrete random variable X, the probability that X takes on a value x is given by a discrete density function f(x), also known as the probability mass (or distribution) function:

f(x) = P{X = x}, with 0 ≤ f(x) ≤ 1 and Σx∈S f(x) = 1

[Figure: a probability mass function plotted as vertical bars.]

Probability Density Functions: Continuous variables

A probability density function (f(x)) gives the probability that a random variable X takes on values within a range.

f(x) ≥ 0, and ∫ f(x) dx = 1 over the whole range of X

∫ab f(x) dx = P{a ≤ X ≤ b}

[Figure: area under f(x) between a and b.]

Some rules of probability

Prob(A ∩ B) = Prob(B | A) · Prob(A)

Prob(B | A) = Prob(A ∩ B) / Prob(A)

Prob(A | B) = Prob(A ∩ B) / Prob(B)

Prob(A ∩ B) = Prob(A) · Prob(B) (assuming independence)

Prob(A ∪ B) = Prob(A) + Prob(B) − Prob(A ∩ B)

[Venn diagram: overlapping events A and B.]
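These identities can be verified by exact enumeration on a small sample space. A sketch using one roll of a fair die, with two illustrative events chosen so that they happen to be independent:

```python
from fractions import Fraction

# Sample space: one roll of a fair die.
space = [1, 2, 3, 4, 5, 6]
A = {2, 4, 6}   # "even"
B = {1, 2}      # "two or less"

def prob(event):
    """Exact probability of an event under equally likely outcomes."""
    return Fraction(sum(1 for s in space if s in event), len(space))

p_a, p_b = prob(A), prob(B)
p_and = prob(A & B)          # Prob(A ∩ B)
p_or = prob(A | B)           # Prob(A ∪ B)
p_b_given_a = p_and / p_a    # Prob(B | A) = Prob(A ∩ B) / Prob(A)

# Inclusion-exclusion: Prob(A ∪ B) = Prob(A) + Prob(B) - Prob(A ∩ B)
assert p_or == p_a + p_b - p_and
# These particular events are independent: Prob(A ∩ B) = Prob(A) * Prob(B)
assert p_and == p_a * p_b
```

Using exact fractions rather than floats makes the identities hold exactly, not just approximately.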

Real data: Histograms

[Figure: histograms of the same variable for samples of n = 10, 50, 100, 500, and 1000 observations (count and proportion per bar on the y-axes).]

Histograms and PDF’s

Probability density functions approximate the distribution of finite data sets.

[Figure: histogram of n = 1000 observations (x from −10 to 10) with the approximating probability density.]

Uses of Frequency Distributions

• Empirical (frequentist):

– Make predictions about the frequency of a particular event.

– Judge whether an observation belongs to a population.

• Theoretical:

– Make predictions about the distribution of the data based on basic assumptions about the nature of the forces acting on a particular biological system.

– Describe the randomness in the data.

Some useful distributions

1. Discrete

– Binomial: two possible outcomes.

– Poisson: counts.

– Negative binomial: counts.

– Multinomial: multiple categorical outcomes.

2. Continuous

– Normal.

– Lognormal.

– Exponential.

– Gamma.

– Beta.

An example: Seed predation

Let V = Prob(a tree is visited by seed predators between t1 and t2), and let x = the number of seeds taken (0 to N).

Assume each seed has an equal probability p of being taken. Then:

prob(x seeds taken) = p^x

prob((N − x) seeds not taken) = (1 − p)^(N − x)

prob(x) = (1 − V) + V(1 − p)^N if x = 0

prob(x) = V · [N! / (x!(N − x)!)] · p^x (1 − p)^(N − x) if x ≥ 1

The factor N! / (x!(N − x)!) is the normalization constant (the number of ways of choosing which x of the N seeds are taken).

Zero-inflated binomial

[Figure: histogram of 1000 draws from rzibinom(n = 1000, prob = 0.6, size = 12, zprob = 0.3).]

Binomial distribution: Discrete events that can take one of two values

[Figure: binomial probabilities P(X = x) for x = 0–20.]

P(X = x) = [n! / (x!(n − x)!)] · p^x (1 − p)^(n − x)

E[x] = np

Variance = np(1 − p)

n = number of sites

p = prob. of survival

Example: probability of survival derived from population data, with n = 20 and p = 0.5.
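The binomial example (n = 20, p = 0.5) can be reproduced with `scipy.stats`; a Python sketch (the course's own snippets are in R):

```python
from math import comb
from scipy.stats import binom

n, p = 20, 0.5  # number of sites and survival probability, from the slide

# P(X = x) = n!/(x!(n-x)!) * p^x * (1-p)^(n-x)
x = 10
manual = comb(n, x) * p**x * (1 - p) ** (n - x)
assert abs(binom.pmf(x, n, p) - manual) < 1e-12

# Moments: E[X] = n*p, Var[X] = n*p*(1-p)
assert abs(binom.mean(n, p) - n * p) < 1e-9          # 10
assert abs(binom.var(n, p) - n * p * (1 - p)) < 1e-9  # 5
```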

Poisson Distribution: Counts (or getting hit in the head by a horse)

P(X = k) = e^(−λ) · λ^k / k!

E[X] = Variance = λ

k = number of seedlings

λ = arrival rate

[Figure: histogram of 500 Poisson counts with λ = 0.5; number of seedlings per quadrat, x = 0–7.]

**Alternative parameterization: λ = r·t (rate × time).
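The Poisson formula can be checked term by term against `scipy.stats`, using the slide's λ = 0.5; a minimal sketch:

```python
from math import exp, factorial
from scipy.stats import poisson

lam = 0.5  # arrival rate per quadrat, from the slide

# P(X = k) = e^(-lambda) * lambda^k / k!
for k in range(8):
    manual = exp(-lam) * lam**k / factorial(k)
    assert abs(poisson.pmf(k, lam) - manual) < 1e-12

# The defining Poisson property: mean equals variance.
assert abs(poisson.mean(lam) - poisson.var(lam)) < 1e-12
```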

Poisson distribution

Example: Number of seedlings in census quad.

[Figure: histogram of the number of seedlings per trap (0–100) for Alchornea latifolia. Data from the LFDP, Puerto Rico.]

Clustering in space or time

Poisson process: E[X] = Variance[X].

Overdispersed (clumped or patchy): E[X] < Variance[X]; the process is no longer Poisson. Negative binomial?

Negative binomial: Table 4.2 & 4.3 in H&M, bycatch data:

E[X] = 0.279, Variance[X] = 1.56

The variance far exceeds the mean, which suggests temporal or spatial aggregation in the data!

Negative Binomial: Counts

P(X = n) = [(n + r − 1)! / (n!(r − 1)!)] · p^r (1 − p)^n

E[X] = r(1 − p) / p

Variance = r(1 − p) / p²

[Figure: histogram of number of seeds (0–50).]

Negative Binomial: Counts

P(X = n) = [Γ(k + n) / (Γ(k) · n!)] · (k / (k + m))^k · (m / (k + m))^n

E[X] = m

Variance = m + m²/k

k is related to the variance: small k means strong clumping; as k becomes large, the distribution approaches the Poisson.

[Figure: negative binomial probabilities for number of seeds (0–50).]
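The ecological (m, k) parameterization is not `scipy.stats.nbinom`'s native one; the standard conversion n = k, p = k/(k + m) recovers it. A sketch with hypothetical values of m and k:

```python
from scipy.stats import nbinom

m, k = 5.0, 0.7  # hypothetical mean and clumping parameter

# scipy parameterizes by "number of successes" n and success probability p;
# the ecological (m, k) form corresponds to n = k, p = k / (k + m).
n, p = k, k / (k + m)
dist = nbinom(n, p)

assert abs(dist.mean() - m) < 1e-9               # E[X] = m
assert abs(dist.var() - (m + m**2 / k)) < 1e-9   # Var[X] = m + m^2/k
```

Note that the variance, m + m²/k, always exceeds the mean m, which is why this distribution suits overdispersed counts.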

Negative Binomial: Count data

[Figure: histogram of the number of seedlings per quadrat (0–100) for Prestoea acuminata. Data from the LFDP, Puerto Rico.]

Normal PDF with mean = 0

[Figure: normal PDFs over x = −5 to 5 for Var = 0.25, 0.5, 1, 2, 5, and 10.]

Normal Distribution

f(x) = [1 / √(2πσ²)] · exp(−(x − m)² / (2σ²))

E[x] = Mean = m

Variance = σ²

Normal Distribution with increasing variance

f(x) = [1 / √(2πσ²)] · exp(−(x − m)² / (2σ²))

Mean = m

Variance = c + d·x (the variance modeled as an increasing function of x)

Lognormal: One tail and no negative values

ln(x) = Y, so X = e^Y with Y ~ Normal(μ, σ²):

f(x) = [1 / (x·σ·√(2π))] · exp(−(ln x − μ)² / (2σ²))

median = e^μ

E[x] = e^(μ + σ²/2)

Variance = e^(2μ + σ²) · (e^(σ²) − 1)

x is always positive.

[Figure: lognormal densities, f(x) up to 0.8, over x = 0–70.]
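The median, mean, and variance formulas above can be verified against `scipy.stats.lognorm` (which uses shape s = σ and scale = e^μ); the parameter values are hypothetical:

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 0.0, 0.5  # hypothetical parameters of Y = ln(X)

# scipy's lognorm: shape s = sigma, scale = exp(mu)
dist = lognorm(s=sigma, scale=np.exp(mu))

assert abs(dist.median() - np.exp(mu)) < 1e-9               # median = e^mu
assert abs(dist.mean() - np.exp(mu + sigma**2 / 2)) < 1e-9  # E[x]
expected_var = np.exp(2 * mu + sigma**2) * (np.exp(sigma**2) - 1)
assert abs(dist.var() - expected_var) < 1e-9                # Var[x]
```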

Lognormal: Radial growth data

[Figure: histograms of radial growth (cm/yr) for hemlock (0–4) and red cedar (0–3). Data from Date Creek, British Columbia.]

Exponential

f(x) = λ · e^(−λx)

E[x] = 1/λ

Variance = 1/λ²

[Figure: histogram of an exponentially distributed variable (x = 0–6).]
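A quick check of the exponential moments with `scipy.stats.expon`, which is parameterized by scale = 1/λ (the rate value is hypothetical):

```python
from scipy.stats import expon

lam = 2.0  # hypothetical rate

# scipy parameterizes the exponential by scale = 1/lambda
dist = expon(scale=1 / lam)

assert abs(dist.mean() - 1 / lam) < 1e-12    # E[x] = 1/lambda
assert abs(dist.var() - 1 / lam**2) < 1e-12  # Var[x] = 1/lambda^2
assert abs(dist.pdf(0.0) - lam) < 1e-12      # f(0) = lambda * e^0 = lambda
```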

Exponential: Growth data (negatives assumed 0)

[Figure: histogram of growth (mm/yr, 0–8) for Beilschmiedia pendula. Data from BCI, Panama.]

Gamma: One tail and flexibility

f(x) = [1 / (s^a · Γ(a))] · x^(a − 1) · e^(−x/s)

a = shape parameter

s = scale parameter

E[x] = a·s

Var[X] = a·s²
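The gamma density and moments can be checked against `scipy.stats.gamma`; shape and scale values below are hypothetical:

```python
from math import exp, gamma as gamma_fn
from scipy.stats import gamma as gamma_dist

a, s = 2.0, 1.5  # hypothetical shape and scale parameters

dist = gamma_dist(a, scale=s)

# f(x) = x^(a-1) * e^(-x/s) / (s^a * Gamma(a))
x = 1.0
manual = x ** (a - 1) * exp(-x / s) / (s**a * gamma_fn(a))
assert abs(dist.pdf(x) - manual) < 1e-12

assert abs(dist.mean() - a * s) < 1e-9     # E[x] = a*s
assert abs(dist.var() - a * s**2) < 1e-9   # Var[X] = a*s^2
```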

Gamma: “raw” growth data

[Figure: histograms of growth (mm/yr) for Alseis blackiana (0–9) and Cordia bicolor (0–30). Data from BCI, Panama.]

Beta distribution

f(x | a, b) = [Γ(a + b) / (Γ(a) · Γ(b))] · x^(a − 1) · (1 − x)^(b − 1) for 0 ≤ x ≤ 1; 0 otherwise.

E(x) = a / (a + b) (expected value of x)

Var(x) = ab / [(a + b)² · (a + b + 1)]
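The beta mean and variance formulas can be verified with `scipy.stats.beta` (hypothetical shape parameters):

```python
from scipy.stats import beta as beta_dist

a, b = 2.0, 5.0  # hypothetical shape parameters

dist = beta_dist(a, b)

assert abs(dist.mean() - a / (a + b)) < 1e-12                          # E(x)
assert abs(dist.var() - a * b / ((a + b) ** 2 * (a + b + 1))) < 1e-12  # Var(x)

# The support is [0, 1], which is why the beta suits proportions like light indices.
assert dist.cdf(1.0) == 1.0 and dist.cdf(0.0) == 0.0
```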

Beta: Light interception by crown trees

(Data from Luquillo, PR)

[Figure: histogram of the gap light index (GLI, 0.0–1.0). Data from Luquillo, PR.]

Mixture models

• What do you do when your data don’t fit any known distribution?

– Add covariates

– Mixture models

• Discrete

• Continuous

Discrete mixtures

Discrete mixture: Zero-inflated binomial

prob(x) = (1 − V) + V · prob_binom(0) if x = 0

prob(x) = V · prob_binom(x) if x > 0

where prob_binom is the ordinary binomial probability and (1 − V) is the probability of a structural zero (e.g., no visit).
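The two-branch definition above can be sketched directly. The deck's `rzibinom` is from Bolker's R package `emdbook`; this Python stand-in is an assumption, with V = 1 − zprob:

```python
import random
from math import comb

def zib_pmf(x, N, p, V):
    """Zero-inflated binomial: with prob (1-V) there is no visit (forcing
    x = 0); with prob V, x follows a Binomial(N, p)."""
    binom_pmf = comb(N, x) * p**x * (1 - p) ** (N - x)
    if x == 0:
        return (1 - V) + V * binom_pmf
    return V * binom_pmf

def rzib(N, p, V, rng=random):
    """Draw one zero-inflated binomial variate."""
    if rng.random() > V:  # no visit: structural zero
        return 0
    return sum(rng.random() < p for _ in range(N))  # binomial draw

# Sanity check: the pmf sums to one over its support 0..N
# (V = 0.7 corresponds to the slide's zprob = 0.3).
total = sum(zib_pmf(x, N=12, p=0.6, V=0.7) for x in range(13))
assert abs(total - 1.0) < 1e-12
```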

Continuous (compounded) mixtures

The Method of Moments

• Compute the sample moments of the data and match them to the theoretical moments of the distribution to solve for its parameters.

• Recall that:

E[X] = Σ x · p(x) (summing over x with p(x) > 0)

Var[X] = E[X²] − (E[X])²

• The MOM is a good way to get a first (but biased) estimate of the parameters of a distribution. ML estimators are more reliable.

MOM: Negative binomial

E[X] = Σ x · p(x) = mu

Var[X] = E[X²] − (E[X])² = mu + mu²/k

Matching the sample mean and variance to these expressions gives the moment estimates: mu is estimated by the sample mean, and k by mu² / (Var[X] − mu).
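Applying the moment equations to the bycatch numbers quoted earlier in the deck (E[X] = 0.279, Var[X] = 1.56):

```python
# MOM estimate of the negative binomial clumping parameter k.
# From Var[X] = mu + mu^2/k, solve: k = mu^2 / (Var[X] - mu).
mu, var = 0.279, 1.56  # bycatch moments from the earlier slide

k_hat = mu**2 / (var - mu)
assert abs(k_hat - 0.0608) < 1e-3  # very small k: strong aggregation
```

A k̂ near 0.06 is far from the Poisson limit (large k), consistent with the slide's conclusion of temporal or spatial aggregation.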
