Probability Distributions and Dataset Properties
Lecture 2
Likelihood Methods in Forest Ecology
October 9th – 20th, 2006
Statistical Inference
Data
Scientific Model (Scientific hypothesis)
Probability Model (Statistical hypothesis)
Inference
Parametric perspective on inference
Scientific Model (Hypothesis test), often with linear models
Probability Model (Normal typically)
Inference
Likelihood perspective on inference
Data
Scientific Model (hypothesis)
Probability Model
Inference
An example...
The Data:
$x_i$ = measurements of DBH on 50 trees
$y_i$ = measurements of crown radius on those trees
The Scientific Model: $y_i = \alpha + \beta x_i + \varepsilon$ (linear relationship, with 2 parameters ($\alpha$, $\beta$) and an error term ($\varepsilon$), the residuals)
The Probability Model: $\varepsilon$ is normally distributed, with $E[\varepsilon] = 0$ and variance estimated from the observed variance of the residuals...
The triangle of statistical inference: Model
• Models clarify our understanding of nature.
• Help us understand the importance (or unimportance) of individual processes and mechanisms.
• Since they are not hypotheses, they can never be “correct”.
• We don’t “reject” models; we assess their validity.
• Establish what’s “true” by establishing which model the data support.
The triangle of statistical inference: Probability distributions
• Data are never “clean”.
• Most models are deterministic: they describe the average behavior of a system but not the noise or variability. To compare models with data, we need a statistical model which describes the variability.
• We must understand the processes giving rise to variability to select the correct probability density function (error structure) for the variability or noise.
[Figure: scatter plot of crown radius (m) against DBH (cm); DBH axis from 0 to 50 cm, crown radius axis from 0 to 6 m.]
The Data: $x_i$ = measurements of DBH on 50 trees; $y_i$ = measurements of crown radius on those trees
The Scientific Model: $y_i = \alpha + \beta \, DBH_i + \varepsilon$
The Probability Model: $\varepsilon$ is normally distributed.
An example: Can we predict crown radius using tree diameter?
[Figure: histogram of crown radius; x-axis: Crown radius (1.62 to 5.89); y-axis: Frequency (0 to 16).]
Why do we care about probability?
• Foundation of theory of statistics.
• Description of uncertainty (error).
– Measurement error
– Process error
• Needed to understand likelihood theory, which is required for:
– Estimating model parameters.
– Model selection (What hypothesis do data support?).
Error (noise, variability) is your friend!
• Classical statistics are built around the assumption that the variability is normally distributed.
• But…normality is in fact rare in ecology.
• Non-normality is an opportunity to:
– Represent variability in a more realistic way.
– Gain insights into the process of interest.
The likelihood framework
Ask biological question
Collect data
Probability Model: model noise
Ecological Model: model signal
Estimate parameters
Estimate support regions
Answer questions
Model selection
Bolker, Notes
Probability Concepts
• An experiment is an operation with uncertain outcome.
• A sample space is a set of all possible outcomes of an experiment.
• An event is a particular outcome of an experiment, a subset of the sample space.
Random Variables
• A random variable is a function that assigns a numeric value to every outcome of an experiment (event) or sample. For instance:
Event → Random variable
Tree growth = f(DBH, light, soil…)
Function: formula expressing a relationship between two variables.
All PDFs are functions, BUT NOT all functions are PDFs.
Functions and probability density functions
Functions = Scientific Model: Crown radius = $\alpha + \beta \, DBH$ (WE WILL TALK ABOUT THIS LATER)
PDFs: used to model noise: $Y - (\alpha + \beta \, DBH)$
Probability Density Functions: properties
• A function that assigns probabilities to ALL the possible values of a random variable (x).
$$0 \le f(x) \le 1$$
$$\sum_{x \in S} f(x) = 1$$
[Figure: probability density f(x) plotted against x.]
Probability Density Functions: Expectations
• The expectation of a random variable x is the weighted average of the possible values that x can take, each value weighted by the probability that x assumes it.
• Analogous to “center of gravity”. First moment.
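$$E[X] = \sum_{x:\, p(x) > 0} x \, p(x) \approx \frac{1}{N} \sum_{i=1}^{N} x_i$$
Example: x takes the values -1, 0, 1, 2 with p(-1) = 0.10, p(0) = 0.25, p(1) = 0.30, p(2) = 0.35.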
Probability Density Functions: Variance
• The variance of a random variable reflects the spread of X values around the expected value.
• Second central moment of a distribution.
$$Var[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2$$
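The expectation and variance definitions can be checked by hand on the four-point example given on the expectation slide. The following is a minimal sketch in Python (the function names are illustrative, not part of the lecture):

```python
# pmf from the slide example: possible values of x and their probabilities
pmf = {-1: 0.10, 0: 0.25, 1: 0.30, 2: 0.35}

def expectation(pmf):
    """E[X]: sum of x * p(x) over all x with p(x) > 0."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var[X] = E[X^2] - (E[X])^2."""
    mean = expectation(pmf)
    second_moment = sum(x * x * p for x, p in pmf.items())
    return second_moment - mean ** 2

mean_x = expectation(pmf)
var_x = variance(pmf)
print(round(mean_x, 4), round(var_x, 4))  # 0.9 0.99
```

For this example E[X] = 0.9 and Var[X] = 1.8 - 0.81 = 0.99.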
Probability Distributions
• A function that assigns probabilities to the possible values of a random variable (X).
• They come in two flavors:
DISCRETE: outcomes are a set of discrete possibilities such as integers (e.g., counts).
CONTINUOUS: A probability distribution over a continuous range (real numbers or the non-negative real numbers).
[Figure: bar plot of a discrete probability distribution; x-axis: Event (x) from 0 to 20; y-axis: Probability from 0 to 0.2.]
Probability Mass Functions
For a discrete random variable, X, the probability that X takes on a value x is a discrete density function, f(x), also known as a probability mass function or distribution function.
$$f(x) = P\{X = x\}$$
$$0 \le f(x) \le 1$$
$$\sum_{x \in S} f(x) = 1$$
[Figure: probability density f(x) plotted against x.]
Probability Density Functions: Continuous variables
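A probability density function, f(x), gives the probability that a random variable X takes on values within a range when it is integrated over that range.
$$f(x) \ge 0$$
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
$$\int_{a}^{b} f(x)\,dx = P\{a \le X \le b\}$$
[Figure: density curve with the area between a and b shaded.]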
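As an illustration of the two integral properties, the sketch below integrates a standard normal density numerically with the trapezoid rule and compares the result with the closed-form cumulative probability from `math.erf`. The choice of the normal density and the interval [-1, 1] are assumptions made for the example only.

```python
import math

def normal_pdf(x, mean=0.0, var=1.0):
    """Normal probability density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def normal_cdf(x, mean=0.0, var=1.0):
    """Closed-form P{X <= x} for the normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def integrate(f, a, b, steps=10_000):
    """Trapezoid-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

prob_ab = integrate(normal_pdf, -1.0, 1.0)       # P{-1 <= X <= 1}
total_area = integrate(normal_pdf, -10.0, 10.0)  # should be very close to 1
print(round(prob_ab, 4), round(total_area, 4))
```

The value of the density at a single point is not itself a probability; only the area under the curve over an interval is.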
Some rules of probability
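$$Prob(A \cap B) = Prob(B \mid A)\,Prob(A)$$
$$Prob(B \mid A) = \frac{Prob(A \cap B)}{Prob(A)}$$
$$Prob(A \mid B) = \frac{Prob(A \cap B)}{Prob(B)}$$
$$Prob(A \cap B) = Prob(A) \cdot Prob(B) \quad \text{(assuming independence)}$$
$$Prob(A \cup B) = Prob(A) + Prob(B) - Prob(A \cap B)$$
[Figure: Venn diagram of two overlapping events, A and B.]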
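These rules can be verified by brute-force enumeration of a small sample space. The sketch below uses two dice, which is an illustrative choice and not an example from the lecture; the events A and B are defined on different dice, so they are independent by construction.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of rolling two dice
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event as (favourable outcomes) / (all outcomes)."""
    favourable = [o for o in sample_space if event(o)]
    return Fraction(len(favourable), len(sample_space))

def event_a(o):
    return o[0] % 2 == 0   # A: first die is even

def event_b(o):
    return o[1] > 4        # B: second die shows 5 or 6

p_a = prob(event_a)
p_b = prob(event_b)
p_ab = prob(lambda o: event_a(o) and event_b(o))      # Prob(A and B)
p_a_or_b = prob(lambda o: event_a(o) or event_b(o))   # Prob(A or B)
p_b_given_a = p_ab / p_a                              # Prob(B | A)

print(p_a, p_b, p_ab, p_a_or_b, p_b_given_a)  # 1/2 1/3 1/6 2/3 1/3
```

Because A and B are independent here, Prob(B | A) equals Prob(B) and the product rule reduces to Prob(A) * Prob(B).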
Real data: Histograms
[Figure: five histograms of the same variable for samples of increasing size: n = 10, n = 50, n = 100, n = 500 and n = 1000. x-axis: VARIABLE; y-axes: Count and Proportion per Bar.]
Histograms and PDF’s
Probability density functions approximate the distribution of finite data sets.
[Figure: histogram of VARIABLE for n = 1000; x-axis from -10 to 10; y-axes: Count and Probability.]
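A small simulation makes the link between histograms and PDFs concrete. The sketch below draws normal samples of increasing size, converts bar counts to densities, and records the largest gap between the histogram and the theoretical density. The standard normal, the bin layout and the seed are all assumptions chosen for illustration.

```python
import math
import random

def normal_pdf(x, mean=0.0, sd=1.0):
    """Normal probability density with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def histogram_density(sample, lo, hi, bins):
    """Return bin centres and densities (proportion per bar divided by bar width)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for value in sample:
        if lo <= value < hi:
            index = min(int((value - lo) / width), bins - 1)
            counts[index] += 1
    n = len(sample)
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    densities = [c / (n * width) for c in counts]
    return centres, densities

def max_abs_error(n, seed=1):
    """Largest gap between histogram density and the normal PDF for a sample of size n."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    centres, densities = histogram_density(sample, -4.0, 4.0, 16)
    return max(abs(d - normal_pdf(c)) for c, d in zip(centres, densities))

errors = {n: max_abs_error(n) for n in (10, 100, 10_000, 100_000)}
for n, err in errors.items():
    print(n, round(err, 4))
```

The gap is large for n = 10 and small for n = 100,000: the histogram of a large sample approaches the PDF that generated it.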
Uses of Frequency Distributions
• Empirical (frequentist):
– Make predictions about the frequency of a particular event.
– Judge whether an observation belongs to a population.
• Theoretical:
– Predictions about the distribution of the data based on some basic assumptions about the nature of the forces acting on a particular biological system.
– Describe the randomness in the data.
Some useful distributions
1. Discrete
– Binomial: Two possible outcomes.
– Poisson: Counts.
– Negative binomial: Counts.
– Multinomial: Multiple categorical outcomes.
2. Continuous
– Normal.
– Lognormal.
– Exponential.
– Gamma.
– Beta.
An example: Seed predation
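Prob(Visited) = V
x = number of seeds taken, from 0 to N
Assume each seed has equal probability (p) of being taken. Then:
$$prob(x \text{ seeds taken}) = p^{x}$$
$$prob((N - x) \text{ seeds not taken}) = (1 - p)^{N - x}$$
$$prob(x) = (1 - V) + V (1 - p)^{N} \quad \text{if } x = 0$$
$$prob(x) = V \, \frac{N!}{x!\,(N - x)!} \, p^{x} (1 - p)^{N - x} \quad \text{if } x > 0$$
The factor $\frac{N!}{x!\,(N - x)!}$ is the normalization constant.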
Zero-inflated binomial
[Figure: histogram of rzibinom(n = 1000, prob = 0.6, size = 12, zprob = 0.3); x-axis from 0 to 10; y-axis: Frequency from 0 to 300.]
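The zero-inflated binomial from the seed predation example can be written out directly. The sketch below is a Python illustration, not the lecture's own code. It takes the parameter values from the `rzibinom` call in the figure title and assumes that `zprob` is the probability of an unvisited site, so that V = 1 - zprob.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def zib_pmf(x, n, p, v):
    """Zero-inflated binomial: v = Prob(visited), p = per-seed probability of being taken."""
    if x == 0:
        # Either the site was never visited, or it was visited and no seed was taken
        return (1.0 - v) + v * binomial_pmf(0, n, p)
    return v * binomial_pmf(x, n, p)

n_seeds, p_taken, zprob = 12, 0.6, 0.3
v_visit = 1.0 - zprob  # assumption: zprob is the probability of a structural zero

probs = [zib_pmf(x, n_seeds, p_taken, v_visit) for x in range(n_seeds + 1)]
mean_taken = sum(x * q for x, q in enumerate(probs))
print(round(probs[0], 4), round(mean_taken, 2))
```

The probabilities sum to 1, and the mean is V * N * p, lower than the plain binomial mean N * p because of the extra zeros.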
Binomial distribution: Discrete events that can take one of two values
$$P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \qquad \binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$
E[x] = np
Variance = np(1 - p)
n = number of sites
p = prob. of survival
Example: Probability of survival derived from pop data
n = 20, p = 0.5
[Figure: binomial distribution for n = 20, p = 0.5; x-axis: Event (x) from 0 to 20; y-axis: Probability from 0 to 0.2.]
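The stated mean and variance can be confirmed numerically for the slide's example values, n = 20 and p = 0.5. This is a minimal sketch using only the Python standard library:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 20, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * q for x, q in enumerate(pmf))
variance = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))
print(round(mean, 4), round(variance, 4), round(pmf[10], 4))
```

The mean is np = 10, the variance is np(1 - p) = 5, and the most likely outcome, x = 10, has probability about 0.176, consistent with the 0 to 0.2 probability axis of the figure.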
Poisson Distribution: Counts (or getting hit in the head by a horse)
$$P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}$$
$$E[X] = Variance = \lambda$$
k = number of seedlings
λ = arrival rate
Alternative parameterization: λ = rt
[Figure: Poisson distribution. Histogram of the number of seedlings per quadrat (0 to 7); y-axes: Count and Proportion per Bar; figure annotations: 500, 0.5.]
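The defining property of the Poisson, mean equal to variance, is easy to check numerically. In the sketch below the rate λ = 0.5 is an illustrative value, and the sum is truncated at k = 59, where the remaining probability is negligible.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 0.5  # illustrative arrival rate
pmf = [poisson_pmf(k, lam) for k in range(60)]

mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
print(round(pmf[0], 4), round(mean, 4), round(variance, 4))
```

With λ = 0.5, about 61% of quadrats are expected to be empty, and both the mean and the variance equal 0.5.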
Example: Number of seedlings in census quad.
[Figure: histogram of the number of seedlings per trap (0 to 100); y-axes: Count and Proportion per Bar.]
Alchornea latifolia
(Data from LFDP, Puerto Rico)
Clustering in space or time
Poisson process: E[X] = Variance[X]
Poisson process: E[X] < Variance[X]. Overdispersed, clumped or patchy.
Negative binomial?
Negative binomial: Table 4.2 & 4.3 in H&M, Bycatch Data
E[X] = 0.279, Variance[X] = 1.56
Suggests temporal or spatial aggregation in the data!!
Negative Binomial: Counts
$$P(X = n) = \binom{n - 1}{r - 1} p^{r} (1 - p)^{n - r}$$
$$E[X] = \frac{r}{p}$$
$$Variance = \frac{r(1 - p)}{p^{2}}$$
[Figure: histogram of negative binomial counts; x-axis: Number of Seeds (0 to 50); y-axes: Count and Proportion per Bar.]
Negative Binomial: Counts
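$$Pr(X = n) = \frac{\Gamma(k + n)}{\Gamma(k)\, n!} \left(1 + \frac{m}{k}\right)^{-k} \left(\frac{m}{m + k}\right)^{n}$$
$$E[X] = m$$
$$Variance = m + \frac{m^{2}}{k}$$
k is related to the variance: for large k the distribution approaches the Poisson; as k → 0 the variance grows relative to the mean (strong clumping).
[Figure: negative binomial. Histogram of counts; x-axis: Number of Seeds (0 to 50); y-axes: Count and Proportion per Bar.]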
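The (m, k) form of the negative binomial can be evaluated stably on the log scale with `math.lgamma`. The sketch below uses illustrative values m = 5 and k = 2 (not from the lecture), checks the mean and variance, and shows the Poisson limit for large k.

```python
import math

def negbin_pmf(n, m, k):
    """Negative binomial probability of count n, with mean m and clumping parameter k."""
    log_p = (math.lgamma(k + n) - math.lgamma(k) - math.lgamma(n + 1)
             - k * math.log1p(m / k)
             + n * math.log(m / (m + k)))
    return math.exp(log_p)

def poisson_pmf(n, lam):
    """Poisson probability of count n with rate lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

m, k = 5.0, 2.0  # illustrative values
pmf = [negbin_pmf(n, m, k) for n in range(2000)]

mean = sum(n * q for n, q in enumerate(pmf))
variance = sum((n - mean) ** 2 * q for n, q in enumerate(pmf))
print(round(mean, 4), round(variance, 4))  # close to m and m + m^2 / k

# With a very large k the negative binomial is nearly Poisson
nearly_poisson = negbin_pmf(3, m, 1e6)
print(round(nearly_poisson, 4), round(poisson_pmf(3, m), 4))
```

For m = 5 and k = 2 the variance is 5 + 25/2 = 17.5, more than three times the Poisson variance for the same mean.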
Negative Binomial: Count data
[Figure: histogram of the number of seedlings per quadrat (0 to 100); y-axes: Count and Proportion per Bar.]
Prestoea acuminata
(Data from LFDP, Puerto Rico)
Normal PDF with mean = 0
[Figure: normal PDFs with mean = 0 and Var = 0.25, 0.5, 1, 2, 5 and 10; x-axis: X from -5 to 5; y-axis: Prob(x) from 0 to 1.]
Normal Distribution
$$f(x) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\left(-\frac{(x - m)^{2}}{2 \sigma^{2}}\right)$$
Mean: E[x] = m
Variance = $\sigma^{2}$
Normal Distribution with increasing variance
$$f(x) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\left(-\frac{(x - m)^{2}}{2 \sigma^{2}}\right)$$
Mean = m
Variance: $\sigma^{2} = c x^{d}$
Lognormal: One tail and no negative values
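$$f(x) = \frac{1}{x \sigma \sqrt{2 \pi}} \exp\left(-\frac{(\ln x - \mu)^{2}}{2 \sigma^{2}}\right)$$
$$Y = \ln(x), \quad X = e^{Y}, \quad Y \sim N(\mu, \sigma^{2})$$
$$median = m = e^{\mu}$$
$$E[x] = m \, e^{\sigma^{2}/2}$$
$$variance = m^{2} e^{\sigma^{2}} \left(e^{\sigma^{2}} - 1\right)$$
x is always positive
[Figure: lognormal density f(x) against x; x-axis from 0 to 70; y-axis from 0 to 0.8.]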
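The lognormal moments can be checked by simulation: exponentiate normal draws and compare the sample median, mean and variance with the standard closed-form expressions. The parameter values (μ = 0, σ = 0.5), the sample size and the seed in this sketch are assumptions chosen for illustration.

```python
import math
import random
import statistics

mu, sigma = 0.0, 0.5  # mean and standard deviation of ln(x); illustrative values

# Standard closed-form lognormal moments
median_theory = math.exp(mu)
mean_theory = math.exp(mu + sigma ** 2 / 2)
var_theory = math.exp(2 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1)

# Simulate X = exp(Y) with Y ~ Normal(mu, sigma^2)
rng = random.Random(42)
sample = [math.exp(rng.gauss(mu, sigma)) for _ in range(100_000)]

sample_mean = sum(sample) / len(sample)
sample_median = statistics.median(sample)
sample_var = sum((x - sample_mean) ** 2 for x in sample) / (len(sample) - 1)

print(round(median_theory, 3), round(sample_median, 3))
print(round(mean_theory, 3), round(sample_mean, 3))
print(round(var_theory, 3), round(sample_var, 3))
```

The mean exceeds the median because of the long right tail, and every simulated value is positive.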
Lognormal: Radial growth data
[Figure: two histograms of radial growth (cm/yr): Hemlock (0 to 4 cm/yr) and Red cedar (0 to 3 cm/yr); y-axes: Count and Proportion per Bar.]
(Data from Date Creek, British Columbia)
Exponential
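$$f(x) = \lambda e^{-\lambda x}$$
$$E[x] = \frac{1}{\lambda}$$
$$Variance = \frac{1}{\lambda^{2}}$$
[Figure: histogram of an exponentially distributed variable; x-axis: Variable (0 to 6); y-axes: Count and Proportion per Bar.]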
Exponential: Growth data (negatives assumed 0)
[Figure: histogram of growth (mm/yr), 0 to 8; y-axes: Count and Proportion per Bar.]
Beilschmiedia pendula
(Data from BCI, Panama)
Gamma: One tail and flexibility
$$f(x) = \frac{1}{s^{a} \, \Gamma(a)} \, x^{a - 1} e^{-x/s}$$
E[x] = as
Var[X] = $a s^{2}$
a = shape parameter
s = scale parameter
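The gamma density and its moments can be verified by numerical integration. The sketch below uses a simple midpoint rule and illustrative parameters (a = 2, s = 1.5). It also checks that a shape of a = 1 recovers the exponential density with rate 1/s.

```python
import math

def gamma_pdf(x, a, s):
    """Gamma density with shape a and scale s, for x > 0."""
    return x ** (a - 1) * math.exp(-x / s) / (s ** a * math.gamma(a))

def midpoint_integral(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

a, s = 2.0, 1.5  # illustrative shape and scale

# The upper limit of 100 stands in for infinity; the tail beyond it is negligible here
area = midpoint_integral(lambda x: gamma_pdf(x, a, s), 0.0, 100.0)
mean = midpoint_integral(lambda x: x * gamma_pdf(x, a, s), 0.0, 100.0)
second = midpoint_integral(lambda x: x * x * gamma_pdf(x, a, s), 0.0, 100.0)
variance = second - mean ** 2
print(round(area, 4), round(mean, 4), round(variance, 4))  # near 1, a*s, a*s^2

# Special case: shape a = 1 gives the exponential density with rate 1/s
lam = 2.0
exponential_value = lam * math.exp(-lam * 0.7)
gamma_value = gamma_pdf(0.7, 1.0, 1.0 / lam)
```

The flexibility comes from the shape parameter: a = 1 gives the exponential, while larger a moves the mode away from zero.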
Gamma: “raw” growth data
[Figure: two histograms of growth (mm/yr): Alseis blackiana (0 to 9 mm/yr, counts up to 1000) and Cordia bicolor (0 to 30 mm/yr, counts up to 200).]
(Data from BCI, Panama)
Beta distribution
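$$f(x_i \mid a, b) = \frac{1}{B(a, b)} \, x_i^{a - 1} (1 - x_i)^{b - 1} \quad \text{for } 0 < x_i < 1; \quad 0 \text{ otherwise.}$$
$$B(a, b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a + b)}$$
$$E(x) = \frac{a}{a + b} \quad \text{(expected value of x)}$$
$$Var(x) = \frac{ab}{(a + b)^{2} (a + b + 1)}$$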
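The beta density and its moments can be checked in the same way. The sketch below integrates the density on (0, 1) with a midpoint rule; the parameter values a = 2 and b = 5 are illustrative assumptions.

```python
import math

def beta_function(a, b):
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_pdf(x, a, b):
    """Beta density on (0, 1); zero elsewhere."""
    if not 0.0 < x < 1.0:
        return 0.0
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)

a, b = 2.0, 5.0  # illustrative values
steps = 100_000
h = 1.0 / steps
xs = [(i + 0.5) * h for i in range(steps)]  # midpoints of equal-width slices of (0, 1)

area = sum(beta_pdf(x, a, b) for x in xs) * h
mean = sum(x * beta_pdf(x, a, b) for x in xs) * h
variance = sum((x - mean) ** 2 * beta_pdf(x, a, b) for x in xs) * h

mean_theory = a / (a + b)
var_theory = a * b / ((a + b) ** 2 * (a + b + 1))
print(round(area, 4), round(mean, 4), round(variance, 4))
```

Because its support is the unit interval, the beta is a natural candidate for proportions such as the light-interception data on the next slide.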
Beta: Light interception by crown trees
(Data from Luquillo, PR)
[Figure: histogram of GLI (0.0 to 1.0); y-axes: Count and Proportion per Bar.]
Mixture models
• What do you do when your data don’t fit any known distribution?
– Add covariates
– Mixture models
• Discrete
• Continuous
Discrete mixtures
Discrete mixture: Zero-inflated binomial
$$prob(x) = (1 - V) + V \cdot prob_{binomial}(0) \quad \text{if } x = 0$$
$$prob(x) = V \cdot prob_{binomial}(x) \quad \text{if } x > 0$$
Continuous (compounded) mixtures
The Method of Moments
• You can compute the sample values of the moments of a distribution and match them up with the theoretical moments.
• Recall that:
$$E[X] = \sum_{x:\, p(x) > 0} x \, p(x)$$
$$Var[X] = E[X^{2}] - (E[X])^{2}$$
• The MOM is a good way to get a first (but biased) estimate of the parameters of a distribution. ML estimators are more reliable.
MOM: Negative binomial
$$E[X] = \sum_{x:\, p(x) > 0} x \, p(x) = \mu$$
$$Var[X] = E[X^{2}] - (E[X])^{2} = \mu + \frac{\mu^{2}}{k}$$
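Solving the two moment equations for the parameters gives μ = sample mean and k = μ² / (variance − μ). The sketch below applies this to the bycatch summary values quoted on the clustering slide (E[X] = 0.279, Variance[X] = 1.56). The function name is illustrative, and the result is a first estimate only, not a maximum likelihood fit.

```python
def negbin_mom(sample_mean, sample_var):
    """Method-of-moments estimates (mu, k) for the negative binomial.

    Uses E[X] = mu and Var[X] = mu + mu^2 / k, so k = mu^2 / (Var - mu).
    """
    if sample_var <= sample_mean:
        raise ValueError("variance must exceed the mean for a negative binomial fit")
    k = sample_mean ** 2 / (sample_var - sample_mean)
    return sample_mean, k

# Summary values quoted earlier for the bycatch data
mu_hat, k_hat = negbin_mom(0.279, 1.56)
print(round(mu_hat, 3), round(k_hat, 4))

# Plugging the estimates back in recovers the observed variance
implied_var = mu_hat + mu_hat ** 2 / k_hat
```

The estimate of k is small (about 0.06), which is consistent with the strong aggregation suggested by a variance several times larger than the mean.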