Body size distribution of European Collembola
Lecture 9: Moments of distributions
Species                                                       Body weight [mg]   ln weight
Tetrodontophora bielanensis (Waga 1842)                       13.471729          2.6006
Orchesella chiantica Frati & Szeptycki 1990                   13.471729          2.6006
Disparrhopalites tergestinus Fanciulli, Colla, Dallai 2005    12.924837          2.5592
Orchesella dallaii Frati & Szeptycki 1990                     9.4503028          2.246
Seira pini Jordana & Arbea 1989                               9.4503028          2.246
Isotomurus pentodon (Kos,1937)                                7.1044808          1.9607
Heteromurus (V.) longicornis (Absolon 1900)                   7.1044808          1.9607
Pogonognathellus flavescens (Tullberg 1871)                   6.9512714          1.9389
Orchesella hoffmanni Stomp 1968                               6.9512714          1.9389
Heteromurus (H) constantinellus Lučić, Ćurčić & Mitić 2007    6.3862223          1.8541
Pogonognathellus longicornis (Müller 1776)                    6.2133935          1.8267
Orchesella devergens Handschin 1924                           6.2133935          1.8267
Orchesella flavescens (Bourlet 1839)                          6.2133935          1.8267
Orchesella quinquefasciata (Bourlet 1841)                     6.2133935          1.8267

ln body weight [mg] class means   Number of species
-4.71511                          7
-4.018377                         53
-3.321643                         133
-2.624909                         224
-1.928176                         353
-1.231442                         395
-0.534708                         325
0.162025                          126
0.858759                          45
1.555493                          24
2.252226                          9
[Figure: The histogram of raw data. Collembola; x-axis: ln body weight class (-4.72 to 2.25); y-axis: number of species (0 to 500).]
Mode
Weighted mean
$\bar{x} = \frac{\sum_{i=1}^{k} n_i x_i}{\sum_{i=1}^{k} n_i} = \frac{1}{n}\sum_{i=1}^{k} n_i x_i = \sum_{i=1}^{k} x_i f(x_i)$
        Class 1     Class 2    Class 3
N       25          31         43
Mean    1.8169079   1.032923   0.531059
        2.6005933   1.313477   0.651808
        2.5591508   1.313477   0.651808
        2.2460468   1.313477   0.651808
        2.2460468   1.313477   0.651808
        1.9607257   1.313477   0.651808
        1.9607257   1.301948   0.651808
        1.9389246   1.225568   0.651808
        1.9389246   1.165038   0.651808
        1.8541429   1.165038   0.651808
        1.8267072   1.165038   0.651808
        1.8267072   1.165038   0.651808
        1.8267072   1.006355   0.651808
        1.8267072   1.006355   0.651808
        1.8267072   1.006355   0.651808
        1.584378    1.006355   0.651808
        1.584378    1.006355   0.651808
        1.584378    1.006355   0.651808
        1.584378    1.006355   0.613152
        1.584378    1.006355   0.573835
        1.584378    1.006355   0.573835
        1.5326904   1.006355   0.533834
        1.5326904   0.939683   0.493125
        1.5064044   0.871022   0.493125
        1.4529137   0.871022   0.493125
        1.4529137   0.835906   0.493125
                    0.835906   0.493125
                    0.800247   0.489014
                    0.800247   0.451682
                    0.764026   0.451682
                    0.756712   0.451682
                    0.727225   0.451682
                               0.409479
(Only the first 32 of the 43 values of class 3 are shown.)
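Three Collembolan weight classes
What is the average body weight?
$\bar{x} = \frac{25}{99}\cdot 1.817 + \frac{31}{99}\cdot 1.033 + \frac{43}{99}\cdot 0.531 = 1.013$
Population mean: $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$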
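The weighted mean of the three weight classes can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative and not part of the lecture.

```python
# Class sizes and class means of the three Collembolan weight classes
class_sizes = [25, 31, 43]
class_means = [1.8169079, 1.032923, 0.531059]

# Weighted mean: each class mean is weighted by its share n_i / n
n_total = sum(class_sizes)
weighted_mean = sum(n_i * m_i for n_i, m_i in zip(class_sizes, class_means)) / n_total

print(n_total, round(weighted_mean, 3))
```

The unweighted average of the three class means would be larger, because the smallest class has the highest mean.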
ln body weight [mg] class means   Number of species   Frequency     Arithmetic mean   Variance
-4.72                             7                   =B2/B14       =A2*C2            =(A2-D14)^2*C2
-4.02                             53                  0.031286895   -0.125723         0.202268085
-3.32                             133                 0.078512397   -0.26079          0.267516588
-2.62                             224                 0.132231405   -0.347095         0.174619987
-1.93                             353                 0.208382527   -0.401798         0.042653444
-1.23                             395                 0.233175915   -0.287143         0.013917567
-0.53                             325                 0.191853601   -0.102586         0.169898317
0.16                              126                 0.074380165   0.0120514         0.199510727
0.86                              45                  0.026564345   0.0228124         0.144774029
1.56                              24                  0.014167651   0.0220377         0.130178627
2.25                              9                   0.005312869   0.0119658         0.073837264
Sum                               1694                              -1.475751         1.462535979
StDev                                                                                 1.209353538
[Figure: Collembola; x-axis: ln body weight class (-4.72 to 2.25); y-axis: number of species as relative frequency (0 to 0.25).]
$f(x_i) = \frac{n_i}{n}$
Weighted mean
Discrete distributions: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\sum_{i=1}^{k} n_i x_i = \sum_{i=1}^{k} x_i f(x_i)$
Continuous distributions: $\mu = \int_{\min}^{\max} x f(x)\,dx$
The average European springtail has a body weight of $e^{-1.476} = 0.23$ mg.
The most frequently encountered weight is about $e^{-1.23} = 0.29$ mg.
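The two values can be reproduced from the class table. The sketch below uses the unrounded class means from the first table; the variable names are illustrative.

```python
import math

# ln body weight class means and number of species per class
class_means = [-4.71511, -4.018377, -3.321643, -2.624909, -1.928176,
               -1.231442, -0.534708, 0.162025, 0.858759, 1.555493, 2.252226]
counts = [7, 53, 133, 224, 353, 395, 325, 126, 45, 24, 9]

n = sum(counts)
freqs = [c / n for c in counts]                     # f(x_i) = n_i / n

# Frequency-weighted mean of the ln-transformed weights
mean_ln = sum(x * f for x, f in zip(class_means, freqs))

# Mode: the class with the highest number of species
mode_ln = class_means[counts.index(max(counts))]

print(round(mean_ln, 3), round(math.exp(mean_ln), 2))   # mean on ln scale and in mg
print(round(mode_ln, 2), round(math.exp(mode_ln), 2))   # mode on ln scale and in mg
```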
Why did we use log transformed values?
Species                                                       Average body length [mm]   Body weight [mg]
Tetrodontophora bielanensis (Waga 1842)                       7                          13.472
Orchesella chiantica Frati & Szeptycki 1990                   7                          13.472
Disparrhopalites tergestinus Fanciulli, Colla, Dallai 2005    6.875                      12.925
Orchesella dallaii Frati & Szeptycki 1990                     6                          9.4503
Seira pini Jordana & Arbea 1989                               6                          9.4503
Isotomurus pentodon (Kos,1937)                                5.3                        7.1045
Heteromurus (V.) longicornis (Absolon 1900)                   5.3                        7.1045
Pogonognathellus flavescens (Tullberg 1871)                   5.25                       6.9513
Orchesella hoffmanni Stomp 1968                               5.25                       6.9513
Heteromurus (H) constantinellus Lučić, Ćurčić & Mitić 2007    5.06                       6.3862
Pogonognathellus longicornis (Müller 1776)                    5                          6.2134
Orchesella devergens Handschin 1924                           5                          6.2134
Orchesella flavescens (Bourlet 1839)                          5                          6.2134
Orchesella quinquefasciata (Bourlet 1841)                     5                          6.2134
                                                              5                          =JEŻELI(B86=0;0;EXP(-1.875+LN(B86)*2.3))
(JEŻELI is the IF function of the Polish-locale spreadsheet.)
$W[\mathrm{mg}] = e^{-1.875}\,L^{2.3}$, with the body length L in mm
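The spreadsheet formula translates directly into a small function. This is a sketch; the function name and the default arguments are illustrative, only the constants -1.875 and 2.3 come from the slide.

```python
import math

def body_weight_mg(length_mm, ln_w0=-1.875, z=2.3):
    """Body weight in mg from body length in mm: W = exp(ln_w0) * L**z.

    Returns 0 for a length of 0, mirroring the spreadsheet formula.
    """
    if length_mm == 0:
        return 0.0
    return math.exp(ln_w0 + z * math.log(length_mm))

for length in (7, 6, 5):
    print(length, round(body_weight_mg(length), 4))
```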
[Figure: two histograms of the number of Collembola species per body weight class. Log transformed data: ln body weight classes from -6 to 4. Linear data: body weight classes from 0 to 10. y-axis in both panels: number of species (0 to 500).]
The distribution is skewed.
Body weight [mg] class means   Number of species   Frequency     Arithmetic mean   Geometric mean
0.01                           7                   0.004132231   3.702E-05         -0.019483926
0.02                           53                  0.031286895   0.0005626         -0.125722539
0.04                           133                 0.078512397   0.0028338         -0.260790153
0.07                           224                 0.132231405   0.0095797         -0.347095405
0.15                           353                 0.208382527   0.0303016         -0.401798187
0.29                           395                 0.233175915   0.0680574         -0.287142615
0.59                           325                 0.191853601   0.1123956         -0.102585655
1.18                           126                 0.074380165   0.0874629         0.012051446
2.36                           45                  0.026564345   0.062698          0.02281237
4.74                           24                  0.014167651   0.0671181         0.022037681
9.51                           9                   0.005312869   0.0505194         0.011965782
Sum                            1694                              0.491566          -1.4757512
Exp()                                                                              0.228606933
$W = W_0 L^{z} \;\Rightarrow\; \ln W = \ln W_0 + z \ln L$
$W[\mathrm{mg}] = e^{-1.875}\,L^{2.3}$, with L in mm
In the case of exponentially distributed data we have to use the geometric mean. To make things easier we first log-transform our data.
Geometric mean
$\bar{x}_g = \sqrt[n]{\prod_{i=1}^{n} x_i} = e^{\frac{1}{n}\sum_{i=1}^{n} \ln x_i}$
The average European springtail has a body weight of $e^{-1.476} = 0.23$ mg.
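The equivalence between the n-th root of the product and the back-transformed mean of the logarithms can be illustrated with the 14 body weights listed earlier. This is a sketch; `statistics.geometric_mean` needs Python 3.8 or later.

```python
import math
import statistics

# Body weights [mg] of the 14 largest species from the table
weights = [13.472, 13.472, 12.925, 9.4503, 9.4503, 7.1045, 7.1045,
           6.9513, 6.9513, 6.3862, 6.2134, 6.2134, 6.2134, 6.2134]

# Geometric mean via the log transform: exp(mean(ln x))
log_mean = sum(math.log(w) for w in weights) / len(weights)
geo_mean = math.exp(log_mean)

# Arithmetic mean for comparison; it is never smaller than the geometric mean
arith_mean = sum(weights) / len(weights)

print(round(geo_mean, 3), round(arith_mean, 3))
print(round(statistics.geometric_mean(weights), 3))  # same value from the standard library
```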
Weight classes on a binary logarithm (lb) scale
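Variance
Sample variance: $s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}$
Discrete distributions: $s^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 f(x_i)$
Continuous distributions: $s^2 = \int_{\min}^{\max}(x-\bar{x})^2 f(x)\,dx$
Standard deviation: $s = \sqrt{s^2}$
[Figure: distribution with the mean and ± 1 SD marked.]
The standard deviation is a measure of the width of the statistical distribution that has the same dimension as the mean.
Degrees of freedom: n - 1, the denominator of the sample variance.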
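The variance and standard deviation of the class table follow from the discrete formula $s^2 = \sum (x_i-\bar{x})^2 f(x_i)$. The sketch below uses the unrounded class means; the variable names are illustrative.

```python
import math

# ln body weight class means and number of species per class
class_means = [-4.71511, -4.018377, -3.321643, -2.624909, -1.928176,
               -1.231442, -0.534708, 0.162025, 0.858759, 1.555493, 2.252226]
counts = [7, 53, 133, 224, 353, 395, 325, 126, 45, 24, 9]

n = sum(counts)
freqs = [c / n for c in counts]

mean = sum(x * f for x, f in zip(class_means, freqs))
# Variance as the frequency-weighted sum of squared deviations from the mean
variance = sum((x - mean) ** 2 * f for x, f in zip(class_means, freqs))
sd = math.sqrt(variance)

print(round(variance, 4), round(sd, 4))
```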
The standard deviation as a measure of errors
Environmental pollution

Station   NOx [ppm]
1         8.49
2         1.12
3         9.11
4         7.75
5         0.75
6         8.23
7         0.97
8         6.06
9         8.48
10        5.88
11        8.51
12        9.62
13        3.35
14        7.74
15        2.03
16        5.06
17        7.61
18        0.99
19        2.55
20        8.91
Mean                 5.66
Variance             10.45
Standard deviation   3.23

Distance   Average NOx concentration   Standard deviation
1          9.53                        1.70
2          7.37                        1.18
3          5.24                        0.86
4          3.15                        0.26
5          2.17                        0.18
6          1.05                        0.09
7          0.84                        0.14
8          0.63                        0.10
9          0.32                        0.03
10         0.21                        0.02
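The summary values of the 20 stations can be recomputed with Python's `statistics` module. This is a sketch; note that `variance` and `stdev` use the sample formulas with n - 1 in the denominator, which is what reproduces the values in the table.

```python
import statistics

# NOx concentrations [ppm] at the 20 stations
nox = [8.49, 1.12, 9.11, 7.75, 0.75, 8.23, 0.97, 6.06, 8.48, 5.88,
       8.51, 9.62, 3.35, 7.74, 2.03, 5.06, 7.61, 0.99, 2.55, 8.91]

mean = statistics.mean(nox)
variance = statistics.variance(nox)   # sample variance, divides by n - 1
sd = statistics.stdev(nox)            # square root of the sample variance

# Report with the same precision as the raw data (two decimals)
print(round(mean, 2), round(variance, 2), round(sd, 2))
```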
The precision of derived metrics should always match the precision of the raw data.
[Figure: average NOx concentration (0 to 14) against distance [km] (1 to 10), with error bars of ± 1 standard deviation.]
± 1 standard deviation is the most often used estimator of error.
The probability that the true mean is within ± 1 standard deviation is approximately 68%.
The probability that the true mean is within ± 2 standard deviations is approximately 95%.

± 1 standard deviation
Mean   Standard deviation
5.44   4.15
4.49   5.29
5.55   3.39
5.56   3.13
Standard deviation and standard error
Environmental pollution
The standard deviation does not change systematically with sample size.
The precision of the estimate of the mean should increase with sample size n.
The standard error is a measure of precision.
$SE = \frac{SD}{\sqrt{n}}$
Distance   Average NOx concentration   Standard deviation   Standard error (n=20)
1          9.53                        3.32                 0.74
2          7.37                        2.45                 0.55
3          5.24                        1.24                 0.28
4          3.15                        0.67                 0.15
5          2.17                        0.87                 0.19
6          1.05                        0.34                 0.08
7          0.84                        0.14                 0.03
8          0.63                        0.10                 0.02
9          0.32                        0.03                 0.01
10         0.21                        0.02                 0.01
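The standard error column follows from SE = SD / sqrt(n) with n = 20. The function name in this sketch is illustrative.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)

# Standard deviations at distances 1 to 10
sds = [3.32, 2.45, 1.24, 0.67, 0.87, 0.34, 0.14, 0.10, 0.03, 0.02]
ses = [round(standard_error(sd, 20), 2) for sd in sds]
print(ses)
```

The first nine values agree with the table; for the last row the formula gives about 0.004, which rounds to 0.00 rather than the tabulated 0.01.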
[Figure: average NOx concentration (0 to 12) against distance [km] (1 to 10), with error bars.]
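$s^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 f(x_i) = \sum_{i=1}^{n} x_i^2 f(x_i) - 2\bar{x}\sum_{i=1}^{n} x_i f(x_i) + \bar{x}^2\sum_{i=1}^{n} f(x_i)$
$s^2 = \sum_{i=1}^{n} x_i^2 f(x_i) - 2\bar{x}\cdot\bar{x} + \bar{x}^2\cdot 1 = \sum_{i=1}^{n} x_i^2 f(x_i) - \bar{x}^2$
The first term is $E(x^2)$, the second is $[E(x)]^2$.
$\sigma^2 = E(x^2) - [E(x)]^2$
The variance is the difference between the mean of the squared values and the squared mean.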
Mathematical expectation
$E(X^k) = \sum_{i=1}^{n} x_i^k f(x_i)$
$E(X^k) = \int x^k f(x)\,dx$
Central moments
$E((X-\mu)^k)$ is the k-th central moment.
$\sigma^2 = \sum_{i=1}^{n}(X_i-\mu)^2 f(X_i) = E((X-\mu)^2)$
First central moment
First moment of central tendency
$s^2 = \frac{\sum_{i=1}^{n} x_i^2}{n} - \left(\frac{\sum_{i=1}^{n} x_i}{n}\right)^2$
Skewness
$\gamma = \frac{E((X-\mu)^3)}{\sigma^3}$
Third central moment: $E((X-\mu)^3) = E(X^3) - 3\mu E(X^2) + 3\mu^2 E(X) - \mu^3 = E(X^3) - 3\mu E(X^2) + 2\mu^3$
Kurtosis
$\delta = \frac{E((X-\mu)^4)}{\sigma^4} - 3$
[Figure: three density curves f(x) against x. Symmetric distribution: $\gamma = 0$. Right skewed distribution: $\gamma > 0$. Left skewed distribution: $\gamma < 0$.]
[Figure: two density curves f(x) against x, one with $\delta = 0$ and one with $\delta > 0$.]
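Skewness and kurtosis of a discrete frequency distribution can be computed from the central moments. The sketch below uses illustrative function names and toy distributions that are not taken from the lecture.

```python
def central_moment(xs, fs, k):
    """k-th central moment of a discrete distribution with values xs and frequencies fs."""
    mu = sum(x * f for x, f in zip(xs, fs))
    return sum((x - mu) ** k * f for x, f in zip(xs, fs))

def skewness(xs, fs):
    """Third central moment divided by the cubed standard deviation."""
    return central_moment(xs, fs, 3) / central_moment(xs, fs, 2) ** 1.5

def kurtosis(xs, fs):
    """Fourth central moment divided by the squared variance, minus 3."""
    return central_moment(xs, fs, 4) / central_moment(xs, fs, 2) ** 2 - 3

symmetric = ([-1, 0, 1], [0.25, 0.5, 0.25])
right_skewed = ([0, 1, 10], [0.5, 0.4, 0.1])

print(skewness(*symmetric), kurtosis(*symmetric))
print(skewness(*right_skewed) > 0)
```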
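What is the probability that of 10 newborn babies at least 7 are boys?
p(girl) = p(boy) = 0.5
$p(k) = \binom{n}{k} p^k q^{n-k}$
$p(k > 6) = \binom{10}{7} 0.5^{7}\,0.5^{3} + \binom{10}{8} 0.5^{8}\,0.5^{2} + \binom{10}{9} 0.5^{9}\,0.5^{1} + \binom{10}{10} 0.5^{10}\,0.5^{0} = 0.172$
[Figure: binomial probabilities p(X) (0 to 0.3) for X = 0 to 10.]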
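The sum of the four binomial terms can be checked numerically. This is a sketch; `binom_pmf` is an illustrative helper name.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# At least 7 boys among 10 babies: k = 7, 8, 9, 10
p_at_least_7 = sum(binom_pmf(k, 10, 0.5) for k in range(7, 11))
print(round(p_at_least_7, 3))
```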
Lecture 10: Important statistical distributions
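$\sum_{i=0}^{n} p_i = 1$
Bernoulli distribution
$p(k) = \binom{n}{k} p^k q^{n-k}$
$F(k) = p(x \le k) = \sum_{x=0}^{k} \binom{n}{x} p^x q^{n-x}$
The Bernoulli or binomial distribution comes from the expansion of the binomial $(p+q)^n$:
$(p+q)^n = \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i} = \sum_{i=0}^{n} \binom{n}{i} p^i (1-p)^{n-i} = 1$
$\mu = np$
$\sigma^2 = npq$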
[Figure: Bernoulli or binomial distribution f(p) (0 to 0.35) for k = 0 to 10, with $p(k) = \binom{10}{k} 0.2^{k}\,0.8^{10-k}$.]
Assume the probability to find a certain disease in a tree population is 0.01. A bio-monitoring program surveys 10 stands of trees and takes in each case a random sample of
100 trees. How large is the probability that in these stands 1, 2, 3, and more than 3 cases of this disease will occur?
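Mean, variance, standard deviation
$\mu = np = 1000 \cdot 0.01 = 10$
$\sigma^2 = npq = 1000 \cdot 0.01 \cdot 0.99 = 9.9$
$\sigma = \sqrt{9.9} = 3.146$
$p(1) = \binom{1000}{1}\,0.01^{1} \cdot 0.99^{999} = 0.0004$
$p(2) = \binom{1000}{2}\,0.01^{2} \cdot 0.99^{998} = 0.0022$
$p(3) = \binom{1000}{3}\,0.01^{3} \cdot 0.99^{997} = 0.0074$
$p(k>3) = 1 - p(k \le 3) = 1 - \sum_{i=0}^{3} \binom{1000}{i}\,0.01^{i}\,0.99^{1000-i}$
$p(k>3) = 1 - \left[\binom{1000}{0} 0.01^{0}\,0.99^{1000} + \binom{1000}{1} 0.01^{1}\,0.99^{999} + \binom{1000}{2} 0.01^{2}\,0.99^{998} + \binom{1000}{3} 0.01^{3}\,0.99^{997}\right] = 0.99$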
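The binomial probabilities for the tree survey (n = 1000 trees, p = 0.01) can be evaluated exactly. This is a sketch with illustrative names.

```python
from math import comb, sqrt

n, p = 1000, 0.01
q = 1 - p

def binom_pmf(k):
    """Probability of exactly k diseased trees among n."""
    return comb(n, k) * p ** k * q ** (n - k)

mean = n * p
variance = n * p * q
sd = sqrt(variance)

p1, p2, p3 = binom_pmf(1), binom_pmf(2), binom_pmf(3)
p_more_than_3 = 1 - sum(binom_pmf(k) for k in range(4))

print(mean, round(variance, 1), round(sd, 3))
print(round(p1, 4), round(p2, 4), round(p3, 4), round(p_more_than_3, 2))
```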
What happens if the number of trials n becomes larger and larger and the event probability p becomes smaller and smaller?
Starting from the binomial
$p(k) = \binom{n}{k} p^k q^{n-k}$
with $\lambda = np$ held constant, the limit $\lim_{n \to \infty}\left(1 - \frac{\lambda}{n}\right)^{n} = e^{-\lambda}$ leads to
$p(X = k) = \frac{\lambda^{k}}{k!}\,e^{-\lambda}$
Poisson distribution
The distribution of rare events
Assume the probability to find a certain disease in a tree population is 0.01. A bio-monitoring program surveys 10 stands of trees and takes in each case a random sample of
100 trees. How large is the probability that in these stands 1, 2, 3, and more than 3 cases of this disease will occur?
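$\lambda = np = 1000 \cdot 0.01 = 10$

Poisson solution                                       Bernoulli solution
$p(1) = \frac{10^{1}}{1!}\,e^{-10} = 0.00045$          $p(1) = 0.0004$
$p(2) = \frac{10^{2}}{2!}\,e^{-10} = 0.0023$           $p(2) = 0.0022$
$p(3) = \frac{10^{3}}{3!}\,e^{-10} = 0.0076$           $p(3) = 0.0074$

The probability that no infected tree will be detected
$p(0) = \frac{10^{0}}{0!}\,e^{-10} = e^{-10} = 0.000045$
The probability of more than three infected trees
$p(0) + p(1) + p(2) + p(3) = 0.000045 + 0.00045 + 0.0023 + 0.0076 \approx 0.010$
$p(k > 3) = 1 - 0.010 = 0.990$
Bernoulli solution: $p(k > 3) = 0.99$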
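The Poisson terms for λ = 10 and the tail probability can be recomputed in a few lines. This is a sketch; `poisson_pmf` is an illustrative helper name.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events for mean lam."""
    return lam ** k * exp(-lam) / factorial(k)

lam = 1000 * 0.01   # lambda = n * p

p0, p1, p2, p3 = (poisson_pmf(k, lam) for k in range(4))
p_up_to_3 = p0 + p1 + p2 + p3
p_more_than_3 = 1 - p_up_to_3

print(round(p0, 6), round(p1, 5), round(p2, 4), round(p3, 4))
print(round(p_up_to_3, 4), round(p_more_than_3, 2))
```

Both the Poisson and the Bernoulli solution give a probability of about 0.99 for more than three infected trees.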
[Figure: Poisson distributions p(k) (0 to 0.4) for k = 0 to 13 with $\lambda$ = 1, 2, 3, 4 and 6.]
Variance, mean: $\sigma^2 = \mu = \lambda$
Skewness: $\gamma = \frac{1}{\sqrt{\lambda}}$
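What is the probability of three consecutive rollovers (draws without a jackpot winner) in Duży Lotek if 14 000 000 people bet the first time, 20 000 000 the second time, and 30 000 000 the third time?
The probability to win is
$p(6) = \frac{6!\,43!}{49!} \approx \frac{1}{14000000}$
$\lambda_1 = 14000000 \cdot \frac{1}{14000000} = 1$
$\lambda_2 = 20000000 \cdot \frac{1}{14000000} = 1.428571$
$\lambda_3 = 30000000 \cdot \frac{1}{14000000} = 2.142857$
$p_1(0) = \frac{1^{0}}{0!}\,e^{-1} = 0.368$
$p_2(0) = \frac{1.428571^{0}}{0!}\,e^{-1.428571} = 0.239$
$p_3(0) = \frac{2.142857^{0}}{0!}\,e^{-2.142857} = 0.117$
The events are independent:
$p_{1,2,3} = 0.368 \cdot 0.239 \cdot 0.117 = 0.01$
The zero term of the Poisson distribution gives the probability of no event.
The probability of at least one event:
$p(k \ge 1) = 1 - e^{-\lambda}$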
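The lottery calculation can be sketched as follows. It uses the rounded winning probability of 1 in 14 000 000 from the slide; the exact number of combinations is printed for comparison. Variable names are illustrative, and `math.prod` needs Python 3.8 or later.

```python
import math

p_win = 1 / 14_000_000               # rounded value used on the slide
bets = [14_000_000, 20_000_000, 30_000_000]

# Expected number of winners per draw: lambda = n * p
lambdas = [n * p_win for n in bets]

# Zero term of the Poisson distribution: probability of no winner in a draw
p_no_winner = [math.exp(-lam) for lam in lambdas]

# Independent draws: multiply the three probabilities
p_three_rollovers = math.prod(p_no_winner)

print([round(lam, 6) for lam in lambdas])
print(round(p_three_rollovers, 4))
print(math.comb(49, 6))              # exact number of 6-out-of-49 combinations
```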
A pile model to generate the binomial. If the number of steps is very, very large the binomial becomes smooth.
The normal distribution is the continuous equivalent of the discrete Bernoulli distribution.
Abraham de Moivre (1667-1754)
$f(x) = C e^{-(x-\mu)^2}$
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$
If we have a series of independent random variates $X_n$, a new random variate $Y_n$ that is the sum of all $X_n$ will for n→∞ be asymptotically normally distributed.
[Figure: six histograms of frequency against X (from -2 to 2).]
The central limit theorem
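A small simulation illustrates the theorem. The sketch below sums 12 uniform random numbers, which is an arbitrary choice for illustration and not taken from the lecture; such a sum has an expected mean of 6 and a variance of 1.

```python
import random
import statistics

random.seed(1)   # fixed seed so that the run is reproducible

def sum_of_uniforms(n_terms):
    """Sum of n_terms independent uniform(0, 1) random numbers."""
    return sum(random.random() for _ in range(n_terms))

sums = [sum_of_uniforms(12) for _ in range(20000)]

mean = statistics.fmean(sums)
variance = statistics.pvariance(sums)

# Share of sums within one standard deviation of the mean (about 0.68 for a normal)
sd = variance ** 0.5
within_1_sd = sum(abs(s - mean) < sd for s in sums) / len(sums)

print(round(mean, 2), round(variance, 2), round(within_1_sd, 2))
```

Although each single term is uniformly distributed, the sums already behave very much like a normally distributed variate.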
[Figure: binomial distributions f(x) against X for n = 10, n = 20 and n = 50, together with the normal density and its cumulative distribution function.]
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
$F(x) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{(v-\mu)^2}{2\sigma^2}}\,dv$
The normal or Gaussian distribution
Mean: $\mu$   Variance: $\sigma^2$
Important features of the normal distribution
• The function is defined for every real x.
• The frequency at $x = \mu$ is given by $p(x = \mu) = \frac{1}{\sigma\sqrt{2\pi}} \approx \frac{0.4}{\sigma}$
• The distribution is symmetrical around $\mu$.
• The points of inflection are given by the second derivative. Setting this to zero gives $x = \mu \pm \sigma$
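[Figure: normal density f(x) with the areas within $\mu \pm \sigma$ (0.68) and $\mu \pm 2\sigma$ (0.95) marked.]
$\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx = 1$
$\frac{1}{\sigma\sqrt{2\pi}}\int_{\mu-\sigma}^{\mu+\sigma} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx \approx 0.68$
$\frac{1}{\sigma\sqrt{2\pi}}\int_{\mu-2\sigma}^{\mu+2\sigma} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx \approx 0.95$
$\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\mu} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx = 0.5$
$\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\mu+2\sigma} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx \approx 0.975$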
Many statistical tests compare observed values with those of the standard normal distribution and assign
the respective probabilities to H1.
$F(x) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{(v-\mu)^2}{2\sigma^2}}\,dv$
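The Z-transform
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$
$Z = \frac{x-\mu}{\sigma}$
$f(Z) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}Z^{2}}$
The variate Z has a mean of 0 and a variance of 1.
A Z-transform standardizes every statistical distribution to a mean of 0 and a variance of 1.
Tables of statistical distributions are usually given as Z-transforms.
The standard normal
The 95% confidence limit
$P(\mu - \sigma < X < \mu + \sigma) = 68\%$
$P(\mu - 1.65\sigma < X < \mu + 1.65\sigma) = 90\%$
$P(\mu - 1.96\sigma < X < \mu + 1.96\sigma) = 95\%$
$P(\mu - 2.58\sigma < X < \mu + 2.58\sigma) = 99\%$
$P(\mu - 3.29\sigma < X < \mu + 3.29\sigma) = 99.9\%$
The Fisherian significance levels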
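The probability levels listed above can be checked with `statistics.NormalDist` (Python 3.8 or later). The helper names and the small data list in this sketch are illustrative.

```python
from statistics import NormalDist, fmean, pstdev

std_normal = NormalDist()   # mean 0, standard deviation 1

def coverage(z):
    """Probability that a standard normal variate lies between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

for z in (1, 1.65, 1.96, 2.58, 3.29):
    print(z, round(coverage(z), 4))

def z_transform(values):
    """Standardize values to mean 0 and variance 1."""
    mu, sigma = fmean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]

z_values = z_transform([2.0, 4.0, 4.0, 5.0, 7.0, 8.0])
print(round(fmean(z_values), 6), round(pstdev(z_values), 6))
```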
[Figure: normal density f(x) with the areas within $\mu \pm \sigma$ (0.68) and $\mu \pm 2\sigma$ (0.95) marked.]
The Z-transformed (standardized) normal distribution
[Figure: a population with parameters $\mu, \sigma$ and eight samples drawn from it, each with its own estimates $\bar{x}, s$.]
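The estimation of the population mean from a series of samples
$Z = \frac{\sum_{i=1}^{n} x_i - n\mu}{\sqrt{\sum_{i=1}^{n}\sigma_i^2}} = \frac{n\bar{x} - n\mu}{\sigma\sqrt{n}} = \frac{(\bar{x}-\mu)\sqrt{n}}{\sigma}$
$\frac{Z\sigma}{\sqrt{n}} = \bar{x} - \mu$
The n samples come from an additive random variate.
Z is asymptotically normally distributed.
$\mu = \bar{x} \pm Z_{\alpha}\frac{\sigma}{\sqrt{n}}$
Confidence limit of the estimate of a mean from a series of samples.
$\alpha$ is the desired probability level.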
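As an illustration, the confidence limits can be computed for the 20 NOx measurements from the pollution example. The sketch assumes Z = 1.96 for the 95% level and uses the sample standard deviation in place of σ, which is an approximation for a sample of this size.

```python
import math
import statistics

# NOx concentrations [ppm] at the 20 stations
nox = [8.49, 1.12, 9.11, 7.75, 0.75, 8.23, 0.97, 6.06, 8.48, 5.88,
       8.51, 9.62, 3.35, 7.74, 2.03, 5.06, 7.61, 0.99, 2.55, 8.91]

n = len(nox)
mean = statistics.mean(nox)
sd = statistics.stdev(nox)           # sample standard deviation used in place of sigma
se = sd / math.sqrt(n)               # standard error of the mean

z_95 = 1.96                          # two-sided 95% level of the standard normal
lower = mean - z_95 * se
upper = mean + z_95 * se

print(round(se, 2), round(lower, 2), round(upper, 2))
```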
Standard error
How to apply the normal distribution
Intelligence is approximately normally distributed with a mean of 100 (by definition) and a standard deviation of 16 (in North America). For an intelligence study we need 100 persons with an IQ above 130. How many persons do we have to test to find this number if we take random samples (and do not test university students only)?
$F(x > 130) = 1 - \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{130} e^{-\frac{(v-\mu)^2}{2\sigma^2}}\,dv = \frac{1}{\sigma\sqrt{2\pi}}\int_{130}^{\infty} e^{-\frac{(v-\mu)^2}{2\sigma^2}}\,dv$
$\Phi(z_a) = F(x \le a)$ with $z_a = \frac{a-\mu}{\sigma}$
[Figure: normal density f(IQ) (0 to 0.03) for IQ from 40 to 160, divided into the regions IQ<130 and IQ>130.]
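One way to work through the IQ question numerically is sketched below with `statistics.NormalDist` (Python 3.8 or later). The variable names are illustrative, and the required sample size is an expected value, not a guarantee.

```python
import math
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)

# Probability that a randomly chosen person has an IQ above 130
p_above_130 = 1 - iq.cdf(130)

# Expected number of persons to test in order to find 100 such persons
persons_needed = math.ceil(100 / p_above_130)

z = (130 - 100) / 16    # the same threshold as a Z-transformed value
print(round(z, 3), round(p_above_130, 4), persons_needed)
```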
One- and two-sided tests
We measure blood sugar concentrations and know that our method estimates the concentration with an error of about 3%. What is the probability that our measurement deviates from the real value by more than 5%?
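A possible numerical treatment of the blood sugar question is sketched below. It assumes that the 3% error is the standard deviation of normally distributed measurement errors; this interpretation is an assumption and is not stated in the lecture.

```python
from statistics import NormalDist

error_sd = 3.0      # assumed: standard deviation of the measurement error in percent
deviation = 5.0     # deviation of interest in percent

z = deviation / error_sd
std_normal = NormalDist()

# One-sided: the measurement is more than 5% too high
p_one_sided = 1 - std_normal.cdf(z)

# Two-sided: the measurement deviates by more than 5% in either direction
p_two_sided = 2 * p_one_sided

print(round(z, 3), round(p_one_sided, 3), round(p_two_sided, 3))
```

The question asks about a deviation in either direction, so the two-sided probability is the relevant one here.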
Albinos are rare in human populations. Assume their frequency is 1 per 100000 persons. What is the probability of finding 15 albinos among 1000000 persons?
$p(X = 15) = \binom{1000000}{15}\,(0.00001)^{15}\,(0.99999)^{999985}$
=KOMBINACJE(1000000,15)*0.00001^15*(1-0.00001)^999985 = 0.0347
(KOMBINACJE is the binomial coefficient function of the Polish-locale spreadsheet.)
$\mu = np$, $\sigma^2 = npq$
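The spreadsheet result can be reproduced with exact integer binomial coefficients and compared with the Poisson approximation for λ = np = 10. This is a sketch with illustrative variable names.

```python
from math import comb, exp, factorial

n, p, k = 1_000_000, 0.00001, 15

# Exact binomial probability of exactly 15 albinos
p_binomial = comb(n, k) * p ** k * (1 - p) ** (n - k)

# Poisson approximation with lambda = n * p
lam = n * p
p_poisson = lam ** k * exp(-lam) / factorial(k)

mean = n * p
variance = n * p * (1 - p)

print(round(p_binomial, 4), round(p_poisson, 4))
print(round(mean, 2), round(variance, 2))
```

With such a large n and small p the two distributions are practically indistinguishable.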