Table of contents
Statistical analysis
  Measures of statistical central tendencies
  Measures of variability
  Aleatory uncertainties
  Epistemic uncertainties
Measures of statistical dispersion or deviation
  The range
  The mean difference
  The variance
  The standard deviation
  The coefficient of variation
Measures of uncertainty
  Systems of events
  Entropy
Random (stochastic) variables
  Discontinuous (discrete) random variables
    Moments of discrete random variables
    Probability distributions of discrete random variables
      Binomial distribution
      Poisson distribution
  Continuous random variables
    Probability Density Function
    Cumulative Distribution Function
    Probability distributions of continuous random variables
      Uniform distribution
      Simpson's (triangular) distribution
      Normal distribution
      Lognormal distribution
      Shifted exponential distribution
      Gamma distribution
      Shifted Rayleigh distribution
      Type I Largest value (Gumbel) distribution
      Type III Smallest values (for ε = 0 known as the Weibull distribution)
      Beta distribution
      Type I Smallest values distribution
Combinations of random variables
Measures of statistical central tendencies Measures of central tendency of a set of data x1, x2, ..., xN locate only the centre of a distribution of measures. Other measures often are needed to describe data. The mean is often used to describe central tendencies. Mean has two related meanings in statistics:
• the arithmetic mean • the expected value of a random variable.
In mathematics and statistics, the arithmetic mean is often referred to simply as the mean or average; the term "arithmetic mean" is preferred in mathematics and statistics. For a data set x1, x2, ..., xN the arithmetic mean is defined as follows:
$$\mu = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
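The definition can be checked numerically; a minimal sketch in Python (the data set is made up for illustration):

```python
# Arithmetic mean of a data set x1, x2, ..., xN: mu = (1/N) * sum(x_i)
data = [4.0, 7.0, 13.0, 16.0]   # illustrative values only
N = len(data)
mu = sum(data) / N              # (4 + 7 + 13 + 16) / 4 = 10.0
```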
Measures of variability Statistics uses summary measures to describe the amount of variability or spread in a set of data x1, x2, ..., xN. Variability refers to the extent to which data points in a statistical distribution or data set diverge from the average or mean value, and also to the extent to which these data points differ from each other. There are several commonly used measures of variability: the range, the mean difference, the variance and the standard deviation, as well as the combined measure of variability defined as the coefficient of variation with respect to the mean value.
Uncertainty represents a state of having limited knowledge where it is impossible to exactly describe the existing state, a future outcome, or more than one possible outcome.
The uncertainty (doubt) in statistics and probability theory represents the estimated amount or percentage by which an observed or calculated value may differ from the true value.
Uncertainties can be distinguished as being either aleatory or epistemic.
Aleatory uncertainties
Objective or external or irreducible uncertainty arises because of natural, unpredictable variability of the wave and wind climate or of ship operations. The inherent randomness normally cannot be reduced although the knowledge of the phenomena may help in quantifying the uncertainty.
Epistemic uncertainties
Uncertainty is due to a lack of knowledge about the climate properties. The epistemic (or subjective or internal or modelling) uncertainty can be reduced with sufficient study, better measurement facilities, more observations or improved modelling and, therefore, expert judgments may be useful in its reduction.
Measures of statistical dispersion or deviation A measure of statistical dispersion or deviation is a real number that is zero if all the data are identical, and increases as the data become more diverse. It cannot be less than zero. Most measures of dispersion have the same scale as the quantity being measured. In other words, if the measurements have units, such as metres or seconds, the measure of dispersion has the same units. Basic measures of dispersion include:
• Range • Mean difference • Variance
Additional measures are: • Standard deviation – the square root of the variance • Coefficient of variation – the standard deviation divided by the mean value
(See Excel example: GraduationRateofNavalArchitectureinZagreb) The example presents the statistical properties of the enrolment and graduation numbers of naval architecture students at the Faculty of Mechanical Engineering and Naval Architecture at the University of Zagreb.
[Figure: enrolled (Upisano) and graduated (Diplomiralo) students per year (Godina), naval architecture programme (Studij brodogradnje)]
The range In descriptive statistics, the range is the length of the smallest interval which contains all the data of a data set x1, x2, ..., xN. The range is calculated by subtracting the smallest observation (sample minimum Smin) from the greatest (sample maximum Smax) and is an indicator of statistical dispersion: R = Smax − Smin. The range, in the sense of the difference between the highest and lowest scores, is also called the crude range. The midrange point, i.e. the point halfway between the two extremes, is an indicator of the central tendency of the data. It is not appropriate for small samples.

The mean difference In probability theory and statistics, the mean difference is used as a measure of how far the numbers of a data set x1, x2, ..., xN are spread out from each other. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean (expected value). For a random variable X = x1, x2, ..., xN with mean value μ the mean difference of X is:
$$MD(x) = \frac{1}{N}\sum_{i=1}^{N} \left| x_i - \mu \right|$$

or the relative mean difference is then

$$RMD(x) = \frac{MD(x)}{\mu}$$
The variance In probability theory and statistics, the variance is another indicator used as a measure of how far a set of numbers are spread out from each other. For random variable X with expected value (mean) μ = E[X], the variance of X is:
$$Var(x) = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i-\mu\right)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$$
Proof:

$$Var(x) = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i-\mu\right)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \frac{2\mu}{N}\sum_{i=1}^{N} x_i + \mu^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$$
The standard deviation The widely used measure of variability or diversity in statistics and probability theory is the standard deviation. It shows how much variation or "dispersion" there is from the "average". The standard deviation is the square root of its variance:
$$\sigma(X) = \sqrt{Var(X)}$$

The standard deviation, unlike the variance, is expressed in the same units as the data.

The coefficient of variation Other measures of dispersion are dimensionless (scale-free): they have no units even if the variable itself has units. In widest use is the coefficient of variation, defined as follows:

$$COV(X) = \frac{\sigma(X)}{\mu(X)}$$
For measurements with percentage as unit, the coefficient of variation and the standard deviation will have percentage points as unit.
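All of the dispersion measures above can be sketched in a few lines of Python (the data set is made up for illustration; the population form of the variance is used, as in the formulas above):

```python
import math

# Illustrative data set (made up for this sketch)
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
N = len(data)
mu = sum(data) / N                                # arithmetic mean
R = max(data) - min(data)                         # range: Smax - Smin
MD = sum(abs(x - mu) for x in data) / N           # mean difference from mu
var = sum((x - mu) ** 2 for x in data) / N        # variance, definition form
var_alt = sum(x * x for x in data) / N - mu ** 2  # equivalent: E[x^2] - mu^2
sigma = math.sqrt(var)                            # standard deviation
COV = sigma / mu                                  # coefficient of variation
```

Note that `var` and `var_alt` agree, illustrating the identity proved above.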
Measures of uncertainty Systems of events
Random events are in general considered as abstract concepts and the relations among events are characterized axiomatically. The algebraic structure of the set of events turns out to be a Boolean algebra.
The disjoint random events $E_j$ with probabilities $p_j = p(E_j)$, $j = 1, 2, \ldots, N$, configure a system $\mathbf{S}_N$ in the form of an N-element finite scheme:

$$\mathbf{S}_N = \begin{pmatrix} E_1 & E_2 & \cdots & E_j & \cdots & E_N \\ p_1 = p(E_1) & p_2 = p(E_2) & \cdots & p_j = p(E_j) & \cdots & p_N = p(E_N) \end{pmatrix}$$
The probability of a system of events $\mathbf{S}_N$ is then in general $p(\mathbf{S}_N) = \sum_{i=1}^{N} p_i \le 1$. For a complete distribution, $p(\mathbf{S}_N) = P(\mathbf{S}_N) = 1$. A system of N events E1, E2, ..., EN is called a complete system of events if the following axioms hold:

(a) $E_k \ne \emptyset$, $k = 1, 2, \ldots, N$
(b) $E_j E_k = \emptyset$ for $j \ne k$
(c) $E_1 + E_2 + \cdots + E_N = I$

The "∅" in (a) and (b) means an impossible event and "I" in (c) denotes a sure event. The fact that Ej and Ek are exclusive is expressed in (b). Axiom (c) states that at least one of the events Ek, k = 1, 2, ..., N, occurs.
Entropy
Uncertainty of a single stochastic event E with known probability p = p(E) ≠ 0 plays a fundamental role in information theory. To each probability can be assigned the equivalent number of events $\nu(E) = 1/p(E)$. The entropy of a single stochastic event E can be interpreted according to Wiener (1948) either as a measure of the information yielded by the event or of how unexpected the event was, and can be defined as the logarithm of the equivalent number of events $\nu(E)$ as follows:

$$H(E) = \log_2 \nu(E) = \log_2\left[1/p(E)\right] = -\log_2 p(E)$$

The unit of unexpectedness, $H(1/2) = 1$, expresses, for example, how unexpected it is to get a tail when flipping a coin. More important than the unexpectedness of a single stochastic event are the uncertainties of systems of N events. The uncertainty of a complete system $\mathbf{S}$ of N events can be expressed as the weighted sum of the unexpectedness of all events by Shannon's entropy (Shannon and Weaver, 1949), as follows:
$$H_N(\mathbf{S}) = \sum_{j=1}^{N} p_j \log \nu_j = \sum_{j=1}^{N} p_j \log\left(1/p_j\right) = -\sum_{j=1}^{N} p_j \log p_j$$
The uncertainty of an incomplete system $\mathbf{S}$ of N events can be defined as the limiting case of Renyi's entropy (1970) of order 1, as shown:

$$H_N^{R}(\mathbf{S}) = -\frac{1}{p(\mathbf{S})}\sum_{j=1}^{N} p_j \log p_j$$
The definition of the unit of uncertainty according to Renyi (1970) is no more and no less arbitrary than the choice of the unit of some physical quantity. E.g., if the logarithm applied is of base two, the unit of entropy is denoted as one "bit". One bit is the uncertainty of a system of two equally probable events. If the natural logarithm is applied, the unit is denoted as one "nit". Outcomes with zero probability do not change the uncertainty; by convention, 0 log 0 = 0.

Some characteristics of the probabilistic uncertainty measures and properties of the entropy are summarized next. The entropy HN(S) is equal to zero when the state of the system S can be surely predicted, i.e., no uncertainty exists at all. This occurs when one of the probabilities of events pi, i = 1, 2, ..., N is equal to one, say pk = 1, and all other probabilities are equal to zero, pj = 0, j ≠ k. The entropy is maximal when all events are equally probable, pi = 1/N for i = 1, 2, ..., N, and it amounts to HN(S)max = log N, which is Hartley's entropy (1928). Hartley's entropy corresponds to Renyi's entropy of order 0 (1970). The entropy increases as the number of events increases. The entropy does not depend on the sequence of events: HN(p1, p2, ..., pN) = HN(pk(1), pk(2), ..., pk(N)), where k is an arbitrary permutation of (1, 2, ..., N). The uniqueness theorem by Khinchin (1957) states that the entropy is the only function that measures the probabilistic uncertainty of systems of events in agreement with human experience of uncertainty.
(see Excel example U1-EntropyDieCoin)
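The die and coin cases from the Excel example can be reproduced with a short Python sketch of Shannon's entropy in bits:

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum p_j log2 p_j; 0 log 0 = 0 by convention."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

H_coin = entropy_bits([0.5, 0.5])    # fair coin: exactly 1 bit
H_die = entropy_bits([1 / 6] * 6)    # fair die: log2(6), about 2.585 bits
H_sure = entropy_bits([1.0, 0.0])    # sure outcome: zero uncertainty
```

The fair coin gives exactly one bit, the maximal entropy of a two-event system, and a sure outcome gives zero, matching the properties listed above.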
Random (stochastic) variables Deterministic variables are normally described by their properties:

• N – nominal value or the exact value

and possibly with tolerances:

• T – tolerance • t = T/N – relative tolerance
Description of characteristics of random variables:
• μ = N(1 + O) – mean value • O = μ/N − 1 – mean deviation from the nominal value (bias) • Var = σ² – variance • σ = Var^(1/2) – standard deviation • COV = σ/μ – coefficient of variation • F – probability distribution:
PDF – probability density function; CDF – cumulative distribution function. Empirically, it is possible for practical purposes to relate the tolerance of deterministic variables to the standard deviation of random variables: T = nσ. For example, supposing a normal probability distribution, for n = 3 fewer than 27 samples out of 10000 are expected to fall outside the tolerable margins. In other words, the confidence is 99.73% that a random sample will be within the prescribed tolerance interval of ±3σ.
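The 27-out-of-10000 figure can be checked directly from the standard normal CDF, which is available in closed form via the error function:

```python
import math

def normal_cdf(u):
    """Standard normal CDF Phi(u), expressed via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

# Probability that a normal sample falls inside mu +/- n*sigma, here n = 3
n = 3
p_inside = normal_cdf(n) - normal_cdf(-n)    # about 0.9973
expected_outside = (1 - p_inside) * 10000    # about 27 samples per 10000
```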
(See Excel example: MSproperty-plating-statistics) The example presents the statistical analysis of the mechanical properties of mild shipbuilding steel (MS) rolled plates and profiles, obtained by tensile testing in the Laboratory for experimental mechanics at the Faculty of Mechanical Engineering and Naval Architecture at the University of Zagreb.
(See Excel example: MSproperty-profils-statistics)
Discontinuous (discrete) random variables Definition: A discontinuous (discrete) random variable takes the values x1, x2, ... with probabilities p(x1), p(x2), ..., which have the property $\sum_i p(x_i) = 1$.
Moments of discrete random variables

$$m_r = \sum_i x_i^r\, p(x_i)$$

$$M_r = \sum_i (x_i - \mu)^r\, p(x_i)$$

Expectation: $\mu = \sum_i x_i\, p(x_i)$

Variance: $V(x) = \sigma^2 = \sum_i (x_i - \mu)^2\, p(x_i)$
Probability distributions of discrete random variables Binomial distribution

$$P(x) = \binom{n}{x} p^x q^{n-x}, \qquad q = 1 - p$$

Mean $= np$, Sigma $= \sqrt{npq}$
(see Excel example DD1-DistributionBinomial)
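The PMF and the moment formulas can be cross-checked in Python for the n = 25, p = 0.5 case plotted in the Excel example:

```python
import math

def binom_pmf(x, n, p):
    """P(x) = C(n, x) p^x q^(n-x), with q = 1 - p."""
    q = 1.0 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

n, p = 25, 0.5
mean = n * p                          # 12.5, as in the plotted example
sigma = math.sqrt(n * p * (1 - p))    # 2.5
total = sum(binom_pmf(x, n, p) for x in range(n + 1))  # PMF sums to 1
```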
[Figure: binomial distribution CDF for n = 25, p = 0.5 (Mean = 12.5, Sigma = 2.5), and binomial PMFs for n = 2, 5, 10, 20, 50 with p = 0.25, 0.50, 0.75]
Poisson distribution

$$P(x) = \frac{m^x}{x!}\, e^{-m}, \qquad m = np > 0$$

Mean $= m$, Variance $= m$, Sigma $= \sqrt{m}$, $COV = 1/\sqrt{m}$
(see Excel example DD2-DistributionPoisson)
[Figure: Poisson distribution PMF and cumulative distribution function]
Continuous random variables A random variable is called continuous if it can assume all possible values in the possible range of the random variable. For a continuous random variable the value of the variable is never an exact point; it always lies in an interval, however small that interval may be. Probability Density Function (PDF) The probability function of a continuous random variable is called the probability density function, or briefly p.d.f. It is denoted by f(x) and represents the probability that the random variable X takes a value between x and x+Δx, where Δx is a very small change in X. Cumulative Distribution Function (CDF) In terms of the probability density function the cumulative distribution function is defined as:
$$F(x) = \int_{-\infty}^{x} f(t)\, dt$$
(see Excel example DC3-DistributionNormal)
[Figure: normal distribution CDF, Mean = 2, Sigma = 1]
Moments of continuous random variables

$$m_r = \int_{-\infty}^{+\infty} x^r f(x)\, dx$$

$$M_r = \int_{-\infty}^{+\infty} (x-\mu)^r f(x)\, dx$$

Expectation: $\mu = \int_{-\infty}^{+\infty} x f(x)\, dx$

Variance: $V(x) = \sigma^2 = \int_{-\infty}^{+\infty} (x-\mu)^2 f(x)\, dx$
Probability distributions of continuous random variables Uniform distribution

$$f(x) = \frac{1}{b-a}, \qquad a \le x \le b$$

$$F(x) = \frac{x-a}{b-a}$$

$$\mu = \frac{a+b}{2}, \qquad \sigma = \frac{b-a}{2\sqrt{3}}$$
(see Excel example DC6-DistributionUniform)
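A minimal numeric sketch of the uniform distribution (the interval [2, 10] is chosen arbitrarily):

```python
import math

# Uniform distribution on [a, b]: constant density and closed-form moments
a, b = 2.0, 10.0
pdf = 1.0 / (b - a)                        # constant density 1/(b-a)
mu = (a + b) / 2.0                         # mean: midpoint of the interval
sigma = (b - a) / (2.0 * math.sqrt(3.0))   # equivalently (b-a)/sqrt(12)

def cdf(x):
    """F(x) = (x - a) / (b - a) for a <= x <= b."""
    return (x - a) / (b - a)
```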
Simpson's (triangular) distribution For the symmetric triangular distribution on [a, b], on the rising half $a \le x \le (a+b)/2$:

$$f(x) = \frac{4(x-a)}{(b-a)^2}, \qquad F(x) = \frac{2(x-a)^2}{(b-a)^2}$$

with the mirrored expressions on the upper half and the peak density

$$f_{\max} = f\!\left(\frac{a+b}{2}\right) = \frac{2}{b-a}$$

$$\mu = \frac{a+b}{2}, \qquad \sigma = \frac{b-a}{2\sqrt{6}}$$
(see Excel example DC7-DistributionSimpson)
[Figure: Simpson's (triangular) distribution CDF]
Normal distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

$$F(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right)$$

Standard normal density and cumulative probability:

$$\varphi(u) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}u^2}, \qquad \Phi(u) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{u} e^{-\frac{1}{2}t^2}\, dt$$

$$u = \frac{x-\mu}{\sigma}, \qquad \sigma > 0$$
(see Excel example DC3-DistributionNormal)
[Figure: normal distribution PDF and CDF, Mean = 2, Sigma = 1]
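The density, the standard normal CDF and the substitution u = (x − μ)/σ can be sketched in Python (Φ is expressed through the error function, which is exact for this purpose):

```python
import math

def phi(u):
    """Standard normal density phi(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def Phi(u):
    """Standard normal CDF Phi(u), via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """F(x) = Phi((x - mu) / sigma)."""
    return Phi((x - mu) / sigma)

# Parameters from the plotted example: Mean = 2, Sigma = 1
F_at_mean = normal_cdf(2.0, 2.0, 1.0)   # 0.5 by symmetry
```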
Lognormal distribution

$$f(x) = \frac{1}{x\,\sigma_y\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\ln x-\mu_y}{\sigma_y}\right)^2}$$

$$F(x) = \Phi\!\left(\frac{\ln x-\mu_y}{\sigma_y}\right), \qquad u = \frac{\ln x-\mu_y}{\sigma_y}$$

$$\mu_y = \ln\!\left[\mu_x^2\Big/\sqrt{\mu_x^2+\sigma_x^2}\right], \qquad \sigma_y^2 = \ln\!\left[1+\sigma_x^2/\mu_x^2\right]$$

$$\mu_x = e^{\mu_y+\sigma_y^2/2}, \qquad \sigma_x = e^{\mu_y+\sigma_y^2/2}\sqrt{e^{\sigma_y^2}-1}$$

(see Excel example DC4-DistributionLogNormal)
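The conversion between the x-moments and the underlying normal parameters can be verified numerically for the Mean-x = 8, Sigma-x = 5 case plotted in the example; converting back must reproduce the original mean and standard deviation:

```python
import math

def lognormal_params(mu_x, sigma_x):
    """Underlying normal parameters (mu_y, sigma_y) from mean/std of x."""
    sigma_y2 = math.log(1.0 + (sigma_x / mu_x) ** 2)
    mu_y = math.log(mu_x ** 2 / math.sqrt(mu_x ** 2 + sigma_x ** 2))
    return mu_y, math.sqrt(sigma_y2)

mu_y, sigma_y = lognormal_params(8.0, 5.0)     # Mean-x = 8, Sigma-x = 5
# Round trip back to the x-moments:
mu_x = math.exp(mu_y + sigma_y ** 2 / 2.0)
sigma_x = mu_x * math.sqrt(math.exp(sigma_y ** 2) - 1.0)
```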
[Figure: lognormal distribution CDF, Mean-x = 8, Sigma-x = 5 (Mean-y = 1.91, Sigma-y = 0.547)]
(see Excel example DC4-DistributionLogNormal-MildSteel)
[Figure: lognormal distribution of the yield stress of mild shipbuilding steel, Mean-x = 268, Sigma-x = 30.5 (Mean-y = 5.58, Sigma-y = 0.114)]
Shifted exponential distribution

$$f(x) = \lambda\, e^{-\lambda(x-x_o)}, \qquad F(x) = 1-e^{-\lambda(x-x_o)}, \qquad \lambda > 0$$

$$\mu = x_o + \frac{1}{\lambda}, \qquad \lambda = \frac{1}{\mu-x_o}, \qquad \sigma = \frac{1}{\lambda}$$
Gamma distribution

$$f(x) = \frac{\lambda(\lambda x)^{k-1}}{\Gamma(k)}\, e^{-\lambda x}, \qquad F(x) = \frac{\gamma(k,\lambda x)}{\Gamma(k)}, \qquad \lambda > 0, \quad k > 0$$

Gamma function: $\Gamma(k) = \int_0^{\infty} e^{-u}\, u^{k-1}\, du$, with $\Gamma(k+1) = k\,\Gamma(k) = k!$ for integer $k$.

Incomplete gamma function: $\gamma(k,x) = \int_0^{x} e^{-u}\, u^{k-1}\, du$

$$\mu = \frac{k}{\lambda}, \qquad \sigma = \frac{\sqrt{k}}{\lambda}, \qquad k = \left(\frac{\mu}{\sigma}\right)^2, \qquad \lambda = \frac{\mu}{\sigma^2}$$

In the equivalent scale parametrization with $\theta = 1/\lambda$:

$$f(x) = \frac{x^{k-1}\, e^{-x/\theta}}{\theta^k\,\Gamma(k)}, \qquad F(x) = \frac{\gamma(k, x/\theta)}{\Gamma(k)}$$

$$\mu = k\theta, \qquad \sigma = \sqrt{k}\,\theta, \qquad k = \left(\frac{\mu}{\sigma}\right)^2, \qquad \theta = \frac{\sigma^2}{\mu}$$
(see Excel example DC9-DistributionGamma)
[Figure: gamma distribution CDF]
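The moment relations above make it easy to fit a gamma distribution to a given mean and standard deviation; a sketch in Python (μ = 6, σ = 3 chosen for illustration):

```python
import math

# Fit shape k and scale theta from a given mean and standard deviation
mu, sigma = 6.0, 3.0
k = (mu / sigma) ** 2        # shape: (mu/sigma)^2 = 4.0
theta = sigma ** 2 / mu      # scale: sigma^2/mu = 1.5

def gamma_pdf(x, k, theta):
    """f(x) = x^(k-1) e^(-x/theta) / (theta^k Gamma(k))."""
    return x ** (k - 1) * math.exp(-x / theta) / (theta ** k * math.gamma(k))

mean_back = k * theta                 # recovers mu = 6.0
sigma_back = math.sqrt(k) * theta     # recovers sigma = 3.0
```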
Shifted Rayleigh distribution

$$f(x) = \frac{x-x_o}{\alpha^2}\, e^{-\frac{1}{2}\left(\frac{x-x_o}{\alpha}\right)^2}$$

$$F(x) = 1-e^{-\frac{1}{2}\left(\frac{x-x_o}{\alpha}\right)^2}$$

$$\mu = x_o + \alpha\sqrt{\frac{\pi}{2}}, \qquad \sigma = \alpha\sqrt{2-\frac{\pi}{2}}$$
Type I Largest value (Gumbel) distribution

$$f(x) = \alpha_n\, e^{-\alpha_n(x-u_n)}\, e^{-e^{-\alpha_n(x-u_n)}}$$

$$F(x) = e^{-e^{-\alpha_n(x-u_n)}}, \qquad \alpha_n > 0$$

$$\mu = u_n + \frac{0.5772}{\alpha_n}, \qquad \sigma = \frac{\pi}{\alpha_n\sqrt{6}}$$

$$u_n = \mu - \frac{0.5772}{\alpha_n}, \qquad \alpha_n = \frac{\pi}{\sigma\sqrt{6}}$$
(see Excel example DC5-DistributionGumbel)
[Figure: Type I largest value (Gumbel) PDF and CDF, Mean = 100, Sigma = 20]
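The method-of-moments fit from the formulas above can be sketched for the Mean = 100, Sigma = 20 case; note that F(u_n) = e^(−1) ≈ 0.368 at the mode, a quick sanity check on the CDF:

```python
import math

def gumbel_params(mu, sigma):
    """Method of moments: alpha_n = pi/(sigma sqrt(6)), u_n = mu - 0.5772/alpha_n."""
    alpha_n = math.pi / (sigma * math.sqrt(6.0))
    u_n = mu - 0.5772 / alpha_n
    return u_n, alpha_n

def gumbel_cdf(x, u_n, alpha_n):
    """F(x) = exp(-exp(-alpha_n (x - u_n)))."""
    return math.exp(-math.exp(-alpha_n * (x - u_n)))

# Mean = 100, Sigma = 20 as in the plotted example
u_n, alpha_n = gumbel_params(100.0, 20.0)   # alpha_n is about 0.064
F_mode = gumbel_cdf(u_n, u_n, alpha_n)      # exactly e^-1
```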
Type III Smallest values (for ε = 0 it is known as the Weibull distribution) For ε = 0, with scale parameter $u$:

$$f(x) = \frac{k}{u}\left(\frac{x}{u}\right)^{k-1} e^{-(x/u)^k}, \qquad F(x) = 1-e^{-(x/u)^k}, \qquad k > 0$$

$$\mu = u\,\Gamma\!\left(1+\frac{1}{k}\right), \qquad \sigma = u\sqrt{\Gamma\!\left(1+\frac{2}{k}\right)-\Gamma^2\!\left(1+\frac{1}{k}\right)}$$
Type III Smallest values, general three-parameter form with lower bound ε:

$$f(x) = \frac{k}{u-\varepsilon}\left(\frac{x-\varepsilon}{u-\varepsilon}\right)^{k-1} e^{-\left(\frac{x-\varepsilon}{u-\varepsilon}\right)^k}$$

$$F(x) = 1-e^{-\left(\frac{x-\varepsilon}{u-\varepsilon}\right)^k}, \qquad k > 0$$

$$\mu = \varepsilon + (u-\varepsilon)\,\Gamma\!\left(1+\frac{1}{k}\right)$$

$$\sigma = (u-\varepsilon)\sqrt{\Gamma\!\left(1+\frac{2}{k}\right)-\Gamma^2\!\left(1+\frac{1}{k}\right)}$$
Beta distribution

$$f(x) = \frac{(x-a)^{q-1}(b-x)^{r-1}}{B(q,r)\,(b-a)^{q+r-1}}, \qquad q > 0, \quad r > 0$$

Beta function: $B(q,r) = \dfrac{\Gamma(q)\,\Gamma(r)}{\Gamma(q+r)}$

$$\mu = a + (b-a)\frac{q}{q+r}$$

$$\sigma = \frac{b-a}{q+r}\sqrt{\frac{qr}{q+r+1}}$$
Type I Smallest values distribution

$$f(x) = \alpha_1\, e^{\alpha_1(x-u_1)}\, e^{-e^{\alpha_1(x-u_1)}}$$

$$F(x) = 1-e^{-e^{\alpha_1(x-u_1)}}, \qquad \alpha_1 > 0$$

$$\mu = u_1 - \frac{0.5772}{\alpha_1}, \qquad \sigma = \frac{\pi}{\alpha_1\sqrt{6}}$$
Type II Largest value distribution

$$f(x) = \frac{k}{u_n}\left(\frac{u_n}{x}\right)^{k+1} e^{-\left(u_n/x\right)^k}$$

$$F(x) = e^{-\left(u_n/x\right)^k}, \qquad k > 0$$

$$\mu = u_n\,\Gamma\!\left(1-\frac{1}{k}\right)$$

$$\sigma^2 = u_n^2\left[\Gamma\!\left(1-\frac{2}{k}\right)-\Gamma^2\!\left(1-\frac{1}{k}\right)\right]$$
Combinations of random variables For linear combinations $y = a_1X_1 + a_2X_2 + \cdots + a_kX_k$ of random variables $X_1, X_2, \ldots, X_k$ with given arithmetic means $\mu_1, \mu_2, \ldots, \mu_k$ and standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_k$:

Theorem 1: The mean value of the linear combination of random variables is the weighted sum of the mean values of the components:

$$\mu_y = E(a_1X_1 + a_2X_2 + \cdots + a_kX_k) = a_1E(X_1) + a_2E(X_2) + \cdots + a_kE(X_k) = a_1\mu_1 + a_2\mu_2 + \cdots + a_k\mu_k$$

Theorem 2: For mutually independent random variables, the variance of the linear combination is the weighted sum of the variances of the components:

$$\sigma_y^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \cdots + a_k^2\sigma_k^2$$
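Both theorems can be checked directly for a two-variable combination (the coefficients, means and standard deviations are made up for illustration; Theorem 2 assumes independence):

```python
import math

# y = a1*X1 + a2*X2 with independent X1, X2 (illustrative parameters)
a = [2.0, -1.0]
mu = [10.0, 4.0]
sigma = [3.0, 2.0]

mu_y = sum(ai * mi for ai, mi in zip(a, mu))                # 2*10 - 1*4 = 16
var_y = sum(ai ** 2 * si ** 2 for ai, si in zip(a, sigma))  # 4*9 + 1*4 = 40
sigma_y = math.sqrt(var_y)
```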