STATISTICAL DATA ANALYSIS
Prof. Janusz Gajda, Dept. of Instrumentation and Measurement
Plan of the lecture
• Classification of the measuring signals according to their statistical properties.
• Definition of basic parameters and characteristics of stochastic signals (expected value, variance, mean-square value, probability density function, power spectral density, joint probability density function, cross-correlation function, spectral density function, transfer function).
• Interpretation of the basic statistical characteristics.
• Elements of statistics: estimation theory and hypothesis verification theory, parametric and non-parametric estimation, point and interval estimation. Desirable properties of an estimator: unbiased, efficient, consistent, robust.
• Application of the point estimation and interval estimation methods in determining estimates of these parameters and characteristics.
• Estimators of basic parameters and characteristics of random signals: mean value and variance, probability density function, autocorrelation and cross-correlation functions, power spectral density and cross spectral density, coherence function, transfer function.
• Analysis of the statistical properties of those estimators.
• Determination of the confidence intervals of basic statistical parameters for an assumed confidence level.
• Statistical hypotheses and their verification.
• Errors of the first and second kind observed during the verification process.
Classification of the measuring signals according to their statistical properties
Deterministic signals
Periodic signals:
• mono-harmonic signals,
• poly-harmonic signals.
Non-periodic signals:
• almost periodic signals,
• transient signals.
Periodic signals
Mono-harmonic signals:
x(t) = A·sin(ω₀t + φ)
where:
A – signal amplitude,
ω₀ = 2πf₀ – angular frequency,
φ – initial phase angle.
Poly-harmonic signals:
x(t) = Σₙ Aₙ·sin(nω₀t + φₙ)
where:
Aₙ – amplitude of the n-th harmonic component,
ω₀ = 2πf₀ – basic angular frequency,
φₙ – initial phase angle of the n-th component.
Frequency spectrum of the periodic signals
x(t) = Σₙ₌₁ Xₙ·sin(2πfₙt + φₙ)
fₙ = k·f₁, where k is a rational number.
Classification of the measuring signals according to their statistical properties
Stochastic signals
Stationary signals:
• ergodic signals,
• non-ergodic signals.
Non-stationary signals:
• different classes of non-stationary signals.
[Figure: five realizations x₁(t)…x₅(t) of a random quantity plotted versus time; the values x₁(t₁)…x₅(t₁) and x₁(t₂)…x₅(t₂) mark the ensemble read at two time instants t₁ and t₂.]
Set of realizations of the random quantity.
Basic statistical characteristics
Mean-square value:
ψₓ² = E[x²(t)] = lim (T→∞) (1/T)·∫₀ᵀ x²(t) dt
Root-mean-square value:
ψₓ = √(ψₓ²)
Expected value:
μₓ = E[x(t)] = lim (T→∞) (1/T)·∫₀ᵀ x(t) dt
Variance:
σₓ² = E[(x(t) − μₓ)²] = lim (T→∞) (1/T)·∫₀ᵀ [x(t) − μₓ]² dt
Probability function:
Pr[x < x(t) ≤ x + Δx] = lim (T→∞) Tₓ/T
where Tₓ is the total time the signal spends inside the interval (x, x + Δx].
Probability density function:
p(x) = lim (Δx→0) (1/Δx)·Pr[x < x(t) ≤ x + Δx] = lim (Δx→0) (1/Δx)·lim (T→∞) Tₓ/T
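The time averages above can be approximated from a single sampled realization. A minimal sketch in Python (NumPy), where the Gaussian record and its parameters (μ = 2.0, σ = 0.5) are made up for illustration:

```python
import numpy as np

# Illustrative sample: one finite sampled realization standing in for x(t);
# the distribution parameters (mu = 2.0, sigma = 0.5) are assumptions.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=100_000)

mean = x.mean()                      # estimate of the expected value mu_x
mean_square = np.mean(x ** 2)        # estimate of the mean-square value psi_x^2
rms = np.sqrt(mean_square)           # root-mean-square value psi_x
variance = np.mean((x - mean) ** 2)  # estimate of the variance sigma_x^2

# Histogram-based estimate of the probability density p(x):
# bin counts divided by (N * bin width), so the estimate integrates to 1.
p_hat, edges = np.histogram(x, bins=100, density=True)
bin_width = edges[1] - edges[0]
print(mean, rms, variance)
```

The histogram estimate is the discrete counterpart of the limit definition: Δx becomes the bin width and Tₓ/T becomes the fraction of samples falling into the bin.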
Most popular distributions:
Standardised normal distribution:
p(x) = (1/√(2π))·exp(−x²/2)
Most popular distributions: Normal distribution:
p(x) = (1/(σ·√(2π)))·exp(−(x − μ)²/(2σ²))
[Figure: probability density p(x) of the normal distribution for several parameter values; the shaded areas mark the probabilities 0.68 (within ±σ) and 0.95 (within ±2σ).]
Normal distribution – cumulative probability
P(x) = ∫₋∞ˣ p(ξ) dξ
[Figure: cumulative probability P(x) of the normal distribution, and the quantile x_p plotted against the cumulative probability P(x_p).]
Normal distribution – quantiles
The quantile x_p of order p is defined by:
P(x_p) = ∫₋∞^(x_p) p(ξ) dξ = Pr[x ≤ x_p] = p
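Normal quantiles can be evaluated numerically. A short sketch using Python's standard-library `statistics.NormalDist` (available since Python 3.8); the probability levels are the common 0.95 and 0.975:

```python
from statistics import NormalDist

# Quantiles of the standardised normal distribution.
nd = NormalDist(mu=0.0, sigma=1.0)

x_95 = nd.inv_cdf(0.95)    # one-sided 95% quantile, about 1.645
x_975 = nd.inv_cdf(0.975)  # quantile for two-sided 95% intervals, about 1.96

# Inverse relation: the cumulative probability at x_p recovers p.
p_back = nd.cdf(x_975)
print(x_95, x_975, p_back)
```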
Most popular distributions:
Chi-square (χ²) distribution:
p(χ²; n) = 1/(2^(n/2)·Γ(n/2)) · (χ²)^(n/2 − 1) · e^(−χ²/2)
n – number of degrees of freedom.
[Figure: probability density p(χ²) for n = 1, 2, 3, 10, and the cumulative distribution for n = 2, 3, 4, …, 20.]
Chi-square distribution – cumulative distribution.
Most popular distributions:
t-Student distribution – probability density:
p(t; f) = Γ((f + 1)/2) / (√(πf)·Γ(f/2)) · (1 + t²/f)^(−(f + 1)/2)
f – number of degrees of freedom.
[Figure: probability density p(t) for f = 1, 2, 10, and the cumulative probability P(t) for f = 2, 3, 4, …, 10.]
t-Student distribution – cumulative probability.
Auto-correlation function:
Kₓ(τ) = lim (T→∞) (1/T)·∫₀ᵀ x(t)·x(t + τ) dt
Rₓ(τ) = lim (T→∞) (1/T)·∫₀ᵀ [x(t) − μₓ]·[x(t + τ) − μₓ] dt
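For a sampled record the limit over T is replaced by a finite average over lags. A sketch assuming a 5 Hz unit-amplitude sine as the test signal; for such a signal theory gives Rₓ(τ) = (A²/2)·cos(2πf₀τ):

```python
import numpy as np

# Finite-record estimate of the auto-correlation Rx(tau); the test signal is an
# assumed 5 Hz unit-amplitude sine sampled at 1 kHz for 10 s.
fs, f0, A = 1000.0, 5.0, 1.0
t = np.arange(0.0, 10.0, 1.0 / fs)
x = A * np.sin(2 * np.pi * f0 * t)

def autocorr(x, max_lag):
    """Rx(tau) for tau = 0..max_lag samples, averaging x(t) * x(t + tau)."""
    x = x - x.mean()                      # subtract the mean value mu_x
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / (n - k)
                     for k in range(max_lag + 1)])

R = autocorr(x, 400)
print(R[0], R[100])   # R[0] = variance = A**2/2; lag 100 is half a period
```

At τ = 0 the auto-correlation equals the variance (here A²/2 = 0.5), and at half a period of the sine it reaches the opposite extreme, −0.5, as the cosine formula predicts.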
[Figure: example auto-correlation functions Rₓ(τ) of three different signals.]
Power spectral density:
Gₓ(f) = lim (Δf→0) lim (T→∞) (1/Δf)·(1/T)·∫₀ᵀ x²(t, f, Δf) dt
Sₓ(f) = ∫₋∞^∞ Rₓ(τ)·e^(−j2πfτ) dτ
[Figure: power spectral density G(f) of a sum of two sinusoids, x(t) = A₁·sin(2πf₁t) + A₂·sin(2πf₂t); the spectrum shows two peaks of heights 0.5·A₁² and 0.5·A₂² at the component frequencies.]
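The relation between sinusoid amplitudes and spectral peaks can be checked numerically. A sketch using a raw FFT periodogram; the component frequencies (2 Hz, 6 Hz) and amplitudes are assumed for illustration:

```python
import numpy as np

# One-sided PSD of a two-sinusoid signal via a raw FFT periodogram.
fs = 100.0
t = np.arange(0.0, 10.0, 1.0 / fs)        # 10 s record -> 0.1 Hz resolution
A1, A2 = 2.0, 1.0
x = A1 * np.sin(2 * np.pi * 2.0 * t) + A2 * np.sin(2 * np.pi * 6.0 * t)

n = len(x)
X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
psd = (np.abs(X) ** 2) / (fs * n)         # two-sided density
psd[1:-1] *= 2                            # fold negative frequencies -> one-sided

# Integrating each peak over its bin recovers the component powers 0.5 * A**2.
df = freqs[1] - freqs[0]
power_2hz = psd[np.argmin(np.abs(freqs - 2.0))] * df
power_6hz = psd[np.argmin(np.abs(freqs - 6.0))] * df
print(power_2hz, power_6hz)
```

Because both frequencies fall exactly on FFT bins here, the peak powers come out as 0.5·A₁² = 2.0 and 0.5·A₂² = 0.5; with off-bin frequencies the power spreads over neighbouring bins (leakage).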
Joint probability density function:
[Figure: two example records x(t) and y(t); the levels x, x + dx and y, y + dy mark the amplitude windows used in the definition.]
p(x, y) = lim (Δx, Δy→0) (1/(Δx·Δy)) · lim (T→∞) Tₓy/T
where Tₓy is the total time during which x(t) stays in (x, x + Δx] and, simultaneously, y(t) stays in (y, y + Δy].
Joint cumulative probability:
P(x, y) = Pr[x(t) ≤ x, y(t) ≤ y] = ∫₋∞ʸ ∫₋∞ˣ p(ξ, η) dξ dη
Cross-correlation:
Kₓy(τ) = lim (T→∞) (1/T)·∫₀ᵀ x(t)·y(t + τ) dt
Rₓy(τ) = lim (T→∞) (1/T)·∫₀ᵀ [x(t) − μₓ]·[y(t + τ) − μy] dt
[Figure: an example noisy sinusoidal signal y(t) (a 1 Hz sinusoid plus random noise) and the resulting cross-correlation function.]
Spectral density function:
Sₓy(jf) = ∫₋∞^∞ Rₓy(τ)·e^(−j2πfτ) dτ
[Figure: estimated cross spectral density Sₓy(jf) over 0–20 Hz.]
Transfer function: Sₓy(jf) = Hₓy(jf)·Sₓ(f)
[Figure: Nyquist plot of Hₓy(jf) in the complex plane, traced from f = 0 Hz to f = 50 Hz.]
Hₓy(jf) = k·fₙ² / [(jf)² + 2ζ·fₙ·(jf) + fₙ²]
(a second-order model; fₙ – natural frequency, ζ – damping ratio)
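The relation Sₓy(jf) = Hₓy(jf)·Sₓ(f) suggests a simple empirical estimator: average raw cross- and auto-spectra over many records and divide. A sketch, where the white-noise excitation and the two-tap FIR system y[k] = 0.5·(x[k] + x[k−1]) are assumptions chosen so the true answer is known:

```python
import numpy as np

# Empirical transfer function Hxy(jf) = Sxy(jf) / Sx(f), estimated by averaging
# raw cross- and auto-spectra over many independent records.
rng = np.random.default_rng(2)
nseg, nblocks = 256, 200
b = np.array([0.5, 0.5])                  # assumed FIR impulse response

Sx = np.zeros(nseg // 2 + 1)
Sxy = np.zeros(nseg // 2 + 1, dtype=complex)
for _ in range(nblocks):
    x = rng.normal(size=nseg)
    y = np.convolve(x, b)[:nseg]          # system response (short edge transient)
    X, Y = np.fft.rfft(x), np.fft.rfft(y)
    Sx += np.abs(X) ** 2                  # accumulate auto-spectrum of x
    Sxy += np.conj(X) * Y                 # accumulate cross-spectrum

H = Sxy / Sx                              # averaged estimate of Hxy(jf)
print(abs(H[0]))   # DC gain, close to b.sum() = 1.0
```

Averaging before dividing is what makes this estimator usable: a single-record ratio Y/X has no averaging and is badly behaved wherever X is small.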
Statistical Base of Data Analysis
Mathematical statistics deals with gaining information from data. In practice, data often contain some randomness or uncertainty. Statistics handles such data using methods of probability theory.
Mathematical statistics tests the distributions of random quantities.
In applying statistics to a scientific, industrial, or societal problem, one begins with a process or population to be studied.
This might be a population of people in a country, of crystal grains in a rock, or of goods manufactured by a particular factory during a given period.
It may instead be a process observed at various times; data collected about this kind of "population" constitute what is called a time series. For practical reasons, one usually studies a chosen subset of the population, called a sample. Data are collected about the sample in an observational or measurement experiment.
The data are then subjected to statistical analysis, which serves two related purposes: description and inference.
• Descriptive statistics can be used to summarize the data, either numerically or graphically, to describe the sample. Basic examples of numerical descriptors include the mean and standard deviation. Graphical summarizations include various kinds of charts and graphs.
• Inferential statistics is used to model patterns in the data, accounting for randomness and drawing inferences about the larger population. These inferences may take the form of answers to yes/no questions (hypothesis testing), estimates of numerical characteristics (estimation), descriptions of association (correlation) or modelling of relationships (regression).
Mathematical statistics:
• estimation theory / hypothesis tests
• parametric estimation / non-parametric estimation
• point estimation / interval estimation
A statistical hypothesis test, or more briefly a hypothesis test, is an algorithm for deciding for or against the hypothesis in a way that minimizes certain risks.
The only conclusions that may be drawn from the test are:
• there is not enough evidence to reject the hypothesis;
• the hypothesis is false.
estimation theory
Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured/empirical data. The parameters describe the physical object or process under study.
non-parametric estimation
Nonparametric estimation is a statistical method that allows determination of a chosen characteristic, understood as a set of points in a predefined coordinate system (without any functional description).
parametric estimation
Parametric estimation is a statistical method that allows determination of chosen parameters describing the analysed signal or object.
point estimation
In statistics, point estimation involves the use of sample data to calculate a single value (known as an estimate) which serves as a "best guess" for an unknown (fixed or random) population parameter.
interval estimation
In statistics, interval estimation is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter.
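A typical interval estimate is the confidence interval for a mean. A sketch with Python's standard library; the data values are made up, and the normal approximation is used instead of the t-distribution for brevity:

```python
from statistics import NormalDist, mean, stdev

# Two-sided confidence interval for a population mean (illustrative data).
data = [9.8, 10.2, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 9.9, 10.0]
n = len(data)
m = mean(data)
s = stdev(data)                       # sample standard deviation (n - 1 divisor)

alpha = 0.05                          # 1 - alpha = 95% confidence level
z = NormalDist().inv_cdf(1 - alpha / 2)
half_width = z * s / n ** 0.5
ci = (m - half_width, m + half_width)
print(m, ci)
```

For a sample this small, the quantile of the t-distribution with n − 1 degrees of freedom would normally replace z, widening the interval.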
Statistical Base of Data Analysis
A random variable is a function, which assigns unique numerical values to all possible outcomes of a random experiment under fixed conditions. A random variable is not a variable but rather a function that maps events to numbers.
A random event may or may not occur as a result of an experiment.
Random quantities:
A stochastic process, or sometimes random process, is the opposite of a deterministic process. Instead of dealing with only one possible 'reality' of how the process might evolve over time, in a stochastic or random process there is some indeterminacy in its future evolution, described by probability distributions.
This means that even if the initial condition (or starting point) is known, there are many paths the process might follow, though some paths are more probable than others.
All elements belonging to the defined set are called the general population. For instance: all citizens of the defined country.
A sample population is a chosen subset of the general population.
general population
sample population
Estimator properties – ideal estimator.
[Figure: decomposition of estimator error into bias error and variance.]
Unbiased estimators:
This means that the average of the estimates from an increasing number of experiments should converge to the true parameter values, assuming that the noise characteristics are constant during the experiments.
A more precise mathematical description would be:
An estimator is called „unbiased” if its expected value is equal to the true value.
E[θ̂] = θ
[Figure: successive estimates plotted against the number of samples, scattering around the true value.]
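Unbiasedness can be checked numerically by averaging estimates over many repeated experiments, as the definition suggests. A sketch contrasting the biased (1/N divisor) and unbiased (1/(N−1) divisor) variance estimators; the true variance σ² = 4 is an assumed value:

```python
import numpy as np

# Averaging estimates over repeated experiments: the variance estimator with a
# 1/N divisor is biased low by the factor (N-1)/N; the 1/(N-1) version is not.
rng = np.random.default_rng(3)
true_var, n_samples, n_experiments = 4.0, 10, 200_000

samples = rng.normal(0.0, true_var ** 0.5, size=(n_experiments, n_samples))
var_biased = samples.var(axis=1, ddof=0).mean()    # expected ~ 4 * 9/10 = 3.6
var_unbiased = samples.var(axis=1, ddof=1).mean()  # expected ~ 4.0
print(var_biased, var_unbiased)
```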
Asymptotically unbiased estimator:
Some estimators are biased, but in general the expected value of an estimator should converge to the true value as the number of measurements increases to infinity.
Again this can be formulated more carefully:
An estimator is called "asymptotically unbiased" if
lim (N→∞) E[θ̂_N] = θ
where N is the number of measurements.
[Figure: estimator values approaching the true value as the number of samples increases.]
Efficient estimators.
E[(θ̂ₖ − θ)²] ≤ E[(θ̂ᵢ − θ)²]
The estimator with the smaller mean-square error is called more efficient.
E[(θ̂ − θ)²] = E[(θ̂ − E[θ̂])²] + (E[θ̂] − θ)² = σ²_θ̂ + b²
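This decomposition of the mean-square error into variance plus squared bias can be verified numerically. A sketch using the biased variance estimator from above; the true value θ = σ² = 1 and the sample sizes are illustrative:

```python
import numpy as np

# Numerical check of E[(est - theta)**2] = sigma_est**2 + b**2 for the biased
# (1/N divisor) variance estimator.
rng = np.random.default_rng(4)
theta, n_samples, n_experiments = 1.0, 8, 400_000

samples = rng.normal(0.0, 1.0, size=(n_experiments, n_samples))
est = samples.var(axis=1, ddof=0)          # biased variance estimates

mse = np.mean((est - theta) ** 2)          # mean-square error (left-hand side)
var_est = est.var()                        # spread of the estimator, sigma_est**2
bias_sq = (est.mean() - theta) ** 2        # squared bias, b**2
print(mse, var_est + bias_sq)
```

Here the bias is b = −σ²/N = −1/8, so even with zero variance the mean-square error could not drop below b² = 1/64.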
Consistent estimator.
An estimator is called consistent if:
lim (N→∞) Pr[|θ̂_N − θ| ≥ ε] = 0 for each ε > 0
Robust estimator
An estimator is called a robust estimator if its properties are still valid when the assumptions made in its construction are no longer applicable.