STATISTICAL DATA ANALYSIS
Prof. Janusz Gajda, Dept. of Instrumentation and Measurement
Plan of the lecture
• Classification of the measuring signals according to their statistical properties.
• Definition of basic parameters and characteristics of stochastic signals (expected value, variance, mean-square value, probability density function, power spectral density, joint probability density function, cross-correlation function, spectral density function, transfer function).
• Interpretation of the basic statistical characteristics.
• Elements of statistics: estimation theory and hypothesis verification theory, parametric and non-parametric estimation, point and interval estimation. Desirable properties of an estimator: unbiased, efficient, consistent, robust.
• Application of the point estimation and interval estimation methods in determining estimates of these parameters and characteristics.
• Estimators of basic parameters and characteristics of random signals: mean value and variance, probability density function, autocorrelation and cross-correlation functions, power spectral density and cross spectral density, coherence function, transfer function.
• Analysis of the statistical properties of those estimators.
• Determination of the confidence intervals of basic statistical parameters for an assumed confidence level.
• Statistical hypotheses and their verification.
• Errors of the first and second kind observed during the verification process.
Classification of the measuring signals according to their statistical properties
Deterministic signals
Periodic signals:
• mono-harmonic signals,
• poly-harmonic signals.
Non-periodic signals:
• almost periodic signals,
• transient signals.
Periodic signals
Mono-harmonic signals:
x(t) = A·sin(ω₀t + φ)
where:
A – signal amplitude,
ω₀ = 2πf₀ – angular frequency,
φ – initial phase angle.
Poly-harmonic signals:
x(t) = Σₙ Aₙ·sin(nω₀t + φₙ)
where:
Aₙ – amplitude of the n-th harmonic component,
ω₀ = 2πf₀ – basic angular frequency,
φₙ – initial phase angle of the n-th component.
Frequency spectrum of the periodic signals
x(t) = Σₙ₌₁ Xₙ·sin(2πfₙt + φₙ)
fₙ = k·f₁, where k is a rational number.
Classification of the measuring signals according to their statistical properties
Stochastic signals
Stationary signals:
• ergodic signals,
• non-ergodic signals.
Non-stationary signals:
• different classes of non-stationary signals.
[Figure: five realizations x₁(t)…x₅(t) of a random quantity plotted versus time; the values x₁(t₁)…x₅(t₁) and x₁(t₂)…x₅(t₂) mark the ensemble read at two time instants t₁ and t₂.]
Set of realizations of the random quantity.
Basic statistical characteristics
Mean-square value:
ψₓ² = E[x²(t)] = lim (T→∞) (1/T)·∫₀ᵀ x²(t) dt
Root-mean-square value:
ψₓ = √(ψₓ²)
Expected value:
μₓ = E[x(t)] = lim (T→∞) (1/T)·∫₀ᵀ x(t) dt
Variance:
σₓ² = E[(x(t) − μₓ)²] = lim (T→∞) (1/T)·∫₀ᵀ [x(t) − μₓ]² dt
Probability function:
Pr[x < x(t) ≤ x + Δx] = lim (T→∞) Tₓ/T
where Tₓ is the total time the signal spends inside the interval (x, x + Δx].
Probability density function:
p(x) = lim (Δx→0) (1/Δx)·Pr[x < x(t) ≤ x + Δx] = lim (Δx→0) (1/Δx)·lim (T→∞) Tₓ/T
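The time averages above can be approximated from a single sampled realization. A minimal sketch in Python (NumPy), where the Gaussian record and its parameters (μ = 2.0, σ = 0.5) are made up for illustration:

```python
import numpy as np

# Illustrative sample: one finite sampled realization standing in for x(t);
# the distribution parameters (mu = 2.0, sigma = 0.5) are assumptions.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=100_000)

mean = x.mean()                      # estimate of the expected value mu_x
mean_square = np.mean(x ** 2)        # estimate of the mean-square value psi_x^2
rms = np.sqrt(mean_square)           # root-mean-square value psi_x
variance = np.mean((x - mean) ** 2)  # estimate of the variance sigma_x^2

# Histogram-based estimate of the probability density p(x):
# bin counts divided by (N * bin width), so the estimate integrates to 1.
p_hat, edges = np.histogram(x, bins=100, density=True)
bin_width = edges[1] - edges[0]
print(mean, rms, variance)
```

The histogram estimate is the discrete counterpart of the limit definition: Δx becomes the bin width and Tₓ/T becomes the fraction of samples falling into the bin.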
Most popular distributions:
Standardised normal distribution:
p(x) = (1/√(2π))·exp(−x²/2)
Most popular distributions: Normal distribution:
p(x) = (1/(σ·√(2π)))·exp(−(x − μ)²/(2σ²))
[Figure: probability density p(x) of the normal distribution for several parameter values; the shaded areas mark the probabilities 0.68 (within ±σ) and 0.95 (within ±2σ).]
Normal distribution – cumulative probability
P(x) = ∫₋∞ˣ p(ξ) dξ
[Figure: cumulative probability P(x) of the normal distribution, and the quantile x_p plotted against the cumulative probability P(x_p).]
Normal distribution – quantiles
The quantile x_p of order p is defined by:
P(x_p) = ∫₋∞^(x_p) p(ξ) dξ = Pr[x ≤ x_p] = p
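Normal quantiles can be evaluated numerically. A short sketch using Python's standard-library `statistics.NormalDist` (available since Python 3.8); the probability levels are the common 0.95 and 0.975:

```python
from statistics import NormalDist

# Quantiles of the standardised normal distribution.
nd = NormalDist(mu=0.0, sigma=1.0)

x_95 = nd.inv_cdf(0.95)    # one-sided 95% quantile, about 1.645
x_975 = nd.inv_cdf(0.975)  # quantile for two-sided 95% intervals, about 1.96

# Inverse relation: the cumulative probability at x_p recovers p.
p_back = nd.cdf(x_975)
print(x_95, x_975, p_back)
```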
Most popular distributions:
Chi-square (χ²) distribution:
p(χ²; n) = 1/(2^(n/2)·Γ(n/2)) · (χ²)^(n/2 − 1) · e^(−χ²/2)
n – number of degrees of freedom.
[Figure: probability density p(χ²) for n = 1, 2, 3, 10, and the cumulative distribution for n = 2, 3, 4, …, 20.]
Chi-square distribution – cumulative distribution.
Most popular distributions:
t-Student distribution – probability density:
p(t; f) = Γ((f + 1)/2) / (√(πf)·Γ(f/2)) · (1 + t²/f)^(−(f + 1)/2)
f – number of degrees of freedom.
[Figure: probability density p(t) for f = 1, 2, 10, and the cumulative probability P(t) for f = 2, 3, 4, …, 10.]
t-Student distribution – cumulative probability.
Auto-correlation function:
Kₓ(τ) = lim (T→∞) (1/T)·∫₀ᵀ x(t)·x(t + τ) dt
Rₓ(τ) = lim (T→∞) (1/T)·∫₀ᵀ [x(t) − μₓ]·[x(t + τ) − μₓ] dt
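For a sampled record the limit over T is replaced by a finite average over lags. A sketch assuming a 5 Hz unit-amplitude sine as the test signal; for such a signal theory gives Rₓ(τ) = (A²/2)·cos(2πf₀τ):

```python
import numpy as np

# Finite-record estimate of the auto-correlation Rx(tau); the test signal is an
# assumed 5 Hz unit-amplitude sine sampled at 1 kHz for 10 s.
fs, f0, A = 1000.0, 5.0, 1.0
t = np.arange(0.0, 10.0, 1.0 / fs)
x = A * np.sin(2 * np.pi * f0 * t)

def autocorr(x, max_lag):
    """Rx(tau) for tau = 0..max_lag samples, averaging x(t) * x(t + tau)."""
    x = x - x.mean()                      # subtract the mean value mu_x
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / (n - k)
                     for k in range(max_lag + 1)])

R = autocorr(x, 400)
print(R[0], R[100])   # R[0] = variance = A**2/2; lag 100 is half a period
```

At τ = 0 the auto-correlation equals the variance (here A²/2 = 0.5), and at half a period of the sine it reaches the opposite extreme, −0.5, as the cosine formula predicts.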
[Figure: example auto-correlation functions Rₓ(τ) of three different signals.]
Power spectral density:
Gₓ(f) = lim (Δf→0) lim (T→∞) (1/Δf)·(1/T)·∫₀ᵀ x²(t, f, Δf) dt
Sₓ(f) = ∫₋∞^∞ Rₓ(τ)·e^(−j2πfτ) dτ
[Figure: power spectral density G(f) of a sum of two sinusoids, x(t) = A₁·sin(2πf₁t) + A₂·sin(2πf₂t); the spectrum shows two peaks of heights 0.5·A₁² and 0.5·A₂² at the component frequencies.]
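The relation between sinusoid amplitudes and spectral peaks can be checked numerically. A sketch using a raw FFT periodogram; the component frequencies (2 Hz, 6 Hz) and amplitudes are assumed for illustration:

```python
import numpy as np

# One-sided PSD of a two-sinusoid signal via a raw FFT periodogram.
fs = 100.0
t = np.arange(0.0, 10.0, 1.0 / fs)        # 10 s record -> 0.1 Hz resolution
A1, A2 = 2.0, 1.0
x = A1 * np.sin(2 * np.pi * 2.0 * t) + A2 * np.sin(2 * np.pi * 6.0 * t)

n = len(x)
X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
psd = (np.abs(X) ** 2) / (fs * n)         # two-sided density
psd[1:-1] *= 2                            # fold negative frequencies -> one-sided

# Integrating each peak over its bin recovers the component powers 0.5 * A**2.
df = freqs[1] - freqs[0]
power_2hz = psd[np.argmin(np.abs(freqs - 2.0))] * df
power_6hz = psd[np.argmin(np.abs(freqs - 6.0))] * df
print(power_2hz, power_6hz)
```

Because both frequencies fall exactly on FFT bins here, the peak powers come out as 0.5·A₁² = 2.0 and 0.5·A₂² = 0.5; with off-bin frequencies the power spreads over neighbouring bins (leakage).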
Joint probability density function:
[Figure: two example records x(t) and y(t); the levels x, x + dx and y, y + dy mark the amplitude windows used in the definition.]
p(x, y) = lim (Δx, Δy→0) (1/(Δx·Δy)) · lim (T→∞) Tₓy/T
where Tₓy is the total time during which x(t) stays in (x, x + Δx] and, simultaneously, y(t) stays in (y, y + Δy].
Joint cumulative probability:
P(x, y) = Pr[x(t) ≤ x, y(t) ≤ y] = ∫₋∞ʸ ∫₋∞ˣ p(ξ, η) dξ dη
Cross-correlation:
Kₓy(τ) = lim (T→∞) (1/T)·∫₀ᵀ x(t)·y(t + τ) dt
Rₓy(τ) = lim (T→∞) (1/T)·∫₀ᵀ [x(t) − μₓ]·[y(t + τ) − μy] dt
[Figure: an example noisy sinusoidal signal y(t) (a 1 Hz sinusoid plus random noise) and the resulting cross-correlation function.]
Spectral density function:
Sₓy(jf) = ∫₋∞^∞ Rₓy(τ)·e^(−j2πfτ) dτ
[Figure: estimated cross spectral density Sₓy(jf) over 0–20 Hz.]
Transfer function: Sₓy(jf) = Hₓy(jf)·Sₓ(f)
[Figure: Nyquist plot of Hₓy(jf) in the complex plane, traced from f = 0 Hz to f = 50 Hz.]
Hₓy(jf) = k·fₙ² / [(jf)² + 2ζ·fₙ·(jf) + fₙ²]
(a second-order model; fₙ – natural frequency, ζ – damping ratio)
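The relation Sₓy(jf) = Hₓy(jf)·Sₓ(f) suggests a simple empirical estimator: average raw cross- and auto-spectra over many records and divide. A sketch, where the white-noise excitation and the two-tap FIR system y[k] = 0.5·(x[k] + x[k−1]) are assumptions chosen so the true answer is known:

```python
import numpy as np

# Empirical transfer function Hxy(jf) = Sxy(jf) / Sx(f), estimated by averaging
# raw cross- and auto-spectra over many independent records.
rng = np.random.default_rng(2)
nseg, nblocks = 256, 200
b = np.array([0.5, 0.5])                  # assumed FIR impulse response

Sx = np.zeros(nseg // 2 + 1)
Sxy = np.zeros(nseg // 2 + 1, dtype=complex)
for _ in range(nblocks):
    x = rng.normal(size=nseg)
    y = np.convolve(x, b)[:nseg]          # system response (short edge transient)
    X, Y = np.fft.rfft(x), np.fft.rfft(y)
    Sx += np.abs(X) ** 2                  # accumulate auto-spectrum of x
    Sxy += np.conj(X) * Y                 # accumulate cross-spectrum

H = Sxy / Sx                              # averaged estimate of Hxy(jf)
print(abs(H[0]))   # DC gain, close to b.sum() = 1.0
```

Averaging before dividing is what makes this estimator usable: a single-record ratio Y/X has no averaging and is badly behaved wherever X is small.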
Statistical Base of Data Analysis
Mathematical statistics deals with gaining information from data. In practice, data often contain some randomness or uncertainty. Statistics handles such data using methods of probability theory.
Mathematical statistics tests the distributions of random quantities.
In applying statistics to a scientific, industrial, or societal problem, one begins with a process or population to be studied.
This might be a population of people in a country, of crystal grains in a rock, or of goods manufactured by a particular factory during a given period.
It may instead be a process observed at various times; data collected about this kind of "population" constitute what is called a time series. For practical reasons, one usually studies a chosen subset of the population, called a sample. Data are collected about the sample in an observational or measurement experiment.
The data are then subjected to statistical analysis, which serves two related purposes: description and inference.
• Descriptive statistics can be used to summarize the data, either numerically or graphically, to describe the sample. Basic examples of numerical descriptors include the mean and standard deviation. Graphical summarizations include various kinds of charts and graphs.
• Inferential statistics is used to model patterns in the data, accounting for randomness and drawing inferences about the larger population. These inferences may take the form of answers to yes/no questions (hypothesis testing), estimates of numerical characteristics (estimation), descriptions of association (correlation) or modelling of relationships (regression).
Mathematical statistics:
• estimation theory / hypothesis tests
• parametric estimation / non-parametric estimation
• point estimation / interval estimation
A statistical hypothesis test, or more briefly a hypothesis test, is an algorithm for deciding for or against the hypothesis in a way that minimizes certain risks.
The only conclusions that may be drawn from the test are:
• there is not enough evidence to reject the hypothesis;
• the hypothesis is false.
estimation theory
Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured/empirical data. The parameters describe the physical object or process under study.
non-parametric estimation
Nonparametric estimation is a statistical method that allows determination of a chosen characteristic, understood as a set of points in a predefined coordinate system (without any functional description).
parametric estimation
Parametric estimation is a statistical method that allows determination of chosen parameters describing the analysed signal or object.
point estimation
In statistics, point estimation involves the use of sample data to calculate a single value (known as an estimate) which serves as a "best guess" for an unknown (fixed or random) population parameter.
interval estimation
In statistics, interval estimation is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter.
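A typical interval estimate is the confidence interval for a mean. A sketch with Python's standard library; the data values are made up, and the normal approximation is used instead of the t-distribution for brevity:

```python
from statistics import NormalDist, mean, stdev

# Two-sided confidence interval for a population mean (illustrative data).
data = [9.8, 10.2, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 9.9, 10.0]
n = len(data)
m = mean(data)
s = stdev(data)                       # sample standard deviation (n - 1 divisor)

alpha = 0.05                          # 1 - alpha = 95% confidence level
z = NormalDist().inv_cdf(1 - alpha / 2)
half_width = z * s / n ** 0.5
ci = (m - half_width, m + half_width)
print(m, ci)
```

For a sample this small, the quantile of the t-distribution with n − 1 degrees of freedom would normally replace z, widening the interval.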
Statistical Base of Data Analysis
A random variable is a function, which assigns unique numerical values to all possible outcomes of a random experiment under fixed conditions. A random variable is not a variable but rather a function that maps events to numbers.
A random event may or may not occur as a result of an experiment.
Random quantities:
A stochastic process, or sometimes random process, is the opposite of a deterministic process. Instead of dealing with only one possible 'reality' of how the process might evolve over time, in a stochastic or random process there is some indeterminacy in its future evolution, described by probability distributions.
This means that even if the initial condition (or starting point) is known, there are many paths the process might follow, though some paths are more probable than others.
All elements belonging to the defined set are called the general population. For instance: all citizens of the defined country.
A sample population is a chosen subset of the general population.
general population
sample population
Estimator properties – ideal estimator.
[Figure: decomposition of estimator error into bias error and variance.]
Unbiased estimators:
This means that the average of the estimates from an increasing number of experiments should converge to the true parameter values, assuming that the noise characteristics are constant during the experiments.
A more precise mathematical description would be:
An estimator is called „unbiased” if its expected value is equal to the true value.
E[θ̂] = θ
[Figure: successive estimates plotted against the number of samples, scattering around the true value.]
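Unbiasedness can be checked numerically by averaging estimates over many repeated experiments, as the definition suggests. A sketch contrasting the biased (1/N divisor) and unbiased (1/(N−1) divisor) variance estimators; the true variance σ² = 4 is an assumed value:

```python
import numpy as np

# Averaging estimates over repeated experiments: the variance estimator with a
# 1/N divisor is biased low by the factor (N-1)/N; the 1/(N-1) version is not.
rng = np.random.default_rng(3)
true_var, n_samples, n_experiments = 4.0, 10, 200_000

samples = rng.normal(0.0, true_var ** 0.5, size=(n_experiments, n_samples))
var_biased = samples.var(axis=1, ddof=0).mean()    # expected ~ 4 * 9/10 = 3.6
var_unbiased = samples.var(axis=1, ddof=1).mean()  # expected ~ 4.0
print(var_biased, var_unbiased)
```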
Asymptotically unbiased estimator:
Some estimators are biased, but in general the expected value of an estimator should converge to the true value as the number of measurements increases to infinity.
Again this can be formulated more carefully:
An estimator is called "asymptotically unbiased" if
lim (N→∞) E[θ̂_N] = θ
where N is the number of measurements.
[Figure: estimator values approaching the true value as the number of samples increases.]
Efficient estimators.
E[(θ̂ₖ − θ)²] ≤ E[(θ̂ᵢ − θ)²]
The estimator with the smaller mean-square error is called more efficient.
E[(θ̂ − θ)²] = E[(θ̂ − E[θ̂])²] + (E[θ̂] − θ)² = σ²_θ̂ + b²
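This decomposition of the mean-square error into variance plus squared bias can be verified numerically. A sketch using the biased variance estimator from above; the true value θ = σ² = 1 and the sample sizes are illustrative:

```python
import numpy as np

# Numerical check of E[(est - theta)**2] = sigma_est**2 + b**2 for the biased
# (1/N divisor) variance estimator.
rng = np.random.default_rng(4)
theta, n_samples, n_experiments = 1.0, 8, 400_000

samples = rng.normal(0.0, 1.0, size=(n_experiments, n_samples))
est = samples.var(axis=1, ddof=0)          # biased variance estimates

mse = np.mean((est - theta) ** 2)          # mean-square error (left-hand side)
var_est = est.var()                        # spread of the estimator, sigma_est**2
bias_sq = (est.mean() - theta) ** 2        # squared bias, b**2
print(mse, var_est + bias_sq)
```

Here the bias is b = −σ²/N = −1/8, so even with zero variance the mean-square error could not drop below b² = 1/64.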
Consistent estimator.
An estimator is called consistent if:
lim (N→∞) Pr[|θ̂_N − θ| ≥ ε] = 0 for each ε > 0
Robust estimator
An estimator is called a robust estimator if its properties are still valid when the assumptions made in its construction are no longer applicable.