STATISTICAL DATA ANALYSIS Prof. Janusz Gajda Dept. of Instrumentation and Measurement

Page 1: STATISTICAL DATA ANALYSIS

STATISTICAL DATA ANALYSIS

Prof. Janusz Gajda

Dept. of Instrumentation and Measurement

Page 2: STATISTICAL DATA ANALYSIS

Plan of the lecture

• Classification of the measuring signals according to their statistical properties.

• Definition of basic parameters and characteristics of stochastic signals (expected value, variance, mean-square value, probability density function, power spectral density, joint probability density function, cross-correlation function, cross-spectral density function, transfer function).

Page 3: STATISTICAL DATA ANALYSIS

• Interpretation of the basic statistical characteristics.
• Elements of statistics: estimation theory and hypothesis verification theory, parametric and non-parametric estimation, point and interval estimation.
• Good properties of an estimator:
  - unbiased,
  - efficient,
  - consistent,
  - robust.

Page 4: STATISTICAL DATA ANALYSIS

• Application of the point estimation and interval estimation methods to determine estimates of these parameters and characteristics.
• Estimators of basic parameters and characteristics of random signals: mean value and variance, probability density function, autocorrelation and cross-correlation functions, power spectral density and cross-spectral density, coherence function, transfer function.
• Analysis of the statistical properties of those estimators.
• Determination of the confidence intervals of basic statistical parameters for an assumed confidence level.
• Statistical hypotheses and their verification.
• Errors of the first and second kind (Type I and Type II errors) made during the verification process.

Page 5: STATISTICAL DATA ANALYSIS

Classification of the measuring signals according to their statistical properties

Deterministic signals
• Periodic signals
  - Mono-harmonic signals
  - Poly-harmonic signals
• Non-periodic signals
  - Almost periodic signals
  - Transient signals

Page 6: STATISTICAL DATA ANALYSIS

Periodic signals

Mono-harmonic signals:

$$x(t) = A \sin(\omega_0 t + \varphi)$$

Where:

$A$ - signal amplitude,

$\omega_0 = 2\pi f_0$ - angular frequency,

$\varphi$ - initial phase angle.

Page 7: STATISTICAL DATA ANALYSIS

Poly-harmonic signals:

$$x(t) = \sum_{n} A_n \sin(n \omega_0 t + \varphi_n)$$

Where:

$A_n$ - amplitude of the n-th harmonic component,

$\omega_0 = 2\pi f_0$ - basic (fundamental) angular frequency,

$\varphi_n$ - initial phase angle of the n-th component.

Page 8: STATISTICAL DATA ANALYSIS

Frequency spectrum of the periodic signals

$$x(t) = \sum_{n=1}^{\infty} X_n \sin(2\pi f_n t + \varphi_n)$$

where the ratio $f_n / f_k$ is a rational number.

Page 9: STATISTICAL DATA ANALYSIS

Classification of the measuring signals according to their statistical properties

Stochastic signals
• Stationary signals
  - Ergodic signals
  - Non-ergodic signals
• Non-stationary signals
  - Different classes of non-stationary signals

Page 10: STATISTICAL DATA ANALYSIS

[Figure: set of realizations of the random quantity. Five realizations (realization 1 to realization 5) are plotted against time; the values $x_1(t_1), \ldots, x_5(t_1)$ and $x_1(t_2), \ldots, x_5(t_2)$ taken at two time instants are marked.]

Page 11: STATISTICAL DATA ANALYSIS

Basic statistical characteristics

Mean-square value:

$$\psi_x^2 = E[x^2(t)] = \lim_{T \to \infty} \frac{1}{T} \int_0^T x^2(t)\,dt$$

Root-mean-square value:

$$x_{\mathrm{rms}} = \sqrt{\psi_x^2}$$
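The mean-square and root-mean-square values can be checked numerically by replacing the time integral with an average over uniformly spaced samples. The sketch below is only an illustration: the test signal (a sine of amplitude `A` sampled over a whole number of periods) and all names are assumptions, not part of the lecture.

```python
import math

def mean_square(samples):
    """Average of the squared samples (discrete stand-in for (1/T) * integral of x^2)."""
    return sum(v * v for v in samples) / len(samples)

# Assumed test signal: x(t) = A*sin(2*pi*f0*t), observed over exactly 10 periods.
A, f0, fs, periods = 2.0, 5.0, 1000.0, 10
n = int(fs * periods / f0)                      # 2000 samples
x = [A * math.sin(2 * math.pi * f0 * k / fs) for k in range(n)]

psi2 = mean_square(x)        # for a sine this should be close to A**2 / 2
x_rms = math.sqrt(psi2)      # and this close to A / sqrt(2)
print(psi2, x_rms)
```

For a sine wave the mean-square value is $A^2/2$, so the printed values should be close to 2 and $\sqrt{2}$.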

Page 12: STATISTICAL DATA ANALYSIS

Expected value

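$$\mu_x = E[x(t)] = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\,dt$$

Variance:

$$\sigma_x^2 = E[(x(t) - \mu_x)^2] = \lim_{T \to \infty} \frac{1}{T} \int_0^T (x(t) - \mu_x)^2\,dt$$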
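A short numerical sketch may help relate the expected value and the variance to the mean-square value: for sample averages the identity mean-square = variance + mean² holds. The signal below (a sine with a constant offset `c`) and the function names are assumptions chosen for illustration.

```python
import math

def time_mean(samples):
    """Sample average, the discrete counterpart of (1/T) * integral of x(t)."""
    return sum(samples) / len(samples)

def time_variance(samples):
    """Average squared deviation from the sample mean."""
    mu = time_mean(samples)
    return sum((v - mu) ** 2 for v in samples) / len(samples)

# Assumed test signal: x(t) = c + A*sin(2*pi*f0*t) over a whole number of periods.
c, A, f0, fs, n = 1.5, 2.0, 5.0, 1000.0, 2000
x = [c + A * math.sin(2 * math.pi * f0 * k / fs) for k in range(n)]

mu = time_mean(x)                       # close to the offset c
var = time_variance(x)                  # close to A**2 / 2
psi2 = sum(v * v for v in x) / n        # mean-square value
print(mu, var, psi2, var + mu ** 2)     # the last two numbers should agree
```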

Page 13: STATISTICAL DATA ANALYSIS

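Probability function:

$$\Pr[x < x(t) \le x + \Delta x] = \lim_{T \to \infty} \frac{T_x}{T}$$

where $T_x$ is the total time during which $x(t)$ stays in the range $(x, x + \Delta x]$ within the observation time $T$.

Probability density function:

$$p(x) = \lim_{\Delta x \to 0} \frac{\Pr[x < x(t) \le x + \Delta x]}{\Delta x} = \lim_{\Delta x \to 0} \frac{1}{\Delta x} \lim_{T \to \infty} \frac{T_x}{T}$$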
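The time-fraction definition of the probability density can be tried on sampled data: the ratio of time spent in an interval to the observation time becomes the fraction of samples that fall in the interval. This is a minimal sketch under assumptions: the test signal is a unit-amplitude sine, whose density is known to be $1/(\pi\sqrt{1 - x^2})$, and `density_estimate` is a hypothetical helper name.

```python
import math

def density_estimate(samples, x, dx):
    """Fraction of samples in (x, x + dx], divided by the interval width dx."""
    inside = sum(1 for v in samples if x < v <= x + dx)
    return inside / len(samples) / dx

# Assumed test signal: one full period of a unit-amplitude sine.
n = 100_000
signal = [math.sin(2 * math.pi * (k + 0.5) / n) for k in range(n)]

# Estimate the density around x = 0 with an interval of width 0.1 centred on 0.
p_hat = density_estimate(signal, -0.05, 0.1)
p_theory = 1 / math.pi       # density of the unit sine at x = 0
print(p_hat, p_theory)
```

A narrower interval brings the estimate closer to the limit in the definition, but fewer samples then fall inside it, so the estimate becomes noisier for random data.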

00

Page 14: STATISTICAL DATA ANALYSIS

Most popular distributions:

Standardised normal distribution:

$$p(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$$

Page 15: STATISTICAL DATA ANALYSIS

Most popular distributions:

Normal distribution:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
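As a quick check of the normal density, the sketch below evaluates it directly and uses `statistics.NormalDist` from the Python standard library to compute the probability of falling within one and two standard deviations of the mean. The values of `mu` and `sigma` are arbitrary assumptions.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density; the defaults give the standardised form."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 3.0, 2.0                     # assumed example parameters
dist = NormalDist(mu, sigma)

# Probability mass within one and within two standard deviations of the mean.
p_1sigma = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)
p_2sigma = dist.cdf(mu + 2 * sigma) - dist.cdf(mu - 2 * sigma)
print(normal_pdf(0.0), p_1sigma, p_2sigma)
```

The two probabilities come out at roughly 0.68 and 0.95 regardless of the chosen `mu` and `sigma`.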

Page 16: STATISTICAL DATA ANALYSIS

[Figure: probability density $p(x)$ of the normal distribution versus argument $x$.]

Page 17: STATISTICAL DATA ANALYSIS

[Figure: probability density $p(x)$ of the normal distribution versus argument $x$.]

Page 18: STATISTICAL DATA ANALYSIS

[Figure: probability density $p(x)$ of the normal distribution versus argument $x$, with an area of probability 0.68 marked.]

Page 19: STATISTICAL DATA ANALYSIS

[Figure: probability density $p(x)$ of the normal distribution versus argument $x$, with an area of probability 0.95 marked.]

Page 20: STATISTICAL DATA ANALYSIS

Normal distribution – cumulative probability

$$P(x) = \int_{-\infty}^{x} p(\xi)\,d\xi$$

[Figure: cumulative probability $P(x)$ of the normal distribution versus argument $x$.]

Page 21: STATISTICAL DATA ANALYSIS

Normal distribution – quantiles

$$P(x_p) = \int_{-\infty}^{x_p} p(x)\,dx = \Pr[x \le x_p] = p$$

[Figure: quantile $x_p$ of the normal distribution versus cumulative probability $P(x_p)$.]
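A quantile is the inverse of the cumulative probability, and for the normal distribution the Python standard library provides it as `NormalDist.inv_cdf`. The probability levels below are assumptions picked for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()            # standardised normal: mean 0, sigma 1

# Quantile x_p such that Pr[x <= x_p] = p.
x_975 = std_normal.inv_cdf(0.975)    # close to 1.96
x_50 = std_normal.inv_cdf(0.5)       # the median, equal to the mean (0)

# Round trip: the cumulative probability of a quantile returns p.
p_back = std_normal.cdf(std_normal.inv_cdf(0.9))
print(x_975, x_50, p_back)
```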

Page 22: STATISTICAL DATA ANALYSIS

Most popular distributions:

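Chi-square distribution ($\chi^2$):

$$p(\chi^2; n) = \frac{1}{2^{n/2}\,\Gamma(n/2)} \left(\chi^2\right)^{n/2 - 1} e^{-\chi^2/2}, \qquad \chi^2 > 0$$

$n$ - number of degrees of freedom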
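The chi-square density can be evaluated with `math.gamma`. The sketch below also integrates it numerically with a simple midpoint rule (an assumption made to keep the example dependency-free) to confirm that the area is 1 and that the mean equals the number of degrees of freedom.

```python
import math

def chi2_pdf(c, n):
    """Chi-square probability density with n degrees of freedom, for argument c."""
    if c <= 0:
        return 0.0
    return c ** (n / 2 - 1) * math.exp(-c / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

n = 4                                                   # assumed degrees of freedom
area = integrate(lambda c: chi2_pdf(c, n), 0.0, 80.0)   # should be close to 1
mean = integrate(lambda c: c * chi2_pdf(c, n), 0.0, 80.0)  # should be close to n
print(area, mean)
```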

Page 23: STATISTICAL DATA ANALYSIS

[Figure: probability density of the chi-square distribution versus argument $\chi^2$, for n = 1, 2, 3 and 10.]

Page 24: STATISTICAL DATA ANALYSIS

[Figure: two probability density curves, labelled 1 and 2.]

Page 25: STATISTICAL DATA ANALYSIS

Chi-square distribution – cumulative distribution

[Figure: cumulative probability of the chi-square distribution versus argument $\chi^2$, for n = 2, 3, 4, ... 20.]

Page 26: STATISTICAL DATA ANALYSIS

Most popular distributions:

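Student's t-distribution – probability density:

$$p(t; f) = \frac{\Gamma\left(\frac{f+1}{2}\right)}{\sqrt{\pi f}\;\Gamma\left(\frac{f}{2}\right)} \left(1 + \frac{t^2}{f}\right)^{-\frac{f+1}{2}}$$

$f$ – number of degrees of freedom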

Page 27: STATISTICAL DATA ANALYSIS

[Figure: probability density $p(t)$ of the Student's t-distribution versus argument $t$, for f = 1, 2 and 10.]

Page 28: STATISTICAL DATA ANALYSIS

Student's t-distribution – cumulative probability:

[Figure: cumulative probability $P(t)$ of the Student's t-distribution versus argument $t$, for f = 2, 3, 4, ... 10.]

Page 29: STATISTICAL DATA ANALYSIS

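Auto-correlation function.

$$K_x(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\,x(t + \tau)\,dt$$

$$R_x(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T [x(t) - \mu_x]\,[x(t + \tau) - \mu_x]\,dt$$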
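For sampled data the auto-correlation integral becomes an average of lagged products. The sketch below assumes a zero-mean sine recorded over a whole number of periods, so the record can be wrapped around circularly; that wrap-around is a simplification chosen for this example, not something stated in the lecture. For such a signal the result is $(A^2/2)\cos(2\pi f_0 \tau)$.

```python
import math

def autocorr(samples, lag):
    """Average of x[k] * x[k + lag], with circular indexing (whole-period record assumed)."""
    n = len(samples)
    return sum(samples[k] * samples[(k + lag) % n] for k in range(n)) / n

# Assumed test signal: 5 Hz sine of amplitude 2, sampled at 1 kHz for 2 s.
A, f0, fs, n = 2.0, 5.0, 1000.0, 2000
x = [A * math.sin(2 * math.pi * f0 * k / fs) for k in range(n)]

k0 = autocorr(x, 0)        # tau = 0: equals the mean-square value A**2 / 2
k_quarter = autocorr(x, 50)    # tau = 0.05 s, a quarter period: close to 0
k_half = autocorr(x, 100)      # tau = 0.1 s, half a period: close to -A**2 / 2
print(k0, k_quarter, k_half)
```

Because the test signal has zero mean, the two functions defined above (with and without mean removal) coincide here.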

Page 30: STATISTICAL DATA ANALYSIS

[Figure: auto-correlation function $R_x(\tau)$ versus argument $\tau$ [s].]

Page 31: STATISTICAL DATA ANALYSIS

[Figure: auto-correlation function $R_x(\tau)$ versus argument $\tau$ [s].]

Page 32: STATISTICAL DATA ANALYSIS

[Figure: auto-correlation function $R_x(\tau)$ versus argument $\tau$ [s].]

Page 33: STATISTICAL DATA ANALYSIS

Power spectral density:

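$$G_x(f) = \lim_{\Delta f \to 0} \frac{1}{\Delta f} \lim_{T \to \infty} \frac{1}{T} \int_0^T x^2(t, f, \Delta f)\,dt$$

$$S_x(f) = \int_{-\infty}^{\infty} R_x(\tau)\, e^{-j 2\pi f \tau}\,d\tau$$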

Page 34: STATISTICAL DATA ANALYSIS

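[Figure: power spectral density $G(f)$ versus frequency $f$ [Hz], with two spectral lines of heights $0.5 A_1^2$ and $0.5 A_2^2$.]

$$x(t) = A_1 \sin(2\pi f_1 t) + A_2 \sin(2\pi f_2 t), \qquad f_1 = 2\ \mathrm{Hz},\quad f_2 = 6\ \mathrm{Hz}$$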
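For a signal made of two sinusoids, the power is concentrated in two spectral lines, each carrying half the squared amplitude of its component. The sketch below checks this with a direct discrete Fourier sum evaluated at single frequencies. The amplitudes, the frequencies of 2 Hz and 6 Hz, the sampling rate and the name `line_power` are all assumptions for illustration.

```python
import cmath
import math

def line_power(samples, fs, f):
    """One-sided power of the spectral line at frequency f (f must lie on a DFT bin)."""
    n = len(samples)
    X = sum(v * cmath.exp(-2j * math.pi * f * k / fs) for k, v in enumerate(samples))
    return 2 * abs(X) ** 2 / n ** 2

# Assumed test signal: two sines, observed for 2 s at 64 samples per second.
fs, n = 64.0, 128
A1, f1, A2, f2 = 3.0, 2.0, 1.0, 6.0
x = [A1 * math.sin(2 * math.pi * f1 * k / fs) + A2 * math.sin(2 * math.pi * f2 * k / fs)
     for k in range(n)]

powers = {f: line_power(x, fs, f) for f in (2.0, 4.0, 6.0)}
print(powers)   # about A1**2/2 at 2 Hz, about 0 at 4 Hz, about A2**2/2 at 6 Hz
```

The record length is chosen so that both frequencies fall exactly on DFT bins; otherwise the power would leak into neighbouring frequencies.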

Page 35: STATISTICAL DATA ANALYSIS

Joint probability density function:

[Figure: amplitudes $x(t)$ and $y(t)$ versus time [s]. $T_{xy}$ is the total time, within the observation time $T$, during which $x(t)$ lies in $(x, x + dx]$ while $y(t)$ lies in $(y, y + dy]$.]

Page 36: STATISTICAL DATA ANALYSIS

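$$p(x, y) = \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{1}{\Delta x\,\Delta y} \lim_{T \to \infty} \frac{T_{xy}}{T}$$

Joint cumulative probability:

$$P(x, y) = \Pr[x(t) \le x,\; y(t) \le y] = \int_{-\infty}^{x} \int_{-\infty}^{y} p(\xi, \eta)\,d\eta\,d\xi$$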

Page 37: STATISTICAL DATA ANALYSIS

[Figure: surface plot of a joint probability density $p(x, y)$.]

Page 38: STATISTICAL DATA ANALYSIS

[Figure: plot of a joint probability density $p(x, y)$ in the $(x, y)$ plane.]

Page 39: STATISTICAL DATA ANALYSIS

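Cross-correlation:

$$K_{xy}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\,y(t + \tau)\,dt$$

$$R_{xy}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T [x(t) - \mu_x]\,[y(t + \tau) - \mu_y]\,dt$$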
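One typical use of the cross-correlation function is finding the delay between two signals: it peaks at the lag where the signals line up. The sketch below assumes the convention $x(t)\,y(t + \tau)$, a zero-mean random input `x`, and an output `y` that is a delayed, slightly noisy copy of it; the data and names are made up for illustration.

```python
import random

def cross_corr(x, y, lag):
    """(1/N) * sum of x[k] * y[k + lag] over the overlapping samples, for lag >= 0."""
    n = len(x)
    return sum(x[k] * y[k + lag] for k in range(n - lag)) / n

rng = random.Random(0)               # fixed seed so the run is repeatable
n, delay = 2000, 7                   # assumed record length and true delay (samples)
x = [rng.gauss(0.0, 1.0) for _ in range(n)]

# y[k] = x[k - delay] plus a little measurement noise.
y = [0.0] * delay + x[: n - delay]
y = [v + 0.1 * rng.gauss(0.0, 1.0) for v in y]

best_lag = max(range(30), key=lambda m: cross_corr(x, y, m))
print(best_lag, cross_corr(x, y, best_lag))
```

Both signals have (nearly) zero mean here, so removing the means first would change the result only slightly.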

Page 40: STATISTICAL DATA ANALYSIS

[Figure: signal amplitude versus time [s] for a sinusoidal signal $y(t)$ with added random noise (randn).]

Page 41: STATISTICAL DATA ANALYSIS

[Figure: cross-correlation versus time shift [s].]

Page 42: STATISTICAL DATA ANALYSIS

Spectral density function:

$$S_{xy}(jf) = \int_{-\infty}^{\infty} R_{xy}(\tau)\, e^{-j 2\pi f \tau}\,d\tau$$

[Figure: estimated spectral densities $\hat{S}_{xy}(jf)$ and $\hat{S}_x(f)$ versus frequency [Hz].]

Page 43: STATISTICAL DATA ANALYSIS

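Transfer function:

$$S_{xy}(jf) = H_{xy}(jf)\, S_x(f)$$

[Figure: $H_{xy}(jf)$ plotted in the complex plane, $\mathrm{Imag}[H_{xy}(jf)]$ versus $\mathrm{Real}[H_{xy}(jf)]$, for $f$ from 0 Hz to 50 Hz.]

$$H_{xy}(jf) = \frac{k\,\omega_n^2}{(j 2\pi f)^2 + 2 \xi \omega_n (j 2\pi f) + \omega_n^2}$$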

Page 44: STATISTICAL DATA ANALYSIS

Statistical Base of Data Analysis

Mathematical statistics deals with gaining information from data. In practice, data often contain some randomness or uncertainty. Statistics handles such data using methods of probability theory.

Mathematical statistics examines the distributions of random quantities.

Page 45: STATISTICAL DATA ANALYSIS

In applying statistics to a scientific, industrial, or societal problem, one begins with a process or population to be studied.

This might be a population of people in a country, of crystal grains in a rock, or of goods manufactured by a particular factory during a given period.

It may instead be a process observed at various times; data collected about this kind of "population" constitute what is called a time series. For practical reasons, one usually studies a chosen subset of the population, called a sample. Data are collected about the sample in an observational or measurement experiment.

The data are then subjected to statistical analysis, which serves two related purposes: description and inference.

Page 46: STATISTICAL DATA ANALYSIS

• Descriptive statistics can be used to summarize the data, either numerically or graphically, to describe the sample. Basic examples of numerical descriptors include the mean and standard deviation. Graphical summarizations include various kinds of charts and graphs.
• Inferential statistics is used to model patterns in the data, accounting for randomness and drawing inferences about the larger population. These inferences may take the form of answers to yes/no questions (hypothesis testing), estimates of numerical characteristics (estimation), descriptions of association (correlation) or modelling of relationships (regression).

Page 47: STATISTICAL DATA ANALYSIS

Mathematical statistics
• estimation theory
  - parametric estimation, non-parametric estimation
  - point estimation, interval estimation
• hypothesis tests

Page 48: STATISTICAL DATA ANALYSIS

hypothesis tests

A statistical hypothesis test, or more briefly a hypothesis test, is an algorithm for choosing between the alternatives (for or against the hypothesis) in a way that minimizes certain risks.

The only conclusions that may be drawn from the test are:
• There is not enough evidence to reject the hypothesis.
• The hypothesis is rejected as false.

Page 49: STATISTICAL DATA ANALYSIS

estimation theory

Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured or empirical data. The parameters describe the physical object that answers a question posed by the estimator.

Page 50: STATISTICAL DATA ANALYSIS

non-parametric estimation

Nonparametric estimation is a statistical method that allows determination of the chosen characteristic, understood as a set of points in a predefined coordinate system (without any functional description).

Page 51: STATISTICAL DATA ANALYSIS

parametric estimation

Parametric estimation is a statistical method that allows determination of the chosen parameters, describing the analysed signal or object.

Page 52: STATISTICAL DATA ANALYSIS

point estimation

In statistics, point estimation involves the use of sample data to calculate a single value (known as an estimate) which is to serve as a "best guess" for an unknown (fixed or random) population parameter.

Page 53: STATISTICAL DATA ANALYSIS

interval estimation

In statistics, interval estimation is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter.
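As an illustration of interval estimation, the sketch below computes a confidence interval for a mean value under simplifying assumptions that are not taken from the lecture: the data are normally distributed and the standard deviation `sigma` is known, so the interval is the sample mean plus or minus a normal quantile times `sigma / sqrt(N)`. The function name and data are hypothetical.

```python
import math
import random
from statistics import NormalDist

def mean_confidence_interval(samples, sigma, confidence=0.95):
    """Two-sided interval for the mean, assuming normal data with known sigma."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)     # e.g. about 1.96 for 95 %
    centre = sum(samples) / len(samples)               # the point estimate
    half_width = z * sigma / math.sqrt(len(samples))
    return centre - half_width, centre + half_width

rng = random.Random(1)                                 # fixed seed for repeatability
true_mean, sigma = 10.0, 2.0                           # assumed "true" values
data = [rng.gauss(true_mean, sigma) for _ in range(400)]

low, high = mean_confidence_interval(data, sigma)
print(low, high)
```

The point estimate is the centre of the interval; the interval adds a statement about how far the true value may plausibly lie from it at the assumed confidence level.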

Page 54: STATISTICAL DATA ANALYSIS

Statistical Base of Data Analysis

Random quantities:

A random variable is a function that assigns a unique numerical value to each possible outcome of a random experiment under fixed conditions. Despite its name, a random variable is not a variable but rather a function that maps outcomes to numbers.

A random event may or may not occur as a result of an experiment.

Page 55: STATISTICAL DATA ANALYSIS

A stochastic process, sometimes called a random process, is the opposite of a deterministic process. Instead of dealing with only one possible 'reality' of how the process might evolve over time, in a stochastic or random process there is some indeterminacy in its future evolution, described by probability distributions.

This means that even if the initial condition (or starting point) is known, there are many paths the process might follow, but some paths are more probable and others less so.

Page 56: STATISTICAL DATA ANALYSIS

All elements belonging to the defined set are called the general population. For instance: all citizens of the defined country.

A sample population is a chosen subset of the general population.

[Figure: the sample population drawn as a subset of the general population.]

Page 57: STATISTICAL DATA ANALYSIS

Estimator properties – ideal estimator.

Page 58: STATISTICAL DATA ANALYSIS

[Figure: distribution of the estimates, with the bias error and the variance marked.]

Page 59: STATISTICAL DATA ANALYSIS

Unbiased estimators:

This means that the average of the estimates from an increasing number of experiments should converge to the true parameter values, assuming that the noise characteristics are constant during the experiments.

A more precise mathematical description would be:

An estimator is called "unbiased" if its expected value is equal to the true value:

$$E\{\hat{\theta}\} = \theta$$
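Bias can be made visible by repeating an experiment many times and averaging the estimates. The sketch below compares two well-known variance estimators on simulated normal data: dividing by N gives a biased estimate, dividing by N - 1 an unbiased one. The sample size, number of repetitions and seed are arbitrary assumptions.

```python
import random

def var_biased(sample):
    """Variance estimate with divisor N (biased)."""
    m = sum(sample) / len(sample)
    return sum((v - m) ** 2 for v in sample) / len(sample)

def var_unbiased(sample):
    """Variance estimate with divisor N - 1 (unbiased)."""
    m = sum(sample) / len(sample)
    return sum((v - m) ** 2 for v in sample) / (len(sample) - 1)

rng = random.Random(42)                    # fixed seed for repeatability
true_var, n, trials = 4.0, 5, 20000        # assumed true variance and sample size
sum_b = sum_u = 0.0
for _ in range(trials):
    sample = [rng.gauss(0.0, true_var ** 0.5) for _ in range(n)]
    sum_b += var_biased(sample)
    sum_u += var_unbiased(sample)

avg_biased = sum_b / trials      # tends to (n - 1) / n * true_var = 3.2
avg_unbiased = sum_u / trials    # tends to true_var = 4.0
print(avg_biased, avg_unbiased)
```

The bias of the first estimator shrinks as N grows, since its expected value is $(N-1)/N$ times the true variance.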

Page 60: STATISTICAL DATA ANALYSIS

[Figure: estimates versus number of samples.]

Page 61: STATISTICAL DATA ANALYSIS

Asymptotically unbiased estimator:

Some estimators are biased, but in general the expected value of an estimator should converge to the true value as the number of measurements increases to infinity.

Again this can be formulated more carefully:

An estimator is called "asymptotically unbiased" if

$$\lim_{N \to \infty} E\{\hat{\theta}_N\} = \theta$$

with $N$ the number of measurements.

Page 62: STATISTICAL DATA ANALYSIS

[Figure: estimator value versus number of samples, compared with the true value.]

Page 63: STATISTICAL DATA ANALYSIS

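Efficient estimators.

An estimator $\hat{\theta}_k$ is called more efficient than an estimator $\hat{\theta}_i$ if its mean-square error is smaller:

$$E\{(\hat{\theta}_k - \theta)^2\} < E\{(\hat{\theta}_i - \theta)^2\}$$

The mean-square error is the sum of the variance and the squared bias of the estimator:

$$E\{(\hat{\theta} - \theta)^2\} = E\{(\hat{\theta} - E\{\hat{\theta}\})^2\} + (E\{\hat{\theta}\} - \theta)^2 = \sigma_{\hat{\theta}}^2 + b_{\hat{\theta}}^2$$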
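The split of the mean-square error into variance and squared bias follows from adding and subtracting the expected value of the estimator. A short derivation, using $\hat{\theta}$ for the estimator and $\theta$ for the true value (notation assumed here):

```latex
\begin{aligned}
E\{(\hat{\theta} - \theta)^2\}
  &= E\{(\hat{\theta} - E\{\hat{\theta}\} + E\{\hat{\theta}\} - \theta)^2\} \\
  &= E\{(\hat{\theta} - E\{\hat{\theta}\})^2\}
     + 2\,(E\{\hat{\theta}\} - \theta)\,E\{\hat{\theta} - E\{\hat{\theta}\}\}
     + (E\{\hat{\theta}\} - \theta)^2 \\
  &= \sigma_{\hat{\theta}}^2 + b_{\hat{\theta}}^2 ,
\end{aligned}
```

The middle term vanishes because $E\{\hat{\theta} - E\{\hat{\theta}\}\} = 0$, and $E\{\hat{\theta}\} - \theta$ is a constant (the bias $b_{\hat{\theta}}$), so it can be taken outside the expectation.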

Page 64: STATISTICAL DATA ANALYSIS

Consistent estimator.

An estimator is called consistent if

$$\lim_{N \to \infty} \Pr\left[\,|\hat{\theta}_N - \theta| \ge \varepsilon\,\right] = 0$$

for each $\varepsilon > 0$.
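Consistency can be illustrated by simulation: for the sample mean of normal data, the probability of missing the true mean by at least a fixed tolerance shrinks as the number of measurements grows. The tolerance, sample sizes, trial count and seed below are assumptions for illustration.

```python
import random

def exceed_probability(n, eps, trials, rng, mu=0.0, sigma=1.0):
    """Fraction of trials in which the mean of n samples misses mu by at least eps."""
    misses = 0
    for _ in range(trials):
        mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(mean - mu) >= eps:
            misses += 1
    return misses / trials

rng = random.Random(3)       # fixed seed for repeatability
probs = {n: exceed_probability(n, eps=0.2, trials=300, rng=rng)
         for n in (10, 100, 1000)}
print(probs)                 # the fractions should fall towards 0 as n grows
```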

Page 65: STATISTICAL DATA ANALYSIS

Robust estimator

An estimator is called a robust estimator if its properties are still valid when the assumptions made in its construction are no longer applicable.
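A common illustration of robustness, offered here as an example rather than as the lecture's own, compares the sample mean with the sample median when one measurement is a gross outlier. The assumption behind the mean (no gross errors in the data) is violated, and only the median stays close to the bulk of the data. The data values are made up.

```python
from statistics import mean, median

clean = [9.8, 10.1, 10.0, 9.9, 10.2]            # hypothetical measurements
contaminated = [9.8, 10.1, 10.0, 9.9, 1000.0]   # last reading replaced by an outlier

mean_clean, median_clean = mean(clean), median(clean)
mean_bad, median_bad = mean(contaminated), median(contaminated)

print(mean_clean, median_clean)   # both close to 10
print(mean_bad, median_bad)       # the mean is pulled far away, the median is not
```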