STATISTICAL DATA ANALYSIS

Prof. Janusz Gajda

Dept. of Instrumentation and Measurement

Plan of the lecture

• Classification of the measuring signals according to their statistical properties.

• Definition of basic parameters and characteristics of stochastic signals (expected value, variance, mean-square value, probability density function, power spectral density, joint probability density function, cross-correlation function, spectral density function, transfer function).

• Interpretation of the basic statistical characteristics.

• Elements of statistics: estimation theory and hypothesis verification theory, parametric and non-parametric estimation, point and interval estimation.

• Good properties of an estimator: unbiased, efficient, consistent, robust.

• Application of point estimation and interval estimation methods to determine estimates of these parameters and characteristics.

• Estimators of basic parameters and characteristics of random signals: mean value and variance, probability density function, autocorrelation and cross-correlation functions, power spectral density and cross-spectral density, coherence function, transfer function.

• Analysis of the statistical properties of those estimators.

• Determination of the confidence intervals of basic statistical parameters for an assumed confidence level.

• Statistical hypotheses and their verification.

• Errors of the first and second kind made during the verification process.

Classification of the measuring signals according to their statistical properties

Deterministic signals

• Periodic signals: mono-harmonic signals, poly-harmonic signals

• Non-periodic signals: almost periodic signals, transient signals

Periodic signals

Mono-harmonic signals:

$$x(t) = A \sin(\omega_0 t + \varphi)$$

where:

$A$ - signal amplitude,

$\omega_0 = 2\pi f_0$ - angular frequency,

$\varphi$ - initial phase angle.

Poly-harmonic signals:

$$x(t) = \sum_{n=1}^{\infty} A_n \sin(n \omega_0 t + \varphi_n)$$

where:

$A_n$ - amplitude of the n-th harmonic component,

$\omega_0 = 2\pi f_0$ - basic angular frequency,

$\varphi_n$ - initial phase angle of the n-th component.
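The poly-harmonic model above can be sketched in a few lines of Python. The amplitudes, phases and the 5 Hz fundamental below are hypothetical values chosen for illustration; the function name `poly_harmonic` is likewise an assumption, not part of the lecture.

```python
import math

def poly_harmonic(t, amplitudes, phases, f0):
    """Sum of harmonics A_n * sin(n * omega_0 * t + phi_n), n = 1, 2, ..."""
    omega0 = 2.0 * math.pi * f0  # basic angular frequency
    return sum(
        a * math.sin(n * omega0 * t + phi)
        for n, (a, phi) in enumerate(zip(amplitudes, phases), start=1)
    )

# Hypothetical example: three harmonics of a 5 Hz fundamental.
f0 = 5.0
amplitudes = [1.0, 0.5, 0.25]
phases = [0.0, math.pi / 4, math.pi / 2]

period = 1.0 / f0
t = 0.0123
x_now = poly_harmonic(t, amplitudes, phases, f0)
x_next_period = poly_harmonic(t + period, amplitudes, phases, f0)

# A single harmonic reduces to the mono-harmonic signal A * sin(omega_0 * t).
x_mono = poly_harmonic(0.05, [2.0], [0.0], f0)
```

The two values `x_now` and `x_next_period` should agree up to rounding, which is the periodicity property of the signal with period $1/f_0$.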

Frequency spectrum of the periodic signals

$$x(t) = \sum_{n=1}^{\infty} X_n \sin(2\pi f_n t + \varphi_n)$$

where $f_n / f_k$ is a rational number.

Classification of the measuring signals according to their statistical properties

Stochastic signals

• Stationary signals: ergodic signals, non-ergodic signals

• Non-stationary signals: different classes of non-stationary signals

[Figure: set of realizations $x_1(t), \dots, x_5(t)$ of the random quantity, with the values $x_i(t_1)$ and $x_i(t_2)$ marked at two time instants $t_1$ and $t_2$.]

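Basic statistical characteristics

Mean-square value:

$$\psi_x^2 = E[x^2(t)] = \lim_{T \to \infty} \frac{1}{T} \int_0^T x^2(t)\, dt$$

Root-mean-square value:

$$x_{\mathrm{rms}} = \sqrt{\psi_x^2}$$

Expected value:

$$\mu_x = E[x(t)] = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\, dt$$

Variance:

$$\sigma_x^2 = E[(x(t) - \mu_x)^2] = \lim_{T \to \infty} \frac{1}{T} \int_0^T [x(t) - \mu_x]^2\, dt$$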
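For a sampled record, the time integrals above become averages over the samples. The following is a minimal sketch under that assumption; the test signal (offset 1.0 plus a sinusoid of amplitude 2.0) and the name `time_averages` are hypothetical.

```python
import math

def time_averages(samples):
    """Return (mean, mean_square, rms, variance) as averages over the record."""
    n = len(samples)
    mean = sum(samples) / n                               # estimate of the expected value
    mean_square = sum(v * v for v in samples) / n         # estimate of the mean-square value
    rms = math.sqrt(mean_square)                          # root-mean-square value
    variance = sum((v - mean) ** 2 for v in samples) / n  # estimate of the variance
    return mean, mean_square, rms, variance

# Hypothetical test signal sampled over exactly 10 periods.
fs, f0, periods = 1000, 10.0, 10
n_samples = int(fs * periods / f0)
x = [1.0 + 2.0 * math.sin(2 * math.pi * f0 * k / fs) for k in range(n_samples)]

mean, mean_square, rms, variance = time_averages(x)
```

For this signal the mean-square value equals the variance plus the squared mean, which is the usual relation between the three quantities.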

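Probability function:

$$\Pr[x < x(t) \le x + \Delta x] = \lim_{T \to \infty} \frac{T_x}{T}$$

where $T_x$ is the total time during which $x(t)$ stays in the interval $(x, x + \Delta x]$ within the observation time $T$.

Probability density function:

$$p(x) = \lim_{\Delta x \to 0} \frac{\Pr[x < x(t) \le x + \Delta x]}{\Delta x} = \lim_{\Delta x \to 0} \frac{1}{\Delta x} \left[ \lim_{T \to \infty} \frac{T_x}{T} \right]$$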
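With sampled data, the fraction of time spent in an interval becomes the fraction of samples that fall into it. A rough sketch, assuming normally distributed samples generated with a fixed seed (the data and the name `pdf_estimate` are illustrative assumptions):

```python
import random

def pdf_estimate(samples, x, dx):
    """Fraction of samples in (x, x + dx], divided by the interval width dx."""
    inside = sum(1 for v in samples if x < v <= x + dx)
    return inside / (len(samples) * dx)

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(200_000)]

# Density estimate in a narrow interval centred on x = 0.
p0 = pdf_estimate(samples, -0.05, 0.1)
```

The result should be close to the peak of the standardised normal density, $1/\sqrt{2\pi} \approx 0.399$. A finite interval width and a finite record both limit the accuracy.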

Most popular distributions:

Standardised normal distribution

Normal distribution

[Figures: probability density $p(x)$ versus argument $x$ for normal distributions.]
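As a small numerical companion to the plots, the normal density can be evaluated directly. The formula used here is the standard one, $p(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$; the parameter values are hypothetical.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma  # standardised argument
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

peak_standard = normal_pdf(0.0)                 # standardised normal, mu = 0, sigma = 1
peak_wide = normal_pdf(3.0, mu=3.0, sigma=2.0)  # doubling sigma halves the peak

# Cross-check against the standard library implementation.
reference = NormalDist(mu=3.0, sigma=2.0).pdf(4.0)
own_value = normal_pdf(4.0, mu=3.0, sigma=2.0)
```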

Normal distribution: cumulative probability

[Figure: cumulative probability $P(x_p)$, from 0 to 1, versus argument $x$.]

Normal distribution: quantiles

$$P(x_p) = \int_{-\infty}^{x_p} p(x)\, dx$$
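A quantile $x_p$ is the argument at which the cumulative probability reaches a given level. A short sketch for the standardised normal distribution, using Python's `statistics.NormalDist`; the level 0.975 is only an example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

level = 0.975
x_p = std_normal.inv_cdf(level)   # quantile for the chosen probability level
P = std_normal.cdf(x_p)           # cumulative probability at x_p, recovers the level
```

The value of `x_p` is approximately 1.96, the quantile commonly used for two-sided 95 % confidence intervals.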

Chi-square distribution ($\chi^2$)

$n$ - number of degrees of freedom

[Figures: chi-square distribution versus argument $\chi^2$ for $n = 2, 3, 4, \dots, 20$.]

Chi-square distribution: cumulative distribution
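The slides do not state how the chi-square variable is constructed. Assuming the standard definition (the sum of squares of $n$ independent standardised normal variables), a quick simulation can be used to check its mean $n$ and variance $2n$. The seed and sample sizes are arbitrary.

```python
import random

def chi_square_sample(n, rng):
    """One draw: sum of squares of n independent standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(1)
n = 4                                    # degrees of freedom (example value)
draws = [chi_square_sample(n, rng) for _ in range(50_000)]

mean = sum(draws) / len(draws)                          # theory: n
var = sum((d - mean) ** 2 for d in draws) / len(draws)  # theory: 2 * n
```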

t-Student distribution: probability density

$f$ - number of degrees of freedom

[Figures: t-Student distribution versus argument $t$ for $f = 1$ and for $f = 2, 3, 4, \dots, 10$.]

t-Student distribution: cumulative probability
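The density formula itself is not preserved on the slide. The sketch below uses the standard expression $p(t) = \frac{\Gamma((f+1)/2)}{\sqrt{f\pi}\,\Gamma(f/2)} \left(1 + t^2/f\right)^{-(f+1)/2}$, which is an assumption about what the plots show.

```python
import math

def student_t_pdf(t, f):
    """Density of the t-Student distribution with f degrees of freedom."""
    norm = math.gamma((f + 1) / 2) / (math.sqrt(f * math.pi) * math.gamma(f / 2))
    return norm * (1 + t * t / f) ** (-(f + 1) / 2)

p_f1 = student_t_pdf(0.0, 1)             # f = 1 gives the Cauchy distribution, peak 1/pi
p_f10 = student_t_pdf(0.0, 10)
p_normal = 1 / math.sqrt(2 * math.pi)    # peak of the standardised normal density
```

As $f$ grows, the peak rises towards that of the standardised normal density and the tails become lighter.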

Auto-correlation function:

$$R_x(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\, x(t + \tau)\, dt$$

[Figures: example signals and their auto-correlation functions; argument in s.]
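For a sampled record, the auto-correlation function can be estimated by averaging products of samples separated by a fixed lag. A minimal sketch for a sinusoid, for which $R_x(\tau) = \frac{A^2}{2} \cos(2\pi f_0 \tau)$; the signal parameters are hypothetical.

```python
import math

def autocorrelation(x, lag):
    """Time-average estimate of R_x at an integer, non-negative sample lag."""
    n = len(x) - lag
    return sum(x[k] * x[k + lag] for k in range(n)) / n

# Hypothetical sinusoid: amplitude 2, frequency 5 Hz, sampled at 1 kHz for 4 s.
fs, f0, amplitude = 1000, 5.0, 2.0
x = [amplitude * math.sin(2 * math.pi * f0 * k / fs) for k in range(4000)]

r_zero = autocorrelation(x, 0)      # equals the mean-square value A^2 / 2
r_half = autocorrelation(x, 100)    # lag of half a period
r_full = autocorrelation(x, 200)    # lag of one full period
```

The estimate at zero lag reproduces the mean-square value, changes sign at half a period and repeats after a full period, so the periodicity of the signal is visible in $R_x(\tau)$.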

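Power spectral density:

$$G_x(f) = \lim_{\Delta f \to 0} \lim_{T \to \infty} \frac{1}{\Delta f\, T} \int_0^T x^2(t, f, \Delta f)\, dt$$

where $x(t, f, \Delta f)$ is the part of $x(t)$ contained in the frequency band from $f$ to $f + \Delta f$.

$$S_x(f) = \int_{-\infty}^{\infty} R_x(\tau)\, e^{-j 2\pi f \tau}\, d\tau$$

[Figure: power spectrum versus frequency $f$ [Hz] of the signal $x(t) = A_1 \sin(2\pi f_1 t) + A_2 \sin(2\pi f_2 t)$, with lines of height $0.5 A_1^2$ and $0.5 A_2^2$ at the frequencies $f_1$ and $f_2$.]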
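The line heights $0.5 A_1^2$ and $0.5 A_2^2$ can be reproduced numerically. The sketch below computes the power in each frequency bin with a naive discrete Fourier transform (a swapped-in technique, not the band-filtering definition itself) and assumes an even number of samples. Amplitudes and frequencies are hypothetical.

```python
import cmath
import math

def one_sided_power(x):
    """Power in each DFT bin k = 0..N/2 of an even-length record (naive DFT)."""
    n = len(x)
    powers = []
    for k in range(n // 2 + 1):
        coeff = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)) / n
        # Fold negative frequencies onto positive ones, except at DC and Nyquist.
        scale = 1.0 if k in (0, n // 2) else 2.0
        powers.append(scale * abs(coeff) ** 2)
    return powers

# Hypothetical amplitudes and frequencies, chosen for illustration.
fs, n = 64, 64          # 1 s record, so bin k corresponds to k Hz
a1, f1 = 2.0, 4         # amplitude, frequency in Hz
a2, f2 = 1.0, 10
x = [a1 * math.sin(2 * math.pi * f1 * m / fs) + a2 * math.sin(2 * math.pi * f2 * m / fs)
     for m in range(n)]

powers = one_sided_power(x)
total_power = sum(powers)   # equals the mean-square value of the record
```

Dividing the power in a bin by the bin width $\Delta f = f_s / N$ would turn these values into a density estimate.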

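Joint probability density function:

$$p(x, y) = \lim_{\Delta x \to 0} \lim_{\Delta y \to 0} \frac{1}{\Delta x\, \Delta y} \left[ \lim_{T \to \infty} \frac{T_{xy}}{T} \right]$$

where $T_{xy}$ is the total time during which $x(t)$ stays in $(x, x + \Delta x]$ and, simultaneously, $y(t)$ stays in $(y, y + \Delta y]$.

[Figure: example signals versus time [s].]

Joint cumulative probability:

$$P(x, y) = \Pr[x(t) \le x,\; y(t) \le y] = \int_{-\infty}^{x} \int_{-\infty}^{y} p(\xi, \eta)\, d\eta\, d\xi$$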

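Cross-correlation:

$$R_{xy}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\, y(t + \tau)\, dt$$

$$C_{xy}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T [x(t) - \mu_x]\,[y(t + \tau) - \mu_y]\, dt$$

[Figures: example signals built from a sinusoid and normally distributed noise (randn), and the resulting cross-correlation function; time axes in s.]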
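A typical use of the cross-correlation function is finding the delay between two signals: $R_{xy}(\tau)$ peaks at the lag at which $y$ best matches $x$. A minimal sketch, assuming white Gaussian noise as the input and a hypothetical delay of 7 samples:

```python
import random

def cross_correlation(x, y, lag):
    """Time-average estimate of R_xy at a non-negative sample lag."""
    n = len(x) - lag
    return sum(x[k] * y[k + lag] for k in range(n)) / n

rng = random.Random(2)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]

delay = 7                               # hypothetical delay in samples
y = [0.0] * delay + x[:-delay]          # y is x shifted later by `delay` samples

r = [cross_correlation(x, y, lag) for lag in range(20)]
best_lag = max(range(20), key=lambda lag: r[lag])
```

The lag with the largest estimate, `best_lag`, recovers the delay. At the other lags the estimate fluctuates around zero because the noise samples are uncorrelated.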

Spectral density function:

$$S_{xy}(jf) = \int_{-\infty}^{\infty} R_{xy}(\tau)\, e^{-j 2\pi f \tau}\, d\tau$$

[Figure: spectral density $S_{xy}(jf)$ versus frequency [Hz].]

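Transfer function:

$$S_{xy}(jf) = H_{xy}(jf)\, S_x(f)$$

[Figure: example transfer function $H_{xy}(jf)$ plotted in the complex plane against $\mathrm{Real}[H_{xy}(jf)]$, with the points $f = 0$ Hz and $f = 50$ Hz marked.]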
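The relation between the spectral densities suggests estimating the transfer function as $H_{xy}(jf) = S_{xy}(jf) / S_x(f)$. The sketch below does this for a hypothetical system (gain 0.5 and a circular delay of 3 samples), using a naive DFT of a single record. In practice the spectra would be averaged over many records.

```python
import cmath
import math
import random

def dft(x):
    """Naive discrete Fourier transform of a short record."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

rng = random.Random(3)
n, gain, delay = 64, 0.5, 3                         # hypothetical system parameters
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [gain * x[(m - delay) % n] for m in range(n)]   # circularly delayed, scaled copy

X, Y = dft(x), dft(y)
S_x = [abs(Xk) ** 2 / n for Xk in X]                        # power spectrum of x
S_xy = [Xk.conjugate() * Yk / n for Xk, Yk in zip(X, Y)]    # cross-spectrum of x and y
H = [sxy / sx for sxy, sx in zip(S_xy, S_x)]                # transfer function estimate

phase_bin1 = cmath.phase(H[1])   # phase lag caused by the delay at bin 1
```

For this system the magnitude of `H` equals the gain at every frequency bin and the phase falls linearly with frequency, as expected for a pure delay.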

Statistical Base of Data Analysis

Mathematical statistics deals with gaining information from data. In practice, data often contain some randomness or uncertainty. Statistics handles such data using methods of probability theory.

Mathematical statistics studies the distributions of random quantities.

In applying statistics to a scientific, industrial, or societal problem, one begins with a process or population to be studied.

This might be a population of people in a country, of crystal grains in a rock, or of goods manufactured by a particular factory during a given period.

It may instead be a process observed at various times; data collected about this kind of "population" constitute what is called a time series. For practical reasons, one usually studies a chosen subset of the population, called a sample. Data are collected about the sample in an observational or measurement experiment.

The data are then subjected to statistical analysis, which serves two related purposes: description and inference.

• Descriptive statistics can be used to summarize the data, either numerically or graphically, to describe the sample. Basic examples of numerical descriptors include the mean and standard deviation. Graphical summaries include various kinds of charts and graphs.

• Inferential statistics is used to model patterns in the data, accounting for randomness and drawing inferences about the larger population. These inferences may take the form of answers to yes/no questions (hypothesis testing), estimates of numerical characteristics (estimation), descriptions of association (correlation) or modelling of relationships (regression).

Mathematical statistics

• estimation theory: parametric estimation, non-parametric estimation; point estimation, interval estimation

• hypothesis tests

hypothesis tests

A statistical hypothesis test, or more briefly a hypothesis test, is an algorithm for choosing between the alternatives (for or against the hypothesis) in a way that minimizes certain risks.

The only conclusions that may be drawn from the test are:

• There is not enough evidence to reject the hypothesis.

• The hypothesis is false.
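As one concrete instance of such an algorithm, the sketch below runs a two-sided z-test for the mean of normally distributed data with known standard deviation. The test choice, the data and the significance level 0.05 are illustrative assumptions.

```python
import math
from statistics import NormalDist

def z_test_mean(samples, mu0, sigma, alpha=0.05):
    """Two-sided z-test of the hypothesis 'mean equals mu0' with known sigma.

    Returns the test statistic and True when the hypothesis is rejected.
    """
    n = len(samples)
    sample_mean = sum(samples) / n
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value of the standard normal
    return z, abs(z) > z_crit

# Hypothetical measurement record with sample mean 10.0.
data = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7, 10.0, 9.9]

z_a, rejected_a = z_test_mean(data, mu0=10.0, sigma=0.2)   # not enough evidence to reject
z_b, rejected_b = z_test_mean(data, mu0=10.5, sigma=0.2)   # hypothesis rejected
```

The two outcomes correspond to the two possible conclusions: failing to reject does not prove the hypothesis, it only means the data do not contradict it at the chosen significance level.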

estimation theory

Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured (empirical) data. The parameters describe the physical object under study, and their estimated values answer the question posed by the experimenter.

non-parametric estimation

Non-parametric estimation is a statistical method that allows determination of the chosen characteristic, understood as a set of points in a predefined coordinate system (without any functional description).

parametric estimation

Parametric estimation is a statistical method that allows determination of the chosen parameters describing the analysed signal or object.

point estimation

In statistics, point estimation involves the use of sample data to calculate a single value (known as an estimate) which is to serve as a "best guess" for an unknown (fixed or random) population parameter.

interval estimation

In statistics, interval estimation is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter.
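The difference between the two approaches can be sketched for the mean value: the point estimate is the sample mean, and the interval estimate is a confidence interval around it. The example assumes normally distributed data with a known standard deviation; the data values are hypothetical.

```python
import math
from statistics import NormalDist, mean

def mean_confidence_interval(samples, sigma, confidence=0.95):
    """Point estimate of the mean and a two-sided confidence interval (known sigma)."""
    n = len(samples)
    point = mean(samples)                              # point estimate
    z = NormalDist().inv_cdf(0.5 + confidence / 2)     # standard normal quantile
    half_width = z * sigma / math.sqrt(n)
    return point, (point - half_width, point + half_width)

# Hypothetical measurement record.
data = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7, 10.0, 9.9]

point, (low, high) = mean_confidence_interval(data, sigma=0.2)
half_width = (high - low) / 2
```

The interval narrows in proportion to $1/\sqrt{N}$, so four times as many measurements are needed to halve its width.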

Statistical Base of Data Analysis

A random variable is a function, which assigns unique numerical values to all possible outcomes of a random experiment under fixed conditions. A random variable is not a variable but rather a function that maps events to numbers.

A random event is an event that may or may not occur as a result of an experiment.

Random quantities:

A stochastic process, sometimes called a random process, is the opposite of a deterministic process. Instead of dealing with only one possible 'reality' of how the process might evolve over time, in a stochastic or random process there is some indeterminacy in its future evolution, described by probability distributions.

This means that even if the initial condition (or starting point) is known, there are many paths the process might follow, but some paths are more probable and others less so.

All elements belonging to the defined set are called the general population. For instance: all citizens of the defined country.

A sample population is a chosen subset of the general population.

[Figure: the sample population shown as a subset of the general population.]

Estimator properties: the ideal estimator

[Figure: bias error of an estimator.]

Unbiased estimators:

This means that the average of the estimates from an increasing number of experiments should converge to the true parameter value, assuming that the noise characteristics are constant during the experiments.

A more precise mathematical description would be:

An estimator $\hat{\theta}$ is called "unbiased" if its expected value is equal to the true value $\theta$:

$$E[\hat{\theta}] = \theta$$

[Figure: estimates versus number of samples.]
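The statement about averaging estimates over many experiments can be checked by simulation. The sketch below uses the sample mean as the estimator and repeats a hypothetical experiment 2000 times; the true value, noise level and seed are arbitrary choices.

```python
import random

rng = random.Random(4)
true_mean = 2.0
n_experiments, n_samples = 2000, 50

estimates = []
for _ in range(n_experiments):
    # One experiment: a record of noisy measurements of the same quantity.
    record = [rng.gauss(true_mean, 1.0) for _ in range(n_samples)]
    estimates.append(sum(record) / n_samples)    # sample mean as the estimator

# The average of the estimates approaches the true value for an unbiased estimator.
average_of_estimates = sum(estimates) / n_experiments
```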

Asymptotically unbiased estimator:

Some estimators are biased, but in general the expected value of an estimator should converge to the true value as the number of measurements increases to infinity.

Again, this can be formulated more carefully:

An estimator $\hat{\theta}_N$ is called "asymptotically unbiased" if

$$\lim_{N \to \infty} E[\hat{\theta}_N] = \theta$$

with $N$ the number of measurements and $\theta$ the true value.

[Figure: estimates and true value versus number of samples.]
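A classic example of an asymptotically unbiased estimator (chosen here as an illustration, not taken from the slides) is the variance estimator with divisor $N$. Its expected value is $\frac{N-1}{N}\sigma^2$, which approaches $\sigma^2$ as $N$ grows. A small simulation under these assumptions:

```python
import random

def biased_variance(record):
    """Variance estimate with divisor N (not N - 1)."""
    n = len(record)
    m = sum(record) / n
    return sum((v - m) ** 2 for v in record) / n

def average_estimate(n_samples, n_experiments, rng):
    """Average of the variance estimates over many simulated experiments."""
    total = 0.0
    for _ in range(n_experiments):
        record = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]   # true variance is 1
        total += biased_variance(record)
    return total / n_experiments

rng = random.Random(5)
avg_small = average_estimate(4, 20_000, rng)    # theory: (N - 1) / N = 0.75
avg_large = average_estimate(100, 2_000, rng)   # theory: 0.99
```

The bias is clearly visible for short records and almost disappears for long ones.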

Efficient estimators.

$$E[(\hat{\theta}_k - \theta)^2] < E[(\hat{\theta}_i - \theta)^2]$$

The estimator with the smaller mean-square error, here $\hat{\theta}_k$, is called more efficient.
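To compare the efficiency of two estimators, their mean-square errors can be estimated by simulation. The sketch below compares the sample mean and the sample median as estimators of the centre of normally distributed data; the choice of estimators and all parameter values are illustrative assumptions.

```python
import random
from statistics import median

rng = random.Random(6)
true_value, n_samples, n_experiments = 0.0, 25, 4000

mse_mean = 0.0
mse_median = 0.0
for _ in range(n_experiments):
    record = [rng.gauss(true_value, 1.0) for _ in range(n_samples)]
    mse_mean += (sum(record) / n_samples - true_value) ** 2
    mse_median += (median(record) - true_value) ** 2

# Mean-square errors of both estimators, averaged over the experiments.
mse_mean /= n_experiments
mse_median /= n_experiments
```

For normal data the sample mean has the smaller mean-square error, so it is the more efficient of the two estimators in this setting.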

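Consistent estimator.

An estimator $\hat{\theta}_N$ is called consistent if

$$\lim_{N \to \infty} \Pr\left[\, |\hat{\theta}_N - \theta| > \varepsilon \,\right] = 0$$

for each $\varepsilon > 0$.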
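Consistency can be illustrated by estimating, for a fixed tolerance $\varepsilon$, how often the estimate misses the true value by more than $\varepsilon$. The sketch uses the sample mean of standard normal data; the tolerance 0.2 and the record lengths are hypothetical.

```python
import random

def miss_probability(n_samples, eps, n_experiments, rng):
    """Fraction of experiments in which the sample mean misses 0 by more than eps."""
    misses = 0
    for _ in range(n_experiments):
        m = sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
        if abs(m) > eps:
            misses += 1
    return misses / n_experiments

rng = random.Random(7)
p_small = miss_probability(10, 0.2, 500, rng)      # short records: misses are common
p_large = miss_probability(1000, 0.2, 500, rng)    # long records: misses are rare
```

As the number of measurements grows, the probability of a miss falls towards zero, which is the behaviour the definition requires.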

Robust estimator

An estimator is called a robust estimator if its properties are still valid when the assumptions made in its construction are no longer applicable.
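A simple way to see robustness is to violate an assumption on purpose. In the sketch below, a single gross error is added to a hypothetical record, and the shift of the sample mean is compared with the shift of the sample median. Both the data and the choice of estimators are illustrative.

```python
from statistics import mean, median

# Hypothetical record of measurements and the same record with one gross error.
clean = [9.9, 10.1, 10.0, 9.8, 10.2, 10.0, 9.9]
contaminated = clean + [1000.0]

shift_mean = abs(mean(contaminated) - mean(clean))        # large: the mean is not robust
shift_median = abs(median(contaminated) - median(clean))  # small: the median is robust
```

The median barely moves, while the mean is pulled far away from the bulk of the data by a single outlier.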
