NPTEL - Arun K. Tangirala · 2017. 8. 4. · Arun K. Tangirala Applied Time-Series Analysis 14. Probability & Statistics - Review 1. Probability density functions. When the density

Probability & Statistics - Review 1

CH5350: Applied Time-Series Analysis

Arun K. Tangirala

Department of Chemical Engineering, IIT Madras

Probability & Statistics: Univariate case

Arun K. Tangirala Applied Time-Series Analysis 1


Recap: General process



General process: Simplified



Trend-plus-random processes

(Discrete-time, scalar-valued, trend-plus-random, lumped-cause)Arun K. Tangirala Applied Time-Series Analysis 4


Random process

(Discrete-time, scalar-valued, lumped-cause)

+

(Discrete-time, scalar-valued, endogenously driven)



Bivariate random process

(Discrete-time, bivariate, scalar-valued, externally plus endogenously driven)



Framework

1. Univariate / bivariate

2. Linear random process

3. Stationary and non-stationarities (of certain types)

4. Discrete-time

5. Time- and frequency-domain analysis

The cornerstone of theory of random processes is the concept of a random

variable and the associated probability theory.



Notation- Random variable: UPPERCASE e.g., X; Outcomes: lowercase e.g., x.

- Probability distribution and density functions: F (x) and f(x), respectively.

- Scalars: lowercase x, ✓, etc.

- Vectors: lowercase bold faced e.g., x, v, ✓, etc.

- Matrices: Uppercase bold faced A, X.

- Expectation operator: E(.)

- Discrete-time random signal and process: v[k] (or {v[k]}) (scalar-valued)- White-noise: e[k]

- Backward / forward shift-operator: q�1 and q s.t. q�1v[k] = v[k � 1].

- Angular and cyclic frequencies: ! and f , respectively.

- . . .Arun K. Tangirala Applied Time-Series Analysis 8


Random Variable

DefinitionA random variable (RV) is one whose value set contains at least two elements, i.e., it

draws one value from at least two possibilities. The space of possible values is known as

the outcome space or sample space.

Examples: Toss of a coin, roll of a dice, outcome of a game, atmospheric temperature.



Formal definition

Outcomes of random phenomena can be either qualitative and/or quantitative. In order

to have a unified mathematical treatment, RVs are defined to be quantitative.

Definition (Priestley (1981))

A random variable X is a mapping from the sample space S onto the real line s.t. to

each element s ⇢ S there corresponds a unique real number.

I In the study of RVs, the time (or space) dimension does not come into picture.

Instead they are analysed only in the outcome space.



Two broad classes of RVs

I When the set of possibilities contains a single element, the randomness vanishes to

give rise to a deterministic variable.

I Two classes of random variables exist:

Discrete-valued RV: discrete set of possibilities (e.g., roll of a dice)

Continuous-valued RV: continuous-valued RV (e.g., ambient temperature)

Focus of this course: Continuous-valued random variables (with occasional digression

to discrete-valued RVs).



Do random variables actually exist?

The tag of randomness is given to any variable or a signal which is not accurately pre-

dictable, i.e., the outcome of the associated event is not predictable with zero error.

In reality, there is no reason to believe that the true process behaves in a “random”

manner. It is merely that since we are unable to predict its course, i.e., due to lack of

su�cient understanding or knowledge that any process becomes random.

Randomness is, therefore, not a characteristic of a process, but is rather a

reflection of our (lack of) knowledge and understanding of that process



Probability Distribution

The natural recourse to dealing with uncertainties is to list all possible outcomes and

assign a chance to each of those outcomes

Examples:

I Rainfall in a region: ⌦ = {0, 1}, P = {0.3, 0.7}I Face value from the roll of a die: ⌦ = {1, 2, · · · , 6}, P (!) = {1/6} 8! 2 ⌦

The specification of the outcomes and the associated probabilities through what is known

as probability distribution completely characterizes the random variable.



Probability Distribution FunctionsProbability distribution functionAlso known as the cumulative distribution function,

F (x) = Pr(X x)

I Probability distribution functions can be either continuous or piecewise-continuous

(step-like) depending on whether the RV is continuous- or discrete-valued,

respectively.

I They are known either a priori (through physics or postulates) or by means of

experiments



Probability density functionsWhen the density function exist, i.e., for continuous-valued RVs,

1. The density function is such that the area under the curve gives the probability,

Pr(a < x < b) =

Zb

a

f(x) dx =)Z 1

�1f(x) dx = 1 (1)

2. The density function is the derivative (w.r.t. x) of the distribution function

f(x) =

dF (x)

dx

(2)

I For discrete-valued RVs, a probability mass function (p.m.f.) is usedArun K. Tangirala Applied Time-Series Analysis 15


Examples: c.d.f. and p.d.f.

-4 -2 0 2 4

0.00.2

0.40.6

0.81.0

Gaussian dist.

x

F(x)

-4 -2 0 2 4

0.00.1

0.20.3

0.4

Gaussian density

x

f(x)

0 10 20 30 40

0.00.2

0.40.6

0.81.0

Chisquare (df=10) dist.

x

F(x)

0 10 20 30 40

0.00

0.02

0.04

0.06

0.08

0.10

Chi-square (df=10) density

x

f(x)

0 2 4 6 8 10

0.00.2

0.40.6

0.81.0

Binomial dist. (n=10,p=0.5)

x

F(x)

0 2 4 6 8 10

0.00

0.05

0.10

0.15

0.20

0.25

Binomial mass (n=10, p=0.5)

x

p(X=x)

The type of distribution for a

random phenomenon

depends on its nature.



Density Functions

1. Gaussian density function:f(x) =

1

�

p2⇡

exp

✓�1

2

(x� µ)

2

�

2

◆

2. Uniform density function:f(x) =

1

b� a

, a x b

3. Chi-square density: f

n

(x) =

1

2

n/2�(n/2)

x

n/2�1e

�x/2



Commands in REvery distribution that R handles has four functions for probability, quantile, density and

random variable (value), and has the same root name, but prefixed by p, q, d and r

respectively

Few relevant functions:

Commands Distribution

rnorm, pnorm, qnorm, dnorm Gaussian

rt, pt, qt, dt Student’s-t

rchisq, pchisq, qchisq, dchisq Chi-square

runif, punif, qunif, dunif Uniform distribution

rbinom, pbinom, qbinom, dbinom Binomial



Sample usage

1 x <� rnorm (1000 , mean=20, sd=5)

2 h i s t ( x , p r o b a b i l i t y=TRUE)

3 xseq <� seq (min ( x ) , max( x ) , l eng th=200 , c o l= ’ g r ey ’ )

4 l i n e s ( xseq , dnorm ( xseq , mean=20, sd=5) , c o l=’ b l u e ’ , lwd=2)

Histogram of x

x

Density

5 10 15 20 25 30 35

0.00

0.02

0.04

0.06

0.08



Practical AspectsThe p.d.f. of a RV allows us to compute the probability of X taking on values in an

infinitesimal interval, i.e., Pr(x X x+ dx) ⇡ f(x)dx

Note: Just as the way the density encountered in mechanics cannot be interpreted as mass of

the body at a point, the probability density should never be interpreted as the probability at a

point. In fact, for continuous-valued RVs, Pr(X = x) = 0

In practice, knowing the p.d.f. theoretically is seldom possible. One has to conduct

experiments and then try to fit a known p.d.f. that best explains the behaviour of the

RV.Arun K. Tangirala Applied Time-Series Analysis 20


Practical Aspects: Moments of a p.d.f.

I It may not be necessary to know the p.d.f. in practice!

I What is of interest in practice is (i) the most likely value and/or the expected

outcome (mean) and (ii) how far the outcomes are spread (variance)

The useful statistical properties, namely, mean and variance are, in fact, the first and

second-order (central) moments of the p.d.f. f(x) (similar to the moments of inertia).

The n

th moment of a p.d.f. is defined as

M

n

(X) =

Z 1

�1x

n

f(x) dx (3)



Linear random process and moments

It turns out that for linear processes, predictions of random signals and estimation of

model parameters it is su�cient to have the knowledge of mean, variance and

covariance (to be introduced shortly), i.e., it is su�cient to know the first and

second-order moments of p.d.f.



First Moment of a p.d.f.: Mean

The mean is defined as the first moment of the p.d.f. (analogous to the center of mass).

It is also the expected value (outcome) of the RV.

MeanThe mean of a RV, also the expectation of the RV, is defined as

E(X) = µ

X

=

Z 1

�1xf(x) dx (4)



RemarksI The integration in (4) is across the outcome space and NOT across any time

space.

I Applying the expectation operator E to a random variable produces its

“average” or expected value.

I Prediction perspective:

The mean is the best prediction of the random variable in the min-

imum mean square error sense, i.e.,

µ = min

c

E(X � ˆ

X)

2 s.t. ˆ

X = c

where ˆ

X denotes the prediction of X.



Expectation Operator

I For any constant, E(c) = c.

I The expectation of a function of X is given by

E(g(X)) =

Z 1

�1g(x)f(x) dx (5)

I It is a linear operator:

E

kX

i=1

c

i

g

i

(X)

!=

kX

i=1

c

i

E(g

i

(X)) (6)



Examples: Computing expectations

Example

Problem: Find the expectation of a random variable y[k] = sin(!k + �) where � is

uniformly distributed in [�⇡, ⇡].

Solution: E(y[k]) = E(sin(!k + �)) =

1

2⇡

Z⇡

�⇡

sin(!k + �) d�

=

1

2⇡

(� cos(!k + �)|⇡�⇡

)

=

1

2⇡

(cos(!k � ⇡)� cos(!k + ⇡)) = 0



Variance / Variability

An important statistic useful in decision making, error analysis of parameter estimation,

input design and several other prime stages of data analysis is the variance.

VarianceThe variance of a random variable, denoted by �

2X

is the average spread of outcomes

around its mean,

�

2X

= E((X � µ

X

)

2) =

Z 1

�1(x� µ

X

)

2f(x) dx (7)



Points to note

I As (7) suggests, �2X

is the second central moment of f(x). Further,

�

2X

= E(X

2)� µ

2X

(8)

I The variance definition is in the space of outcomes. It should not be confused

with the widely used variance definition for a series or a signal (sample

variance).

I Large variance indicates far spread of outcomes around its statistical center.

Naturally, in the limit as �2X

! 0, X becomes a deterministic variable.



Mean and Variance of scaled RVs

I Adding a constant to a RV simply shifts its mean by the same amount. The

variance remains unchanged.

E(X + c) = µ

X

+ c, var(X + c) = var(X) = �

2X

(9)

I A�ne transformation:

Y = ↵X + �, ↵ 2 R =)µ

Y

= ↵µ

X

+ � (10)

�

2Y

= ↵

2�

2X

(11)

I Properties of non-linearly transformed RV depend on the non-linearity involved



Properties of Normally distributed variablesThe normal distribution is one of the most widely assumed and studied distribution for

two important reasons:

I It is completely characterized by the mean and variance

I Central LImit Theorem

I If x1, x2, · · · , xn

are uncorrelated normal variables, then

y = a1x1 + a2x2 + · · ·+ a

n

x

n

is also a normally distributed variable with mean and

variance

µ

y

= a1µ1 + a2µ2 + · · ·+ a

n

µ

n

�

2y

= a

21�

21 + a

22�

22 + · · ·+ a

2n

�

2n



Central Limit Theorem

Let X1, X2, · · · , Xm

be a sequence of independent identically distributed random

variables each having finite mean µ and finite variance �

2. Let

Y

N

=

NX

i=1

X

i

, N = 1, 2, · · ·

Then, as N ! 1, the distribution of

Y

N

�Nµ

�

pN

! N (0, 1)

One of the popular applications of the CLT is in deriving the distribution of sample mean.


Documents

NPTEL - Arun K. Tangirala · 2017. 8. 4. · Arun K. Tangirala Applied Time-Series Analysis 14. Probability & Statistics - Review 1. Probability density functions. When the density