Upload
others
View
8
Download
0
Embed Size (px)
Citation preview
Probability & Statistics - Review 1
CH5350: Applied Time-Series Analysis
Arun K. Tangirala
Department of Chemical Engineering, IIT Madras
Probability & Statistics: Univariate case
Arun K. Tangirala Applied Time-Series Analysis 1
Probability & Statistics - Review 1
Recap: General process
Arun K. Tangirala Applied Time-Series Analysis 2
Probability & Statistics - Review 1
General process: Simplified
Arun K. Tangirala Applied Time-Series Analysis 3
Probability & Statistics - Review 1
Trend-plus-random processes
(Discrete-time, scalar-valued, trend-plus-random, lumped-cause)Arun K. Tangirala Applied Time-Series Analysis 4
Probability & Statistics - Review 1
Random process
(Discrete-time, scalar-valued, lumped-cause)
+
(Discrete-time, scalar-valued, endogenously driven)
Arun K. Tangirala Applied Time-Series Analysis 5
Probability & Statistics - Review 1
Bivariate random process
(Discrete-time, bivariate, scalar-valued, externally plus endogenously driven)
Arun K. Tangirala Applied Time-Series Analysis 6
Probability & Statistics - Review 1
Framework
1. Univariate / bivariate
2. Linear random process
3. Stationary and non-stationarities (of certain types)
4. Discrete-time
5. Time- and frequency-domain analysis
The cornerstone of theory of random processes is the concept of a random
variable and the associated probability theory.
Arun K. Tangirala Applied Time-Series Analysis 7
Probability & Statistics - Review 1
Notation- Random variable: UPPERCASE e.g., X; Outcomes: lowercase e.g., x.
- Probability distribution and density functions: F (x) and f(x), respectively.
- Scalars: lowercase x, ✓, etc.
- Vectors: lowercase bold faced e.g., x, v, ✓, etc.
- Matrices: Uppercase bold faced A, X.
- Expectation operator: E(.)
- Discrete-time random signal and process: v[k] (or {v[k]}) (scalar-valued)- White-noise: e[k]
- Backward / forward shift-operator: q�1 and q s.t. q�1v[k] = v[k � 1].
- Angular and cyclic frequencies: ! and f , respectively.
- . . .Arun K. Tangirala Applied Time-Series Analysis 8
Probability & Statistics - Review 1
Random Variable
DefinitionA random variable (RV) is one whose value set contains at least two elements, i.e., it
draws one value from at least two possibilities. The space of possible values is known as
the outcome space or sample space.
Examples: Toss of a coin, roll of a dice, outcome of a game, atmospheric temperature.
Arun K. Tangirala Applied Time-Series Analysis 9
Probability & Statistics - Review 1
Formal definition
Outcomes of random phenomena can be either qualitative and/or quantitative. In order
to have a unified mathematical treatment, RVs are defined to be quantitative.
Definition (Priestley (1981))
A random variable X is a mapping from the sample space S onto the real line s.t. to
each element s ⇢ S there corresponds a unique real number.
I In the study of RVs, the time (or space) dimension does not come into picture.
Instead they are analysed only in the outcome space.
Arun K. Tangirala Applied Time-Series Analysis 10
Probability & Statistics - Review 1
Two broad classes of RVs
I When the set of possibilities contains a single element, the randomness vanishes to
give rise to a deterministic variable.
I Two classes of random variables exist:
Discrete-valued RV: discrete set of possibilities (e.g., roll of a dice)
Continuous-valued RV: continuous-valued RV (e.g., ambient temperature)
Focus of this course: Continuous-valued random variables (with occasional digression
to discrete-valued RVs).
Arun K. Tangirala Applied Time-Series Analysis 11
Probability & Statistics - Review 1
Do random variables actually exist?
The tag of randomness is given to any variable or a signal which is not accurately pre-
dictable, i.e., the outcome of the associated event is not predictable with zero error.
In reality, there is no reason to believe that the true process behaves in a “random”
manner. It is merely that since we are unable to predict its course, i.e., due to lack of
su�cient understanding or knowledge that any process becomes random.
Randomness is, therefore, not a characteristic of a process, but is rather a
reflection of our (lack of) knowledge and understanding of that process
Arun K. Tangirala Applied Time-Series Analysis 12
Probability & Statistics - Review 1
Probability Distribution
The natural recourse to dealing with uncertainties is to list all possible outcomes and
assign a chance to each of those outcomes
Examples:
I Rainfall in a region: ⌦ = {0, 1}, P = {0.3, 0.7}I Face value from the roll of a die: ⌦ = {1, 2, · · · , 6}, P (!) = {1/6} 8! 2 ⌦
The specification of the outcomes and the associated probabilities through what is known
as probability distribution completely characterizes the random variable.
Arun K. Tangirala Applied Time-Series Analysis 13
Probability & Statistics - Review 1
Probability Distribution FunctionsProbability distribution functionAlso known as the cumulative distribution function,
F (x) = Pr(X x)
I Probability distribution functions can be either continuous or piecewise-continuous
(step-like) depending on whether the RV is continuous- or discrete-valued,
respectively.
I They are known either a priori (through physics or postulates) or by means of
experiments
Arun K. Tangirala Applied Time-Series Analysis 14
Probability & Statistics - Review 1
Probability density functionsWhen the density function exist, i.e., for continuous-valued RVs,
1. The density function is such that the area under the curve gives the probability,
Pr(a < x < b) =
Zb
a
f(x) dx =)Z 1
�1f(x) dx = 1 (1)
2. The density function is the derivative (w.r.t. x) of the distribution function
f(x) =
dF (x)
dx
(2)
I For discrete-valued RVs, a probability mass function (p.m.f.) is usedArun K. Tangirala Applied Time-Series Analysis 15
Probability & Statistics - Review 1
Examples: c.d.f. and p.d.f.
-4 -2 0 2 4
0.00.2
0.40.6
0.81.0
Gaussian dist.
x
F(x)
-4 -2 0 2 4
0.00.1
0.20.3
0.4
Gaussian density
x
f(x)
0 10 20 30 40
0.00.2
0.40.6
0.81.0
Chisquare (df=10) dist.
x
F(x)
0 10 20 30 40
0.00
0.02
0.04
0.06
0.08
0.10
Chi-square (df=10) density
x
f(x)
0 2 4 6 8 10
0.00.2
0.40.6
0.81.0
Binomial dist. (n=10,p=0.5)
x
F(x)
0 2 4 6 8 10
0.00
0.05
0.10
0.15
0.20
0.25
Binomial mass (n=10, p=0.5)
x
p(X=x)
The type of distribution for a
random phenomenon
depends on its nature.
Arun K. Tangirala Applied Time-Series Analysis 16
Probability & Statistics - Review 1
Density Functions
1. Gaussian density function:f(x) =
1
�
p2⇡
exp
✓�1
2
(x� µ)
2
�
2
◆
2. Uniform density function:f(x) =
1
b� a
, a x b
3. Chi-square density: f
n
(x) =
1
2
n/2�(n/2)
x
n/2�1e
�x/2
Arun K. Tangirala Applied Time-Series Analysis 17
Probability & Statistics - Review 1
Commands in REvery distribution that R handles has four functions for probability, quantile, density and
random variable (value), and has the same root name, but prefixed by p, q, d and r
respectively
Few relevant functions:
Commands Distribution
rnorm, pnorm, qnorm, dnorm Gaussian
rt, pt, qt, dt Student’s-t
rchisq, pchisq, qchisq, dchisq Chi-square
runif, punif, qunif, dunif Uniform distribution
rbinom, pbinom, qbinom, dbinom Binomial
Arun K. Tangirala Applied Time-Series Analysis 18
Probability & Statistics - Review 1
Sample usage
1 x <� rnorm (1000 , mean=20, sd=5)
2 h i s t ( x , p r o b a b i l i t y=TRUE)
3 xseq <� seq (min ( x ) , max( x ) , l eng th=200 , c o l= ’ g r ey ’ )
4 l i n e s ( xseq , dnorm ( xseq , mean=20, sd=5) , c o l=’ b l u e ’ , lwd=2)
Histogram of x
x
Density
5 10 15 20 25 30 35
0.00
0.02
0.04
0.06
0.08
Arun K. Tangirala Applied Time-Series Analysis 19
Probability & Statistics - Review 1
Practical AspectsThe p.d.f. of a RV allows us to compute the probability of X taking on values in an
infinitesimal interval, i.e., Pr(x X x+ dx) ⇡ f(x)dx
Note: Just as the way the density encountered in mechanics cannot be interpreted as mass of
the body at a point, the probability density should never be interpreted as the probability at a
point. In fact, for continuous-valued RVs, Pr(X = x) = 0
In practice, knowing the p.d.f. theoretically is seldom possible. One has to conduct
experiments and then try to fit a known p.d.f. that best explains the behaviour of the
RV.Arun K. Tangirala Applied Time-Series Analysis 20
Probability & Statistics - Review 1
Practical Aspects: Moments of a p.d.f.
I It may not be necessary to know the p.d.f. in practice!
I What is of interest in practice is (i) the most likely value and/or the expected
outcome (mean) and (ii) how far the outcomes are spread (variance)
The useful statistical properties, namely, mean and variance are, in fact, the first and
second-order (central) moments of the p.d.f. f(x) (similar to the moments of inertia).
The n
th moment of a p.d.f. is defined as
M
n
(X) =
Z 1
�1x
n
f(x) dx (3)
Arun K. Tangirala Applied Time-Series Analysis 21
Probability & Statistics - Review 1
Linear random process and moments
It turns out that for linear processes, predictions of random signals and estimation of
model parameters it is su�cient to have the knowledge of mean, variance and
covariance (to be introduced shortly), i.e., it is su�cient to know the first and
second-order moments of p.d.f.
Arun K. Tangirala Applied Time-Series Analysis 22
Probability & Statistics - Review 1
First Moment of a p.d.f.: Mean
The mean is defined as the first moment of the p.d.f. (analogous to the center of mass).
It is also the expected value (outcome) of the RV.
MeanThe mean of a RV, also the expectation of the RV, is defined as
E(X) = µ
X
=
Z 1
�1xf(x) dx (4)
Arun K. Tangirala Applied Time-Series Analysis 23
Probability & Statistics - Review 1
RemarksI The integration in (4) is across the outcome space and NOT across any time
space.
I Applying the expectation operator E to a random variable produces its
“average” or expected value.
I Prediction perspective:
The mean is the best prediction of the random variable in the min-
imum mean square error sense, i.e.,
µ = min
c
E(X � ˆ
X)
2 s.t. ˆ
X = c
where ˆ
X denotes the prediction of X.
Arun K. Tangirala Applied Time-Series Analysis 24
Probability & Statistics - Review 1
Expectation Operator
I For any constant, E(c) = c.
I The expectation of a function of X is given by
E(g(X)) =
Z 1
�1g(x)f(x) dx (5)
I It is a linear operator:
E
kX
i=1
c
i
g
i
(X)
!=
kX
i=1
c
i
E(g
i
(X)) (6)
Arun K. Tangirala Applied Time-Series Analysis 25
Probability & Statistics - Review 1
Examples: Computing expectations
Example
Problem: Find the expectation of a random variable y[k] = sin(!k + �) where � is
uniformly distributed in [�⇡, ⇡].
Solution: E(y[k]) = E(sin(!k + �)) =
1
2⇡
Z⇡
�⇡
sin(!k + �) d�
=
1
2⇡
(� cos(!k + �)|⇡�⇡
)
=
1
2⇡
(cos(!k � ⇡)� cos(!k + ⇡)) = 0
Arun K. Tangirala Applied Time-Series Analysis 26
Probability & Statistics - Review 1
Variance / Variability
An important statistic useful in decision making, error analysis of parameter estimation,
input design and several other prime stages of data analysis is the variance.
VarianceThe variance of a random variable, denoted by �
2X
is the average spread of outcomes
around its mean,
�
2X
= E((X � µ
X
)
2) =
Z 1
�1(x� µ
X
)
2f(x) dx (7)
Arun K. Tangirala Applied Time-Series Analysis 27
Probability & Statistics - Review 1
Points to note
I As (7) suggests, �2X
is the second central moment of f(x). Further,
�
2X
= E(X
2)� µ
2X
(8)
I The variance definition is in the space of outcomes. It should not be confused
with the widely used variance definition for a series or a signal (sample
variance).
I Large variance indicates far spread of outcomes around its statistical center.
Naturally, in the limit as �2X
! 0, X becomes a deterministic variable.
Arun K. Tangirala Applied Time-Series Analysis 28
Probability & Statistics - Review 1
Mean and Variance of scaled RVs
I Adding a constant to a RV simply shifts its mean by the same amount. The
variance remains unchanged.
E(X + c) = µ
X
+ c, var(X + c) = var(X) = �
2X
(9)
I A�ne transformation:
Y = ↵X + �, ↵ 2 R =)µ
Y
= ↵µ
X
+ � (10)
�
2Y
= ↵
2�
2X
(11)
I Properties of non-linearly transformed RV depend on the non-linearity involved
Arun K. Tangirala Applied Time-Series Analysis 29
Probability & Statistics - Review 1
Properties of Normally distributed variablesThe normal distribution is one of the most widely assumed and studied distribution for
two important reasons:
I It is completely characterized by the mean and variance
I Central LImit Theorem
I If x1, x2, · · · , xn
are uncorrelated normal variables, then
y = a1x1 + a2x2 + · · ·+ a
n
x
n
is also a normally distributed variable with mean and
variance
µ
y
= a1µ1 + a2µ2 + · · ·+ a
n
µ
n
�
2y
= a
21�
21 + a
22�
22 + · · ·+ a
2n
�
2n
Arun K. Tangirala Applied Time-Series Analysis 30
Probability & Statistics - Review 1
Central Limit Theorem
Let X1, X2, · · · , Xm
be a sequence of independent identically distributed random
variables each having finite mean µ and finite variance �
2. Let
Y
N
=
NX
i=1
X
i
, N = 1, 2, · · ·
Then, as N ! 1, the distribution of
Y
N
�Nµ
�
pN
! N (0, 1)
One of the popular applications of the CLT is in deriving the distribution of sample mean.
Arun K. Tangirala Applied Time-Series Analysis 31