IIMC Long Duration Executive Education Executive Programme in Business Management Statistics for Managerial Decisions Distributions & Modeling Data Prof

IIMC Long Duration Executive Education

Executive Programme in Business Management

Statistics for Managerial Decisions

Distributions & Modeling Data

Prof. Saibal Chattopadhyay

IIM Calcutta

A Brief Review

Uncertainty and Randomness: Theory of Probability

• Random Experiments, Events, Sample Space, Mutually Exclusive and Exhaustive Events, Set-theoretic operations with events (Union, Intersection, Difference, Complement), Classical Definition, Total Probability Theorem, Bayes’ Theorem, Independence of two or more events.

• Random Variables & Probability Distributions, Mean and Variance

Decision Making Under Uncertainty: Utility Theory

• Decisions Based on Expected Utility

• Choice of Utility Function $U(w)$: Risk-averse {$U(w) = \sqrt{w}$}, Risk-seeker {$U(w) = w^2$} and Risk-neutral {$U(w) = w$}

• Preference Reversal in Decision Making

Some Probability Distributions

Two types of Random Variables:

(A) Discrete Random Variables

• X is discrete if it takes finitely many or countably many values (mass points) $x_1, x_2, \ldots, x_n, \ldots$ with corresponding probabilities $p_1, p_2, \ldots, p_n, \ldots$.

• The probability law specifying the probabilities of the different values is called the probability mass function (pmf):

$f(x) = P(X = x)$, for $x = x_1, x_2, \ldots, x_n, \ldots$

Necessary conditions for a function to be a pmf:

(i) $f(x) \ge 0$ for every $x$, and

(ii) $\sum_x f(x) = 1$, the sum taken over all values of $x$.

Example: The random variable X takes 10 values 1, 2, …, 10; the probability for X to take any value is proportional to the square of the value.

Thus $f(x) = C x^2$, where C is the constant of proportionality.

From condition (i), $C \ge 0$, and from (ii),

$1 = f(1) + f(2) + \cdots + f(10) = C\,(1^2 + 2^2 + \cdots + 10^2) = C \cdot \dfrac{10 \cdot 11 \cdot 21}{6} = 385\,C$,

so $C = 1/385$ and $f(x) = x^2 / 385$, for $x = 1, 2, \ldots, 10$.
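The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the calculation above; the variable names are chosen here for illustration, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

# Support of X as given in the example: 1, 2, ..., 10.
values = range(1, 11)

# f(x) = C * x^2 must sum to 1, so C = 1 / (sum of the squares).
sum_of_squares = sum(x * x for x in values)
C = Fraction(1, sum_of_squares)

# The pmf, stored as exact fractions.
pmf = {x: C * x * x for x in values}

print(sum_of_squares)     # 385
print(C)                  # 1/385
print(sum(pmf.values()))  # 1
```

Both pmf conditions can be read off the result: every value of `pmf` is non-negative and the values add up to exactly 1.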

Probability Distribution of a Random Variable

Table giving the different values of the random variable and the corresponding probabilities:

X = x       $x_1$    $x_2$    …    $x_n$    Total
P(X = x)    $p_1$    $p_2$    …    $p_n$    1

Characteristics of importance: Mean & SD

Mean = $\mu = \sum_i x_i p_i$, i.e. Sum(value × probability)

SD = $\sigma = \sqrt{\text{Sum1} - \mu^2}$, where Sum1 = $\sum_i x_i^2 p_i$, i.e. Sum(value² × probability)

Use: These help in assessing the shape of the distribution and the coverage probability (Chebyshev’s Inequality).
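As an illustration of these two formulas, the sketch below computes the mean and SD for the earlier pmf example $f(x) = x^2/385$ and compares the actual coverage of the interval (mean ± 2 SD) with the Chebyshev lower bound $1 - 1/k^2$. The choice $k = 2$ and the helper name `mean_and_sd` are assumptions made here, not part of the slides.

```python
import math

def mean_and_sd(values, probs):
    """Mean = Sum(value * prob); SD = sqrt(Sum(value^2 * prob) - Mean^2)."""
    mean = sum(v * p for v, p in zip(values, probs))
    sum1 = sum(v * v * p for v, p in zip(values, probs))
    return mean, math.sqrt(sum1 - mean ** 2)

# pmf from the earlier example: f(x) = x^2 / 385 for x = 1, ..., 10.
values = list(range(1, 11))
probs = [x * x / 385 for x in values]

mean, sd = mean_and_sd(values, probs)

# Chebyshev: P(|X - mean| < k * SD) >= 1 - 1/k^2 for any distribution.
k = 2  # illustrative choice
coverage = sum(p for v, p in zip(values, probs) if abs(v - mean) < k * sd)
chebyshev_bound = 1 - 1 / k ** 2

print(f"mean = {mean:.3f}, SD = {sd:.3f}")
print(f"coverage within {k} SD = {coverage:.4f}, Chebyshev bound = {chebyshev_bound:.2f}")
```

The actual coverage is well above the bound, which is typical: Chebyshev's inequality holds for every distribution and is therefore conservative for any particular one.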

Some Special Discrete Distributions

1. Binomial Distribution

Applicable for the following types of experiments (called Binomial / Bernoullian trials):

(a) Only two outcomes, called Success (S) and Failure (F) for each trial;

(b) P(S) = p and P(F) = 1 – P(S) = 1 – p = q, same for all trials;

(c) Trials are probabilistically independent.

When is such a set-up applicable in real life?

Condition (a) generally holds: call any event of interest in the experiment a Success and its complement a Failure.

Condition (b) holds in most situations unless the definition of ‘Success’ changes mid-stream, or the initial conditions vary from one trial to another.

Condition (c) holds for repeated trials in which the outcome of one trial does not affect the others.

Calculation of probabilities for random events under such a set-up is easy!

Binomial Distribution

Consider n Binomial trials with P(Success) = p and P(Failure) = q = 1 – p.

Define X = number of successes in n trials.

X is a discrete random variable with values 0, 1, …, n.

The probability law (pmf) of X is

$f(x) = P(X = x) = \binom{n}{x} p^x q^{n-x}$, for $x = 0, 1, \ldots, n$

Mean = $np$

SD = $\sqrt{npq}$
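A short numerical check of the binomial pmf and of the mean and SD formulas is sketched below. The values n = 10 and p = 0.3 are illustrative assumptions, not taken from the slides.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p): nCx * p^x * q^(n-x)."""
    q = 1.0 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

# Illustrative values only; they do not come from the slides.
n, p = 10, 0.3
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and SD computed directly from the pmf table.
mean = sum(x * f for x, f in enumerate(pmf))
sd = math.sqrt(sum(x * x * f for x, f in enumerate(pmf)) - mean ** 2)

print(f"total probability = {sum(pmf):.6f}")
print(f"mean from table = {mean:.4f}, n*p = {n * p:.4f}")
print(f"SD from table = {sd:.4f}, sqrt(n*p*q) = {math.sqrt(n * p * (1 - p)):.4f}")
```

The mean and SD computed from the probability table agree with $np$ and $\sqrt{npq}$, which is why the closed forms can be used without building the table.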

2. Poisson Distribution

X is Poisson if the probability law is

$f(x) = P(X = x) = \dfrac{e^{-m} m^x}{x!}$, for $x = 0, 1, 2, \ldots$

• m = mean = (SD)²

• Distribution is positively skewed (longer tail towards the higher values)

• Used to model count data for rare events

• Approximates the binomial distribution when n (the number of trials) is ‘large’, p (the probability of success in a trial) is ‘small’, but np (the average number of successes) is finite and equal to m
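The quality of the Poisson approximation can be seen by tabulating both pmfs side by side. The sketch below assumes n = 1000 and p = 0.002 (so m = 2) purely for illustration.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability nCx * p^x * (1-p)^(n-x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x: int, m: float) -> float:
    """Poisson probability e^(-m) * m^x / x!."""
    return math.exp(-m) * m ** x / math.factorial(x)

# Illustrative 'large n, small p' case; these numbers are assumptions.
n, p = 1000, 0.002
m = n * p  # Poisson mean matched to the binomial mean

rows = [(x, binomial_pmf(x, n, p), poisson_pmf(x, m)) for x in range(8)]
max_gap = max(abs(b - q) for _, b, q in rows)

for x, b, q in rows:
    print(f"x = {x}: binomial = {b:.5f}, Poisson = {q:.5f}")
print(f"largest difference = {max_gap:.5f}")
```

Trying a larger p with the same n (for example p = 0.3) shows the two columns drifting apart, which is the sense in which p must be 'small'.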

Continuous Probability Distributions

A random variable X is continuous in (a, b) if it can take any value in (a, b).

Probability law for X?

• How many values? Uncountably many!

• Can’t assign probabilities to individual values of the variable!

• How to proceed? Use a continuous function f(x) over (a, b) to describe the probability law:

$P(c \le X \le d)$ = area under the function f(x) between x = c and x = d

Continuous Distribution

The function f(x) is called the probability density function (pdf) of the continuous random variable.

Necessary conditions:

1. $f(x) \ge 0$ for all x in (a, b)

2. Total area under f(x) in (a, b) = 1, i.e. the definite integral $\int_a^b f(x)\,dx = 1$

Continuous distribution: probability is the SAME as area under a curve.

Naturally, P(X = any particular value) = 0, while the probability of X taking values in an interval can be positive.

An Example

Consider a random variable X over the interval (1, 10) such that the pdf f(x) is constant over the interval, i.e.,

$f(x) = C$ for $1 \le x \le 10$, and $f(x) = 0$ otherwise.

Since total area = 1, C = 1/(10 − 1) = 1/9. Thus $f(x) = 1/9$ for $1 \le x \le 10$, and $f(x) = 0$ elsewhere.

This is the Rectangular / Uniform Distribution: the density is uniform over the entire range of the variable. This is not true in general for other distributions!
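For the uniform density the "area under the curve" is just a rectangle, so interval probabilities reduce to (length of interval) × (height 1/9). A minimal sketch follows; the function name `uniform_prob` is chosen here for illustration.

```python
def uniform_prob(c: float, d: float, a: float = 1.0, b: float = 10.0) -> float:
    """P(c <= X <= d) for X uniform on (a, b): rectangle area = width * height."""
    lo, hi = max(c, a), min(d, b)  # the density is zero outside (a, b)
    if hi <= lo:
        return 0.0
    height = 1.0 / (b - a)         # the constant C, equal to 1/9 for (1, 10)
    return (hi - lo) * height

print(uniform_prob(2, 5))    # width 3 times height 1/9
print(uniform_prob(1, 10))   # the whole range
print(uniform_prob(4, 4))    # a single point has zero width, hence zero probability
```

The last call illustrates the earlier remark that P(X = any particular value) = 0 for a continuous random variable.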

Some Continuous Distributions

1. Normal Distribution

• Most widely used distribution in statistical literature

• Unimodal, bell-shaped probability curve

• Ranges over the entire real line $(-\infty, \infty)$

• Distribution is characterized by its mean $\mu$ and SD $\sigma$ ($-\infty < \mu < \infty$, $0 < \sigma < \infty$)

• Distribution is perfectly symmetrical about its mean

• Mean = Median = Mode = $\mu$

Normal Distribution Continued

Standard Normal Distribution (Z)

Mean and SD take two standard values: Mean = $\mu = 0$ and SD = $\sigma = 1$.

Result: If X is Normal with mean $\mu$ and SD $\sigma$, then the standardized variable

$Z = (X - \text{Mean}) / \text{SD} = (X - \mu)/\sigma$ is Standard Normal.

A probability table for the Standard Normal Distribution is available, and it can be used to calculate normal probabilities.
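The standardization step can be sketched in Python, with `statistics.NormalDist` from the standard library playing the role of the printed table. The mean 50, SD 10 and the interval (40, 65) are hypothetical numbers chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical normal variable X; these numbers are not from the slides.
mu, sigma = 50.0, 10.0
c, d = 40.0, 65.0

# Standardize the end points: z = (x - mean) / SD.
z_c = (c - mu) / sigma
z_d = (d - mu) / sigma

# Look up the standard normal (mean 0, SD 1), as one would in a Z table.
Z = NormalDist()
p_via_z = Z.cdf(z_d) - Z.cdf(z_c)

# Same probability computed directly on the original scale, for comparison.
X = NormalDist(mu, sigma)
p_direct = X.cdf(d) - X.cdf(c)

print(f"z-values: {z_c:.2f}, {z_d:.2f}")
print(f"P({c} <= X <= {d}) = {p_via_z:.4f} (via Z), {p_direct:.4f} (direct)")
```

The two routes give the same answer, which is the content of the standardization result: one table for Z serves every normal distribution.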

Approximating a discrete probability distribution by Normal Distribution

• Normal Approximation of Binomial

If X is Binomial with parameters (n, p), then the binomial probability $P(a \le X \le b)$, where a and b are integers, can be well approximated by the normal area $P(a - \tfrac{1}{2} \le Y \le b + \tfrac{1}{2})$, where Y follows a normal distribution with mean $np$ and SD $\sqrt{npq}$.

The approximation works well unless the binomial distribution is too skewed (p very close to 0 or 1).
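A minimal sketch of the approximation with the ± ½ continuity correction is given below, compared against the exact binomial sum. The values n = 50, p = 0.4 and the interval 15 to 25 are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def binomial_prob(a: int, b: int, n: int, p: float) -> float:
    """Exact P(a <= X <= b) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(a, b + 1))

def normal_approx(a: int, b: int, n: int, p: float) -> float:
    """Normal area between a - 1/2 and b + 1/2 with mean np and SD sqrt(npq)."""
    y = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return y.cdf(b + 0.5) - y.cdf(a - 0.5)

# Illustrative case; these numbers are not from the slides.
n, p, a, b = 50, 0.4, 15, 25
exact = binomial_prob(a, b, n, p)
approx = normal_approx(a, b, n, p)

print(f"exact binomial  = {exact:.4f}")
print(f"normal approx.  = {approx:.4f}")
```

Re-running with p close to 0 or 1 (say p = 0.02) shows a visibly larger gap, in line with the caveat about skewed binomial distributions.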

2. Exponential Distribution

• Another continuous distribution, which varies over the positive part of the real line $(0, \infty)$

• Not symmetric; in fact the density curve is positively skewed (the longer tail is towards the higher values of the variable)

• Used to model the life of complex electronic equipment

• Has a “loss of memory” property: the remaining life is independent of the current age of the product

• Widely used in Reliability Analysis
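The "loss of memory" property can be checked numerically. The sketch below uses the standard exponential survival function $P(X > t) = e^{-t/\theta}$ with mean life $\theta$; that formula is not stated on the slides, and the mean life of 1000 hours and the ages used are hypothetical.

```python
import math

def survival(t: float, mean_life: float) -> float:
    """P(X > t) for an exponential lifetime with the given mean."""
    return math.exp(-t / mean_life)

# Hypothetical component; these numbers are assumptions for illustration.
mean_life = 1000.0   # hours
age = 400.0          # hours already survived
extra = 250.0        # additional hours of interest

# P(X > age + extra | X > age) = P(X > age + extra) / P(X > age)
conditional = survival(age + extra, mean_life) / survival(age, mean_life)

# Probability that a brand-new component survives 'extra' hours.
fresh = survival(extra, mean_life)

print(f"used component:  {conditional:.6f}")
print(f"new component:   {fresh:.6f}")
```

The two probabilities coincide whatever the current age, which is exactly the statement that future life does not depend on the age already attained.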

Reproductive Property of Distributions

Many distributions retain the same form when two or more independent random variables from the same family are added.

1. Binomial: if $X_1$ is Binomial $(n_1, p)$ and $X_2$ is Binomial $(n_2, p)$, then $X_1 + X_2$ is also Binomial $(n_1 + n_2, p)$.

Note: The result is not true if the success probability p differs between the two variables.

2. Poisson: if $X_1$ is Poisson with mean $m_1$ and $X_2$ is Poisson with mean $m_2$, then $X_1 + X_2$ is Poisson with mean $m_1 + m_2$.

3. Normal: if $X_1$ is Normal $(\mu_1, \sigma_1)$ and $X_2$ is Normal $(\mu_2, \sigma_2)$, then $X_1 + X_2$ is also Normal (Mean = $\mu_1 + \mu_2$, SD = $\sqrt{\sigma_1^2 + \sigma_2^2}$).

Notes: a) For the discrete distributions above, the property does not hold for the difference $X_1 - X_2$.

b) For the normal distribution, the property holds for the difference as well: $X_1 - X_2$ is also normal (Mean = $\mu_1 - \mu_2$, SD = $\sqrt{\sigma_1^2 + \sigma_2^2}$).
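The binomial case, including the caveat about unequal p, can be verified by building the pmf of the sum of two independent variables directly. In the sketch below the parameter values are illustrative assumptions, and the "different p" comparison uses a Binomial with the same mean as the sum, which is one reasonable candidate to test against.

```python
import math

def binomial_pmf_list(n: int, p: float) -> list:
    """pmf of Binomial(n, p) as a list indexed by x = 0, ..., n."""
    return [math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

def pmf_of_sum(f1: list, f2: list) -> list:
    """pmf of X1 + X2 for independent X1, X2 on 0, 1, 2, ... (discrete convolution)."""
    out = [0.0] * (len(f1) + len(f2) - 1)
    for i, a in enumerate(f1):
        for j, b in enumerate(f2):
            out[i + j] += a * b
    return out

def max_gap(f: list, g: list) -> float:
    return max(abs(a - b) for a, b in zip(f, g))

# Same p: the sum should match Binomial(n1 + n2, p). Values are illustrative.
n1, n2, p = 4, 6, 0.3
same_p_gap = max_gap(
    pmf_of_sum(binomial_pmf_list(n1, p), binomial_pmf_list(n2, p)),
    binomial_pmf_list(n1 + n2, p),
)

# Different p: compare with the Binomial(n1 + n2, p*) that has the same mean.
p1, p2 = 0.2, 0.6
p_star = (n1 * p1 + n2 * p2) / (n1 + n2)
diff_p_gap = max_gap(
    pmf_of_sum(binomial_pmf_list(n1, p1), binomial_pmf_list(n2, p2)),
    binomial_pmf_list(n1 + n2, p_star),
)

print(f"same p:      largest pmf difference = {same_p_gap:.2e}")
print(f"different p: largest pmf difference = {diff_p_gap:.2e}")
```

With a common p the difference is at the level of floating-point rounding; with different p it is not, so the sum is no longer of that binomial form.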

Joint Distribution of Two Random Variables

Two random variables X and Y are studied together for examining their possible interdependence

Consider “both discrete” case:

X has k values $x_1, x_2, \ldots, x_k$

Y has l values $y_1, y_2, \ldots, y_l$

Joint probability law: $P(X = x_i, Y = y_j) = P_{ij}$, for $i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, l$.

An Example of a Joint Distribution

X and Y are (random) percentage returns from two stocks on the BSE.

X could take one of the values 5%, 10% or 20%; Y takes one of the values 10% or 20%.

From past data, the joint probabilities are estimated as

P(X=5, Y=10) = 0.10; P(X=5, Y=20) = 0.25;

P(X=10, Y=10) = 0.08; P(X=10, Y=20) = 0.22;

P(X=20, Y=10) = 0.30; P(X=20, Y=20) = 0.05.

Joint Distribution Table

The joint distribution of X and Y can be shown in the following table:

X \ Y               10      20      Row Total (X)
5                   0.10    0.25    0.35
10                  0.08    0.22    0.30
20                  0.30    0.05    0.35
Column Total (Y)    0.48    0.52    1.00

Some Concepts in a Joint Distribution

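a) Marginal Probability Distributions – Mean and SD

These are obtained as the row and column totals of the joint distribution table: the marginal probabilities of X along the rows, $P_{i0} = \sum_j P_{ij}$, and those of Y along the columns, $P_{0j} = \sum_i P_{ij}$.

A marginal distribution is the distribution of one variable alone, when variation in the other variable is ignored.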

Marginal Distributions of X and Y

Of X: P(X=5) = 0.35, P(X=10) = 0.30 and P(X=20) = 0.35.

Mean & SD of X: as usual,

Mean of X = average % return of stock X = $\mu_x = 5(0.35) + 10(0.30) + 20(0.35) = 11.75$

SD of the % return for stock X = SD of X = $\sigma_x = \sqrt{\text{Sum of squares} - (\text{Mean})^2} = \sqrt{\{25(0.35) + 100(0.30) + 400(0.35)\} - (11.75)^2} = \sqrt{40.6875} = 6.38$

Marginal Distribution of Y

P(Y=10) = 0.48; P(Y=20) = 0.52

Mean of Y = average % return of stock Y = $\mu_y = 10(0.48) + 20(0.52) = 15.2$

SD of the % return for stock Y = SD of Y = $\sigma_y = \sqrt{256 - 231.04} = \sqrt{24.96} \approx 5.00$
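The marginal distributions and their means and SDs can be reproduced from the joint probabilities of the stock-return example. This is a sketch only; the dictionary layout and the helper names are choices made here.

```python
import math

# Joint probabilities P(X = x, Y = y) from the stock-return example.
joint = {
    (5, 10): 0.10, (5, 20): 0.25,
    (10, 10): 0.08, (10, 20): 0.22,
    (20, 10): 0.30, (20, 20): 0.05,
}

def marginal(joint: dict, axis: int) -> dict:
    """Row totals (axis=0, for X) or column totals (axis=1, for Y)."""
    out = {}
    for pair, prob in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + prob
    return out

def mean_sd(dist: dict):
    """Mean and SD of a one-variable distribution {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    sum_sq = sum(v * v * p for v, p in dist.items())
    return mean, math.sqrt(sum_sq - mean ** 2)

px, py = marginal(joint, 0), marginal(joint, 1)
mean_x, sd_x = mean_sd(px)
mean_y, sd_y = mean_sd(py)

print(round(mean_x, 2), round(sd_x, 2))  # 11.75 6.38
print(round(mean_y, 2), round(sd_y, 2))  # 15.2 5.0
```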

Is that all?

What about their possible interdependence?

Independence of the Random Variables

Recall: Two events A and B are independent if $P(A \cap B) = P(A)\,P(B)$.

X and Y will be independent random variables if a similar condition holds:

$(X = x_i)$ and $(Y = y_j)$ must be independent events for all choices of $x_i$ and $y_j$, that is,

$P\{(X = x_i) \cap (Y = y_j)\} = P(X = x_i)\,P(Y = y_j)$

Equivalently: every cell probability = Row total × Column total

Independence of two random variables

For this example, P(X=5, Y=10) = 0.10, while P(X=5) = 0.35 and P(Y=10) = 0.48, so that

$P(X=5)\,P(Y=10) = (0.35)(0.48) = 0.168 \ne 0.10$

Hence X and Y are not independent.
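The "every cell = row total × column total" test can be applied to all cells at once. The sketch below uses the joint probabilities of the stock-return example; the helper names and the numerical tolerance are assumptions made here.

```python
# Joint probabilities P(X = x, Y = y) from the stock-return example.
joint = {
    (5, 10): 0.10, (5, 20): 0.25,
    (10, 10): 0.08, (10, 20): 0.22,
    (20, 10): 0.30, (20, 20): 0.05,
}

def marginals(joint: dict):
    """Row totals for X and column totals for Y."""
    px, py = {}, {}
    for (x, y), prob in joint.items():
        px[x] = px.get(x, 0.0) + prob
        py[y] = py.get(y, 0.0) + prob
    return px, py

def is_independent(joint: dict, tol: float = 1e-9) -> bool:
    """True if every cell probability equals row total * column total."""
    px, py = marginals(joint)
    return all(
        abs(joint.get((x, y), 0.0) - px[x] * py[y]) <= tol
        for x in px for y in py
    )

print(is_independent(joint))  # False

# For contrast: a table built as products of the same marginals is independent.
px, py = marginals(joint)
product_table = {(x, y): px[x] * py[y] for x in px for y in py}
print(is_independent(product_table))  # True
```

A single failing cell is enough to rule out independence, which is why the slide needs only the cell (X=5, Y=10).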

How to examine the extent of dependence?

• Correlation Approach: Examine if X and Y are related, either exactly or at least approximately, in a linear form

Correlation Coefficient (Pearson)

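The covariance between X and Y is

$\sigma_{xy} = \sum_i \sum_j x_i y_j P_{ij} - (\text{Mean of } X)(\text{Mean of } Y)$

The SDs of X and Y are, as before,

$\sigma_x = \sqrt{\sum_i x_i^2 P_{i0} - (\text{Mean of } X)^2}$

$\sigma_y = \sqrt{\sum_j y_j^2 P_{0j} - (\text{Mean of } Y)^2}$

Correlation coefficient = $\rho = \sigma_{xy} / (\sigma_x \sigma_y)$

For our example, $\sigma_{xy} = 162 - 11.75 \times 15.2 = -16.6$; $\sigma_x = 6.38$; $\sigma_y = 5.00$, so that $\rho = -0.52$.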
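The covariance and correlation for the stock-return example can be recomputed directly from the joint probabilities. This is a sketch of the calculation on the slide; variable names are chosen here for illustration.

```python
import math

# Joint probabilities P(X = x, Y = y) from the stock-return example.
joint = {
    (5, 10): 0.10, (5, 20): 0.25,
    (10, 10): 0.08, (10, 20): 0.22,
    (20, 10): 0.30, (20, 20): 0.05,
}

# Means and sums of squares taken over the joint table
# (equivalent to using the marginal distributions).
mean_x = sum(x * p for (x, y), p in joint.items())
mean_y = sum(y * p for (x, y), p in joint.items())
sd_x = math.sqrt(sum(x * x * p for (x, y), p in joint.items()) - mean_x ** 2)
sd_y = math.sqrt(sum(y * y * p for (x, y), p in joint.items()) - mean_y ** 2)

# Covariance = Sum(x * y * P) - (Mean of X)(Mean of Y)
sum_xy = sum(x * y * p for (x, y), p in joint.items())
cov_xy = sum_xy - mean_x * mean_y

rho = cov_xy / (sd_x * sd_y)

print(round(sum_xy, 2), round(cov_xy, 2))  # 162.0 -16.6
print(round(rho, 2))                       # -0.52
```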

How does it help?

Result: For any joint distribution, $-1 \le \rho \le 1$.

Interpretation: The sign of $\rho$ tells us how one variable behaves with variation in the other:

• if both move in the same direction (both increase or both decrease), that is a case of positive correlation; $\rho$ will be positive here: $0 < \rho < 1$.

• if they move in opposite directions (one increases as the other decreases, or conversely), that is a case of negative correlation; $\rho$ will be negative here: $-1 < \rho < 0$.

Interpretation of Correlation Coefficient

• Case when $\rho = 0$: Here the two variables are called uncorrelated. This means that there is no linear relationship between the two variables.

• Case when $\rho = 1$: This is the case of perfect positive correlation, in the sense that for all pairs of values of (X, Y), Y = a + bX with certainty, i.e., P(Y = a + bX) = 1, with b > 0.

• Case when $\rho = -1$: This is the case of perfect negative correlation, in the sense that for all pairs of values of (X, Y), Y = a − bX with certainty, i.e., P(Y = a − bX) = 1, with b > 0.

Example Revisited

In our example, $\rho = -0.52$.

So there is a moderate negative correlation between the two variables X and Y.

Since X and Y indicate the % return from the two stocks, this means that when one stock performs well (giving a high return), the other stock tends to under-perform, giving returns below its expected return.

Limitation of $\rho$

$\rho$ examines only the linear relationship between X and Y; if the true relationship is non-linear, or if there is no relationship, $\rho$ fails to capture this.

Thus $\rho = 0$ does not mean that X and Y are independent random variables!
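A small constructed example makes the point concrete. In the sketch below, which is a hypothetical illustration and not from the slides, X takes the values −1, 0, 1 with equal probability and Y = X². Y is completely determined by X, yet the covariance (and hence the correlation) is exactly zero.

```python
from fractions import Fraction

# Hypothetical example: X is -1, 0 or 1 with probability 1/3 each, and Y = X^2.
third = Fraction(1, 3)
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

mean_x = sum(x * p for (x, y), p in joint.items())
mean_y = sum(y * p for (x, y), p in joint.items())

# Covariance = Sum(x * y * P) - (Mean of X)(Mean of Y); zero covariance means rho = 0.
cov_xy = sum(x * y * p for (x, y), p in joint.items()) - mean_x * mean_y

# Independence would need P(X=0, Y=0) = P(X=0) * P(Y=0).
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)
p_both = joint.get((0, 0), Fraction(0))

print(cov_xy)              # 0
print(p_both, p_x0 * p_y0)  # 1/3 1/9
```

The cell probability 1/3 differs from the product 1/9, so X and Y are dependent even though they are uncorrelated: the relationship is quadratic, not linear.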

This is a serious drawback of $\rho$.

How to capture other relationships, if any?

Regression Approach

Regression Equation

The emphasis is on examining how one variable explains the variation in the other variable.

Y = study variable

X = auxiliary variable

The aim is to develop an equation that explains Y when X is known: the Regression Equation of Y on X.

Regression Equation

Different types:

• $Y = \alpha + \beta X$ (linear regression)

• $Y = \alpha + \beta X + \gamma X^2$ (quadratic regression)

• $\log Y = \alpha + \beta X$ (logarithmic regression)

$\alpha$, $\beta$, $\gamma$, etc. are equation parameters, usually unknown

Need to estimate them from data

How to estimate the parameters?

Least Squares Principle

Data: n pairs of values on (X, Y): $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$

Consider linear regression: $Y = \alpha + \beta X$

For $X = X_i$,

• Observed value of Y = $Y_i$, and

• Predicted value of Y = value of Y obtained from the model = $\alpha + \beta X_i$, $i = 1, 2, \ldots, n$.

Least Squares Principle

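The error is $e_i = Y_i - (\alpha + \beta X_i)$, $i = 1, 2, \ldots, n$.

Minimize the sum of squares of the errors, $\sum e_i^2 = \sum \{Y_i - (\alpha + \beta X_i)\}^2$, w.r.t. $\alpha$ and $\beta$.

Equations for solving $\alpha$ and $\beta$:

$\sum Y_i = n\alpha + \beta \sum X_i$

$\sum X_i Y_i = \alpha \sum X_i + \beta \sum X_i^2$

These are two equations in the two unknowns $\alpha$ and $\beta$. Solve for $\alpha$ and $\beta$; call the solutions $\hat{\alpha}$ and $\hat{\beta}$.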
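Solving the two equations by elimination gives closed-form expressions for the estimates. The steps below are standard algebra rather than something shown on the slides, and the bar notation for the sample means is introduced here for convenience.

```latex
\begin{aligned}
&\text{From the first equation:} && \alpha = \frac{\sum Y_i - \beta \sum X_i}{n}. \\[4pt]
&\text{Substituting into the second:} && n \sum X_i Y_i = \sum X_i \sum Y_i - \beta \Big(\sum X_i\Big)^2 + n \beta \sum X_i^2. \\[4pt]
&\text{Hence} && \hat{\beta} = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \big(\sum X_i\big)^2},
\qquad \hat{\alpha} = \bar{Y} - \hat{\beta}\,\bar{X}, \\[4pt]
&\text{where} && \bar{X} = \frac{1}{n} \sum X_i, \qquad \bar{Y} = \frac{1}{n} \sum Y_i .
\end{aligned}
```

The denominator is zero only when all the $X_i$ are equal, in which case no line can be fitted.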

Estimated Regression Equation

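$\hat{Y} = \hat{\alpha} + \hat{\beta} X$

($\hat{\alpha}$ and $\hat{\beta}$ are the estimates of $\alpha$ and $\beta$)

For a given value of X = x*, the predicted value of Y is

$Y^* = \hat{\alpha} + \hat{\beta} x^*$

Regression of X on Y: similar.

The two regression equations are not interchangeable!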

An Example of Fitting a Linear Regression

Data on two variables (X, Y):

X    1.0     1.1     1.3     1.8     2.0     2.4     3.0
Y    10.0    12.3    17.0    30.1    36.2    43.0    55.3

Calculations: n = 7; $\sum X = 12.6$; $\sum Y = 203.9$; $\sum X^2 = 25.9$; $\sum XY = 441.31$

$\hat{\beta} = 23.07$; $\hat{\alpha} = -12.40$

Fitted regression: $\hat{Y} = -12.4 + 23.07 X$

When X = 5, predicted value of Y = $\hat{Y} = 102.95$.
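The fit can be reproduced by solving the two least-squares equations for the data in the table. The sketch below uses the closed-form solution of those equations; the variable and function names are chosen here for illustration.

```python
# Data from the table above.
x = [1.0, 1.1, 1.3, 1.8, 2.0, 2.4, 3.0]
y = [10.0, 12.3, 17.0, 30.1, 36.2, 43.0, 55.3]
n = len(x)

# The sums that appear in the two least-squares equations.
sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Solution of:  sum_y = n*alpha + beta*sum_x  and  sum_xy = alpha*sum_x + beta*sum_xx
beta_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
alpha_hat = (sum_y - beta_hat * sum_x) / n

def predict(x_new: float) -> float:
    """Predicted Y from the fitted line alpha_hat + beta_hat * x."""
    return alpha_hat + beta_hat * x_new

print(round(sum_x, 2), round(sum_y, 2), round(sum_xx, 2), round(sum_xy, 2))
print(round(beta_hat, 2), round(alpha_hat, 2))  # 23.07 -12.4
print(round(predict(5), 2))                     # 102.96
```

With unrounded coefficients the prediction at X = 5 is about 102.96; the value 102.95 quoted above comes from using the rounded slope 23.07. Note also that X = 5 lies outside the observed range of X (1.0 to 3.0), so this prediction is an extrapolation.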

Statistical Model Vs. Mathematical Model

Where is the difference? In our approach:

• Mathematical Model: Deterministic, no concept of an error component

• Statistical Model: Probabilistic, with a provision for allowing random error to operate (to account for uncertainties associated with several market forces acting together), which gives it greater scope for application in real-life situations

How good is a Statistical Model?

• Given data on (X, Y) there may be several competing models (Linear, Quadratic, Logarithmic etc.)

• Which one will give us the best fit?

• Need to examine the significance of any fitted model - how much of the total variation the model can explain
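One common way to quantify "how much of the total variation the model can explain" is the coefficient of determination, R² = 1 − (residual sum of squares) / (total sum of squares). The slides do not name a specific measure, so using R² here is an assumption. The sketch applies it to the linear fit on the seven (X, Y) pairs from the regression example.

```python
# Data from the linear regression example in these slides.
x = [1.0, 1.1, 1.3, 1.8, 2.0, 2.4, 3.0]
y = [10.0, 12.3, 17.0, 30.1, 36.2, 43.0, 55.3]
n = len(x)

# Least-squares estimates for Y = alpha + beta * X.
sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
beta_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
alpha_hat = (sum_y - beta_hat * sum_x) / n

# Total variation of Y about its mean, and the part left unexplained by the line.
y_bar = sum_y / n
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_resid = sum((yi - (alpha_hat + beta_hat * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - ss_resid / ss_total

print(f"total SS = {ss_total:.2f}, residual SS = {ss_resid:.2f}")
print(f"R^2 = {r_squared:.4f}")
```

The same calculation can be repeated for a quadratic or logarithmic fit to compare models, though a formal judgement of significance needs the hypothesis-testing tools referred to next.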

Statistical Inference/Hypothesis Testing

Reference:

Text Book for the Course

• Statistical Methods in Business and Social Sciences: Shenoy, G.V. & Pant, M. (Macmillan India Limited)

Suggested Reading

• Complete Business Statistics: Aczel, A.D. & Sounderpandian, J. – Fifth Edition (Tata McGraw-Hill)
