IIMC Long Duration Executive Education
Executive Programme in Business Management
Statistics for Managerial Decisions
Distributions & Modeling Data
Prof. Saibal Chattopadhyay
IIM Calcutta
A Brief Review
Uncertainty and Randomness: Theory of Probability
• Random Experiments, Events, Sample Space, Mutually Exclusive and Exhaustive Events, Set-theoretic operations with events (Union, Intersection, Difference, Complement), Classical Definition, Total Probability Theorem, Bayes’ Theorem, Independence of two or more events.
• Random Variables & Probability Distributions, Mean and Variance
Decision Making Under Uncertainty: Utility Theory
• Decisions Based on Expected Utility
• Choice of Utility Function U(w): Risk-averse {U(w) = √w}, Risk-seeker {U(w) = w²} and Risk-neutral {U(w) = w}
• Preference Reversal in Decision Making
Some Probability Distributions
Two types of Random Variables:
(A) Discrete Random Variables
• X is discrete if it takes finitely or countably many values (mass-points) x1, x2, …, xn, … with corresponding probabilities p1, p2, …, pn, … .
• The probability law specifying the probabilities for the different values is called the probability mass function (pmf):
f(x) = P(X = x), for x = x1, x2, …, xn, …
Necessary Conditions for a function to be a pmf
• (i) f(x) ≥ 0, and (ii) Σ f(x) = 1, sum taken over all values of x
Example: The random variable X takes 10 values 1, 2, …, 10; the probability for X to take any value is proportional to the square of the value.
Thus f(x) = C·x², where C is the constant of proportionality.
From condition (i), C ≥ 0, and from (ii)
1 = f(1) + f(2) + … + f(10) = C{1² + 2² + … + 10²} = C·(10·11·21/6) = 385·C, so C = 1/385.
Hence f(x) = x²/385, for x = 1, 2, …, 10.
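The normalising constant in this example can be checked numerically. The following is a minimal Python sketch; the variable names are illustrative and do not come from the slides.

```python
from fractions import Fraction

# Values taken by X in the example: 1, 2, ..., 10
values = range(1, 11)

# Condition (ii): C * (1^2 + 2^2 + ... + 10^2) must equal 1
sum_of_squares = sum(x * x for x in values)
C = Fraction(1, sum_of_squares)

# pmf f(x) = C * x^2, kept as exact fractions to avoid rounding
pmf = {x: C * x * x for x in values}
total = sum(pmf.values())

print("sum of squares:", sum_of_squares)
print("C =", C)
print("total probability =", total)
```

Exact fractions are used so that the check of condition (ii) is not blurred by floating-point rounding.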
Probability Distribution of a Random Variable
Table giving the different values of the random variable and the corresponding probabilities:
X = x      x1   x2   …   xn   Total
P(X = x)   p1   p2   …   pn   1
Characteristics of importance: Mean & SD
Mean = μ = Sum(value × probability)
SD = σ = SQRT{Sum1 − (Mean)²}, where Sum1 = Sum(value² × probability)
Use: Help in assessing the shape of the distribution and the coverage probability (Chebyshev’s Inequality)
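As an illustration of these formulas, the sketch below computes the mean and SD for the pmf f(x) = x²/385 from the earlier example, and compares the actual probability within 2 SDs of the mean with the lower bound given by Chebyshev's Inequality. The choice k = 2 and all variable names are assumptions made for illustration.

```python
from math import sqrt

# pmf from the earlier example: f(x) = x^2 / 385 for x = 1, ..., 10
pmf = {x: x * x / 385 for x in range(1, 11)}

# Mean = Sum(value * probability)
mean = sum(x * p for x, p in pmf.items())

# Sum1 = Sum(value^2 * probability); SD = sqrt(Sum1 - Mean^2)
sum1 = sum(x * x * p for x, p in pmf.items())
sd = sqrt(sum1 - mean ** 2)

# Chebyshev: P(|X - mean| < k * SD) >= 1 - 1/k^2 for any distribution
k = 2
coverage = sum(p for x, p in pmf.items() if abs(x - mean) < k * sd)
chebyshev_bound = 1 - 1 / k ** 2

print("mean =", round(mean, 4), "SD =", round(sd, 4))
print("actual coverage =", round(coverage, 4), "Chebyshev bound =", chebyshev_bound)
```

The bound holds for every distribution, so the actual coverage is usually well above it.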
Some Special Discrete Distributions
1. Binomial Distribution
Applicable for the following types of experimentations (called Binomial/ Bernoullian trials):
(a) Only two outcomes, called Success (S) and Failure (F) for each trial;
(b) P(S) = p and P(F) = 1 – P(S) = 1 – p = q, same for all trials;
(c) Trials are probabilistically independent.
When is such a set-up applicable in real life?
Condition (a) generally holds: call any event of interest for the experiment a Success, and its complement a Failure.
Condition (b) holds in most situations, unless the definition of ‘Success’ changes mid-stream or the initial conditions vary from one trial to another.
Condition (c) holds for repeated trials that do not influence one another.
Calculation of probabilities for random events under such a set-up is easy!
Binomial Distribution
Consider n Binomial trials with P(Success) = p, and P(Failure) = q = 1 – p.
Define X = Number of successes in ‘n’ trials
X is a discrete random variable with values 0, 1, …, n.
The probability law (pmf) of X is
f(x) = P(X = x) = nCx · p^x · q^(n−x), for x = 0, 1, …, n
Mean = n·p
SD = √(n·p·q)
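The binomial pmf, mean and SD can be checked with a short Python sketch. The values n = 10 and p = 0.3 are hypothetical, chosen only for illustration, and `binomial_pmf` is an illustrative helper name.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * q^(n-x) for x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3  # hypothetical number of trials and success probability

pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and SD computed directly from the pmf
mean = sum(x * f for x, f in enumerate(pmf))
variance = sum(x * x * f for x, f in enumerate(pmf)) - mean ** 2
sd = sqrt(variance)

print("total probability =", round(sum(pmf), 6))
print("mean =", round(mean, 4), "vs n*p =", n * p)
print("SD =", round(sd, 4), "vs sqrt(n*p*q) =", round(sqrt(n * p * (1 - p)), 4))
```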
2. Poisson Distribution
X is Poisson if the probability law is
f(x) = P(X = x) = e^(−m)·m^x / x!, for x = 0, 1, 2, …
• m = mean = (SD)²
• Distribution is positively skewed (longer tail towards the higher values)
• Used to model count data for rare events
• Approximates the binomial distribution when n (the number of trials) is ‘large’, p (the probability of success in a trial) is ‘small’, but n·p (the average number of successes) is finite, equal to m
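The quality of the Poisson approximation to the binomial can be examined numerically. In the sketch below, n = 1000 and p = 0.002 are hypothetical values picked so that n is large, p is small and m = n·p = 2; the helper names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Binomial probability nCx * p^x * q^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, m):
    """Poisson probability e^(-m) * m^x / x!."""
    return exp(-m) * m ** x / factorial(x)

n, p = 1000, 0.002  # hypothetical: many trials, rare success
m = n * p           # Poisson mean matched to the binomial mean

# Largest gap between the two pmfs over x = 0, 1, ..., 10
max_diff = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, m)) for x in range(11))

for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, m), 5))
print("largest difference:", max_diff)
```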
Continuous Probability Distributions
Random Variable X is continuous in (a, b) if it can take any value in (a, b).
Probability Law for X?
How many values? Uncountable!!
Can’t assign probabilities to individual values of the variable!
How to proceed? Use a continuous function f(x) over (a, b) to describe the probability law:
P(c ≤ X ≤ d) = Area under the function f(x) between x = c and x = d
Continuous Distribution
The function f(x) is called the probability density function (pdf) of the continuous random variable
Necessary conditions:
1. f(x) ≥ 0, for all x in (a, b)
2. Total area under f(x) in (a, b) = 1 (Definite Integral ∫ f(x) dx from a to b = 1)
Continuous Distribution: Probability is the SAME as area under a curve
Naturally, P(X = any particular value) = 0, but P(X taking values in an interval) can be > 0.
An Example
Consider a random variable X over the interval (1, 10) such that the pdf f(x) is a constant over the interval, i.e.,
f(x) = C for 1 ≤ x ≤ 10,
     = 0, otherwise.
Since total area = 1, C = 1/(10 − 1) = 1/9
Thus f(x) = 1/9 for 1 ≤ x ≤ 10, and
     = 0 elsewhere
Rectangular / Uniform Distribution
Density is uniform over the entire range of the variable
Not true in general for any distribution!
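For the uniform example, "probability = area" reduces to the area of a rectangle. The sketch below is illustrative only; the function names and the interval (2, 5) are assumptions, not part of the slides.

```python
def uniform_pdf(x, a=1.0, b=10.0):
    """Constant density 1/(b - a) on [a, b], zero elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_prob(c, d, a=1.0, b=10.0):
    """P(c <= X <= d): width of the overlap with [a, b] times the height 1/(b - a)."""
    lo, hi = max(c, a), min(d, b)
    return max(hi - lo, 0.0) / (b - a)

p_2_to_5 = uniform_prob(2, 5)     # area of a rectangle of width 3 and height 1/9
total_area = uniform_prob(1, 10)  # whole range
p_point = uniform_prob(4, 4)      # a single value has zero width

print(p_2_to_5, total_area, p_point)
```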
Some Continuous Distributions
1. Normal Distribution
• Most widely used distribution in Statistical Literature
• Unimodal, bell-shaped probability curve
• Ranges over the entire real line (−∞, ∞)
• Distribution is characterized by its mean μ and SD σ (−∞ < μ < ∞, 0 < σ < ∞)
• Distribution is perfectly symmetrical about its mean
• Mean = Median = Mode = μ
Normal Distribution Continued
Standard Normal Distribution (Z)
Mean and SD take two standard values: Mean = μ = 0, and SD = σ = 1.
Result: If X is Normal with mean μ and SD σ, then the standardized variable
Z = (X − Mean) / SD = (X − μ) / σ is Standard Normal.
Probability Table for Standard Normal Distribution is available, and this can be used to calculate normal probabilities
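In place of a printed table, the standard normal probabilities can be computed from the error function in Python's standard library. The sketch below standardizes a normal variable and evaluates an interval probability; the mean 100, SD 15 and interval (85, 130) are hypothetical values used only for illustration.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_prob(c, d, mean, sd):
    """P(c <= X <= d) for X normal, obtained by standardizing both end points."""
    z_c = (c - mean) / sd
    z_d = (d - mean) / sd
    return standard_normal_cdf(z_d) - standard_normal_cdf(z_c)

# Hypothetical example: X is normal with mean 100 and SD 15
prob = normal_prob(85, 130, mean=100, sd=15)  # same as P(-1 <= Z <= 2)
print(round(prob, 4))
```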
Approximating a discrete probability distribution by Normal Distribution
• Normal Approximation of Binomial
If X is Binomial with parameters (n, p), then the binomial probability P(a ≤ X ≤ b), where ‘a’ and ‘b’ are integers, can be well approximated by the normal area P(a − ½ ≤ Y ≤ b + ½), where Y follows a normal distribution with mean = np and SD = √(npq).
Approximation works well when n is large, unless the binomial distribution is too skewed (p very close to 0 or 1)
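The effect of the ½ adjustment (the continuity correction) can be seen by comparing the exact binomial probability with the normal area. The values n = 50, p = 0.4, a = 15 and b = 25 below are hypothetical, and the helper name is illustrative.

```python
from math import comb, erf, sqrt

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, p = 50, 0.4   # hypothetical binomial parameters
q = 1 - p
a, b = 15, 25    # integer end points of the event a <= X <= b

# Exact binomial probability P(a <= X <= b)
exact = sum(comb(n, x) * p ** x * q ** (n - x) for x in range(a, b + 1))

# Normal area between a - 1/2 and b + 1/2, with mean np and SD sqrt(npq)
mean, sd = n * p, sqrt(n * p * q)
approx = (standard_normal_cdf((b + 0.5 - mean) / sd)
          - standard_normal_cdf((a - 0.5 - mean) / sd))

print("exact  =", round(exact, 4))
print("approx =", round(approx, 4))
```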
2. Exponential Distribution
• Another continuous distribution, which ranges over the positive part of the real line (0, ∞)
• Not symmetric; in fact the density curve is positively skewed (longer tail is towards the higher values of the variable)
• Used to model the life of complex electronic equipment
• Has a “loss of memory” property: the remaining life is independent of the current age of the product
• Widely used in Reliability Analysis
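The "loss of memory" property can be verified numerically. The sketch assumes the standard exponential survival function P(X > t) = e^(−t/θ), where θ is the mean life; the slides do not state this formula, and the numbers (mean life 1000 hours, age 400 hours, further life 250 hours) are hypothetical.

```python
from math import exp, isclose

def survival(t, mean_life):
    """P(X > t) for an exponential life length with the given mean."""
    return exp(-t / mean_life)

mean_life = 1000.0    # hypothetical mean life in hours
s, t = 400.0, 250.0   # current age s, additional life t

# P(X > s + t | X > s): chance of lasting t more hours given survival to age s
conditional = survival(s + t, mean_life) / survival(s, mean_life)

# P(X > t): chance that a brand-new item lasts t hours
unconditional = survival(t, mean_life)

print(round(conditional, 6), round(unconditional, 6))
print("memoryless:", isclose(conditional, unconditional, rel_tol=1e-9))
```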
Reproductive Property of Distributions
Many distributions retain the same form when two or more independent variables with the same type of distribution are added together
1. Binomial: X1 is Binomial (n1, p) and X2 is Binomial (n2, p) ⇒ X1 + X2 is also Binomial (n1 + n2, p)
Note: Result not true if the success probability p is different
Reproductive Property Continued
2. Poisson: X1 is Poisson with mean m1, X2 is Poisson with mean m2 ⇒ X1 + X2 is Poisson with mean m1 + m2
3. Normal: X1 is Normal (μ1, σ1) and X2 is Normal (μ2, σ2) ⇒ X1 + X2 is also Normal (Mean = μ1 + μ2, SD = √(σ1² + σ2²))
Notes: a) For discrete distributions, the property does not hold for the difference X1 − X2
b) For the normal distribution, the property holds for the difference as well: X1 − X2 is also normal (Mean = μ1 − μ2, SD = √(σ1² + σ2²))
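The binomial case of the reproductive property can be checked exactly: the pmf of the sum of two independent variables is the convolution of their pmfs. The sketch below uses hypothetical values n1 = 3, n2 = 5 and p = 0.3; the function names are illustrative.

```python
from math import comb, isclose

def binomial_pmf_list(n, p):
    """List of P(X = x) for x = 0, ..., n."""
    return [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

def convolve(f, g):
    """pmf of the sum of two independent variables with pmfs f and g on 0, 1, 2, ..."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

n1, n2, p = 3, 5, 0.3  # hypothetical parameters, common success probability p

sum_pmf = convolve(binomial_pmf_list(n1, p), binomial_pmf_list(n2, p))
direct_pmf = binomial_pmf_list(n1 + n2, p)

same_form = all(isclose(u, v, abs_tol=1e-12) for u, v in zip(sum_pmf, direct_pmf))
print("X1 + X2 is Binomial(n1 + n2, p):", same_form)
```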
Joint Distribution of Two Random Variables
Two random variables X and Y are studied together for examining their possible interdependence
Consider “both discrete” case:
X has k values x1, x2, …, xk
Y has l values y1, y2, …, yl
Joint probability law: P(X = xi, Y = yj) = Pij,
i = 1, 2, …, k; j = 1, 2, …, l.
An Example of a Joint Distribution
X and Y are (random) percentage returns from two stocks listed on the BSE;
X could take one of the values 5%, 10% or 20%
Y takes one of the values 10% or 20%.
From past data, the joint probabilities are estimated as
P(X=5, Y=10) = 0.10; P(X=5, Y=20) = 0.25;
P(X=10, Y=10) = 0.08; P(X=10, Y=20) = 0.22;
P(X=20, Y=10) = 0.30; P(X=20, Y=20) = 0.05.
Joint Distribution Table
The joint distribution of X and Y can be shown in the following table:
X \ Y              10     20     Row Total (X)
5                  0.10   0.25   0.35
10                 0.08   0.22   0.30
20                 0.30   0.05   0.35
Column Total (Y)   0.48   0.52   1.00
Some Concepts in a Joint Distribution
a) Marginal Probability Distributions – Mean and SD
These are obtained as the Row and Column totals: of X along the rows (Pio), and of Y along the columns (Poj)
Each is the distribution of only one variable when variation in the other variable is ignored
Marginal Distributions of X and Y
Of X: P(X=5) = 0.35, P(X=10) = 0.30 and P(X=20) = 0.35.
Mean & SD of X: As usual
Mean of X = Average % return of stock X = μx = 5·(0.35) + 10·(0.30) + 20·(0.35) = 11.75
SD of the % return for stock X = SD of X = σx = √{SUM-SQ − (Mean)²} = √[{25·(0.35) + 100·(0.30) + 400·(0.35)} − (11.75)²] = √40.6875 = 6.38
Marginal Distribution of Y
P(Y=10) = 0.48; P(Y=20) = 0.52
Mean of Y = Average % return of stock Y = μy = 10·(0.48) + 20·(0.52) = 15.2,
SD of the % return for stock Y = SD of Y = σy = √(256 − 231.04) = √24.96 = 5.00
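The marginal distributions and their means and SDs can be reproduced from the joint probabilities of the example. This is a minimal sketch; the dictionary layout and helper name are illustrative choices, not part of the slides.

```python
from math import sqrt

# Joint probabilities P(X = x, Y = y) from the two-stock example
joint = {
    (5, 10): 0.10, (5, 20): 0.25,
    (10, 10): 0.08, (10, 20): 0.22,
    (20, 10): 0.30, (20, 20): 0.05,
}

# Marginals: row totals for X, column totals for Y
marginal_x, marginal_y = {}, {}
for (x, y), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0.0) + p
    marginal_y[y] = marginal_y.get(y, 0.0) + p

def mean_and_sd(dist):
    """Mean = Sum(value * prob); SD = sqrt(Sum(value^2 * prob) - Mean^2)."""
    mean = sum(v * p for v, p in dist.items())
    sum_sq = sum(v * v * p for v, p in dist.items())
    return mean, sqrt(sum_sq - mean ** 2)

mean_x, sd_x = mean_and_sd(marginal_x)
mean_y, sd_y = mean_and_sd(marginal_y)

print("X:", marginal_x, round(mean_x, 2), round(sd_x, 2))
print("Y:", marginal_y, round(mean_y, 2), round(sd_y, 2))
```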
Is that all?
What about their possible interdependence?
Independence of the Random Variables
Recall: For two events A and B, they are independent if P(A∩B) = P(A)·P(B)
X and Y will be independent random variables if similar things hold:
(X=xi) and (Y=yj) must be independent events for all choices of xi and yj, that is
P{(X=xi) ∩ (Y=yj)} = P(X=xi)·P(Y=yj)
Every Cell prob. = Row total × Column total
Independence of two random variables
For this example, P(X=5, Y=10) = 0.10 while
P(X=5) = 0.35, P(Y=10) = 0.48 so that
P(X=5)·P(Y=10) = (0.35)·(0.48) = 0.168 ≠ 0.10
⇒ X and Y are not independent
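The cell-by-cell check of independence can be automated. The sketch below applies the rule "cell probability = row total × column total" to every cell of the example table; the tolerance 1e-9 and the variable names are assumptions made for illustration.

```python
# Joint probabilities P(X = x, Y = y) from the two-stock example
joint = {
    (5, 10): 0.10, (5, 20): 0.25,
    (10, 10): 0.08, (10, 20): 0.22,
    (20, 10): 0.30, (20, 20): 0.05,
}

# Row totals (marginal of X) and column totals (marginal of Y)
row_total, col_total = {}, {}
for (x, y), p in joint.items():
    row_total[x] = row_total.get(x, 0.0) + p
    col_total[y] = col_total.get(y, 0.0) + p

# Independence requires every cell to equal row total * column total
independent = all(
    abs(p - row_total[x] * col_total[y]) < 1e-9 for (x, y), p in joint.items()
)

product_5_10 = row_total[5] * col_total[10]
print("P(X=5) * P(Y=10) =", round(product_5_10, 3), "but P(X=5, Y=10) =", joint[(5, 10)])
print("independent:", independent)
```

A single cell that violates the product rule is enough to rule out independence.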
How to examine the extent of dependence?
• Correlation Approach: Examine if X and Y are related, either exactly or at least approximately, in a linear form
Correlation Coefficient (Pearson)
The Covariance between X and Y is
σxy = Sum(xi·yj·Pij) − (Mean of X)(Mean of Y)
The SDs of X and Y are, as before,
σx = SQRT{Sum(xi²·Pio) − (Mean of X)²}
σy = SQRT{Sum(yj²·Poj) − (Mean of Y)²}
Correlation Coefficient = ρ = σxy / (σx·σy)
For our example, σxy = 162 − 11.75 × 15.2 = −16.6; σx = 6.38; σy = 5.00, so that ρ = −0.52
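The covariance and correlation coefficient for the two-stock example can be recomputed directly from the joint probabilities. This is a minimal sketch with illustrative variable names.

```python
from math import sqrt

# Joint probabilities P(X = x, Y = y) from the two-stock example
joint = {
    (5, 10): 0.10, (5, 20): 0.25,
    (10, 10): 0.08, (10, 20): 0.22,
    (20, 10): 0.30, (20, 20): 0.05,
}

mean_x = sum(x * p for (x, y), p in joint.items())
mean_y = sum(y * p for (x, y), p in joint.items())

# Sum(xi * yj * Pij), then the covariance
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov_xy = e_xy - mean_x * mean_y

# SDs of X and Y from the sums of squares
sd_x = sqrt(sum(x * x * p for (x, y), p in joint.items()) - mean_x ** 2)
sd_y = sqrt(sum(y * y * p for (x, y), p in joint.items()) - mean_y ** 2)

rho = cov_xy / (sd_x * sd_y)
print("Sum(x*y*P) =", round(e_xy, 2))
print("covariance =", round(cov_xy, 2))
print("correlation =", round(rho, 2))
```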
How does it help?
Result: For any joint distribution, −1 ≤ ρ ≤ 1.
Interpretation: The sign of ρ tells us how one variable behaves with variation in the other:
• if both move in the same direction (both increase or both decrease), that is a case of positive correlation; ρ will be positive here; 0 < ρ < 1.
• if they move in opposite directions (one increases as the other decreases or conversely), that is a case of negative correlation; ρ will be negative here; −1 < ρ < 0.
Interpretation of Correlation Coefficient
• Case when ρ = 0: Here the two variables are called uncorrelated. This means that there is no linear relationship between the two variables.
• Case when ρ = 1: This is the case of perfect positive correlation in the sense that for all pairs of values of (X, Y), Y = a + bX with certainty, i.e., P(Y = a + bX) = 1, with b > 0.
• Case when ρ = −1: This is the case of perfect negative correlation in the sense that for all pairs of values of (X, Y), Y = a − bX with certainty, i.e., P(Y = a − bX) = 1, with b > 0.
Example Revisited
In our example, ρ = −0.52.
So there is a moderately strong negative correlation between the two variables X and Y.
Since X and Y indicate the % return from the two stocks, this means that if one stock performs well (giving high return), the other stock is likely to under-perform, giving returns below its expected return.
Limitation of ρ
Examines only a Linear Relationship between X and Y; if the true relationship is non-linear, ρ may fail to capture it, and can look the same as when there is no relationship at all
Thus ρ = 0 does not mean that X and Y are independent random variables!!
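A small constructed example makes this limitation concrete. In the sketch below, X takes the values −1, 0 and 1 with equal probability and Y = X², so Y is completely determined by X, yet the covariance (and hence the correlation) is zero. The example is hypothetical and not taken from the slides.

```python
from fractions import Fraction

third = Fraction(1, 3)

# Hypothetical joint distribution: X is -1, 0 or 1 with equal chance, and Y = X^2
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

mean_x = sum(x * p for (x, y), p in joint.items())
mean_y = sum(y * p for (x, y), p in joint.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov_xy = e_xy - mean_x * mean_y   # zero covariance means zero correlation

# Independence check on one cell: P(X=0, Y=0) versus P(X=0) * P(Y=0)
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)
p_joint_00 = joint[(0, 0)]

print("covariance =", cov_xy)
print("P(X=0, Y=0) =", p_joint_00, "but P(X=0) * P(Y=0) =", p_x0 * p_y0)
```

The relationship here is exact but quadratic, which is why a measure of linear association misses it.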
A serious drawback of ρ
How to capture other relationships, if any?
Regression Approach
Regression Equation
Emphasis is to examine how one variable explains the variation in the other variable
Y = Study variable
X = auxiliary variable
To develop an equation that explains Y when X is known
Regression Equation of Y on X
Regression Equation
Different types:
• Y = α + βX (linear regression)
• Y = α + βX + γX² (quadratic regression)
• log Y = α + βX (logarithmic regression)
α, β, γ etc. are equation parameters, usually unknown
Need to estimate them from data
How to estimate the parameters?
Least Squares Principle
Data: n pairs of values on (X, Y): (x1,y1), (x2, y2), … , (xn, yn)
Consider Linear Regression: Y = α + βX
For X = Xi,
• Observed value of Y = Yi, and
• Predicted value of Y = Value of Y obtained from the model = α + βXi, i = 1, 2, …, n.
Least Squares Principle
Error is ei = Yi − (α + βXi), i = 1, 2, …, n
Minimize the sum of squares of the errors: Σ ei² = Σ (Yi − (α + βXi))² w.r.t. α and β
Equations for solving α and β:
Σ Yi = nα + β Σ Xi
Σ XiYi = α Σ Xi + β Σ Xi²
Two equations in two unknowns α and β
Solve for α and β, say, α^ and β^
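Solving the two equations by elimination gives closed-form expressions for the estimates. The worked step below is the standard solution, written in LaTeX; the bar notation for the sample means of X and Y is introduced here and does not appear on the slides. It assumes the Xi are not all equal, so that the denominator is non-zero.

```latex
\begin{aligned}
\sum Y_i &= n\alpha + \beta \sum X_i, \\
\sum X_i Y_i &= \alpha \sum X_i + \beta \sum X_i^2 \\[4pt]
\Rightarrow\quad
\hat\beta &= \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2},
\qquad
\hat\alpha = \bar{Y} - \hat\beta\,\bar{X},
\qquad \bar{X} = \frac{1}{n}\sum X_i,\ \ \bar{Y} = \frac{1}{n}\sum Y_i .
\end{aligned}
```

The expression for the intercept is the first equation divided by n, which also shows that the fitted line passes through the point of means.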
Estimated Regression Equation
Y^ = α^ + β^ X
(α^ and β^ are the estimates of α and β)
For a given value of X = x*, the predicted value of Y is
Y* = α^ + β^ x*
Regression of X on Y: obtained similarly, with the roles of X and Y exchanged
The two regression equations are not interchangeable!
An Example of Fitting a Linear Regression
Data on two variables (X, Y):
X   1.0    1.1    1.3    1.8    2.0    2.4    3.0
Y   10.0   12.3   17.0   30.1   36.2   43.0   55.3
Calculations: n = 7; ΣX = 12.6; ΣY = 203.9; ΣX² = 25.9; ΣXY = 441.31
β^ = 23.07; α^ = −12.40
Fitted Regression: Y^ = −12.4 + 23.07 X
When X = 5, predicted value of Y = Y^ = 102.95.
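The sums, the estimates and the prediction for this data set can be reproduced with a few lines of Python. This is a minimal sketch using the closed-form least squares solution of the two equations; the variable and function names are illustrative.

```python
# Data from the example
x = [1.0, 1.1, 1.3, 1.8, 2.0, 2.4, 3.0]
y = [10.0, 12.3, 17.0, 30.1, 36.2, 43.0, 55.3]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Solution of the two least squares equations for slope and intercept
beta_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
alpha_hat = (sum_y - beta_hat * sum_x) / n

def predict(x_star, alpha=alpha_hat, beta=beta_hat):
    """Predicted Y at X = x_star from the fitted line."""
    return alpha + beta * x_star

pred_full = predict(5.0)  # using unrounded estimates
pred_rounded = predict(5.0, round(alpha_hat, 2), round(beta_hat, 2))  # as on the slide

print("sums:", n, round(sum_x, 2), round(sum_y, 2), round(sum_xx, 2), round(sum_xy, 2))
print("slope =", round(beta_hat, 2), "intercept =", round(alpha_hat, 2))
print("prediction at X = 5:", round(pred_rounded, 2), "(rounded estimates),",
      round(pred_full, 2), "(unrounded estimates)")
```

The slide's value of 102.95 comes from the rounded estimates. Note also that X = 5 lies outside the observed range of X (1.0 to 3.0), so this prediction is an extrapolation.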
Statistical Model Vs. Mathematical Model
Where is the difference?
In our approach:
• Mathematical Model: Deterministic, no concept of an error component
• Statistical Model: Probabilistic, with a provision for allowing random error to operate (to account for uncertainties associated with several market forces acting together) – a greater scope for application in real-life situations
How good is a Statistical Model?
• Given data on (X, Y) there may be several competing models (Linear, Quadratic, Logarithmic etc.)
• Which one will give us the best fit?
• Need to examine the significance of any fitted model - how much of the total variation the model can explain
Statistical Inference/Hypothesis Testing
Reference:
Text Book for the Course
• Statistical Methods in Business and Social Sciences: Shenoy, G.V. & Pant, M. (Macmillan India Limited)
Suggested Reading
• Complete Business Statistics: Aczel, A.D. & Sounderpandian, J. – Fifth Edition (Tata McGraw-Hill)