swati-sehgal
7/31/2019 Regression & Correlation Class 2011-2013 L17
Regression Models
Introduction
Regression analysis is a very valuable tool for a manager
Regression can be used to
Understand the relationship between variables
Predict the value of one variable based on another variable
Simple linear regression models have only two variables
Multiple regression models have more variables
Coefficient of Correlation
Measures the relative strength of the linear relationship between two variables
1. Price and quantity demanded (Simple Regression)
2. Height and weight (Simple Regression)
3. Advertisement expenditure and sales (Simple Regression)
4. Family income and expenditure on luxury items (Simple Regression)
5. Sales revenue of the product is influenced by the advertisement expenditure, quality of the product, and price (Multiple Regression)
6. Employer-employee relationship in any organization may be examined with reference to training and development facilities, medical, housing, salary structure, etc. (Multiple Regression)
Introduction
The variable to be predicted is called the dependent variable
Sometimes called the response variable
The value of this variable depends on the value of the independent variable
Sometimes called the explanatory or predictor variable
[Diagram: Dependent variable = Independent variable + Independent variable]
Scatter Diagram
Graphing is a helpful way to investigate the relationship between variables
A scatter diagram or scatter plot is often used
The independent variable is normally plotted on the X axis
The dependent variable is normally plotted on the Y axis
Triple A Construction
Triple A Construction renovates old homes
They have found that the dollar volume of renovation work is dependent on the area payroll
TRIPLE A'S SALES (Rs100,000s)    LOCAL PAYROLL (Rs100,000,000s)
6                                3
8                                4
9                                6
5                                4
4.5                              2
9.5                              5
Triple A Construction
Fig. 1: Scatter diagram of sales (Rs100,000s, Y axis) against payroll (Rs100 million, X axis)
Simple Linear Regression
Regression models are used to test if there is a relationship between variables
There is some random error that cannot be predicted
$Y = \beta_0 + \beta_1 X + \varepsilon$
where
$Y$ = dependent variable (response)
$X$ = independent variable (predictor or explanatory)
$\beta_0$ = constant (value of $Y$ when $X = 0$)
$\beta_1$ = slope of the regression line
$\varepsilon$ = random error
Simple Linear Regression
True values for the slope and constant are not known, so they are estimated using sample data
$\hat{Y} = b_0 + b_1 X$
where
$\hat{Y}$ = predicted value of the dependent variable (response)
$X$ = independent variable (predictor or explanatory)
$b_0$ = constant (value of $\hat{Y}$ when $X = 0$)
$b_1$ = slope of the regression line
Triple A Construction
Triple A Construction is trying to predict sales based on area payroll
Y = Sales
X = Area payroll
The line chosen in Figure 1 is the one that minimizes the sum of the squared errors
Error = (Actual value) - (Predicted value)
$e = Y - \hat{Y}$
Triple A Construction
For the simple linear regression model $\hat{Y} = b_0 + b_1 X$, the values of the constant and slope can be calculated using the formulae below
Step 1. $\sum Y = n b_0 + b_1 \sum X$
Step 2. $\sum XY = b_0 \sum X + b_1 \sum X^2$
where each sum runs over the $n$ observations
Triple A Construction
Regression calculations (Step 3)
Y      X     XY      X^2
6      3     18      9
8      4     32      16
9      6     54      36
5      4     20      16
4.5    2     9       4
9.5    5     47.5    25
$\sum Y = 42$, $\bar{Y} = 42/6 = 7$
$\sum X = 24$, $\bar{X} = 24/6 = 4$
$\sum XY = 180.5$
$\sum X^2 = 106$
Slope, constant and equation
Put the values from Step 3 into the two equations of Step 1 and Step 2
Results are: $b_0 = 2$; $b_1 = 1.25$
Therefore $\hat{Y} = 2 + 1.25X$
Sales = 2 + 1.25(Payroll)
If the payroll next year is Rs600 million
$\hat{Y} = 2 + 1.25(6) = 9.5$, or Rs950,000
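The two equations of Step 1 and Step 2 can be solved directly once the sums from Step 3 are known. The following is a minimal Python sketch using the Triple A Construction data; the variable names (`sales`, `payroll`, `b0`, `b1`) are illustrative choices and do not come from the slides.

```python
# Triple A Construction data (sales in Rs100,000s, payroll in Rs100 millions)
sales = [6, 8, 9, 5, 4.5, 9.5]      # Y
payroll = [3, 4, 6, 4, 2, 5]        # X

n = len(sales)
sum_y = sum(sales)
sum_x = sum(payroll)
sum_xy = sum(x * y for x, y in zip(payroll, sales))
sum_x2 = sum(x * x for x in payroll)

# Solve the two equations for the slope and the constant:
#   sum_y  = n * b0     + b1 * sum_x
#   sum_xy = b0 * sum_x + b1 * sum_x2
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = (sum_y - b1 * sum_x) / n

# A payroll of Rs600 million corresponds to X = 6
predicted = b0 + b1 * 6
print(b0, b1, predicted)  # 2.0 1.25 9.5
```

The closed-form expressions for `b1` and `b0` are simply the result of eliminating `b0` between the two equations, so they give the same answer as solving the pair by hand.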
Measuring the Fit of the Regression Model
Regression models can be developed for any variables X and Y
How do we know the model is actually helpful in predicting Y based on X?
We could just take the average error, but the positive and negative errors would cancel each other out
Three measures of variability are
SST: Total variability about the mean
SSE: Variability about the regression line
SSR: Total variability that is explained by the model
Measuring the Fit of the Regression Model
Sum of the squares total
$SST = \sum (Y - \bar{Y})^2$
Sum of the squared error
$SSE = \sum e^2 = \sum (Y - \hat{Y})^2$
Sum of squares due to regression
$SSR = \sum (\hat{Y} - \bar{Y})^2$
An important relationship
$SST = SSR + SSE$
Measuring the Fit of the Regression Model
Y      X    $(Y - \bar{Y})^2$       $\hat{Y}$             $(Y - \hat{Y})^2$    $(\hat{Y} - \bar{Y})^2$
6      3    (6 - 7)^2 = 1           2 + 1.25(3) = 5.75    0.0625               1.563
8      4    (8 - 7)^2 = 1           2 + 1.25(4) = 7.00    1                    0
9      6    (9 - 7)^2 = 4           2 + 1.25(6) = 9.50    0.25                 6.25
5      4    (5 - 7)^2 = 4           2 + 1.25(4) = 7.00    4                    0
4.5    2    (4.5 - 7)^2 = 6.25      2 + 1.25(2) = 4.50    0                    6.25
9.5    5    (9.5 - 7)^2 = 6.25      2 + 1.25(5) = 8.25    1.5625               1.563
            Sum = 22.5                                    Sum = 6.875          Sum = 15.625
$\bar{Y} = 7$    SST = 22.5    SSE = 6.875    SSR = 15.625
Measuring the Fit of the Regression Model
Sum of the squares total
$SST = \sum (Y - \bar{Y})^2$
Sum of the squared error
$SSE = \sum e^2 = \sum (Y - \hat{Y})^2$
Sum of squares due to regression
$SSR = \sum (\hat{Y} - \bar{Y})^2$
An important relationship
$SST = SSR + SSE$
For Triple A Construction
SST = 22.5
SSE = 6.875
SSR = 15.625
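The three sums of squares can be recomputed from the data and the fitted line. This is a small Python sketch, assuming the constant 2 and slope 1.25 found earlier; the variable names are illustrative only.

```python
# Triple A Construction data and the fitted line Y-hat = 2 + 1.25 X
sales = [6, 8, 9, 5, 4.5, 9.5]      # Y
payroll = [3, 4, 6, 4, 2, 5]        # X
b0, b1 = 2.0, 1.25

mean_y = sum(sales) / len(sales)
fitted = [b0 + b1 * x for x in payroll]

sst = sum((y - mean_y) ** 2 for y in sales)              # total variability about the mean
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))   # variability about the regression line
ssr = sum((f - mean_y) ** 2 for f in fitted)             # variability explained by the model

print(sst, sse, ssr)  # 22.5 6.875 15.625
```

The printed values also confirm the relationship SST = SSR + SSE for this data set.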
Measuring the Fit of the Regression Model
[Figure: scatter diagram of sales (Rs100,000s) against payroll (Rs100 million) with the regression line $\hat{Y} = 2 + 1.25X$, marking the deviations $Y - \bar{Y}$, $Y - \hat{Y}$ and $\hat{Y} - \bar{Y}$]
Coefficient of Determination
The proportion of the variability in Y explained by the regression equation is called the coefficient of determination
The coefficient of determination is $r^2$
$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
For Triple A Construction
$r^2 = \frac{15.625}{22.5} = 0.6944$
About 69% of the variability in Y is explained by the equation based on payroll (X)
$r^2$ ranges from 0 to 1
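Both forms of the coefficient of determination can be checked numerically. A short Python sketch, taking the Triple A Construction sums of squares as given:

```python
# Sums of squares for Triple A Construction, as reported in the slides
sst, sse, ssr = 22.5, 6.875, 15.625

r2_from_ssr = ssr / sst        # explained share of the total variability
r2_from_sse = 1 - sse / sst    # one minus the unexplained share

print(round(r2_from_ssr, 4), round(r2_from_sse, 4))  # 0.6944 0.6944
```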
Correlation Coefficient
The correlation coefficient is an expression of the strength of the linear relationship
It will always be between +1 and -1
The correlation coefficient is $r$
$r = \pm\sqrt{r^2}$
For Triple A Construction
$r = \sqrt{0.6944} = 0.8333$
$r$ is negative if the slope is negative and positive if the slope is positive
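As a cross-check, the correlation coefficient can be computed two ways: from the square root of the coefficient of determination with the sign of the slope, and from the standard computational formula for the correlation of X and Y. The second formula is not shown on the slides and is included here only as a sketch; the variable names are illustrative.

```python
import math

sales = [6, 8, 9, 5, 4.5, 9.5]      # Y
payroll = [3, 4, 6, 4, 2, 5]        # X
n = len(sales)

sum_x, sum_y = sum(payroll), sum(sales)
sum_xy = sum(x * y for x, y in zip(payroll, sales))
sum_x2 = sum(x * x for x in payroll)
sum_y2 = sum(y * y for y in sales)

# Direct computational formula for the correlation coefficient
r_direct = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Square root of r^2 = SSR / SST, carrying the sign of the slope b1
b1 = 1.25
r_from_r2 = math.copysign(math.sqrt(15.625 / 22.5), b1)

print(round(r_direct, 4), round(r_from_r2, 4))  # 0.8333 0.8333
```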
Correlation Coefficient
[Figure: four scatter diagrams of Y against X]
(a) Perfect positive correlation: r = +1
(b) Positive correlation: 0 < r < 1
(c) No correlation: r = 0
(d) Perfect negative correlation: r = -1
Using Excel for Regression
The correlation coefficient is called Multiple R in Excel
Multiple Regression Analysis
Multiple regression models are extensions to the simple linear model and allow the creation of models with several independent variables
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$
where
$Y$ = dependent variable (response variable)
$X_i$ = ith independent variable (predictor or explanatory variable)
$\beta_0$ = constant (value of $Y$ when all $X_i = 0$)
$\beta_i$ = coefficient of the ith independent variable
$k$ = number of independent variables
$\varepsilon$ = random error
Multiple Regression Analysis
To estimate these values, a sample is taken and the following equation developed
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$
where
$\hat{Y}$ = predicted value of $Y$
$b_0$ = sample constant (and is an estimate of $\beta_0$)
$b_i$ = sample coefficient of the ith variable (and is an estimate of $\beta_i$)
Jenny Wilson Realty
Jenny Wilson wants to develop a model to determine the suggested listing price for houses based on the size and age of the house
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$, with $k = 2$ here
where
$\hat{Y}$ = predicted value of dependent variable (selling price)
$b_0$ = constant (Y intercept)
$X_1$ and $X_2$ = value of the two independent variables (square footage and age) respectively
$b_1$ and $b_2$ = slopes for $X_1$ and $X_2$ respectively
She selects a sample of houses that have sold recently and records the data shown in the following table
Jenny Wilson Realty
SELLING PRICE (Rs)    SQUARE FOOTAGE    AGE OF HOUSE    CONDITION
95,000                1,926             30              Good
119,000               2,069             40              Excellent
124,800               1,720             30              Excellent
135,000               1,396             15              Good
142,000               1,706             32              Mint
145,000               1,847             38              Mint
159,000               1,950             27              Mint
165,000               2,323             30              Excellent
182,000               2,285             26              Mint
183,000               3,752             35              Good
200,000               2,300             18              Good
211,000               2,525             17              Good
215,000               3,800             40              Excellent
219,000               1,740             12              Mint
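The slides fit this model in Excel. As an alternative illustration, the sketch below estimates the constant and the two slopes in plain Python by building the least-squares normal equations and solving them with Gauss-Jordan elimination. The function name `fit_multiple_regression` and the list `houses` are hypothetical, and the coefficients it produces depend on the table exactly as transcribed here, so they need not match a rounded equation quoted elsewhere to the last digit.

```python
def fit_multiple_regression(rows):
    """Least-squares fit. rows: (y, x1, x2, ...). Returns [b0, b1, ..., bk]."""
    X = [[1.0] + [float(v) for v in r[1:]] for r in rows]   # leading 1 for the constant
    y = [float(r[0]) for r in rows]
    n, p = len(X), len(X[0])
    # Augmented matrix of the normal equations (X'X) b = X'y
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]


# (selling price in Rs, square footage, age of house) from the table above
houses = [
    (95000, 1926, 30), (119000, 2069, 40), (124800, 1720, 30),
    (135000, 1396, 15), (142000, 1706, 32), (145000, 1847, 38),
    (159000, 1950, 27), (165000, 2323, 30), (182000, 2285, 26),
    (183000, 3752, 35), (200000, 2300, 18), (211000, 2525, 17),
    (215000, 3800, 40), (219000, 1740, 12),
]

b0, b1, b2 = fit_multiple_regression(houses)
print(round(b0), round(b1, 2), round(b2, 2))
```

Solving the normal equations directly is adequate for a small example like this one; for larger or nearly collinear data sets a dedicated least-squares routine is usually the safer choice.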
Press OK
One dependent variable: selling price. Two independent variables: square footage and age.
So the Input X Range will contain both data sets (square footage and age together, i.e. B2:C15).
The remaining process is the same.
Constant, i.e. b0
Coefficient of the first independent variable, i.e. square footage
Coefficient of the second independent variable, i.e. age
Jenny Wilson Realty
$\hat{Y} = 146631 + 44X_1 - 2899X_2$
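Once the coefficients are known, a suggested price follows by substitution. The sketch below evaluates the equation with constant 146631, square-footage coefficient 44 and age coefficient 2899, assuming the age term enters with a negative sign (older houses list lower). The function name `suggested_price` and the example house are hypothetical.

```python
def suggested_price(square_feet, age_years):
    """Evaluate Y-hat = 146631 + 44 * X1 - 2899 * X2 (age term assumed negative)."""
    return 146631 + 44 * square_feet - 2899 * age_years


# Hypothetical house: 1,900 square feet, 10 years old
print(suggested_price(1900, 10))  # 201241
```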
Assumptions of the Regression Model
If we make certain assumptions about the errors in a regression model, we can perform statistical tests to determine if the model is useful
1. Errors are independent
2. Errors are normally distributed
3. Errors have a mean of zero
4. Errors have a constant variance
A plot of the residuals (errors) will often highlight any glaring violations of the assumptions
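Before plotting, the residuals themselves have to be computed. A minimal Python sketch for the Triple A Construction line, assuming the constant 2 and slope 1.25 from the earlier slides; it lists the residuals in order of X so that any pattern (for example, a spread that grows with X) is easier to spot.

```python
sales = [6, 8, 9, 5, 4.5, 9.5]      # Y
payroll = [3, 4, 6, 4, 2, 5]        # X
b0, b1 = 2.0, 1.25

# Residual = actual value minus predicted value
residuals = [y - (b0 + b1 * x) for x, y in zip(payroll, sales)]
mean_residual = sum(residuals) / len(residuals)

print(residuals)       # [0.25, 1.0, -0.5, -2.0, 0.0, 1.25]
print(mean_residual)   # 0.0

# Crude text listing in place of a plot, ordered by payroll
for x, e in sorted(zip(payroll, residuals)):
    print(f"X = {x}: residual = {e:+.2f}")
```

With only six observations this is no substitute for a proper residual plot, but it shows that the residuals average to zero, as expected for a least-squares line with a constant term.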
Example
Bus and subway ridership in Delhi during the winter months is believed to be heavily tied to the number of tourists visiting the city. During the past 12 years, the following data have been obtained:
Year                         1    2    3    4    5    6    7    8    9    10   11   12
No. of Tourists (10,000s)    7    2    6    4    14   15   16   12   14   20   15   7
Ridership (1000s)            15   10   13   15   25   27   24   20   27   44   34   17
Develop a regression model.
What is the expected ridership if 10 thousand tourists visit the city?
If there are no tourists at all, explain the predicted ridership.
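One way to start on this example is to apply the same slope and constant calculation used for Triple A Construction. The sketch below does that in Python; the names `tourists`, `ridership` and `predict` are illustrative, and both variables are kept in the units of the table, so a tourist count must be converted into those units before calling `predict`.

```python
tourists = [7, 2, 6, 4, 14, 15, 16, 12, 14, 20, 15, 7]          # X, in the table's units
ridership = [15, 10, 13, 15, 25, 27, 24, 20, 27, 44, 34, 17]    # Y, in the table's units

n = len(tourists)
sum_x, sum_y = sum(tourists), sum(ridership)
sum_xy = sum(x * y for x, y in zip(tourists, ridership))
sum_x2 = sum(x * x for x in tourists)

# Slope and constant from the two least-squares equations
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = (sum_y - b1 * sum_x) / n


def predict(x):
    """Predicted ridership (table units) for x tourists (table units)."""
    return b0 + b1 * x


print(round(b0, 3), round(b1, 3))  # 5.06 1.593
```

The constant `b0` is the value returned by `predict(0)`, which is the quantity the last question asks to be interpreted.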