malcolm-stephens
Simple Linear Regression: Ŷ = b0 + b1 X1
You want to examine the linear dependency of the annual sales of produce stores on their size in square footage. Sample data for seven stores were obtained. Find the equation of the straight line that fits the data best.
Store   Square Feet   Annual Sales ($1000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Which is the dependent Y variable?
A. The Store Number
B. The Square Footage of the Store
C. The Annual Sales of the Store
Which is the independent X variable?
A. The Store Number
B. The Square Footage of the Store
C. The Annual Sales of the Store
Scatter Diagram: Example
[Figure: scatter plot of Annual Sales ($000) against Square Feet for the seven stores]
Excel Output
Equation for the Sample Regression Line: Example
Ŷi = b0 + b1 Xi = 1636.415 + 1.487 Xi
From Excel Printout:
               Coefficients
Intercept      1636.414726
X Variable 1   1.486633657
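As a cross-check on the Excel printout, the slope and intercept can be recomputed directly from the seven data points. The sketch below uses plain Python and the textbook least-squares formulas b1 = Sxy / Sxx and b0 = Ȳ - b1 X̄. The variable names are illustrative and are not part of the original workbook.

```python
# Store data from the example: square feet (X) and annual sales in $1000s (Y).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
mean_x = sum(square_feet) / n
mean_y = sum(annual_sales) / n

# Sum of cross-deviations (Sxy) and of squared X deviations (Sxx) about the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(square_feet, annual_sales))
s_xx = sum((x - mean_x) ** 2 for x in square_feet)

b1 = s_xy / s_xx            # slope
b0 = mean_y - b1 * mean_x   # intercept

print(f"b0 = {b0:.6f}, b1 = {b1:.9f}")
```

The printed values should agree with the Excel coefficients up to rounding.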
Graph of the Sample Regression Line: Example
[Figure: scatter plot of Annual Sales ($000) against Square Feet with the fitted line Ŷi = 1636.415 + 1.487 Xi]
Interpretation of Results:
The model estimates that each additional square foot of store size increases expected annual sales by 1.487 thousand dollars, that is, by $1,487.
Ŷi = 1636.415 + 1.487 Xi
If a new 2,000 square foot produce store is built, the model predicts that the expected annual sales at the store will be:
1636.415 + 1.487 × 2000 ≈ 4,610 (in $1,000s), or about $4.61 million
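The same prediction can be written as a small function. This is a minimal sketch that plugs the rounded coefficients from the sample regression line into the equation. The function name `predict_sales` is an illustrative choice, not something from the original slides.

```python
# Rounded coefficients of the sample regression line (sales are in $1000s).
B0 = 1636.415  # intercept
B1 = 1.487     # slope: change in sales ($1000s) per extra square foot


def predict_sales(square_feet: float) -> float:
    """Return predicted annual sales in $1000s for a store of the given size."""
    return B0 + B1 * square_feet


predicted = predict_sales(2000)
print(f"Predicted annual sales: {predicted:,.0f} thousand dollars")
```

A size of 2,000 square feet lies inside the range of the sample (1,292 to 5,555 square feet), so the model is not being asked to extrapolate here.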
The Coefficient of Determination
• r² = 94.2%
• Measures the proportion of variation in Y (e.g. Annual Sales) that is explained by the linear regression model with the independent variable X (e.g. square feet)
• Describes the explanatory power of the simple linear regression model; it does not imply that X causes the changes in Y.
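The 94.2% figure can be recomputed from the store data. The sketch below assumes the usual simple-regression identity r² = Sxy² / (Sxx · Syy), the squared correlation between X and Y. The variable names are illustrative.

```python
# Store data from the example: square feet (X) and annual sales in $1000s (Y).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
mean_x = sum(square_feet) / n
mean_y = sum(annual_sales) / n

# Sums of squares and cross-products about the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(square_feet, annual_sales))
s_xx = sum((x - mean_x) ** 2 for x in square_feet)
s_yy = sum((y - mean_y) ** 2 for y in annual_sales)

# Share of the variation in Y explained by the fitted line.
r_squared = s_xy ** 2 / (s_xx * s_yy)

print(f"r^2 = {r_squared:.3f}")
```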
Significant Coefficients
• If a linear relationship between the dependent and independent variables does not exist, the true value of the slope should be 0. To test whether this is true, look at the 95% confidence interval for the independent variable’s coefficient. If the interval does not contain 0, the coefficient is statistically significant at the 5% level.
               Coefficients   Standard Error   t Stat     P-value    Lower 95%    Upper 95%
Intercept      1636.414726    451.4953308      3.624433   0.015149   475.80903    2797.02042
X Variable 1   1.486633657    0.164999212      9.009944   0.000281   1.06248968   1.91077763
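The t Stat and the 95% interval in the table follow from the coefficient and its standard error. The sketch below reproduces them for the slope. It assumes a two-sided t critical value with n - 2 = 5 degrees of freedom, hard-coded from a standard t table rather than computed, and the variable names are illustrative.

```python
# Values from the Excel coefficient table for "X Variable 1" (the slope).
coef = 1.486633657
std_err = 0.164999212

# Assumption: two-sided 95% critical value of the t distribution with
# 7 - 2 = 5 degrees of freedom, taken from a standard t table.
T_CRIT_DF5 = 2.570582

t_stat = coef / std_err                 # test statistic for H0: slope = 0
lower = coef - T_CRIT_DF5 * std_err     # lower 95% bound
upper = coef + T_CRIT_DF5 * std_err     # upper 95% bound

# The slope is significant at the 5% level when the interval excludes 0.
slope_is_significant = not (lower <= 0.0 <= upper)

print(f"t = {t_stat:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
print("Significant at 5%:", slope_is_significant)
```

Because the whole interval lies above 0, the data support a positive linear relationship between square footage and annual sales.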
Multiple Linear Regression Model with n independent variables
• Equation: Ŷi = b0 + b1 X1 + b2 X2 + … + bn Xn
• Adjusted R-square:
– Describes the explanatory power of the multiple regression, after compensating for sample size and the number of independent variables in the model
Restaurant Sales Exercise (ChiliDogRegress.xlsx)
• The ChiliDog Hut fast food chain wants to identify a good location for a new restaurant
• They have identified three possible independent variables that could have a relationship with the annual sales (in $1,000s) of a restaurant
– # of other fast food stores in 1 mile radius
– # of schools and businesses in 1 mile radius
– $ spent on advertising per year
• Help ChiliDog identify the regression model that forecasts annual sales the best
Simple Linear Regression on # of Other Restaurants in 1 mile radius
Regression Statistics
Multiple R          0.8356977
R Square            0.6983906
Adjusted R Square   0.6606894
Standard Error      24.352403
Observations        10

                                    Coefficients   Standard Error   t Stat        P-value       Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept                           38.664063      22.5752821       1.712672393   0.125128067   -13.3946    90.7        -13.3946      90.72276
# of competitors in 1 mile radius   8.4570313      1.9649261        4.303994564   0.002601674   3.925904    13          3.925904      12.98816
What Conclusions Can You Make From the Simple Linear Regression?
A. More competing restaurants in the 1 mile radius hurt ChiliDog Hut’s sales
B. The # of competing restaurants in the 1 mile radius has little impact on ChiliDog Hut’s sales
C. There is a positive correlation between the # of competing restaurants in the 1 mile radius and ChiliDog Hut’s sales
D. Increasing the # of competing restaurants in the 1 mile radius will increase ChiliDog Hut’s sales
Multiple Regression on $ Spent on Advertising & # of Other Restaurants in 1 mile radius
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.8926645
R Square            0.7968499
Adjusted R Square   0.738807
Standard Error      21.366032
Observations        10
ANOVA
             df   SS           MS            F             Significance F
Regression   2    12534.4487   6267.224338   13.72863895   0.003779
Residual     7    3195.55132   456.5073321
Total        9    15730
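The entries of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, and R Square is the regression share of the total sum of squares. The sketch below checks those relationships using the SS and df values from the table. The variable names are illustrative.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression = 12534.4487
ss_residual = 3195.55132
df_regression = 2   # number of independent variables
df_residual = 7     # observations - independent variables - 1 = 10 - 2 - 1

ss_total = ss_regression + ss_residual

# Mean squares: sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F statistic and R Square implied by the table.
f_stat = ms_regression / ms_residual
r_squared = ss_regression / ss_total

print(f"MS regression = {ms_regression:.4f}, MS residual = {ms_residual:.4f}")
print(f"F = {f_stat:.4f}, R Square = {r_squared:.7f}")
```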
                                         Coefficients   Standard Error   t Stat        P-value       Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept                                78.879957      29.4792254       2.675781191   0.031732659   9.172666    149         9.172666      148.5872
$ spent on advertising (in $1,000s)      -5.7649964     3.12989773       -1.84191208   0.108041529   -13.166     1.64        -13.166       1.636036
# of competitors in 1 mile radius        8.2173432      1.72886864       4.753017667   0.002076223   4.129218    12.3        4.129218      12.30547
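To show how the fitted multiple regression equation is used, the sketch below plugs the three coefficients from the table into Ŷ = b0 + b1 X1 + b2 X2. The candidate location (3 thousand dollars of advertising, 5 competitors) is a hypothetical input chosen only to illustrate the arithmetic; it does not come from the ChiliDog data. The function name is likewise illustrative.

```python
# Coefficients from the multiple regression output (sales in $1,000s).
INTERCEPT = 78.879957
B_ADVERTISING = -5.7649964   # per $1,000 spent on advertising
B_COMPETITORS = 8.2173432    # per competitor within a 1 mile radius


def predict_sales(advertising_k: float, competitors: float) -> float:
    """Return predicted annual sales in $1,000s from the fitted equation."""
    return INTERCEPT + B_ADVERTISING * advertising_k + B_COMPETITORS * competitors


# Hypothetical location, for illustration only.
predicted = predict_sales(advertising_k=3, competitors=5)
print(f"Predicted annual sales: {predicted:.1f} thousand dollars")
```

Note that the 95% interval for the advertising coefficient (-13.166 to 1.636036) contains 0, so that term is not significant at the 5% level and the prediction should be read with that in mind.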
Coefficient of Determination: Adjusted R-square
• Proportion of variation in Y around its mean that is accounted for by the regression model
– Adj. R² <= 1; it usually falls between 0 and 1 but can be slightly negative when the model explains very little
• Describes the explanatory power of the multiple linear regression model, after compensating for sample size and the number of independent variables.
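The adjustment can be made concrete with the standard formula Adj. R² = 1 - (1 - R²)(n - 1) / (n - k - 1), where n is the number of observations and k the number of independent variables. The sketch below applies it to the two ChiliDog models, using the R Square and Observations values from their outputs. The function name is illustrative.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjust R Square for sample size and number of independent variables."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# R Square and Observations taken from the two regression outputs.
simple_adj = adjusted_r_squared(0.6983906, n_obs=10, n_predictors=1)
multiple_adj = adjusted_r_squared(0.7968499, n_obs=10, n_predictors=2)

print(f"Competitors only:          {simple_adj:.7f}")
print(f"Advertising + competitors: {multiple_adj:.7f}")
```

Comparing the adjusted values rather than raw R Square keeps a model from looking better simply because it has more independent variables.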