Upload
lecea
View
44
Download
1
Embed Size (px)
DESCRIPTION
Business Forecasting. Chapter 7 Forecasting with Simple Regression. Chapter Topics. Types of Regression Models Determining the Simple Linear Regression Equation Measures of Variation Assumptions of Regression and Correlation Residual Analysis Measuring Autocorrelation - PowerPoint PPT Presentation
Citation preview
Business Forecasting Chapter 7Forecasting with Simple Regression
Chapter TopicsTypes of Regression ModelsDetermining the Simple Linear Regression EquationMeasures of VariationAssumptions of Regression and CorrelationResidual Analysis Measuring AutocorrelationInferences about the Slope
Chapter TopicsCorrelationMeasuring the Strength of the AssociationEstimation of Mean Values and Prediction of Individual ValuesPitfalls in Regression and Ethical Issues(continued)
Purpose of Regression AnalysisRegression Analysis is Used Primarily to Model Causality and Provide PredictionPredict the values of a dependent (response) variable based on values of at least one independent (explanatory) variable.Explain the effect of the independent variables on the dependent variable.
Types of Regression ModelsPositive Linear RelationshipNegative Linear RelationshipRelationship NOT LinearNo Relationship
Sample regression line provides an estimate of the population regression line as well as a predicted value of Y.Linear Regression EquationSample Y InterceptSample Slope CoefficientResidual Simple Regression Equation (Fitted Regression Line, Predicted Value)
Simple Linear Regression: ExampleYou wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage. Sample data for seven stores were obtained. Find the equation of the straight line that best fits the data.Annual Store Square Sales Feet($1000) 1 1,726 3,681 2 1,542 3,395 3 2,816 6,653 4 5,555 9,543 5 1,292 3,318 6 2,208 5,563 7 1,313 3,760
Scatter Diagram: ExampleExcel Output
Chart6
3681
3395
6653
9543
3318
5563
3760
Annual Sales ($000)
Square Feet
Annual Sales
Sheet1
StoreSquare FeetAnnual Sales ($000)
11,7263,681
21,5423395
32,8166653
45,5559543
51,2923318
62,2085563
71,3133760
Sheet1
Annual Sales ($000)
Square Feet
Annual Sales
Sheet2
Sheet3
Simple Linear Regression Equation: ExampleFrom Excel Printout:
Sheet4
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478616374239.918957231
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.80903014342797.0204220085475.80903014342797.0204220085
Square Footage1.48663365650.16499921239.00994395940.00028120091.06248967861.91077763451.06248967861.9107776345
RESIDUAL OUTPUTPROBABILITY OUTPUT
ObservationPredicted YResidualsPercentileY
14202.3444172716-521.34441727167.14285714293318
23928.8038244674-533.803824467421.42857142863395
35822.775102905830.22489709535.71428571433681
49894.6646881801-351.6646881801503760
53557.1454103313-239.145410331364.28571428575563
64918.901839726644.09816027478.57142857146653
73588.3647171187171.635282881392.85714285719543
Sheet4
1726
1542
2816
5555
1292
2208
1313
X Variable 1
Residuals
X Variable 1 Residual Plot
Sheet1
Sample Percentile
Y
Normal Probability Plot
Sheet2
Sq. Ft spaceSales
11,7263,681
21,5423,395
32,8166,653
45,5559,543
51,2923,318
62,2085,563
71,3133,760
Sheet3
Graph of the Simple Linear Regression Equation: ExampleYi = 1636.415 +1.487Xi
Chart2
3681
3395
6653
9543
3318
5563
3760
Annual Sales ($000)
Square Feet
Annual Sales ($)
Sheet5
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
Sheet1
StoreSquare FeetAnnual Sales ($000)
117263681
215423395
328166653
455559543
512923318
622085563
713133760
Sheet1
Annual Sales ($000)
Square Feet
Annual Sales ($000)
Sheet2
Sheet3
Interpretation of Results: ExampleThe slope of 1.487 means that, for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.The equation estimates that, for each increase of 1 square foot in the size of the store, the expected annual sales are predicted to increase by $1,487.
Simple Linear Regression in ExcelIn Excel, use | Regression EXCEL Spreadsheet of Regression Sales on Square Footage
Scatter
3681
3395
6653
9543
3318
5563
3760
Sales
X
Y
Scatter Diagram
DataCopy
FootageSales(X-XBar)^2
17263681389732.653061224
15423395653325.795918367
28166653216889.795918367
5555954310270193.6530612
129233181119968.65306122
2208556320245.2244897959
131337601075961.65306122
SLR
Regression Analysis
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536
Footage1.48663365650.16499921239.00994395940.00028120091.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted SalesResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
SLR
1726
1542
2816
5555
1292
2208
1313
Footage
Residuals
Footage Residual Plot
Estimate
Confidence Interval Estimate
X Value2000
Confidence Level95%
Sample Size7
Degrees of Freedom5
t Value2.5705776352
Sample Mean2350.2857142857
Sum of Squared Difference13746317.43
Standard Error of the Estimate611.7515173314
h Statistic0.1517831758
Average Predicted Y (YHat)4609.6820391647
For Average Predicted Y (YHat)
Interval Half Width612.6572787957
Confidence Interval Lower Limit3997.024760369
Confidence Interval Upper Limit5222.3393179604
For Individual Response Y
Interval Half Width1687.6840468462
Prediction Interval Lower Limit2921.9979923185
Prediction Interval Upper Limit6297.3660860109
This sheet is generated by:(1)| Regression | Simple Linear Regression (2) checking the "ANOVA and Coefficients Table", and the "Residual Plot" boxes
&A
Page &P
This sheet is generated by:(1) PHStat | Regression | Simple Linear Regression (2) checking the "Confidence and Prediction Intervals for X=" box
data
StoreFootageSales
11,7263,681
21,5423,395
32,8166,653
45,5559,543
51,2923,318
62,2085,563
71,3133,760
Measures of Variation: The Sum of SquaresSST = SSR + SSETotal Sample Variability=Explained Variability+Unexplained Variability
The ANOVA Table in Excel
ANOVAdfSSMSFSignificance FRegressionkSSRMSR=SSR/kMSR/MSEP-value of the F TestResidualsnk1SSEMSE=SSE/(nk1)Totaln1SST
Measures of VariationThe Sum of Squares: ExampleExcel Output for Produce StoresSSRSSERegression (explained) dfDegrees of freedomError (residual) dfTotal dfSST
Sheet5
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130,380,456.1230,380,456.1281.1790901520.0002812009
Residual51,871,199.59374,239.92
Total632,251,655.71
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
Sheet1
StoreSquare FeetAnnual Sales ($000)
117263681
215423395
328166653
455559543
512923318
622085563
713133760
Sheet1
Annual Sales ($000)
Square Feet
Annual Sales ($000)
Sheet2
Sheet3
The Coefficient of Determination
Measures the proportion of variation in Y that is explained by the independent variable X in the regression model.
Measures of Variation: Produce Store ExampleExcel Output for Produce Stores r2 = 0.94 94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage.Syx
Sheet5
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
Sheet1
StoreSquare FeetAnnual Sales ($000)
117263681
215423395
328166653
455559543
512923318
622085563
713133760
Sheet1
Annual Sales ($000)
Square Feet
Annual Sales ($000)
Sheet2
Sheet3
Linear Regression AssumptionsNormalityY values are normally distributed for each X.Probability distribution of error is normal.2.Homoscedasticity (Constant Variance)3.Independence of Errors
Residual AnalysisPurposesExamine linearity Evaluate violations of assumptionsGraphical Analysis of ResidualsPlot residuals vs. X and time
Residual Analysis for LinearityNot LinearLinearX e eXYXYX
Residual Analysis for Homoscedasticity HeteroscedasticityHomoscedasticitySRXSRXYXXY
Residual Analysis: Excel Output for Produce Stores ExampleExcel Output
Chart3
-521.3444172716
-533.8038244674
830.224897095
-351.6646881801
-239.1454103313
644.098160274
171.6352828813
Square Feet
Residual Plot
Sheet5
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
Sheet6
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResidualsStandard Residuals
14202.3444172716-521.3444172716-0.9335558294
23928.8038244674-533.8038244674-0.9558665166
35822.775102905830.2248970951.486658851
49894.6646881801-351.6646881801-0.6297154218
53557.1454103313-239.1454103313-0.4282305219
64918.901839726644.0981602741.1533672795
73588.3647171187171.63528288130.3073421591
Sheet6
1726
1542
2816
5555
1292
2208
1313
Square Feet
Residuals
Residual Plot
Sheet1
36811726
33951542
66532816
95435555
33181292
55632208
37601313
Y
Predicted Y
X Variable 1
Y
X Variable 1 Line Fit Plot
Sheet2
StoreSquare FeetAnnual Sales ($000)
117263681
215423395
328166653
455559543
512923318
622085563
713133760
Sheet2
Annual Sales ($000)
Square Feet
Annual Sales ($000)
Sheet3
Sheet5
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResiduals
14,202.34442-$521.34442
23,928.80382-$533.80382
35,822.77510$830.22490
49,894.66469-$351.66469
53,557.14541-$239.14541
64,918.90184$644.09816
73,588.36472$171.63528
Sheet1
StoreSquare FeetAnnual Sales ($000)
117263681
215423395
328166653
455559543
512923318
622085563
713133760
Sheet1
Annual Sales ($000)
Square Feet
Annual Sales ($000)
Sheet2
Sheet3
Residual Analysis for IndependenceThe DurbinWatson StatisticUsed when data are collected over time to detect autocorrelation (residuals in one time period are related to residuals in another period).Measures the violation of the independence assumptionShould be close to 2. If not, examine the model for autocorrelation.
DurbinWatson Statistic in MinitabMinitab | Regression | Simple Linear Regression
Check the box for DurbinWatson Statistic
Obtaining the Critical Values of DurbinWatson StatisticFinding the Critical Values of DurbinWatson Statistic
k=1
k=2
n
dL
dU
dL
dU
15
1.08
1.36
0.95
1.54
16
1.10
1.37
0.98
1.54
_1013327115.unknown
_1044645607.unknown
_1013326860.unknown
Accept H0 (no autocorrelation)Using the DurbinWatson Statistic : No autocorrelation (error terms are independent). : There is autocorrelation (error terms are not independent).042dL4-dLdU4-dUReject H0 (positive autocorrelation)InconclusiveReject H0 (negative autocorrelation)
Residual Analysis for IndependenceNot IndependentIndependenteeTimeTimeResidual is plotted against time to detect any autocorrelationNo Particular PatternCyclical PatternGraphical Approach
Inference about the Slope: t Testt Test for a Population SlopeIs there a linear dependency of Y on X ?Null and Alternative HypothesesH0: = 0(No Linear Dependency)H1: 0(Linear Dependency)Test Statistic
df = n - 2
Example: Produce StoreData for Seven Stores:Estimated Regression Equation:The slope of this model is 1.487. Does Square Footage Affect Annual Sales?Annual Store Square Sales Feet($000) 1 1,726 3,681 2 1,542 3,395 3 2,816 6,653 4 5,555 9,543 5 1,292 3,318 6 2,208 5,563 7 1,313 3,760
Inferences about the Slope: t Test ExampleH0: = 0H1: 0 0.05df 7 - 2 = 5Critical Value(s):Test Statistic: Decision:
Conclusion:
There is evidence that square footage affects annual sales.t02.57062.57060.025RejectReject0.025From Excel Printout Reject H0
SLR
Regression Analysis
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%
Intercept1636.4147451.49533.62440.01515475.81092619832797.0185259536
Footage1.48660.16509.00990.000281.06249037151.9107769416
Sheet1
FootageAnnual Sales
11,7263,681
21,5423,395
32,8166,653
45,5559,543
51,2923,318
62,2085,563
71,3133,760
Sheet2
Sheet3
Inferences about the Slope: F TestF Test for a Population SlopeIs there a linear dependency of Y on X ?Null and Alternative HypothesesH0: = 0(No Linear Dependency)H1: 0(Linear Dependency)Test Statistic
Numerator df=1, denominator df=n-2
Relationship between a t Test and an F TestNull and Alternative HypothesesH0: = 0(No Linear Dependency)H1: 0(Linear Dependency)
Inferences about the Slope: F Test ExampleTest Statistic: Decision:Conclusion:
H0: = 0H1: 0 0.05numerator df = 1denominator df 7 2 = 5
There is evidence that square footage affects annual sales.From Excel Printout Reject H006.61Reject = 0.05
Scatter
3681
3395
6653
9543
3318
5563
3760
Sales
X
Y
Scatter Diagram
DataCopy
FootageSales(X-XBar)^2
17263681389732.653061224
15423395653325.795918367
28166653216889.795918367
5555954310270193.6530612
129233181119968.65306122
2208556320245.2244897959
131337601075961.65306122
SLR
Regression Analysis
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130,380,456.1230,380,456.1281.1790901520.000281
Residual51,871,199.59374,239.92
Total632,251,655.71
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536
Footage1.48663365650.16499921239.00994395940.00028120091.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted SalesResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
SLR
1726
1542
2816
5555
1292
2208
1313
Footage
Residuals
Footage Residual Plot
Estimate
Confidence Interval Estimate
X Value2000
Confidence Level95%
Sample Size7
Degrees of Freedom5
t Value2.5705818347
Sample Mean2350.2857142857
Sum of Squared Difference13746317.43
Standard Error of the Estimate611.7515173314
h Statistic0.1517831758
Average Predicted Y (YHat)4609.6820391647
For Average Predicted Y (YHat)
Interval Half Width612.6582796815
Confidence Interval Lower Limit3997.0237594833
Confidence Interval Upper Limit5222.3403188462
For Individual Response Y
Interval Half Width1687.6868039814
Prediction Interval Lower Limit2921.9952351833
Prediction Interval Upper Limit6297.3688431462
&A
Page &P
data
StoreFootageSales
11,7263,681
21,5423,395
32,8166,653
45,5559,543
51,2923,318
62,2085,563
71,3133,760
Purpose of Correlation AnalysisPopulation Correlation Coefficient (rho) is used to measure the strength between the variables.Sample Correlation Coefficient r is an estimate of and is used to measure the strength of the linear relationship in the sample observations.(continued)
r = 0.5r = 1Sample of Observations from Various r ValuesYXYXYXYXYXr = 1r = 0.5r = 0
Features of r and rUnit FreeRange between 1 and 1The Closer to 1, the Stronger the Negative Linear Relationship.The Closer to 1, the Stronger the Positive Linear Relationship.The Closer to 0, the Weaker the Linear Relationship.
Pitfalls of Regression AnalysisLacking an Awareness of the Assumptions Underlining Least-squares Regression.Not Knowing How to Evaluate the Assumptions.Not Knowing What the Alternatives to Least-squares Regression are if a Particular Assumption is Violated.Using a Regression Model Without Knowledge of the Subject Matter.
Strategy for Avoiding the Pitfalls of RegressionStart with a scatter plot of X on Y to observe possible relationship.Perform residual analysis to check the assumptions.Use a normal probability plot of the residuals to uncover possible non-normality.
Strategy for Avoiding the Pitfalls of RegressionIf there is violation of any assumption, use alternative methods (e.g., least absolute deviation regression or least median of squares regression) to least-squares regression or alternative least-squares models (e.g., curvilinear or multiple regression).If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals.(continued)
Chapter SummaryIntroduced Types of Regression Models.Discussed Determining the Simple Linear Regression Equation.Described Measures of Variation.Addressed Assumptions of Regression and Correlation.Discussed Residual Analysis.Addressed Measuring Autocorrelation.
Chapter SummaryDescribed Inference about the Slope.Discussed CorrelationMeasuring the Strength of the Association.Discussed Pitfalls in Regression and Ethical Issues.
(continued)