© Copyright 2001, Alan Marshall 1
Regression Analysis
Time Series Analysis
Regression Analysis
A statistical technique for determining the best-fit line through a set of data points
Error
No line can hit all, or even most, of the points. The amount by which we miss is called the ERROR.
Error does not mean mistake! It simply means the inevitable “missing” that will happen when we generalize, or try to describe things with models.
When we looked at the mean and variance, we called the errors deviations.
What Regression Does
Regression finds the line that minimizes the amount of error, or deviation from the line
The mean is the statistic that has the minimum total of squared deviations
Likewise, the regression line is the unique line that minimizes the total of the squared errors.
The statistical term is “Sum of Squared Errors”, or SSE.
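The idea of minimizing the SSE can be sketched in a few lines of Python. This is only an illustration: the five data points are made up (they are not from the course data), and the closed-form slope and intercept formulas are the standard ones for a single explanatory variable.

```python
# Hypothetical toy data (not from the slides): five (x, y) points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def sse(intercept, slope):
    """Sum of squared errors of the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form least-squares estimates for one explanatory variable.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

best = sse(intercept, slope)

# Nudging the fitted line in any direction should increase the SSE.
nudged = [
    sse(intercept + d_int, slope + d_slope)
    for d_int in (-0.1, 0.1)
    for d_slope in (-0.05, 0.05)
]

print(f"slope={slope:.3f} intercept={intercept:.3f} SSE={best:.4f}")
print("every nudged line has a larger SSE:", all(s > best for s in nudged))
```

Because the SSE is a bowl-shaped function of the intercept and slope, any line other than the fitted one gives a larger total, which is what "unique line" means above.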
Example
Suppose we are examining the sale prices of compact cars sold by rental agencies and that we have the following summary statistics:
Summary Statistics
Price
Mean                 5411.41
Median               5362
Mode                 5286
Standard Deviation   254.9488004
Range                1124
Minimum              4787
Maximum              5911
Sum                  541141
Count                100
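Our best estimate of the average price would be $5,411
An approximate 95% interval for the price of an individual car would be $5,411 ± (2)(255) or $5,411 ± (510) or $4,901 to $5,921 (a confidence interval for the mean itself would be narrower, because it uses the standard error, 255/√100 ≈ 25.5, in place of the standard deviation)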
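The interval on this slide is the mean plus or minus two standard deviations. A short Python sketch reproduces that arithmetic from the summary statistics; the standard error line is an addition for comparison and does not appear on the slide.

```python
import math

# Values taken from the Summary Statistics slide.
mean_price = 5411.41
std_dev = 254.9488004
count = 100

# Rule of thumb used on the slide: mean plus or minus 2 standard deviations.
low = mean_price - 2 * std_dev
high = mean_price + 2 * std_dev

# For comparison (not on the slide): the standard error of the mean, which
# measures how precisely the average price itself is estimated.
std_error = std_dev / math.sqrt(count)

print(f"mean +/- 2 standard deviations: {low:.2f} to {high:.2f}")
print(f"standard error of the mean: {std_error:.2f}")
```

The endpoints differ from the slide's figures by less than a dollar because the slide rounds the mean and standard deviation before doing the arithmetic.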
Something Missing?
Clearly, looking at this data in such a simplistic way ignores a key factor: the mileage on the vehicle
Price vs. Mileage
[Figure: scatter plot of Price (vertical axis, 0 to 7000) against Odometer Reading (horizontal axis, 0 to 60000)]
Importance of the Factor
After looking at the scatter graph, you would be inclined to revise your estimate depending on the mileage:
  25,000 km: about $5,700 - $5,900
  45,000 km: about $5,100 - $5,300
Similar to getting new test information in decision theory.
Switch to Excel
File CarPrice.xls
Tab Odometer
The Regression Tool
Tools > Data Analysis
Choose “Regression” from the dialogue box menu.
More Than You Need
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.806307604
R Square             0.650131952
Adjusted R Square    0.64656187
Standard Error       151.5687515
Observations         100

ANOVA
             df   SS            MS            F             Significance F
Regression   1    4183527.721   4183527.721   182.1056015   4.44346E-24
Residual     98   2251362.469   22973.08642
Total        99   6434890.19

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
Intercept   6533.383035    84.51232199      77.30686935    1.22253E-89   6365.671086    6701.094984    6365.671086    6701.094984
Odometer    -0.031157739   0.002308896      -13.49465085   4.44346E-24   -0.035739667   -0.026575811   -0.035739667   -0.026575811
Ignore
The ANOVA table
The Upper 95% and Lower 95% columns
Stripped Down Output
Regression Statistics
Multiple R           0.806307604
R Square             0.650131952
Adjusted R Square    0.64656187
Standard Error       151.5687515
Observations         100

            Coefficients   Standard Error   t Stat         P-value
Intercept   6533.383035    84.51232199      77.30686935    1.22253E-89
Odometer    -0.031157739   0.002308896      -13.49465085   4.44346E-24
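Several numbers in this output are simple functions of the others, which a short Python sketch can confirm. The values are copied from the output above; the variable names are only illustrative, and the formulas are the standard textbook definitions rather than anything specific to Excel.

```python
# Values copied from the regression output above.
multiple_r = 0.806307604
r_square = 0.650131952
n_obs = 100
n_predictors = 1  # Odometer is the only explanatory variable

coef = {"Intercept": 6533.383035, "Odometer": -0.031157739}
std_err = {"Intercept": 84.51232199, "Odometer": 0.002308896}

# t Stat = coefficient divided by its standard error.
t_stat = {name: coef[name] / std_err[name] for name in coef}

# R Square is the square of Multiple R.
r_square_check = multiple_r ** 2

# Adjusted R Square penalizes R Square for the number of explanatory variables.
adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)

for name, t in t_stat.items():
    print(f"{name}: t = {t:.3f}")
print(f"R Square from Multiple R: {r_square_check:.6f}")
print(f"Adjusted R Square: {adj_r_square:.6f}")
```

The recomputed values agree with the printed t Stat, R Square and Adjusted R Square up to the rounding of the displayed inputs.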
Interpretation
Our estimated relationship is Price = $6,533 - 0.031(km)
Every 1000 km reduces the price by an average of $31
What does the $6,533 mean? Careful! It is the predicted price at 0 km, which is outside the data range!
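As a quick check, the estimated relationship can be turned into a small Python function. The coefficients are the unrounded ones from the regression output; the function name is just for illustration.

```python
INTERCEPT = 6533.383035  # from the regression output
SLOPE = -0.031157739     # dollars per km, from the regression output

def predicted_price(km):
    """Predicted sale price for a car with the given odometer reading."""
    return INTERCEPT + SLOPE * km

for km in (25_000, 45_000):
    print(f"{km} km: ${predicted_price(km):,.0f}")
# prints:
# 25000 km: $5,754
# 45000 km: $5,131
```

Both predictions fall inside the ranges read off the scatter graph earlier ($5,700 - $5,900 at 25,000 km and $5,100 - $5,300 at 45,000 km).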
Quality
The model makes sense: Price is lowered as mileage increases, and by a plausible amount.
The slope: 13.5 standard errors from 0! A value this far from zero occurs randomly, or by chance, with a probability that has 23 zeros after the decimal point!
The R-squared: 0.65, so 65% of the variation in price is explained by mileage.
Multiple Regression
Using More than One Explanatory Variable
Using Excel
No significant changes
To Watch For
Variables significantly related to each other
  Correlation function (Tools > Data Analysis)
  Look for values above 0.5 or below -0.5
Nonsensical results
  Wrong signs
Weak variables
  Magnitude of the t-ratio less than 2
  p-value greater than 0.05
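The first check, looking for explanatory variables that are strongly correlated with each other, might be sketched in Python as follows. The three variables and their values are hypothetical (made up for illustration, not taken from the course examples); only the 0.5 threshold comes from the slide.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical explanatory variables for a house-price model (made-up values).
predictors = {
    "square_feet": [1200, 1500, 1700, 2100, 2400, 2800],
    "bedrooms": [2, 3, 3, 4, 4, 5],
    "age_years": [30, 5, 42, 18, 25, 11],
}

# Flag every pair whose correlation is above 0.5 or below -0.5.
flagged = []
for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
    r = pearson(a, b)
    if abs(r) > 0.5:
        flagged.append((name_a, name_b, round(r, 2)))

print(flagged)
```

With these made-up numbers only the square_feet and bedrooms pair is flagged, which is the kind of overlap the slide warns about: the two variables carry much of the same information.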
Dummy Variables
Qualitative variables that allow the relationship to shift if a certain factor is present.
Illustrated in the two upcoming examples
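A minimal Python sketch of the mechanics, ahead of those examples. Everything here is hypothetical: the garage factor, the coefficients and the function name are invented purely to show how a 0/1 variable shifts the fitted line.

```python
# Hypothetical observations: whether each house has a garage (a qualitative factor).
has_garage = ["yes", "no", "yes", "no"]

# A dummy variable codes the factor as 1 when present and 0 when absent.
garage_dummy = [1 if answer == "yes" else 0 for answer in has_garage]

# Hypothetical fitted coefficients, chosen only to show the mechanics.
INTERCEPT = 50_000.0
PER_SQ_FT = 100.0
GARAGE_SHIFT = 15_000.0  # coefficient on the dummy variable

def predicted_price(square_feet, dummy):
    # With dummy = 1 the whole line shifts up by GARAGE_SHIFT;
    # the slope on square_feet is unchanged.
    return INTERCEPT + PER_SQ_FT * square_feet + GARAGE_SHIFT * dummy

print(garage_dummy)
print(predicted_price(1500, 1) - predicted_price(1500, 0))
```

The difference between the two predictions equals the dummy's coefficient at every size of house, which is the "shift" described above.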
Examples
House Prices
Theme Park Attendance
Time Series Analysis
Various techniques that allow us to:
  Understand the variation in a time series
  Understand the seasonalities and cycles in a time series
  Use this understanding to make predictions
Two Techniques
Deseasonalizing based on a moving average
Using dummy variables to isolate the seasonal effects
Moving Average
Calculate a moving average
Calculate the ratio of the observation to the moving average
Collect all ratios organized by the point in the seasonal cycle
  months, if monthly; quarters, if quarterly
Average, and adjust if necessary, to get seasonal adjustment factors
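The four steps above can be sketched in Python as follows. The quarterly sales figures are made up, and two choices are assumptions not stated on the slide: the moving average is centered (two adjacent 4-quarter means are averaged so the result lines up with an actual quarter), and the "adjust if necessary" step is taken to mean rescaling the factors so they average exactly 1.

```python
from collections import defaultdict

# Hypothetical quarterly sales for three years (made-up numbers), Q1 first.
sales = [80, 120, 150, 90, 88, 132, 165, 99, 96, 144, 180, 108]
PERIOD = 4  # quarters per seasonal cycle

# Step 1: centered moving average (assumption: average two adjacent
# 4-quarter means so the result is aligned with quarter t).
half = PERIOD // 2
centered_ma = {}
for t in range(half, len(sales) - half):
    first = sum(sales[t - half:t + half]) / PERIOD
    second = sum(sales[t - half + 1:t + half + 1]) / PERIOD
    centered_ma[t] = (first + second) / 2

# Step 2: ratio of each observation to its moving average.
ratios = {t: sales[t] / ma for t, ma in centered_ma.items()}

# Step 3: collect the ratios by position in the seasonal cycle (0 = Q1).
by_season = defaultdict(list)
for t, ratio in ratios.items():
    by_season[t % PERIOD].append(ratio)

# Step 4: average, then rescale so the factors average exactly 1.
raw = {q: sum(values) / len(values) for q, values in by_season.items()}
scale = PERIOD / sum(raw.values())
factors = {q: raw[q] * scale for q in sorted(raw)}

# Dividing each observation by its factor gives the deseasonalized series.
deseasonalized = [s / factors[t % PERIOD] for t, s in enumerate(sales)]

for q, f in factors.items():
    print(f"Q{q + 1} factor: {f:.3f}")
```

A factor above 1 marks a quarter that typically runs above the moving average, and a factor below 1 marks one that runs below it.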
Example
Course Kit Example
Page 143
Regression
Add dummy variables for all but one seasonal period (i.e., 3 for quarterly, 11 for monthly)
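One period is left out because a full set of dummies would always add up to 1, duplicating the intercept, so the regression could not separate them. A minimal Python sketch of building the columns, assuming the observations start at the first period of a cycle and treating the last period as the omitted baseline (both are illustrative choices, and the function name is hypothetical):

```python
PERIOD = 4  # quarterly data; use 12 for monthly

def seasonal_dummies(n_obs, period=PERIOD):
    """Return one row per observation with period - 1 dummy columns.

    The last period of each cycle is the baseline and gets all zeros.
    """
    rows = []
    for t in range(n_obs):
        season = t % period
        rows.append([1 if season == k else 0 for k in range(period - 1)])
    return rows

for row in seasonal_dummies(8):
    print(row)
```

Each dummy's coefficient is then read as that period's shift relative to the omitted baseline period.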
Example
Revisit the Course Kit Example
Page 143
Edgar Fiedler’s Six Rules of Forecasting
With thanks to Peter Walker for bringing this to my attention
Forecasting is very difficult, especially if it is about the future
The minute you make a forecast, you know you’re going to be wrong, you just don’t know when or in what direction.
The herd instinct among forecasters makes sheep look like independent thinkers
When asked to explain a forecast, never underestimate the power of a platitude
When you know absolutely nothing about a subject, you can still do a forecast by asking 300 people who don’t know anything either. That’s called a survey
Forecasters learn more and more about less and less until they know nothing about anything