Powerpoint 2: REGRESSION, REGRESSION, REGRESSION!


Page 1: Powerpoint2.reg

1

Powerpoint 2

REGRESSION, REGRESSION, REGRESSION!

Page 2: Powerpoint2.reg

2

REGRESSION / CORRELATION

Object: To measure the degree of association between variables and/or to predict the value of one variable from the knowledge of the values of (an)other variable(s).

Relationships: (1) Functional  (2) Statistical

Page 3: Powerpoint2.reg

3

Functional Relationship:

Y = f(X), an exact relationship -- no “error.”

e.g., Y = −25 + .10X

X = $ spent at B&N during the year; Y = $ savings (from joining the Barnes & Noble book club)

Page 4: Powerpoint2.reg

4

Statistical Relationship: (true only “on the average”)

Y = PRODUCTION vs. X = LABOR HOURS: Linear

Y = PHYSICAL ABILITY vs. X = AGE: Non-linear, upside-down U-shape

Page 5: Powerpoint2.reg

5

Consider the following data, which represent the sales of a product (adjusted for trend) over the last 8 sales periods:

Y = sales (millions):

116  109  117  112  122  113  108  115

Ȳ = 114 (the average of the 8 sales amounts)

What would (should) one predict for the next sales period? Probably, one would be hard pressed, in this case, to justify choosing other than Ȳ = 114. How good will this prediction be?

Page 6: Powerpoint2.reg

6

WE DON’T

KNOW!!!!!

Page 7: Powerpoint2.reg

7

But-- we can get an idea by looking at how well we would have done, had we been using this 114 all along:

TSS = Total Sum of Squares

So TSS = Σ (Yj − Ȳ)², summed over j = 1, …, n; here TSS = 144.

 Y      Ȳ     Y−Ȳ   (Y−Ȳ)²
116    114     2      4
109    114    −5     25
117    114     3      9
112    114    −2      4
122    114     8     64
113    114    −1      1
108    114    −6     36
115    114     1      1
      Ȳ=114    0    144

(Y − Ȳ is the prediction error/residual.)
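The table above is easy to verify; here is a minimal Python sketch using only the eight sales values from the slide.

```python
# Sales (adjusted for trend) over the last 8 periods, from the slide.
y = [116, 109, 117, 112, 122, 113, 108, 115]

y_bar = sum(y) / len(y)                   # average of the 8 sales amounts
residuals = [yj - y_bar for yj in y]      # prediction errors using y_bar
tss = sum(e ** 2 for e in residuals)      # Total Sum of Squares

print(y_bar)           # 114.0
print(sum(residuals))  # 0.0 -- deviations from the mean always sum to 0
print(tss)             # 144.0
```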

Page 8: Powerpoint2.reg

8

Two ways to look at the “TSS”:

1) A measure of the “mis-prediction” (prediction error) using Y as predictor.

2) A measure of the “Total Variability in the System” (the amount by which the 8 data values aren’t all the same).

When the TSS is larger -- when the data vary more -- you have more reason to investigate further.

Page 9: Powerpoint2.reg

9

Consider using X, advertising, to “help” predict Y:

[Scatter Diagram: Y (sales, 105 to 125) plotted against X (advertising, 0 to 4)]

Y:  116  109  117  112  122  113  108  115
X:    2    1    3    1    4    2    1    2

Ȳ = 114,  X̄ = 2

Page 10: Powerpoint2.reg

10

Consider a Linear or Straight Line Statistical relationship between the two variables, and then consider finding the “best fitting line” to the data. Call this line:

Yc = a+bX

Yc = “Computed Y” or “Predicted Y”

Y is called the Dependent VariableX is called the Independent Variable

Page 11: Powerpoint2.reg

11

What do we mean by “best fitting”?

Answer: The “Least Squares” line, i.e., the line which minimizes the sum of the squares of the distances between the “dots,” Y, and the “line,” Yc. Hence, the MATH problem is to minimize

Σ (Yj − Ycj)², summed over j = 1, …, n

[Illustration: at X1, the data point Y1 = 7 lies above the fitted value Yc1 = 5]

Page 12: Powerpoint2.reg

12

To find this Least Squares line, we theoretically need calculus.

However, as a practical matter, every text gives the answer, and, more importantly, we will get the result using Excel, or SPSS, or other software - NOT “BY HAND.”

(There is an arithmetic formula for “b” and “a” in terms of the sum of the X’s, the sum of the Y’s, the sum of the X•Y’s, etc., but with software available, we never use it.)
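For the curious, that arithmetic formula can be sketched in a few lines of Python; applied to the advertising data it reproduces the line the software reports.

```python
# Textbook least squares formulas:
#   b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2),  a = y_bar - b * x_bar
y = [116, 109, 117, 112, 122, 113, 108, 115]   # sales
x = [2, 1, 3, 1, 4, 2, 1, 2]                   # advertising

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

sxy = sum((xj - x_bar) * (yj - y_bar) for xj, yj in zip(x, y))
sxx = sum((xj - x_bar) ** 2 for xj in x)

b = sxy / sxx              # slope
a = y_bar - b * x_bar      # intercept

print(a, b)   # 106.0 4.0  ->  Yc = 106 + 4X
```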

Page 13: Powerpoint2.reg

13

[Scatter plot of the data with the least squares line Yc = 106 + 4X superimposed; Y runs from 105 to 125, X from 0 to 4]

Page 14: Powerpoint2.reg

[Slides 14-22: Excel/SPSS regression output for this example; slide 22 highlights the intercept (a = 106) and slope (b = 4)]

Page 23: Powerpoint2.reg

23

So, using X in the best way, we have a prediction line of Yc=106+4X. How good are the predictions we’ll get using this line? Suppose we had been using it:

 Y    X   Yc=106+4X   Y−Yc   (Y−Yc)²      Y−Ȳ   (Y−Ȳ)²
116   2      114        2       4           2      4
109   1      110       −1       1          −5     25
117   3      118       −1       1           3      9
112   1      110        2       4          −2      4
122   4      122        0       0           8     64
113   2      114       −1       1          −1      1
108   1      110       −2       4          −6     36
115   2      114        1       1           1      1
                        0    SSE = 16       0   TSS = 144

Page 24: Powerpoint2.reg

24

So, SSE = Σ (Y − Yc)² = 16.

SSE = Sum of Squares “due to error”

That is, we use X in the best way possible, and still do not get perfect prediction. The amount of “mis-prediction” still remaining, measured by sum of squares, is 16. This must be due to factors other than advertising (X). (Perhaps: size of sales force, number of retail outlets, strategy of competition, interest rates, etc.)
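The residuals and SSE = 16 above can be reproduced in a few lines of Python from the fitted line Yc = 106 + 4X:

```python
# Residuals and SSE using the fitted line Yc = 106 + 4X from the slides.
y = [116, 109, 117, 112, 122, 113, 108, 115]
x = [2, 1, 3, 1, 4, 2, 1, 2]

yc = [106 + 4 * xj for xj in x]                # predicted values
resid = [yj - ycj for yj, ycj in zip(y, yc)]   # Y - Yc, the remaining error
sse = sum(e ** 2 for e in resid)               # Sum of Squares due to error

print(resid)  # [2, -1, -1, 2, 0, -1, -2, 1]
print(sse)    # 16
```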

Page 25: Powerpoint2.reg

25

We call all these other factors “ERROR”. That is, “error” is the collective name of all variables (factors) not used in making the prediction.

SSE is also called “SUM OF SQUARED RESIDUALS” or “RESIDUAL SUM OF SQUARES”.

Page 26: Powerpoint2.reg

26

We have TSS = 144 and SSE = 16.

TSS - SSE = 128

What happened to the other 128? We call this “SSA”: (“SSR” in text)

SSA = TSS - SSE = 128

SSA = Sum of squares “due to X” or “Attributed to X”.

Page 27: Powerpoint2.reg

27

So, TSS = SSA + SSE

Total Variability = Variability Attributed to X + Variability due to ERROR

Page 28: Powerpoint2.reg

28

We have

r² = SSA/TSS = 128/144 = .89

r² is called the “Coefficient of Determination,” and is interpreted as the “proportion of variability in Y explained by X” or “… explained by the relationship between Y and X expressed in the regression line.”

Page 29: Powerpoint2.reg

29

0 ≤ r² ≤ 1,  r² = SSA/TSS

Of course, 1 − r² = SSE/TSS = 16/144 = .11

and is interpreted as the proportion of variability in Y unexplained by X (and still present).

Define r = SQRT(r²); r = the correlation (coefficient). Here r = SQRT(.89) = .943.
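A short sketch tying the sums of squares to r² and r:

```python
# Coefficient of determination from the sums of squares on the slides.
tss, sse = 144, 16
ssa = tss - sse            # variability attributed to X

r2 = ssa / tss             # proportion of variability in Y explained by X
r = r2 ** 0.5              # correlation; it takes the sign of the slope b (+4 here)

print(round(r2, 2))   # 0.89
print(round(r, 3))    # 0.943
```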

Page 30: Powerpoint2.reg

30

But, r can be + or - !!

SQRT(.89) = +.943 or -.943.

It takes on the sign of b in Yc = a+bX.

A value of r near 1 or -1 is suggestive of a strong linear relationship between Y and X. A value of r near 0 is suggestive of no linear relationship between Y and X.

-1 ≤ r ≤ 1

Page 31: Powerpoint2.reg

31

Note that the sign of r indicates the direction of the relationship (if any). A “+” indicates that Y and X move in the same direction; a “-” indicates that they move in opposite directions. Some people refer to a positive r as a “positive relationship” and a negative r as an “inverse relationship”.

Page 32: Powerpoint2.reg

32

[Six scatter plots of Y against X, illustrating r = +1, r = −1, r = +.8, r = −.65, and two patterns with r = 0 (no linear relationship)]

Page 33: Powerpoint2.reg

33

Note that a high r2 does not necessarily mean CAUSE/EFFECT.

Frequently we have “spurious correlations” – two variables which are highly related in terms of r2, but only because they are both “driven” by a third variable.

“Classic” example:

Number of TEACHERS ↔ Number of quarts of LIQUOR SOLD

Page 34: Powerpoint2.reg

34

Page 35: Powerpoint2.reg

35

[Output slide highlighting R, R², SSA, SSE, and TSS]

Page 36: Powerpoint2.reg

36

THE MODEL

In order to get a measure of prediction error (e.g., confidence intervals, hypothesis testing), we must make some assumptions about the distribution of points scattered about the regression line. These assumptions are usually couched in what is called a “statistical model.”

Page 37: Powerpoint2.reg

37

We specify μY·X = A + BX

where μY·X is the mean or average value of Y for a given X. We have a (true) slope of B and a (true) intercept of A; A and B are parameters, the exact values of which we’ll never know.

Page 38: Powerpoint2.reg

38

This says that if we set X = 1 (for example) and sample an infinite number of Y’s (hence finding μY·1), and then set X = 2 and find μY·2, X = 3 and find μY·3, etc., all the μY·X fall exactly on a straight line.

[Plot: the (TRUE) average of Y, μY·X, against X]

Page 39: Powerpoint2.reg

39

But, we never find μY·X. For a given X, we observe a value of Y which differs from μY·X in the same way that when we observe any random variable value, it does not equal “μ” but is some point governed by some probability law.

[Plot: density f(Y), centered at μY·X]

Page 40: Powerpoint2.reg

40

The way we write this in a formal way is:

Y = μY·X + ε = A + BX + ε

where ε is the difference between an individual Y and the mean of Y, all given a specific X.

ε is basically the impact of “error” being non-zero.

Page 41: Powerpoint2.reg

41

Example: Suppose that Y = weight, X = height, and μY·X=70″ = 160 lbs.

Then a person 70″ tall with a weight of 168 lbs has a “personal ε” of 8 lbs. If his/her weight were 158 lbs, his/her personal ε would be −2 lbs.

Of course, since ε = Y − μY·X, and we don’t know μY·X, we don’t really know anybody’s personal ε.

Page 42: Powerpoint2.reg

42

We find the LS line,

Yc = a + bX

a estimates A
b estimates B
Yc estimates μY·X, and Y itself.

Page 43: Powerpoint2.reg

43

We usually make the following assumptions,

which are called

“the standard assumptions.”

1) NORMALITY2) HOMOSCEDASTICITY3) INDEPENDENCE

Page 44: Powerpoint2.reg

44

Assumption 1:

Given a value of X, the probability distribution of Y is normal.

(e.g., with Y = weight and X = height, for any given height (say 70″), the Y’s are normal around μY·X=70″ (say, 160 lbs).)

[Plot: normal density of Y centered at 160]

Page 45: Powerpoint2.reg

45

Assumption 2:

The standard deviation of ε (which we don’t know), usually called σY·X, is constant for all values of X. The characteristic of having σY·X constant is referred to as “Homoscedasticity.”

Page 46: Powerpoint2.reg
Page 47: Powerpoint2.reg

47


Page 48: Powerpoint2.reg

48

Combining assumptions 1 & 2, we have the Y’s being normally distributed with μY·X as mean (and, correspondingly, average error of 0) and constant standard deviation σY·X.

Of course, as you know, neither μY·X nor σY·X is known.

μY·X is estimated by Yc = a + bX

σY·X is estimated by “Sy·x”

Sy·x is called the “Standard Error of Estimate,”

Sy·x = √( SSE / (n − 2) )

Page 49: Powerpoint2.reg

49

The SSE makes intuitive sense, in that SSE is a variability due to error. The [n-2] (instead of [n-1], the denominator of S in most previous applications) is really a degrees of freedom number. The df = n minus a degree of freedom for each parameter estimated from the data. Here, there are 2 such parameters, A and B (estimated by a and b, respectively).
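In Python, the standard error of estimate for this example (matching the Sy·x = 1.63 on the output) is:

```python
# Standard Error of Estimate: s_yx = sqrt(SSE / (n - 2)).
# n - 2 because two parameters (A and B) were estimated from the data.
sse = 16
n = 8

s_yx = (sse / (n - 2)) ** 0.5

print(round(s_yx, 2))   # 1.63
```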

Page 50: Powerpoint2.reg

50

Later, when we have a model of

Y = A + B1X1 + B2X2 + ε,

the df will be [n-3].

We usually get Sy•x from the Computer output.

Here, Sy•x = 1.63 (See output on next page).

Page 51: Powerpoint2.reg

51

Page 52: Powerpoint2.reg

52

[Output slide with Sy·x = 1.63 highlighted]

Page 53: Powerpoint2.reg

53

Assumption 3:

The Y values are independent of one another. (This is often a problem when the data form a time series).

In the real world these assumptions may never be exactly true, but are often close enough to true to make our statistical analysis (which follows) valid.

Investigation has shown that moderate departures from assumptions 1 and 2 do not appreciably affect results (i.e., assumptions 1 and 2 are “Robust”). In terms of large departures –– there are ways to recognize them and do the appropriate (but more complex) analysis.

Page 54: Powerpoint2.reg

54

CONFIDENCE INTERVALS

95% confidence intervals for A and B

Page 55: Powerpoint2.reg

55

Page 56: Powerpoint2.reg

56

This, you had before

Now added to output

Page 57: Powerpoint2.reg

57

Of greater interest (usually) is a confidence interval for the prediction of the next “period.” This is done by:

Yc ± t1−α · Sy·x,  (n−2) df

This formula is an excellent approximation when n is “large” (virtually always in MK) and the value of X at which we are predicting isn’t dramatically far from the center [X̄] of our data.

Recall Yc = 106 + 4X. For 95% confidence and X = 3, we have:

118 ± 2.447(1.63), or 118 ± 3.99   [2.447 = TINV(.05, 6)]
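This interval can be reproduced with SciPy’s t distribution, where `t.ppf` plays the role of Excel’s TINV; a sketch, assuming SciPy is available:

```python
from scipy.stats import t

# 95% interval for the next period's Y at X = 3, per the slide's approximation
# Yc ± t * Sy.x  (n "large", X not far from the center of the data).
s_yx = 1.63
n = 8
yc = 106 + 4 * 3                       # = 118

t_crit = t.ppf(1 - 0.05 / 2, n - 2)    # two-tailed 5% point = TINV(.05, 6)
half = t_crit * s_yx

print(round(t_crit, 3))                          # 2.447
print(round(yc - half, 2), round(yc + half, 2))  # 114.01 121.99
```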

Page 58: Powerpoint2.reg

58

(EXCEL COMMAND)

TINV(.05, 6) = 2.447

In general: TINV(α, df)

Page 59: Powerpoint2.reg

59

Hypothesis Testing

To test:  H0: B = 0     (Note: B = 0 is the same as X & Y NOT RELATED)
          H1: B ≠ 0

With Y = A + BX + ε, we compute

tcalc = (b − BH0)/sb = (b − 0)/sb

and accept H0 if |tcalc| < t1−α, (n−2) df;
reject H0 if |tcalc| > t1−α, (n−2) df.

Page 60: Powerpoint2.reg

60

If α = .05, we have t1−α = 2.447 (6 df).

In our problem, tcalc = 6.93 (see output on next page), so we reject H0.

We’ll refer to this as the “t-test.”

(All we really need to do is to examine the p-value.)
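The tcalc = 6.93 on the output can also be reproduced by hand. The slides take sb from the software, so the usual formula sb = Sy·x / √Σ(X − X̄)² is filled in here as an assumption:

```python
# t-test for H0: B = 0 in the simple regression, using quantities from the slides.
x = [2, 1, 3, 1, 4, 2, 1, 2]
x_bar = sum(x) / len(x)

b = 4                                   # fitted slope
s_yx = (16 / 6) ** 0.5                  # standard error of estimate, ~1.63
sxx = sum((xj - x_bar) ** 2 for xj in x)
s_b = s_yx / sxx ** 0.5                 # standard error of b (standard formula)

t_calc = (b - 0) / s_b

print(round(t_calc, 2))   # 6.93
```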

Page 61: Powerpoint2.reg

61

Page 62: Powerpoint2.reg

62

P-value (called “significance” by SPSS)

Page 63: Powerpoint2.reg

63

Here, where μY·X = A + BX, there’s only one B, and thus the H’s above are the same as the previous

H0: B = 0    H1: B ≠ 0

To test H0: all B’s = 0 vs. H1: not all B’s = 0, we have a different procedure.

Page 64: Powerpoint2.reg

64

However, for the future, where μY·X = A + B1X1 + B2X2, and “all B’s = 0” means B1 = B2 = 0, and there is a difference between “B = 0” and “all B’s = 0,” we introduce:

H0: all B’s = 0
H1: not all B’s = 0

Page 65: Powerpoint2.reg

65

To test the above, we determine

Fcalc

We get Fcalc from the output!!! Yeah!!!!

Page 66: Powerpoint2.reg

66

And we accept H0 if Fcalc < F1−α, (1, n−2) df;
reject H0 if Fcalc > F1−α, (1, n−2) df,

where F1−α is the appropriate value from the F table.

More easily: examine the p-value of the F-test (next page).

[F density with the α = .05 rejection region beyond F.95(1, 6) = 5.99]
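The slides read Fcalc off the output; for this one-X example it can also be computed from the sums of squares, and (a standard identity in simple regression) it equals tcalc²:

```python
# F statistic for H0: B = 0 in the simple regression:  F = MSA / MSE.
ssa, sse, n = 128, 16, 8

msa = ssa / 1          # 1 numerator df (one X)
mse = sse / (n - 2)    # n - 2 denominator df

f_calc = msa / mse

print(round(f_calc, 3))     # 48.0 -- exceeds 5.99, so reject H0
print(round(6.93 ** 2, 1))  # ~48.0: with one X, F = t^2
```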

Page 67: Powerpoint2.reg

67

Page 68: Powerpoint2.reg

68

Fcalc and p-value

Page 69: Powerpoint2.reg

69

MULTIPLE REGRESSION

When there is more than one independent variable (X), we call our regression analysis by the term “Multiple Regression.” With a single independent variable, we call it “Simple Regression.”

Page 70: Powerpoint2.reg

70

μY·X = A + B1X1 + B2X2 + • • • + Bk−1Xk−1

Y = μY·X + ε

Least Squares hyperplane (“line”):

Yc = a + b1X1 + b2X2 + • • • + bk−1Xk−1

NOTE: k−1 = Number of X’s;  k = Number of parameters

Page 71: Powerpoint2.reg

71

Example:
Y = Job Performance
X1 = Score on (entrance) Test 1
X2 = Score on Test 2
X3 = Score on Test 3

or

Y = Sales
X1 = Advertising
X2 = Number of sales people
X3 = Number of competitors

We assume that Computer software gives us all (or nearly all) the numerical results.

Page 72: Powerpoint2.reg

72

Typically, we wish to perform two types of Hypothesis Tests:

First: F-test (Y = A + B1X1 + ••• + Bk−1Xk−1 + ε)

H0 : B1 = B2 = B3 = . . . = Bk-1 = 0H1 : not all B’s = 0

Page 73: Powerpoint2.reg

73

In “English”: H0: The X’s collectively do not help us predict Y.

H1: At least one of the darn X’s helps us predict Y!

We call this, reasonably so, a “TEST OF THE OVERALL MODEL”

H0 : B1 = B2 = B3 = . . . = Bk-1 = 0H1 : not all B’s = 0

Page 74: Powerpoint2.reg

74

If we accept H0 that the X’s collectively do not help us predict Y, we probably discontinue formal statistical analysis.

However, if we reject H0 (i.e., the “F is significant”), then we are likely to want a series of t-tests:

H0: B1 = 0  vs.  H1: B1 ≠ 0,
H0: B2 = 0  vs.  H1: B2 ≠ 0,
• • •
H0: Bk−1 = 0  vs.  H1: Bk−1 ≠ 0

Page 75: Powerpoint2.reg

75

These are called “Tests for individual X’s.” The test is answering: (using B1 as an example)

H0 : Variable X1 is NOT helping us predict Y, above and beyond the other variables in the model.

H1 : X1 IS INDEED helping us predict Y, above and beyond the other variables in the model.

Page 76: Powerpoint2.reg

76

So, note: we’re answering whether a variable gives us INCREMENTAL value.

Sometimes a result looks “strange” --

Y = weight, X1 = height, X2 = pant length:

F-test: SIGNIFICANT
t1: NOT SIGNIFICANT
t2: NOT SIGNIFICANT

Page 77: Powerpoint2.reg

77

(Y = Weight, X1 = Height, X2 = Pant Length)

If I know a person’s X1, height, do I get additional predictive value about Y, weight, from knowing pant length? No -- hence, we accept H0: B2 = 0 (t2 not sign.)

Page 78: Powerpoint2.reg

78

If I know X2, pant length, do I get additional predictive value about Y from knowing height?

(Also) No -- hence we accept H0: B1 = 0 (t1 not sign.)

Page 79: Powerpoint2.reg

79

When the X’s themselves are highly interrelated (the fact that leads to the strange-looking -- but not really strange -- result), we call this MULTICOLLINEARITY.

Page 80: Powerpoint2.reg

80

X1 alone:   R² = .5
X2 alone:   R² = .4
X1, X2:     R² = ?

Ans: between .5 and .9.

(In some unusual, “strange” cases, R² may exceed .9.)

If X1 and X2 are not overlapping in the information provided, R² = .9; if X2 tells us a total subset of what X1 tells us, R² = .5.

Another “look” at this issue:

Page 81: Powerpoint2.reg

81

If you have

X1 alone:   R² = .70
X2 alone:   R² = .72
X1, X2:     R² = .73,

then:

1) The F test is significant because the X’s together tell us (an estimate of) 73% of what’s going on with Y.

2) t1 (likely) not significant, because the gain of .01 (.73 − .72 [with only X2]) is judged by the t-test as too easily due to the “luck of the draw.” (Actually, it depends on the sample size.)

3) t2, similarly.

Page 82: Powerpoint2.reg

82

Example: Y = Job performance, X1 = Test 1 score, X2 = Test 2 score, X3 = Test 3 score

 X1    X2    X3    Y
100    95    87   88
 99    99    98   80
101   103   101   96
 93    95    91   76
 95   102    88   80
 95    94    84   73
  .     .     .    .
n = 25
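Software fits the hyperplane by least squares; a NumPy sketch of the mechanics follows. Only the six rows printed on the slide are used (the full n = 25 dataset is not reproduced here), so the coefficients will not match the slide’s output; it just shows how the fit is obtained.

```python
import numpy as np

# Six of the 25 (X1, X2, X3, Y) rows shown on the slide -- illustration only.
data = np.array([
    [100,  95,  87, 88],
    [ 99,  99,  98, 80],
    [101, 103, 101, 96],
    [ 93,  95,  91, 76],
    [ 95, 102,  88, 80],
    [ 95,  94,  84, 73],
], dtype=float)

X = np.column_stack([np.ones(len(data)), data[:, :3]])  # intercept column + X's
y = data[:, 3]

coef, *_ = np.linalg.lstsq(X, y, rcond=None)            # a, b1, b2, b3
yc = X @ coef                                           # fitted values

print(coef)
print(y - yc)   # residuals
```

A check on any least squares fit: the residuals are orthogonal to every column of X.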

Page 83: Powerpoint2.reg

83

X1 X2 X3 Y

Page 84: Powerpoint2.reg

84

Page 85: Powerpoint2.reg

85

Page 86: Powerpoint2.reg

86

Page 87: Powerpoint2.reg

LEAST SQUARES LINE

So, Yc = −106.34 + 1.02•X1 + .137•X2 + .87•X3

Page 88: Powerpoint2.reg

88

To test: H0: B1 = B2 = B3 = 0    (α = .05)
         H1: not all B’s = 0

Fcalc = 47.598 (from output)

Since p-value = .000000001528 < .05, we reject H0.

Page 89: Powerpoint2.reg

89

To Test

Ho: B1 = 0 Ho: B2 = 0 Ho: B3 = 0

H1: B1 ≠ 0 H1 : B2 ≠ 0 H1 : B3 ≠ 0

We have

tcalc1 = 3.65 (p = .0015)
tcalc2 = .80 (p = .4314)
tcalc3 = 3.57 (p = .0018)

t1−α = 2.08,  α = .05,  21 df (= 25 − 4)

Page 90: Powerpoint2.reg

90

For B1 and B3 we reject H0; for B2 we accept H0.

Conclusion in Practical Terms?

X1 (Test 1) and X3 (Test 3) each give us incremental predictive value about PERFORMANCE, Y.

X2 (Test 2) is either irrelevant or redundant.

Page 91: Powerpoint2.reg

91

An added benefit of the analysis was to indicate how the tests should be weighted: The best fit occurs if the tests are weighted

1.02, .137, .87

(assuming we retain Test 2).

This is equivalent to weights of

1.02/2.027,  .137/2.027,  .87/2.027

or (.50, .07, .43).

(The present weights were (1/3, 1/3, 1/3).)

Page 92: Powerpoint2.reg

92

“PROBLEM IN NOTES”

Consider the following model: Y = A + B1•X1 + B2•X2 + B3•X3 + ε
Y = Sales Revenue (in units of $100,000)
X1 = Expenditure on TV advertising (in units of $10,000)
X2 = Expenditure on Web advertising (in units of $10,000)
X3 = Expenditure on Newspaper advertising (in units of $10,000)

Refer to the computer output following the questions --
1. What is the least squares line (hyperplane)?
2. What revenue do I expect (in dollars) with no advertising in any of the three media?
3. If $10,000 more were allocated to advertising, which medium should receive it to generate the most additional revenue?

Page 93: Powerpoint2.reg

93

4) What percent of the variability in revenue is due to factors other than the expenditures in the three advertising media?
5) If management decided to spend the same amount of money on each of the three types of media, how much total money would have to be spent to generate an expected revenue of $40,000,000?
6) Test H0: B1 = B2 = B3 = 0 vs. H1: not all B’s = 0, at α = .05. What is your conclusion in practical terms?
7) For each variable, test H0: B = 0 vs. H1: B ≠ 0, at α = .05. What are your conclusions in practical terms?

Page 94: Powerpoint2.reg

94


Page 95: Powerpoint2.reg

95

Dummy Variables(Indicator)

(Categorical)

Ex: Y = A + B1X1 + B2X2 + ε

Y = $ spent on DVDs/mo.
X1 = Disposable Income/yr.
X2 = Sex:  Male: X2 = 1;  Female: X2 = 0

Page 96: Powerpoint2.reg

96

We Get Yc = a + b1X1+ b2X2

For any given X1, income, we predict Y as follows:

Male: Yc = a + b1X1 +b2(1) = a + b1X1 + b2

Female: Yc = a + b1X1 +b2(0) = a + b1X1 + 0

How is b2 to be interpreted?

We can test, of course, Ho: B2 = 0 vs. H1: B2 ≠ 0.
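A tiny sketch with hypothetical coefficient values (not from any slide output) shows that, at any fixed income, the Male and Female predictions differ by exactly b2:

```python
# Hypothetical fitted coefficients for Yc = a + b1*X1 + b2*X2 -- illustration only.
a, b1, b2 = 5.0, 0.3, 2.5   # b2: estimated extra $ spent by a Male at the same income

def predict(income, male):
    """Predict $ spent on DVDs/mo.; male is the 0/1 dummy X2."""
    return a + b1 * income + b2 * (1 if male else 0)

x1 = 50  # some disposable income level
print(predict(x1, male=True) - predict(x1, male=False))   # difference is exactly b2
```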

Page 97: Powerpoint2.reg

97

Ans: The (estimated) amount spent by a Male, above that which would be spent by a Female, given the same X1 value (income). (Of course, if b2 is negative, it says that we estimate that Females spend more than Males, at equal incomes.)

If we had defined X2 = 1 for F’s

X2 = 0 for M’s ,

then b2 would reverse sign, and have the opposite meaning.

Page 98: Powerpoint2.reg

98

Remember that a variable is a “dummy” variable because of definition and interpretation. The computer treats a variable whose values are 0 and 1, just like any other variable.

Our data are, perhaps,

Y X1 X2

20 50 1

18 40 1

33 65 0

24 49 0

21 62 1

• • •

• • •

Page 99: Powerpoint2.reg

99

Note that we have 2 categories, (M,F), but only one dummy variable.

This is necessary; in the general situation of C categories, we use (C − 1) dummy variables.

This is because of computation issues involved in matrix inversion.

Page 100: Powerpoint2.reg

100

Example

YC = a + b1X1 + b2X2 + b3X3 + b4X4 + b5X5

Y = Water Usage
X1 = Temp.
X2 = Amount Produced
X3 = # People Employed

          X4   X5
Plant 1    1    0
Plant 2    0    1
Plant 3    0    0

Page 101: Powerpoint2.reg

101

Let a + b1X1 + b2X2 + b3X3 = G

Then we predict: (for a given X1, X2 , X3)

FOR PLANT 1: G + b4(1) + b5(0) = G + b4

FOR PLANT 2: G + b4(0) + b5(1) = G + b5

FOR PLANT 3: G + b4(0) + b5(0) = G

How do we interpret b4? b5?

Page 102: Powerpoint2.reg

102

STEPWISE REGRESSION

A “variation” of multiple regression to pick the “best” model.

Page 103: Powerpoint2.reg

103

Y/X1, X2, X3, X4Step 1:

Internal: Run separate simple regressions with each X; pick the best (best= highest R2)

Y/X1   R² = .45
Y/X2   R² = .50
Y/X3   R² = .48
Y/X4   R² = .28

External: Y/X2, R² = .50

Page 104: Powerpoint2.reg

104

Step 2:

Internal:
Y/X2, X1   R² = .59
Y/X2, X3   R² = .68
Y/X2, X4   R² = .70

External: Y/X2, X4, R² = .70

Page 105: Powerpoint2.reg

105

Step 3:

Internal:
Y/X2, X4, X1   R² = .77
Y/X2, X4, X3   R² = .73

External: Y/X2, X4, X1, R² = .77

NOTE: If at any stage, the best variable to enter is not significant by the t-test, the ALGORITHM STOPS (and does not bring that variable in!!!). You select a p-value (pin), and if p-value of entering variable > pin (i.e., variable is not significant), the variable does not enter and the algorithm stops.

Page 106: Powerpoint2.reg

106

Also -- there’s a step 3b (and 4b, 5b, etc.)

Step 3b) Now that we’ve entered our third variable, the software goes back and re-examines previously entered variables to see if any should be DELETED. (Specify a “p to go out,” pout, so that if the p-value of a variable in our model > pout, the variable is deleted.)

Algorithm continues until it stops!!!!
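The forward-selection loop described above (enter the highest-R² candidate, stop when its t-test p-value exceeds pin) can be sketched as follows; the deletion pass of step 3b is omitted for brevity, the data are synthetic, and NumPy/SciPy are assumed available.

```python
import numpy as np
from scipy import stats

def forward_stepwise(X, y, p_in=0.05):
    """Sketch of forward stepwise selection: at each step, try each
    not-yet-entered column, pick the one giving the highest R^2, and
    enter it only if its t-test p-value < p_in (else STOP)."""
    n, k = X.shape
    tss = (y - y.mean()) @ (y - y.mean())
    selected = []
    while len(selected) < k:
        best = None
        for j in range(k):
            if j in selected:
                continue
            A = np.column_stack([np.ones(n)] + [X[:, c] for c in selected + [j]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ coef
            r2 = 1 - (resid @ resid) / tss
            df = n - A.shape[1]                          # n minus parameters fit
            se = np.sqrt(resid @ resid / df * np.linalg.inv(A.T @ A)[-1, -1])
            p = 2 * stats.t.sf(abs(coef[-1] / se), df)   # t-test on entering var
            if best is None or r2 > best[0]:
                best = (r2, j, p)
        if best[2] > p_in:
            break                                        # entering var not significant
        selected.append(best[1])
    return selected

# Synthetic illustration: columns 1 and 3 actually drive y; 0 and 2 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 3 * X[:, 1] + 2 * X[:, 3] + 0.5 * rng.normal(size=60)
sel = forward_stepwise(X, y)
print(sel)   # column 1 (the strongest predictor) enters first, then column 3
```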

Page 107: Powerpoint2.reg

107

Page 108: Powerpoint2.reg

108

Output for the example with three tests and job performance

Page 109: Powerpoint2.reg

109

Page 110: Powerpoint2.reg

110

KEY!!!

Page 111: Powerpoint2.reg

111

Variable 1: Y = GRADUATE GPA
Variable 2: X1 = UNDERGRAD GPA
Variable 3: X2 = QUANTITATIVE GMAT
Variable 4: X3 = VERBAL GMAT
Variable 5: X4 = COLLEGE SELECTIVITY (0 = Less Selective, 1 = More Selective)

  Y     X1     X2      X3     X4
3.50   3.60   600.00  580.00  0.0
3.90   3.60   680.00  670.00  1.00
  .      .      .       .      .
3.20   2.90   440.00  430.00  1.00

Page 112: Powerpoint2.reg

112

Detailed Summary of Stepwise Analysis

Step   Ent. Var.               LS Line                                    R²
1      UNDERGRAD GPA (X1)      Yc = .85 + .73X1                           .609
2      QUANT GMAT (X2)         Yc = .585 + .53X1 + .00165X2               .833
3      COLLEGE SEL. (X4)       Yc = 1.197 + .309X1 + .00163X2 + .284X4    .915

STOP! If we bring in Verbal GMAT, R² = .919.

Page 113: Powerpoint2.reg

113

PRACTICE PROBLEM

Y = COMPANY ABC’s SALES ($ millions)
X1 = OVERALL INDUSTRY SALES ($ billions)
X2 = COMPANY ABC’s ADVERTISING ($ millions)
X3 = SPECIAL PROMOTION BY CHIEF COMPETITOR: 0 = YES, 1 = NO

A STEPWISE REGRESSION WAS RUN WITH THESE RESULTS:

STEP 1: VARIABLE ENTERING: X1, Yc = 205+16•X1, R2 = .48

STEP 2: VARIABLE ENTERING: X2, Yc = 183+11•X1+10•X2, R2 = .64

STEP 3: VARIABLE ENTERING: X3, Yc = 180+10•X1+8•X2+65•X3, R2 = .68

A) If ABC’s advertising is to be the same next year as this year (i.e., X2 held constant), and we do not know (in advance) the value of X3, what would we predict to be the increase in ABC’s sales if overall industry sales (X1) increase by $1 billion?

a) 10 b) 11 c) 16

B) Based on the given information, we can conclude that the R2 between Y and X2 (the exact value of which we cannot determine from the given information) is between:

a) .16 and .48 b) .16 and .64 c) .48 and .64 d) none of these

C) Answer part B) if the regression results above were NOT part of a stepwise procedure, but simply a set of multiple regression results.