Week 7 Notes

PBAF 528 Week 7

1. How do we choose our model? (continued)

We know that the βs are our estimates of the slope of the relationship between a predictor (explanatory factor), X, and an outcome, Y. The slope is the amount of change in the outcome that occurs when the predictor changes by “1 unit” (measured in the units of that variable).

There are several other types of measures of the relationship between a predictor and an outcome.

A. Standardized regression coefficients

Standardized coefficients show the expected change in the outcome, in standard deviations, for a 1-standard-deviation change in the explanatory factor.

They are the estimated coefficients obtained when all variables have been standardized. The larger the beta (in absolute value), the more important the variable is thought to be in explaining the dependent variable.

Best for continuous variables, not dummies or categoricals. Because all variables are on the same scale, standardized coefficients let you compare the impacts of factors on the dependent variable.

SPSS calculates these for us. When you run a linear regression in SPSS, they appear in the table called “Coefficients,” in the column called “Beta,” to the right of the standard error.

Excel will not calculate standardized regression coefficients. If you are working in Excel and want to interpret standardized coefficients, you’ll have to create a formula to adjust each estimated coefficient.
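One standard way to make that adjustment is to multiply each unstandardized coefficient by the ratio of the standard deviation of its predictor to the standard deviation of the outcome. The notation here is assumed for illustration: $b_k$ is the estimated (unstandardized) coefficient on $X_k$, and $s_{X_k}$ and $s_Y$ are the sample standard deviations of $X_k$ and $Y$.

$$\beta^{*}_k = b_k \cdot \frac{s_{X_k}}{s_Y}$$

In Excel, the two standard deviations could come from the STDEV.S function applied to each column.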

Ex #1: The listed sales price of diamonds (price in thousands of dollars) can be explained in terms of the 4 C’s: CUT, COLOR, CARAT (weight), and CLARITY.

How much (in standard deviations) would the sales price change with a 1 standard deviation change in weight (carat)?

We could calculate this using the descriptive statistics on price and carat.

Table 1

Descriptive Statistics

Variable             N    Range   Minimum   Maximum   Mean     Std. Deviation
P_1000               34   10.71   .71       11.42     4.3578   3.43769
CARAT                34   .86     .18       1.04      .5644    .30869
COLOR                34   1.00    .00       1.00      .3824    .49327
CLARITY              34   1.00    .00       1.00      .4412    .50399
Valid N (listwise)   34

Coefficients(a)

Model 1        Unstandardized Coefficients    Sig.
               B          Std. Error
(Constant)     -2.586     .299                .000
CARAT          11.329     .372                .000
COLOR          .590       .223                .013
CLARITY        .734       .226                .003

a. Dependent Variable: P_1000
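A minimal sketch of the Ex #1 calculation, assuming the adjustment Beta = B × (s_X / s_Y) and using only the B values and standard deviations from Table 1. The names `predictors` and `standardized_beta` are hypothetical, chosen for illustration. COLOR and CLARITY are included only for comparison; as noted above, standardized coefficients are best suited to continuous variables like CARAT.

```python
# Ex #1 sketch: convert unstandardized coefficients (B) into standardized
# coefficients (Beta) using standard deviations from Table 1.

SD_PRICE = 3.43769  # std. deviation of P_1000 (price, $1000s)

# predictor: (unstandardized B, std. deviation of that predictor)
predictors = {
    "CARAT": (11.329, 0.30869),
    "COLOR": (0.590, 0.49327),
    "CLARITY": (0.734, 0.50399),
}


def standardized_beta(b: float, sd_x: float, sd_y: float) -> float:
    """Beta = B * (s_X / s_Y): SDs of change in Y per 1-SD change in X."""
    return b * sd_x / sd_y


betas = {
    name: standardized_beta(b, sd_x, SD_PRICE)
    for name, (b, sd_x) in predictors.items()
}

for name, beta in betas.items():
    print(f"{name:8s} Beta = {beta:.3f}")
```

For CARAT this gives a Beta of about 1.017: a 1-standard-deviation increase in carat weight is associated with roughly a 1-standard-deviation increase in price.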


B. Elasticities

The expected percentage change in the dependent variable for a 1% change in the independent variable:

$$\varepsilon_{Y,X} = \frac{\%\Delta Y}{\%\Delta X} = \frac{\Delta Y / Y}{\Delta X / X} = \beta \cdot \frac{X}{Y}$$

In a linear model the slope is constant, but the elasticity of Y with respect to X is not constant.

It must be evaluated at a specific point (X, Y), since elasticity is not constant along the regression line or plane. The specific point could be the means of X and Y or any other (X, Y) point.

You can't read elasticities directly off SPSS or Excel output for a linear regression, since elasticity is not a constant.

Ex #2: What percentage would we expect price to rise for a 1% increase in carat size?

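We’ll assess this at the means of X and Y: $\varepsilon = \beta_{\text{CARAT}} \cdot \overline{\text{CARAT}} \,/\, \overline{\text{P\_1000}}$

So, a _______% change in price is associated with a 1% change in weight (carat).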

What does this percentage change in price mean?

What is a 1% change in weight?
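A minimal sketch of the Ex #2 calculation, assuming the elasticity is evaluated at the means reported in Table 1. The helper name `elasticity_at_point` is hypothetical. The sketch also converts "a 1% change in weight" into carats and the resulting price change into thousands of dollars, which addresses the two questions above.

```python
# Ex #2 sketch: elasticity of price with respect to carat in the linear
# model, evaluated at the means of X and Y (values from Table 1).

B_CARAT = 11.329     # slope on CARAT ($1000s of price per carat)
MEAN_CARAT = 0.5644  # mean carat weight
MEAN_PRICE = 4.3578  # mean price, $1000s


def elasticity_at_point(slope: float, x: float, y: float) -> float:
    """Point elasticity for a linear model: (dY/dX) * (X / Y)."""
    return slope * x / y


e = elasticity_at_point(B_CARAT, MEAN_CARAT, MEAN_PRICE)

# At the mean, a 1% change in weight is 0.01 * 0.5644 carat.
one_pct_carat = 0.01 * MEAN_CARAT
# Implied change in price, in $1000s, along the regression line.
price_change = B_CARAT * one_pct_carat

print(f"Elasticity at the means: {e:.3f}")
print(f"1% of mean weight = {one_pct_carat:.6f} carat")
print(f"Price change = {price_change:.4f} thousand dollars")
```

At the means, the elasticity is about 1.47. A 1% change in weight (about 0.0056 carat) goes with a price change of about 0.064 thousand dollars, or roughly 1.47% of the mean price.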

C. Does a straight line explain all relationships?


Not always! Sometimes theory or experience suggests that a linear functional form is inappropriate. So, you might consider a nonlinear form.

1. Polynomial Models

Includes terms that are raised to some power, usually quadratic, that is, $X^2$.

If we expect a U shape or inverted U shape, then we include a squared predictor in the model:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_2 + \varepsilon$$

The effect of a 1-unit change in $X_1$ is $\beta_1 + 2\beta_2 X_1 = \partial Y / \partial X_1$.

The effect of a 1-unit change in $X_2$ is $\beta_3 = \partial Y / \partial X_2$.
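In the quadratic model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_2 + \varepsilon$, setting the effect of $X_1$ to zero locates where the curve turns. This is a short worked derivation (standard calculus, not taken from the notes):

$$\frac{\partial Y}{\partial X_1} = \beta_1 + 2\beta_2 X_1 = 0 \quad\Longrightarrow\quad X_1^{*} = -\frac{\beta_1}{2\beta_2}$$

If $\beta_2 < 0$ (inverted U), Y reaches its maximum at $X_1^{*}$; if $\beta_2 > 0$ (U shape), Y reaches its minimum there.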

Hypothesis testing:


If you want to know whether a variable with a quadratic or higher-order term has an impact on your model, do F-test #2, omitting both the plain term and the higher-order term (for the model above, $H_0: \beta_1 = \beta_2 = 0$).

If you want to know if the model should really be in linear form, do a t-test on the quadratic term.

2. Inverse Models

If the impact of a particular independent variable is expected to approach 0 as it increases, then you should use the inverse form in the regression model:

$$Y = \beta_0 + \beta_1 \left(\frac{1}{X_1}\right) + \beta_2 X_2 + \varepsilon$$

The effect of X on Y shrinks quickly as X increases. Can't use for explanatory variables with values of 0, since 1/0 is undefined.

A 1-unit change in $X_1$ is associated with a $-\beta_1 \left(1/X_1^2\right)$ change in Y.

An example is the relationship between rate of unemployment and the percentage change in wages. The theory is that the percentage change in wages is negatively related to the rate of unemployment, but past some level of unemployment, further increases in unemployment do not reduce the level of wage increases any further.


3. Logarithmic Models

Basically, logs are exponents. We typically use natural logs, which are logs in base e (e ≈ 2.718).

$\ln(x) = b$ means that $e^b = x$,

so, since $e^2 \approx 7.389$, $\ln(7.389) \approx 2$.

Two other properties of logs:

1) $\ln(X \cdot Y) = \ln X + \ln Y$

2) $\ln(X^2) = 2 \cdot \ln X$
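The definition and the two properties above can be checked numerically with Python's standard `math` module. The values 3.0 and 5.0 are arbitrary and serve only as illustration.

```python
import math

# Natural log is the inverse of exp: ln(x) = b  <=>  e**b = x
b = 2.0
x = math.exp(b)  # e**2, about 7.389
assert math.isclose(math.log(x), b)

# Arbitrary positive values used only for illustration
X, Y = 3.0, 5.0

# Property 1: ln(X * Y) = ln X + ln Y
assert math.isclose(math.log(X * Y), math.log(X) + math.log(Y))

# Property 2: ln(X**2) = 2 * ln X
assert math.isclose(math.log(X ** 2), 2 * math.log(X))

print(f"e**2 = {x:.3f}")
```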

Why do we use logs? Logging shrinks large numbers.

We use logs to reduce the absolute size of numbers and get at the meaning behind them

We use logs to make it easy to figure out impacts in percentage terms.

We can log both explanatory and outcome variables but ONLY IF THEY ARE POSITIVE AND NON-ZERO.

In SPSS or Excel, if you take the log of 0 or a negative number, you will get error messages.

Often, when taking the log of income, economists will add 1 to the variable before taking the log.

If it is necessary to take the log of a dummy variable, the variable needs to be transformed. Redefine the variable so that it takes on the values of one and e (so its log takes on the values 0 and 1). The interpretation of β remains the same. Such a transformation changes the coefficient value but not the usefulness or theoretical validity of the dummy variable.
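A minimal sketch of the two workarounds above, using Python's standard `math` module. The income values and the dummy series are hypothetical illustrations, not course data. The mapping D' = 1 + (e - 1)·D is one assumed way to redefine a 0/1 dummy as a 1/e variable.

```python
import math

# Hypothetical incomes, including a zero, for illustration only.
incomes = [0.0, 1.0, 25_000.0]

# ln(0) is undefined: math.log raises ValueError, much as SPSS or Excel
# report an error for the log of 0 or a negative number.
raised = False
try:
    math.log(incomes[0])
except ValueError:
    raised = True

# Add 1 before logging so zero incomes stay in the sample: ln(1 + 0) = 0.
# math.log1p(x) computes ln(1 + x) directly.
log_incomes = [math.log1p(v) for v in incomes]

# Hypothetical 0/1 dummy redefined as D' = 1 + (e - 1) * D, which takes
# the values 1 and e, so ln(D') takes the values 0 and 1 again.
dummy = [0, 1, 1, 0]
dummy_redefined = [1 + (math.e - 1) * d for d in dummy]
log_dummy = [math.log(d) for d in dummy_redefined]

print(log_incomes)
print(log_dummy)
```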

When we log the dependent variable, the effect of a 1-unit change in X depends on the levels of all variables.


(A) Double log models (log-log form)

Both the dependent and independent variables are logged (see Figure 7.2, page 220, in Studenmund):

$$\ln Y = \beta_0 + \beta_1 \ln X_1 + \beta_2 \ln X_2$$

$$Y = e^{\beta_0} X_1^{\beta_1} X_2^{\beta_2}$$

A 1% change in $X_1$ is associated with a $\beta_1$% change in Y, because in the double-log model the coefficients are the elasticities (holding the other X’s constant).

Used to model a curve increasing at an increasing or decreasing rate or decreasing at an increasing rate.

Cannot use with non-positive explanatory or dependent variable (cannot be used with negative or zero data values).

Used if elasticities are constant and slopes are not constant
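A short worked derivation (standard calculus, not from the notes) shows why the coefficients in $\ln Y = \beta_0 + \beta_1 \ln X_1 + \beta_2 \ln X_2$ are elasticities. It uses $d\ln Y = dY/Y$ and $d\ln X_1 = dX_1/X_1$:

$$\frac{\partial \ln Y}{\partial \ln X_1} = \frac{\partial Y / Y}{\partial X_1 / X_1} = \frac{\partial Y}{\partial X_1}\cdot\frac{X_1}{Y} = \beta_1$$

The result does not depend on the values of $X_1$ or $Y$, which is why the elasticity is constant. The slope $\partial Y / \partial X_1 = \beta_1 Y / X_1$ is not constant.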


(B) Semi-log models

Has at least one logged factor (but not all), either the outcome or an explanatory factor

$$\ln Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \qquad \text{(left-side semi-log)}$$

Effect of X increases in magnitude as outcome increases

Slope and Elasticity are both non-constant

A 1-unit change in $X_1$ is associated with approximately a $100\beta_1$% change in Y, holding other factors constant.

Can’t use with non-positive dependent variable

Useful for any model in which Y adjusts in percentage terms to a unit change in X (e.g., salaries).

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 \ln X_2 \qquad \text{(right-side semi-log)}$$

The effect of $X_2$ on Y decreases as $X_2$ gets larger.

$X_1$ is linearly related to Y; $X_2$ is nonlinearly related to Y. A 1% change in $X_2$ is associated with a $\beta_2/100$-unit change in Y. Can't use with non-positive values of $X_2$. The elasticity of Y with respect to $X_2$ is $\beta_2 / Y$.

An example would be a model of consumption with respect to income.
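A short worked derivation (standard calculus, not from the notes) for the right-side semi-log model $Y = \beta_0 + \beta_1 X_1 + \beta_2 \ln X_2$ shows where the $\beta_2/100$ rule and the elasticity come from:

$$\frac{\partial Y}{\partial X_2} = \frac{\beta_2}{X_2}, \qquad \Delta Y \approx \frac{\beta_2}{X_2}\,\Delta X_2 = \beta_2 \cdot \frac{\Delta X_2}{X_2} = \beta_2 \cdot 0.01 = \frac{\beta_2}{100} \quad \text{for a 1\% change in } X_2$$

$$\varepsilon_{Y,X_2} = \frac{\partial Y}{\partial X_2} \cdot \frac{X_2}{Y} = \frac{\beta_2}{Y}$$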

CAUTION: YOU CANNOT COMPARE THE ADJUSTED R² FOR MODELS WITH AND WITHOUT Y TRANSFORMED.

How do we know if we have the right model?


Theory about shape, predictive power, tests of residuals (next time).

Examples

Ex #3 (Polynomial Example): In Assignment 3, we look at the effect of age, earnings, marital status, King County residency, and commute time on rent.

We will add PEARN97², earnings squared, to the set of predictors. Here’s the model:

$$\text{Q3P5} = \beta_0 + \beta_1(\text{PEARN97}) + \beta_2(\text{AGE}) + \beta_3(\text{Q8P4}) + \beta_4(\text{KING}) + \beta_5(\text{MARRIED}) + \beta_6(\text{PEARN97}^2) + \varepsilon$$

Here’s the fitted model:

$$\widehat{\text{Q3P5}} = 447.9 + 2.733\times10^{-3}(\text{PEARN97}) - 0.742(\text{AGE}) + 0.416(\text{Q8P4}) + 144.85(\text{KING}) + 62.664(\text{MARRIED}) - 5.377\times10^{-9}(\text{PEARN97}^2)$$

a) Why do we expect the sign on PEARN97 to be positive and on PEARN97^2 to be negative?

b) What’s the slope of PEARN97 and how do we interpret it?

c) At what level of earnings is rent expected to decrease? That is, where does the curve flatten or turn downward?

HINT: Solve for the value of PEARN97 where $\partial Y / \partial X = \partial\,\text{RENT} / \partial\,\text{PEARN97} = 0$.
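A minimal sketch for parts (b) and (c), using only the two PEARN97 coefficients from the fitted model. The other terms drop out of the derivative. The function name `rent_slope` and the three earnings levels are hypothetical illustrations. Results are in whatever units PEARN97 is measured in, presumably annual earnings.

```python
# Ex #3 sketch: slope and turning point of rent in earnings (PEARN97).
B1 = 2.733e-03   # coefficient on PEARN97
B6 = -5.377e-09  # coefficient on PEARN97^2


def rent_slope(pearn97: float) -> float:
    """d(Rent)/d(PEARN97) = B1 + 2 * B6 * PEARN97."""
    return B1 + 2 * B6 * pearn97


# The slope is zero where B1 + 2*B6*PEARN97 = 0, i.e. PEARN97 = -B1/(2*B6).
turning_point = -B1 / (2 * B6)
print(f"Rent turns down at PEARN97 of about {turning_point:,.0f}")

# Illustrative earnings levels: slope is positive below the turning
# point and negative above it.
for earnings in (20_000, 100_000, 300_000):
    print(earnings, round(rent_slope(earnings), 6))
```

Because the coefficient on PEARN97² is negative, the curve is an inverted U. Rent rises with earnings at a decreasing rate until the turning point, about 254,138, and falls after it.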


Ex #4: (Double Log Example)

In a 1957 study, Murti and Sastri fit the following function for the cotton and sugar industries in India:

$$\ln Q_i = \beta_0 + \beta_1 \ln L_i + \beta_2 \ln K_i + \varepsilon_i$$

where Q = output (of cotton or sugar), L = labor, and K = capital.

For the Cotton Industry:
$\ln Q_i = 0.97 + 0.92 \ln L_i + 0.12 \ln K_i$
SE:        (0.03)   (0.04)
sign?      (+)      (+)
t-value:   30.7     3.0      (sig. at the 5% level?)
R² = .98

For the Sugar Industry:
$\ln Q_i = 2.70 + 0.59 \ln L_i + 0.33 \ln K_i$
SE:        (0.14)   (0.17)
sign?      (+)      (+)
t-value:   4.2      1.94     (sig. at the 5% level?)
R² = .80

a) Hypothesize and test appropriate null hypotheses at the 5% level of significance.

b) What are the elasticities of output with respect to labor and capital for each industry? What do they mean?
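A minimal sketch that recomputes each t-statistic as coefficient / standard error and reads the elasticities straight off the double-log coefficients. For cotton labor it uses an SE of 0.03, the value consistent with the reported t of 30.7. The sketch does not pick a critical t value, because the sample sizes (and so the degrees of freedom) are not given here. The dictionary layout is a hypothetical choice.

```python
# Ex #4 sketch: t-statistics and elasticities for the Murti and Sastri
# double-log production functions.

results = {
    # industry: {input: (coefficient, standard error)}
    "cotton": {"labor": (0.92, 0.03), "capital": (0.12, 0.04)},
    "sugar": {"labor": (0.59, 0.14), "capital": (0.33, 0.17)},
}

t_stats = {}
for industry, coefs in results.items():
    for inp, (beta, se) in coefs.items():
        t_stats[(industry, inp)] = beta / se
        # In a log-log model the coefficient is the elasticity:
        # a 1% rise in the input goes with about a beta% rise in output.
        print(f"{industry:6s} {inp:7s} elasticity={beta:.2f} "
              f"t={t_stats[(industry, inp)]:.2f}")
```

For example, in cotton a 1% increase in labor is associated with about a 0.92% increase in output, holding capital constant.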


Ex #5: (Inverse Example)

What makes a car accelerate well? Here is an equation that tests different car attributes:

$S_i$ = the number of seconds it takes the ith car to accelerate from 0 to 62 mph.
$T_i$ = a dummy equal to 1 if the car has a manual transmission, 0 if not.
$E_i$ = the coefficient of drag of the ith car (e.g., drag is low for a jet and high for a parachute; high drag slows the car down, low drag doesn’t).
$P_i$ = the curb weight (in pounds) of the ith car.
$H_i$ = the horsepower (bhp) of the ith car.

$$\hat{S}_i = -2.16 - 1.59T_i + 7.4E_i + 0.0013P_i + 886\left(\frac{1}{H_i}\right)$$

SE:   (0.50)   (3.2)   (0.0005)   (102)
t:    -3.15    2.28    2.64       8.66

Adjusted R² = .748, n = 38

a) What relationships do you expect the explanatory factors to have with the dependent variable? (Positive or negative? Think about the transformation.)

b) What is the effect of horsepower on time to accelerate? Evaluate at 195 bhp (your typical GM car) and 326 bhp (a BMW 7 series). Interpret your results.
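A minimal sketch for part (b), assuming the inverse-term effect derived earlier: the marginal effect of H on S is $-886/H^2$ seconds per additional bhp. The function name `accel_effect_of_hp` is hypothetical.

```python
# Ex #5 sketch: marginal effect of horsepower in the inverse term.
B_INV_HP = 886.0  # coefficient on (1/H)


def accel_effect_of_hp(hp: float) -> float:
    """Change in 0-62 mph time (seconds) for a 1-bhp increase at `hp`."""
    return -B_INV_HP / hp ** 2


for label, hp in (("GM, 195 bhp", 195), ("BMW 7 series, 326 bhp", 326)):
    print(f"{label}: {accel_effect_of_hp(hp):.4f} s per extra bhp")
```

An extra bhp cuts about 0.023 seconds at 195 bhp but only about 0.008 seconds at 326 bhp. This is the diminishing effect the inverse form is designed to capture.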

Ex #6: (Semi-log example)

$$\ln(\text{wage}_i) = \beta_0 + 0.032\,\text{exp}_i + e_i$$

where years of experience (exp) predicts annual wages.

a) What is the effect of experience on wages? (interpret the coefficient)
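A minimal sketch comparing the quick "100 × β" reading of a left-side semi-log coefficient with the exact percentage change, 100 × (e^β − 1). The exact formula follows because a 1-unit increase in exp multiplies wage by e^β.

```python
import math

B_EXP = 0.032  # coefficient on years of experience in ln(wage)

approx_pct = 100 * B_EXP                 # quick rule: 100 * beta
exact_pct = 100 * (math.exp(B_EXP) - 1)  # exact: 100 * (e**beta - 1)

print(f"approx {approx_pct:.2f}%, exact {exact_pct:.2f}%")
```

Each additional year of experience is associated with roughly a 3.2% higher annual wage (3.25% exactly). For small coefficients like this one, the two readings are nearly identical.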
