Lecture 10: Nonlinear Regression Functions · A General Strategy For Modeling Nonlinear Regression Functions TestScoresanddistrictincome Testscorescanbe determinedbyaverage districtincome

Lecture 10: Nonlinear Regression Functions

Zheng Tian

Zheng Tian Lecture 10: Nonlinear Regression Functions 1 / 46

Outline

1 Introduction

2 A General Strategy For Modeling Nonlinear Regression Functions

3 Nonlinear functions of a single independent variable

4 Interactions between independent variables

5 Warm-up exercises

6 Regression Functions That Are Nonlinear in the Parameters


Introduction

Overview

Linear population regression function

E (Yi | Xi ) = β0 + β1Xi1 + · · ·+ βkXik , where Xi = (Xi1, . . . ,Xik)′.

Nonlinear population regression function

E (Yi | Xi ) = f (Xi1,Xi2, . . . ,Xik ;β1, β2, . . . , βm), where f (·) is a nonlinearfunction.

Study questions

Why do we need to use nonlinear regression models?What types of nonlinear regression models can we estimate by OLS?How can we interpret the coefficients in nonlinear regression models?


A General Strategy For Modeling Nonlinear Regression Functions

Test Scores and district income

Test scores can bedetermined by averagedistrict incomeWe estimate a simple linearregression model

TestScore = β0+β1Income+u

What’s the problem withthe simple linear regressionmodel?

Figure: Scatterplot of test score vs district incomeand a linear regression line



Why does a simple linear regression model not fit the datawell?

Data points are below the OLS line when income is very low (under $10,000)or very high (over $40,000), and are above the line when income is between$15,000 and $30,000.

The scatterplot may imply a curvature in the relationship between test scoresand income.That is, a unit increase in income may have larger effect on test scores whenincome is very low than when income is very high.

The linear regression line cannot capture the curvature because the effect ofdistrict income on test scores is constant over all the range of income since

∆TestScore/∆Income = β1

where β1 is constant.



Estimate a quadratic regression model

TestScore = β0 + β1Income + β2Income2 + u (1)

This model is nonlinear, specifically quadratic, with respect to Income sincewe include the squared income.The population regression function is

E (TestScore|Income) = β0 + β1Income + β2Income2

It is linear with respect to β. So we can still use the OLS estimation andcarry out hypothesis testing as we do with a linear regression model.



Estimate a quadratic regression model (cont’d)

Figure: Scatterplot of test score vs district income and a quadratic regression line



A general formula for a nonlinear population regressionfunction

A general nonlinear regression model is

Yi = f (Xi1,Xi2, . . . ,Xik ;β1, β2, . . . , βm) + ui (2)

The population nonlinear regression function:

E (Yi |Xi1, . . . ,Xik) = f (Xi1,X2i , . . . ,Xik ;β1, β2, . . . , βm)

The number of regressors and the number of parameters are not necessarilyequal in the nonlinear regression model.In vector notation

Yi = f (Xi ;β) + ui (3)

We focus on the nonlinear regression models such that f (·) is nonlinear withXi but linear with β.



The effect on Y of a change in a regressor

For any general nonlinear regression function

The effect on Y of a change in one regressor, say X1, holding other thingsconstant, can be computed as

∆Y = f (X1 + ∆X1,X2, . . . ,Xk ;β)− f (X1,X2, . . . ,Xk ;β) (4)

For continuous and differentiable nonlinear functions

When X1 and Y are continuous variables and f (·) is differentiable, the marginaleffect of X1 is the partial derivative of f with respect to X1, that is, holding otherthings constant

dY =∂f (X1, . . . ,Xk ;β)

∂XidXi

because dXj = 0 for j 6= i



Application to test scores and income

Estimation

TestScore = 607.3(2.9)

+ 3.85(0.27)

Income − 0.0423(0.0048)

Income2, R2 = 0.554 (5)

Hypothesis test

Test H0 : β2 = 0 vs. H1 : β2 6= 0.

t =−0.04230.0048

= −8.81 > −1.96

We reject the null at the 1%, 5% and 10% significance levels, and therefore,confirm the quadratic relationship between test scores and income.



The effect of change in income on test scores

A change in income from $10 thousand to $20 thousand

∆Y = β0 + β1 × 11 + β2 × 112 − (β0 + β1 × 10 + β2 × 102)

= β1(11− 10) + β2(112 − 102)

= 3.85− 0.0423× 21 = 2.96

A change in income from $40 thousand to $41 thousand

∆Y = β0 + β1 × 41 + β2 × 412 − (β0 + β1 × 40 + β2 × 402)

= β1(41− 40) + β2(412 − 402)

= 3.85− 0.0423× 81 = 0.42



A general approach to modeling nonlinearities using multipleregression

1 Identify a possible nonlinear relationship.Economic theoryScatterplotsYour judgment and experts’ opinions

2 Specify a nonlinear function and estimate its parameters by OLS.The OLS estimation and inference techniques can be used as usual when theregression function is linear with respect to β.

3 Determine whether the nonlinear model can improve a linear modelUse t- and/or F-statistics to test the null hypothesis that the populationregression function is linear against the alternative that it is nonlinear.

4 Plot the estimated nonlinear regression function.5 Compute the effect on Y of a change in X and interpret the results.


Nonlinear functions of a single independent variable

Polynomials

A polynomial regression model of degree r

Yi = β0 + β1Xi + β2X2i + · · ·+ βrX

ri + ui (6)

r = 2: a quadratic regression modelr = 3: a cubic regression modelUse the OLS method to estimate β1, β2, . . . , βr .

Testing the null hypothesis that the population regression function is linear

H0 : β2 = 0, β3 = 0, ..., βr = 0 vs. H1 : at least one βj 6= 0, j = 2, . . . , r

Use F statistic to test this joint hypothesis. The number of restriction is q = r − 1.



What is ∆Y /∆X in a polynomial regression model?

Consider a cubic model and continuous X and Y

Y = β0 + β1X + β2X2 + β3X

3 + u

Then, we can calculate

dYdX

= β1 + 2β2X + 3β3X2

The effect of a unit change in X on Y depends on the value of X atevaluation.



Which degree of polynomial should I use?

Balance a trade-off between flexibility and statistical precision.Flexibility. Relate Y to X in more complicated way than simple linearregression.Statistical precision. X ,X 2,X 3, . . . are correlated so that there is the problemof imperfect multicollinearity.

Follow a sequential hypothesis testing procedure1 Pick a maximum value of r and estimate the polynomial regression for that r .2 Follow a "deletion" rule based on t-statistic or F-statistic.



Application to district income and test scores

We estimate a cubic regression model relating test scores to district income asfollows

TestScore = 600.1(5.1)

+ 5.02(0.71)

Income− 0.096(0.029)

Income2+ 0.00069(0.00035)

Income3, R2 = 0.555

Test whether it is a cubic modelThe t-statistic for H0 : β3 = 0 is 1.97 ⇒ Fail to reject

Test whether it is a nonlinear modelThe F-statistic for H0 : β2 = β3 = 0 is 37.7, p-value < 0.01

Interpretation of coefficients

Use the general formula of interpreting the effect of ∆X on Y .



A natural logarithmic function y = ln(x)

Properties of ln(x)

ln(1/x) = − ln(x), ln(ax) = ln(a) + ln(x)

ln(x/a) = ln(x)− ln(a), and ln(xa) = a ln(x)

The derivative of ln(x) is

d ln(x)

dx= lim

∆x→0

ln(x + ∆x)− ln(x)

∆x=

1x.

It follows that d ln(x) = dx/x , representing the percentage change in x .



The percentage-change form using ln(x)

The change in ln(X ) represents the percentage change in X

ln(x + ∆x)− ln(x) ≈ ∆x

xwhen ∆x is small.

The Taylor expansion of ln(x + ∆x) at x , which is

ln(x + ∆x) = ln(x) +d ln(x)

dx(x + ∆x − x) +

12!

d2 ln(x)

dx2 (x + ∆x − x)2 + · · ·

= ln(x) +∆x

x− ∆x2

2x2 + · · ·

When ∆x is very small, we can omit the terms with ∆x2,∆x3, etc. Thus, wehave ln(x + ∆x)− ln(x) ≈ ∆x

x when ∆x is small.



The three logarithmic regression models

There are three types of logarithmic regression models:

Linear-log modelLog-linear modelLog-log model

Differences in logarithmic transformation of X and/or Y lead to differences ininterpretation of the coefficient.



Case I: linear-log model

Model form. X is in logarithms, Y is not.

Yi = β0 + β1 ln(Xi ) + ui , i = 1, . . . , n (7)

Interpretation. a 1% change in X is associated with a change in Y of 0.01β1

∆Y = β1 ln(X + ∆X )− β1 ln(X ) ≈ β1∆X

X

Example. The estimated model is

TestScore = 557.8 + 36.42 ln(Income)

1% increase in average district income results in an increase in test scores by0.01× 36.42 = 0.36 point.



Case II: log-linear model

Model form. Y is in logarithms, X is not.

ln(Yi ) = β0 + β1Xi + ui (8)

Interpretation. A one-unit change in X is associated with a 100× β1%change in Y because

∆Y

Y≈ ln(Y + ∆Y )− ln(Y ) = β1∆X

Example.ln(Earnings) = 2.805 + 0.0087Age

Earnings are predicted to increase by 0.87% for each additional year of age.



Case III: log-log model

Model form. Both X and Y are in logarithms.

ln(Yi ) = β0 + β1 ln(Xi ) + ui (9)

Interpretation: elasticity. 1% change in X is associated with a β1% change inY because

∆Y

Y≈ ln(Y + ∆Y )− ln(Y ) = β1(ln(X + ∆X )− ln(X )) ≈ β1

∆X

X

β1 is the elasticity of Y with respect to X , that is

β1 =100× (∆Y /Y )

100× (∆X/X )=

percentage change in Y

percentage change in X

With the derivative, β1 = d ln(Y )/d ln(X ) = (dY /Y )/(dX/X ).

Example. The log-log model of the test score application is estimated as

ln(TestScore) = 6.336 + 0.0544 ln(Income)

This implies that a 1% increase in income corresponds to a 0.0544% increasein test scores.



The log-linear and log-log regression functions

Figure: The log-linear and log-log regression functions



Summary

Regression specification Interpretation of β1

Y = β0 + β1 ln(X ) + u A 1% change in X is associated with achange in Y of 0.01β1

ln(Y ) = β0 + β1X + u A change in X by one unit is associatedwith a 100β1% change in Y

ln(Y ) = β0 + β1 ln(X ) + u A 1% change in X is associated with aβ1% change in Y, so β1 is the elasticityof Y with respect to X


Interactions between independent variables


Interaction between two binary variables

Interaction between a continuous and a binary variable

Interaction between two continuous variables



The regression model with interaction between two binaryvariables

Two binary variables

D1i = 1 if the ith person has a college degree, and 0 otherwise.D2i = 1 if the ith person is female, and 0 otherwise.

A regression with an interaction term of two binary variables

Consider a regresion model concerning the effects of education and gender onearnings. The population regression function is

Yi = β0 + β1D1i + β2D2i + β3(D1i × D2i ) + ui (10)

The dependent variable: Yi , where Yi = Earningsi

D1i × D2i is the interaction term.



The method of interpreting coefficients in regressions withinteracted binary variables

We can follow a general rule for interpreting coefficients in Equation (10):

First compute the expected values of Y for each possible case described bythe set of binary variables.

Next compare these expected values. Each coefficient can then be expressedeither as an expected value or as the difference between two or moreexpected values.



Compute the expected values of Y for each possiblecombinations of D1 and D2

Case 1 E (Yi |D1i = 0,D2i = 0) = β0: the average income of malenon-college graduates.

Case 2 E (Yi |D1i = 1,D2i = 0) = β0 + β1: the average income malecollege graduates.

Case 3 E (Yi |D1i = 0,D2i = 1) = β0 + β2: the average income of femalenon-college graduates.

Case 4 E (Yi |D1i = 1,D2i = 1) = β0 + β1 + β2 + β3: the average incomeof female college graduates.



Compute the difference between a pair of cases

Case 1 vs. Case 2 E (Yi |D1i = 1,D2i = 0)− E (Yi |D1i = 0,D2i = 0) = β1: theaverage income difference between college graduates andnon-college graduates among male workers.

Case 1 vs. Case 3 E (Yi |D1i = 0,D2i = 1)− E (Yi |D1i = 0,D2i = 0) = β2: theaverage income difference between female and male workers whoare not college graduates.

Case 1 vs. Case 4E (Yi |D1i = 1,D2i = 1)− E (Yi |D1i = 0,D2i = 0) = β1 + β2 + β3:the average income difference between female college graduatesand male non-college graduates.



Compute the difference between a pair of cases (cont’d)

Case 2 vs. Case 3 E (Yi |D1i = 0,D2i = 1)− E (Yi |D1i = 1,D2i = 0) = β2 − β1.Thus, the average income difference between female non-collegegraduates and male college graduates is β2 − β1.

Case 2 vs. Case 4 E (Yi |D1i = 1,D2i = 1)− E (Yi |D1i = 1,D2i = 0) = β2 + β3.Thus, the average income difference between female collegegraduates and male college graduates is β2 + β3.

Case 3 vs. Case 4 E (Yi |D1i = 1,D2i = 1)− E (Yi |D1i = 0,D2i = 1) = β1 + β3.Thus, the average income difference between female collegegraduates and female non-college graduates is β1 + β3.



Hypothesis testing

We can use t-statistic or F-statistic to test whether the differences betweendifferent cases are statistically significant.

The null hypothesis: H0 : β2 = 0 vs. H1 : β2 6= 0.

What is this test for?What test statistic can we use?

The hypothesis is H0 : β1 + β3 = 0 vs. H1 : β1 + β3 6= 0.

What is this test for?What test statistic can we use?



Interactions between a continuous and a binary variable

Consider the population regression of earnings (Yi ) against

one continuous variable, individual’s years of work experience (Xi ), andone binary variable, whether the worker has a college degree (Di , whereDi = 1 if the ith person is a college graduate).

As shown in the next figure, the population regression line relating Y and X candepend on D in three different ways.



Interactions between a continuous and a binary variable:graphic representation

Figure: Regression Functions Using Binary and Continuous VariablesZheng Tian Lecture 10: Nonlinear Regression Functions 33 / 46


Different intercept, same slope: (a) in Figure 4

Yi = β0 + β1Xi + β2Di + ui (11)

From Equation (11), we have the population regression functions asE(Yi |Di = 1) = (β0 + β2) + β1Xi

E(Yi |Di = 0) = β0 + β1Xi .

Thus, E (Yi |Di = 1)− E (Yi |Di = 0) = β2.The average initial salary of college graduates is higher than non-collegegraduates by β2, and this gap persists at the same magnitude regardless ofhow many years a worker has been working.



Different intercepts and different slopes: (b) in Figure 4

Equation (11):Yi = β0 + β1Xi + β2Di + β3(Xi × Di ) + ui (12)

The population regression functions for the two cases areE(Yi |Di = 1) = (β0 + β2) + (β1 + β3)Xi

E(Yi |Di = 0) = β0 + β1Xi .

Thus, β2 is the difference in intercepts and β3 is the difference in slopes.The average initial salary of college graduates is higher than non-collegegraduates by β2, and this gap will widen (or narrow) depending on the effectof the years of work experience on earnings.



Different intercepts and same intercept: (c) in Figure 4

Yi = β0 + β1Xi + β2(Xi × Di ) + ui (13)

The population regression functions for the two cases areE(Yi |Di = 1) = β0 + (β1 + β2)Xi

E(Yi |Di = 0) = β0 + β1Xi .

Thus, there is only a difference in the slope but not in the intercept.Although college graduates have the same starting salary as those withoucolledge degree, the raise in salary and promotion of the former will be fasterthan the latter.



Interactions between two continuous variables

Now we consider the regression of earnings against two continuous variables, onefor the years of work experience (X1) and another for the years of schooling (X2).

The interaction model is

Yi = β0 + β1X1i + β2X2i + β3(X1i × X2i ) + ui (14)

The effect of a change in X1, holding X2 constant, is

∆Y

∆X1= β1 + β3X2

Similarly, the effect of a change in X2, holding X1 constant, is

∆Y

∆X2= β1 + β3X1


Warm-up exercises

Question 1

The interpretation of the slope coefficient in the modelln(Yi ) = β0 + β1 ln(Xi ) + ui is as follows:

A) a 1% change in X is associated with a β1% change in Y.B) a change in X by one unit is associated with a β1 change in Y.C) a change in X by one unit is associated with a 100 β1 % change in

Y.D) a 1% change in X is associated with a change in Y of 0.01β1.

Answer: A


Warm-up exercises

Question 1

The interpretation of the slope coefficient in the modelln(Yi ) = β0 + β1 ln(Xi ) + ui is as follows:

A) a 1% change in X is associated with a β1% change in Y.B) a change in X by one unit is associated with a β1 change in Y.C) a change in X by one unit is associated with a 100 β1 % change in

Y.D) a 1% change in X is associated with a change in Y of 0.01β1.

Answer: A


Warm-up exercises

Question 2

In the regression model Yi = β0 + β1Xi + β2Di + β3(Xi × Di ) + ui , where X is acontinuous variable and D is a binary variable, to test that the two regressions areidentical, you must use the

A) t-statistic separately for β2 = 0, β3 = 0.B) F-statistic for the joint hypothesis that β0 = 0, β1 = 0C) t-statistic separately for β3 = 0D) F-statistic for the joint hypothesis that β2 = 0, β3 = 0.

Answer: D


Warm-up exercises

Question 2



Answer: D


Warm-up exercises

Question 2



Answer: D


Warm-up exercises

Question 3

(Requires Calculus) In the equationTestScore = 607.3 + 3.85Income − 0.0423Income2, the following income level

results in the maximum test score

A) 607.3.B) 91.02.C) 45.50.D) cannot be determined without a plot of the data.

Answer: C


Warm-up exercises

Question 3

(Requires Calculus) In the equationTestScore = 607.3 + 3.85Income − 0.0423Income2, the following income level

results in the maximum test score

A) 607.3.B) 91.02.C) 45.50.D) cannot be determined without a plot of the data.

Answer: C


Regression Functions That Are Nonlinear in the Parameters

Nonlinear regression models and nonlinear least squaresestimator

All the regression models that we have discussed in this lecture are nonlinear inthe regressors but linear in parameters so that we can still treat them as linearregression models and estimate using the OLS.

However, there exist regression models that are nonlinear in parameters. For thesemodels, we can either transform them to the "linear" type of models or estimateusing the nonlinear least squares (NLS) estimators.



Transform a nonlinear model to a linear one

Suppose we have a nonlinear regression model as follows

Yi = αXβ11i X

β22i · · ·X

βk

ki eui (15)

Taking the natural logarithmic function on both sides of the equation

ln(Yi ) = ln(α) + β1 ln(X1i ) + β2 ln(X2i ) + · · ·+ βk ln(Xki ) + ui (16)

Equation (15) becomes a log-log regression model, which is linear in allparameters and can be estimated using the OLS. Let β0 = ln(α) and α = eβ0 .βi for i = 1, 2, . . . , k are the elasticities of Y with respect to Xi .



A nonlinear model: logistic function

A dependent variable can only take values between 0 and 1.

The logistic regression model with k regressors is

Yi =1

1 + exp(β0 + β1X1i + · · ·βkXki )+ ui (17)

For small values of X , the value of the function is nearly 0 and the shape isflat.For large values of X , the function approaches 1 and the slope is flat again.



A nonlinear model: negative exponential growth function

The effect of X on Y must be positive and the effect is bounded by a upperbound.Use the negative-exponential growth function to set up a regression model asfollows

Yi = β0[1− exp(−β1(Xi − β2))] + ui (18)

The slope is positive for all values of X .The slope is greatest at low values of X and decreases as X increases.There is an upper bound, that is, a limit of Y as X goes to infinity, β0.



Logistic and negative exponential growth curves

Figure: The logistic and negative exponential growth functions



The nonlinear least squares estimators

For a nonlinear regression function

Yi = f (X1, . . . ,Xk ;β1, . . . , βm) + ui

which is nonlinear in both X and β, we can obtain the estimated parameters bynonlinear least squares (NLS) estimation.

The essential idea of NLS is the same as OLS, which is to minimize the sum ofsquared prediction mistakes. That is

minb1,...,bm

S(b1, . . . , bm) =n∑

i=1

[Yi − f (X1, . . . ,Xk ; b1, . . . , bm)]2

The solution to this minimization problem is the nonlinear least squares estimators.


Documents

Lecture 10: Nonlinear Regression Functions · A General Strategy For Modeling Nonlinear Regression Functions TestScoresanddistrictincome Testscorescanbe determinedbyaverage districtincome