405 ECONOMETRICS
Chapter # 2: TWO-VARIABLE REGRESSION ANALYSIS: SOME BASIC IDEAS
Damodar N. Gujarati
Prof. M. El-Sakka, Dept of Economics, Kuwait University


A HYPOTHETICAL EXAMPLE

• Regression analysis is largely concerned with estimating and/or predicting the (population) mean value of the dependent variable on the basis of the known or fixed values of the explanatory variable(s).

• Look at Table 2.1, which refers to a total population of 60 families and their weekly income (X) and weekly consumption expenditure (Y). The 60 families are divided into 10 income groups.

• There is considerable variation in weekly consumption expenditure in each income group. But the general picture that one gets is that, despite the variability of weekly consumption expenditure within each income bracket, on the average, weekly consumption expenditure increases as income increases.


Note: the conditional mean of Y given X is $E(Y \mid X)$. The covariance of X and Y is $\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$.

• The dark circled points in Figure 2.1 show the conditional mean values of Y against the various X values. If we join these conditional mean values, we obtain what is known as the population regression line (PRL), or more generally, the population regression curve. More simply, it is the regression of Y on X. The adjective “population” comes from the fact that we are dealing in this example with the entire population of 60 families. Of course, in reality a population may have many families.
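The idea of joining conditional means can be sketched in a few lines of Python. This is only an illustration: the X = 80 group uses the five expenditures listed later in equations (2.4.3), while the other two income groups and the names `population` and `conditional_means` are hypothetical and do not reproduce Table 2.1.

```python
from statistics import mean

# Hypothetical population: weekly income X -> weekly consumption expenditures Y.
# Only the X = 80 group (55, 60, 65, 70, 75) comes from the text; the other
# groups are invented for illustration and are NOT Table 2.1.
population = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
}

def conditional_means(groups):
    """Return {X: E(Y | X)}: the mean of Y within each income group."""
    return {x: mean(ys) for x, ys in groups.items()}

# The (X, E(Y | X)) pairs are the points that the population regression
# line (or curve) passes through.
prl_points = conditional_means(population)
for x, ey in sorted(prl_points.items()):
    print(f"E(Y | X = {x}) = {ey}")
```

Each printed pair is one point on the population regression line; plotting them against X would reproduce the kind of picture described for Figure 2.1, under the invented data above.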

THE CONCEPT OF POPULATION REGRESSION FUNCTION (PRF)

• From the preceding discussion and Figures 2.1 and 2.2, it is clear that each conditional mean $E(Y \mid X_i)$ is a function of $X_i$. Symbolically,

• $E(Y \mid X_i) = f(X_i)$ (2.2.1)

• Equation (2.2.1) is known as the conditional expectation function (CEF) or population regression function (PRF) or population regression (PR) for short.

• The functional form of the PRF is an empirical question. For example, we may assume that the PRF $E(Y \mid X_i)$ is a linear function of $X_i$, say, of the type

• $E(Y \mid X_i) = \beta_1 + \beta_2 X_i$ (2.2.2)

THE MEANING OF THE TERM LINEAR

• Linearity in the Variables

• The first meaning of linearity is that the conditional expectation of Y is a linear function of $X_i$; the regression curve in this case is a straight line. But

• $E(Y \mid X_i) = \beta_1 + \beta_2 X_i^2$ is not a linear function of $X_i$.

• Linearity in the Parameters

• The second interpretation of linearity is that the conditional expectation of Y, $E(Y \mid X_i)$, is a linear function of the parameters, the β’s; it may or may not be linear in the variable X.

• $E(Y \mid X_i) = \beta_1 + \beta_2 X_i^2$

• is a linear (in the parameter) regression model. All the models shown in Figure 2.3 are thus linear regression models, that is, models linear in the parameters.

• Now consider the model:

• $E(Y \mid X_i) = \beta_1 + \beta_2^2 X_i$

• The preceding model is an example of a nonlinear (in the parameter) regression model.

• From now on the term “linear” regression will always mean a regression that is linear in the parameters, the β’s (that is, the parameters are raised to the first power only).
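The two meanings of linearity can be checked numerically. The sketch below is a spot check, not a proof: it tests whether a model obeys superposition in its parameters at one fixed X, using arbitrarily chosen parameter values. The function names are hypothetical.

```python
def model_quadratic_in_x(b1, b2, x):
    """E(Y | X) = b1 + b2 * X**2: nonlinear in X, linear in the parameters."""
    return b1 + b2 * x ** 2

def model_squared_param(b1, b2, x):
    """E(Y | X) = b1 + b2**2 * X: linear in X, nonlinear in the parameters."""
    return b1 + b2 ** 2 * x

def is_linear_in_params(model, x=3.0):
    """Spot-check superposition in (b1, b2) at a fixed x.

    A function that is linear in its parameters satisfies
    f(p + q) = f(p) + f(q) and f(c * p) = c * f(p).
    The test points below are arbitrary illustrative values.
    """
    p, q, c = (1.0, 2.0), (0.5, -1.5), 4.0
    additive = abs(model(p[0] + q[0], p[1] + q[1], x)
                   - (model(*p, x) + model(*q, x))) < 1e-9
    homogeneous = abs(model(c * p[0], c * p[1], x) - c * model(*p, x)) < 1e-9
    return additive and homogeneous

print(is_linear_in_params(model_quadratic_in_x))
print(is_linear_in_params(model_squared_param))
```

The first model passes the check even though it is a curve in X, which is why it still counts as a linear regression model in the sense used from here on; the second fails because $\beta_2$ enters squared.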

STOCHASTIC SPECIFICATION OF PRF

• We can express the deviation of an individual $Y_i$ around its expected value as follows:

• $u_i = Y_i - E(Y \mid X_i)$

• or

• $Y_i = E(Y \mid X_i) + u_i$ (2.4.1)

• Technically, $u_i$ is known as the stochastic disturbance or stochastic error term.

• How do we interpret (2.4.1)? The expenditure of an individual family, given its income level, can be expressed as the sum of two components:

– (1) $E(Y \mid X_i)$, the mean consumption of all families with the same level of income. This component is known as the systematic, or deterministic, component,

– (2) $u_i$, which is the random, or nonsystematic, component.

• For the moment assume that the stochastic disturbance term is a proxy for all the omitted or neglected variables that may affect Y but are not included in the regression model.

• If $E(Y \mid X_i)$ is assumed to be linear in $X_i$, as in (2.2.2), Eq. (2.4.1) may be written as:

• $Y_i = E(Y \mid X_i) + u_i$

• $\phantom{Y_i} = \beta_1 + \beta_2 X_i + u_i$ (2.4.2)

• Equation (2.4.2) posits that the consumption expenditure of a family is linearly related to its income plus the disturbance term. Thus, the individual consumption expenditures, given $X = \$80$, can be expressed as:

• $Y_1 = 55 = \beta_1 + \beta_2(80) + u_1$

• $Y_2 = 60 = \beta_1 + \beta_2(80) + u_2$

• $Y_3 = 65 = \beta_1 + \beta_2(80) + u_3$ (2.4.3)

• $Y_4 = 70 = \beta_1 + \beta_2(80) + u_4$

• $Y_5 = 75 = \beta_1 + \beta_2(80) + u_5$

• Now if we take the expected value of (2.4.1) on both sides, conditional on $X_i$, we obtain

• $E(Y_i \mid X_i) = E[E(Y \mid X_i)] + E(u_i \mid X_i)$

• $\phantom{E(Y_i \mid X_i)} = E(Y \mid X_i) + E(u_i \mid X_i)$ (2.4.4)

• where we use the fact that the expected value of a constant is that constant itself: once the value of $X_i$ is fixed, $E(Y \mid X_i)$ is a constant.

• Since $E(Y_i \mid X_i)$ is the same thing as $E(Y \mid X_i)$, Eq. (2.4.4) implies that

• $E(u_i \mid X_i) = 0$ (2.4.5)

• Thus, the assumption that the regression line passes through the conditional means of Y implies that the conditional mean values of $u_i$ (conditional upon the given X’s) are zero.

• It is clear that

• $E(Y \mid X_i) = \beta_1 + \beta_2 X_i$ (2.2.2)

• and

• $Y_i = \beta_1 + \beta_2 X_i + u_i$ (2.4.2)

• are equivalent forms if $E(u_i \mid X_i) = 0$.

• But the stochastic specification (2.4.2) has the advantage that it clearly shows that there are other variables besides income that affect consumption expenditure and that an individual family’s consumption expenditure cannot be fully explained only by the variable(s) included in the regression model.
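The decomposition in (2.4.1) and the zero conditional mean in (2.4.5) can be verified directly for the X = 80 group, using the five expenditures listed in (2.4.3). This is a minimal sketch. The variable names are illustrative, and it treats those five families as the whole population at that income level, as the text does.

```python
from statistics import mean

# Consumption expenditures of the five families with income X = 80,
# as listed in equations (2.4.3).
y_at_80 = [55, 60, 65, 70, 75]

# Systematic component: the conditional mean E(Y | X = 80).
cond_mean = mean(y_at_80)

# Random component: u_i = Y_i - E(Y | X_i) for each family.
disturbances = [y - cond_mean for y in y_at_80]

print("E(Y | X = 80) =", cond_mean)
print("u_i =", disturbances)
print("mean of u_i =", mean(disturbances))
```

Because each $u_i$ is defined as a deviation from the group mean, the disturbances within an income group average to zero by construction, which is what (2.4.5) states.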

THE SIGNIFICANCE OF THE STOCHASTIC DISTURBANCE TERM

• The disturbance term $u_i$ is a surrogate for all those variables that are omitted from the model but that collectively affect Y. Why don’t we introduce them into the model explicitly? The reasons are many:

• 1. Vagueness of theory: The theory, if any, determining the behavior of Y may be, and often is, incomplete. We might be ignorant or unsure about the other variables affecting Y.

• 2. Unavailability of data: We may lack quantitative information about these variables; e.g., information on family wealth generally is not available.

• 3. Core variables versus peripheral variables: Assume that besides income $X_1$, the number of children per family $X_2$, sex $X_3$, religion $X_4$, education $X_5$, and geographical region $X_6$ also affect consumption expenditure. But the joint influence of all or some of these variables may be so small that it does not pay to introduce them into the model explicitly. One hopes that their combined effect can be treated as a random variable $u_i$.

• 4. Intrinsic randomness in human behavior: Even if we succeed in introducing all the relevant variables into the model, there is bound to be some “intrinsic” randomness in individual Y’s that cannot be explained no matter how hard we try. The disturbances, the u’s, may very well reflect this intrinsic randomness.

• 5. Poor proxy variables: For example, Friedman regards permanent consumption ($Y^p$) as a function of permanent income ($X^p$). But since these variables are not directly observable, in practice we use proxy variables, such as current consumption (Y) and current income (X). The proxies need not equal the permanent quantities, so there is a problem of errors of measurement, and u may in this case also represent the errors of measurement.

• 6. Principle of parsimony: We would like to keep our regression model as simple as possible. If we can explain the behavior of Y “substantially” with two or three explanatory variables and if our theory is not strong enough to suggest what other variables might be included, why introduce more variables? Let $u_i$ represent all other variables.

• 7. Wrong functional form: Often we do not know the form of the functional relationship between the regressand (dependent) and the regressors. Is consumption expenditure a linear (in variable) function of income or a nonlinear (in variable) function? If it is the former,

• $Y_i = \beta_1 + \beta_2 X_i + u_i$ is the proper functional relationship between Y and X, but if it is the latter,

• $Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + u_i$ may be the correct functional form.

• In two-variable models the functional form of the relationship can often be judged from the scattergram. But in a multiple regression model, it is not easy to determine the appropriate functional form, for graphically we cannot visualize scattergrams in multiple dimensions.

THE SAMPLE REGRESSION FUNCTION (SRF)

• The data of Table 2.1 represent the population, not a sample. In most practical situations what we have is a sample of Y values corresponding to some fixed X’s.

• Pretend that the population of Table 2.1 was not known to us and the only information we had was a randomly selected sample of Y values for the fixed X’s as given in Table 2.4. Each Y (given $X_i$) in Table 2.4 is chosen randomly from similar Y’s corresponding to the same $X_i$ from the population of Table 2.1.

• Can we estimate the PRF from the sample data? We may not be able to estimate the PRF “accurately” because of sampling fluctuations. To see this, suppose we draw another random sample from the population of Table 2.1, as presented in Table 2.5. Plotting the data of Tables 2.4 and 2.5, we obtain the scattergram given in Figure 2.4. In the scattergram two sample regression lines are drawn so as to fit the scatter of each sample reasonably well, one line per sample.

• Which of the two regression lines represents the “true” population regression line? There is no way we can be absolutely sure that either of the regression lines shown in Figure 2.4 represents the true population regression line (or curve). Supposedly they represent the population regression line, but because of sampling fluctuations they are at best an approximation of the true PR. In general, we would get N different SRFs for N different samples, and these SRFs are not likely to be the same.
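Sampling fluctuation can be simulated by drawing one Y per fixed X, in the spirit of Tables 2.4 and 2.5, and fitting a line to each draw. Several things here are assumptions rather than part of the source: the population is hypothetical (only the X = 80 group matches the text), the fitting rule is ordinary least squares, which these notes defer to Chapter 3, and the helper names `draw_sample` and `fit_line` are invented for illustration.

```python
import random

# Hypothetical population (not Table 2.1): income X -> expenditures Y.
population = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
}

def draw_sample(groups, rng):
    """Pick one Y at random for each fixed X."""
    return [(x, rng.choice(ys)) for x, ys in sorted(groups.items())]

def fit_line(sample):
    """Least-squares intercept and slope (an assumed fitting rule)."""
    n = len(sample)
    x_bar = sum(x for x, _ in sample) / n
    y_bar = sum(y for _, y in sample) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in sample)
    sxx = sum((x - x_bar) ** 2 for x, _ in sample)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

rng = random.Random(0)  # fixed seed so the run is repeatable
for label in ("sample 1", "sample 2"):
    sample = draw_sample(population, rng)
    b1_hat, b2_hat = fit_line(sample)
    print(f"{label}: {sample} -> SRF: Y_hat = {b1_hat:.2f} + {b2_hat:.3f} X")
```

Changing the seed or drawing more samples gives further SRFs, which will generally differ from one another and from the line through the conditional means.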

• We can develop the concept of the sample regression function (SRF) to represent the sample regression line. The sample counterpart of (2.2.2) may be written as

• $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$ (2.6.1)

• where $\hat{Y}$ is read as “Y-hat” or “Y-cap”

• $\hat{Y}_i$ = estimator of $E(Y \mid X_i)$

• $\hat{\beta}_1$ = estimator of $\beta_1$

• $\hat{\beta}_2$ = estimator of $\beta_2$

• Note that an estimator, also known as a (sample) statistic, is simply a rule or formula or method that tells how to estimate the population parameter from the information provided by the sample at hand.

• Now just as we expressed the PRF in two equivalent forms, (2.2.2) and (2.4.2), we can express the SRF (2.6.1) in its stochastic form as follows:

• $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$ (2.6.2)

• $\hat{u}_i$ denotes the (sample) residual term. Conceptually $\hat{u}_i$ is analogous to $u_i$ and can be regarded as an estimate of $u_i$. It is introduced in the SRF for the same reasons as $u_i$ was introduced in the PRF.

• To sum up, then, we find our primary objective in regression analysis is to estimate the PRF

• $Y_i = \beta_1 + \beta_2 X_i + u_i$ (2.4.2)

• on the basis of the SRF

• $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$ (2.6.2)

• because more often than not our analysis is based upon a single sample from some population. But because of sampling fluctuations our estimate of the PRF based on the SRF is at best an approximate one. This approximation is shown diagrammatically in Figure 2.5. For $X = X_i$, we have one (sample) observation $Y = Y_i$. In terms of the SRF, the observed $Y_i$ can be expressed as:

• $Y_i = \hat{Y}_i + \hat{u}_i$ (2.6.3)

• and in terms of the PRF, it can be expressed as

• $Y_i = E(Y \mid X_i) + u_i$ (2.6.4)

• Now obviously in Figure 2.5 $\hat{Y}_i$ overestimates the true $E(Y \mid X_i)$ for the $X_i$ shown therein. By the same token, for any $X_i$ to the left of the point A, the SRF will underestimate the true PRF.
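The over- and underestimation can be stated algebraically by equating the two expressions for the same observed $Y_i$, (2.6.3) and (2.6.4). The short derivation below is a worked step added for illustration; it uses only the symbols already defined.

```latex
\begin{align*}
\hat{Y}_i + \hat{u}_i &= E(Y \mid X_i) + u_i
  && \text{both sides equal } Y_i \text{ by (2.6.3) and (2.6.4)} \\
\hat{Y}_i - E(Y \mid X_i) &= u_i - \hat{u}_i
\end{align*}
```

So at a given $X_i$ the SRF overestimates the PRF exactly when $u_i > \hat{u}_i$, and underestimates it when $u_i < \hat{u}_i$. Since $u_i$ is never observed, the size of this gap cannot be computed from a single sample.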

• The critical question now is: Granted that the SRF is but an approximation of the PRF, can we devise a rule or a method that will make this approximation as “close” as possible? In other words, how should the SRF be constructed so that $\hat{\beta}_1$ is as “close” as possible to the true $\beta_1$ and $\hat{\beta}_2$ is as “close” as possible to the true $\beta_2$ even though we will never know the true $\beta_1$ and $\beta_2$? The answer to this question will occupy much of our attention in Chapter 3.
