A Multivariate Model of Total Expenditure Among Households in the Philippines - Pardilla Et Al

A Multivariate Model of Total Expenditure Among Households in the Philippines

Alyssa Ann Angeles

2011-43922

John Vincent R. Pardilla

2011-03625

Marcus J. Valdez

2011-40663

In partial fulfillment of the requirements for

Econ 131 Econometrics

1st Semester, AY 2013-2014

School of Economics

University of the Philippines

Diliman, Quezon City

October 14, 2013

I. Introduction

Do total income and family size have a positive impact on a household's total expenditure? This is the question this paper seeks to answer.

The Philippine population has grown substantially in a short period: from 88.55 million in August 2007, it increased to 92.34 million in May 2010.[1] According to the National Statistics Office, the Philippine population grew at an average annual rate of 1.90% during 2000-2010, which means that about two persons are added each year for every 100 persons in the population.[2] Since every addition to the population belongs to some household or family, it is reasonable to expect population growth to be accompanied by an increase in family size. At the same time, from 2006 to 2009 the average annual income of families in both the bottom 30% and the top 70% income groups increased, by 13 thousand pesos and 42 thousand pesos respectively. The average annual income of poor families (the bottom 30%) rose from 49 thousand pesos to 62 thousand pesos, while that of non-poor families (the top 70%) rose from 226 thousand pesos to 268 thousand pesos.[3] Lastly, total expenditure also increased from 2000 to 2009, as shown in the 2009 FIES.[4]

[1] National Statistics Office. Philippines in Figures. Retrieved from http://www.census.gov.ph/

[2] National Statistics Office. (2012). Population grew by 1.90 percent annually. In The 2010 Census of Population and Housing Reveals the Philippine Population at 92.34 Million. Retrieved from http://www.census.gov.ph/content/2010-census-population-and-housing-reveals-philippine-population-9234-million

[3] National Statistics Office. (2011, Feb 4). Families in the Bottom 30% Income Group Earned 62 Thousand Pesos in 2009 (Final Results from the 2009 Family Income and Expenditure Survey). Retrieved from http://www.census.gov.ph/content/families-bottom-30-percent-income-group-earned-62-thousand-pesos-2009-final-results-2009

[4] National Statistics Office. (2009). Percent Distribution of Annual Family Expenditures by Expenditure Group, Philippines: 2000, 2003, 2006, 2009 [Data file]. Retrieved from http://www.bles.dole.gov.ph/PUBLICATIONS/Yearbook%20of%20Labor%20Statistics/STATISTICAL%20TABLES/PDF/CHAPTER%2012/Table%2012-3.pdf

At certain points between 2003 and 2009, all three variables (family size, total income and total expenditure) can be observed to have been increasing. However, intuition and the evidence above are not enough to establish that there is a relationship, particularly a positive relationship, among the three variables. Using data from the 2009 Family Income and Expenditure Survey, this study aims to determine whether there is a relationship among a household's total expenditure, total income and family size, and which of the two independent variables, total income or family size, has a greater impact on total expenditure. This paper hypothesizes that there is a positive relationship between total expenditure and each of total income and family size; that is, an increase in total income, family size, or both would lead to an increase in total expenditure. The specific components that make up total expenditure, however, will not be explored or discussed in this paper.

II. Methodology

This study uses data from the 2009 Family Income and Expenditure Survey (FIES), which the National Statistics Office (NSO) conducts every three years. The survey contains 38,400 observations, which could be considered large enough to represent the whole population. Only cross-sectional data are used; no time-series data are used in this study, and its conclusions are limited to relationships within this data set.

Data

         Variable |   Obs        Mean   Std. Dev.    Min        Max
------------------+-----------------------------------------------
      Family Size | 38400    47.45872   21.56525      10        200
     Total Income | 38400    195811.5   .4976421       0   3.04e+07
Total Expenditure | 38400    165984.9   164981.7    9250    4108871

Empirical Specification

Analytical Model

In this study, total expenditure (totex) is modeled as a function of total income (toinc) and the number of members in the family (fsize):

totex = f(toinc, fsize)

Total income is included in the model because, in theory, total income affects total expenditure: the amount of income a family earns largely determines how much it spends. Family size is included because additional family members increase expenditure, at least in the short run. In the long run, new family members may become assets, since they can eventually earn income themselves. The analysis in this paper is static, covering only the year 2009, which sets aside these problematic long-run effects.
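In estimable form, the analytical model corresponds to the linear specification fitted by `reg totex toinc fsize` in the next section. The sketch below writes it out; the household subscript i, the β coefficients and the error term u_i are notation introduced here for illustration, and the sign restrictions restate the hypotheses in the expected-effects table (with β1 playing the role of the marginal propensity to consume).

```latex
\begin{aligned}
\text{totex}_i &= \beta_0 + \beta_1\,\text{toinc}_i + \beta_2\,\text{fsize}_i + u_i,
  \qquad i = 1, \dots, 38400,\\
\frac{\partial\,\text{totex}_i}{\partial\,\text{toinc}_i} &= \beta_1,
  \qquad 0 < \beta_1 < 1 \quad \text{(MPC)},\\
\frac{\partial\,\text{totex}_i}{\partial\,\text{fsize}_i} &= \beta_2 > 0.
\end{aligned}
```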

III. Regressions and Estimations

Preliminary Regression

The model is first estimated by ordinary least squares (OLS), regressing total expenditure on total income and family size. The expected effects of the two regressors are summarized below, followed by the Stata output of the preliminary regression:

Variable | Expected Effect | Remarks

Family Size | + | It is hypothesized that as family size increases, the family has more expenses (such as food and education), thus increasing total expenditure.

Total Income | + (0 < MPC < 1) | As Keynes stated, men [women] are disposed, as a rule and on average, to increase their consumption as their income increases, but not by as much as the increase in their income. Keynes postulated that the marginal propensity to consume (MPC) is greater than zero but less than one.

. reg totex toinc fsize

      Source |       SS       df       MS              Number of obs =   38400
-------------+------------------------------           F(  2, 38397) =23585.31
       Model |  5.7617e+14     2  2.8809e+14           Prob > F      =  0.0000
    Residual |  4.6901e+14 38397  1.2215e+10           R-squared     =  0.5513
-------------+------------------------------           Adj R-squared =  0.5512
       Total |  1.0452e+15 38399  2.7219e+10           Root MSE      = 1.1e+05

------------------------------------------------------------------------------
       totex |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       toinc |   .4145162   .0019521   212.34   0.000       .41069    .4183423
       fsize |   660.4736   26.27325    25.14   0.000     608.9773    711.9698
       _cons |   53472.66   1388.513    38.51   0.000     50751.14    56194.18
------------------------------------------------------------------------------

In this regression, toinc, totex and fsize are the variables for total income, total expenditure and family size, respectively. The preliminary regression yields an R-squared of 0.5513, meaning that about 55% of the variation in total expenditure is explained by the two regressors, total income and family size. Although this R-squared may seem relatively low, values of this size are typical of cross-sectional data with a large number of observations, so it cannot be concluded that the model fits poorly.
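The fit statistics in the Stata output are linked by simple identities. As a hedged check, the Python sketch below recomputes R-squared, adjusted R-squared, the F statistic and Root MSE from the sums of squares printed in the ANOVA table. Because Stata printed those sums rounded to five significant digits, the recomputed values can differ from the reported ones in the last digit; the variable names are illustrative.

```python
import math

# Sums of squares copied from the ANOVA table of `reg totex toinc fsize` (as rounded by Stata)
ss_model = 5.7617e14    # explained (model) sum of squares
ss_resid = 4.6901e14    # residual sum of squares
ss_total = 1.0452e15    # total sum of squares
n, k = 38400, 2         # observations and number of slope coefficients

r_squared = ss_model / ss_total
# Adjusted R-squared compares residual and total variances, each divided by its degrees of freedom
adj_r_squared = 1 - (ss_resid / (n - k - 1)) / (ss_total / (n - 1))
# F statistic for H0: both slope coefficients are zero
f_stat = (ss_model / k) / (ss_resid / (n - k - 1))
# Root MSE is the estimated standard deviation of the error term
root_mse = math.sqrt(ss_resid / (n - k - 1))

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"F(2, 38397)   = {f_stat:.1f}")
print(f"Root MSE      = {root_mse:.3g}")
```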

Turning to the coefficients, all coefficients are individually highly significant, with p-values of 0.000. The F statistic (23,585.31) is also very high, indicating that the regressors are jointly significant. The coefficient of total income is positive, which indicates that total expenditure rises with total income: holding family size constant, a one-peso increase in a family's total income increases total expenditure by about 0.41 pesos, or 41 centavos. This coefficient can be interpreted as the marginal propensity to consume. If a family's total income increases by 100 pesos, for example, the family is predicted to spend about 41 pesos, or 41% of the increase. As emphasized earlier, the components of a family's expenditure will not be explored or discussed in this paper. The coefficient of family size is also positive at about 660, which means that, holding total income constant, total expenditure increases by about 660 pesos when family size increases by 1. This 660-peso increase cannot be directly compared with the 0.41-peso increase for every peso of total income, because the two regressors are measured in different units. From these raw coefficients alone, therefore, we cannot say whether expenditure responds more to income or to family size. Standardizing the variables in an auxiliary regression, done in the latter part of this paper, allows the influence of each independent variable on total expenditure to be compared. The constant term, on the other hand, has little economic meaning. Taken literally, it says that a family with zero income and zero members would spend 53,473 pesos. It cannot be interpreted that way, since a family with no members would have no expenditure at all, and a family size of zero lies outside the range of the data.
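The interpretation above can be checked numerically. The Python sketch below plugs the preliminary OLS coefficients into the fitted equation (the helper `predicted_totex` is a hypothetical name introduced here) to compute the predicted change in expenditure for a 100-peso rise in income and for a one-unit rise in fsize. It also relies on a standard property of OLS with an intercept: the fitted plane passes through the point of sample means, so the prediction at the mean income and mean family size from the summary table should reproduce mean total expenditure up to rounding.

```python
# Coefficients from the preliminary OLS regression `reg totex toinc fsize`
b_cons, b_toinc, b_fsize = 53472.66, 0.4145162, 660.4736

# Sample means from the summary-statistics table
mean_toinc, mean_fsize, mean_totex = 195811.5, 47.45872, 165984.9


def predicted_totex(toinc: float, fsize: float) -> float:
    """Fitted total expenditure from the estimated linear model."""
    return b_cons + b_toinc * toinc + b_fsize * fsize


# Ceteris paribus effect of a 100-peso rise in total income (family size held fixed)
delta_income = predicted_totex(mean_toinc + 100, mean_fsize) - predicted_totex(mean_toinc, mean_fsize)
# Ceteris paribus effect of a one-unit rise in fsize (income held fixed)
delta_fsize = predicted_totex(mean_toinc, mean_fsize + 1) - predicted_totex(mean_toinc, mean_fsize)

# With an intercept, the OLS fitted plane passes through the sample means
fitted_at_means = predicted_totex(mean_toinc, mean_fsize)

print(f"+100 pesos of income -> +{delta_income:.2f} pesos of expenditure")
print(f"+1 unit of fsize     -> +{delta_fsize:.2f} pesos of expenditure")
print(f"fitted totex at the means = {fitted_at_means:,.1f} (sample mean {mean_totex:,.1f})")
```

Because the model is linear, these marginal effects do not depend on where they are evaluated; starting from the means is only a convenience.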

Estimation Issues

Although the sample of 38,400 observations is large, and some may consider it enough to represent the whole population, a large sample does not free the model from estimation issues. It is therefore customary and appropriate to test for violations of the classical assumptions, namely heteroscedasticity, autocorrelation and multicollinearity, and to remedy them so that the estimates are free of such problems.

1. Heteroscedasticity

After the preliminary regression, a Breusch-Pagan / Cook-Weisberg test was run to check for heteroscedasticity in the data. The result is as follows:

. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of totex

         chi2(1)      =  1.80e+07
         Prob > chi2  =   0.0000
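The statistic reported by hettest can be sketched in a few lines. The Python code below implements, as we understand Stata's default, the normal-theory Breusch-Pagan / Cook-Weisberg statistic that uses the fitted values as the variance index ("Variables: fitted values of totex"): scale the squared residuals by RSS/n, regress them on the fitted values, and take half the explained sum of squares. To stay short it uses a single regressor and synthetic data whose error spread grows with income; the data, seed and function names (`simple_ols`, `breusch_pagan_fitted`) are illustrative assumptions, not the FIES sample or Stata's own code.

```python
import random
import statistics


def simple_ols(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def breusch_pagan_fitted(x, y):
    """Breusch-Pagan / Cook-Weisberg statistic with the fitted values as the variance index.

    Normal-theory version: regress g_i = e_i^2 / (RSS / n) on the fitted values and
    return ESS / 2, approximately chi-squared(1) under H0 of constant variance.
    """
    a, b = simple_ols(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    sigma2 = sum(e * e for e in resid) / len(resid)
    g = [e * e / sigma2 for e in resid]                      # scaled squared residuals
    c0, c1 = simple_ols(fitted, g)
    g_mean = statistics.fmean(g)
    ess = sum((c0 + c1 * f - g_mean) ** 2 for f in fitted)   # explained sum of squares
    return ess / 2


# Synthetic data (not the FIES sample): the error spread grows with income
random.seed(2009)
income = [random.uniform(10_000, 500_000) for _ in range(2_000)]
spending = [50_000 + 0.4 * inc + random.gauss(0, 0.2 * inc) for inc in income]

stat = breusch_pagan_fitted(income, spending)
print(f"BP / Cook-Weisberg chi2(1) = {stat:.1f} (5% critical value: 3.84)")
```

A statistic far above the chi-squared(1) critical value, as here and as in the paper's 1.80e+07, leads to rejecting constant variance.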

The test indicates the presence of heteroscedasticity. The p-value (Prob > chi2) is 0.0000 to four decimal places, so we reject the null hypothesis that the error variance is constant. To avoid drawing inferences from standard errors that are invalid under heteroscedasticity, the regression is re-run with White's heteroscedasticity-consistent variances and standard errors, also known as robust standard errors. This correction changes only the standard errors: the coefficient estimates are identical to the OLS estimates, so it does not remove the influence of any outliers that may have caused the heteroscedasticity. Below is the result of the robust regression:

Linear regression                               Number of obs =      38400
                                                F(  2, 38397) =    1601.60
                                                Prob > F      =     0.0000
                                                R-squared     =     0.5513
                                                Root MSE      =    1.1e+05

------------------------------------------------------------------------------
             |               Robust
       totex |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       toinc |   .4145162   .1060899     3.91   0.000     .2065773     .622455
       fsize |   660.4736   148.4128     4.45   0.000     369.5807    951.3665
       _cons |   53472.66   13523.12     3.95   0.000     26966.99    79978.33
------------------------------------------------------------------------------

The coefficients are unchanged, as expected, since White's correction affects only the standard errors; they remain individually significant, and R-squared is unchanged at 0.5513. The estimated standard errors, however, do change. White's heteroscedasticity-corrected standard errors are considerably larger than the OLS standard errors: the standard error of toinc rises from .0019521 to .1060899, and that of fsize from 26.27325 to 148.4128. The t values are therefore much smaller than those obtained by OLS (3.91 instead of 212.34 for toinc, and 4.45 instead of 25.14 for fsize).
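The contrast between conventional and robust standard errors can be illustrated on a small scale. The Python sketch below computes, for a single-regressor model, the conventional OLS standard error of the slope and White's HC0 heteroscedasticity-consistent standard error (the basic White estimator; Stata's `robust` option additionally applies a small-sample degrees-of-freedom scaling, which is negligible at this sample size). The data are synthetic, with error spread growing with the square of income, and the function name `ols_and_white_se` is a hypothetical helper; it is not the paper's Stata run.

```python
import math
import random
import statistics


def ols_and_white_se(x, y):
    """Slope of y on x with its conventional and White (HC0) standard errors.

    Conventional: sqrt(s^2 / Sxx), where s^2 = RSS / (n - 2).
    White HC0:    sqrt(sum((x_i - xbar)^2 * e_i^2)) / Sxx.
    """
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)
    se_ols = math.sqrt(s2 / sxx)
    se_white = math.sqrt(sum((a - mx) ** 2 * e * e for a, e in zip(x, resid))) / sxx
    return slope, se_ols, se_white


# Synthetic data (not the FIES sample): the error spread grows with the square of income
random.seed(131)
income = [random.uniform(10_000, 500_000) for _ in range(2_000)]
spending = [50_000 + 0.4 * inc + random.gauss(0, inc ** 2 / 1_000_000) for inc in income]

slope_hat, se_ols, se_white = ols_and_white_se(income, spending)
print(f"slope = {slope_hat:.4f}, OLS s.e. = {se_ols:.4f}, White s.e. = {se_white:.4f}")
```

The slope estimate is the same under both methods; only its standard error differs, which mirrors the unchanged coefficients and larger robust standard errors in the paper's output.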

2. Autocorrelation

As stated in the Methodology section, this study uses no time-series data, the kind of data in which autocorrelation is most likely to occur. The presence of autocorrelation in this paper's model is therefore assumed to be unlikely.

3. Multicollinearity

To check for multicollinearity, variance inflation factors (VIFs) were computed. Following a common rule of thumb, a VIF greater than 10 calls for further investigation. The tolerance, which is 1/VIF, is another measure of multicollinearity; by the same rule, a tolerance below 0.1 indicates multicollinearity. The result is as follows:

    Variable |       VIF       1/VIF
-------------+----------------------
       fsize |      1.01    0.990891
       toinc |      1.01    0.990891
-------------+----------------------
    Mean VIF |      1.01

The result shows no sign of multicollinearity in the data used for this study. The VIFs of fsize and toinc (1.01) are far below 10, and their tolerances (0.990891, shown in the 1/VIF column) are far above 0.1. Multicollinearity is therefore not a concern for the regression estimates.
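With exactly two regressors, the VIF has a simple closed form: the auxiliary regression of one regressor on the other has an R-squared equal to the squared correlation r² between them, so VIF = 1/(1 - r²) for both, which is why fsize and toinc share the same value. The Python sketch below (helper names are hypothetical) works backwards from the reported tolerance to the implied auxiliary R-squared and checks the formula on toy data.

```python
import statistics


def pearson_r(x1, x2):
    """Pearson correlation coefficient of two equal-length sequences."""
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    return s12 / (s11 * s22) ** 0.5


def vif_two_regressors(x1, x2):
    """VIF shared by both regressors when the model has exactly two regressors.

    The auxiliary regression of one regressor on the other has R^2 = r^2,
    so VIF = 1 / (1 - r^2) and tolerance = 1 / VIF = 1 - r^2.
    """
    r = pearson_r(x1, x2)
    return 1 / (1 - r * r)


# Working backwards from the reported tolerance of toinc and fsize
tolerance = 0.990891
aux_r_squared = 1 - tolerance
print(f"VIF = {1 / tolerance:.2f}, auxiliary R^2 = {aux_r_squared:.6f}, |r| = {aux_r_squared ** 0.5:.3f}")

# Toy check with hypothetical, uncorrelated regressors: VIF is exactly 1
print(vif_two_regressors([1, 2, 3, 4], [1, -1, -1, 1]))
```

An auxiliary R-squared of about 0.009 means the two regressors share less than 1% of their variation.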

Standardizing Variables

As noted earlier, the raw coefficients cannot be compared directly, since family size and total income are measured on different scales. To allow a comparison of the influence of total income and family size on total expenditure, standardized versions of the variables are generated: each variable is expressed as a deviation from its mean, divided by its standard deviation. These are summarized in the table below:

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      ztotex |     38400    5.59e-06    1.000005  -.9500124   23.89904
      ztoinc |     38400    1.72e-06    1.000002  -.6470511   103.9544
      zfsize |     38400    1.83e-07    .9999999  -1.736995   7.073476

As the table shows, the variables are indeed standardized: their standard deviations are very close to 1, and their means are essentially zero. Values such as 5.59e-06 are Stata's scientific notation for 5.59 × 10⁻⁶; the "e" does not refer to the base of the natural logarithm. Regressing standardized total expenditure on the standardized regressors gives the following result:

      Source |       SS       df       MS              Number of obs =   38400
-------------+------------------------------           F(  2, 38397) =23585.31
       Model |   21168.298     2   10584.149           Prob > F      =  0.0000
    Residual |  17231.0478 38397  .448760263           R-squared     =  0.5513
-------------+------------------------------           Adj R-squared =  0.5512
       Total |  38399.3458 38399    1.000009           Root MSE      =   .6699

------------------------------------------------------------------------------
      ztotex |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ztoinc |   .7292455   .0034343   212.34   0.000     .7225143    .7359768
      zfsize |   .0863328   .0034343    25.14   0.000     .0796016    .0930641
       _cons |   4.32e-06   .0034185     0.00   0.999    -.0066961    .0067048
------------------------------------------------------------------------------

The t statistics and R-squared are identical to those of the preliminary regression, as they should be, since standardizing only rescales the variables. The regressors therefore remain significant, and their coefficients can now be used to compare the influence of total income and family size on total expenditure. The variables ztotex, ztoinc and zfsize are the standardized versions of total expenditure, total income and family size, respectively, as denoted by the letter z preceding each variable name. Because the variables are measured in standard deviations, the coefficients are read in standard-deviation units rather than in pesos: holding family size constant, a one-standard-deviation increase in total income increases total expenditure by about 0.73 standard deviations. Likewise, holding total income constant, a one-standard-deviation increase in family size increases total expenditure by about 0.09 standard deviations. On the basis of these standardized coefficients, an increase in total income results in a greater increase in total expenditure than an increase in family size. By contrast, concluding from the preliminary regression that family size has the greater impact, because its coefficient of 660 exceeds total income's coefficient of 0.41, would be a serious error.
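The standardized coefficients are tied to the raw ones by a simple rescaling: the coefficient on a standardized regressor equals the raw coefficient times s_x/s_y, where s_x and s_y are the standard deviations of the regressor and of total expenditure (this holds for each coefficient in a multiple regression as well). The Python sketch below, with hypothetical helper names and toy data, verifies the identity for a simple regression and then applies the conversion to the family-size figures from the summary table and the preliminary regression.

```python
import statistics


def zscore(values):
    """Standardize a sequence: subtract its mean and divide by its sample standard deviation."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]


def slope(x, y):
    """OLS slope of y on x in a simple regression with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical toy data: the slope estimated on z-scores equals b * s_x / s_y
x = [2.0, 4.0, 5.0, 7.0, 11.0]
y = [3.0, 6.0, 6.5, 9.0, 15.0]
b_raw = slope(x, y)
b_std = slope(zscore(x), zscore(y))
print(b_std, b_raw * statistics.stdev(x) / statistics.stdev(y))

# The same conversion applied to the family-size figures reported in this paper
b_fsize, sd_fsize, sd_totex = 660.4736, 21.56525, 164981.7
print(f"standardized fsize coefficient = {b_fsize * sd_fsize / sd_totex:.7f}")
```

The converted family-size coefficient agrees with the reported .0863328 up to rounding of the inputs.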

Residuals

Examining the normality of the residuals from the robust regression (which are the same as the OLS residuals) is necessary. The histogram and normal probability plot of the residuals were examined: the histogram resembles a bell-shaped distribution, and in the normal probability plot the residuals appear to follow the straight line. This visual evidence suggests that the error term is approximately normally distributed. This matters because the exact t and F distributions of the test statistics rely on normally distributed errors; if the errors were not normal, the usual t and F tests would not be exactly valid in small samples. Moreover, even without normality, the OLS estimators are asymptotically normally distributed provided that the error term has a zero mean and finite variance; homoscedasticity is needed only for the conventional standard errors, which is why the robust standard errors are used here. As a result, the t and F tests are approximately valid as long as the sample is reasonably large, which it is in this study, with 38,400 respondents.
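A normal probability plot can also be summarized by a single number. The Python sketch below pairs the ordered residuals with standard normal quantiles at the plotting positions (i - 0.5)/n (one common convention; Stata's plots may use a different one) and measures how straight the plot is with the correlation of those pairs, a value close to 1 indicating approximately normal residuals. The residuals here are hypothetical draws, one normal and one right-skewed, not the paper's residuals, and the function names are illustrative.

```python
import random
import statistics
from statistics import NormalDist


def normal_probability_points(residuals):
    """Pairs (standard normal quantile, ordered residual) for a normal probability plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


def straightness(points):
    """Pearson correlation of the plotted points; values near 1 mean a nearly straight line."""
    qs, rs = zip(*points)
    mq, mr = statistics.fmean(qs), statistics.fmean(rs)
    sqr = sum((q - mq) * (r - mr) for q, r in points)
    sqq = sum((q - mq) ** 2 for q in qs)
    srr = sum((r - mr) ** 2 for r in rs)
    return sqr / (sqq * srr) ** 0.5


random.seed(131)
normal_resid = [random.gauss(0, 1) for _ in range(500)]            # hypothetical residuals
skewed_resid = [random.expovariate(1.0) - 1 for _ in range(500)]   # right-skewed, mean zero
print(f"normal-looking residuals: {straightness(normal_probability_points(normal_resid)):.4f}")
print(f"right-skewed residuals:   {straightness(normal_probability_points(skewed_resid)):.4f}")
```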

IV. Findings

In the preliminary regression, a one-unit increase in family size increases expenditure by about 660 pesos, while in the standardized regression, a one-standard-deviation increase in family size increases expenditure by about 0.09 standard deviations. One reason for this is that when an additional family member is born, a new set of expenditures is added. A new family member needs food, so food expenditure must increase. A new member also needs a proper education. The family's expenditure then increases by the cost of these needs. The same is true for the other components of a household's total expenditure besides food and education.

The variables chosen are significant in the model used in this study. Their p-values are very low, so the null hypothesis that total income and family size have no effect on total expenditure is rejected in favor of the alternative that these variables do influence total expenditure. Furthermore, total income is found to have a greater influence on total expenditure than family size: when both are measured in standard deviations, a household's total expenditure, on average, increases more with an increase in the household's total income than with an increase in its family size.

V. Recommendation

While this study has established a clear relationship among a household's total expenditure, total income and family size, it did not examine the relationship between a family's total income and its size. This paper showed and discussed the significance of each variable and its separate effect on total expenditure; it could be improved by examining the relationship between the independent variables and by studying their interaction. Doing so would allow a researcher to determine whether the effects of the two variables are simply additive or whether they interact. It would be interesting to know whether the interaction of family size and income has any additional effect on total expenditure.
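One hypothetical way to formalize this suggestion, not estimated in this paper, is to add the product of the two regressors to the model. The β coefficients and u_i are notation introduced here for illustration. If β3 = 0, the effects are purely additive, as in the model estimated in this paper; otherwise the effect of income on expenditure depends on family size, and vice versa.

```latex
\begin{aligned}
\text{totex}_i &= \beta_0 + \beta_1\,\text{toinc}_i + \beta_2\,\text{fsize}_i
  + \beta_3\,(\text{toinc}_i \times \text{fsize}_i) + u_i,\\
\frac{\partial\,\text{totex}_i}{\partial\,\text{toinc}_i} &= \beta_1 + \beta_3\,\text{fsize}_i,
\qquad
\frac{\partial\,\text{totex}_i}{\partial\,\text{fsize}_i} = \beta_2 + \beta_3\,\text{toinc}_i.
\end{aligned}
```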

Any researcher interested in exploring this topic further may enhance the results of this study by taking into account the effects of policies, particularly the Reproductive Health Bill, which may affect family size and may therefore influence the behavior of total expenditure.

It would also be helpful to add more variables to the model used in this study and to establish the relationships between these variables and total expenditure.
