A Multivariate Model of Total Expenditure among households in the Philippines
Alyssa Ann Angeles
2011-43922
John Vincent R. Pardilla
2011-03625
Marcus J. Valdez
2011-40663
In partial fulfillment of the requirements for
Econ 131 Econometrics
1st Semester, AY 2013-2014
School of Economics
University of the Philippines
Diliman, Quezon City
October 14, 2013
I. Introduction
Do total income and family size have a positive impact on a household's total
expenditure? This is the question this paper tries to answer.
The Philippine population has grown rapidly in a matter of years. From 88.55M in
August 2007, it increased to 92.34M in May 2010. [1] According to the National Statistics
Office, the Philippine population grew at an average annual rate of 1.90% during the period
2000-2010. This means that, each year, about two persons were added for every 100 persons in the
population. [2] Since every addition to the population must belong to some household or family,
it is reasonable to expect that an increase in the population is accompanied by an increase in
family size. On the other hand, the average total income of families in both the bottom 30%
and the top 70% of the population increased by 13 thousand pesos and 42 thousand pesos,
respectively, from 2006 to 2009. The average annual income of poor families, those who belong to
the bottom 30% of the population, increased from 49 thousand pesos to 62 thousand pesos, while
the average annual income of non-poor families, the top 70%, increased from 226 thousand pesos
to 268 thousand pesos. [3] Lastly, total expenditure also increased from 2000 to 2009, as shown
in the 2009 FIES. [4]
[1] National Statistics Office. Philippines in Figures. Retrieved from http://www.census.gov.ph/
[2] National Statistics Office. (2012). Population grew by 1.90 percent annually. In The 2010 Census of Population and Housing Reveals the Philippine Population at 92.34 Million. Retrieved from http://www.census.gov.ph/content/2010-census-population-and-housing-reveals-philippine-population-9234-million
[3] National Statistics Office. (2011, Feb 4). Families in the Bottom 30% Income Group Earned 62 Thousand Pesos in 2009 (Final Results from the 2009 Family Income and Expenditure Survey). Retrieved from http://www.census.gov.ph/content/families-bottom-30-percent-income-group-earned-62-thousand-pesos-2009-final-results-2009
[4] National Statistics Office. (2009). Percent Distribution of Annual Family Expenditures by Expenditure Group, Philippines: 2000, 2003, 2006, 2009 [Data file]. Retrieved from http://www.bles.dole.gov.ph/PUBLICATIONS/Yearbook%20of%20Labor%20Statistics/STATISTICAL%20TABLES/PDF/CHAPTER%2012/Table%2012-3.pdf
At certain points between 2003 and 2009, it can be observed that all three variables,
family size, total income and total expenditure, have been increasing. However, intuition and
the evidence provided above are not enough to conclude that there is a relationship,
particularly a positive one, among the three variables. Using data from the
2009 Family Income and Expenditure Survey, this study aims to determine whether there exists any
relationship among a household's total expenditure, total income and family size, and which of
the two independent variables, total income and family size, has the greater impact on total
expenditure. Moreover, this paper hypothesizes that there is a positive relationship among total
expenditure, total income and family size; that is, an increase in either or both of total income
and family size would lead to an increase in total expenditure. However, the specific components
that make up total expenditure will not be explored or thoroughly discussed in this paper.
II. Methodology
This study uses data from the 2009 Family Income and Expenditure Survey (FIES), which is
conducted by the National Statistics Office (NSO) every three years. There is a total of 38,400
observations in this survey, which could be considered large enough to represent the whole
population. Only cross-sectional data are used in this study; no time series data are used.
The paper is therefore limited to establishing relationships within this data set.
Data
Variable             Observations    Mean        Std. Dev.    Min     Max
Family Size          38400           47.45872    21.56525     10      200
Total Income         38400           195811.5    .4976421     0       3.04e+07
Total Expenditure    38400           165984.9    164981.7     9250    4108871
Empirical Specification
Analytical Model
In this study, total expenditure was modeled as a function of total income and the number of
members in the family:
Total Expenditure = totex(toinc, fsize)
Total income was selected to be part of the model because, theoretically, total income affects
total expenditure: the amount of income that a family earns more or less determines its
expenditure. Family size was selected because additional family members increase
expenditure, at least in the short run. In the long run, new family members become assets since
they themselves earn income. The analysis in this paper is static, involving only the year
2009, which sets aside the long-run situation and the complications it would bring.
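Because the regressions in the next section are linear in both regressors, the functional form above can be written out explicitly. The symbols \beta_0, \beta_1, \beta_2 and u_i are notation assumed here for illustration; they do not appear in the original specification.

```latex
\[
\mathit{totex}_i = \beta_0 + \beta_1\,\mathit{toinc}_i + \beta_2\,\mathit{fsize}_i + u_i,
\qquad i = 1, \dots, 38400
\]
```

In this notation, the hypotheses of the paper are \beta_1 > 0 and \beta_2 > 0, with 0 < \beta_1 < 1 if \beta_1 is read as a marginal propensity to consume.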
III. Regressions and Estimations
Preliminary Regression
OLS regression is first used in the model. Total expenditure is regressed against total
income and family size. The result of the preliminary regression using Stata is as follows:
Variable: Family Size
Expected Effect: +
Remarks: It is hypothesized that as family size increases, there are more expenses (such as
food expenses, education, etc.) in the family, thus increasing the total expenditure.

Variable: Total Income
Expected Effect: +, s.t. 0 < MPC < 1
Remarks: As Keynes stated, men [women] are disposed, as a rule and on average, to increase
their consumption as their income increases, but not as much as the increase in their income.
Keynes postulated that the Marginal Propensity to Consume is greater than zero but less than 1.
. reg totex toinc fsize
      Source |       SS       df       MS              Number of obs =   38400
-------------+------------------------------           F(  2, 38397) =23585.31
       Model |  5.7617e+14     2  2.8809e+14           Prob > F      =  0.0000
    Residual |  4.6901e+14 38397  1.2215e+10           R-squared     =  0.5513
-------------+------------------------------           Adj R-squared =  0.5512
       Total |  1.0452e+15 38399  2.7219e+10           Root MSE      =  1.1e+05

------------------------------------------------------------------------------
       totex |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       toinc |   .4145162   .0019521   212.34   0.000       .41069    .4183423
       fsize |   660.4736   26.27325    25.14   0.000     608.9773    711.9698
       _cons |   53472.66   1388.513    38.51   0.000     50751.14    56194.18
------------------------------------------------------------------------------
In this regression, toinc, totex and fsize are the variables for total income, total
expenditure and family size, respectively. In the first run, the R-squared is 0.5513.
This means that about 55% of the variation in total expenditure is explained by the two
regressors, total income and family size. Although this value of R-squared is relatively low,
low values are typical of cross-section data with a large number of observations; therefore,
it cannot be concluded that the model is a poor fit.
Proceeding to the coefficients, each is individually highly significant, since the reported
p-values are all 0.000. The F value of 23585.31 is also very high, which suggests
that the regressors are jointly statistically significant as well. The first coefficient, on
total income, is positive, which indicates that as total income increases, total expenditure
also increases. Holding family size constant, if the total income of the family increases by
one peso, total expenditure increases by about 0.41 pesos, or 41 centavos. This can
be interpreted as the marginal propensity to consume: if the total income of the family
increases by 100 pesos, for example, the family would be expected to spend about 41 pesos, or
41%, of the increase. This is consistent with Keynes's postulate that the marginal propensity
to consume lies between zero and one. As emphasized earlier, the components of a family's
expenditure will not be explored or discussed in this paper. The second coefficient, on family
size, is also positive at about 660, which means that, holding total income constant,
total expenditure increases by about 660 pesos as family size increases by 1. This 660 peso
increase in total expenditure for a unit increase in family size cannot be directly compared
with the 0.41 peso increase in total expenditure for every peso increase in total income,
because the two regressors are measured in different units. We therefore cannot yet state
whether expenditure responds more to income than to family size. However, standardizing these
variables in an auxiliary regression (which will be done in the latter part of this paper)
will allow for the comparison of the influence of each independent variable on total
expenditure. The constant term, on the other hand, has little economic meaning. Taken
literally, it says that if income is 0 and family size is 0, expenditure would be 53,473
pesos. It should not be interpreted that way, since a household with zero members would have
no expenditure at all.
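The arithmetic behind these interpretations can be reproduced directly from the reported coefficients. The following is a minimal sketch; the function names predicted_totex and change_in_totex are hypothetical helpers introduced here, and the three constants are copied from the regression output above.

```python
# Coefficients copied from the preliminary regression output above.
B_CONS = 53472.66    # _cons
B_TOINC = 0.4145162  # pesos of expenditure per peso of income
B_FSIZE = 660.4736   # pesos of expenditure per unit of fsize


def predicted_totex(toinc, fsize):
    """Fitted total expenditure for a given income and family size."""
    return B_CONS + B_TOINC * toinc + B_FSIZE * fsize


def change_in_totex(d_toinc=0.0, d_fsize=0.0):
    """Predicted change in expenditure; the constant cancels out."""
    return B_TOINC * d_toinc + B_FSIZE * d_fsize


# A 100-peso rise in income, family size held fixed.
print(round(change_in_totex(d_toinc=100), 2))  # 41.45
# A one-unit rise in fsize, income held fixed.
print(round(change_in_totex(d_fsize=1), 2))    # 660.47
```

The two results are in pesos but answer different questions (a change of 100 pesos versus a change of one unit of fsize), which is why the raw coefficients cannot be ranked against each other.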
Estimation Issues
Despite having 38,400 observations, which some may consider large enough to
represent the whole population, the model used in this paper is not free of estimation issues.
It is therefore customary and appropriate to test for violations of the classical assumptions,
namely heteroscedasticity, autocorrelation, and multicollinearity, and to remedy these in order
to obtain reliable estimates.
1. Heteroscedasticity
From the preliminary regression, a test was done to check for the presence of
heteroscedasticity in the data used. The result is as follows:
. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of totex

         chi2(1)      =  1.80e+07
         Prob > chi2  =   0.0000
The test results point to the presence of heteroscedasticity in the data. The p-value,
reported as Prob > chi2, is 0.0000. Thus, we reject the null hypothesis that the error
variance is constant. The regression must therefore be run again using
White's Heteroscedasticity-Consistent Variances and Standard Errors, also known as robust
standard errors, so that inference does not suffer from the consequences of
heteroscedasticity. The robust option leaves the OLS coefficient estimates unchanged; it only
replaces the standard errors with ones that remain valid when the error variance is not
constant, which may be the case here because of outliers in the data. Below is the result of
the robust regression:
Linear regression                                      Number of obs =   38400
                                                       F(  2, 38397) = 1601.60
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5513
                                                       Root MSE      =  1.1e+05

------------------------------------------------------------------------------
             |               Robust
       totex |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       toinc |   .4145162   .1060899     3.91   0.000     .2065773     .622455
       fsize |   660.4736   148.4128     4.45   0.000     369.5807    951.3665
       _cons |   53472.66   13523.12     3.95   0.000     26966.99    79978.33
------------------------------------------------------------------------------
The coefficients are identical to those of the preliminary regression, since the robust option
changes only the standard errors; they are still individually significant, and the R-squared
is unchanged. As expected, however, the estimated standard errors change. White's
heteroscedasticity-corrected standard errors are considerably larger than the OLS standard
errors: for toinc, the standard error rises from .0019521 to .1060899. Therefore, the estimated
t values are much smaller than those obtained by OLS (3.91 instead of 212.34 for toinc).
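The mechanics of the test and of the correction can be sketched for the simplest case of a single regressor. This is an illustration under stated assumptions, not a reproduction of the Stata routines: the data are hypothetical, the function names are invented for this sketch, the test statistic is the textbook Breusch-Pagan form (half the explained sum of squares from regressing the scaled squared residuals on the fitted values), and the robust standard error is the basic White form without a small-sample adjustment.

```python
import math


def simple_ols(x, y):
    """Return intercept, slope and residuals of y on a single regressor x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid


def breusch_pagan(x, y):
    """Breusch-Pagan statistic with the fitted values as the only variance
    regressor; compare with a chi-squared(1) critical value."""
    n = len(x)
    a, b, e = simple_ols(x, y)
    fitted = [a + b * xi for xi in x]
    sigma2 = sum(ei ** 2 for ei in e) / n
    g = [ei ** 2 / sigma2 for ei in e]  # scaled squared residuals
    a_g, b_g, _ = simple_ols(fitted, g)
    g_bar = sum(g) / n
    ess = sum((a_g + b_g * fi - g_bar) ** 2 for fi in fitted)
    return ess / 2.0


def slope_standard_errors(x, y):
    """Classical and White (HC0) standard errors of the slope."""
    n = len(x)
    _, _, e = simple_ols(x, y)
    x_bar = sum(x) / n
    d = [xi - x_bar for xi in x]
    sxx = sum(di ** 2 for di in d)
    classical = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2) / sxx)
    robust = math.sqrt(sum((di * ei) ** 2 for di, ei in zip(d, e))) / sxx
    return classical, robust


# Hypothetical data: the error alternates in sign; in y_het it grows with x.
x = [float(i) for i in range(1, 201)]
y_het = [2 + 3 * xi + (0.5 * xi if i % 2 else -0.5 * xi) for i, xi in enumerate(x)]
y_hom = [2 + 3 * xi + (1.0 if i % 2 else -1.0) for i, xi in enumerate(x)]

CRITICAL_5PCT = 3.84  # 5% critical value of chi-squared with 1 d.f.
print(breusch_pagan(x, y_het) > CRITICAL_5PCT)  # True: constant variance rejected
print(breusch_pagan(x, y_hom) > CRITICAL_5PCT)  # False: no evidence against it
print(slope_standard_errors(x, y_het))          # (classical, robust)
```

Both standard errors are computed from the same fitted line, which illustrates why the coefficients in the two regression tables are identical while the standard errors and t values differ.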
2. Autocorrelation
As stated in the first part of this paper, no time-series data are used in this study, and
autocorrelation is most likely to occur in time-series data. Therefore, autocorrelation is
assumed to be unlikely in this paper's model.
3. Multicollinearity
To check for multicollinearity, the variance inflation factor (VIF) of each regressor was
computed. As a rule of thumb, a VIF of more than 10 prompts further investigation. The
tolerance, which is 1/VIF, is another measure of multicollinearity; a tolerance of less than
0.1 corresponds to the same threshold. The result is as follows:
    Variable |       VIF       1/VIF
-------------+----------------------
       fsize |      1.01    0.990891
       toinc |      1.01    0.990891
-------------+----------------------
    Mean VIF |      1.01
The result above shows no sign of multicollinearity in the data used for this study. The
VIFs of fsize and toinc are both 1.01, far below 10, while the tolerance, reported in the third
column, is 0.990891, well above 0.1. Therefore, multicollinearity is not a concern and the
regression estimates are not troubled by it.
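With only two regressors, the VIF has a simple closed form: the auxiliary R-squared of each regressor on the other is the squared correlation between them, so both share the same VIF, as the table shows. A minimal sketch follows; the function names are hypothetical, and only the tolerance value is taken from the table above.

```python
import math


def correlation(a, b):
    """Pearson correlation between two equally long lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_regressors(x1, x2):
    """With two regressors, each auxiliary R-squared is corr(x1, x2)**2,
    so both regressors share the same VIF = 1 / (1 - r**2)."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Working backwards from the tolerance reported in the table above.
tolerance = 0.990891
implied_abs_corr = math.sqrt(1.0 - tolerance)
print(round(1.0 / tolerance, 2))   # 1.01
print(round(implied_abs_corr, 3))  # 0.095
```

Read this way, the reported tolerance implies a correlation between toinc and fsize of only about 0.095 in absolute value, which is why the VIF is so close to 1.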
Standardizing Variables
As mentioned earlier, the coefficients cannot be compared in their current form,
since family size and total income are measured on different scales. To allow a comparison of
the influence of total income and family size on total expenditure, standardized variables are
generated: each variable is expressed as a deviation from its mean and divided by its standard
deviation. This is shown by the table below:
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      ztotex |     38400    5.59e-06    1.000005  -.9500124   23.89904
      ztoinc |     38400    1.72e-06    1.000002  -.6470511   103.9544
      zfsize |     38400    1.83e-07    .9999999  -1.736995   7.073476
As shown by the table above, the variables are indeed standardized, since their standard
deviations are very close to 1. Furthermore, the means are almost zero: they are reported in
scientific notation, where e-06 and e-07 denote factors of 10^-6 and 10^-7. Regressing the
standardized variables gives the following result:
      Source |       SS       df       MS              Number of obs =   38400
-------------+------------------------------           F(  2, 38397) =23585.31
       Model |   21168.298     2   10584.149           Prob > F      =  0.0000
    Residual |  17231.0478 38397  .448760263           R-squared     =  0.5513
-------------+------------------------------           Adj R-squared =  0.5512
       Total |  38399.3458 38399    1.000009           Root MSE      =   .6699

------------------------------------------------------------------------------
      ztotex |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ztoinc |   .7292455   .0034343   212.34   0.000     .7225143    .7359768
      zfsize |   .0863328   .0034343    25.14   0.000     .0796016    .0930641
       _cons |   4.32e-06   .0034185     0.00   0.999    -.0066961    .0067048
------------------------------------------------------------------------------
After the regression, the variables remain significant, which allows the influence of total
income and family size on total expenditure to be measured and compared. The variables ztotex,
ztoinc and zfsize are the standardized versions of total expenditure, total income, and family
size, respectively, as denoted by the letter z preceding each name. Because the variables are
standardized, the coefficients are measured in standard deviations rather than pesos: an
increase of one standard deviation in total income raises total expenditure by about 0.73
standard deviations, holding family size constant. In the same manner, total expenditure
increases by about 0.09 standard deviations when family size increases by one standard
deviation. With this interpretation of the standardized coefficients, it can be said that a
change in total income results in a greater change in total expenditure than a comparable
change in family size. By contrast, concluding from the preliminary regression, that is, before
standardizing the variables, that family size has the greater impact because its coefficient of
660 exceeds the coefficient of 0.41 on total income would be erroneous.
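The standardized coefficient does not require a separate regression: it equals the unstandardized coefficient multiplied by the ratio of the regressor's standard deviation to that of the dependent variable. The sketch below checks this identity using the family-size row of the summary table and the fsize coefficient from the preliminary regression; the function names are hypothetical.

```python
# Values copied from the tables above (fsize and totex rows).
B_FSIZE = 660.4736   # unstandardized coefficient on fsize
SD_FSIZE = 21.56525  # standard deviation of fsize
SD_TOTEX = 164981.7  # standard deviation of totex


def standardized_coefficient(b, sd_x, sd_y):
    """Beta coefficient: effect of a one-s.d. change in x, in s.d. of y."""
    return b * sd_x / sd_y


def standardize(values):
    """Return z-scores using the sample (n - 1) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


beta_fsize = standardized_coefficient(B_FSIZE, SD_FSIZE, SD_TOTEX)
print(round(beta_fsize, 4))  # 0.0863, matching the zfsize coefficient
```

The t statistics and the R-squared are the same in the standardized and unstandardized regressions because standardizing only rescales the variables.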
Residuals
Examining the normality of the error terms obtained from the robust regression is deemed
necessary. The histogram and normal probability plot of the residuals are given below; the
histogram of the error terms resembles a bell-shaped distribution, while in the normal
probability plot the residuals seem to follow the straight line. This suggests that the error
term is approximately normally distributed. This matters because, if the error terms are not
normally distributed, the usual t and F tests are not strictly valid in small samples. However,
that is not the case in this model. Moreover, the OLS estimators are asymptotically normally
distributed provided that the error term has finite variance, is homoscedastic (or, under
heteroscedasticity as found above, that robust standard errors are used), and has a mean of
zero. As a result, the t and F tests may be considered valid as long as the sample is
reasonably large, as it is in this study with 38,400 respondents.
IV. Findings
In the initial regression, a one-unit increase in family size increases expenditure by about
660 pesos, while in the standardized version, a one-standard-deviation increase in family size
increases expenditure by about 0.09 standard deviations. One reason for this positive effect is
that when an additional family member is born, a new set of expenditures is added. A new
family member needs food, so expenditure on food has to increase. A new member also needs
proper education. The expenditure of the family then rises by the cost of these additional
needs. The same is true for the other components of a household's total expenditure besides
food and education.
The variables chosen were significant in the model used in this study. Their p-values
were very low, which implies that the null hypothesis that total income and family size have
no effect on total expenditure can be rejected. The evidence thus supports the alternative
hypothesis that these variables do influence total expenditure. Furthermore, it is found
that total income has a greater influence on total expenditure than family size does.
A household's total expenditure, on average, increases more with an increase in the
household's total income than with a comparable increase in the household's family size.
V. Recommendation
While this study has established a clear relationship among a household's total
expenditure, total income and family size, it has not examined the relationship between
a family's total income and its size. This paper showed and discussed the
significance of each variable and its isolated effect on total expenditure; it could be
improved by looking for any relationship between the independent variables and by studying
their interaction. In doing so, the researcher can further determine whether the
variables have a purely additive effect or also an interaction effect. It would be interesting
to know whether there are any additional effects from the interaction of the family size and
income variables.
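One way such an interaction could be specified, offered only as a sketch of the suggested extension and not as a model estimated in this paper, is to add the product of the two regressors. The coefficient \beta_3 and the other symbols are notation assumed here for illustration.

```latex
\[
\mathit{totex}_i = \beta_0 + \beta_1\,\mathit{toinc}_i + \beta_2\,\mathit{fsize}_i
  + \beta_3\,(\mathit{toinc}_i \times \mathit{fsize}_i) + u_i
\]
\[
\frac{\partial\,\mathit{totex}_i}{\partial\,\mathit{toinc}_i}
  = \beta_1 + \beta_3\,\mathit{fsize}_i
\]
```

Under this specification, the effect of an extra peso of income would depend on family size, and a test of \beta_3 = 0 would indicate whether the purely additive model used in this paper is adequate.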
Any researcher who is interested in exploring this topic further may enhance the results
of this study by taking into consideration the effect of various policies, particularly the
Reproductive Health Bill, which may have an impact on family size and may therefore influence
the behavior of total expenditure.
It would also be helpful to add more variables in the model used in this study and aim to
establish relationships between these variables and total expenditure.