A STATISTICAL ANALYSIS OF LIFE
EXPECTANCY ACROSS COUNTRIES USING MULTIPLE REGRESSION
Sys 302 Project Professor Tony Smith
December 18, 2000
Miranda Chen Michael Ching
SYSTEMS 302 FINAL PROJECT: LIFE EXPECTANCY MIRANDA CHEN & MICHAEL CHING
TABLE OF CONTENTS
I. INTRODUCTION
   A. Explanation of Chosen Economic and Social Variables
   B. Assumptions on the Regression Model
   C. Summary of Findings
II. ANALYSIS
   A. Single Regression Models of Life Expectancy Against Economic and Social Variables
   B. Initial Multiple Regression
   C. Test for Multicollinearity
   D. Choosing Significant Variables Using Mixed Stepwise Regression
   E. Test of the Gauss-Markov Assumptions
   F. Predictions Using the Regression Models
   G. Does geography play a significant role in Life Expectancy?
III. CONCLUSION
   A. Multiple Regression Discussion
IV. SUPPLEMENTS
   A. Appendix A – Single Regression Models
   B. Appendix B – Country Listing
I. INTRODUCTION
In the early sixteenth century, a Spanish explorer by the name of Ponce de Leon made
it his life’s mission to find the mystical “fountain of youth,” a famous spring, the waters of
which had the marvelous virtue of restoring youth and vigor to those who drank it. De Leon
embarked on his quest for the legendary spring and landed in Florida on March 27,
1513, Easter Sunday, but to his dismay found no Fons Juventutis.
The quest to prolong our youth continues today, though not through lengthy field
explorations, but through improvements in health, nutrition, and medicine. A healthy diet,
regular exercise, and vaccinations can greatly improve an individual’s life expectancy, while an
outbreak of disease, malnutrition, and social unrest can drastically lower an individual’s life
expectancy.
But how are life expectancies affected on a national level? While these factors are
central to living longer, they alone cannot be the only facets. The social and economic
conditions of each country will undoubtedly affect its citizens, their lifestyles and decisions.
Citizens of wealthier countries have access to modern medicine and medical facilities, the
leisure to exercise, and meticulous regulation of sanitation and drinking water. Their life
expectancies, therefore, naturally should be higher than those of less developed countries.
However, this is not always the case. According to the World Health Organization (WHO),
the United States of America ranked 24th overall in terms of life expectancy among all
countries in the year 2000. Japan, Australia, France, Sweden, Spain, Italy, Greece, and
Switzerland all ranked above the more developed United States.
It behooves us then to ask, what social and economic factors contribute most
significantly in determining life expectancy at birth? Do factors that establish a higher
standard of living necessarily lead to a longer lifespan? Can a model consisting of these
significant factors be constructed to accurately forecast life expectancy? And lastly, does the
decision of which geographic region we live in have a significant influence upon our life
expectancy?
A. Definitions and Explanation of Chosen Economic and Social Variables
Life expectancy is a qualitative reflection on the quality of life in a country, since
individuals can hope to live longer, fuller lives. It is an estimate of an individual's life span
derived from averaging the ages of all individuals who die in a particular year. Life expectancy
goes beyond per capita GNP, or literacy and education attainment in measuring the physical
well being of a person.
There are two basic life expectancy tables, one which projects average years of life
remaining for an individual of a given age and the other the average number of years of life for
all persons born in a given year. For our study, we have chosen the average life expectancy at
birth, since individuals who have survived childhood are more likely to have an extended life
span than the average member of their birth cohort, thus presenting a selection bias. We
selected economic and social variables that span many different social and
economic conditions in over 140 countries, in the hope that these variables would cover as
many facets as possible and thus build an accurate model of life expectancy.
To allow comparisons across countries and over time, all the statistical tables are based
on internationally standardized data, collected and processed by international agencies
such as the UN Development Programme, the OECD, and UNICEF. These
organizations, whether collecting data from national sources or through their own surveys,
harmonized definitions and collection methods to make the data internationally comparable
and the countries unbiased “sampling units.”
The list below was chosen to measure as many distinct components of the quality of
life as possible. But in no way is this list exhaustive. While we would have liked to include
such factors as diet and ethnicity, inadequate data barred us from doing so. A major problem
we faced with collecting global data was that developing or underdeveloped regions lacked
comprehensive reporting on many factors. Because missing data points would reduce the
number of points in our regression, we were forced to make compromises: either to eliminate
categories lacking data for many countries or to remove a country with insufficient records
altogether. As a result, several countries we would have liked to include have been excluded,
as have several factors that would seem to influence life expectancy. Through tedious cross-
referencing between sources, we were able to build an extensive compilation with data for 146
countries.
We regressed Life Expectancy at Birth with the following economic and social variables:
• Economy: The wealthier a country is, the more money its citizens will have to spend
on healthcare, and correspondingly, the more likely they are to have time for leisurely
activities and exercise. We chose several variants of GDP, including per capita, purchasing
power equivalent dollars, and GDP growth, to examine the significance of the absolute
amount, comparable purchasing ability, and rate of increase in personal wealth. Inflation was
tested because high rates of inflation signify economic instability and may have distressing
effects on one's health. A dummy variable, "Country Development" was included in addition
to these variables, after plotting the distribution of life expectancy and observing what
appeared to be two distinct distributions. We hypothesized that in addition to GNP per
capita, the state of development of a country would likely be an important indicator of life
expectancy. Ones were assigned to all "developed" countries, as classified by the UN
Development Programme, and zeros were assigned to all less developed countries.
[Figure: Normal Distribution of Life Expectancy; axis ticks from 30 to 80]
1. GNP per capita ($US) 1995
2. GNP per capita annual growth rate (%) 1980-1995
3. Real GDP per Capita ($ Purchasing Power Parity) 1995
4. Average Annual Rate of Inflation (%) 1995
5. Country Development (1=developed, 0=underdeveloped)
• Population Characteristics: Demographic conditions were tested to measure the urban
composition, and growth of the urban population as well as the overall population. Urban
composition estimates the percentage of individuals living in cities--centers of medicine and
modern advances, but also quarters of pollution and overcrowding. Population growth was
considered relevant because increasing overall population can lead to shortage of resources and
decreasing prosperity, as a nation's wealth must be spread among more individuals.
6. Urban population (% of total) 1995
7. Urban population annual growth rate (%) 1970-1995
8. Annual population growth rate (%) 1970-1995
• Health: Health factors measured the availability of, and spending on, health-related
facilities through public health expenditure as a percentage of GDP and the number of
physicians per 100,000 people. Contraceptive prevalence signaled how aware of and willing
to use contraception individuals were when engaging in sexual activity.
9. Public Expenditure on Health (% of GDP) 1990
10. Physicians (per 100,000) 1993
11. Contraceptive Prevalence (%) 1990-1995
12. Fertility Rate (births per woman) 1995
• Disease: The epidemic effects of diseases such as AIDS on life expectancy are evident.
Thus the numbers of AIDS cases in 1996 and tuberculosis cases in 1995 were chosen as possible
indicators of life expectancy.
13. AIDS (per 100,000) 1996
14. Tuberculosis (per 100,000) 1995
• Access to Information/Technology: Access to information allows people to be aware of
their surroundings, from weather updates and medical discoveries to outbreaks of disease and
violence. Several modes of communication were considered: radios, TVs,
newspapers, and phones. Per capita electricity consumption and commercial energy use assessed
the prevalence of conveniences such as lighting.
15. Radios (per 1000) 1995
16. Televisions (per 1000) 1995
17. Newspapers (per 1000) 1995
18. Telephone Lines (per 1000) 1995
19. Electricity Consumption per Capita (kWh) 1995
20. Commercial Energy Use per Capita (kg) 1994
• Education: The more knowledge an individual has, the more he or she can make
informed life decisions, and improve his or her quality of life. Adult literacy rate and school
enrollment were recorded as good indicators of educational attainment.
21. Adult Literacy Rate (%) 1996
22. School Enrollment Rate (%) 1995 - Combined first-, second-, and third-level
• Environment: Environmental soundness, measured in the form of clean drinking
water and proper sanitation, is a reflection of the salutary conditions of the country. The
amount of forest and woodland and the rate of deforestation reflect the amount of greenery
and a country's dedication to preserving it, while CO2 emissions reflect air quality control.
23. Access to Safe Water (% of population) 1990-1996
24. Access to Sanitation (% of population) 1990-1996
25. Forest & Woodland (% of land area) 1995
26. Annual Rate of Deforestation (%) 1990-1995
27. CO2 Emissions per Capita (Metric tons) 1995
B. Assumptions on the Regression Model
1. It is assumed that our chosen economic and social variables exert an observable and
significant influence on life expectancy at birth of all nations. The relationship
between life expectancy and these variables is assumed to be linear and subject to
random error.
2. As much as we would have loved to collect data across all nations of the world, this
data was not readily available. Although finding most country data was simple, at
times it was downright tedious. Several countries do not release the statistics of factors that
affect their economic and social conditions. When performing a multiple regression
analysis, any country with a missing value in any category is left out of the data set. Thus,
countries for which complete data could not be gathered were excluded
from the regression. Although data from every country would have ensured the most complete
regression model, we assume our sample of 146 countries is a good reflection of the overall world
population, and that variables significant in our model will also apply to all nations of
the world.
3. Because of inconsistencies in the available data, the years in which our
variables were measured are not uniform. Although most of the data was recorded in 1995,
this was not always the case. It is far too expensive for data to be collected in every
nation during every year. Therefore, several of our variables were collected in
different years, and several were collected over a span of several years. However, data was
never collected more than two years before or after 1995. It is assumed that extreme
fluctuations in social and economic conditions did not occur during these years,
and thus this data set is appropriate.
4. Analysis of the Gauss-Markov assumptions will be performed in order to examine
whether the Gauss-Markov model is appropriate. To begin the regression, the Gauss-
Markov model was first assumed to be applicable.
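For reference, assumptions 1 and 4 can be written compactly. The notation below (y_i, x_ij, beta_j, epsilon_i, n, k) is ours and does not appear in the original report; it is the standard textbook statement of a linear model with Gauss-Markov errors, where y_i is life expectancy at birth in country i and x_ij is the value of the j-th indicator for that country.

```latex
y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, \dots, n

\mathrm{E}[\varepsilon_i] = 0, \qquad
\mathrm{Var}(\varepsilon_i) = \sigma^2, \qquad
\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0 \quad (i \neq j)
```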
C. Summary of Findings
We began our experiment with 27 variables reflecting the diverse components of life in
a country. Our preliminary regression with the 27 variables had an adjusted R2 = 0.8731.
However, due to a high degree of multicollinearity, the significance of the variables was
undermined. The model was refined through two stages of elimination. We remedied the
problem by eliminating variables that had a correlation beyond ±0.8 with another variable
in our regression. This process removed variables such as GDP in PPP, telephone lines, urban
growth, literacy, contraceptive prevalence rate, commercial energy use, radios, and country
development (dummy variable). By removing variables with high multicollinearity, we
increased the significance of factors such as GDP per capita, fertility, enrollment, and
population growth, whose effect had been muted by the multicollinearity problem.
With the remaining variables we then ran a mixed stepwise regression to determine which
variables were the most significant. From this we derived a final model, which explained
87.69% of the variance in life expectancy at birth:
Life Expectancy at Birth =
    (0.0001095 * GNP per Capita)
  + (1.4555274 * Annual Population Growth)
  + (-3.623246 * Fertility Rate)
  + (-0.066892 * AIDS)
  + (-0.016498 * Tuberculosis)
  + (0.1662502 * School Enrollment Rate)
  + (0.0524011 * Access to Safe Water)
  + (-0.035922 * Forest and Woodlands)
  + (-0.557085 * Annual Rate of Deforestation)
It is interesting to note that GNP per capita, Forest and Woodland Percentage, Deforestation,
and Access to Safe Water did not show significance in the initial multiple regression, nor in
the single regressions. Only after the removal of multicollinear variables was their
significance realized.
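One way to read the final model is through the change in predicted life expectancy when a predictor changes and the others are held fixed. The sketch below copies the coefficients as printed above; because no intercept is reported in this summary, it computes differences only, not absolute predictions. The short variable names and the helper `predicted_change` are our own labels for illustration.

```python
# Coefficients copied from the final model in the Summary of Findings.
# No intercept is reported there, so only *changes* in the prediction are computed.
FINAL_COEFS = {
    "GNPperCap": 0.0001095,
    "Pop_Growth": 1.4555274,
    "Fertility": -3.623246,
    "AIDS": -0.066892,
    "TB": -0.016498,
    "Enrollment": 0.1662502,
    "Water": 0.0524011,
    "Forest": -0.035922,
    "Deforest": -0.557085,
}


def predicted_change(deltas):
    """Change in predicted life expectancy (years) for the given predictor changes."""
    return sum(FINAL_COEFS[name] * delta for name, delta in deltas.items())


# One fewer birth per woman, all else equal.
one_fewer_birth = predicted_change({"Fertility": -1.0})
# Ten percentage points more school enrollment, all else equal.
ten_points_enrollment = predicted_change({"Enrollment": 10.0})

print(f"{one_fewer_birth:.2f} years, {ten_points_enrollment:.2f} years")
```

These are associations within the fitted model, not causal effects, and they hold only under the model's linearity assumption.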
We also examined whether life expectancy was appreciably influenced by geography by
comparing life expectancies across five continents. From our hypothesis tests, we can
conclude at the 95% confidence level that geography plays a significant role in determining life
expectancy in most regions. The only comparisons for which this was not the case were between
North America and South America, North America and Europe, and South America and
Europe.
II. ANALYSIS
A. Single Regression Models of Life Expectancy Against Economic and Social Variables
To begin our analysis, single regression models of life expectancy at birth were run
against each of our chosen economic and social indicators to obtain a graphical representation
of how well each variable could explain the variance in life expectancy. These regression plots
can be found in Appendix A. These regressions, as well as many of the tests found throughout
our analysis, were done using the statistical analysis software JMPIN.
The only variables that held a strong linear relationship (R2 > 0.60) with life
expectancy at birth were fertility rate, contraception prevalence, literacy rate, and enrollment
rate. Although these four variables are significant in single regressions, it will be
interesting to observe whether they hold considerable weight in a multiple regression.
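In a single-variable regression, R2 equals the square of the Pearson correlation between the predictor and the response, so the 0.60 cutoff can be cross-checked against the correlations with Life_Exp reported in the correlation table of Section II.C. The sketch below does this for a handful of variables; it assumes the single regressions used the same observations as that table, which the report does not state.

```python
# Pearson correlations with Life_Exp, copied from the correlation table in Section II.C.
CORR_WITH_LIFE_EXP = {
    "Fertility": -0.842,
    "Contracep": 0.851,
    "Literacy": 0.796,
    "Enrollment": 0.776,
    "TVs": 0.771,
    "Urban_Pop": 0.739,
}


def r_squared(r):
    """R^2 of a single-predictor least squares regression, given the correlation r."""
    return r * r


# Variables whose single regression would clear the R^2 > 0.60 cutoff.
strong = sorted(
    name for name, r in CORR_WITH_LIFE_EXP.items() if r_squared(r) > 0.60
)

for name, r in CORR_WITH_LIFE_EXP.items():
    print(f"{name:<11} r = {r:+.3f}  R^2 = {r_squared(r):.3f}")
print(strong)
```

Under that assumption the same four variables named above clear the cutoff, while televisions (R2 of about 0.594) fall just short.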
B. Initial Multiple Regression
Where single regressions take into account the effect of one variable at a time, multiple
regressions simultaneously consider the effects of many variables. A standard least squares
multiple regression was performed, regressing life expectancy on our chosen social and
economic variables. The results of our initial multiple regression are as follows:
Whole Model Test
[Figure: Actual by Predicted Plot; Life_Exp Actual versus Life_Exp Predicted, both axes from 30 to 80; P<.0001, RSq=0.90, RMSE=3.8383]
Summary of Fit
RSquare                      0.895906
RSquare Adj                  0.873055
Root Mean Square Error       3.838272
Mean of Response             65.21192
Observations (or Sum Wgts)   151
Parameter Estimates
Term          Estimate     Std Error   t Ratio   Prob>|t|
Intercept     56.717121    3.676142    15.43     <.0001
Cntry_Dev     -1.937297    1.59645     -1.21     0.2273
GNPperCap     -0.000206    0.000153    -1.35     0.1803
GNP_Grow      -0.008635    0.14209     -0.06     0.9516
GNP_PPP       0.0005469    0.00024     2.28      0.0244
Inflation     -0.002367    0.003012    -0.79     0.4335
Pop_Growth    1.9108988    0.623566    3.06      0.0027
Urban_Pop     0.027884     0.025919    1.08      0.2841
Urban_Grow    -0.269322    0.308148    -0.87     0.3838
Health_Exp    -0.015305    0.188232    -0.08     0.9353
Fertility     -2.695058    0.522805    -5.15     <.0001
Physician     0.0028381    0.004525    0.63      0.5316
Contracep     0.0245563    0.035684    0.69      0.4926
AIDS          -0.062174    0.018091    -3.44     0.0008
TB            -0.01239     0.004392    -2.82     0.0056
Radios        -0.000963    0.002038    -0.47     0.6376
TVs           0.0046905    0.005499    0.85      0.3953
News          0.0014127    0.006631    0.21      0.8316
Phone         0.0018864    0.006537    0.29      0.7734
Elec_Con      0.0001031    0.00019     0.54      0.5883
Comm_Energy   -0.000456    0.000479    -0.95     0.3423
Literacy      0.0444016    0.033779    1.31      0.1911
Enrollment    0.1041778    0.036738    2.84      0.0053
Water         0.016521     0.024449    0.68      0.5005
Sanitation    0.0190143    0.020364    0.93      0.3523
Forest        -0.030681    0.01685     -1.82     0.0711
Deforest      -0.494224    0.250565    -1.97     0.0508
CO2           -0.144861    0.140921    -1.03     0.3060
This model shows a strong linear fit with an R2 value of 0.8959. This means that 89.6%
of the variance has been accounted for in our model. Therefore, we can assume that our data set
is sufficient for creating a regression model for life expectancy at birth for all nations.
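Because R2 never decreases when predictors are added, the Summary of Fit also reports an adjusted R2 that penalizes the number of predictors. As a check, the short sketch below applies the standard adjustment formula to the reported values (R2 = 0.895906, 151 observations, 27 predictors); the function name is ours.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values taken from the Summary of Fit: R^2, observations, number of predictors.
adj = adjusted_r2(0.895906, 151, 27)
print(f"{adj:.4f}")
```

The result agrees with the reported RSquare Adj of 0.873055 to within rounding of the inputs.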
Although the goodness of fit is high, few variables are individually significant with
respect to life expectancy at birth. GNP per capita (PPP), population growth, fertility rate,
AIDS, tuberculosis, and school enrollment show relatively low p-values (<0.0250). However,
these are only 6 of our 27 variables. A few discrepancies may also exist within this data set. The
insignificance of health expenditure is surprising, as its p-value is near one. It seems
counterintuitive that healthcare expenditure does not influence life expectancy, and that there
is actually a negative relationship as indicated by its coefficient. Also surprising is that
country development, our dummy variable, is insignificant because once again, it would make
sense that a country that is more developed would be able to provide a better standard of
living than an underdeveloped country. To ensure that the goodness of fit of the model is not
due entirely to the number of factors we chose to use, and to correct these discrepancies, this
initial multiple regression model must be refined.
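The t Ratio column in the Parameter Estimates table is the estimate divided by its standard error, and the p-value follows from that ratio. The sketch below recomputes the ratio for three rows copied from the table; small differences from the printed values are possible because the table's inputs are themselves rounded. The helper name is ours.

```python
# (estimate, standard error) pairs copied from the initial Parameter Estimates table.
ROWS = {
    "GNP_PPP": (0.0005469, 0.00024),
    "Fertility": (-2.695058, 0.522805),
    "Health_Exp": (-0.015305, 0.188232),
}


def t_ratio(estimate, std_error):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return estimate / std_error


t_ratios = {name: t_ratio(est, se) for name, (est, se) in ROWS.items()}

for name, t in t_ratios.items():
    print(f"{name:<11} t = {t:+.2f}")
```

A ratio near zero, as for health expenditure, means the estimate is small relative to its uncertainty, which is why its p-value is close to one.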
C. Test for Multicollinearity
The absence of multicollinearity is an essential assumption of the multiple regression
model. In a regression, the X-variables are assumed not to be strongly correlated with one
another, but under multicollinearity they are. For example, if X1 and
X2 are highly correlated, then when we add X1 to our model, we also add a bit of X2. Thus, the
significance of both X1 and X2 is diluted. This phenomenon leads to high standard errors for the coefficient estimates.
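To give a rough sense of scale for this inflation, the sketch below uses the variance inflation factor (VIF). The report itself screens on pairwise correlations and does not use VIF, so this is a swapped-in illustration. It assumes a model with exactly two predictors, where VIF = 1 / (1 - r^2) and r is their correlation; the value 0.944 is the correlation between GNP per capita and GNP per capita (PPP) from the table in this section.

```python
def vif_two_predictors(r):
    """Variance inflation factor when a model has exactly two predictors with correlation r."""
    return 1.0 / (1.0 - r * r)


# Correlation between GNPperCap and GNP_PPP, from the correlation table.
vif = vif_two_predictors(0.944)

# Standard errors scale with the square root of the variance inflation.
se_inflation = vif ** 0.5

print(f"VIF = {vif:.2f}, standard error inflated about {se_inflation:.2f}x")
```

In the full 27-variable model each variable's VIF depends on its relationship with all other predictors together, so the two-predictor figure is only a lower-bound style illustration of the effect.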
Therefore, in order to refine the model, a correlation matrix of life expectancy and
our indicators was computed. This matrix identifies which variables are highly correlated.
Our initial correlation matrix is shown below:
Life_Exp Cntry_Dev GNPperCap GNP_Grow GNP_PPP Inflation Pop_Growth Urban_Pop Urban_Grow Health_Exp Fertility Physician Contracep AIDS TB Radios TVs News Phone Elec_Con Comm_Energy Literacy Enrollment Water Sanitation Forest Deforest CO2
Life_Exp 1.000 0.516 0.588 0.187 0.701 -0.117 -0.455 0.739 -0.637 0.382 -0.842 0.643 0.851 -0.250 -0.452 0.639 0.771 0.520 0.712 0.581 0.593 0.796 0.776 0.629 0.680 -0.030 0.155 0.553
Cntry_Dev 0.516 1.000 0.760 0.148 0.745 -0.135 -0.543 0.430 -0.517 0.682 -0.523 0.554 0.535 -0.151 -0.297 0.642 0.771 0.806 0.817 0.680 0.549 0.449 0.506 0.392 0.421 0.021 0.301 0.423
GNPperCap 0.588 0.760 1.000 0.153 0.944 -0.150 -0.369 0.597 -0.437 0.583 -0.520 0.458 0.525 -0.099 -0.305 0.704 0.795 0.788 0.897 0.799 0.741 0.459 0.502 0.460 0.489 -0.045 0.229 0.615
GNP_Grow 0.187 0.148 0.153 1.000 0.186 0.271 -0.219 0.016 -0.080 0.103 -0.189 0.038 0.303 -0.090 -0.199 0.113 0.164 0.101 0.199 0.078 0.020 0.173 0.150 0.141 0.120 0.011 0.033 0.007
GNP_PPP 0.701 0.745 0.944 0.186 1.000 -0.174 -0.356 0.693 -0.463 0.570 -0.602 0.483 0.614 -0.086 -0.369 0.756 0.841 0.702 0.911 0.823 0.826 0.546 0.592 0.573 0.599 -0.093 0.238 0.722
Inflation -0.117 -0.135 -0.150 0.271 -0.174 1.000 0.012 -0.084 -0.014 -0.021 0.084 0.122 -0.072 -0.074 0.010 -0.128 -0.075 -0.042 -0.126 -0.083 -0.053 -0.002 -0.027 -0.202 -0.153 -0.016 0.085 -0.033
Pop_Growth -0.455 -0.543 -0.369 -0.219 -0.356 0.012 1.000 -0.283 0.814 -0.438 0.698 -0.539 -0.577 0.122 0.192 -0.532 -0.572 -0.557 -0.542 -0.330 -0.155 -0.537 -0.422 -0.254 -0.331 -0.202 -0.242 -0.085
Urban_Pop 0.739 0.430 0.597 0.016 0.693 -0.084 -0.283 1.000 -0.430 0.383 -0.620 0.604 0.605 -0.137 -0.351 0.648 0.725 0.477 0.673 0.602 0.689 0.638 0.667 0.585 0.596 -0.135 0.183 0.653
Urban_Grow -0.637 -0.517 -0.437 -0.080 -0.463 -0.014 0.814 -0.430 1.000 -0.352 0.775 -0.593 -0.691 0.223 0.333 -0.562 -0.654 -0.536 -0.592 -0.417 -0.326 -0.661 -0.535 -0.339 -0.414 -0.182 -0.195 -0.269
Health_Exp 0.382 0.682 0.583 0.103 0.570 -0.021 -0.438 0.383 -0.352 1.000 -0.417 0.493 0.404 0.082 -0.157 0.545 0.615 0.628 0.634 0.586 0.478 0.382 0.460 0.284 0.298 0.015 0.261 0.369
Fertility -0.842 -0.523 -0.520 -0.189 -0.602 0.084 0.698 -0.620 0.775 -0.417 1.000 -0.686 -0.881 0.151 0.311 -0.604 -0.752 -0.554 -0.683 -0.511 -0.509 -0.817 -0.705 -0.511 -0.600 -0.115 -0.266 -0.467
Physician 0.643 0.554 0.458 0.038 0.483 0.122 -0.539 0.604 -0.593 0.493 -0.686 1.000 0.639 -0.234 -0.318 0.473 0.664 0.571 0.606 0.472 0.461 0.641 0.584 0.393 0.455 -0.050 0.378 0.409
Contracep 0.851 0.535 0.525 0.303 0.614 -0.072 -0.577 0.605 -0.691 0.404 -0.881 0.639 1.000 -0.169 -0.358 0.590 0.727 0.543 0.676 0.510 0.483 0.811 0.765 0.536 0.594 0.027 0.163 0.420
AIDS -0.250 -0.151 -0.099 -0.090 -0.086 -0.074 0.122 -0.137 0.223 0.082 0.151 -0.234 -0.169 1.000 0.252 -0.095 -0.218 -0.175 -0.158 -0.151 -0.073 -0.061 -0.007 -0.062 -0.119 -0.033 -0.081 -0.147
TB -0.452 -0.297 -0.305 -0.199 -0.369 0.010 0.192 -0.351 0.333 -0.157 0.311 -0.318 -0.358 0.252 1.000 -0.379 -0.402 -0.262 -0.365 -0.312 -0.320 -0.293 -0.254 -0.247 -0.233 -0.023 -0.027 -0.282
Radios 0.639 0.642 0.704 0.113 0.756 -0.128 -0.532 0.648 -0.562 0.545 -0.604 0.473 0.590 -0.095 -0.379 1.000 0.815 0.644 0.804 0.666 0.644 0.575 0.628 0.482 0.502 0.039 0.149 0.558
TVs 0.771 0.771 0.795 0.164 0.841 -0.075 -0.572 0.725 -0.654 0.615 -0.752 0.664 0.727 -0.218 -0.402 0.815 1.000 0.763 0.901 0.776 0.753 0.685 0.692 0.519 0.556 -0.004 0.289 0.670
News 0.520 0.806 0.788 0.101 0.702 -0.042 -0.557 0.477 -0.536 0.628 -0.554 0.571 0.543 -0.175 -0.262 0.644 0.763 1.000 0.804 0.748 0.569 0.497 0.497 0.348 0.430 0.103 0.312 0.451
Phone 0.712 0.817 0.897 0.199 0.911 -0.126 -0.542 0.673 -0.592 0.634 -0.683 0.606 0.676 -0.158 -0.365 0.804 0.901 0.804 1.000 0.815 0.743 0.608 0.645 0.517 0.563 -0.009 0.333 0.621
Elec_Con 0.581 0.680 0.799 0.078 0.823 -0.083 -0.330 0.602 -0.417 0.586 -0.511 0.472 0.510 -0.151 -0.312 0.666 0.776 0.748 0.815 1.000 0.841 0.484 0.539 0.439 0.454 -0.023 0.269 0.726
Comm_Energy 0.593 0.549 0.741 0.020 0.826 -0.053 -0.155 0.689 -0.326 0.478 -0.509 0.461 0.483 -0.073 -0.320 0.644 0.753 0.569 0.743 0.841 1.000 0.491 0.526 0.475 0.480 -0.122 0.249 0.904
Literacy 0.796 0.449 0.459 0.173 0.546 -0.002 -0.537 0.638 -0.661 0.382 -0.817 0.641 0.811 -0.061 -0.293 0.575 0.685 0.497 0.608 0.484 0.491 1.000 0.826 0.552 0.574 0.177 0.215 0.457
Enrollment 0.776 0.506 0.502 0.150 0.592 -0.027 -0.422 0.667 -0.535 0.460 -0.705 0.584 0.765 -0.007 -0.254 0.628 0.692 0.497 0.645 0.539 0.526 0.826 1.000 0.552 0.611 0.046 0.243 0.486
Water 0.629 0.392 0.460 0.141 0.573 -0.202 -0.254 0.585 -0.339 0.284 -0.511 0.393 0.536 -0.062 -0.247 0.482 0.519 0.348 0.517 0.439 0.475 0.552 0.552 1.000 0.743 -0.102 0.033 0.473
Sanitation 0.680 0.421 0.489 0.120 0.599 -0.153 -0.331 0.596 -0.414 0.298 -0.600 0.455 0.594 -0.119 -0.233 0.502 0.556 0.430 0.563 0.454 0.480 0.574 0.611 0.743 1.000 -0.043 0.176 0.475
Forest -0.030 0.021 -0.045 0.011 -0.093 -0.016 -0.202 -0.135 -0.182 0.015 -0.115 -0.050 0.027 -0.033 -0.023 0.039 -0.004 0.103 -0.009 -0.023 -0.122 0.177 0.046 -0.102 -0.043 1.000 -0.008 -0.147
Deforest 0.155 0.301 0.229 0.033 0.238 0.085 -0.242 0.183 -0.195 0.261 -0.266 0.378 0.163 -0.081 -0.027 0.149 0.289 0.312 0.333 0.269 0.249 0.215 0.243 0.033 0.176 -0.008 1.000 0.247
CO2 0.553 0.423 0.615 0.007 0.722 -0.033 -0.085 0.653 -0.269 0.369 -0.467 0.409 0.420 -0.147 -0.282 0.558 0.670 0.451 0.621 0.726 0.904 0.457 0.486 0.473 0.475 -0.147 0.247 1.000
As can be seen from the highlighted boxes, the multicollinearity test revealed a great
deal of correlation between certain variables. Especially numerous were the high
correlations that GNP per capita (PPP) and phones held with other variables. GNP per capita
(PPP) was highly correlated with GNP per capita, televisions, telephone lines, electricity
consumption, and commercial energy use. This is because ownership of televisions,
telephones, and electric appliances is a consumption expenditure that increases with
prosperity. Telephone lines were highly correlated with other modes of communication
such as radios, televisions, and newspapers, as well as with country development, GNP per
capita, GNP per capita (PPP), and electricity consumption. Due to their excessive multicollinearity,
these two variables, phones and GNP per capita (PPP), were removed. A second correlation matrix was
computed without these two factors.
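The screening step described above amounts to flagging every pair of predictors whose correlation exceeds 0.8 in absolute value and counting how often each variable is flagged. The sketch below applies that rule to a small subset of entries copied from the first correlation table; the helper names are ours, and the 0.8 cutoff is the one stated in the Summary of Findings.

```python
# A subset of pairwise correlations copied from the first correlation table.
CORR = {
    ("GNPperCap", "GNP_PPP"): 0.944,
    ("GNPperCap", "Phone"): 0.897,
    ("GNPperCap", "TVs"): 0.795,
    ("GNPperCap", "Inflation"): -0.150,
    ("GNP_PPP", "Phone"): 0.911,
    ("GNP_PPP", "TVs"): 0.841,
    ("GNP_PPP", "Inflation"): -0.174,
    ("Phone", "TVs"): 0.901,
    ("Phone", "Inflation"): -0.126,
    ("TVs", "Inflation"): -0.075,
}


def high_pairs(corr, threshold=0.8):
    """Pairs whose absolute correlation exceeds the threshold."""
    return sorted(pair for pair, r in corr.items() if abs(r) > threshold)


def flag_counts(pairs):
    """Number of flagged pairs in which each variable appears."""
    counts = {}
    for a, b in pairs:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


flagged = high_pairs(CORR)
counts = flag_counts(flagged)
print(flagged)
print(counts)
```

Even within this subset, GNP_PPP and Phone are flagged most often, which is consistent with their removal at this stage.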
Life_Exp Cntry_Dev GNPperCap GNP_Grow Inflation Pop_Growth Urban_Pop Urban_Grow Health_Exp Fertility Physician Contracep AIDS TB Radios TVs News Elec_Con Comm_Energy Literacy Enrollment Water Sanitation Forest Deforest CO2
Life_Exp 1.000 0.516 0.588 0.187 -0.117 -0.455 0.739 -0.637 0.382 -0.842 0.643 0.851 -0.250 -0.452 0.639 0.771 0.520 0.581 0.593 0.796 0.776 0.629 0.680 -0.030 0.155 0.553
Cntry_Dev 0.516 1.000 0.760 0.148 -0.135 -0.543 0.430 -0.517 0.682 -0.523 0.554 0.535 -0.151 -0.297 0.642 0.771 0.806 0.680 0.549 0.449 0.506 0.392 0.421 0.021 0.301 0.423
GNPperCap 0.588 0.760 1.000 0.153 -0.150 -0.369 0.597 -0.437 0.583 -0.520 0.458 0.525 -0.099 -0.305 0.704 0.795 0.788 0.799 0.741 0.459 0.502 0.460 0.489 -0.045 0.229 0.615
GNP_Grow 0.187 0.148 0.153 1.000 0.271 -0.219 0.016 -0.080 0.103 -0.189 0.038 0.303 -0.090 -0.199 0.113 0.164 0.101 0.078 0.020 0.173 0.150 0.141 0.120 0.011 0.033 0.007
Inflation -0.117 -0.135 -0.150 0.271 1.000 0.012 -0.084 -0.014 -0.021 0.084 0.122 -0.072 -0.074 0.010 -0.128 -0.075 -0.042 -0.083 -0.053 -0.002 -0.027 -0.202 -0.153 -0.016 0.085 -0.033
Pop_Growth -0.455 -0.543 -0.369 -0.219 0.012 1.000 -0.283 0.814 -0.438 0.698 -0.539 -0.577 0.122 0.192 -0.532 -0.572 -0.557 -0.330 -0.155 -0.537 -0.422 -0.254 -0.331 -0.202 -0.242 -0.085
Urban_Pop 0.739 0.430 0.597 0.016 -0.084 -0.283 1.000 -0.430 0.383 -0.620 0.604 0.605 -0.137 -0.351 0.648 0.725 0.477 0.602 0.689 0.638 0.667 0.585 0.596 -0.135 0.183 0.653
Urban_Grow -0.637 -0.517 -0.437 -0.080 -0.014 0.814 -0.430 1.000 -0.352 0.775 -0.593 -0.691 0.223 0.333 -0.562 -0.654 -0.536 -0.417 -0.326 -0.661 -0.535 -0.339 -0.414 -0.182 -0.195 -0.269
Health_Exp 0.382 0.682 0.583 0.103 -0.021 -0.438 0.383 -0.352 1.000 -0.417 0.493 0.404 0.082 -0.157 0.545 0.615 0.628 0.586 0.478 0.382 0.460 0.284 0.298 0.015 0.261 0.369
Fertility -0.842 -0.523 -0.520 -0.189 0.084 0.698 -0.620 0.775 -0.417 1.000 -0.686 -0.881 0.151 0.311 -0.604 -0.752 -0.554 -0.511 -0.509 -0.817 -0.705 -0.511 -0.600 -0.115 -0.266 -0.467
Physician 0.643 0.554 0.458 0.038 0.122 -0.539 0.604 -0.593 0.493 -0.686 1.000 0.639 -0.234 -0.318 0.473 0.664 0.571 0.472 0.461 0.641 0.584 0.393 0.455 -0.050 0.378 0.409
Contracep 0.851 0.535 0.525 0.303 -0.072 -0.577 0.605 -0.691 0.404 -0.881 0.639 1.000 -0.169 -0.358 0.590 0.727 0.543 0.510 0.483 0.811 0.765 0.536 0.594 0.027 0.163 0.420
AIDS -0.250 -0.151 -0.099 -0.090 -0.074 0.122 -0.137 0.223 0.082 0.151 -0.234 -0.169 1.000 0.252 -0.095 -0.218 -0.175 -0.151 -0.073 -0.061 -0.007 -0.062 -0.119 -0.033 -0.081 -0.147
TB -0.452 -0.297 -0.305 -0.199 0.010 0.192 -0.351 0.333 -0.157 0.311 -0.318 -0.358 0.252 1.000 -0.379 -0.402 -0.262 -0.312 -0.320 -0.293 -0.254 -0.247 -0.233 -0.023 -0.027 -0.282
Radios 0.639 0.642 0.704 0.113 -0.128 -0.532 0.648 -0.562 0.545 -0.604 0.473 0.590 -0.095 -0.379 1.000 0.815 0.644 0.666 0.644 0.575 0.628 0.482 0.502 0.039 0.149 0.558
TVs 0.771 0.771 0.795 0.164 -0.075 -0.572 0.725 -0.654 0.615 -0.752 0.664 0.727 -0.218 -0.402 0.815 1.000 0.763 0.776 0.753 0.685 0.692 0.519 0.556 -0.004 0.289 0.670
News 0.520 0.806 0.788 0.101 -0.042 -0.557 0.477 -0.536 0.628 -0.554 0.571 0.543 -0.175 -0.262 0.644 0.763 1.000 0.748 0.569 0.497 0.497 0.348 0.430 0.103 0.312 0.451
Elec_Con 0.581 0.680 0.799 0.078 -0.083 -0.330 0.602 -0.417 0.586 -0.511 0.472 0.510 -0.151 -0.312 0.666 0.776 0.748 1.000 0.841 0.484 0.539 0.439 0.454 -0.023 0.269 0.726
Comm_Energy 0.593 0.549 0.741 0.020 -0.053 -0.155 0.689 -0.326 0.478 -0.509 0.461 0.483 -0.073 -0.320 0.644 0.753 0.569 0.841 1.000 0.491 0.526 0.475 0.480 -0.122 0.249 0.904
Literacy 0.796 0.449 0.459 0.173 -0.002 -0.537 0.638 -0.661 0.382 -0.817 0.641 0.811 -0.061 -0.293 0.575 0.685 0.497 0.484 0.491 1.000 0.826 0.552 0.574 0.177 0.215 0.457
Enrollment 0.776 0.506 0.502 0.150 -0.027 -0.422 0.667 -0.535 0.460 -0.705 0.584 0.765 -0.007 -0.254 0.628 0.692 0.497 0.539 0.526 0.826 1.000 0.552 0.611 0.046 0.243 0.486
Water 0.629 0.392 0.460 0.141 -0.202 -0.254 0.585 -0.339 0.284 -0.511 0.393 0.536 -0.062 -0.247 0.482 0.519 0.348 0.439 0.475 0.552 0.552 1.000 0.743 -0.102 0.033 0.473
Sanitation 0.680 0.421 0.489 0.120 -0.153 -0.331 0.596 -0.414 0.298 -0.600 0.455 0.594 -0.119 -0.233 0.502 0.556 0.430 0.454 0.480 0.574 0.611 0.743 1.000 -0.043 0.176 0.475
Forest -0.030 0.021 -0.045 0.011 -0.016 -0.202 -0.135 -0.182 0.015 -0.115 -0.050 0.027 -0.033 -0.023 0.039 -0.004 0.103 -0.023 -0.122 0.177 0.046 -0.102 -0.043 1.000 -0.008 -0.147
Deforest 0.155 0.301 0.229 0.033 0.085 -0.242 0.183 -0.195 0.261 -0.266 0.378 0.163 -0.081 -0.027 0.149 0.289 0.312 0.269 0.249 0.215 0.243 0.033 0.176 -0.008 1.000 0.247
CO2 0.553 0.423 0.615 0.007 -0.033 -0.085 0.653 -0.269 0.369 -0.467 0.409 0.420 -0.147 -0.282 0.558 0.670 0.451 0.726 0.904 0.457 0.486 0.473 0.475 -0.147 0.247 1.000
The table above reveals that many factors still contain multicollinearities, as denoted by
the highlighted boxes. These variables are as follows:
1. Country Development and Newspapers
2. Population Growth and Urban Population Growth
3. Literacy Rate and School Enrollment Rate
4. Electricity Consumption and Commercial Energy Use
5. CO2 Emissions and Commercial Energy Use
6. Fertility Rate and Contraception Prevalence Rate
7. Literacy Rate and Fertility Rate
8. Literacy Rate and Contraception Prevalence Rate
9. Televisions and Radios
Intuitively, these correlations make sense. If a country is developed, it is more likely to
have the facilities, supplies, and audience needed for a newspaper to be successful.
If a nation's urban population is growing, its total population is likely growing as well. If a
nation's school enrollment rate is high, meaning that many children are receiving an education,
the country's literacy rate should likewise be high. If a country uses a lot of commercial
energy, its electricity consumption will also be high. This commercial
energy use will generate a good deal of pollution, including carbon dioxide (CO2). The use of
contraceptives, such as birth control pills and condoms, should logically have a strong negative
relation to fertility rate, since women who use contraceptives will have fewer births. If the
literacy rate is high, then a nation's population will likely have a better understanding of the
risks of sexual promiscuity, and thus a lower fertility rate and greater use of
contraceptives. Lastly, since the use of both televisions and radios increases with prosperity, it
seems reasonable that these two media are highly correlated with each other.
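The screening step described above can be sketched in a few lines of Python. This is only an illustration: the cut-off of 0.8 is an assumption (the report marks the pairs by highlighting and does not state a threshold), the function name is hypothetical, and only a four-variable excerpt of the correlation table is used.

```python
from itertools import combinations

# Correlations copied from the Literacy, Enrollment, Water and Sanitation
# rows of the correlation table above (a small excerpt of the full grid).
names = ["Literacy", "Enrollment", "Water", "Sanitation"]
corr = [
    [1.000, 0.826, 0.552, 0.574],
    [0.826, 1.000, 0.552, 0.611],
    [0.552, 0.552, 1.000, 0.743],
    [0.574, 0.611, 0.743, 1.000],
]

def flag_collinear_pairs(names, corr, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    The threshold is an assumed cut-off, not one stated in the report.
    """
    flagged = []
    for i, j in combinations(range(len(names)), 2):
        if abs(corr[i][j]) >= threshold:
            flagged.append((names[i], names[j], corr[i][j]))
    return flagged

pairs = flag_collinear_pairs(names, corr)
print(pairs)
```

With this assumed cut-off, only Literacy and Enrollment are flagged in the excerpt, which agrees with item 3 of the list above.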
In order to determine which of these multicollinear variables to remove, we compared
their significance based on p-values. Since not all of these variables are on the same scale, their
coefficients cannot be directly compared to determine the relative influence of each factor.
Therefore, the absolute t-ratios of each variable were compared instead. These t-ratios were
taken from a multiple regression with the variables GNP per Capita (PPP) and telephone lines
removed. The tables of these parameter estimates can be found below:
Parameter Estimates
Term            Estimate     Std Error   t Ratio   Prob>|t|
Intercept       56.410473    3.742883    15.07     <.0001
Cntry _Dev      -0.843803    1.760322    -0.48     0.6325
GNPperCap       0.0000995    0.000081    1.22      0.2237
GNP_Grow        0.0551661    0.141074    0.39      0.6964
Inflation       -0.003138    0.003047    -1.03     0.3051
Pop _Growth     1.924076     0.63214     3.04      0.0028
Urban_Pop       0.0350771    0.026386    1.33      0.1861
Urban_Grow      -0.329277    0.311792    -1.06     0.2930
Health_Exp      0.0106446    0.194974    0.05      0.9565
Fertility       -2.635271    0.535813    -4.92     <.0001
Physician       0.0014142    0.004532    0.31      0.7555
Contracep       0.0390728    0.035743    1.09      0.2764
AIDS            -0.055589    0.018109    -3.07     0.0026
TB              -0.013266    0.00442     -3.00     0.0032
Radios          0.00027      0.001934    0.14      0.8892
TVs             0.0063612    0.005525    1.15      0.2518
News            -0.005775    0.005917    -0.98     0.3309
Elec_ Con       0.0001876    0.000187    1.00      0.3190
Comm_ Energy    -0.00029     0.000479    -0.61     0.5456
Water           0.0196559    0.024743    0.79      0.4285
Sanitation      0.0328903    0.019812    1.66      0.0994
Forest          -0.03359     0.017044    -1.97     0.0510
Deforest        -0.400283    0.245403    -1.63     0.1054
CO2             -0.117938    0.141341    -0.83     0.4056
Literacy        0.0445918    0.034292    1.30      0.1959
Enrollment      0.0953554    0.037024    2.58      0.0112
For example, the absolute t-ratio for school enrollment was 2.58, compared with 1.30 for
the literacy rate. As a check, two further multiple regressions were performed,
one with enrollment and without literacy, and one with literacy and without enrollment.
The adjusted R2 values of these regressions were then compared. The adjusted R2 of the
regression including enrollment was 0.88964, while the regression including literacy yielded an
adjusted R2 of 0.86350. Thus, literacy rate was removed from our regression model. Similar
tests were done with the other combinations. The following variables were also removed from
our regression model: country development, urban population growth, contraceptive
prevalence, commercial energy use, and radios. The insignificance of our dummy variable
country development is somewhat surprising, though there does exist a logical argument for
this. Although developed, industrial countries tend to have higher life expectancies at birth
than underdeveloped countries, this is not always the case. One example, as mentioned
earlier, is the United States of America, which ranks very highly in terms of development, but
whose life expectancy is not extremely high. An observation of our data set also reveals that
there are several less developed countries that have extremely high life expectancies, such as
Jamaica, Kuwait and Singapore, which have life expectancies of 74.1, 75.4, and 77.1 years,
respectively. These discrepancies result in the removal of country development.
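The t-ratio comparison used above to choose between two collinear variables can be reproduced directly from the Parameter Estimates table, since each t-ratio is the estimate divided by its standard error. The sketch below uses the Enrollment and Literacy rows; the helper name is hypothetical.

```python
def t_ratio(estimate, std_error):
    """t-ratio of a regression coefficient: estimate / standard error."""
    return estimate / std_error

# Estimate and Std Error columns from the Parameter Estimates table above.
enrollment_t = t_ratio(0.0953554, 0.037024)
literacy_t = t_ratio(0.0445918, 0.034292)

# Keep the variable with the larger absolute t-ratio.
keep = "Enrollment" if abs(enrollment_t) > abs(literacy_t) else "Literacy"
print(round(enrollment_t, 2), round(literacy_t, 2), keep)
```

Because a t-ratio is unit-free, it allows variables measured on different scales to be compared, which the raw coefficients do not.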
Our final correlation test, with country development, urban population growth,
literacy rate, contraceptive prevalence, commercial energy use, radios, telephone lines, and
GNP per Capita (PPP) removed, produced the following table:
Life_Exp  GNPperCap  GNP_Grow  Inflation  Pop _Growth  Urban_Pop  Health_Exp  Fertility  Physician  AIDS  TB  TVs  News  Elec_ Con  Enrollment  Water  Sanitation  Forest  Deforest  CO2
Life_Exp 1.000 0.588 0.187 -0.117 -0.455 0.739 0.382 -0.842 0.643 -0.250 -0.452 0.771 0.520 0.581 0.776 0.629 0.680 -0.030 0.155 0.553
GNPperCap 0.588 1.000 0.153 -0.150 -0.369 0.597 0.583 -0.520 0.458 -0.099 -0.305 0.795 0.788 0.799 0.502 0.460 0.489 -0.045 0.229 0.615
GNP_Grow 0.187 0.153 1.000 0.271 -0.219 0.016 0.103 -0.189 0.038 -0.090 -0.199 0.164 0.101 0.078 0.150 0.141 0.120 0.011 0.033 0.007
Inflation -0.117 -0.150 0.271 1.000 0.012 -0.084 -0.021 0.084 0.122 -0.074 0.010 -0.075 -0.042 -0.083 -0.027 -0.202 -0.153 -0.016 0.085 -0.033
Pop _Growth -0.455 -0.369 -0.219 0.012 1.000 -0.283 -0.438 0.698 -0.539 0.122 0.192 -0.572 -0.557 -0.330 -0.422 -0.254 -0.331 -0.202 -0.242 -0.085
Urban_Pop 0.739 0.597 0.016 -0.084 -0.283 1.000 0.383 -0.620 0.604 -0.137 -0.351 0.725 0.477 0.602 0.667 0.585 0.596 -0.135 0.183 0.653
Health_Exp 0.382 0.583 0.103 -0.021 -0.438 0.383 1.000 -0.417 0.493 0.082 -0.157 0.615 0.628 0.586 0.460 0.284 0.298 0.015 0.261 0.369
Fertility -0.842 -0.520 -0.189 0.084 0.698 -0.620 -0.417 1.000 -0.686 0.151 0.311 -0.752 -0.554 -0.511 -0.705 -0.511 -0.600 -0.115 -0.266 -0.467
Physician 0.643 0.458 0.038 0.122 -0.539 0.604 0.493 -0.686 1.000 -0.234 -0.318 0.664 0.571 0.472 0.584 0.393 0.455 -0.050 0.378 0.409
AIDS -0.250 -0.099 -0.090 -0.074 0.122 -0.137 0.082 0.151 -0.234 1.000 0.252 -0.218 -0.175 -0.151 -0.007 -0.062 -0.119 -0.033 -0.081 -0.147
TB -0.452 -0.305 -0.199 0.010 0.192 -0.351 -0.157 0.311 -0.318 0.252 1.000 -0.402 -0.262 -0.312 -0.254 -0.247 -0.233 -0.023 -0.027 -0.282
TVs 0.771 0.795 0.164 -0.075 -0.572 0.725 0.615 -0.752 0.664 -0.218 -0.402 1.000 0.763 0.776 0.692 0.519 0.556 -0.004 0.289 0.670
News 0.520 0.788 0.101 -0.042 -0.557 0.477 0.628 -0.554 0.571 -0.175 -0.262 0.763 1.000 0.748 0.497 0.348 0.430 0.103 0.312 0.451
Elec_ Con 0.581 0.799 0.078 -0.083 -0.330 0.602 0.586 -0.511 0.472 -0.151 -0.312 0.776 0.748 1.000 0.539 0.439 0.454 -0.023 0.269 0.726
Enrollment 0.776 0.502 0.150 -0.027 -0.422 0.667 0.460 -0.705 0.584 -0.007 -0.254 0.692 0.497 0.539 1.000 0.552 0.611 0.046 0.243 0.486
Water 0.629 0.460 0.141 -0.202 -0.254 0.585 0.284 -0.511 0.393 -0.062 -0.247 0.519 0.348 0.439 0.552 1.000 0.743 -0.102 0.033 0.473
Sanitation 0.680 0.489 0.120 -0.153 -0.331 0.596 0.298 -0.600 0.455 -0.119 -0.233 0.556 0.430 0.454 0.611 0.743 1.000 -0.043 0.176 0.475
Forest -0.030 -0.045 0.011 -0.016 -0.202 -0.135 0.015 -0.115 -0.050 -0.033 -0.023 -0.004 0.103 -0.023 0.046 -0.102 -0.043 1.000 -0.008 -0.147
Deforest 0.155 0.229 0.033 0.085 -0.242 0.183 0.261 -0.266 0.378 -0.081 -0.027 0.289 0.312 0.269 0.243 0.033 0.176 -0.008 1.000 0.247
CO2 0.553 0.615 0.007 -0.033 -0.085 0.653 0.369 -0.467 0.409 -0.147 -0.282 0.670 0.451 0.726 0.486 0.473 0.475 -0.147 0.247 1.000
The above grid reveals that with the removal of these eight variables, much of the
multicollinearity problem has been remedied. Still, it is important to recognize that
multicollinearity cannot be removed completely, since most of the indicators are related in some way.
Although the removal of these variables will decrease R2 and thus the degree of overall
explanation, the greater goal of this analysis is to obtain the best combination of significant
factors.
D. Choosing Significant Variables Using Mixed Stepwise Regression
In order to determine which variables are the most significant, a stepwise regression is
performed. Stepwise regression allows us to search for the best model from all possible
regressions by successively adding and subtracting variables according to their significance.
The result of our mixed stepwise regression with probability to enter set at 0.150 and
probability to leave set at 0.100 is as follows:
Stepwise Regression Control
Prob to Enter   0.150
Prob to Leave   0.100
Direction:      Mixed

Current Estimates
SSE         DFE   MSE        RSquare   RSquare Adj   Cp         AIC
2143.3161   141   15.20082   0.8769    0.8690        10.25259   420.5773

Lock  Entered  Parameter     Estimate      nDF   SS          "F Ratio"   "Prob>F"
X     X        Intercept     62.9243535    1     0           0.000       1.0000
      X        GNPperCap     0.00010952    1     95.8992     6.309       0.0131
               GNP_Grow      .             1     0.03087     0.002       0.9642
               Inflation     .             1     13.20918    0.868       0.3531
      X        Pop _Growth   1.45552738    1     240.4819    15.820      0.0001
               Urban_Pop     .             1     40.1306     2.671       0.1044
               Health_Exp    .             1     2.553851    0.167       0.6834
      X        Fertility     -3.6232459    1     1829.988    120.387     0.0000
               Physician     .             1     1.578741    0.103       0.7485
      X        AIDS          -0.0668917    1     286.5572    18.851      0.0000
      X        TB            -0.0164982    1     251.9342    16.574      0.0001
               TVs           .             1     10.3615     0.680       0.4110
               News          .             1     7.73195     0.507       0.4777
               Elec_ Con     .             1     0.031085    0.002       0.9641
      X        Enrollment    0.16625024    1     635.5259    41.809      0.0000
      X        Water         0.05240111    1     112.556     7.405       0.0073
               Sanitation    .             1     30.03489    1.990       0.1606
      X        Forest        -0.035922     1     85.91684    5.652       0.0188
      X        Deforest      -0.5570848    1     94.91818    6.244       0.0136
               CO2           .             1     16.55525    1.090       0.2983

Step History
Step  Parameter     Action    "Sig Prob"   Seq SS      RSquare   Cp       p
1     Fertility     Entered   0.0000       12342.99    0.7090    186.8    2
2     Urban_Pop     Entered   0.0000       1332.235    0.7856    101      3
3     Enrollment    Entered   0.0000       427.1613    0.8101    74.852   4
4     TB            Entered   0.0000       398.1551    0.8330    50.612   5
5     Pop _Growth   Entered   0.0006       224.7891    0.8459    37.798   6
6     AIDS          Entered   0.0004       228.0112    0.8590    24.771   7
7     Water         Entered   0.0019       160.9674    0.8682    16.163   8
8     Deforest      Entered   0.0353       70.68607    0.8723    13.504   9
9     Forest        Entered   0.0416       64.74455    0.8760    11.237   10
10    GNPperCap     Entered   0.0576       55.07309    0.8792    9.6078   11
11    Urban_Pop     Removed   0.1044       40.1306     0.8769    10.253   10
From this mixed stepwise regression, the following variables have been found to exert the
greatest significance: GNP per Capita, Population Growth, Fertility Rate, AIDS, Tuberculosis,
School Enrollment Rate, Access to Safe Water, Forest and Woodlands, and Rate of Deforestation.
All have p-values less than 0.020. It is interesting to note that of the four variables that were
significant in the single regression, the two variables, fertility rate and school enrollment rate,
are also found to be significant in the final multiple regression. The other two variables,
literacy rate and contraception prevalence, were removed due to the existence of
multicollinearities.
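The F ratios in the Current Estimates output appear to follow the usual partial F test, and they can be checked by hand from the SS column. The sketch below assumes that convention (it is not taken from JMP documentation): for a variable already in the model, F is its SS divided by the current MSE; for a candidate not yet in the model, F is its SS divided by the MSE the model would have after adding it. The function names are hypothetical.

```python
SSE = 2143.3161   # error sum of squares of the current model
DFE = 141         # error degrees of freedom of the current model
MSE = SSE / DFE   # about 15.20082, as reported

def f_to_remove(ss):
    """Partial F for a variable already in the model."""
    return ss / MSE

def f_to_enter(ss):
    """Partial F for a candidate: adding it lowers SSE by ss and DFE by 1."""
    return ss / ((SSE - ss) / (DFE - 1))

f_gnp = f_to_remove(95.8992)        # GNPperCap (entered)
f_fertility = f_to_remove(1829.988) # Fertility (entered)
f_urban = f_to_enter(40.1306)       # Urban_Pop (not in the model)
print(round(f_gnp, 3), round(f_fertility, 3), round(f_urban, 3))
```

Under this assumption the three values agree with the reported F ratios of 6.309, 120.387 and 2.671 to three decimals. Urban_Pop has a "Prob>F" of 0.1044, above the probability to leave of 0.100, which is consistent with its removal in step 11.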
Using the indicators obtained from the stepwise regression, a final standard least squares
regression was performed using JMPIN, with the results shown below:
Summary of Fit
RSquare                       0.876878
RSquare Adj                   0.869019
Root Mean Square Error        3.898823
Mean of Response              65.21192
Observations (or Sum Wgts)    151

Whole Model Actual by Predicted Plot: Life_Exp Actual against Life_Exp Predicted (P<.0001, RSq=0.88, RMSE=3.8988)

Parameter Estimates
Term           Estimate     Std Error   t Ratio   Prob>|t|
Intercept      62.924353    2.610737    24.10     <.0001
GNPperCap      0.0001095    0.000044    2.51      0.0131
Pop _Growth    1.4555274    0.365942    3.98      0.0001
Fertility      -3.623246    0.330223    -10.97    <.0001
AIDS           -0.066892    0.015406    -4.34     <.0001
TB             -0.016498    0.004053    -4.07     <.0001
Enrollment     0.1662502    0.025712    6.47      <.0001
Water          0.0524011    0.019257    2.72      0.0073
Forest         -0.035922    0.01511     -2.38     0.0188
Deforest       -0.557085    0.222936    -2.50     0.0136
The final regression model produces a very good fit, with an adjusted R2 value of
0.8690. This adjusted R2 differs only slightly from our initial adjusted R2 value of 0.8731. The linear
relationship between life expectancy at birth and our significant variables is:
Life Expectancy at Birth = 62.924353
    + (0.0001095 * GNP per Capita)
    + (1.4555274 * Annual Population Growth)
    + (-3.623246 * Fertility Rate)
    + (-0.066892 * AIDS)
    + (-0.016498 * Tuberculosis)
    + (0.1662502 * Enrollment Rate)
    + (0.0524011 * Access to Safe Water)
    + (-0.035922 * Forest and Woodlands)
    + (-0.557085 * Deforestation Rate)
These significant indicators seem reasonable, and the signs of the coefficients offer further clues
as to logical interpretations of the variables. For instance, GNP per capita exhibits a positive
influence on life expectancy, which supports our hypothesis that the wealthier a country's citizens
are, the longer they can expect to live because they can afford better medical treatment, and
afford the conveniences to lead more comfortable lives. The positive coefficient on
Population Growth at first seems misplaced, since Fertility Rate has a negative influence on
life expectancy. However, population growth is not necessarily a negative feature. If a nation
can increase its GNP faster than its population grows, then per capita GNP rises, and
the nation can likely support a larger population.
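As an illustration of how the fitted model is applied, the sketch below evaluates it for Belgium, using the intercept and coefficients from the Parameter Estimates table and the Belgium data listed in Section F. The dictionary keys and the function name are hypothetical labels chosen for this example.

```python
# Intercept and coefficients from the final Parameter Estimates table.
INTERCEPT = 62.924353
COEFFICIENTS = {
    "GNPperCap": 0.0001095,
    "Pop_Growth": 1.4555274,
    "Fertility": -3.623246,
    "AIDS": -0.066892,
    "TB": -0.016498,
    "Enrollment": 0.1662502,
    "Water": 0.0524011,
    "Forest": -0.035922,
    "Deforest": -0.557085,
}

def predict_life_expectancy(country):
    """Intercept plus the sum of coefficient * indicator value."""
    return INTERCEPT + sum(
        coef * country[name] for name, coef in COEFFICIENTS.items()
    )

# Indicator values for Belgium as listed in Section F.
belgium = {
    "GNPperCap": 24710, "Pop_Growth": 0.19, "Fertility": 1.62,
    "AIDS": 1.45, "TB": 16, "Enrollment": 86,
    "Water": 100, "Forest": 2.35, "Deforest": 0,
}

belgium_prediction = predict_life_expectancy(belgium)
print(round(belgium_prediction, 2))
```

The result is about 79.13 years, which agrees with the predicted value of 79.1297986 reported in Section F up to rounding of the coefficients.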
E. Gauss-Markov Assumptions:
Our least squares estimators (βi, i = 1, …, n) have minimum variance among all possible
linear unbiased estimators provided that our model abides by the Gauss-Markov Assumptions.
If these assumptions are violated, then our model is unlikely to be the most appropriate.
These assumptions can be stated as:
(i) E(εi) = 0, i = 1, …, n Linearity Assumption
(ii) Var(εi) = σ2, i = 1, …, n Homoscedasticity Assumption
(iii) (ε1, …, εn) mutually independent Independence Assumption
The linearity assumption can be checked by examining the plot of the residuals, in
which we notice no clear curvilinear patterns that would suggest that nonlinear
transformations of the y's or x's might provide a better fit. Had such non-linearities appeared,
the variables would have been transformed, but none were observed.
The second assumption can be verified by examining the uniformity of the residual
pattern. The residual plot does not show any significant trend of increasing variance, and
the residuals actually seem to be less scattered at higher predicted values. From the plot of the
residuals below, we can see that the residuals are scattered evenly, without an obvious pattern of
dispersion to suggest heteroscedasticity.
Finally, the independence assumption is reasonable since each data point is a unique
country. While there is mobility between countries, for instance in the European Union, cross-
border migrations represent such a small percentage of the total population that independence of
the residuals is a reasonable assumption.
Residual by Predicted Plot: Life_Exp Residual against Life_Exp Predicted
In addition to the Gauss-Markov assumptions, the residuals of the model must be
normally distributed. A normal quantile plot shows that the residuals are normally distributed.
Normal Quantile Plot
Hence our regression residuals appear to be consistent with all the assumptions of the linear model.
F. Predictions Using the Regression Models
Next, we tested how accurately the life expectancy at birth of several random nations
could be forecast using our model. We constructed prediction bands for these individually
predicted values to see how well the model holds and to determine whether we can in fact predict the
average life expectancy of a citizen based on the characteristics of the nation.
The data for three random countries, Belgium, Paraguay, and Chad, were excluded
from our regression model for the purpose of testing how well our final regression model
predicts life expectancy. Excluding these countries did not significantly affect our regression
since our sample size was so large. The data of our significant factors for these countries are as
follows:
Country    GNP per Cap   Pop. Growth   Fertility   AIDS    TB      Enroll.   Water   Forest   Deforest
Belgium    24710         0.19          1.62        1.45    16      86        100     2.35     0
Paraguay   1690          2.92          1.2         0.98    36.72   63        60      29.01    -2.6
Chad       180           2.39          5.7         18.98   50.29   27        37      8.76     -0.8
90% prediction intervals for these values were formed using the equation:
$$Y_{\text{predicted}} \pm t_{\alpha/2,\; n-(k+1)} \cdot \left\{ s^2 + \left(\text{estimated SD of } Y_{\text{predicted}}\right)^2 \right\}^{1/2}$$
where Ypredicted is the predicted value for life expectancy, s is the root mean square
error (so that s2 is the mean square error), "the estimated SD of Ypredicted" is the standard error
of the prediction formula, and tα/2, n – (k+1) is the critical value of the t-distribution with
n – (k+1) degrees of freedom for a two-sided prediction interval of
100(1 – α)%. The values for Ypredicted and the estimated SD of Ypredicted were specific to each country,
and were found by plugging the above data into our prediction formula. The value for
s was 3.89882. The value for tα/2, n – (k+1), for a prediction interval of 90% and a sample size
of 146 countries with 9 significant variables used, was t0.1/2, 146 – (9+1) = 1.6561. Our
prediction intervals for our three randomly chosen countries are thus:
Country    Recorded Life Expectancy   Predicted Life Expectancy   Lower 90% PI   Upper 90% PI   Standard Error of Pred Formula
Belgium    76.9                       79.1297986                  72.5074153     85.7521818     0.88812337
Paraguay   69.1                       64.9150227                  58.354333      71.4757124     0.70159494
Chad       47.2                       49.996682                   43.3709749     56.6223892     0.89711639
As can be observed from the values above, our final regression model predicts life
expectancy for different nations very accurately. The actual recorded values for all three
randomly selected nations fell within the 90% prediction intervals. This further highlights the
ability of our regression model to forecast life expectancy given the nine parameters.
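The interval arithmetic can be retraced with a short sketch. It takes the root mean square error (3.898823) and the t-value quoted in the text (1.6561) as given rather than recomputing them, and combines them with each country's predicted value and standard error of the prediction formula from the table above. The function and variable names are hypothetical.

```python
import math

RMSE = 3.898823   # root mean square error of the final model
T_CRIT = 1.6561   # t-value quoted in the text for a 90% interval

def prediction_interval(y_hat, se_fit, s=RMSE, t_crit=T_CRIT):
    """Interval y_hat +/- t * sqrt(s^2 + se_fit^2) for a single new observation."""
    half_width = t_crit * math.sqrt(s ** 2 + se_fit ** 2)
    return y_hat - half_width, y_hat + half_width

# (predicted value, standard error of prediction formula, recorded value)
countries = {
    "Belgium": (79.1297986, 0.88812337, 76.9),
    "Paraguay": (64.9150227, 0.70159494, 69.1),
    "Chad": (49.996682, 0.89711639, 47.2),
}

intervals = {}
for name, (y_hat, se_fit, recorded) in countries.items():
    lower, upper = prediction_interval(y_hat, se_fit)
    intervals[name] = (lower, upper)
    print(name, round(lower, 3), round(upper, 3), lower <= recorded <= upper)
```

The bounds agree with the tabulated intervals to about three decimal places, and each recorded life expectancy lies inside its interval. Most of the interval width comes from the residual term s2 rather than from the uncertainty in the fitted value.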
G. Does geography play a significant role in life expectancy?
Life expectancy reflects the overall health of a nation’s citizens, and so it can be a
significant factor in determining what region of the world to settle in. We have determined with
our final multiple regression model which factors are significantly related to life expectancy.
However, it may be difficult to find a region where these factors are all significant. If we
break down our analysis into regions of the world, we may be able to answer a more
appropriate question: Does the decision of which geographic region we choose to live in
significantly influence our life expectancy at birth? We developed a hypothesis-testing
framework in order to analyze this query.
Firstly, our sample of 146 countries is split up based upon geographic region. These
five regions are Asia & the Pacific Islands, Africa & the Middle East, Europe, North &
Central America & the Caribbean, and finally South America. The list of the countries in
each geographical region can be found in Appendix B. The following mean life expectancy,
standard deviation, and number of data points (n) are then observed:
Region                  Mean Life Expectancy   Standard Deviation   N
Asia & Pacific          65.92963               7.303783             27
Africa & Middle East    56.12000               10.13488             59
Europe                  73.03243               4.88552              35
North America           71.48235               5.47085              17
South America           68.65000               4.58049              12
Setting μ1 and μ2 as the mean life expectancies of two randomly selected regions of the world,
where a difference in life expectancies of 3 years is considered to be significant, we form the
following null and alternative hypotheses.
H0: μ1 – μ2 = 3
Ha: μ1 – μ2 > 3
Therefore, if the null hypothesis is rejected, there is a significant difference, of more than
three years, in life expectancy between the two regions being compared. Our type I error
would be reporting a significant difference between regions when none exists, which would
be harmful to anyone taking our advice and moving to a different part of the world.
Life expectancies of all regions are assumed to be independent random variables. Our
test statistic for the true mean difference between these two populations is:
$$Z_{12} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$
where x̄1 and x̄2 are the sample means and s1 and s2 are the sample standard deviations of their
respective regions. Because the sample sizes n1 and n2 are never both greater than 40,
Z12 is assumed to be t-distributed. The next step in deciding whether or not the
difference in life expectancies between two different regions of the world is significant is to
calculate the degrees of freedom associated with each combination. This is done with Welch’s
approximation. It reads:
$$df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
This value is used in calculating the critical t-value at the 95% level (α = 0.05). This forms the
cut-off for our one-sided upper rejection region.
If our standardized test statistic Z12 is greater than the t-value, it is in the rejection region.
Thus, the null hypothesis is rejected and the difference between these regions is determined to
be significant. If Z12 is less than the t-value, then the null hypothesis is not rejected and there is no
significant difference in life expectancies between the two regions.
We performed this test for all ten possible combinations formed by testing each of the
five regions against each other. These combinations are Asia vs. Africa, Asia vs. Europe,
Asia vs. North America, Asia vs. South America, Africa vs. Europe, Africa vs. North
America, Africa vs. South America, Europe vs. North America, Europe vs. South America,
and North America vs. South America. Using the values for mean and standard deviation, we
calculate the degrees of freedom (df) and the t-values. Setting μ1 – μ2 = 3, according to the null
hypothesis, we find the following values for our test statistic Z12:
Regions                            Df from Welch’s   Df used   t-value   Z12
Asia vs. Africa                    68.25192          68        1.66757   3.53219
Asia vs. Europe                    43.11774          43        1.68107   6.19710
Asia vs. North America             40.59761          40        1.68385   4.42466
Asia vs. South America             32.40193          32        1.69388   2.96422
Africa vs. Europe                  89.03372          89        1.66215   12.79253
Africa vs. North America           49.84294          49        1.67722   9.81291
Africa vs. South America           36.87789          36        1.68829   8.31379
Europe vs. North America           28.76452          28        1.70113   0.92773
Europe vs. South America           20.25733          20        1.72471   0.88676
North America vs. South America    26.10736          26        1.70561   0.08949
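The test statistic and Welch's degrees of freedom can be retraced for one pair of regions. The sketch below uses the Asia & Pacific and Africa & Middle East rows of the regional summary table; the critical t-value 1.66757 is taken from the table above rather than recomputed, and the function name is hypothetical.

```python
import math

def welch_test(mean1, sd1, n1, mean2, sd2, n2, delta0=3.0):
    """Return (statistic, df) for H0: mu1 - mu2 = delta0 with unequal variances."""
    v1 = sd1 ** 2 / n1   # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2   # estimated variance of the second sample mean
    statistic = ((mean1 - mean2) - delta0) / math.sqrt(v1 + v2)
    # Welch's approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return statistic, df

# Asia & Pacific (population 1) against Africa & Middle East (population 2).
z12, df = welch_test(65.92963, 7.303783, 27, 56.12000, 10.13488, 59)
reject_null = z12 > 1.66757   # one-sided test at alpha = 0.05
print(round(z12, 4), round(df, 4), reject_null)
```

This gives a statistic of about 3.532 with about 68.25 degrees of freedom, matching the Asia vs. Africa row. The sketch subtracts the hypothesized difference from the signed difference of means, so the region with the larger sample mean should be passed first for the one-sided alternative to apply.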
It can be observed from the table that the only instances where the difference in life
expectancies is not significantly greater than three years are the comparisons between North America and
South America, North America and Europe, and Europe and South America. Otherwise,
there is a highly significant difference in life expectancies between regions. This is especially true for
Africa, where the life expectancy has plummeted drastically due to the widespread AIDS
epidemic. The Z12 statistic for any region paired with Africa always fell
dramatically into the rejection region. There was also a significant difference between Asia
and every other region. Asia is faced with the problem of diminishing resources and an ever-growing
population. According to this test, there is no significant difference in life expectancies
between North America, South America, and Europe. However, any region tested against
Africa or Asia revealed a highly significant difference in life expectancy.
III. CONCLUSION
A. Multiple Regression Discussion
In the final multiple regression model, the nine remaining economic indicators were:
GNP per Capita, Annual Population Growth, Fertility Rate, AIDS, Tuberculosis, Enrollment
Rate, Access to Safe Water, Forest and Woodlands, and Deforestation Rate. All of these
indicators were highly significant, with p < 0.020, and accounted for approximately 87.69 % of
the variance. The model is shown again below:
Summary of Fit
RSquare                       0.876878
RSquare Adj                   0.869019
Root Mean Square Error        3.898823
Mean of Response              65.21192
Observations (or Sum Wgts)    151

Parameter Estimates
Term           Estimate     Std Error   t Ratio   Prob>|t|
Intercept      62.924353    2.610737    24.10     <.0001
GNPperCap      0.0001095    0.000044    2.51      0.0131
Pop _Growth    1.4555274    0.365942    3.98      0.0001
Fertility      -3.623246    0.330223    -10.97    <.0001
AIDS           -0.066892    0.015406    -4.34     <.0001
TB             -0.016498    0.004053    -4.07     <.0001
Enrollment     0.1662502    0.025712    6.47      <.0001
Water          0.0524011    0.019257    2.72      0.0073
Forest         -0.035922    0.01511     -2.38     0.0188
Deforest       -0.557085    0.222936    -2.50     0.0136
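The adjusted R2 in the Summary of Fit can be recovered from R2, the number of observations, and the number of predictors using the standard adjustment formula. A minimal sketch, with a hypothetical function name:

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# RSquare, Observations and the nine predictors of the final model.
adj = adjusted_r_squared(0.876878, 151, 9)
print(round(adj, 6))
```

The result, about 0.869019, matches the reported RSquare Adj. Because the adjustment penalizes each additional predictor, adjusted R2 is the fairer figure when comparing models with different numbers of variables.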
Based on the model, life expectancy is positively associated with GNP per capita,
population growth, enrollment, and access to safe water, and negatively associated with
fertility, AIDS, tuberculosis, forest and woodland percentage, and rate of deforestation. The soundness
of these results was discussed in detail following the stepwise regression. To improve the
model and increase its explanatory capability (increase R2), we might consider additional
variables that were excluded, such as ethnicity and diet.
Because these variables were not scaled to the same units, their coefficients cannot be
compared to evaluate the relative influence of each factor on life expectancy. However, it was
surprising that variables such as fertility rate were more significant than AIDS and access to
clean water, which would be assumed to have greater consequence and a more immediate
impact on life expectancy.
Multicollinearity was substantial in our preliminary multiple regression since we chose
to use categorical factors. Refining the data did not decrease the R2 value significantly, and the
adjusted R2 is barely touched, since eliminating many of the variables with multicollinearities
helped improve the significance of the remaining variables.
Of the variables deemed “significant” from single regression analysis, only fertility and
enrollment remained in our final set of explanatory variables. Contraceptive prevalence and
literacy rate were removed because of high correlation with school enrollment.
Life expectancy was predicted very well for countries not included in our data set.
Using the parameters for Belgium, Chad and Paraguay, we tested our model and found that
the true values of life expectancy fell within 90% prediction bands. It is not surprising that
given all of the measures of economic and social development we employed, a successful
prediction model was achieved.
Finally, we investigated whether life expectancy was appreciably (by more than 3 years)
influenced by geography by comparing the life expectancies across five regions. From our
hypothesis tests, we can conclude at the 95% level that geography plays a significant role in
determining discrepancies of more than three years in life expectancy between most regions. This
is not surprising considering that the mean life expectancy in Africa is drastically lower than that of its
closest comparable, Asia. This underscores that life expectancy is meaningfully determined by
economic conditions, sanitation, and illness, and that to improve the life expectancy in developing
regions we must assist them in these critical issues.
APPENDIX A
Single Regressions of Chosen Social and Economic Variables against Life Expectancy
Bivariate Fit of Life_Exp By Cntry _Dev
Linear Fit: Life_Exp = 62.452066 + 13.891267 Cntry _Dev
Summary of Fit: RSquare 0.26648; RSquare Adj 0.261557; Root Mean Square Error 9.257363; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By GNPperCap
Linear Fit: Life_Exp = 61.233835 + 0.0006863 GNPperCap
Summary of Fit: RSquare 0.34523; RSquare Adj 0.340835; Root Mean Square Error 8.746329; Mean of Response 65.21192; Observations (or Sum Wgts) 151
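For a single-variable fit, R2 is the square of the correlation between the response and the predictor, so the appendix fits can be cross-checked against the Life_Exp row of the correlation tables. The sketch below does this for three of the fits listed in this appendix (GNPperCap, Fertility and TVs), taking the sign from the fitted slope; the function name is hypothetical.

```python
import math

def correlation_from_fit(r_squared, slope):
    """Correlation implied by a simple linear fit: sqrt(R^2) with the slope's sign."""
    return math.copysign(math.sqrt(r_squared), slope)

# (RSquare, slope) pairs taken from the bivariate fits in this appendix.
fits = {
    "GNPperCap": (0.34523, 0.0006863),
    "Fertility": (0.709041, -4.9303312),
    "TVs": (0.594882, 0.0456474),
}

implied = {name: round(correlation_from_fit(r2, slope), 3)
           for name, (r2, slope) in fits.items()}
print(implied)
```

The implied correlations of 0.588, -0.842 and 0.771 agree with the Life_Exp entries for these variables in the correlation table.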
Bivariate Fit of Life_Exp By GNP_Grow
Linear Fit: Life_Exp = 64.668998 + 0.6498195 GNP_Grow
Summary of Fit: RSquare 0.034885; RSquare Adj 0.028408; Root Mean Square Error 10.61869; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By GDP_PPP
Linear Fit: Life_Exp = 57.912464 + 0.0010321 GDP_PPP
Summary of Fit: RSquare 0.491018; RSquare Adj 0.487602; Root Mean Square Error 7.71139; Mean of Response 65.21192; Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By Inflation
Linear Fit: Life_Exp = 65.599734 - 0.0095838 Inflation
Summary of Fit: RSquare 0.013599; RSquare Adj 0.006979; Root Mean Square Error 10.73515; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Pop _Growth
Linear Fit: Life_Exp = 73.030995 - 3.892267 Pop _Growth
Summary of Fit: RSquare 0.207213; RSquare Adj 0.201892; Root Mean Square Error 9.624089; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Urban_Pop
Linear Fit: Life_Exp = 46.983506 + 0.337816 Urban_Pop
Summary of Fit: RSquare 0.54625; RSquare Adj 0.543205; Root Mean Square Error 7.280977; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Urban_Grow
Linear Fit: Life_Exp = 75.548225 - 2.8657656 Urban_Grow
Summary of Fit: RSquare 0.405664; RSquare Adj 0.401676; Root Mean Square Error 8.332919; Mean of Response 65.21192; Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By Health_Exp
Linear Fit: Life_Exp = 58.806771 + 1.6066072 Health_Exp
Summary of Fit: RSquare 0.14592; RSquare Adj 0.140188; Root Mean Square Error 9.989195; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Fertility
Linear Fit: Life_Exp = 83.136123 - 4.9303312 Fertility
Summary of Fit: RSquare 0.709041; RSquare Adj 0.707089; Root Mean Square Error 5.830381; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Physician
Linear Fit: Life_Exp = 58.374773 + 0.0525012 Physician
Summary of Fit: RSquare 0.413375; RSquare Adj 0.409438; Root Mean Square Error 8.278692; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Contracep
Linear Fit: Life_Exp = 48.252039 + 0.3680572 Contracep
Summary of Fit: RSquare 0.724073; RSquare Adj 0.722222; Root Mean Square Error 5.677773; Mean of Response 65.21192; Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By AIDS
30
40
50
60
70
80
Life
_Exp
0 100AIDS
Linear Fit
Linear Fit Life_Exp = 66.430624 - 0.1237245 AIDS Summary of Fit RSquare 0.062401RSquare Adj 0.056108Root Mean Square Error 10.46622Mean of Response 65.21192Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By TB
[Scatterplot with fitted line: Life_Exp (y-axis, 30 to 80) against TB (x-axis, 0 to 700)]
Linear Fit
Life_Exp = 69.204454 - 0.056188 TB
Summary of Fit
RSquare 0.204279
RSquare Adj 0.198939
Root Mean Square Error 9.641878
Mean of Response 65.21192
Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By Radios
[Scatterplot with fitted line: Life_Exp (y-axis, 30 to 80) against Radios (x-axis, 0 to 2000)]
Linear Fit
Life_Exp = 57.273914 + 0.0208234 Radios
Summary of Fit
RSquare 0.408865
RSquare Adj 0.404898
Root Mean Square Error 8.31045
Mean of Response 65.21192
Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By TVs
[Scatterplot with fitted line: Life_Exp (y-axis, 30 to 80) against TVs (x-axis, 0 to 800)]
Linear Fit
Life_Exp = 55.723317 + 0.0456474 TVs
Summary of Fit
RSquare 0.594882
RSquare Adj 0.592163
Root Mean Square Error 6.879744
Mean of Response 65.21192
Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By News
[Scatterplot with fitted line: Life_Exp (y-axis, 30 to 80) against News (x-axis, 0 to 600)]
Linear Fit
Life_Exp = 62.075193 + 0.0451547 News
Summary of Fit
RSquare 0.270452
RSquare Adj 0.265556
Root Mean Square Error 9.232265
Mean of Response 65.21192
Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By Phone
[Scatterplot with fitted line: Life_Exp (y-axis, 30 to 80) against Phone (x-axis, 0 to 700)]
Linear Fit
Life_Exp = 58.911549 + 0.0408865 Phone
Summary of Fit
RSquare 0.507384
RSquare Adj 0.504078
Root Mean Square Error 7.586395
Mean of Response 65.21192
Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By Elec_ Con
[Scatterplot with fitted line: Life_Exp (y-axis, 30 to 80) against Elec_ Con (x-axis, 0 to 25000)]
Linear Fit
Life_Exp = 60.93497 + 0.0014917 Elec_ Con
Summary of Fit
RSquare 0.337822
RSquare Adj 0.333378
Root Mean Square Error 8.795666
Mean of Response 65.21192
Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By Comm_ Energy
[Scatterplot with fitted line: Life_Exp (y-axis, 30 to 80) against Comm_ Energy (x-axis, -1000 to 11000)]
Linear Fit
Life_Exp = 60.376935 + 0.0027738 Comm_ Energy
Summary of Fit
RSquare 0.351126
RSquare Adj 0.346772
Root Mean Square Error 8.706856
Mean of Response 65.21192
Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By Literacy
[Scatterplot with fitted line: Life_Exp (y-axis, 30 to 80) against Literacy (x-axis, 10 to 100)]
Linear Fit
Life_Exp = 35.894063 + 0.3776334 Literacy
Summary of Fit
RSquare 0.632879
RSquare Adj 0.630415
Root Mean Square Error 6.549164
Mean of Response 65.21192
Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By Enrollment
[Scatterplot with fitted line: Life_Exp (y-axis, 30 to 80) against Enrollment (x-axis, 10 to 110)]
Linear Fit
Life_Exp = 37.677401 + 0.4377922 Enrollment
Summary of Fit
RSquare 0.602228
RSquare Adj 0.599559
Root Mean Square Error 6.817079
Mean of Response 65.21192
Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By Water
[Scatterplot with fitted line: Life_Exp (y-axis, 30 to 80) against Water (x-axis, 20 to 110)]
Linear Fit
Life_Exp = 41.68171 + 0.3148881 Water
Summary of Fit
RSquare 0.396135
RSquare Adj 0.392083
Root Mean Square Error 8.399454
Mean of Response 65.21192
Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By Sanitation
[Scatterplot with fitted line: Life_Exp (y-axis, 30 to 80) against Sanitation (x-axis, 0 to 110)]
Linear Fit
Life_Exp = 47.661137 + 0.2674938 Sanitation
Summary of Fit
RSquare 0.462043
RSquare Adj 0.458432
Root Mean Square Error 7.927845
Mean of Response 65.21192
Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By Forest
[Scatterplot with fitted line: Life_Exp (y-axis, 30 to 80) against Forest (x-axis, -10 to 100)]
Linear Fit
Life_Exp = 65.593862 - 0.0146547 Forest
Summary of Fit
RSquare 0.000894
RSquare Adj -0.00581
Root Mean Square Error 10.80406
Mean of Response 65.21192
Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By Deforest
[Scatterplot with fitted line: Life_Exp (y-axis, 30 to 80) against Deforest (x-axis, -8 to 4)]
Linear Fit
Life_Exp = 65.793193 + 1.0916929 Deforest
Summary of Fit
RSquare 0.024145
RSquare Adj 0.017595
Root Mean Square Error 10.67761
Mean of Response 65.21192
Observations (or Sum Wgts) 151
Bivariate Fit of Life_Exp By CO2
[Scatterplot with fitted line: Life_Exp (y-axis, 30 to 80) against CO2 (x-axis, 0 to 30)]
Linear Fit
Life_Exp = 60.587316 + 1.0235894 CO2
Summary of Fit
RSquare 0.305987
RSquare Adj 0.301329
Root Mean Square Error 9.004616
Mean of Response 65.21192
Observations (or Sum Wgts) 151
APPENDIX B
ASIA & PACIFIC ISLANDS
1. Bangladesh 2. Cambodia 3. China 4. Fiji 5. Hong Kong 6. India 7. Indonesia 8. Japan 9. Kazakhstan 10. Korea, Dem 11. Lao, PDR 12. Malaysia 13. Maldives 14. Mongolia 15. Myanmar 16. Nepal 17. Pakistan 18. Papua New Guinea 19. Philippines 20. Russian Federation 21. Samoa (Western) 22. Singapore 23. Sri Lanka 24. Thailand 25. Turkmenistan 26. Uzbekistan 27. Vietnam
AFRICA & MIDDLE EAST
1. Algeria 2. Angola 3. Bahrain 4. Benin 5. Botswana 6. Burkina Faso 7. Burundi 8. Cameroon 9. Central African Republic 10. Chad 11. Comoros 12. Congo 13. Cote d’Ivoire
14. Dem. Rep. of Congo 15. Egypt 16. Eritrea 17. Ethiopia 18. Fiji 19. Gabon 20. Gambia 21. Ghana 22. Guinea 23. Guinea-Bissau 24. Iran 25. Iraq 26. Israel 27. Jordan 28. Kenya 29. Kuwait 30. Lebanon 31. Lesotho 32. Libya 33. Madagascar 34. Mali 35. Mali 36. Mauritania 37. Mauritius 38. Morocco 39. Mozambique 40. Namibia 41. Niger 42. Nigeria 43. Oman 44. Saudi Arabia 45. Senegal 46. Sierra Leone 47. South Africa 48. Sudan 49. Swaziland 50. Syria 51. Tanzania 52. Togo 53. Tunisia 54. Uganda 55. United Arab Emirates 56. Yemen
57. Zambia 58. Zimbabwe
EUROPE
1. Albania 2. Armenia 3. Austria 4. Belgium 5. Croatia 6. Czech Republic 7. Denmark 8. Estonia 9. Finland 10. France 11. Georgia 12. Germany 13. Greece 14. Hungary 15. Iceland 16. Ireland 17. Italy 18. Latvia 19. Lithuania 20. Luxembourg 21. Macedonia 22. Malta 23. Netherlands 24. Norway 25. Poland 26. Portugal 27. Romania 28. Slovakia 29. Slovenia 30. Spain 31. Sweden 32. Switzerland 33. Turkey 34. Ukraine 35. United Kingdom
NORTH AMERICA, CENTRAL AMERICA, & THE CARIBBEAN
1. Bahamas 2. Barbados 3. Belize 4. Canada 5. Costa Rica 6. Cuba 7. Dominica 8. Dominican Republic 9. El Salvador 10. Guatemala 11. Haiti 12. Honduras 13. Jamaica 14. Mexico 15. Nicaragua 16. Panama 17. Trinidad and Tobago
SOUTH AMERICA
1. Argentina 2. Bolivia 3. Brazil 4. Chile 5. Colombia 6. Ecuador 7. Guyana 8. Paraguay 9. Peru 10. Suriname 11. Uruguay 12. Venezuela