
A STATISTICAL ANALYSIS OF LIFE EXPECTANCY ACROSS COUNTRIES USING MULTIPLE REGRESSION

Sys 302 Project
Professor Tony Smith

December 18, 2000

Miranda Chen Michael Ching


SYSTEMS 302 FINAL PROJECT: LIFE EXPECTANCY MIRANDA CHEN & MICHAEL CHING


TABLE OF CONTENTS

I. INTRODUCTION
   A. Explanation of Chosen Economic and Social Variables
   B. Assumptions on the Regression Model
   C. Summary of Findings

II. ANALYSIS
   A. Single Regression Models of Life Expectancy Against Economic and Social Variables
   B. Initial Multiple Regression
   C. Test for Multicollinearity
   D. Choosing Significant Variables Using Mixed Stepwise Regression
   E. Test of the Gauss-Markov Assumptions
   F. Predictions Using the Regression Models
   G. Does Geography Play a Significant Role in Life Expectancy?

III. CONCLUSION
   A. Multiple Regression Discussion

IV. SUPPLEMENTS
   A. Appendix A: Single Regression Models
   B. Appendix B: Country Listing


I. INTRODUCTION

In the early sixteenth century, a Spanish explorer by the name of Ponce de Leon made it his life’s mission to find the mystical “fountain of youth,” a famous spring whose waters had the marvelous virtue of restoring youth and vigor to those who drank them. De Leon embarked on his quest for the legendary spring but instead landed in Florida on March 27, 1513 (Easter Sunday), where, to his dismay, he found no Fons Juventutis.

The quest to prolong our youth continues today, not through lengthy expeditions but through improvements in health, nutrition, and medicine. A healthy diet, regular exercise, and vaccinations can greatly improve an individual’s life expectancy, while disease outbreaks, malnutrition, and social unrest can drastically lower it.

But how are life expectancies affected at the national level? While these factors are central to living longer, they cannot be the only ones. The social and economic conditions of each country will undoubtedly affect its citizens, their lifestyles, and their decisions. Citizens of wealthier countries have access to modern medicine and medical facilities, the leisure to exercise, and meticulous regulation of sanitation and drinking water. Their life expectancies, therefore, should naturally be higher than those of citizens of less developed countries. However, this is not always the case. According to the World Health Organization (WHO), the United States of America ranked 24th overall in life expectancy among all countries in the year 2000. Japan, Australia, France, Sweden, Spain, Italy, Greece, and Switzerland all ranked above the United States.


It behooves us, then, to ask: which social and economic factors contribute most significantly to life expectancy at birth? Do factors that establish a higher standard of living necessarily lead to a longer lifespan? Can a model consisting of these significant factors be constructed to forecast life expectancy accurately? And lastly, does the geographic region in which we live have a significant influence on our life expectancy?

A. Definitions and Explanation of Chosen Economic and Social Variables

Life expectancy is a quantitative reflection of the quality of life in a country, since it indicates how long individuals can hope to live. Life expectancy at birth is the average number of years a newborn would live if the death rates observed at each age in a particular year remained constant throughout his or her life. It goes beyond per capita GNP, literacy, or educational attainment as a measure of a population's physical well-being.

There are two basic types of life expectancy tables: one gives the average number of years of life remaining for an individual of a given age, and the other the average number of years of life for all persons born in a given year. For our study, we chose average life expectancy at birth, because individuals who have survived childhood are more likely to have an extended life span than the average member of their birth cohort; measuring from a later age would therefore introduce a selection bias. We selected economic and social variables that span a wide range of social and economic conditions across more than 140 countries, in the hope that these variables would cover as many facets as possible and thus yield an accurate model of life expectancy.
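For reference, demographers usually compute period life expectancy from a life table of age-specific death probabilities rather than from raw ages at death. The sketch below is a minimal illustration under simplifying assumptions of our own (single-year age intervals, deaths spread evenly within each year, and a table that closes with a death probability of 1). The function name and the toy probabilities are hypothetical. It computes both kinds of figure described above: the years remaining at a given age and the expectancy at birth.

```python
def life_expectancy(qx, age=0):
    """Remaining life expectancy e_x at exact `age` from a period life table.

    qx[a] is the probability that a person alive at exact age a dies before
    age a + 1. The table must close with qx[-1] == 1.0. Deaths are assumed to
    occur, on average, halfway through each year of age.
    """
    if qx[-1] != 1.0:
        raise ValueError("the last age interval must have qx == 1.0")
    survivors = 1.0          # l_x: share of the birth cohort alive at exact age x
    l_x, big_l_x = [], []    # survivors, and person-years lived in each year of age
    for q in qx:
        deaths = survivors * q
        l_x.append(survivors)
        big_l_x.append(survivors - deaths / 2)   # mid-year death assumption
        survivors -= deaths
    # e_x = T_x / l_x, where T_x is the total person-years lived from age x onward
    return sum(big_l_x[age:]) / l_x[age]


# Toy table: half of all newborns die in their first year, nobody dies at age 1,
# and everyone remaining dies during age 2.
toy_qx = [0.5, 0.0, 1.0]
print(life_expectancy(toy_qx))          # 1.5 (life expectancy at birth)
print(1 + life_expectancy(toy_qx, 1))   # 2.5 (expected age at death for a 1-year-old)
```

The second result illustrates the selection effect mentioned above: in this toy table a child who survives its first year can expect to reach age 2.5, whereas a newborn can expect only 1.5 years.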


To allow comparisons across countries and over time, all the statistical tables are based on internationally standardized data collected and processed by international agencies such as the UN Development Programme (UNDP), UNICEF, and the OECD. These organizations, whether collecting data from national sources or through their own surveys, harmonized definitions and collection methods to make the data internationally comparable and the countries unbiased “sampling units.”

The variables listed below were chosen to measure as many distinct components of the quality of life as possible, but the list is by no means exhaustive. While we would have liked to include such factors as diet and ethnicity, inadequate data barred us from doing so. A major problem we faced in collecting global data was that developing and underdeveloped regions lacked comprehensive reporting on many factors. Because missing data points would reduce the number of observations in our regression, we were forced to compromise: either eliminate categories lacking data for many countries or remove a country with insufficient records altogether. As a result, several countries we would have liked to include were excluded, as were several factors that would seem to influence life expectancy. Through tedious cross-referencing between sources, we were able to build an extensive compilation with data for 146 countries.
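The trade-off described above, between dropping countries and dropping variables, can be illustrated with a minimal Python sketch of listwise (complete-case) deletion. The country labels, variable names, and values below are hypothetical placeholders, not data from this study; missing values are represented as None.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def complete_cases(data: Dict[str, Row], variables: List[str]) -> Dict[str, Row]:
    """Keep only the countries that report a value for every chosen variable."""
    return {
        country: row
        for country, row in data.items()
        if all(row.get(v) is not None for v in variables)
    }


# Hypothetical records: country "B" lacks income data, country "A" lacks diet data.
data = {
    "A": {"life_exp": 70.0, "gnp": 5000.0, "diet": None},
    "B": {"life_exp": 55.0, "gnp": None, "diet": 2100.0},
    "C": {"life_exp": 78.0, "gnp": 20000.0, "diet": 3200.0},
}

# Keeping the sparse "diet" variable leaves only one usable country ...
print(sorted(complete_cases(data, ["life_exp", "gnp", "diet"])))  # ['C']
# ... while dropping that variable recovers a second country.
print(sorted(complete_cases(data, ["life_exp", "gnp"])))          # ['A', 'C']
```

Every extra variable with patchy coverage shrinks the usable sample, which is why sparsely reported factors such as diet had to be left out.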

We regressed Life Expectancy at Birth on the following economic and social variables:

• Economy: The wealthier a country is, the more money its citizens have to spend on health care and, correspondingly, the more likely they are to have time for leisure activities and exercise. We chose several measures of national income, namely GNP per capita, real GDP per capita in purchasing power parity (PPP) dollars, and the growth rate of GNP per capita, to examine the significance of the absolute level, the comparable purchasing power, and the rate of increase of personal wealth. Inflation was tested because high rates of inflation signify economic instability and may have distressing effects on health. A dummy variable, "Country Development," was included in addition to these variables after we plotted the distribution of life expectancy and observed what appeared to be two distinct distributions. We hypothesized that, in addition to GNP per capita, a country's state of development would likely be an important indicator of life expectancy. Ones were assigned to all "developed" countries, as classified by the UN Development Programme, and zeros to all less developed countries.

[Figure: Normal Distribution of Life Expectancy]

   1. GNP per capita ($US) 1995
   2. GNP per capita annual growth rate (%) 1980-1995
   3. Real GDP per Capita ($ Purchasing Power Parity) 1995
   4. Average Annual Rate of Inflation (%) 1995
   5. Country Development (1 = developed, 0 = underdeveloped)

• Population Characteristics: Demographic variables were tested to measure the urban share of the population and the growth of both the urban and the overall population. Urban composition estimates the percentage of individuals living in cities, which are centers of medicine and modern advances but also of pollution and overcrowding. Population growth was considered relevant because a growing population can lead to resource shortages and declining prosperity, as a nation's wealth must be spread among more individuals.

   6. Urban population (% of total) 1995
   7. Urban population annual growth rate (%) 1970-1995
   8. Annual population growth rate (%) 1970-1995

• Health: The health variables measure the availability of, and spending on, health care through public health expenditure as a percentage of GDP and the number of physicians per 100,000 people. Contraceptive prevalence indicates how aware individuals are of contraception and how willing they are to use it.

   9. Public Expenditure on Health (% of GDP) 1990
   10. Physicians (per 100,000) 1993
   11. Contraceptive Prevalence (%) 1990-1995
   12. Fertility Rate (births per woman) 1995

• Disease: The epidemic effects of diseases such as AIDS on life expectancy are evident. Thus the rates of AIDS (1996) and tuberculosis (1995) cases per 100,000 people were chosen as possible indicators of life expectancy.

   13. AIDS (per 100,000) 1996
   14. Tuberculosis (per 100,000) 1995

• Access to Information/Technology: Access to information keeps people aware of their surroundings, from weather updates and medical discoveries to outbreaks of disease and violence. Several modes of communication were considered: radios, televisions, newspapers, and telephones. Per capita electricity consumption and commercial energy use assessed the prevalence of conveniences such as lighting.

   15. Radios (per 1000) 1995
   16. Televisions (per 1000) 1995
   17. Newspapers (per 1000) 1995
   18. Telephone Lines (per 1000) 1995
   19. Electricity Consumption per Capita (kWh) 1995
   20. Commercial Energy Use per Capita (kg) 1994

• Education: The more knowledge individuals have, the better they can make informed life decisions and improve their quality of life. Adult literacy rate and school enrollment were recorded as indicators of educational attainment.

   21. Adult Literacy Rate (%) 1996
   22. School Enrollment Rate (%) 1995 (combined first-, second-, and third-level)

• Environment: Environmental soundness, measured by access to clean drinking water and proper sanitation, reflects the salutary conditions of a country. The extent of forest and woodland and the rate of deforestation capture the amount of greenery and a country's dedication to preserving it, while CO2 emissions reflect air quality control.

   23. Access to Safe Water (% of population) 1990-1996
   24. Access to Sanitation (% of population) 1990-1996
   25. Forest & Woodland (% of land area) 1995
   26. Annual Rate of Deforestation (%) 1990-1995
   27. CO2 Emissions per Capita (metric tons) 1995
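To make the role of the Country Development dummy (variable 5 above) concrete: assuming it enters additively alongside the other regressors, as in an ordinary least-squares model, it shifts only the intercept. In the sketch below, D_i is the dummy for country i, the X_ij stand for the other regressors, and the β's are generic coefficients (our notation, not the report's).

```latex
\[
\text{Life\_Exp}_i = \beta_0 + \beta_D D_i + \sum_{j} \beta_j X_{ij} + \varepsilon_i,
\qquad
D_i =
\begin{cases}
1 & \text{if country } i \text{ is classified as developed},\\
0 & \text{otherwise}.
\end{cases}
\]
```

Holding the other regressors fixed, the model therefore predicts that a developed country's life expectancy differs from a less developed country's by exactly β_D: the intercept is β_0 + β_D for developed countries and β_0 for all others.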

B. Assumptions on the Regression Model

1. It is assumed that our chosen economic and social variables exert an observable and significant influence on the life expectancy at birth of all nations. The relationship between life expectancy and these variables is assumed to be linear and subject to random error.

2. As much as we would have liked to collect data for all nations of the world, such data was not readily available. Although finding most country data was simple, at times it was downright tedious, and several countries do not release statistics on the factors that affect their economic and social conditions. When performing a multiple regression analysis, any country with missing data for a variable is left out of the data set. Thus, data could not be gathered for every country, and countries with incomplete records were excluded from the regression. Although including every country would have produced the most complete regression model, we assume that our sample of 146 countries is a good reflection of the overall world population, and that variables significant in our model also apply to all nations of the world.

3. Because of these same data limitations, the values we collected do not all refer to the same year. Although most of the data was recorded in 1995, this was not always the case, since it is far too expensive to collect data in every nation every year. Therefore, several of our variables were collected in different years, and several over a span of years. However, data was never collected more than two years before or after 1995. We assume that no extreme fluctuations in social and economic conditions occurred during these years, and thus that this data set is appropriate.

4. The Gauss-Markov assumptions will be analyzed to determine whether the Gauss-Markov model is appropriate. To begin the regression, the Gauss-Markov model was assumed to be applicable.
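For reference, the linear model assumed in item 1, together with the standard Gauss-Markov conditions on its error terms that item 4 refers to, can be written in generic textbook notation (p = 27 explanatory variables in the initial model, n countries):

```latex
\[
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i,
\qquad i = 1, \dots, n
\]
\[
\mathbb{E}[\varepsilon_i] = 0, \qquad
\operatorname{Var}(\varepsilon_i) = \sigma^2, \qquad
\operatorname{Cov}(\varepsilon_i, \varepsilon_k) = 0 \quad (i \neq k)
\]
```

Under these conditions, and provided no explanatory variable is an exact linear combination of the others, the ordinary least-squares estimates are the best linear unbiased estimators of the β's (the Gauss-Markov theorem).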

C. Summary of Findings

We began our analysis with 27 variables reflecting the diverse components of life in a country. Our preliminary regression with all 27 variables had an adjusted R² of 0.873. However, because of a high degree of multicollinearity, the significance of the individual variables was undermined. The model was therefore refined in two stages. First, we remedied the multicollinearity problem by eliminating variables whose correlation with another explanatory variable exceeded 0.8 in absolute value. This process removed variables such as GDP per capita (PPP), telephone lines, urban growth, literacy, contraceptive prevalence rate, commercial energy use, radios, and country development (the dummy variable). Removing these highly multicollinear variables increased the significance of factors such as GNP per capita, fertility, enrollment, and population growth, whose effects had been muted by multicollinearity. Second, with the remaining variables, we ran a mixed stepwise regression to determine which variables were the most significant. From this we derived a final model, which explained 87.69% of the variance in life expectancy at birth:

$$
\begin{aligned}
\text{Life Expectancy at Birth} = {} & 0.0001095\,(\text{GNP per Capita}) + 1.4555274\,(\text{Annual Population Growth}) \\
& - 3.623246\,(\text{Fertility Rate}) - 0.066892\,(\text{AIDS}) - 0.016498\,(\text{Tuberculosis}) \\
& + 0.1662502\,(\text{School Enrollment Rate}) + 0.0524011\,(\text{Access to Safe Water}) \\
& - 0.035922\,(\text{Forest and Woodlands}) - 0.557085\,(\text{Annual Rate of Deforestation})
\end{aligned}
$$

Interestingly, GNP per capita, forest and woodland percentage, deforestation, and access to clean water did not show significance in the initial multiple regression or in the single regressions. Only after the multicollinear variables were removed was their significance revealed.
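The first refinement stage, which screens for pairs of explanatory variables whose correlation exceeds 0.8 in absolute value, can be sketched in Python as follows. The five-variable excerpt is copied from the initial correlation matrix in the Test for Multicollinearity section (Part II.C). The function name is our own, and deciding which member of a flagged pair to drop remains a judgment call that the code does not make.

```python
from itertools import combinations


def flag_collinear_pairs(names, corr, threshold=0.8):
    """Return (name_i, name_j, r) for each pair of variables with |r| > threshold."""
    flagged = []
    for i, j in combinations(range(len(names)), 2):
        if abs(corr[i][j]) > threshold:
            flagged.append((names[i], names[j], corr[i][j]))
    return flagged


# Excerpt of the initial correlation matrix (Part II.C).
names = ["GNPperCap", "GNP_PPP", "Phone", "Fertility", "Contracep"]
corr = [
    [1.000, 0.944, 0.897, -0.520, 0.525],
    [0.944, 1.000, 0.911, -0.602, 0.614],
    [0.897, 0.911, 1.000, -0.683, 0.676],
    [-0.520, -0.602, -0.683, 1.000, -0.881],
    [0.525, 0.614, 0.676, -0.881, 1.000],
]

for a, b, r in flag_collinear_pairs(names, corr):
    print(f"{a} ~ {b}: r = {r:+.3f}")
```

On this excerpt the screen flags four pairs: GNP per capita with GDP per capita (PPP), each of those with telephone lines, and fertility with contraceptive prevalence. That is consistent with GDP (PPP), telephone lines, and contraceptive prevalence appearing among the removed variables.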

We also examined whether life expectancy is appreciably influenced by geography by comparing life expectancies across five continents. From our hypothesis tests, we conclude at the 95% confidence level that geography plays a significant role in determining life expectancy for most pairs of regions. The only exceptions were the comparisons between North America and South America, North America and Europe, and South America and Europe.
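The report does not state which test was used for the continental comparisons. Assuming pairwise two-sample t tests on the countries' life expectancies, one such comparison can be sketched in Python as below. We use Welch's unequal-variance version, which is our choice rather than something the report specifies, and the sample values in the usage line are made up.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)   # squared standard error of mean(a)
    se2_b = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical life expectancies (years) for two small groups of countries.
t, df = welch_t([71.0, 74.5, 69.8, 76.2, 72.3], [52.1, 47.9, 58.4, 50.6, 55.0])
print(f"t = {t:.2f}, df = {df:.1f}")
```

At the 5% level, a difference is significant when |t| exceeds the two-sided critical value of the t distribution with df degrees of freedom (about 2.0 once df reaches roughly 60; a table or statistics package gives the exact value for smaller samples).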


II. ANALYSIS

A. Single Regression Models of Life Expectancy Against Economic and Social Variables

To begin our analysis, single regression models of life expectancy at birth were run against each of our chosen economic and social indicators to show graphically how well each variable explains the variation in life expectancy. These regression plots can be found in Appendix A. These regressions, as well as many of the tests throughout our analysis, were performed with the statistical analysis software JMP IN.

The only variables with a strong linear relationship (R² > 0.60) with life expectancy at birth were fertility rate, contraceptive prevalence, literacy rate, and enrollment rate. Although these four variables are strong predictors in single regressions, it will be interesting to observe whether they carry considerable weight in a multiple regression.
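As a minimal Python sketch of what each single regression computes (the function name is ours, not JMP IN's), the least-squares line and its R² can be obtained as below. For a single predictor, R² equals the square of the Pearson correlation r, so the R² > 0.60 screen can be checked directly against the correlations with Life_Exp reported in the correlation matrix of the Test for Multicollinearity section.

```python
def simple_regression(x, y):
    """Fit y = a + b*x by least squares; return (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - my) ** 2 for yi in y)                        # total sum of squares
    return a, b, 1 - sse / sst


print(simple_regression([1, 2, 3], [1, 3, 2]))  # (1.0, 0.5, 0.25)

# Screening with R^2 = r^2, using correlations with Life_Exp from the matrix.
r_with_life_exp = {"Fertility": -0.842, "Contracep": 0.851, "Literacy": 0.796,
                   "Enrollment": 0.776, "TVs": 0.771, "Urban_Pop": 0.739}
strong = sorted(v for v, r in r_with_life_exp.items() if r ** 2 > 0.60)
print(strong)  # ['Contracep', 'Enrollment', 'Fertility', 'Literacy']
```

Televisions just miss the cut (0.771² ≈ 0.594), which matches the four variables named above.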

B. Initial Multiple Regression

Whereas single regressions consider the effect of one variable at a time, multiple regressions consider the effects of many variables simultaneously. A standard least squares multiple regression of life expectancy on our chosen social and economic variables was performed. The results of our initial multiple regression are as follows:


[Figure: Whole Model Test, Actual by Predicted Plot (Life_Exp Actual vs. Life_Exp Predicted; P < .0001, RSq = 0.90, RMSE = 3.8383)]

Summary of Fit
   RSquare                       0.895906
   RSquare Adj                   0.873055
   Root Mean Square Error        3.838272
   Mean of Response              65.21192
   Observations (or Sum Wgts)    151

Parameter Estimates
   Term          Estimate    Std Error   t Ratio   Prob>|t|
   Intercept     56.717121   3.676142    15.43     <.0001
   Cntry_Dev     -1.937297   1.59645     -1.21     0.2273
   GNPperCap     -0.000206   0.000153    -1.35     0.1803
   GNP_Grow      -0.008635   0.14209     -0.06     0.9516
   GNP_PPP       0.0005469   0.00024     2.28      0.0244
   Inflation     -0.002367   0.003012    -0.79     0.4335
   Pop_Growth    1.9108988   0.623566    3.06      0.0027
   Urban_Pop     0.027884    0.025919    1.08      0.2841
   Urban_Grow    -0.269322   0.308148    -0.87     0.3838
   Health_Exp    -0.015305   0.188232    -0.08     0.9353
   Fertility     -2.695058   0.522805    -5.15     <.0001
   Physician     0.0028381   0.004525    0.63      0.5316
   Contracep     0.0245563   0.035684    0.69      0.4926
   AIDS          -0.062174   0.018091    -3.44     0.0008
   TB            -0.01239    0.004392    -2.82     0.0056
   Radios        -0.000963   0.002038    -0.47     0.6376
   TVs           0.0046905   0.005499    0.85      0.3953
   News          0.0014127   0.006631    0.21      0.8316
   Phone         0.0018864   0.006537    0.29      0.7734
   Elec_Con      0.0001031   0.00019     0.54      0.5883
   Comm_Energy   -0.000456   0.000479    -0.95     0.3423
   Literacy      0.0444016   0.033779    1.31      0.1911
   Enrollment    0.1041778   0.036738    2.84      0.0053
   Water         0.016521    0.024449    0.68      0.5005
   Sanitation    0.0190143   0.020364    0.93      0.3523
   Forest        -0.030681   0.01685     -1.82     0.0711
   Deforest      -0.494224   0.250565    -1.97     0.0508
   CO2           -0.144861   0.140921    -1.03     0.3060

This model shows a strong linear fit, with an R² of 0.8959, meaning that 89.6% of the variance in life expectancy is accounted for by the model. This suggests that our data set is adequate for building a regression model of life expectancy at birth across nations.
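The adjusted R² in the Summary of Fit penalizes R² for the number of predictors p given n observations: R²_adj = 1 - (1 - R²)(n - 1)/(n - p - 1). A quick Python check with the reported values (n = 151 observations, p = 27 predictors) reproduces the table, and likewise each t Ratio in the Parameter Estimates table is the estimate divided by its standard error. The helper names are ours.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def t_ratio(estimate, std_error):
    """t Ratio as printed in regression output: estimate / standard error."""
    return estimate / std_error


# Values taken from the Summary of Fit and Parameter Estimates tables.
print(round(adjusted_r2(0.895906, n=151, p=27), 6))  # 0.873056 (table: 0.873055)
print(round(t_ratio(-2.695058, 0.522805), 2))        # -5.15 (Fertility)
```

The last digit of the adjusted R² differs only because the table prints R² itself rounded to six decimals.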

Although the goodness of fit is high, few variables exert great significance with respect to life expectancy at birth. Real GDP per capita (PPP), population growth, fertility rate, AIDS, tuberculosis, and school enrollment show relatively low p-values (< 0.025), but these are only 6 of our 27 variables. The results also contain some apparent discrepancies. The insignificance of health expenditure is surprising, as its p-value (0.9353) is near one. It seems counterintuitive that health care expenditure does not influence life expectancy, and that its estimated coefficient is actually negative. Also surprising is that country development, our dummy variable, is insignificant, since a more developed country would be expected to provide a better standard of living than an underdeveloped country. To ensure that the goodness of fit of the model is not due entirely to the number of factors we chose to use, and to resolve these discrepancies, this initial multiple regression model must be refined.
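Behind the initial multiple regression is an ordinary least-squares fit, which solves the normal equations (XᵀX)b = Xᵀy. The dependency-free Python sketch below is our own illustration with toy data (statistical packages such as JMP typically rely on more numerically stable factorizations). It shows the computation and also why severe multicollinearity is a problem: when one predictor is an exact linear combination of others, XᵀX is singular and no unique solution exists.

```python
def ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y ~ 1 + X."""
    rows = [[1.0] + list(r) for r in X]          # prepend the intercept column
    k = len(rows[0])
    # Normal equations: (A^T A) b = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    m = [ata[i] + [aty[i]] for i in range(k)]    # augmented matrix
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


# Toy data generated exactly from y = 1 + 2*x1 + 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
print([round(b, 6) for b in ols(X, y)])  # [1.0, 2.0, 3.0]

# Perfectly collinear predictors (x2 = 2 * x1) make X'X singular.
try:
    ols([[1, 2], [2, 4], [3, 6]], [1, 2, 3])
except ValueError as err:
    print(err)
```

Near-collinear predictors do not trigger the error, but they make XᵀX nearly singular, which is what inflates the standard errors discussed in the next section.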

C. Test for Multicollinearity

The absence of multicollinearity is an important assumption of the multiple regression model. Ideally, each explanatory (X) variable carries distinct information; with multicollinearity, however, these variables are strongly correlated with one another. For example, if X1 and X2 are highly correlated, then adding X1 to the model also adds a bit of X2, so the estimated significance of both X1 and X2 is diluted. This phenomenon inflates the standard errors of the coefficient estimates.
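The inflation of standard errors can be made precise with a standard textbook result. Writing R_j² for the R² obtained by regressing explanatory variable x_j on all the other explanatory variables, the sampling variance of its coefficient estimate is

```latex
\[
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}
    \cdot \underbrace{\frac{1}{1 - R_j^2}}_{\text{VIF}_j}
\]
```

so a variable that is nearly explained by the others (R_j² close to 1) has a large variance inflation factor (VIF). As an illustration using only GNP per capita and GDP per capita (PPP), whose correlation in the matrix below is 0.944, a two-predictor model would give VIF = 1/(1 - 0.944²) ≈ 9.2, i.e., a standard error roughly three times (√9.2 ≈ 3.0) larger than it would be with uncorrelated predictors.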

Therefore, to refine the model, we computed a correlation matrix of life expectancy and our indicators, which identifies the variables that are highly correlated. Our initial correlation matrix is shown below:

Columns (same order as rows): Life_Exp, Cntry_Dev, GNPperCap, GNP_Grow, GNP_PPP, Inflation, Pop_Growth, Urban_Pop, Urban_Grow, Health_Exp, Fertility, Physician, Contracep, AIDS, TB, Radios, TVs, News, Phone, Elec_Con, Comm_Energy, Literacy, Enrollment, Water, Sanitation, Forest, Deforest, CO2

Life_Exp 1.000 0.516 0.588 0.187 0.701 -0.117 -0.455 0.739 -0.637 0.382 -0.842 0.643 0.851 -0.250 -0.452 0.639 0.771 0.520 0.712 0.581 0.593 0.796 0.776 0.629 0.680 -0.030 0.155 0.553

Cntry_Dev 0.516 1.000 0.760 0.148 0.745 -0.135 -0.543 0.430 -0.517 0.682 -0.523 0.554 0.535 -0.151 -0.297 0.642 0.771 0.806 0.817 0.680 0.549 0.449 0.506 0.392 0.421 0.021 0.301 0.423

GNPperCap 0.588 0.760 1.000 0.153 0.944 -0.150 -0.369 0.597 -0.437 0.583 -0.520 0.458 0.525 -0.099 -0.305 0.704 0.795 0.788 0.897 0.799 0.741 0.459 0.502 0.460 0.489 -0.045 0.229 0.615

GNP_Grow 0.187 0.148 0.153 1.000 0.186 0.271 -0.219 0.016 -0.080 0.103 -0.189 0.038 0.303 -0.090 -0.199 0.113 0.164 0.101 0.199 0.078 0.020 0.173 0.150 0.141 0.120 0.011 0.033 0.007

GNP_PPP 0.701 0.745 0.944 0.186 1.000 -0.174 -0.356 0.693 -0.463 0.570 -0.602 0.483 0.614 -0.086 -0.369 0.756 0.841 0.702 0.911 0.823 0.826 0.546 0.592 0.573 0.599 -0.093 0.238 0.722

Inflation -0.117 -0.135 -0.150 0.271 -0.174 1.000 0.012 -0.084 -0.014 -0.021 0.084 0.122 -0.072 -0.074 0.010 -0.128 -0.075 -0.042 -0.126 -0.083 -0.053 -0.002 -0.027 -0.202 -0.153 -0.016 0.085 -0.033

Pop_Growth -0.455 -0.543 -0.369 -0.219 -0.356 0.012 1.000 -0.283 0.814 -0.438 0.698 -0.539 -0.577 0.122 0.192 -0.532 -0.572 -0.557 -0.542 -0.330 -0.155 -0.537 -0.422 -0.254 -0.331 -0.202 -0.242 -0.085

Urban_Pop 0.739 0.430 0.597 0.016 0.693 -0.084 -0.283 1.000 -0.430 0.383 -0.620 0.604 0.605 -0.137 -0.351 0.648 0.725 0.477 0.673 0.602 0.689 0.638 0.667 0.585 0.596 -0.135 0.183 0.653

Urban_Grow -0.637 -0.517 -0.437 -0.080 -0.463 -0.014 0.814 -0.430 1.000 -0.352 0.775 -0.593 -0.691 0.223 0.333 -0.562 -0.654 -0.536 -0.592 -0.417 -0.326 -0.661 -0.535 -0.339 -0.414 -0.182 -0.195 -0.269

Health_Exp 0.382 0.682 0.583 0.103 0.570 -0.021 -0.438 0.383 -0.352 1.000 -0.417 0.493 0.404 0.082 -0.157 0.545 0.615 0.628 0.634 0.586 0.478 0.382 0.460 0.284 0.298 0.015 0.261 0.369

Fertility -0.842 -0.523 -0.520 -0.189 -0.602 0.084 0.698 -0.620 0.775 -0.417 1.000 -0.686 -0.881 0.151 0.311 -0.604 -0.752 -0.554 -0.683 -0.511 -0.509 -0.817 -0.705 -0.511 -0.600 -0.115 -0.266 -0.467

Physician 0.643 0.554 0.458 0.038 0.483 0.122 -0.539 0.604 -0.593 0.493 -0.686 1.000 0.639 -0.234 -0.318 0.473 0.664 0.571 0.606 0.472 0.461 0.641 0.584 0.393 0.455 -0.050 0.378 0.409

Contracep 0.851 0.535 0.525 0.303 0.614 -0.072 -0.577 0.605 -0.691 0.404 -0.881 0.639 1.000 -0.169 -0.358 0.590 0.727 0.543 0.676 0.510 0.483 0.811 0.765 0.536 0.594 0.027 0.163 0.420

AIDS -0.250 -0.151 -0.099 -0.090 -0.086 -0.074 0.122 -0.137 0.223 0.082 0.151 -0.234 -0.169 1.000 0.252 -0.095 -0.218 -0.175 -0.158 -0.151 -0.073 -0.061 -0.007 -0.062 -0.119 -0.033 -0.081 -0.147

TB -0.452 -0.297 -0.305 -0.199 -0.369 0.010 0.192 -0.351 0.333 -0.157 0.311 -0.318 -0.358 0.252 1.000 -0.379 -0.402 -0.262 -0.365 -0.312 -0.320 -0.293 -0.254 -0.247 -0.233 -0.023 -0.027 -0.282

Radios 0.639 0.642 0.704 0.113 0.756 -0.128 -0.532 0.648 -0.562 0.545 -0.604 0.473 0.590 -0.095 -0.379 1.000 0.815 0.644 0.804 0.666 0.644 0.575 0.628 0.482 0.502 0.039 0.149 0.558

TVs 0.771 0.771 0.795 0.164 0.841 -0.075 -0.572 0.725 -0.654 0.615 -0.752 0.664 0.727 -0.218 -0.402 0.815 1.000 0.763 0.901 0.776 0.753 0.685 0.692 0.519 0.556 -0.004 0.289 0.670

News 0.520 0.806 0.788 0.101 0.702 -0.042 -0.557 0.477 -0.536 0.628 -0.554 0.571 0.543 -0.175 -0.262 0.644 0.763 1.000 0.804 0.748 0.569 0.497 0.497 0.348 0.430 0.103 0.312 0.451

Phone 0.712 0.817 0.897 0.199 0.911 -0.126 -0.542 0.673 -0.592 0.634 -0.683 0.606 0.676 -0.158 -0.365 0.804 0.901 0.804 1.000 0.815 0.743 0.608 0.645 0.517 0.563 -0.009 0.333 0.621

Elec_Con 0.581 0.680 0.799 0.078 0.823 -0.083 -0.330 0.602 -0.417 0.586 -0.511 0.472 0.510 -0.151 -0.312 0.666 0.776 0.748 0.815 1.000 0.841 0.484 0.539 0.439 0.454 -0.023 0.269 0.726

Comm_Energy 0.593 0.549 0.741 0.020 0.826 -0.053 -0.155 0.689 -0.326 0.478 -0.509 0.461 0.483 -0.073 -0.320 0.644 0.753 0.569 0.743 0.841 1.000 0.491 0.526 0.475 0.480 -0.122 0.249 0.904

Literacy 0.796 0.449 0.459 0.173 0.546 -0.002 -0.537 0.638 -0.661 0.382 -0.817 0.641 0.811 -0.061 -0.293 0.575 0.685 0.497 0.608 0.484 0.491 1.000 0.826 0.552 0.574 0.177 0.215 0.457

Enrollment 0.776 0.506 0.502 0.150 0.592 -0.027 -0.422 0.667 -0.535 0.460 -0.705 0.584 0.765 -0.007 -0.254 0.628 0.692 0.497 0.645 0.539 0.526 0.826 1.000 0.552 0.611 0.046 0.243 0.486

Water 0.629 0.392 0.460 0.141 0.573 -0.202 -0.254 0.585 -0.339 0.284 -0.511 0.393 0.536 -0.062 -0.247 0.482 0.519 0.348 0.517 0.439 0.475 0.552 0.552 1.000 0.743 -0.102 0.033 0.473

Sanitation 0.680 0.421 0.489 0.120 0.599 -0.153 -0.331 0.596 -0.414 0.298 -0.600 0.455 0.594 -0.119 -0.233 0.502 0.556 0.430 0.563 0.454 0.480 0.574 0.611 0.743 1.000 -0.043 0.176 0.475


Forest -0.030 0.021 -0.045 0.011 -0.093 -0.016 -0.202 -0.135 -0.182 0.015 -0.115 -0.050 0.027 -0.033 -0.023 0.039 -0.004 0.103 -0.009 -0.023 -0.122 0.177 0.046 -0.102 -0.043 1.000 -0.008 -0.147

Deforest 0.155 0.301 0.229 0.033 0.238 0.085 -0.242 0.183 -0.195 0.261 -0.266 0.378 0.163 -0.081 -0.027 0.149 0.289 0.312 0.333 0.269 0.249 0.215 0.243 0.033 0.176 -0.008 1.000 0.247

CO2 0.553 0.423 0.615 0.007 0.722 -0.033 -0.085 0.653 -0.269 0.369 -0.467 0.409 0.420 -0.147 -0.282 0.558 0.670 0.451 0.621 0.726 0.904 0.457 0.486 0.473 0.475 -0.147 0.247 1.000

As the correlation matrix shows, the multicollinearity test revealed a great deal of correlation among certain explanatory variables. Especially numerous were the high correlations that real GDP per capita (PPP) and telephone lines had with other variables. GDP per capita (PPP) was highly correlated with GNP per capita, televisions, telephone lines, electricity consumption, and commercial energy use, likely because ownership of televisions, telephones, and electric appliances is a consumption expenditure that increases with prosperity. Telephone lines were highly correlated (r > 0.8) with other modes of communication, namely radios, televisions, and newspapers, as well as with country development, GNP per capita, GDP per capita (PPP), and electricity consumption. Because of their excessive multicollinearity, these two variables, telephone lines and GDP per capita (PPP), were removed, and a second correlation matrix was computed without them.

Columns (same order as rows): Life_Exp, Cntry_Dev, GNPperCap, GNP_Grow, Inflation, Pop_Growth, Urban_Pop, Urban_Grow, Health_Exp, Fertility, Physician, Contracep, AIDS, TB, Radios, TVs, News, Elec_Con, Comm_Energy, Literacy, Enrollment, Water, Sanitation, Forest, Deforest, CO2

Life_Exp 1.000 0.516 0.588 0.187 -0.117 -0.455 0.739 -0.637 0.382 -0.842 0.643 0.851 -0.250 -0.452 0.639 0.771 0.520 0.581 0.593 0.796 0.776 0.629 0.680 -0.030 0.155 0.553

Cntry_Dev 0.516 1.000 0.760 0.148 -0.135 -0.543 0.430 -0.517 0.682 -0.523 0.554 0.535 -0.151 -0.297 0.642 0.771 0.806 0.680 0.549 0.449 0.506 0.392 0.421 0.021 0.301 0.423

GNPperCap 0.588 0.760 1.000 0.153 -0.150 -0.369 0.597 -0.437 0.583 -0.520 0.458 0.525 -0.099 -0.305 0.704 0.795 0.788 0.799 0.741 0.459 0.502 0.460 0.489 -0.045 0.229 0.615

GNP_Grow 0.187 0.148 0.153 1.000 0.271 -0.219 0.016 -0.080 0.103 -0.189 0.038 0.303 -0.090 -0.199 0.113 0.164 0.101 0.078 0.020 0.173 0.150 0.141 0.120 0.011 0.033 0.007

Inflation -0.117 -0.135 -0.150 0.271 1.000 0.012 -0.084 -0.014 -0.021 0.084 0.122 -0.072 -0.074 0.010 -0.128 -0.075 -0.042 -0.083 -0.053 -0.002 -0.027 -0.202 -0.153 -0.016 0.085 -0.033

Pop_Growth -0.455 -0.543 -0.369 -0.219 0.012 1.000 -0.283 0.814 -0.438 0.698 -0.539 -0.577 0.122 0.192 -0.532 -0.572 -0.557 -0.330 -0.155 -0.537 -0.422 -0.254 -0.331 -0.202 -0.242 -0.085

Urban_Pop 0.739 0.430 0.597 0.016 -0.084 -0.283 1.000 -0.430 0.383 -0.620 0.604 0.605 -0.137 -0.351 0.648 0.725 0.477 0.602 0.689 0.638 0.667 0.585 0.596 -0.135 0.183 0.653

Urban_Grow -0.637 -0.517 -0.437 -0.080 -0.014 0.814 -0.430 1.000 -0.352 0.775 -0.593 -0.691 0.223 0.333 -0.562 -0.654 -0.536 -0.417 -0.326 -0.661 -0.535 -0.339 -0.414 -0.182 -0.195 -0.269

Health_Exp 0.382 0.682 0.583 0.103 -0.021 -0.438 0.383 -0.352 1.000 -0.417 0.493 0.404 0.082 -0.157 0.545 0.615 0.628 0.586 0.478 0.382 0.460 0.284 0.298 0.015 0.261 0.369

Fertility -0.842 -0.523 -0.520 -0.189 0.084 0.698 -0.620 0.775 -0.417 1.000 -0.686 -0.881 0.151 0.311 -0.604 -0.752 -0.554 -0.511 -0.509 -0.817 -0.705 -0.511 -0.600 -0.115 -0.266 -0.467

Physician 0.643 0.554 0.458 0.038 0.122 -0.539 0.604 -0.593 0.493 -0.686 1.000 0.639 -0.234 -0.318 0.473 0.664 0.571 0.472 0.461 0.641 0.584 0.393 0.455 -0.050 0.378 0.409

Contracep 0.851 0.535 0.525 0.303 -0.072 -0.577 0.605 -0.691 0.404 -0.881 0.639 1.000 -0.169 -0.358 0.590 0.727 0.543 0.510 0.483 0.811 0.765 0.536 0.594 0.027 0.163 0.420

AIDS -0.250 -0.151 -0.099 -0.090 -0.074 0.122 -0.137 0.223 0.082 0.151 -0.234 -0.169 1.000 0.252 -0.095 -0.218 -0.175 -0.151 -0.073 -0.061 -0.007 -0.062 -0.119 -0.033 -0.081 -0.147

TB -0.452 -0.297 -0.305 -0.199 0.010 0.192 -0.351 0.333 -0.157 0.311 -0.318 -0.358 0.252 1.000 -0.379 -0.402 -0.262 -0.312 -0.320 -0.293 -0.254 -0.247 -0.233 -0.023 -0.027 -0.282

Radios 0.639 0.642 0.704 0.113 -0.128 -0.532 0.648 -0.562 0.545 -0.604 0.473 0.590 -0.095 -0.379 1.000 0.815 0.644 0.666 0.644 0.575 0.628 0.482 0.502 0.039 0.149 0.558

TVs 0.771 0.771 0.795 0.164 -0.075 -0.572 0.725 -0.654 0.615 -0.752 0.664 0.727 -0.218 -0.402 0.815 1.000 0.763 0.776 0.753 0.685 0.692 0.519 0.556 -0.004 0.289 0.670

News 0.520 0.806 0.788 0.101 -0.042 -0.557 0.477 -0.536 0.628 -0.554 0.571 0.543 -0.175 -0.262 0.644 0.763 1.000 0.748 0.569 0.497 0.497 0.348 0.430 0.103 0.312 0.451

Elec_Con 0.581 0.680 0.799 0.078 -0.083 -0.330 0.602 -0.417 0.586 -0.511 0.472 0.510 -0.151 -0.312 0.666 0.776 0.748 1.000 0.841 0.484 0.539 0.439 0.454 -0.023 0.269 0.726

Comm_Energy 0.593 0.549 0.741 0.020 -0.053 -0.155 0.689 -0.326 0.478 -0.509 0.461 0.483 -0.073 -0.320 0.644 0.753 0.569 0.841 1.000 0.491 0.526 0.475 0.480 -0.122 0.249 0.904

Literacy 0.796 0.449 0.459 0.173 -0.002 -0.537 0.638 -0.661 0.382 -0.817 0.641 0.811 -0.061 -0.293 0.575 0.685 0.497 0.484 0.491 1.000 0.826 0.552 0.574 0.177 0.215 0.457

Enrollment 0.776 0.506 0.502 0.150 -0.027 -0.422 0.667 -0.535 0.460 -0.705 0.584 0.765 -0.007 -0.254 0.628 0.692 0.497 0.539 0.526 0.826 1.000 0.552 0.611 0.046 0.243 0.486

Water 0.629 0.392 0.460 0.141 -0.202 -0.254 0.585 -0.339 0.284 -0.511 0.393 0.536 -0.062 -0.247 0.482 0.519 0.348 0.439 0.475 0.552 0.552 1.000 0.743 -0.102 0.033 0.473

Sanitation 0.680 0.421 0.489 0.120 -0.153 -0.331 0.596 -0.414 0.298 -0.600 0.455 0.594 -0.119 -0.233 0.502 0.556 0.430 0.454 0.480 0.574 0.611 0.743 1.000 -0.043 0.176 0.475

Forest -0.030 0.021 -0.045 0.011 -0.016 -0.202 -0.135 -0.182 0.015 -0.115 -0.050 0.027 -0.033 -0.023 0.039 -0.004 0.103 -0.023 -0.122 0.177 0.046 -0.102 -0.043 1.000 -0.008 -0.147

Deforest 0.155 0.301 0.229 0.033 0.085 -0.242 0.183 -0.195 0.261 -0.266 0.378 0.163 -0.081 -0.027 0.149 0.289 0.312 0.269 0.249 0.215 0.243 0.033 0.176 -0.008 1.000 0.247

CO2 0.553 0.423 0.615 0.007 -0.033 -0.085 0.653 -0.269 0.369 -0.467 0.409 0.420 -0.147 -0.282 0.558 0.670 0.451 0.726 0.904 0.457 0.486 0.473 0.475 -0.147 0.247 1.000


The table above reveals that many explanatory variables are still multicollinear. The following pairs, highlighted in the table, each have an absolute correlation above 0.8:

1. Country Development and Newspapers
2. Population Growth and Urban Population Growth
3. Literacy Rate and School Enrollment Rate
4. Electricity Consumption and Commercial Energy Use
5. CO2 Emissions and Commercial Energy Use
6. Fertility Rate and Contraception Prevalence Rate
7. Literacy Rate and Fertility Rate
8. Literacy Rate and Contraception Prevalence Rate
9. Televisions and Radios
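Pairs like these can be flagged mechanically by scanning the upper triangle of the correlation matrix for entries whose absolute value exceeds a cutoff. The sketch below is a minimal illustration, not the procedure the authors used: the 0.8 cutoff is an assumption (every pair listed above happens to exceed it), and the 5×5 matrix is a hand-copied subset of the correlation table above.

```python
# Hand-copied subset of the correlation table above.
NAMES = ["Fertility", "Contracep", "Literacy", "Enrollment", "TB"]
R = [
    [1.000, -0.881, -0.817, -0.705, 0.311],
    [-0.881, 1.000, 0.811, 0.765, -0.358],
    [-0.817, 0.811, 1.000, 0.826, -0.293],
    [-0.705, 0.765, 0.826, 1.000, -0.254],
    [0.311, -0.358, -0.293, -0.254, 1.000],
]


def flag_collinear_pairs(corr, names, threshold=0.8):
    """Return (name_i, name_j, r) for every pair whose |r| exceeds the (assumed) threshold."""
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only: each pair once
            if abs(corr[i][j]) > threshold:
                flagged.append((names[i], names[j], corr[i][j]))
    return flagged


for a, b, r in flag_collinear_pairs(R, NAMES):
    print(f"{a} / {b}: r = {r:+.3f}")
```

On this subset it reports four pairs, which correspond to items 3, 6, 7, and 8 of the list.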

Intuitively, these correlations make sense. The more developed a country is, the more likely it is to have the facilities, supplies, and readership needed for newspapers to succeed. If a nation's urban population is growing, its total population will likely grow as well. If a nation's school enrollment rate is high, meaning that many children are receiving an education, its literacy rate should also be high. A country that uses a great deal of commercial energy will also consume a great deal of electricity, and this energy use generates a good deal of pollution, including carbon dioxide (CO2). The use of contraceptives, such as birth control pills and condoms, should logically have a strong negative relationship with the fertility rate, since women who use contraceptives have fewer births. If the literacy rate is high, a nation's population will likely have a better understanding of the risks of sexual promiscuity, leading to a lower fertility rate and greater use of contraceptives. Lastly, since ownership of both televisions and radios increases with prosperity, it seems reasonable that these two media are highly correlated with each other.

To decide which of these multicollinear variables to remove, we compared their significance. Because the variables are not on the same scale, their coefficients cannot be compared directly to determine the relative influence of each factor, so the absolute t-ratios (equivalently, the p-values) of the variables were compared instead. These t-ratios come from a multiple regression with GNP per Capita (PPP) and telephone lines removed. The parameter estimates are shown below:

Parameter Estimates
Term          Estimate     Std Error   t Ratio   Prob>|t|
Intercept     56.410473    3.742883    15.07     <.0001
Cntry_Dev     -0.843803    1.760322    -0.48     0.6325
GNPperCap     0.0000995    0.000081    1.22      0.2237
GNP_Grow      0.0551661    0.141074    0.39      0.6964
Inflation     -0.003138    0.003047    -1.03     0.3051
Pop_Growth    1.924076     0.63214     3.04      0.0028
Urban_Pop     0.0350771    0.026386    1.33      0.1861
Urban_Grow    -0.329277    0.311792    -1.06     0.2930
Health_Exp    0.0106446    0.194974    0.05      0.9565
Fertility     -2.635271    0.535813    -4.92     <.0001
Physician     0.0014142    0.004532    0.31      0.7555
Contracep     0.0390728    0.035743    1.09      0.2764
AIDS          -0.055589    0.018109    -3.07     0.0026
TB            -0.013266    0.00442     -3.00     0.0032
Radios        0.00027      0.001934    0.14      0.8892
TVs           0.0063612    0.005525    1.15      0.2518
News          -0.005775    0.005917    -0.98     0.3309
Elec_Con      0.0001876    0.000187    1.00      0.3190
Comm_Energy   -0.00029     0.000479    -0.61     0.5456
Water         0.0196559    0.024743    0.79      0.4285
Sanitation    0.0328903    0.019812    1.66      0.0994
Forest        -0.03359     0.017044    -1.97     0.0510
Deforest      -0.400283    0.245403    -1.63     0.1054
CO2           -0.117938    0.141341    -0.83     0.4056
Literacy      0.0445918    0.034292    1.30      0.1959
Enrollment    0.0953554    0.037024    2.58      0.0112

For example, the t-ratio for school enrollment was 2.58, compared with 1.30 for the literacy rate. As a check, two further multiple regressions were performed, one with enrollment but without literacy and one with literacy but without enrollment, and their adjusted R² values were compared. The regression including enrollment had an adjusted R² of 0.88964, while the regression including literacy yielded 0.86350. Literacy rate was therefore removed from our regression model. Similar tests were done for the other pairs, and the following variables were also removed from our regression model: country development, urban population growth, contraceptive prevalence, commercial energy use, and radios.

The insignificance of our dummy variable, country development, is somewhat surprising, though there is a logical argument for it. Although developed, industrial countries tend to have higher life expectancies at birth than underdeveloped countries, this is not always the case. One example, as mentioned earlier, is the United States of America, which ranks very highly in terms of development but whose life expectancy is not especially high. Our data set also contains several less developed countries with high life expectancies, such as Jamaica, Kuwait, and Singapore, at 74.1, 75.4, and 77.1 years, respectively. These discrepancies help explain why country development was removed.
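The selection rule described above (keep the member of each correlated pair with the larger absolute t-ratio) can be sketched in a few lines. This is a simplification: it uses only the t Ratio column of the parameter-estimates table, whereas the authors also confirmed their choices by comparing adjusted R² values. Variable names follow the JMP column labels; the function name is hypothetical.

```python
# t Ratio column of the parameter estimates table (only the variables involved in a pair).
T_RATIO = {
    "Cntry_Dev": -0.48, "News": -0.98,
    "Pop_Growth": 3.04, "Urban_Grow": -1.06,
    "Literacy": 1.30, "Enrollment": 2.58,
    "Elec_Con": 1.00, "Comm_Energy": -0.61,
    "CO2": -0.83, "Fertility": -4.92,
    "Contracep": 1.09, "TVs": 1.15, "Radios": 0.14,
}

# The nine multicollinear pairs identified from the correlation table.
PAIRS = [
    ("Cntry_Dev", "News"), ("Pop_Growth", "Urban_Grow"),
    ("Literacy", "Enrollment"), ("Elec_Con", "Comm_Energy"),
    ("CO2", "Comm_Energy"), ("Fertility", "Contracep"),
    ("Literacy", "Fertility"), ("Literacy", "Contracep"),
    ("TVs", "Radios"),
]


def weaker_members(pairs, t_ratio):
    """For each pair, pick the variable with the smaller |t| as the one to drop."""
    return {min(pair, key=lambda name: abs(t_ratio[name])) for pair in pairs}


print(sorted(weaker_members(PAIRS, T_RATIO)))
```

Applied to the nine pairs, this rule drops exactly the six variables named above: country development, urban population growth, literacy rate, contraceptive prevalence, commercial energy use, and radios.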

Our final correlation matrix, computed with country development, urban population growth, literacy rate, contraceptive prevalence, commercial energy use, radios, telephone lines, and GNP per Capita (PPP) removed, is shown in the following table:

Life_Exp GNPperCap GNP_Grow Inflation Pop_Growth Urban_Pop Health_Exp Fertility Physician AIDS TB TVs News Elec_Con Enrollment Water Sanitation Forest Deforest CO2

Life_Exp 1.000 0.588 0.187 -0.117 -0.455 0.739 0.382 -0.842 0.643 -0.250 -0.452 0.771 0.520 0.581 0.776 0.629 0.680 -0.030 0.155 0.553

GNPperCap 0.588 1.000 0.153 -0.150 -0.369 0.597 0.583 -0.520 0.458 -0.099 -0.305 0.795 0.788 0.799 0.502 0.460 0.489 -0.045 0.229 0.615

GNP_Grow 0.187 0.153 1.000 0.271 -0.219 0.016 0.103 -0.189 0.038 -0.090 -0.199 0.164 0.101 0.078 0.150 0.141 0.120 0.011 0.033 0.007

Inflation -0.117 -0.150 0.271 1.000 0.012 -0.084 -0.021 0.084 0.122 -0.074 0.010 -0.075 -0.042 -0.083 -0.027 -0.202 -0.153 -0.016 0.085 -0.033

Pop_Growth -0.455 -0.369 -0.219 0.012 1.000 -0.283 -0.438 0.698 -0.539 0.122 0.192 -0.572 -0.557 -0.330 -0.422 -0.254 -0.331 -0.202 -0.242 -0.085

Urban_Pop 0.739 0.597 0.016 -0.084 -0.283 1.000 0.383 -0.620 0.604 -0.137 -0.351 0.725 0.477 0.602 0.667 0.585 0.596 -0.135 0.183 0.653

Health_Exp 0.382 0.583 0.103 -0.021 -0.438 0.383 1.000 -0.417 0.493 0.082 -0.157 0.615 0.628 0.586 0.460 0.284 0.298 0.015 0.261 0.369

Fertility -0.842 -0.520 -0.189 0.084 0.698 -0.620 -0.417 1.000 -0.686 0.151 0.311 -0.752 -0.554 -0.511 -0.705 -0.511 -0.600 -0.115 -0.266 -0.467

Physician 0.643 0.458 0.038 0.122 -0.539 0.604 0.493 -0.686 1.000 -0.234 -0.318 0.664 0.571 0.472 0.584 0.393 0.455 -0.050 0.378 0.409

AIDS -0.250 -0.099 -0.090 -0.074 0.122 -0.137 0.082 0.151 -0.234 1.000 0.252 -0.218 -0.175 -0.151 -0.007 -0.062 -0.119 -0.033 -0.081 -0.147

TB -0.452 -0.305 -0.199 0.010 0.192 -0.351 -0.157 0.311 -0.318 0.252 1.000 -0.402 -0.262 -0.312 -0.254 -0.247 -0.233 -0.023 -0.027 -0.282

TVs 0.771 0.795 0.164 -0.075 -0.572 0.725 0.615 -0.752 0.664 -0.218 -0.402 1.000 0.763 0.776 0.692 0.519 0.556 -0.004 0.289 0.670

News 0.520 0.788 0.101 -0.042 -0.557 0.477 0.628 -0.554 0.571 -0.175 -0.262 0.763 1.000 0.748 0.497 0.348 0.430 0.103 0.312 0.451

Elec_Con 0.581 0.799 0.078 -0.083 -0.330 0.602 0.586 -0.511 0.472 -0.151 -0.312 0.776 0.748 1.000 0.539 0.439 0.454 -0.023 0.269 0.726

Enrollment 0.776 0.502 0.150 -0.027 -0.422 0.667 0.460 -0.705 0.584 -0.007 -0.254 0.692 0.497 0.539 1.000 0.552 0.611 0.046 0.243 0.486

Water 0.629 0.460 0.141 -0.202 -0.254 0.585 0.284 -0.511 0.393 -0.062 -0.247 0.519 0.348 0.439 0.552 1.000 0.743 -0.102 0.033 0.473

Sanitation 0.680 0.489 0.120 -0.153 -0.331 0.596 0.298 -0.600 0.455 -0.119 -0.233 0.556 0.430 0.454 0.611 0.743 1.000 -0.043 0.176 0.475

Forest -0.030 -0.045 0.011 -0.016 -0.202 -0.135 0.015 -0.115 -0.050 -0.033 -0.023 -0.004 0.103 -0.023 0.046 -0.102 -0.043 1.000 -0.008 -0.147

Deforest 0.155 0.229 0.033 0.085 -0.242 0.183 0.261 -0.266 0.378 -0.081 -0.027 0.289 0.312 0.269 0.243 0.033 0.176 -0.008 1.000 0.247

CO2 0.553 0.615 0.007 -0.033 -0.085 0.653 0.369 -0.467 0.409 -0.147 -0.282 0.670 0.451 0.726 0.486 0.473 0.475 -0.147 0.247 1.000

The table above shows that removing these 8 variables remedied much of the multicollinearity problem, although a few pairs remain just below 0.8 (for example, GNP per capita with electricity consumption at 0.799 and with televisions at 0.795). It is not feasible to remove multicollinearity completely, since most of the indicators are related in some way. Although removing these variables decreases R² and thus the model's overall explanatory power, the greater goal of this analysis is to obtain the best combination of significant factors.

D. Choosing Significant Variables Using Mixed Stepwise Regression

To determine which variables are the most significant, we performed a stepwise regression. Rather than fitting every possible regression, stepwise regression searches for a good model by successively adding and removing variables according to their significance. The result of our mixed stepwise regression, with the probability to enter set at 0.150 and the probability to leave set at 0.100, is as follows:

Stepwise Regression Control
Prob to Enter 0.150   Prob to Leave 0.100   Direction: Mixed

Current Estimates
SSE 2143.3161   DFE 141   MSE 15.20082   RSquare 0.8769   RSquare Adj 0.8690   Cp 10.25259   AIC 420.5773

Parameter            Entered   Estimate     nDF   SS         F Ratio   Prob>F
Intercept (locked)   X         62.9243535   1     0          0.000     1.0000
GNPperCap            X         0.00010952   1     95.8992    6.309     0.0131
GNP_Grow             -         .            1     0.03087    0.002     0.9642
Inflation            -         .            1     13.20918   0.868     0.3531
Pop_Growth           X         1.45552738   1     240.4819   15.820    0.0001
Urban_Pop            -         .            1     40.1306    2.671     0.1044
Health_Exp           -         .            1     2.553851   0.167     0.6834
Fertility            X         -3.6232459   1     1829.988   120.387   0.0000
Physician            -         .            1     1.578741   0.103     0.7485
AIDS                 X         -0.0668917   1     286.5572   18.851    0.0000
TB                   X         -0.0164982   1     251.9342   16.574    0.0001
TVs                  -         .            1     10.3615    0.680     0.4110
News                 -         .            1     7.73195    0.507     0.4777
Elec_Con             -         .            1     0.031085   0.002     0.9641
Enrollment           X         0.16625024   1     635.5259   41.809    0.0000
Water                X         0.05240111   1     112.556    7.405     0.0073
Sanitation           -         .            1     30.03489   1.990     0.1606
Forest               X         -0.035922    1     85.91684   5.652     0.0188
Deforest             X         -0.5570848   1     94.91818   6.244     0.0136
CO2                  -         .            1     16.55525   1.090     0.2983

Step History
Step   Parameter    Action    Sig Prob   Seq SS     RSquare   Cp       p
1      Fertility    Entered   0.0000     12342.99   0.7090    186.8    2
2      Urban_Pop    Entered   0.0000     1332.235   0.7856    101      3
3      Enrollment   Entered   0.0000     427.1613   0.8101    74.852   4
4      TB           Entered   0.0000     398.1551   0.8330    50.612   5
5      Pop_Growth   Entered   0.0006     224.7891   0.8459    37.798   6
6      AIDS         Entered   0.0004     228.0112   0.8590    24.771   7
7      Water        Entered   0.0019     160.9674   0.8682    16.163   8
8      Deforest     Entered   0.0353     70.68607   0.8723    13.504   9
9      Forest       Entered   0.0416     64.74455   0.8760    11.237   10
10     GNPperCap    Entered   0.0576     55.07309   0.8792    9.6078   11
11     Urban_Pop    Removed   0.1044     40.1306    0.8769    10.253   10

From this mixed stepwise regression, the following variables were found to be the most significant: GNP per Capita, Population Growth, Fertility Rate, AIDS, Tuberculosis, School Enrollment Rate, Access to Safe Water, Forest and Woodlands, and Rate of Deforestation. All have p-values below 0.020 in the final model. Of the four variables that were significant in the single regressions, two (fertility rate and school enrollment rate) are also significant in the final multiple regression. The other two, literacy rate and contraception prevalence, were removed because of multicollinearity.
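For readers who want to see the mechanics, here is a minimal sketch of a mixed (forward-and-backward) stepwise search driven by t-test p-values, using the same thresholds as above (enter at 0.150, leave at 0.100). It is not JMP's implementation: it assumes NumPy and SciPy are available, ignores JMP features such as locked terms and Cp/AIC reporting, and stops as soon as a model repeats, since with an entry threshold above the leave threshold a variable could otherwise cycle in and out. The function names are hypothetical and the data in the usage lines are synthetic.

```python
import numpy as np
from scipy import stats


def ols_pvalues(X, y):
    """Fit y on an intercept plus the columns of X; return two-sided t-test p-values for X's columns."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dfe = n - A.shape[1]                          # error degrees of freedom, n - (k + 1)
    mse = resid @ resid / dfe
    se = np.sqrt(mse * np.diag(np.linalg.inv(A.T @ A)))
    t = beta / se
    return (2 * stats.t.sf(np.abs(t), dfe))[1:]   # drop the intercept's p-value


def mixed_stepwise(X, y, p_enter=0.150, p_leave=0.100):
    """Return the indices of the columns of X selected by a mixed stepwise search."""
    selected, seen = [], set()
    while frozenset(selected) not in seen:        # stop when a model repeats
        seen.add(frozenset(selected))
        # Forward step: add the candidate with the smallest p-value, if below p_enter.
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        if remaining:
            p_new = {j: ols_pvalues(X[:, selected + [j]], y)[-1] for j in remaining}
            best = min(p_new, key=p_new.get)
            if p_new[best] < p_enter:
                selected.append(best)
        # Backward step: drop the least significant variable, if above p_leave.
        if selected:
            p_in = ols_pvalues(X[:, selected], y)
            worst = int(np.argmax(p_in))
            if p_in[worst] > p_leave:
                selected.pop(worst)
    return sorted(selected)


# Synthetic usage: only columns 0 and 1 actually affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 5))
y = 2 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=150)
selected = mixed_stepwise(X, y)
print(selected)
```

Because stepwise search is greedy, it can miss the best subset; that is the trade-off for not fitting every possible regression.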

Using the indicators selected by the stepwise regression, a final standard least squares regression was performed in JMP IN, with the following results:


Summary of Fit
RSquare                      0.876878
RSquare Adj                  0.869019
Root Mean Square Error       3.898823
Mean of Response             65.21192
Observations (or Sum Wgts)   151

[Figure: Whole Model Actual by Predicted Plot, Life_Exp Actual vs. Life_Exp Predicted (P < .0001, RSq = 0.88, RMSE = 3.8988)]

Parameter Estimates
Term         Estimate    Std Error   t Ratio   Prob>|t|
Intercept    62.924353   2.610737    24.10     <.0001
GNPperCap    0.0001095   0.000044    2.51      0.0131
Pop_Growth   1.4555274   0.365942    3.98      0.0001
Fertility    -3.623246   0.330223    -10.97    <.0001
AIDS         -0.066892   0.015406    -4.34     <.0001
TB           -0.016498   0.004053    -4.07     <.0001
Enrollment   0.1662502   0.025712    6.47      <.0001
Water        0.0524011   0.019257    2.72      0.0073
Forest       -0.035922   0.01511     -2.38     0.0188
Deforest     -0.557085   0.222936    -2.50     0.0136
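The Summary of Fit values are internally consistent, which gives a quick sanity check on the output. This short sketch assumes the standard formulas adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) and RMSE = √(SSE/DFE), with n = 151 observations and k = 9 predictors, and takes SSE = 2143.3161 from the stepwise regression's Current Estimates.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n, k = 151, 9          # observations and predictors in the final model
r2 = 0.876878          # RSquare from the Summary of Fit
sse = 2143.3161        # SSE from the stepwise regression's Current Estimates

dfe = n - (k + 1)      # error degrees of freedom
rmse = math.sqrt(sse / dfe)

print(dfe, round(adjusted_r2(r2, n, k), 6), round(rmse, 6))
```

It reproduces DFE = 141, an adjusted R² of about 0.8690, and a root mean square error of about 3.8988.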

The final regression model fits well, with an adjusted R² of 0.8690, only slightly below the adjusted R² of 0.8731 from our initial regression. The fitted linear relationship between life expectancy at birth and our significant variables is:

Life Expectancy at Birth = 62.924353 + (0.0001095 * GNP per Capita) + (1.4555274 * Annual Population Growth) + (-3.623246 * Fertility Rate) + (-0.066892 * AIDS) + (-0.016498 * Tuberculosis) + (0.1662502 * Enrollment Rate) + (0.0524011 * Access to Safe Water) + (-0.035922 * Forest and Woodlands) + (-0.557085 * Deforestation Rate)

These significant indicators seem reasonable, and the signs of the coefficients offer further clues to how the variables can be interpreted. For instance, GNP per capita has a positive effect on life expectancy, which supports our hypothesis that the wealthier a country's citizens are, the longer they can expect to live, because they can afford better medical treatment and the conveniences of a more comfortable life. The positive coefficient on Population Growth at first seems out of place, since Fertility Rate has a negative effect on life expectancy. However, population growth is not necessarily a negative feature: if a nation's GNP grows faster than its population, per capita GNP rises, and the nation can likely support a larger population.
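To make the equation concrete, the sketch below evaluates it, including the intercept of 62.924353 from the parameter-estimates table, for Belgium, using the indicator values listed for Belgium in Section F. The dictionary keys reuse the JMP column names; the function name is hypothetical.

```python
# Coefficients from the final parameter estimates table.
COEF = {
    "Intercept": 62.924353,
    "GNPperCap": 0.0001095,
    "Pop_Growth": 1.4555274,
    "Fertility": -3.623246,
    "AIDS": -0.066892,
    "TB": -0.016498,
    "Enrollment": 0.1662502,
    "Water": 0.0524011,
    "Forest": -0.035922,
    "Deforest": -0.557085,
}


def predict_life_expectancy(x):
    """Evaluate the fitted linear equation for one country's indicator values."""
    return COEF["Intercept"] + sum(COEF[name] * value for name, value in x.items())


# Belgium's values from the prediction section (Section F).
belgium = {"GNPperCap": 24710, "Pop_Growth": 0.19, "Fertility": 1.62, "AIDS": 1.45,
           "TB": 16, "Enrollment": 86, "Water": 100, "Forest": 2.35, "Deforest": 0}
pred = predict_life_expectancy(belgium)
print(round(pred, 2))
```

The result is about 79.13 years, which matches the predicted value reported for Belgium (79.1298) up to rounding of the published coefficients.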


E. Gauss-Markov Assumptions:

By the Gauss-Markov theorem, if our model satisfies the Gauss-Markov assumptions, our least squares estimators (β̂j, j = 0, …, k) have minimum variance among all linear unbiased estimators. If these assumptions are violated, the least squares estimates lose this guarantee, and our model is unlikely to be the most appropriate. The assumptions can be stated as:

(i) E(εi) = 0, i = 1, …, n (Linearity Assumption)

(ii) Var(εi) = σ², i = 1, …, n (Homoscedasticity Assumption)

(iii) ε1, …, εn are mutually independent (Independence Assumption)

The linearity assumption can be checked by examining the residual plot. It shows no clear curvilinear pattern that would suggest a nonlinear transformation of y or of the x's might provide a better fit. Had such nonlinearities appeared, the affected variables would have been transformed, but none were observed.

The second assumption can be checked by examining how uniform the spread of the residuals is. The residual plot below shows no trend of increasing variance; if anything, the residuals are slightly less scattered at higher predicted values. Overall, they are spread evenly around zero, without a pattern of dispersion that would suggest heteroscedasticity.

Finally, the independence assumption is reasonable since each data point is a unique

country. While there is mobility between countries, for instance in the European Union, cross-

border migrations represent such a small percentage of the total population that independence of

the residuals is a reasonable assumption.


[Figure: Residual by Predicted Plot, Life_Exp Residual vs. Life_Exp Predicted]

In addition to the Gauss-Markov assumptions, the t-tests, F-tests, and prediction intervals used in this analysis assume that the residuals are normally distributed. The normal quantile plot of the residuals below suggests that they are approximately normally distributed.

[Figure: Normal Quantile Plot of the residuals]

Hence our regression residuals appear to be consistent with all the assumptions of the linear model.
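The visual checks above can be complemented by numeric ones. The sketch below shows one possible choice, not what the authors did: a Spearman rank correlation between the absolute residuals and the fitted values (a clearly positive value hints that the spread grows with the fitted value) and a Shapiro-Wilk test of normality. It assumes SciPy is available; the function name is hypothetical and the data are synthetic.

```python
import numpy as np
from scipy import stats


def residual_checks(fitted, residuals):
    """Numeric companions to the residual-by-predicted and normal quantile plots."""
    fitted = np.asarray(fitted, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    # Spread vs. level: a clearly positive rank correlation between |residual| and
    # the fitted value suggests the variance grows with the fitted value.
    rho, rho_p = stats.spearmanr(fitted, np.abs(residuals))
    # Shapiro-Wilk: a small p-value is evidence against normally distributed residuals.
    w, w_p = stats.shapiro(residuals)
    return {"spread_rho": float(rho), "spread_p": float(rho_p),
            "shapiro_w": float(w), "shapiro_p": float(w_p)}


# Synthetic illustration (not the project's data).
rng = np.random.default_rng(1)
fitted = rng.uniform(40, 80, size=300)
even = residual_checks(fitted, 3.9 * rng.normal(size=300))                    # constant spread
funnel = residual_checks(fitted, 0.1 * (fitted - 30) * rng.normal(size=300))  # spread grows
print(even)
print(funnel)
```

Neither test replaces the plots; with about 150 countries, they mainly help flag problems that are hard to judge by eye.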


F. Predictions Using the Regression Models

Next, we tested how accurately our model could forecast the life expectancy at birth of several randomly chosen nations. We constructed prediction intervals for these individual predicted values to see how well the model holds up and whether we can in fact predict the average life expectancy of a citizen from the characteristics of the nation.

The data for three randomly chosen countries, Belgium, Paraguay, and Chad, were excluded from our regression model so that they could be used to test how well the final model predicts life expectancy. Because our sample is large, excluding these countries did not significantly affect the regression. The values of our significant factors for these countries are as follows:

Country    GNP per Cap   Pop. Growth   Fertility   AIDS    TB      Enroll.   Water   Forest   Deforest
Belgium    24710         0.19          1.62        1.45    16      86        100     2.35     0
Paraguay   1690          2.92          1.2         0.98    36.72   63        60      29.01    -2.6
Chad       180           2.39          5.7         18.98   50.29   27        37      8.76     -0.8

90% prediction intervals for these values were formed using the equation:

$$\hat{Y} \pm t_{\alpha/2,\,n-(k+1)}\,\sqrt{s^2 + \left[\widehat{\mathrm{SD}}(\hat{Y})\right]^2}$$

where $\hat{Y}$ is the predicted life expectancy, $s$ is the root mean square error of the model (so $s^2$ is the mean square error), $\widehat{\mathrm{SD}}(\hat{Y})$ is the estimated standard error of the predicted value (JMP's "Standard Error of Pred Formula"), and $t_{\alpha/2,\,n-(k+1)}$ is the upper $\alpha/2$ critical value of the t-distribution with $n-(k+1)$ degrees of freedom, which gives a two-sided $100(1-\alpha)\%$ prediction interval. The values of $\hat{Y}$ and $\widehat{\mathrm{SD}}(\hat{Y})$ are specific to each country and were obtained by plugging the data above into our prediction formula. The value of $s$ was 3.89882, so $s^2 \approx 15.2008$. For a 90% prediction interval with a sample size of 146 countries and 9 significant variables, $t_{0.10/2,\,146-(9+1)} = t_{0.05,\,136} = 1.6561$. Our prediction intervals for the three randomly chosen countries are thus:

Country    Recorded Life Expectancy   Predicted Life Expectancy   Lower 90% PI   Upper 90% PI   Standard Error of Pred Formula
Belgium    76.9                       79.1297986                  72.5074153     85.7521818     0.88812337
Paraguay   69.1                       64.9150227                  58.354333      71.4757124     0.70159494
Chad       47.2                       49.996682                   43.3709749     56.6223892     0.89711639

As the values above show, the recorded life expectancy of each of the three randomly selected nations fell within its 90% prediction interval, although each interval is roughly 13 years wide. This supports the ability of our regression model to forecast life expectancy from the nine parameters.
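The interval table above can be reproduced from the predicted values and their standard errors. This sketch takes s = 3.898823 (the root mean square error, so s² ≈ 15.2008 enters the formula) and t = 1.6561 as reported in the text; the function name is hypothetical.

```python
import math

S = 3.898823    # root mean square error of the final model
T = 1.6561      # t critical value reported in the text for a 90% prediction interval

# (recorded life expectancy, predicted value, standard error of the predicted value)
COUNTRIES = {
    "Belgium": (76.9, 79.1297986, 0.88812337),
    "Paraguay": (69.1, 64.9150227, 0.70159494),
    "Chad": (47.2, 49.996682, 0.89711639),
}


def prediction_interval(y_hat, se_fit, s=S, t_crit=T):
    """Two-sided interval y_hat +/- t * sqrt(s^2 + se_fit^2) for a new observation."""
    half_width = t_crit * math.sqrt(s**2 + se_fit**2)
    return y_hat - half_width, y_hat + half_width


intervals = {}
for name, (actual, y_hat, se_fit) in COUNTRIES.items():
    lo, hi = prediction_interval(y_hat, se_fit)
    intervals[name] = (lo, hi)
    print(f"{name}: [{lo:.2f}, {hi:.2f}], recorded {actual} inside: {lo <= actual <= hi}")
```

The s² term dominates the width: it is about 15.2, while each squared standard error of the predicted value is below 1, which is why all three intervals are roughly the same width.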


G. Does geography play a significant role in life expectancy?

Life expectancy reflects the overall health of a nation's citizens, so it can be a significant factor in deciding which region of the world to settle in. Our final multiple regression model identifies the factors that are significantly related to life expectancy. However, it may be difficult to find a region where all of these factors are favorable. By breaking down our analysis by region of the world, we can answer a more practical question: does the geographic region we choose to live in significantly influence our life expectancy at birth? We developed a hypothesis-testing framework to analyze this question.

First, our sample of 146 countries was split by geographic region into five regions: Asia & the Pacific Islands, Africa & the Middle East, Europe, North & Central America & the Caribbean, and South America. The countries in each region are listed in Appendix B. The mean life expectancy, standard deviation, and number of countries (n) observed for each region are as follows:

Region                 Mean Life Expectancy   Standard Deviation   N
Asia & Pacific         65.92963               7.303783             27
Africa & Middle East   56.12000               10.13488             59
Europe                 73.03243               4.88552              35
North America          71.48235               5.47085              17
South America          68.65000               4.58049              12

Let $\mu_1$ and $\mu_2$ be the mean life expectancies of two regions of the world. Treating a difference in life expectancy of 3 years as significant, we form the following null and alternative hypotheses:

$H_0: \mu_1 - \mu_2 = 3$

$H_a: \mu_1 - \mu_2 > 3$

Therefore, if the null hypothesis is rejected, the region of the world we choose to live in makes a significant difference of more than three years in life expectancy. A Type I error would be reporting a significant difference between regions when none exists, which could harm anyone who took our advice and moved to a different part of the world.

The life expectancies of all regions are assumed to be independent random variables. Our test statistic for the true mean difference between two regions is:

$$Z_{12} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the sample means and $s_1$ and $s_2$ are the sample standard deviations of the two regions. Because $n_1$ and $n_2$ are never both greater than 40, $Z_{12}$ is treated as t-distributed. The next step in deciding whether the difference in life expectancies between two regions is significant is to calculate the degrees of freedom for each combination, using Welch's approximation:

$$df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

This value, rounded down to an integer, is used to find the critical t-value at the 95% level ($\alpha = 0.05$), which marks the lower boundary of the one-sided (upper-tail) rejection region.

If our standardized test statistic $Z_{12}$ is greater than the t-value, it falls in the rejection region: the null hypothesis is rejected, and the difference between the two regions is judged significant. If $Z_{12}$ is less than the t-value, the null hypothesis is not rejected, and there is no significant difference of more than three years in life expectancy between the two regions.

We performed this test for all ten pairs that can be formed from the five regions. For each pair, region 1 is taken to be the region with the higher sample mean, so that the one-sided alternative $\mu_1 - \mu_2 > 3$ points in the direction of the observed difference. Using the means and standard deviations above, we calculate the degrees of freedom (df), the critical t-values, and, setting $\mu_1 - \mu_2 = 3$ as under the null hypothesis, the test statistic $Z_{12}$:

Regions (region 1 vs. region 2)   Df from Welch's   Df used   t-value   Z12
Asia vs. Africa                   68.25192          68        1.66757   3.5322
Europe vs. Asia                   43.11774          43        1.68107   2.5167
North America vs. Asia            40.59761          40        1.68385   1.3206
South America vs. Asia            32.40193          32        1.69388   -0.1449
Europe vs. Africa                 89.03372          89        1.66215   8.9379
North America vs. Africa          49.84294          49        1.67722   6.6065
South America vs. Africa          36.87789          36        1.68829   5.1018
Europe vs. North America          28.76452          28        1.70113   -0.9277
Europe vs. South America          20.25733          20        1.72471   0.8868
North America vs. South America   26.10736          26        1.70561   -0.0895

The null hypothesis is rejected, meaning the regions differ by significantly more than three years, for all four comparisons involving Africa and for Europe vs. Asia. The difference is especially large for Africa, where life expectancy has dropped drastically because of the widespread AIDS epidemic; its $Z_{12}$ statistic falls far into the rejection region against every other region. Asia, which faces diminishing resources and an ever-growing population, differs significantly from Africa and Europe, but its differences from North America and South America are not significantly greater than three years. Likewise, there is no significant difference of more than three years in life expectancy among North America, South America, and Europe.
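The region comparisons can be reproduced from the summary table. The sketch below, which assumes SciPy for the t quantile, computes $Z_{12}$, Welch's degrees of freedom (truncated to an integer, as in the "Df used" column), and the one-sided 5% critical value for each pair, orienting each pair so that region 1 has the higher sample mean. The function name is hypothetical.

```python
import math
from itertools import combinations

from scipy import stats

# (mean, standard deviation, n) for each region, from the summary table.
REGIONS = {
    "Asia & Pacific": (65.92963, 7.303783, 27),
    "Africa & Middle East": (56.12000, 10.13488, 59),
    "Europe": (73.03243, 4.88552, 35),
    "North America": (71.48235, 5.47085, 17),
    "South America": (68.65000, 4.58049, 12),
}


def welch_test(r1, r2, delta0=3.0, alpha=0.05):
    """One-sided test of H0: mu1 - mu2 = delta0 against Ha: mu1 - mu2 > delta0."""
    (m1, s1, n1), (m2, s2, n2) = r1, r2
    v1, v2 = s1**2 / n1, s2**2 / n2
    z = ((m1 - m2) - delta0) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))   # Welch's approximation
    t_crit = stats.t.ppf(1 - alpha, math.floor(df))               # df truncated, as in the table
    return z, df, t_crit, z > t_crit


results = {}
for a, b in combinations(REGIONS, 2):
    # Orient each pair so region 1 has the higher sample mean.
    hi, lo = (a, b) if REGIONS[a][0] >= REGIONS[b][0] else (b, a)
    z, df, t_crit, reject = welch_test(REGIONS[hi], REGIONS[lo])
    results[(hi, lo)] = (z, df, t_crit, reject)
    print(f"{hi} vs. {lo}: df = {df:.2f}, t = {t_crit:.4f}, Z12 = {z:.4f}, reject H0: {reject}")
```

The degrees of freedom do not depend on the orientation, but the sign of $Z_{12}$ does, which is why the orientation matters for a one-sided test.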

III. CONCLUSION

A. Multiple Regression Discussion

In the final multiple regression model, the nine remaining economic and social indicators were GNP per Capita, Annual Population Growth, Fertility Rate, AIDS, Tuberculosis, Enrollment Rate, Access to Safe Water, Forest and Woodlands, and Deforestation Rate. All of these indicators were significant, with p < 0.020, and together they accounted for approximately 87.69% of the variance in life expectancy. The model is shown again below:

Summary of Fit
RSquare                      0.876878
RSquare Adj                  0.869019
Root Mean Square Error       3.898823
Mean of Response             65.21192
Observations (or Sum Wgts)   151

Parameter Estimates
Term         Estimate    Std Error   t Ratio   Prob>|t|
Intercept    62.924353   2.610737    24.10     <.0001
GNPperCap    0.0001095   0.000044    2.51      0.0131
Pop_Growth   1.4555274   0.365942    3.98      0.0001
Fertility    -3.623246   0.330223    -10.97    <.0001
AIDS         -0.066892   0.015406    -4.34     <.0001
TB           -0.016498   0.004053    -4.07     <.0001
Enrollment   0.1662502   0.025712    6.47      <.0001
Water        0.0524011   0.019257    2.72      0.0073
Forest       -0.035922   0.01511     -2.38     0.0188
Deforest     -0.557085   0.222936    -2.50     0.0136


Based on the model, life expectancy is positively associated with GNP per capita, population growth, enrollment, and access to safe water, and negatively associated with fertility, AIDS, tuberculosis, forest and woodland percentage, and rate of deforestation. The soundness of these results was discussed in detail following the stepwise regression. To improve the model and increase its explanatory power (R²), we might consider additional variables that were excluded, such as ethnicity and diet.

Because these variables are not measured in the same units, their coefficients cannot be compared directly to evaluate the relative influence of each factor on life expectancy. However, it was surprising that fertility rate was more significant (|t| = 10.97) than AIDS (|t| = 4.34) and access to safe water (|t| = 2.72), which we would have expected to have greater and more immediate consequences for life expectancy.
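Standardizing is one common way to put coefficients on a common scale. A standardized coefficient multiplies the raw estimate b by s_x / s_y, the ratio of the sample standard deviations of the predictor and of life expectancy, so it reads as "standard deviations of life expectancy per standard deviation of the predictor." The report did not compute these and the raw country data are not reproduced here, so the Python sketch below uses toy values; the function name is our own.

```python
from statistics import stdev


def standardized_coefficient(b: float, x: list[float], y: list[float]) -> float:
    """Convert a raw slope b (units of y per unit of x) into
    standard deviations of y per standard deviation of x: b * s_x / s_y."""
    return b * stdev(x) / stdev(y)


# Toy data (hypothetical, not from the report): the same relationship with x
# recorded in units and in thousands of units.
print(standardized_coefficient(2.0, [1, 2, 3], [2, 4, 6]))
print(standardized_coefficient(0.002, [1000, 2000, 3000], [2, 4, 6]))
```

Both calls give 1.0 up to floating-point rounding: rescaling a predictor changes its raw coefficient but not its standardized coefficient, which is what makes standardized values comparable across variables measured in different units.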

Multicollinearity was substantial in our preliminary multiple regression since we chose to use categorical factors. Refining the set of variables did not decrease R² significantly, and adjusted R² was barely affected, because eliminating the variables involved in multicollinearity improved the significance of the remaining variables.
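This excerpt does not restate how multicollinearity was tested. One common diagnostic is the variance inflation factor, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all the other predictors. The Python sketch below covers only the two-predictor case, in which R_j² reduces to the squared Pearson correlation between the two predictors; the function names and toy data are illustrative assumptions, not the report's procedure.

```python
from math import inf, sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1: list[float], x2: list[float]) -> float:
    """VIF = 1 / (1 - R^2); with only two predictors, R^2 = r(x1, x2)^2."""
    r_squared = pearson_r(x1, x2) ** 2
    if r_squared >= 1.0 - 1e-12:
        return inf  # perfectly collinear: the separate coefficients are not identifiable
    return 1.0 / (1.0 - r_squared)
```

For example, two predictors with correlation 0.9 give VIF = 1 / (1 - 0.81), about 5.26, meaning the variance of each coefficient estimate is roughly five times what it would be if the predictors were uncorrelated.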

Of the variables deemed “significant” in the single regression analysis, only fertility and enrollment remained in our final set of explanatory variables. Contraceptive prevalence and literacy rate were removed because of their high correlation with school enrollment.

Life expectancy was predicted well for countries not included in our data set. Using the values of the explanatory variables for Belgium, Chad, and Paraguay, we tested our model and found that the actual life expectancy of each country fell within its 90% prediction interval. Given all of the measures of economic and social development we employed, it is not surprising that a successful prediction model was achieved.
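A prediction interval for a new country widens the fitted value by the residual standard error, inflated to account for uncertainty in the estimated coefficients. The Python sketch below shows only the single-predictor case; the nine-predictor model above replaces 1/n + (x0 - x̄)²/Sxx with x0ᵀ(XᵀX)⁻¹x0, which needs a matrix inverse and is omitted here. The function name and the `t_crit` argument are our own. Because the Python standard library has no t quantile function, the caller supplies the critical value (for a 90% two-sided interval, the 0.95 quantile of the t distribution with n - 2 degrees of freedom). The data are toy values, not the report's.

```python
from math import sqrt


def prediction_interval(x: list[float], y: list[float], x0: float, t_crit: float):
    """Return (lower, point, upper) for a new response at x0 under y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    # Residual standard error s (the "Root Mean Square Error" in the fit summaries).
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))
    y0 = a + b * x0
    # The leading 1 accounts for the new observation's own noise; the other two
    # terms account for uncertainty in the estimated intercept and slope.
    half = t_crit * s * sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)
    return y0 - half, y0, y0 + half


# Toy data; 2.353 is roughly the 0.95 quantile of t with 3 df (a 90% interval).
lo, fit, hi = prediction_interval([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], 6, 2.353)
```

A country's true value "falls within the 90% prediction interval" when it lies between `lo` and `hi`.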

Finally, we investigated whether life expectancy was appreciably (by more than 3 years) influenced by geography by comparing life expectancies across five continental regions. From our hypothesis tests, we can conclude at the 95% confidence level that geography is associated with differences of more than three years in life expectancy for most pairs of regions. This is not surprising considering that the mean life expectancy in Africa is drastically lower than that of its closest comparable, Asia. This suggests that life expectancy is meaningfully shaped by economic conditions, sanitation, and disease, and that to improve life expectancy in developing regions we must assist them with these critical issues.
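Testing for a difference of more than three years is a one-sided test with a shifted null hypothesis, H0: μA - μB ≤ 3 against H1: μA - μB > 3. This excerpt does not say whether a pooled-variance or unequal-variance test was used. The Python sketch below uses Welch's unequal-variance statistic as one reasonable choice; the function names and the sample values are hypothetical, not the report's data.

```python
from math import sqrt
from statistics import mean, variance


def shifted_welch_t(a: list[float], b: list[float], delta: float = 3.0) -> float:
    """t statistic for H0: mu_a - mu_b <= delta against H1: mu_a - mu_b > delta."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b) - delta) / se


def welch_df(a: list[float], b: list[float]) -> float:
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    return (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))


# Toy samples of life expectancy for two hypothetical regions.
t_stat = shifted_welch_t([70.0, 72.0, 74.0], [60.0, 62.0, 64.0])
df = welch_df([70.0, 72.0, 74.0], [60.0, 62.0, 64.0])
```

Here the statistic is about 4.29 with 4 degrees of freedom. It exceeds the one-sided 5% critical value of roughly 2.132, so in this toy example the null hypothesis of a gap of three years or less would be rejected.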


APPENDIX A

Single Regressions of Chosen Social and Economic Variables against Life Expectancy
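Each single regression below reports a least-squares intercept and slope, RSquare, and RSquare Adj. As a hedged sketch of how these quantities relate, the Python below fits y = a + bx and applies the usual adjustment RSquare Adj = 1 - (1 - RSquare)(n - 1)/(n - p - 1). The function names are our own, and this is not the software the report used. With n = 151 and p = 1, the adjustment reproduces the reported RSquare Adj of 0.261557 for Cntry_Dev. With p = 9, it reproduces 0.869019 for the final nine-variable model.

```python
def simple_ols(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Fit y = a + b*x by least squares; return (intercept a, slope b, RSquare)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    # With a single predictor, RSquare equals the squared correlation of x and y.
    return a, b, sxy ** 2 / (sxx * syy)


def adjusted_r_squared(r_squared: float, n: int, p: int) -> float:
    """RSquare Adj = 1 - (1 - RSquare) * (n - 1) / (n - p - 1), p = number of predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Reported values: Cntry_Dev single fit and the final multiple regression.
print(round(adjusted_r_squared(0.26648, 151, 1), 6))   # prints 0.261557
print(round(adjusted_r_squared(0.876878, 151, 9), 6))  # prints 0.869019
```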

Bivariate Fit of Life_Exp By Cntry_Dev

[Figure: scatter plot of Life_Exp against Cntry_Dev with linear fit]

Linear Fit: Life_Exp = 62.452066 + 13.891267 Cntry_Dev

Summary of Fit: RSquare 0.26648; RSquare Adj 0.261557; Root Mean Square Error 9.257363; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By GNPperCap

[Figure: scatter plot of Life_Exp against GNPperCap with linear fit]

Linear Fit: Life_Exp = 61.233835 + 0.0006863 GNPperCap

Summary of Fit: RSquare 0.34523; RSquare Adj 0.340835; Root Mean Square Error 8.746329; Mean of Response 65.21192; Observations (or Sum Wgts) 151


Bivariate Fit of Life_Exp By GNP_Grow

[Figure: scatter plot of Life_Exp against GNP_Grow with linear fit]

Linear Fit: Life_Exp = 64.668998 + 0.6498195 GNP_Grow

Summary of Fit: RSquare 0.034885; RSquare Adj 0.028408; Root Mean Square Error 10.61869; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By GDP_PPP

[Figure: scatter plot of Life_Exp against GDP_PPP with linear fit]

Linear Fit: Life_Exp = 57.912464 + 0.0010321 GDP_PPP

Summary of Fit: RSquare 0.491018; RSquare Adj 0.487602; Root Mean Square Error 7.71139; Mean of Response 65.21192; Observations (or Sum Wgts) 151


Bivariate Fit of Life_Exp By Inflation

[Figure: scatter plot of Life_Exp against Inflation with linear fit]

Linear Fit: Life_Exp = 65.599734 - 0.0095838 Inflation

Summary of Fit: RSquare 0.013599; RSquare Adj 0.006979; Root Mean Square Error 10.73515; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Pop_Growth

[Figure: scatter plot of Life_Exp against Pop_Growth with linear fit]

Linear Fit: Life_Exp = 73.030995 - 3.892267 Pop_Growth

Summary of Fit: RSquare 0.207213; RSquare Adj 0.201892; Root Mean Square Error 9.624089; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Urban_Pop

[Figure: scatter plot of Life_Exp against Urban_Pop with linear fit]

Linear Fit: Life_Exp = 46.983506 + 0.337816 Urban_Pop

Summary of Fit: RSquare 0.54625; RSquare Adj 0.543205; Root Mean Square Error 7.280977; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Urban_Grow

[Figure: scatter plot of Life_Exp against Urban_Grow with linear fit]

Linear Fit: Life_Exp = 75.548225 - 2.8657656 Urban_Grow

Summary of Fit: RSquare 0.405664; RSquare Adj 0.401676; Root Mean Square Error 8.332919; Mean of Response 65.21192; Observations (or Sum Wgts) 151


Bivariate Fit of Life_Exp By Health_Exp

[Figure: scatter plot of Life_Exp against Health_Exp with linear fit]

Linear Fit: Life_Exp = 58.806771 + 1.6066072 Health_Exp

Summary of Fit: RSquare 0.14592; RSquare Adj 0.140188; Root Mean Square Error 9.989195; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Fertility

[Figure: scatter plot of Life_Exp against Fertility with linear fit]

Linear Fit: Life_Exp = 83.136123 - 4.9303312 Fertility

Summary of Fit: RSquare 0.709041; RSquare Adj 0.707089; Root Mean Square Error 5.830381; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Physician

[Figure: scatter plot of Life_Exp against Physician with linear fit]

Linear Fit: Life_Exp = 58.374773 + 0.0525012 Physician

Summary of Fit: RSquare 0.413375; RSquare Adj 0.409438; Root Mean Square Error 8.278692; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Contracep

[Figure: scatter plot of Life_Exp against Contracep with linear fit]

Linear Fit: Life_Exp = 48.252039 + 0.3680572 Contracep

Summary of Fit: RSquare 0.724073; RSquare Adj 0.722222; Root Mean Square Error 5.677773; Mean of Response 65.21192; Observations (or Sum Wgts) 151


Bivariate Fit of Life_Exp By AIDS

[Figure: scatter plot of Life_Exp against AIDS with linear fit]

Linear Fit: Life_Exp = 66.430624 - 0.1237245 AIDS

Summary of Fit: RSquare 0.062401; RSquare Adj 0.056108; Root Mean Square Error 10.46622; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By TB

[Figure: scatter plot of Life_Exp against TB with linear fit]

Linear Fit: Life_Exp = 69.204454 - 0.056188 TB

Summary of Fit: RSquare 0.204279; RSquare Adj 0.198939; Root Mean Square Error 9.641878; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Radios

[Figure: scatter plot of Life_Exp against Radios with linear fit]

Linear Fit: Life_Exp = 57.273914 + 0.0208234 Radios

Summary of Fit: RSquare 0.408865; RSquare Adj 0.404898; Root Mean Square Error 8.31045; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By TVs

[Figure: scatter plot of Life_Exp against TVs with linear fit]

Linear Fit: Life_Exp = 55.723317 + 0.0456474 TVs

Summary of Fit: RSquare 0.594882; RSquare Adj 0.592163; Root Mean Square Error 6.879744; Mean of Response 65.21192; Observations (or Sum Wgts) 151


Bivariate Fit of Life_Exp By News

[Figure: scatter plot of Life_Exp against News with linear fit]

Linear Fit: Life_Exp = 62.075193 + 0.0451547 News

Summary of Fit: RSquare 0.270452; RSquare Adj 0.265556; Root Mean Square Error 9.232265; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Phone

[Figure: scatter plot of Life_Exp against Phone with linear fit]

Linear Fit: Life_Exp = 58.911549 + 0.0408865 Phone

Summary of Fit: RSquare 0.507384; RSquare Adj 0.504078; Root Mean Square Error 7.586395; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Elec_Con

[Figure: scatter plot of Life_Exp against Elec_Con with linear fit]

Linear Fit: Life_Exp = 60.93497 + 0.0014917 Elec_Con

Summary of Fit: RSquare 0.337822; RSquare Adj 0.333378; Root Mean Square Error 8.795666; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Comm_Energy

[Figure: scatter plot of Life_Exp against Comm_Energy with linear fit]

Linear Fit: Life_Exp = 60.376935 + 0.0027738 Comm_Energy

Summary of Fit: RSquare 0.351126; RSquare Adj 0.346772; Root Mean Square Error 8.706856; Mean of Response 65.21192; Observations (or Sum Wgts) 151


Bivariate Fit of Life_Exp By Literacy

[Figure: scatter plot of Life_Exp against Literacy with linear fit]

Linear Fit: Life_Exp = 35.894063 + 0.3776334 Literacy

Summary of Fit: RSquare 0.632879; RSquare Adj 0.630415; Root Mean Square Error 6.549164; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Enrollment

[Figure: scatter plot of Life_Exp against Enrollment with linear fit]

Linear Fit: Life_Exp = 37.677401 + 0.4377922 Enrollment

Summary of Fit: RSquare 0.602228; RSquare Adj 0.599559; Root Mean Square Error 6.817079; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Water

[Figure: scatter plot of Life_Exp against Water with linear fit]

Linear Fit: Life_Exp = 41.68171 + 0.3148881 Water

Summary of Fit: RSquare 0.396135; RSquare Adj 0.392083; Root Mean Square Error 8.399454; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Sanitation

[Figure: scatter plot of Life_Exp against Sanitation with linear fit]

Linear Fit: Life_Exp = 47.661137 + 0.2674938 Sanitation

Summary of Fit: RSquare 0.462043; RSquare Adj 0.458432; Root Mean Square Error 7.927845; Mean of Response 65.21192; Observations (or Sum Wgts) 151


Bivariate Fit of Life_Exp By Forest

[Figure: scatter plot of Life_Exp against Forest with linear fit]

Linear Fit: Life_Exp = 65.593862 - 0.0146547 Forest

Summary of Fit: RSquare 0.000894; RSquare Adj -0.00581; Root Mean Square Error 10.80406; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By Deforest

[Figure: scatter plot of Life_Exp against Deforest with linear fit]

Linear Fit: Life_Exp = 65.793193 + 1.0916929 Deforest

Summary of Fit: RSquare 0.024145; RSquare Adj 0.017595; Root Mean Square Error 10.67761; Mean of Response 65.21192; Observations (or Sum Wgts) 151

Bivariate Fit of Life_Exp By CO2

[Figure: scatter plot of Life_Exp against CO2 with linear fit]

Linear Fit: Life_Exp = 60.587316 + 1.0235894 CO2

Summary of Fit: RSquare 0.305987; RSquare Adj 0.301329; Root Mean Square Error 9.004616; Mean of Response 65.21192; Observations (or Sum Wgts) 151


APPENDIX B

ASIA & PACIFIC ISLANDS

1. Bangladesh 2. Cambodia 3. China 4. Fiji 5. Hong Kong 6. India 7. Indonesia 8. Japan 9. Kazakhstan 10. Korea, Dem 11. Lao, PDR 12. Malaysia 13. Maldives 14. Mongolia 15. Myanmar 16. Nepal 17. Pakistan 18. Papua New Guinea 19. Philippines 20. Russian Federation 21. Samoa (Western) 22. Singapore 23. Sri Lanka 24. Thailand 25. Turkmenistan 26. Uzbekistan 27. Vietnam

AFRICA & MIDDLE EAST

1. Algeria 2. Angola 3. Bahrain 4. Benin 5. Botswana 6. Burkina Faso 7. Burundi 8. Cameroon 9. Central African Republic 10. Chad 11. Comoros 12. Congo 13. Cote d’Ivoire 14. Dem. Rep. of Congo 15. Egypt 16. Eritrea 17. Ethiopia 18. Fiji 19. Gabon 20. Gambia 21. Ghana 22. Guinea 23. Guinea-Bissau 24. Iran 25. Iraq 26. Israel 27. Jordan 28. Kenya 29. Kuwait 30. Lebanon 31. Lesotho 32. Libya 33. Madagascar 34. Malawi 35. Mali 36. Mauritania 37. Mauritius 38. Morocco 39. Mozambique 40. Namibia 41. Niger 42. Nigeria 43. Oman 44. Saudi Arabia 45. Senegal 46. Sierra Leone 47. South Africa 48. Sudan 49. Swaziland 50. Syrian 51. Tanzania 52. Togo 53. Tunisia 54. Uganda 55. United Arab Emirates 56. Yemen 57. Zambia 58. Zimbabwe

EUROPE

1. Albania 2. Armenia 3. Austria 4. Belgium 5. Croatia 6. Czech Republic 7. Denmark 8. Estonia 9. Finland 10. France 11. Georgia 12. Germany 13. Greece 14. Hungary 15. Iceland 16. Ireland 17. Italy 18. Latvia 19. Lithuania 20. Luxembourg 21. Macedonia 22. Malta 23. Netherlands 24. Norway 25. Poland 26. Portugal 27. Romania 28. Slovakia 29. Slovenia 30. Spain 31. Sweden 32. Switzerland 33. Turkey 34. Ukraine 35. United Kingdom

NORTH AMERICA, CENTRAL AMERICA, & THE CARIBBEAN

1. Bahamas 2. Barbados 3. Belize 4. Canada 5. Costa Rica 6. Cuba 7. Dominica 8. Dominican Republic 9. El Salvador 10. Guatemala 11. Haiti 12. Honduras 13. Jamaica 14. Mexico 15. Nicaragua 16. Panama 17. Trinidad and Tobago

SOUTH AMERICA

1. Argentina 2. Bolivia 3. Brazil 4. Chile 5. Colombia 6. Ecuador 7. Guyana 8. Paraguay 9. Peru 10. Suriname 11. Uruguay 12. Venezuela