Business statistics

QUALITY OF LIFE IN G20 COUNTRIES

Matteo Biagini


Index:

Introduction
Correlation Matrix
Regression model
Factor Analysis
Cluster Analysis
Conclusion


INTRODUCTION

The aim of this research is to investigate how the quality of life in G20 countries is related to a set of quality-of-life indicators.

By quality of life we mean the general well-being of individuals and societies. The term is used in a wide range of contexts, including international development, healthcare and politics. Standard indicators of the quality of life include not only wealth and employment, but also the built environment, physical and mental health, education, recreation and leisure time, and social belonging.

From this variety of indicators we have chosen eight.

Life expectancy is a key indicator of the general health of the population. Improvements in overall life expectancy reflect improvements in social and economic conditions, lifestyle, access to health services and medical advances. This indicator uses estimated life expectancy at birth.

CO2 emissions and terrestrial protected areas are indicators of how well the natural environment supports its people, economy and culture. As the population grows and economic activity increases, more demands are placed on the natural environment. Environmental issues in turn affect the economy and public health. For this reason we have also chosen health expenditure per capita, which is closely related to the previous indicators.

Urban population captures how population growth and change in cities affect the relationships people have with others and their sense of belonging to an area.

The concept of community is fundamental to people’s overall quality of life and sense of belonging. We have therefore chosen subsidies and other transfers as an indicator of quality of life, because they are an instrument with which governments redistribute wealth among the people of a country.

Public expenditure on education provides an insight into the knowledge and skills of residents and how they can apply these to improve their quality of life. Educational achievement is essential for effective participation in society.

The last indicator is unemployment: a reduction in unemployment helps stimulate further opportunities for economic growth and development within a community or nation.

The countries considered (G20 countries, which are among the richest in the world) are: Canada, France, Germany, Japan, Italy, Russian Federation, United States, United Kingdom, Brazil, China, India, South Africa, Australia, Saudi Arabia, South Korea, Indonesia, Mexico, Turkey, Spain and the Netherlands.

The data source is the World Bank's World DataBank, in the World Development Indicators (WDI) section. The data refer to the year 2008.

The software used in this project is:

• Gretl (regression)
• R (factor and cluster analysis)
• Microsoft Excel (data matrix preparation, before and after using R)

The variables are labelled X1 to X8 as follows:

• X1 = CO2 emissions (kg per 2000 US$ of GDP)
• X2 = Urban population
• X3 = Health expenditure per capita (current US$)
• X4 = Life expectancy at birth, total (years)
• X5 = Unemployment, total (% of total labor force)
• X6 = Public spending on education, total (% of GDP)
• X7 = Subsidies and other transfers (% of expense)
• X8 = Terrestrial protected areas (% of total land area)


Correlation matrix

          X1        X2        X3        X4        X5        X6        X7        X8
X1    1,0000    0,4108   -0,6168   -0,7387    0,2370   -0,4123   -0,0290   -0,2151
X2              1,0000   -0,2571   -0,2300   -0,2166   -0,5982   -0,1159   -0,0277
X3                        1,0000    0,6361   -0,2003    0,4932    0,3154    0,1806
X4                                  1,0000   -0,6507    0,2132    0,2230    0,2105
X5                                            1,0000    0,0424   -0,0984   -0,1525
X6                                                      1,0000    0,0872    0,2719
X7                                                                1,0000    0,1855
X8                                                                          1,0000

The correlations are not very high overall, but several are large enough to justify a factor analysis. The values above are Pearson correlation coefficients computed in R. We treat a correlation as strong if |corr| > 0.7 and as moderate if 0.3 < |corr| < 0.7. On this basis there is a strong correlation between X1 and X4, and there are moderate correlations between X1 and each of X2, X3 and X6, between X2 and X6, between X3 and each of X4, X6 and X7, and between X4 and X5.
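The classification rule can be checked mechanically. The following is a minimal sketch that applies the two thresholds to the absolute value of each coefficient in the matrix; the function name `strength` and the dictionary layout are illustrative choices and not part of the original analysis, which was done in R.

```python
# Pearson correlations copied from the upper triangle of the correlation matrix
# (decimal commas converted to decimal points).
correlations = {
    ("X1", "X2"): 0.4108, ("X1", "X3"): -0.6168, ("X1", "X4"): -0.7387,
    ("X1", "X5"): 0.2370, ("X1", "X6"): -0.4123, ("X1", "X7"): -0.0290,
    ("X1", "X8"): -0.2151,
    ("X2", "X3"): -0.2571, ("X2", "X4"): -0.2300, ("X2", "X5"): -0.2166,
    ("X2", "X6"): -0.5982, ("X2", "X7"): -0.1159, ("X2", "X8"): -0.0277,
    ("X3", "X4"): 0.6361, ("X3", "X5"): -0.2003, ("X3", "X6"): 0.4932,
    ("X3", "X7"): 0.3154, ("X3", "X8"): 0.1806,
    ("X4", "X5"): -0.6507, ("X4", "X6"): 0.2132, ("X4", "X7"): 0.2230,
    ("X4", "X8"): 0.2105,
    ("X5", "X6"): 0.0424, ("X5", "X7"): -0.0984, ("X5", "X8"): -0.1525,
    ("X6", "X7"): 0.0872, ("X6", "X8"): 0.2719,
    ("X7", "X8"): 0.1855,
}


def strength(r, moderate=0.3, strong=0.7):
    """Label a correlation by its absolute value, using the report's thresholds."""
    size = abs(r)
    if size > strong:
        return "strong"
    if size > moderate:
        return "moderate"
    return "weak"


labels = {pair: strength(r) for pair, r in correlations.items()}
strong_pairs = sorted(pair for pair, label in labels.items() if label == "strong")
moderate_pairs = sorted(pair for pair, label in labels.items() if label == "moderate")

print("strong:", strong_pairs)
print("moderate:", moderate_pairs)
```

Working on absolute values matters here, because the largest coefficient in the matrix (X1 with X4) is negative.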


REGRESSION MODEL

Model 1: OLS, using observations 1-20
Dependent variable: Life expectancy at birth

Variable                                  Coefficient     Std. Error     t-ratio    p-value
Constant                                  88,4781         8,19707        10,7939    <0,00001   ***
CO2 emissions (kg per 2000 US$ of GDP)    -3,18062        1,18728        -2,6789    0,02008    **
Urban population                          -1,19832e-08    8,08775e-09    -1,4817    0,16421
Health expenditure per capita             0,00106495      0,000551237    1,9319     0,07732    *
Unemployment, total                       -0,903724       0,206679       -4,3726    0,00091    ***
Public spending on education              -1,75829        1,13982        -1,5426    0,14888
Subsidies and other transfers             0,0396108       0,0953704      0,4153     0,68523
Terrestrial protected areas               0,026664        0,0893965      0,2983     0,77060

R-squared 0,865092    Adjusted R-squared 0,786395    P-value(F) 0,000221

Using Gretl we ran an ordinary least squares (OLS) regression on our data. The dependent variable is “life expectancy at birth” and the others are independent variables. The R-squared of 0,865 (adjusted 0,786) indicates that the model as a whole fits the data well. The P-value(F) is also very low (0,000221), so the model as a whole is significant at all conventional levels of α.

The independent variables with a significant p-value are CO2 emissions, health expenditure per capita and unemployment. For CO2 emissions the p-value is smaller than 0.05, so we reject the null hypothesis that the coefficient is zero and conclude that this regressor has a significant impact on life expectancy at birth at the 5% level. For health expenditure per capita the p-value is smaller than 0.1, so the regressor is significant at the 10% level. Finally, for total unemployment the p-value is smaller than 0.01, so the regressor is significant at the 1% level.

Holding the other regressors constant, an increase in CO2 emissions of 1 kg per 2000 US$ of GDP is associated with a reduction in life expectancy at birth of 3,18062 years. An increase in health expenditure per capita of 1 current US$ is associated with an increase in life expectancy at birth of 0,00106495 years. Finally, an increase in total unemployment of 1 percentage point is associated with a reduction in life expectancy at birth of 0,903724 years.
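As a cross-check on the Gretl output, the t-ratios and the adjusted R-squared can be recomputed from the reported figures. This is a minimal sketch, assuming 20 observations and 7 regressors plus a constant as in the table; the helper name `adjusted_r_squared` is illustrative and is not a Gretl function.

```python
# Coefficients and standard errors as reported in the regression table
# (decimal commas converted to decimal points).
estimates = {
    "CO2 emissions": (-3.18062, 1.18728),
    "Health expenditure per capita": (0.00106495, 0.000551237),
    "Unemployment, total": (-0.903724, 0.206679),
}

# A t-ratio is the coefficient divided by its standard error.
t_ratios = {name: coef / se for name, (coef, se) in estimates.items()}


def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjust R-squared for the number of regressors (constant excluded)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)


adj_r2 = adjusted_r_squared(0.865092, n_obs=20, n_regressors=7)

for name, t in t_ratios.items():
    print(f"{name}: t = {t:.4f}")
print(f"adjusted R-squared = {adj_r2:.6f}")
```

The recomputed values agree with the table to the printed precision, which confirms that the adjusted figure reported next to R-squared is the adjusted R-squared.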

FACTOR ANALYSIS

To run the factor analysis we applied the “principal component method” in R. This gave the following eigenvalues, proportions of total variance and cumulative proportions of total variance.

Component   Eigenvalue    Proportion of variance (total)   Cumulative proportion of variance (total)
1           3.13602447    0.3920031                        0.3920031
2           1.59218446    0.1990231                        0.5910261
3           1.06125308    0.1326566                        0.7236828
4           0.88797144    0.1109964                        0.8346792
5           0.55766918    0.06970865                       0.90438783
6           0.48900580    0.06112573                       0.96551355
7           0.19844296    0.02480537                       0.99031892
8           0.07744861    0.009681076                      1.000000000


To decide how many factors to retain we applied the Kaiser criterion: we kept the components with eigenvalues greater than 1 and dropped all the others. An eigenvalue can be read as the equivalent number of original variables that the factor represents. The table shows that the first 3 components have eigenvalues above 1 and that a model with 3 factors explains 72.37% of the total original variability.
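The table and the Kaiser rule can be reproduced from the eigenvalues alone. The sketch below is an illustration in Python rather than the R code actually used; it relies only on the fact that the eigenvalues of a correlation matrix of 8 standardized variables sum to 8.

```python
# Eigenvalues as reported in the table above.
eigenvalues = [3.13602447, 1.59218446, 1.06125308, 0.88797144,
               0.55766918, 0.48900580, 0.19844296, 0.07744861]

total = sum(eigenvalues)  # equals the number of standardized variables
proportions = [ev / total for ev in eigenvalues]

# Running sum of the proportions gives the cumulative column.
cumulative = []
running = 0.0
for share in proportions:
    running += share
    cumulative.append(running)

# Kaiser criterion: retain components whose eigenvalue exceeds 1.
retained = [ev for ev in eigenvalues if ev > 1.0]

print(f"sum of eigenvalues = {total:.4f}")
print(f"components retained = {len(retained)}")
print(f"variance explained by retained components = {cumulative[len(retained) - 1]:.4f}")
```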

SCREE PLOT

The scree plot shows the same results from another point of view. It places the components on the x-axis and the corresponding eigenvalues on the y-axis.

The factor loading l_ij is the covariance between the j-th common factor and the i-th original variable. Since the chosen variables are standardized, it coincides with the correlation between the j-th common factor and the i-th original variable. In this case the minimum value is -1 (perfect negative correlation) and the maximum value is 1 (perfect positive correlation).
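The equality between covariance and correlation for standardized variables can be verified numerically. The following is a small sketch on made-up numbers: the two series `x` and `f` are hypothetical and are not taken from the data set.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a series to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def covariance(a, b):
    """Population covariance of two equally long series."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def correlation(a, b):
    """Pearson correlation coefficient."""
    return covariance(a, b) / (pstdev(a) * pstdev(b))


# Hypothetical values, for illustration only.
x = [2.0, 4.0, 6.0, 9.0, 11.0]   # stands in for an original variable
f = [1.0, 3.0, 2.0, 5.0, 6.0]    # stands in for a common factor

cov_standardized = covariance(standardize(x), standardize(f))
corr = correlation(x, f)

print(f"covariance of standardized series = {cov_standardized:.6f}")
print(f"correlation of raw series         = {corr:.6f}")
```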

[Figure: scree plot (.PC). x-axis: components (Comp.1, Comp.3, Comp.5 and Comp.7 labelled); y-axis: variances, from 0.0 to 3.0.]


VARIANCE EXPLAINED BY EACH FACTOR

FACTOR 1    FACTOR 2    FACTOR 3
30.11%      22.34%      8.9%

The proportion of total variability explained by each factor is its SS loading divided by the total variance, which is 8 for eight standardized variables. For the first factor it is 2.409/8 = 30.11%, for the second factor 1.787/8 = 22.34% and for the third factor 0.712/8 = 8.9%. The total variance explained by the model is 61.35%, so the three factors retain a little over 60% of the original variability, which we consider quite good.
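The same arithmetic in code form, as a minimal sketch: the dictionary simply restates the SS loadings quoted in the paragraph, and the divisor 8 is the number of standardized variables.

```python
# SS loadings of the three unrotated factors, as quoted in the text.
ss_loadings = {"Factor 1": 2.409, "Factor 2": 1.787, "Factor 3": 0.712}
n_variables = 8  # each standardized variable contributes a variance of 1

share = {factor: ss / n_variables for factor, ss in ss_loadings.items()}
total_share = sum(share.values())

for factor, value in share.items():
    print(f"{factor}: {value:.2%}")
print(f"Total: {total_share:.2%}")
```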

FACTOR LOADING MATRIX

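                                                    Factor 1   Factor 2   Factor 3
CO2.emissions (X1)                                    -0.596     -0.349     -0.460
Health expenditure per capita (X2)                     0.532      0.430      0.334
Life expectancy at birth (X3)                          0.923                 0.376
Public spending on education (X4)                      0.246      0.955     -0.148
Subsidies and other transfers of expense (X5)          0.188                 0.122
Terrestrial protected areas (X6)                       0.237      0.216
Unemployment (X7)                                     -0.869      0.325      0.365
Urban population (X8)                                 -0.106     -0.640     -0.274

SS loadings                                            2.409      1.787      0.712
Proportion Var                                         0.301      0.223      0.089
Cumulative Var                                         0.301      0.525      0.614

Blank cells are small loadings that R does not print; the placement of the blanks is the one consistent with the SS loadings of each column. In this and the following factor analysis tables the variables are numbered in alphabetical order, so the X labels differ from those used in the introduction.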


FINAL ESTIMATION OF THE COMMUNALITIES

                                                   Communality   Specific variance
CO2.emissions (X1)                                 0,689         0,311
Health expenditure per capita (X2)                 0,58          0,42
Life expectancy at birth (X3)                      0,995         0,005
Public spending on education (X4)                  0,995         0,005
Subsidies and other transfers of expense (X5)      0,054         0,946
Terrestrial protected areas (X6)                   0,105         0,895
Unemployment (X7)                                  0,995         0,005
Urban population (X8)                              0,496         0,504

Total                                              4,909

The final estimates of the communalities show that 5 variables are well explained by the factor model, because their communality is above 50% (X1, X2, X3, X4 and X7). The other 3 variables (X5, X6 and X8) are not explained very well. Variables with a high communality share more in common with the rest of the variables. The specific variance of an observed variable, by contrast, is the portion of its variance that cannot be explained by the common factors, and it equals 1 minus the communality.

We therefore decided not to consider X5 and X6 when naming the factors. Since X8 has a communality very close to 50%, we do consider this variable.
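A communality is the sum of the squared loadings of a variable across the factors. The sketch below recomputes it for the five variables whose three unrotated loadings are all printed in the factor loading matrix; the remaining rows are left out because one of their loadings is not displayed. Small differences from the reported values come from rounding of the loadings.

```python
# Unrotated loadings (Factor 1, Factor 2, Factor 3) for the rows that are
# fully printed in the factor loading matrix.
loadings = {
    "CO2.emissions (X1)": (-0.596, -0.349, -0.460),
    "Health expenditure per capita (X2)": (0.532, 0.430, 0.334),
    "Public spending on education (X4)": (0.246, 0.955, -0.148),
    "Unemployment (X7)": (-0.869, 0.325, 0.365),
    "Urban population (X8)": (-0.106, -0.640, -0.274),
}


def communality(row):
    """Sum of squared loadings across the factors."""
    return sum(loading ** 2 for loading in row)


results = {}
for name, row in loadings.items():
    h2 = communality(row)
    results[name] = (h2, 1.0 - h2)  # (communality, specific variance)
    print(f"{name}: communality = {h2:.3f}, specific variance = {1.0 - h2:.3f}")
```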


We can now improve the interpretation of the factors by applying a rotation to the factor loading matrix.

ROTATED VARIANCE EXPLAINED BY EACH FACTOR (Total=61.36%)

FACTOR 1    FACTOR 2    FACTOR 3
26.02%      19.9%       15.44%

ROTATED FACTOR LOADING MATRIX (varimax)

                                                    Factor 1   Factor 2   Factor 3
CO2.emissions (X1)                                    -0.772     -0.301
Health expenditure per capita (X2)                     0.645      0.402
Life expectancy at birth (X3)                          0.890      0.101     -0.439
Public spending on education (X4)                      0.154      0.984
Subsidies and other transfers of expense (X5)          0.221
Terrestrial protected areas (X6)                       0.143      0.260     -0.129
Unemployment (X7)                                     -0.260                 0.962
Urban population (X8)                                 -0.343     -0.537     -0.300

SS loadings                                            2.082      1.592      1.235
Proportion Var                                         0.260      0.199      0.154
Cumulative Var                                         0.260      0.459      0.614

Blank cells are small loadings that R does not print; the placement of the blanks is the one consistent with the SS loadings of each column.


After the rotation the variance explained by each factor is clearly more evenly distributed; most notably, factor 3 rises from 8.9% to 15.44%.
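An orthogonal rotation such as varimax only redistributes the explained variance among the factors; the total and the communality of each variable stay the same. The sketch below checks this on the reported figures, using the SS loadings of both solutions and the Urban population row, which is printed in full in both loading matrices. Agreement is up to the rounding of the printed values.

```python
# SS loadings before and after the varimax rotation, as reported.
unrotated_ss = [2.409, 1.787, 0.712]
rotated_ss = [2.082, 1.592, 1.235]
n_variables = 8

# Loadings of Urban population (X8) on the three factors, before and after.
urban_unrotated = (-0.106, -0.640, -0.274)
urban_rotated = (-0.343, -0.537, -0.300)


def communality(row):
    """Sum of squared loadings across the factors."""
    return sum(loading ** 2 for loading in row)


total_before = sum(unrotated_ss) / n_variables
total_after = sum(rotated_ss) / n_variables
factor3_before = unrotated_ss[2] / n_variables
factor3_after = rotated_ss[2] / n_variables

print(f"total explained: {total_before:.4f} before, {total_after:.4f} after")
print(f"factor 3 share:  {factor3_before:.4f} before, {factor3_after:.4f} after")
print(f"X8 communality:  {communality(urban_unrotated):.3f} before, "
      f"{communality(urban_rotated):.3f} after")
```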

Next we assign a label to each factor on the basis of its most significant variables. In principal components, the first factor describes most of the variability. After choosing the number of factors to retain, we want to spread the variability among the factors to improve the interpretation, so we use the rotated factors, whose meanings are more clearly distinguished.

In naming the latent variables we gave more weight to the original variables with communality > 50%. The first factor is mainly explained by CO2 emissions, health expenditure per capita and life expectancy at birth and, more weakly (loading -0.260), by unemployment. We have not considered subsidies and other transfers of expense or terrestrial protected areas because their communality is below 50%. The second factor is mainly explained by public spending on education and urban population, but only the first of these has a communality > 50%. The third factor is explained by unemployment.

NEW LATENT VARIABLES                           ORIGINAL VARIABLES

FACTOR 1: WELFARE AND WELL-BEING               CO2.emissions (X1)
                                               Health expenditure per capita (X2)
                                               Life expectancy at birth (X3)

FACTOR 2: PUBLIC INTERVENTION ON POPULATION    Public spending on education (X4)
                                               Terrestrial protected areas (X6)
                                               Urban population (X8)

FACTOR 3: UNEMPLOYMENT                         Unemployment (X7)


CLUSTER ANALYSIS

We now analyze how the countries can be clustered, using the observations of the original variables, in order to obtain a few homogeneous groups. We compared two methods of clustering:

1. hierarchical method, using Euclidean distance and the Ward method;
2. hierarchical method, using Euclidean distance and the average linkage method.

This is the legend of countries:

1. Canada
2. France
3. Germany
4. Japan
5. Italy
6. Russian Federation
7. United States
8. United Kingdom
9. Brazil
10. China
11. India
12. South Africa
13. Australia
14. Saudi Arabia
15. Korea, Rep.
16. Indonesia
17. Mexico
18. Turkey
19. Spain
20. Netherlands


Using R we chose the number of clusters on the basis of the within-cluster sum of squares. The resulting graph suggests a solution with four clusters.
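The within-cluster sum of squares is the sum of squared Euclidean distances between each observation and the centroid of its cluster. The following is a minimal sketch on four made-up points; the data and the function name `within_sum_of_squares` are hypothetical and do not reproduce the R computation on the country data.

```python
def within_sum_of_squares(points, labels):
    """Sum of squared distances from each point to the centroid of its cluster."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        # Centroid: coordinate-wise mean of the cluster members.
        centroid = [sum(column) / len(members) for column in zip(*members)]
        for point in members:
            total += sum((value - center) ** 2
                         for value, center in zip(point, centroid))
    return total


# Hypothetical two-dimensional observations forming two tight pairs.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

wss_one_cluster = within_sum_of_squares(points, [0, 0, 0, 0])
wss_two_clusters = within_sum_of_squares(points, [0, 0, 1, 1])

print("one cluster: ", wss_one_cluster)
print("two clusters:", wss_two_clusters)
```

Plotting this quantity against the number of clusters gives the kind of graph described above: the curve drops sharply while added clusters still separate distinct groups and flattens afterwards.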


In this cluster analysis we used the Ward method with the Euclidean distance. The Ward method is a hierarchical method based on the ANOVA (analysis of variance) approach: at each step it merges the two clusters whose union gives the smallest increase in the within-cluster sum of squares.
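The increase in within-cluster sum of squares caused by merging two clusters depends only on their sizes and on the distance between their centroids. The sketch below illustrates this merge cost on made-up points; the data and the function name `ward_cost` are hypothetical and this is not the R implementation.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(column) / len(points) for column in zip(*points)]


def ward_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if the two clusters are merged."""
    center_a, center_b = centroid(cluster_a), centroid(cluster_b)
    squared_distance = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    n_a, n_b = len(cluster_a), len(cluster_b)
    return n_a * n_b / (n_a + n_b) * squared_distance


# Hypothetical clusters: a and c are close together, b is far away.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(10.0, 0.0), (10.0, 2.0)]
c = [(1.0, 1.0)]

cost_ab = ward_cost(a, b)
cost_ac = ward_cost(a, c)

print("merge a with b:", cost_ab)
print("merge a with c:", cost_ac)
```

The Ward method would merge a with c first, because that merge adds far less to the within-cluster sum of squares.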

The graph suggests that we can use 3 clusters, treating China as an isolated country because it has very little in common with the other clusters.

Cluster 1: USA, India. (7-11)

Cluster 2: Brazil, Mexico, Russia, Japan, Indonesia. (9-17-6-4-16)

Cluster 3: Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain, Netherlands. (1-12-20-13-14-19-5-15-8-18-2-3)


These are the means for each variable:

Variable                                                   Cluster 1       Cluster 2       Cluster 3
X1 = CO2 emissions (kg per 2000 US$ of GDP)                7.584599e-01    1.401193e+00    1.765399e+00
X2 = Urban population                                      3.652584e+07    1.153590e+08    4.082957e+08
X3 = Health expenditure per capita (current US$)           3.652584e+07    1.036108e+03    2.639784e+03
X4 = Life expectancy at birth, total (years)               7.691514e+01    7.343499e+01    7.173244e+01
X5 = Unemployment, total (% of total labor force)          7.783333e+00    5.860000e+00    4.833333e+00
X6 = Public spending on education, total (% of GDP)        4.815916e+00    4.147186e+00    3.538987e+00
X7 = Subsidies and other transfers (% of expense)          6.459847e+01    6.140823e+01    6.176835e+01
X8 = Terrestrial protected areas (% of total land area)    1.513201e+01    1.538366e+01    1.134538e+01

Cluster 1 has the highest mean on the largest number of variables. It is composed only of the USA and India, and shows the highest values for health expenditure, life expectancy, unemployment, public spending on education and subsidies.

Cluster 2 is the one with the largest share of terrestrial protected areas.

Finally, cluster 3 has the highest CO2 emissions and urban population; it is also the cluster with the largest number of countries.
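The reading of the table above amounts to finding, for each variable, the cluster with the largest mean. This is a minimal sketch that does so on the values exactly as printed in the table; the variable names `cluster_means` and `top_cluster` are illustrative.

```python
# Cluster means as printed in the table (Cluster 1, Cluster 2, Cluster 3).
cluster_means = {
    "X1": (7.584599e-01, 1.401193e+00, 1.765399e+00),
    "X2": (3.652584e+07, 1.153590e+08, 4.082957e+08),
    "X3": (3.652584e+07, 1.036108e+03, 2.639784e+03),
    "X4": (7.691514e+01, 7.343499e+01, 7.173244e+01),
    "X5": (7.783333e+00, 5.860000e+00, 4.833333e+00),
    "X6": (4.815916e+00, 4.147186e+00, 3.538987e+00),
    "X7": (6.459847e+01, 6.140823e+01, 6.176835e+01),
    "X8": (1.513201e+01, 1.538366e+01, 1.134538e+01),
}

# For each variable, the 1-based number of the cluster with the highest mean.
top_cluster = {
    variable: means.index(max(means)) + 1
    for variable, means in cluster_means.items()
}

for variable, cluster in top_cluster.items():
    print(f"{variable}: highest mean in cluster {cluster}")
```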

[Figure: Cluster Dendrogram for Solution HClust.10. Method = average; Distance = Euclidean. x-axis: observation number in data set; y-axis: height, from 0e+00 to 5e+08. Leaf order: 10; 1 12 20 13 14 19 5 15 3 2 8 18; 9; 17 6 4 16; 7 11.]

This cluster analysis, with the average linkage method and Euclidean distance, gives a worse result than the previous analysis. Observation 10 (China) is now an outlier, and observations 7 and 11 (U.S. and India) are very different from the other two clusters.


Excluding observations 7, 9, 10 and 11 (U.S., Brazil, China and India), we obtain a better cluster analysis without outliers. We now have two clusters. The first is composed of Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain and the Netherlands (1-12-20-13-14-19-5-15-8-18-2-3). The second is composed of Mexico, Russia, Japan and Indonesia (17-6-4-16).


CONCLUSION

The initial aim of this research was to find possible relationships among the countries belonging to the G20. The results of the factor and cluster analyses are quite interesting. The factor analysis suggests 3 new latent variables that summarize the original ones, so we passed from 8 original variables to 3. The result is quite satisfactory and gives three groups: “welfare and well-being”, “public intervention” and “unemployment”.

The cluster analysis also produced a satisfactory result, and we can find some common characteristics within the clusters. Cluster 2 (Brazil, Mexico, Russia, Japan, Indonesia) is made up of countries with a large population and, apart from Japan, they are all developing countries. Cluster 3 (Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain, Netherlands) contains all the European countries, which suggests that it is the cluster with the highest welfare and the greatest equality among people. It also has the highest urban population and the highest CO2 emissions.

Cluster 1 is more difficult to discuss because it is formed by 2 very different countries. The U.S. is rich and developed, whereas India has a largely poor population and is a developing country. One possible common point is public spending on education, since both India and the U.S. have a good education system.
