QUALITY OF LIFE IN G 20 COUNTRIES
Matteo Biagini
Index:
Introduction
Correlation Matrix
Regression model
Factor Analysis
Cluster Analysis
Conclusion
INTRODUCTION
The aim of this research is to investigate how the quality of life in G20 countries relates to a set of life-quality indicators.
By quality of life we mean the general well-being of individuals and societies. The term is used in a wide range of contexts, including international development, healthcare, and politics. Standard indicators of the quality of life include not only wealth and employment, but also the built environment, physical and mental health, education, recreation and leisure time, and social belonging.
From this variety of indicators we have chosen eight.
Life expectancy is a key indicator of the general health of the population. Improvements in overall
life expectancy reflect improvements in social and economic conditions, lifestyle, access to health
services and medical advances. This indicator uses estimated life expectancy at birth.
CO2 emissions and terrestrial protected areas are indicators of how the natural environment supports a country's people, economy and culture. As the population grows and economic activity increases, more demands are placed on the natural environment. Environmental issues in turn affect the economy and public health. For this reason another indicator we have chosen is health expenditure per capita, which is closely related to the previous indicators.
Urban population captures how population growth and change in cities affect the relationships people have with others and their sense of belonging to an area.
The concept of community is fundamental to people’s overall quality of life and sense of belonging. For this reason we have chosen subsidies and other transfers as an indicator of quality of life: they are an instrument with which the government redistributes wealth among the people of a country.
Public expenditure on education provides an insight into the knowledge and skills of residents and
how they can apply these to improve their quality of life. Educational achievement is essential for
effective participation in society.
The last indicator is unemployment: a reduction of this indicator helps stimulate further
opportunities for economic growth and development within a community or nation.
The countries considered (G20 countries, which are among the largest economies in the world) are: Canada, France, Germany, Japan, Italy, Russian Federation, United States, United Kingdom, Brazil, China, India, South Africa, Australia, Saudi Arabia, South Korea, Indonesia, Mexico, Turkey, Spain, Netherlands.
The source of the data is the World Bank DataBank, World Development Indicators (WDI) section.
The data refer to the year 2008.
The software used in this project is:
• Gretl (regression)
• R (factor and cluster analysis)
• Microsoft Excel (data matrix preparation, before and after using R)
We have numbered the variables from X1 to X8:
• X1 = CO2 emissions (kg per 2000 US$ of GDP)
• X2 = Urban population
• X3 = Health expenditure per capita (current US$)
• X4 = Life expectancy at birth, total (years)
• X5 = Unemployment, total (% of total labor force)
• X6 = Public spending on education, total (% of GDP)
• X7 = Subsidies and other transfers (% of expense)
• X8 = Terrestrial protected areas (% of total land area)
Correlation matrix
          X1        X2        X3        X4        X5        X6        X7        X8
X1    1.0000    0.4108   -0.6168   -0.7387    0.2370   -0.4123   -0.0290   -0.2151
X2              1.0000   -0.2571   -0.2300   -0.2166   -0.5982   -0.1159   -0.0277
X3                        1.0000    0.6361   -0.2003    0.4932    0.3154    0.1806
X4                                  1.0000   -0.6507    0.2132    0.2230    0.2105
X5                                            1.0000    0.0424   -0.0984   -0.1525
X6                                                      1.0000    0.0872    0.2719
X7                                                                1.0000    0.1855
X8                                                                          1.0000
The values above are Pearson correlation coefficients, computed with R. None of the correlations is very high, but several are large enough to justify a factor analysis. We consider a correlation strong if |corr| > 0.7 and moderate if 0.3 < |corr| < 0.7. On this basis there is a strong correlation between X1 and X4, and there are moderate correlations between X1 and each of X2, X3 and X6, between X2 and X6, between X3 and each of X4, X6 and X7, and between X4 and X5.
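The classification rule can be sketched in a few lines. This is only an illustration in Python (the original analysis used R), and it assumes the thresholds are applied to the absolute value of each coefficient; the names `UPPER`, `classify` and `classify_pairs` are hypothetical.

```python
# Upper triangle of the correlation matrix reported above (rows X1..X8,
# each row starting at its diagonal element).
LABELS = ["X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"]
UPPER = [
    [1.0000, 0.4108, -0.6168, -0.7387, 0.2370, -0.4123, -0.0290, -0.2151],
    [1.0000, -0.2571, -0.2300, -0.2166, -0.5982, -0.1159, -0.0277],
    [1.0000, 0.6361, -0.2003, 0.4932, 0.3154, 0.1806],
    [1.0000, -0.6507, 0.2132, 0.2230, 0.2105],
    [1.0000, 0.0424, -0.0984, -0.1525],
    [1.0000, 0.0872, 0.2719],
    [1.0000, 0.1855],
    [1.0000],
]


def classify(r, strong=0.7, moderate=0.3):
    """Label a correlation by its absolute size (assumed convention)."""
    size = abs(r)
    if size > strong:
        return "strong"
    if size > moderate:
        return "moderate"
    return "weak"


def classify_pairs(upper, labels):
    """Return {(row_label, col_label): class} for every off-diagonal pair."""
    result = {}
    for i, row in enumerate(upper):
        # row[0] is the diagonal; the remaining entries are columns i+1, i+2, ...
        for offset, r in enumerate(row[1:], start=1):
            result[(labels[i], labels[i + offset])] = classify(r)
    return result


if __name__ == "__main__":
    for pair, label in classify_pairs(UPPER, LABELS).items():
        if label != "weak":
            print(pair, label)
```

Listing only the non-weak pairs makes it easy to compare the rule against the pairs named in the text.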
REGRESSION MODEL
Model 1: OLS, using observations 1-20
Dependent variable: Life expectancy at birth

Variable                                   Coefficient     Std. Error      t-ratio    p-value
Constant                                   88.4781         8.19707         10.7939    <0.00001   ***
CO2 emissions (kg per 2000 US$ of GDP)     -3.18062        1.18728         -2.6789    0.02008    **
Urban population                           -1.19832e-08    8.08775e-09     -1.4817    0.16421
Health expenditure per capita              0.00106495      0.000551237     1.9319     0.07732    *
Unemployment, total                        -0.903724       0.206679        -4.3726    0.00091    ***
Public spending on education               -1.75829        1.13982         -1.5426    0.14888
Subsidies and other transfers              0.0396108       0.0953704       0.4153     0.68523
Terrestrial protected areas                0.026664        0.0893965       0.2983     0.77060

R-squared 0.865092    Adjusted R-squared 0.786395    P-value(F) 0.000221
With the software Gretl we ran a regression on our data using the OLS method. The dependent variable is “life expectancy at birth” and the others are independent variables. The R-squared of 0.865 indicates that the model as a whole fits the data well. The P-value(F) is also very low, which means that the model as a whole is significant at any conventional level of α.
The independent variables with a significant p-value are CO2 emissions, health expenditure per capita and unemployment. Since the p-value of CO2 emissions is smaller than 0.05, we reject the null hypothesis and conclude that this regressor has a significant impact on life expectancy at birth at the 5% level. Since the p-value of health expenditure per capita is smaller than 0.1, we reject the null hypothesis and conclude that it has a significant impact at the 10% level. Finally, since the p-value of unemployment is smaller than 0.01, we reject the null hypothesis and conclude that it has a significant impact at the 1% level.
Holding the other regressors constant, if CO2 emissions increase by 1 kg per 2000 US$ of GDP, life expectancy at birth is expected to fall by 3.18062 years. If health expenditure per capita increases by 1 current US$, life expectancy at birth is expected to rise by 0.00106495 years. Finally, if total unemployment increases by 1 percentage point, life expectancy at birth is expected to fall by 0.903724 years.
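The figures in the regression output can be cross-checked by hand. The sketch below, written in Python rather than Gretl, recomputes the t-ratios as coefficient divided by standard error, the adjusted R-squared from R-squared with 20 observations and 7 regressors, and the significance stars using the usual 1%, 5% and 10% cut-offs. The function and dictionary names are hypothetical, and the star cut-offs are an assumption about the output convention.

```python
# (coefficient, standard error) pairs copied from the regression output.
ESTIMATES = {
    "const": (88.4781, 8.19707),
    "co2_emissions": (-3.18062, 1.18728),
    "urban_population": (-1.19832e-08, 8.08775e-09),
    "health_expenditure": (0.00106495, 0.000551237),
    "unemployment": (-0.903724, 0.206679),
    "education_spending": (-1.75829, 1.13982),
    "subsidies": (0.0396108, 0.0953704),
    "protected_areas": (0.026664, 0.0893965),
}


def t_ratio(coefficient, std_error):
    """t-ratio for the null hypothesis that the coefficient is zero."""
    return coefficient / std_error


def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjust R-squared for the number of regressors (constant excluded)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)


def significance_stars(p_value):
    """Assumed convention: *** below 0.01, ** below 0.05, * below 0.10."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


if __name__ == "__main__":
    for name, (coef, se) in ESTIMATES.items():
        print(f"{name:20s} t = {t_ratio(coef, se):8.4f}")
    print("adjusted R-squared:", adjusted_r_squared(0.865092, 20, 7))
```

The gap between R-squared and adjusted R-squared reflects the cost of estimating seven slopes from only twenty observations.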
FACTOR ANALYSIS
To run the factor analysis we applied the principal component method in R. This gave the following eigenvalues, proportions of total variance and cumulative proportions of total variance.
Eigenvalue     Proportion of variance (total)    Cumulative proportion of variance (total)
3.13602447     0.3920031                         0.3920031
1.59218446     0.1990231                         0.5910261
1.06125308     0.1326566                         0.7236828
0.88797144     0.1109964                         0.8346792
0.55766918     0.06970865                        0.90438783
0.48900580     0.06112573                        0.96551355
0.19844296     0.02480537                        0.99031892
0.07744861     0.009681076                       1.000000000
To select how many factors to retain we applied the Kaiser criterion: we kept the components with eigenvalues greater than 1 and dropped the others. An eigenvalue is approximately the number of original variables that the factor represents. Looking at the table, the three components with eigenvalues above 1 explain 72.37% of the total original variability.
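As a quick check of the table and of the selection rule, the proportions can be recomputed directly from the eigenvalues. The following Python sketch is an illustration only (the original computation was done in R); `variance_shares` and `kaiser_count` are hypothetical helper names.

```python
# Eigenvalues of the correlation matrix, as reported in the table above.
EIGENVALUES = [
    3.13602447, 1.59218446, 1.06125308, 0.88797144,
    0.55766918, 0.48900580, 0.19844296, 0.07744861,
]


def variance_shares(eigenvalues):
    """Return (proportions, cumulative proportions) of total variance."""
    total = sum(eigenvalues)  # equals the number of standardized variables
    shares = [value / total for value in eigenvalues]
    cumulative = []
    running = 0.0
    for share in shares:
        running += share
        cumulative.append(running)
    return shares, cumulative


def kaiser_count(eigenvalues, threshold=1.0):
    """Number of components kept by the Kaiser criterion."""
    return sum(1 for value in eigenvalues if value > threshold)


if __name__ == "__main__":
    shares, cumulative = variance_shares(EIGENVALUES)
    kept = kaiser_count(EIGENVALUES)
    print("components kept:", kept)
    print("variance explained by kept components:", cumulative[kept - 1])
```

Because the variables are standardized, the eigenvalues sum to 8, so each proportion is simply the eigenvalue divided by 8.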
SCREE PLOT
The scree plot shows the same results from another point of view. It places the components on the x-axis and the corresponding eigenvalues on the y-axis.
The factor loading l_ij is the covariance between the j-th common factor and the i-th original variable. Because the chosen variables are standardized, it coincides with the correlation between the j-th common factor and the i-th original variable. In this case its minimum value is -1 (perfect negative correlation) and its maximum value is 1 (perfect positive correlation).
[Figure: scree plot of the principal components. x-axis: components (tick labels Comp.1, Comp.3, Comp.5, Comp.7); y-axis: variances (0.0 to 3.0).]
VARIANCE EXPLAINED BY EACH FACTOR
FACTOR 1 FACTOR 2 FACTOR 3
30.11% 22.34% 8.9%
The proportion of total variability explained by each factor is its SS loading divided by the total variance, which equals 8 because the eight variables are standardized. For the first factor this is 2.409/8 = 30.11%, for the second 1.787/8 = 22.34%, and for the third 0.712/8 = 8.9%. The total variance explained by the model is 61.35%, which indicates that the model is quite good.
FACTOR LOADING MATRIX
                                                   Factor 1   Factor 2   Factor 3
CO2.emissions (X1)                                   -0.596     -0.349     -0.460
Health expenditure per capita (X2)                    0.532      0.430      0.334
Life expectancy at birth (X3)                         0.923                 0.376
Public spending.on education (X4)                     0.246      0.955     -0.148
Subsidies and other transfers of expense (X5)         0.188                 0.122
Terrestrial protected areas (X6)                      0.237      0.216
Unemployment (X7)                                    -0.869      0.325      0.365
Urban population (X8)                                -0.106     -0.640     -0.274

SS loadings                                           2.409      1.787      0.712
Proportion Var                                        0.301      0.223      0.089
Cumulative Var                                        0.301      0.525      0.614
FINAL ESTIMATION OF THE COMMUNALITIES
                                                   Communality   Specific variance
CO2.emissions (X1)                                       0.689               0.311
Health expenditure per capita (X2)                       0.580               0.420
Life expectancy at birth (X3)                            0.995               0.005
Public spending on education (X4)                        0.995               0.005
Subsidies and other transfers of expense (X5)            0.054               0.946
Terrestrial protected areas (X6)                         0.105               0.895
Unemployment (X7)                                        0.995               0.005
Urban population (X8)                                    0.496               0.504
Total                                                    4.909
From the final estimation of the communalities we can see that five variables are well explained by the model, because their communalities are higher than 50% (X1, X2, X3, X4 and X7, numbered as in the factor tables). Three variables are not explained very well (X5, X6 and X8). Variables with a high communality share more in common with the rest of the variables, while the specific variance of an observed variable is the portion of its variance that is not explained by the common factors.
So we decided that, when naming the factors, we will not consider X5 and X6. But since X8 has a communality very close to 50%, we can still consider this variable.
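Since the variables are standardized, each communality is one minus the corresponding specific variance, and the communalities should add up to roughly the sum of the SS loadings (2.409 + 1.787 + 0.712 = 4.908). The Python sketch below is an illustration rather than the original R computation; it takes the specific variances from the table as given, and the helper names are hypothetical.

```python
# Specific variances (uniquenesses) from the communalities table.
SPECIFIC_VARIANCE = {
    "CO2 emissions": 0.311,
    "Health expenditure per capita": 0.420,
    "Life expectancy at birth": 0.005,
    "Public spending on education": 0.005,
    "Subsidies and other transfers": 0.946,
    "Terrestrial protected areas": 0.895,
    "Unemployment": 0.005,
    "Urban population": 0.504,
}


def communalities(specific_variance):
    """Communality = 1 - specific variance for standardized variables."""
    return {name: round(1.0 - psi, 3) for name, psi in specific_variance.items()}


def well_explained(communality, threshold=0.5):
    """Variables whose communality exceeds the chosen threshold."""
    return [name for name, h2 in communality.items() if h2 > threshold]


if __name__ == "__main__":
    h2 = communalities(SPECIFIC_VARIANCE)
    print("well explained:", well_explained(h2))
    print("sum of communalities:", round(sum(h2.values()), 3))
    print("share of total variance:", round(sum(h2.values()) / len(h2), 4))
```

The 0.5 threshold is the one used in the text; urban population falls just below it, which is why it is treated as a borderline case.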
Now we can improve the interpretation of the factors by applying a rotation to the factor loading matrix.
ROTATED VARIANCE EXPLAINED BY EACH FACTOR (Total=61.36%)
FACTOR 1 FACTOR 2 FACTOR 3
26.02% 19.9% 15.44%
ROTATED FACTOR LOADING MATRIX ( varimax)
                                                   Factor 1   Factor 2   Factor 3
CO2.emissions (X1)                                   -0.772     -0.301
Health expenditure per capita (X2)                    0.645      0.402
Life expectancy at birth (X3)                         0.890      0.101     -0.439
Public spending.on education (X4)                     0.154      0.984
Subsidies and other transfers of expense (X5)         0.221
Terrestrial protected areas (X6)                      0.143      0.260     -0.129
Unemployment (X7)                                    -0.260                 0.962
Urban population (X8)                                -0.343     -0.537     -0.300

SS loadings                                           2.082      1.592      1.235
Proportion Var                                        0.260      0.199      0.154
Cumulative Var                                        0.260      0.459      0.614
With the rotation the variance explained is now more evenly distributed across the factors; most notably, factor 3 goes from 8.9% to 15.44%.
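A rotation redistributes the explained variance among the factors without changing its total. The short Python sketch below illustrates this with the SS loadings reported before and after the varimax rotation; it is not the original R code, and `shares` and `spread` are hypothetical helper names.

```python
N_VARIABLES = 8  # standardized variables, so total variance is 8

# SS loadings reported before and after the varimax rotation.
UNROTATED_SS = [2.409, 1.787, 0.712]
ROTATED_SS = [2.082, 1.592, 1.235]


def shares(ss_loadings, n_variables):
    """Proportion of total variance explained by each factor."""
    return [ss / n_variables for ss in ss_loadings]


def spread(values):
    """Gap between the largest and the smallest value."""
    return max(values) - min(values)


if __name__ == "__main__":
    before = shares(UNROTATED_SS, N_VARIABLES)
    after = shares(ROTATED_SS, N_VARIABLES)
    print("total before:", round(sum(before), 4), "total after:", round(sum(after), 4))
    print("spread before:", round(spread(before), 4), "spread after:", round(spread(after), 4))
```

The two totals differ only by rounding in the reported loadings, while the gap between the strongest and weakest factor roughly halves.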
Next we want to assign a label to each factor based on its most significant variables. In naming the latent variables we gave more weight to the original variables with communality > 50%. The first factor is mainly explained by CO2 emissions, health expenditure per capita, life expectancy at birth and unemployment. We have not considered subsidies and other transfers or terrestrial protected areas because they have communality < 50%. The second factor is mainly explained by public spending on education and urban population, but only the first has a communality > 50%. The third factor is explained by unemployment.
In principal components, the first factor describes most of the variability. After choosing the number of factors to retain, we want to spread the variability among the factors to improve the interpretation. So we consider the rotated factors, whose meanings are more clearly distinguished.
NEW LATENT VARIABLES                             ORIGINAL VARIABLES
FACTOR 1: WELFARE AND WELL-BEING                 CO2.emissions (X1)
                                                 Health expenditure per capita (X2)
                                                 Life expectancy at birth (X3)
                                                 Subsidies and other transfers of expense (X5)
FACTOR 2: PUBLIC INTERVENTION ON POPULATION      Public spending on education (X4)
                                                 Terrestrial protected areas (X6)
                                                 Urban population (X8)
FACTOR 3: UNEMPLOYMENT                           Unemployment (X7)
CLUSTER ANALYSIS
Now we want to see how the countries can be clustered, using the observations of the original variables, in order to obtain a few homogeneous groups. We compared two methods of clustering:
1. hierarchical method, using Euclidean distance and the Ward method;
2. hierarchical method, using Euclidean distance and the complete linkage method.
This is the legend of countries:
1. Canada
2. France
3. Germany
4. Japan
5. Italy
6. Russian Federation
7. United States
8. United Kingdom
9. Brazil
10. China
11. India
12. South Africa
13. Australia
14. Saudi Arabia
15. Korea, Rep.
16. Indonesia
17. Mexico
18. Turkey
19. Spain
20. Netherlands
With R we ran an analysis to choose the number of clusters, based on the within-cluster sum of squares. From this graph we see that four clusters would be a suitable choice.
In this cluster analysis we used the Ward method with the Euclidean distance. The Ward method is a hierarchical agglomerative method based on the ANOVA (analysis of variance) approach: at each step it merges the two clusters whose union gives the smallest increase in the within-cluster sum of squares.
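To make the procedure concrete, here is a minimal pure-Python sketch of agglomerative clustering with Ward's criterion, together with the within-cluster sum of squares used to choose the number of clusters. It is not the R code used for the analysis. The data in `TOY` are made up for illustration, and the z-score standardization step is an assumption: the report does not state whether the variables were standardized before computing distances, and without it a large-scale variable such as urban population would dominate the Euclidean distance.

```python
from itertools import combinations

# Hypothetical data: 6 observations, a large-scale and a small-scale variable.
TOY = [
    [1.0e6, 70.0], [1.2e6, 71.0],
    [5.0e7, 80.0], [5.1e7, 81.0],
    [9.0e8, 65.0], [9.2e8, 66.0],
]


def standardize(rows):
    """Z-score each column (sample standard deviation)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / (n - 1)) ** 0.5 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_ss(points, clusters):
    """Total within-cluster sum of squares for a partition of row indices."""
    total = 0.0
    for members in clusters:
        pts = [points[i] for i in members]
        c = centroid(pts)
        total += sum(sq_dist(p, c) for p in pts)
    return total


def ward_merge_cost(points, a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca = centroid([points[i] for i in a])
    cb = centroid([points[i] for i in b])
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist(ca, cb)


def ward_clusters(points, k):
    """Merge the cheapest pair repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: ward_merge_cost(points, *pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


def wss_curve(points, max_k):
    """Within-cluster sum of squares for k = 1..max_k (the 'elbow' curve)."""
    return {k: within_ss(points, ward_clusters(points, k)) for k in range(1, max_k + 1)}


if __name__ == "__main__":
    z = standardize(TOY)
    print(ward_clusters(z, 3))
    print(wss_curve(z, 6))
```

Plotting the values returned by `wss_curve` against k gives the kind of graph used above to pick the number of clusters: the curve drops sharply until the natural number of groups is reached and flattens afterwards.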
The graph suggests that we can use 3 clusters, treating China as an isolated country because it has very little in common with the other clusters.
Cluster 1: USA, India. (7-11)
Cluster 2: Brazil, Mexico, Russia, Japan, Indonesia. (9-17-6-4-16)
Cluster 3: Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain, Netherlands. (1-12-20-13-14-19-5-15-8-18-2-3)
These are the means for each variable:
                                                          Cluster 1       Cluster 2       Cluster 3
X1 = CO2 emissions (kg per 2000 US$ of GDP)               7.584599e-01    1.401193e+00    1.765399e+00
X2 = Urban population                                     3.652584e+07    1.153590e+08    4.082957e+08
X3 = Health expenditure per capita (current US$)          3.652584e+07    1.036108e+03    2.639784e+03
X4 = Life expectancy at birth, total (years)              7.691514e+01    7.343499e+01    7.173244e+01
X5 = Unemployment, total (% of total labor force)         7.783333e+00    5.860000e+00    4.833333e+00
X6 = Public spending on education, total (% of GDP)       4.815916e+00    4.147186e+00    3.538987e+00
X7 = Subsidies and other transfers (% of expense)         6.459847e+01    6.140823e+01    6.176835e+01
X8 = Terrestrial protected areas (% of total land area)   1.513201e+01    1.538366e+01    1.134538e+01
Cluster 1 is the one with the highest mean on the largest number of variables. It is composed only of the USA and India. This cluster has the highest values of health expenditure, life expectancy, unemployment, public spending on education and subsidies.
The second cluster is the one with the most terrestrial protected areas.
Finally, the third cluster has the highest CO2 emissions and urban population; it is also the cluster with the largest number of countries.
[Figure: Cluster Dendrogram for Solution HClust.10. Method = average; Distance = euclidean. x-axis: observation number in data set; y-axis: height (0e+00 to 5e+08). Observation labels as they appear in the plot: 10; 1, 12, 20, 13, 14, 19, 5, 15; 3, 2, 8, 18; 9; 17, 6, 4, 16; 7, 11.]
This cluster analysis, with the average method and Euclidean distance, gives a worse result than the previous analysis. Now observation 10 (China) is an outlier, and observations 7 and 11 (U.S. and India) are very different from the other two clusters.
Without observations 7, 9, 10 and 11 (U.S., Brazil, China and India), we obtain a better cluster analysis with no outliers. Now we have two clusters. The first is composed of Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain and Netherlands (1-12-20-13-14-19-5-15-8-18-2-3). The second is composed of Mexico, Russia, Japan and Indonesia (17-6-4-16).
CONCLUSION
The initial aim of this research was to find a possible relationship between the countries belonging to the G20. After the factor and cluster analyses we can say that the results obtained are quite interesting. The factor analysis suggests 3 new latent variables that summarize the original ones: we passed from 8 original variables to 3 factors, “welfare and well-being”, “public intervention” and “unemployment”. The factor analysis produced a quite satisfactory result.
The cluster analysis also produced a satisfactory result, and we can find some common characteristics within the clusters. Cluster 2 (Brazil, Mexico, Russia, Japan, Indonesia) is made up of countries with a large population and, apart from Japan, they are all developing countries. Cluster 3 (Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain, Netherlands) contains all the European countries, which suggests that it is the cluster with the highest welfare and equality among people. We can also note that it has the highest urban population but also the highest CO2 emissions.
Cluster 1 is more difficult to discuss because it is formed by 2 very different countries. The U.S. is a rich, developed country, whereas India has a largely poor population and is a developing country. But we can also find some common points, such as public spending on education, since both India and the U.S. have a good education system.