
Business Statistics


 

QUALITY OF LIFE IN G20 COUNTRIES

 

 

 

Matteo Biagini

Index:

Introduction

Correlation Matrix

Regression Model

Factor Analysis

Cluster Analysis

Conclusion

QUALITY OF LIFE IN G20 COUNTRIES

 

INTRODUCTION  

The aim of this research is to investigate how quality of life in the G20 countries relates to a set of quality-of-life indicators.

By quality of life we mean the general well-being of individuals and societies. The term is used in a wide range of contexts, including international development, healthcare and politics. Standard indicators of quality of life include not only wealth and employment but also the built environment, physical and mental health, education, recreation and leisure time, and social belonging.

From this wide range of indicators we have chosen eight.

Life expectancy is a key indicator of the general health of the population. Improvements in overall life expectancy reflect improvements in social and economic conditions, lifestyle, access to health services and medical advances. We use estimated life expectancy at birth.

CO2 emissions and terrestrial protected areas are indicators of how the natural environment supports a country's people, economy and culture. As the population grows and economic activity increases, more demands are placed on the natural environment, and environmental problems in turn affect the economy and public health. For this reason we have also chosen health expenditure per capita, which is closely related to the previous indicators.

Urban population reflects population growth and change in cities, which affect the relationships people have with others and their sense of belonging to an area.

The concept of community is fundamental to people's overall quality of life and sense of belonging. We have therefore chosen subsidies and other transfers as an indicator of quality of life, because they are an instrument through which governments redistribute wealth among the people of a country.

Public expenditure on education provides insight into the knowledge and skills of residents and into how they can apply them to improve their quality of life. Educational achievement is essential for effective participation in society.

The last indicator is unemployment: a reduction in unemployment helps create further opportunities for economic growth and development within a community or nation.


The 20 countries considered, referred to here as the G20 countries and among the richest in the world, are: Canada, France, Germany, Japan, Italy, Russian Federation, United States, United Kingdom, Brazil, China, India, South Africa, Australia, Saudi Arabia, South Korea, Indonesia, Mexico, Turkey, Spain and the Netherlands.

The data source is the World Bank's World DataBank, in the World Development Indicators (WDI) section.

All data refer to the year 2008.

The software used in this project is:

• Gretl (regression)
• R Project (factor and cluster analysis)
• Microsoft Excel (preparation of the data matrix, before and after using R)

We label the eight variables X1 to X8 as follows:

• X1 = CO2 emissions (kg per 2000 US$ of GDP)
• X2 = Urban population
• X3 = Health expenditure per capita (current US$)
• X4 = Life expectancy at birth, total (years)
• X5 = Unemployment, total (% of total labor force)
• X6 = Public spending on education, total (% of GDP)
• X7 = Subsidies and other transfers (% of expense)
• X8 = Terrestrial protected areas (% of total land area)

 

 


Correlation matrix

        X1        X2        X3        X4        X5        X6        X7        X8
    1,0000    0,4108   -0,6168   -0,7387    0,2370   -0,4123   -0,0290   -0,2151   X1
              1,0000   -0,2571   -0,2300   -0,2166   -0,5982   -0,1159   -0,0277   X2
                        1,0000    0,6361   -0,2003    0,4932    0,3154    0,1806   X3
                                  1,0000   -0,6507    0,2132    0,2230    0,2105   X4
                                            1,0000    0,0424   -0,0984   -0,1525   X5
                                                      1,0000    0,0872    0,2719   X6
                                                                1,0000    0,1855   X7
                                                                          1,0000   X8

 

 

 

The coefficients above are Pearson correlation coefficients, computed with R. The correlations are generally not very high, but several are large enough to justify a factor analysis. We consider a correlation strong if |r| > 0.7 and moderate if 0.3 < |r| < 0.7. On this basis there is a strong negative correlation between X4 and X1 (-0.7387), and there are moderate correlations of X1 with X2, X3 and X6, of X2 with X6, of X3 with X4, X6 and X7, and of X4 with X5.
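As a minimal sketch of the two steps just described, computing a Pearson coefficient and classifying it with the thresholds used in this report, the Python code below uses only the standard library. The function names are our own; the example coefficients are copied from the correlation matrix above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strength(r):
    """Classify a coefficient with the thresholds used in this report."""
    if abs(r) > 0.7:
        return "strong"
    if abs(r) > 0.3:
        return "moderate"
    return "weak"


# Some coefficients copied from the correlation matrix above.
pairs = {("X4", "X1"): -0.7387, ("X2", "X6"): -0.5982,
         ("X3", "X7"): 0.3154, ("X1", "X5"): 0.2370}
for (a, b), r in pairs.items():
    print(f"{a}-{b}: r = {r:+.4f} -> {strength(r)}")
```

The absolute value matters: a coefficient of -0.7387 is as strong as +0.7387, only with the opposite sign.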

 

           

 


REGRESSION  MODEL    

Model 1: OLS, using observations 1-20
Dependent variable: Life expectancy at birth

                                          Coefficient    Std. Error    t-ratio   p-value
Constant                                  88,4781        8,19707       10,7939   <0,00001   ***
CO2 emissions (kg per 2000 US$ of GDP)    -3,18062       1,18728       -2,6789   0,02008    **
Urban population                          -1,19832e-08   8,08775e-09   -1,4817   0,16421
Health expenditure per capita             0,00106495     0,000551237   1,9319    0,07732    *
Unemployment, total                       -0,903724      0,206679      -4,3726   0,00091    ***
Public spending on education              -1,75829       1,13982       -1,5426   0,14888
Subsidies and other transfers             0,0396108      0,0953704     0,4153    0,68523
Terrestrial protected areas               0,026664       0,0893965     0,2983    0,77060

R-squared 0,865092      Adjusted R-squared 0,786395      P-value(F) 0,000221
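Gretl performs this estimation internally. As an illustration only, the hypothetical function below shows what OLS computes: it solves the normal equations (X'X)b = X'y by Gauss-Jordan elimination and returns the coefficients together with R-squared. The six observations are invented toy data generated from an exact linear relation, not the G20 data set, so the fitted coefficients should reproduce that relation and R-squared should be essentially 1.

```python
def ols(rows, y):
    """OLS coefficients (constant first) and R-squared via the normal equations."""
    X = [[1.0] + list(r) for r in rows]              # prepend the constant term
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(xi[a] * xi[b] for xi in X) for b in range(k)]
         + [sum(xi[a] * yi for xi, yi in zip(X, y))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [ar - f * ac for ar, ac in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, xi)) for xi in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Hypothetical toy data: two regressors and an exact linear relation.
rows = [(0.5, 4.0), (1.0, 6.0), (1.5, 5.0), (2.0, 8.0), (0.8, 7.0), (1.2, 3.0)]
y = [80 - 3 * a - 0.9 * b for a, b in rows]
beta, r2 = ols(rows, y)
print("coefficients:", [round(b, 4) for b in beta], "R-squared:", round(r2, 4))
```

With real data the residuals are not zero, so R-squared falls below 1, as in the 0,865092 reported above.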

 

 

Using Gretl, we estimated the model by ordinary least squares (OLS), with life expectancy at birth as the dependent variable and the other seven indicators as independent variables. The R-squared of 0.865 means that the regressors jointly explain about 86.5% of the variation in life expectancy across the 20 countries (78.6% after adjusting for the number of regressors), so the fit of the model as a whole is very good. The p-value of the F test (0.000221) is also very low, so the model as a whole is significant at all conventional significance levels, including 1%.

The independent variables with a significant p-value are CO2 emissions, health expenditure per capita and unemployment. For CO2 emissions the p-value (0.020) is below 0.05, so we reject the null hypothesis that its coefficient is zero and conclude that CO2 emissions have a significant effect on life expectancy at birth at the 5% level. For health expenditure per capita the p-value (0.077) is below 0.1, so its effect is significant at the 10% level. For total unemployment the p-value (0.00091) is below 0.01, so its effect is significant at the 1% level.

Holding the other regressors constant, the coefficients can be read as follows: an increase in CO2 emissions of 1 kg per 2000 US$ of GDP is associated with a reduction in life expectancy at birth of about 3.18 years; an increase in health expenditure per capita of 1 current US$ is associated with an increase of 0.00106 years (about 1.06 years per 1,000 US$); and an increase in total unemployment of 1 percentage point is associated with a reduction of about 0.90 years.
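The figures quoted above can be checked with a few lines of arithmetic. This sketch recomputes two t-ratios (coefficient divided by standard error), the adjusted R-squared for n = 20 observations and k = 7 regressors, and two ceteris paribus effects, using only values from the Gretl output; the function and variable names are our own.

```python
def t_ratio(coef, std_error):
    """t-ratio as reported by Gretl: coefficient divided by its standard error."""
    return coef / std_error


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (constant excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Values copied from the Gretl output (n = 20 countries, k = 7 regressors).
t_co2 = t_ratio(-3.18062, 1.18728)
t_unemployment = t_ratio(-0.903724, 0.206679)
adj = adjusted_r2(0.865092, 20, 7)

# Ceteris paribus effects on life expectancy at birth, in years.
health_per_1000_usd = 0.00106495 * 1000    # +1,000 current US$ per capita
unemployment_plus_2pp = -0.903724 * 2      # +2 percentage points of unemployment

print(f"t(CO2) = {t_co2:.4f}, t(unemployment) = {t_unemployment:.4f}")
print(f"adjusted R-squared = {adj:.6f}")
print(f"+1,000 US$ health spending: {health_per_1000_usd:+.2f} years")
print(f"+2 points of unemployment: {unemployment_plus_2pp:+.2f} years")
```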

FACTOR  ANALYSIS  

 

To run the factor analysis we first applied the principal component method in R, which gave the following eigenvalues, proportions of total variance and cumulative proportions of total variance.

Component   Eigenvalue    Proportion of total variance   Cumulative proportion
1           3.13602447    0.3920031                      0.3920031
2           1.59218446    0.1990231                      0.5910261
3           1.06125308    0.1326566                      0.7236828
4           0.88797144    0.1109964                      0.8346792
5           0.55766918    0.06970865                     0.90438783
6           0.48900580    0.06112573                     0.96551355
7           0.19844296    0.02480537                     0.99031892
8           0.07744861    0.009681076                    1.000000000


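To choose how many factors to retain we applied the Kaiser criterion: we kept the components with an eigenvalue greater than 1 and dropped all the others. Because the variables are standardized, each has variance 1 and the eigenvalues sum to 8, so an eigenvalue can be read as the number of original variables' worth of variance that the component represents; a component with an eigenvalue below 1 explains less than a single original variable. Three eigenvalues exceed 1, and together these three components explain 72.37% of the total original variability.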
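A minimal sketch of the Kaiser criterion applied to the eigenvalues in the table above: it computes each component's share of the total variance (the sum of the eigenvalues, 8 for eight standardized variables), keeps the components whose eigenvalue is greater than 1 and prints a rough text version of the scree plot. It assumes, as in the table, that the eigenvalues are sorted in decreasing order.

```python
from itertools import accumulate

# Eigenvalues of the correlation matrix, copied from the table above (largest first).
eigenvalues = [3.13602447, 1.59218446, 1.06125308, 0.88797144,
               0.55766918, 0.48900580, 0.19844296, 0.07744861]

total = sum(eigenvalues)                   # 8 standardized variables -> total variance 8
proportion = [e / total for e in eigenvalues]
cumulative = list(accumulate(proportion))

# Kaiser criterion: keep the components whose eigenvalue is greater than 1.
retained = [i for i, e in enumerate(eigenvalues, start=1) if e > 1]
explained = cumulative[len(retained) - 1]  # valid because the eigenvalues are sorted
print("retained components:", retained)
print(f"variance explained: {explained:.2%}")

# Rough text scree plot: one bar per component, proportional to its eigenvalue.
for i, e in enumerate(eigenvalues, start=1):
    print(f"Comp.{i}  {'#' * round(e * 10):<32} {e:.2f}")
```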

 

 

SCREE PLOT

The scree plot gives another view of the same result: it shows the components on the X axis and the corresponding eigenvalues (variances) on the Y axis.

[Figure: scree plot (.PC); X axis: components Comp.1 to Comp.8; Y axis: Variances, from 0.0 to 3.0]

The factor loading l_ij is the covariance between the j-th common factor and the i-th original variable. Since the variables are standardized, it coincides with the correlation between the j-th common factor and the i-th original variable, so its minimum value is -1 (perfect negative correlation) and its maximum value is 1 (perfect positive correlation).
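To see why a loading cannot leave the interval from -1 to 1, it helps to work through the smallest possible case: two standardized variables with correlation r. Their correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r, with eigenvectors (1, 1)/√2 and (1, -1)/√2, and in the principal component method the loading of variable i on component j is the eigenvector entry times the square root of the eigenvalue. The sketch uses r = -0.7387 (the X1 and X4 coefficient from the correlation matrix) purely as an example; it is not the eight-variable solution reported in this section, and the function name is our own.

```python
import math


def two_variable_loadings(r):
    """Principal-component loadings of two standardized variables with correlation r.

    Returns loadings[i][j], the loading of variable i on component j, with the
    components ordered by decreasing eigenvalue.
    """
    s = 1 / math.sqrt(2)
    components = [(1 + r, (s, s)), (1 - r, (s, -s))]    # (eigenvalue, eigenvector)
    components.sort(key=lambda c: c[0], reverse=True)
    return [[vec[i] * math.sqrt(lam) for lam, vec in components] for i in range(2)]


r = -0.7387                                   # X1-X4 coefficient, used only as an example
L = two_variable_loadings(r)
for i, row in enumerate(L, start=1):
    print(f"variable {i}: loadings {[round(l, 4) for l in row]}, "
          f"sum of squares {sum(l * l for l in row):.4f}")
print("reproduced correlation:", round(sum(a * b for a, b in zip(L[0], L[1])), 4))
```

When all components are kept, each variable's squared loadings sum to 1 (its whole variance), so no single loading can exceed 1 in absolute value, and the products of the two variables' loadings add back up to r.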


VARIANCE EXPLAINED BY EACH FACTOR

FACTOR 1    FACTOR 2    FACTOR 3
30.11%      22.34%      8.9%

 

 

The proportion of total variability explained by each factor is its sum of squared loadings (SS loadings) divided by the total variance, which is 8 because the eight variables are standardized: 2.409/8 = 30.11% for the first factor, 1.787/8 = 22.34% for the second and 0.712/8 = 8.9% for the third. Together the three factors explain 61.35% of the total variance, which indicates that the model is quite good.
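The SS loadings and proportions can be recomputed from the loading matrix: a factor's SS loading is the sum of its squared loadings over the eight variables, and dividing by 8 gives its proportion of variance. In this sketch the cells that R leaves blank (loadings below 0.1 in absolute value) are set to 0, which is an approximation, and their positions are inferred from the printed column sums; as a result the second and third column sums come out slightly below the printed 1.787 and 0.712.

```python
# Unrotated loadings (Factor 1, Factor 2, Factor 3) from the factor loading matrix.
# Cells that R leaves blank (|loading| < 0.1) are set to 0 here.
loadings = {
    "CO2 emissions":                 (-0.596, -0.349, -0.460),
    "Health expenditure per capita": (0.532, 0.430, 0.334),
    "Life expectancy at birth":      (0.923, 0.0, 0.376),
    "Public spending on education":  (0.246, 0.955, -0.148),
    "Subsidies and other transfers": (0.188, 0.0, 0.122),
    "Terrestrial protected areas":   (0.237, 0.216, 0.0),
    "Unemployment":                  (-0.869, 0.325, 0.365),
    "Urban population":              (-0.106, -0.640, -0.274),
}
p = len(loadings)            # 8 standardized variables, so the total variance is 8

ss = [sum(row[j] ** 2 for row in loadings.values()) for j in range(3)]
for j, s in enumerate(ss, start=1):
    print(f"Factor {j}: SS loadings {s:.3f}, proportion of variance {s / p:.2%}")
print(f"Total explained: {sum(ss) / p:.2%}")
```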

 

FACTOR LOADING MATRIX

                                     Factor 1   Factor 2   Factor 3
CO2 emissions                          -0.596     -0.349     -0.460
Health expenditure per capita           0.532      0.430      0.334
Life expectancy at birth                0.923                 0.376
Public spending on education            0.246      0.955     -0.148
Subsidies and other transfers           0.188                 0.122
Terrestrial protected areas             0.237      0.216
Unemployment                           -0.869      0.325      0.365
Urban population                       -0.106     -0.640     -0.274

SS loadings                             2.409      1.787      0.712
Proportion Var                          0.301      0.223      0.089
Cumulative Var                          0.301      0.525      0.614

Blank cells are loadings smaller than 0.1 in absolute value, which R does not print by default.


FINAL ESTIMATION OF THE COMMUNALITIES

                                     Communality   Specific variance
CO2 emissions                              0.689               0.311
Health expenditure per capita              0.580               0.420
Life expectancy at birth                   0.995               0.005
Public spending on education               0.995               0.005
Subsidies and other transfers              0.054               0.946
Terrestrial protected areas                0.105               0.895
Unemployment                               0.995               0.005
Urban population                           0.496               0.504
Total                                      4.909

 

 

The final communality estimates show that five variables are well explained by the three common factors, with a communality above 50%: CO2 emissions, health expenditure per capita, life expectancy at birth, public spending on education and unemployment. The other three variables are explained less well: subsidies and other transfers, terrestrial protected areas and urban population. A variable with a high communality shares much of its variance with the other variables through the common factors, while its specific variance (1 minus the communality) is the portion of its variance that the common factors cannot explain.

For this reason, when naming the factors we do not consider subsidies and other transfers or terrestrial protected areas. Urban population, however, has a communality very close to 50% (0.496), so we still take it into account.
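Communality and specific variance can likewise be recomputed row by row: the communality of a variable is the sum of its squared loadings on the retained factors, and for a standardized variable the specific variance is 1 minus the communality. The sketch uses four rows of the unrotated loading matrix with blank cells set to 0 (so the subsidies row gives about 0.050 rather than 0.054) and flags which variables pass the 50% threshold used above.

```python
# Four rows of the unrotated loading matrix (blank cells set to 0).
rows = {
    "CO2 emissions":                 (-0.596, -0.349, -0.460),
    "Health expenditure per capita": (0.532, 0.430, 0.334),
    "Subsidies and other transfers": (0.188, 0.0, 0.122),
    "Urban population":              (-0.106, -0.640, -0.274),
}

communality = {name: sum(l * l for l in row) for name, row in rows.items()}
for name, h2 in communality.items():
    specific = 1 - h2        # specific variance of a standardized variable
    status = "above 50%" if h2 > 0.5 else "below 50%"
    print(f"{name:<31} communality {h2:.3f}  specific variance {specific:.3f}  ({status})")
```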

 

 

 

 

 


We can now improve the interpretation of the factors by applying a varimax rotation to the factor loading matrix.

 

ROTATED VARIANCE EXPLAINED BY EACH FACTOR (Total = 61.36%)

FACTOR 1    FACTOR 2    FACTOR 3
26.02%      19.9%       15.44%

 

ROTATED FACTOR LOADING MATRIX (varimax)

                                     Factor 1   Factor 2   Factor 3
CO2 emissions                          -0.772     -0.301
Health expenditure per capita           0.645      0.402
Life expectancy at birth                0.890      0.101     -0.439
Public spending on education            0.154      0.984
Subsidies and other transfers           0.221
Terrestrial protected areas             0.143      0.260     -0.129
Unemployment                           -0.260                 0.962
Urban population                       -0.343     -0.537     -0.300

SS loadings                             2.082      1.592      1.235
Proportion Var                          0.260      0.199      0.154
Cumulative Var                          0.260      0.459      0.614


With the rotation, the explained variance is distributed more evenly across the factors; most notably, the share of factor 3 rises from 8.9% to 15.44%, while the total explained by the three factors stays at about 61.4%.
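Varimax looks for the orthogonal rotation that maximizes the variance of the squared loadings within each factor, pushing every loading toward 0 or toward ±1. R's `varimax()` reaches it with an iterative SVD-based algorithm; as a standard-library-only sketch we use instead Kaiser's original pairwise method, which rotates two factors at a time by the angle φ given by tan 4φ = (D - 2AB/p) / (C - (A² - B²)/p), where, for the two columns x and y, u = x² - y², v = 2xy, A = Σu, B = Σv, C = Σ(u² - v²), D = 2Σuv and p is the number of variables. Rows are first divided by their length (Kaiser normalization, which R's `varimax()` also applies by default). The input is the rounded unrotated matrix with blank cells set to 0, so the output is only expected to approximate the rotated table above, possibly with different signs or column order; what any orthogonal rotation must preserve exactly is each communality, and therefore the total variance explained.

```python
import math


def varimax(loadings, tol=1e-10, max_sweeps=100):
    """Kaiser's pairwise varimax rotation with Kaiser row normalization."""
    p, m = len(loadings), len(loadings[0])
    # Row lengths (square roots of the communalities); guard against all-zero rows.
    h = [math.sqrt(sum(x * x for x in row)) or 1.0 for row in loadings]
    A = [[x / h[i] for x in row] for i, row in enumerate(loadings)]
    for _ in range(max_sweeps):
        largest = 0.0
        for j in range(m - 1):
            for k in range(j + 1, m):
                u = [A[i][j] ** 2 - A[i][k] ** 2 for i in range(p)]
                w = [2 * A[i][j] * A[i][k] for i in range(p)]
                a, b = sum(u), sum(w)
                c = sum(x * x - y * y for x, y in zip(u, w))
                d = 2 * sum(x * y for x, y in zip(u, w))
                # Angle that maximizes the varimax criterion in the (j, k) plane.
                phi = 0.25 * math.atan2(d - 2 * a * b / p, c - (a * a - b * b) / p)
                largest = max(largest, abs(phi))
                cs, sn = math.cos(phi), math.sin(phi)
                for i in range(p):
                    x, y = A[i][j], A[i][k]
                    A[i][j], A[i][k] = x * cs + y * sn, -x * sn + y * cs
        if largest < tol:
            break
    # Undo the normalization so each row keeps its original communality.
    return [[x * h[i] for x in row] for i, row in enumerate(A)]


unrotated = [
    [-0.596, -0.349, -0.460],   # CO2 emissions
    [0.532, 0.430, 0.334],      # Health expenditure per capita
    [0.923, 0.0, 0.376],        # Life expectancy at birth
    [0.246, 0.955, -0.148],     # Public spending on education
    [0.188, 0.0, 0.122],        # Subsidies and other transfers
    [0.237, 0.216, 0.0],        # Terrestrial protected areas
    [-0.869, 0.325, 0.365],     # Unemployment
    [-0.106, -0.640, -0.274],   # Urban population
]
rotated = varimax(unrotated)
for row in rotated:
    print(" ".join(f"{x:7.3f}" for x in row))
```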

In principal components the first factor describes most of the variability. After choosing the number of factors to retain, we want to spread the variability among the factors to make them easier to interpret, so we use the rotated factors, whose meanings are more clearly distinguished.

We now assign a label to each factor on the basis of its most significant variables, giving more weight to the original variables with a communality above 50%. The first factor is mainly explained by CO2 emissions, health expenditure per capita and life expectancy at birth; we do not consider subsidies and other transfers or terrestrial protected areas, because their communality is below 50%. The second factor is mainly explained by public spending on education and urban population, although only the former has a communality above 50%. The third factor is explained by unemployment.

NEW LATENT VARIABLES                          ORIGINAL VARIABLES
FACTOR 1: WELFARE AND WELL-BEING              CO2 emissions
                                              Health expenditure per capita
                                              Life expectancy at birth
                                              Subsidies and other transfers
FACTOR 2: PUBLIC INTERVENTION ON POPULATION   Public spending on education
                                              Terrestrial protected areas
                                              Urban population
FACTOR 3: UNEMPLOYMENT                        Unemployment
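The grouping in this table can be reproduced with a simple rule: attach each original variable to the factor on which its rotated loading has the largest absolute value. We assume this is how the table was built; applied to the rotated loadings above, with blank cells set to 0, it returns exactly the grouping shown.

```python
# Rotated (varimax) loadings from the rotated factor loading matrix; blank cells set to 0.
rotated = {
    "CO2 emissions":                 (-0.772, -0.301, 0.0),
    "Health expenditure per capita": (0.645, 0.402, 0.0),
    "Life expectancy at birth":      (0.890, 0.101, -0.439),
    "Public spending on education":  (0.154, 0.984, 0.0),
    "Subsidies and other transfers": (0.221, 0.0, 0.0),
    "Terrestrial protected areas":   (0.143, 0.260, -0.129),
    "Unemployment":                  (-0.260, 0.0, 0.962),
    "Urban population":              (-0.343, -0.537, -0.300),
}
labels = ["WELFARE AND WELL-BEING", "PUBLIC INTERVENTION ON POPULATION", "UNEMPLOYMENT"]

assignment = {}
for name, row in rotated.items():
    j = max(range(3), key=lambda f: abs(row[f]))   # factor with the largest |loading|
    assignment[name] = labels[j]
for name, label in assignment.items():
    print(f"{name:<31} -> {label}")
```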

   

 

 

 


CLUSTER  ANALYSIS  

Now we want to analyze how the countries can be clustered on the basis of the observed values of the original variables, in order to obtain a few homogeneous groups. We compared two clustering methods:

1. hierarchical method, using Euclidean distance and Ward's method;
2. hierarchical method, using Euclidean distance and the average linkage method.

This is the legend of the countries:

1. Canada
2. France
3. Germany
4. Japan
5. Italy
6. Russian Federation
7. United States
8. United Kingdom
9. Brazil
10. China
11. India
12. South Africa
13. Australia
14. Saudi Arabia
15. Korea, Rep.
16. Indonesia
17. Mexico
18. Turkey
19. Spain
20. Netherlands


Using R, we computed the within-cluster sum of squares for different numbers of clusters in order to choose how many clusters to retain. The resulting graph suggests that four clusters could be appropriate.
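The quantity behind this kind of graph is easy to state. The sketch below computes the total within-cluster sum of squares of a partition, that is, the sum of squared Euclidean distances of each observation from the centroid of its cluster. Evaluating it for the best partition with 1, 2, 3, ... clusters gives the curve whose "elbow" suggests how many clusters to keep. The points and the partitions are hypothetical, and the report does not say which routine produced the partition for each number of clusters.

```python
def within_ss(points, labels):
    """Total within-cluster sum of squares: squared distances to each cluster centroid."""
    total = 0.0
    for c in set(labels):
        members = [pt for pt, lab in zip(points, labels) if lab == c]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - m) ** 2 for x, m in zip(pt, centroid)) for pt in members)
    return total


# Hypothetical standardized observations and candidate partitions for k = 1, 2, 3.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (3.0, 3.1), (3.2, 2.9), (6.0, 0.0)]
partitions = {1: [0, 0, 0, 0, 0, 0], 2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 0, 1, 1, 2]}
wss = {k: within_ss(points, labels) for k, labels in partitions.items()}
for k, value in wss.items():
    print(f"k = {k}: within-cluster sum of squares {value:.3f}")
```

The sum always decreases as clusters are added, so the choice rests on where the decrease flattens out rather than on the smallest value.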

 


In this cluster analysis we used Ward's method with Euclidean distance. Ward's method is a hierarchical agglomerative method based on an ANOVA (analysis of variance) approach: at each step it merges the two clusters whose union gives the smallest increase in the within-cluster sum of squares.

The graph suggests that we can use three clusters, treating China as an isolated country because it has very little in common with the other clusters:

Cluster 1: USA, India (7, 11)
Cluster 2: Brazil, Mexico, Russia, Japan, Indonesia (9, 17, 6, 4, 16)
Cluster 3: Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain, Netherlands (1, 12, 20, 13, 14, 19, 5, 15, 8, 18, 2, 3)
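Ward's merging rule can be sketched in a few lines. With Euclidean distance, merging clusters a and b raises the within-cluster sum of squares by n_a·n_b/(n_a + n_b)·‖c_a - c_b‖², where n is a cluster's size and c its centroid, and the method always performs the merge with the smallest increase. The naive version below (hypothetical function name, toy points) stops when k clusters remain; it illustrates the rule and is not meant to reproduce the R dendrogram.

```python
def ward(points, k):
    """Naive agglomerative clustering with Ward's criterion; returns k clusters of indices."""
    clusters = [[i] for i in range(len(points))]

    def centroid(c):
        return [sum(points[i][d] for i in c) / len(c) for d in range(len(points[0]))]

    def cost(a, b):
        # Increase in the within-cluster sum of squares if a and b are merged.
        ca, cb = centroid(a), centroid(b)
        dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * dist2

    while len(clusters) > k:
        candidates = [(cost(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        _, i, j = min(candidates)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]                      # j > i, so index i is unaffected
    return clusters


points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (3.0, 3.1), (3.2, 2.9), (6.0, 0.0)]
result = ward(points, 3)
print(result)
```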

 

 


These are the means of each variable by cluster:

                                                             Cluster 1      Cluster 2      Cluster 3
X1 = CO2 emissions (kg per 2000 US$ of GDP)               7.584599e-01   1.401193e+00   1.765399e+00
X2 = Urban population                                     3.652584e+07   1.153590e+08   4.082957e+08
X3 = Health expenditure per capita (current US$)          3.652584e+07   1.036108e+03   2.639784e+03
X4 = Life expectancy at birth, total (years)              7.691514e+01   7.343499e+01   7.173244e+01
X5 = Unemployment, total (% of total labor force)         7.783333e+00   5.860000e+00   4.833333e+00
X6 = Public spending on education, total (% of GDP)       4.815916e+00   4.147186e+00   3.538987e+00
X7 = Subsidies and other transfers (% of expense)         6.459847e+01   6.140823e+01   6.176835e+01
X8 = Terrestrial protected areas (% of total land area)   1.513201e+01   1.538366e+01   1.134538e+01

 

 

Cluster 1, which is composed only of the USA and India, is the one with the highest mean for most variables: health expenditure, life expectancy, unemployment, public spending on education and subsidies.

The second cluster has the largest share of terrestrial protected areas.

Finally, the third cluster has the highest CO2 emissions and urban population; it is also the cluster that contains the majority of the countries.
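A table of cluster means like the one above is obtained by averaging each variable over the countries assigned to a cluster. A minimal sketch, with a hypothetical function name and invented values for two variables:

```python
def cluster_means(data, labels):
    """Mean of every variable within each cluster, clusters in order of first appearance."""
    means = {}
    for c in dict.fromkeys(labels):            # unique labels, original order preserved
        rows = [r for r, lab in zip(data, labels) if lab == c]
        means[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


# Hypothetical values of two variables (say, life expectancy and unemployment) for five countries.
data = [(80.0, 6.0), (78.0, 8.0), (70.0, 5.0), (72.0, 7.0), (64.0, 4.0)]
labels = [1, 1, 2, 2, 3]
means = cluster_means(data, labels)
for c, m in means.items():
    print(f"Cluster {c}: " + ", ".join(f"{v:.2f}" for v in m))
```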

 


[Figure: Cluster Dendrogram for Solution HClust.10 (Method = average; Distance = Euclidean). Leaves from left to right: 10, 1, 12, 20, 13, 14, 19, 5, 15, 3, 2, 8, 18, 9, 17, 6, 4, 16, 7, 11. X axis: observation number in data set; Y axis: height, from 0 to 5e+08.]

 

 

The cluster analysis with the average linkage method and Euclidean distance gives a worse result than the previous one. Now China (10) is an outlier, and the USA and India (7 and 11) are very different from the other two clusters.
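The difference between the two runs lies in how the distance between two clusters is defined. The average method takes the mean of all pairwise Euclidean distances between the members of the two clusters; for comparison, the sketch also computes complete linkage, which takes the largest pairwise distance. The function names and the toy points are hypothetical.

```python
import math
from itertools import product


def average_linkage(a, b):
    """Mean Euclidean distance over all pairs with one point in a and one in b."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def complete_linkage(a, b):
    """Largest Euclidean distance between a point of a and a point of b."""
    return max(math.dist(p, q) for p, q in product(a, b))


# Two hypothetical clusters of standardized observations.
a = [(0.0, 0.0), (0.0, 1.0)]
b = [(3.0, 0.0), (4.0, 0.0)]
print(f"average linkage:  {average_linkage(a, b):.3f}")
print(f"complete linkage: {complete_linkage(a, b):.3f}")
```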

 

 


Without the USA, Brazil, China and India (7, 9, 10, 11) we obtain a better cluster analysis with no outliers. There are now two clusters: the first is composed of Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain and the Netherlands (1, 12, 20, 13, 14, 19, 5, 15, 8, 18, 2, 3); the second is composed of Mexico, Russia, Japan and Indonesia (17, 6, 4, 16).


CONCLUSION    

The initial aim of this research was to find possible relationships among the countries belonging to the G20. After the factor and cluster analyses we can say that the results are quite interesting. The factor analysis, which produced a fairly satisfactory result, suggests three new latent variables that summarize the original ones: we passed from 8 original variables to 3 factors, namely “welfare and well-being”, “public intervention on population” and “unemployment”. The cluster analysis also produced a satisfactory result, and we can find some common characteristics within the clusters. Cluster 2 (Brazil, Mexico, Russia, Japan, Indonesia) is made up of countries with a large population which, apart from Japan, are all developing countries. Cluster 3 (Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain, Netherlands) contains all the European Union countries in the sample, which suggests that it is the cluster with the highest welfare and the greatest equality among people. It also has the highest urban population, but also the highest CO2 emissions.

Cluster 1 is more difficult to interpret because it is formed by two very different countries. The USA is a rich, developed country, whereas India is a developing country where a large share of the population is poor. They may nevertheless share some features, such as public spending on education, since both have a good education system.