Cluster analysis - World data

Emanuele Taufer, 11/18/2017


World data

A data set on countries containing demographic, economic, health, and dietary information.

Originally available at https://docs.google.com/spreadsheets/d/1W8Bnx48Xz4ybA9PF0XHiNRhsingle=true&gid=0

library(ggplot2)
library(data.table)

world=read.csv("http://www.cs.unitn.it/~taufer/Data/World.csv")
str(world)

## 'data.frame': 86 obs. of 102 variables:
## $ Countries
## $ Average.latitude..º.
## $ Annual.insolation..W.h.m2.day.
## $ Energy..kcal.day.
## $ Protein..g.day.
## $ Fats..g.day.
## $ Carbohydrates..g.day.
## $ Animal.Products....kcal.day.
## $ Animal.Fats..kcal.day.
## $ Bovine.Meat..kcal.day.
## $ Butter..Ghee..kcal.day.
## $ Cheese..kcal.day.
## $ Eggs..kcal.day.
## $ Fats..Animals..Raw..kcal.day.
## $ Fish..Seafood..kcal.day.
## $ Freshwater.Fish..kcal.day.
## $ Honey..kcal.day.
## $ Meat..kcal.day.
## $ Milk...Excluding.Butter..kcal.day.
## $ Milk..Whole..kcal.day.
## $ Mutton...Goat.Meat..kcal.day.
## $ Offals..Edible..kcal.day.
## $ Pelagic.Fish..kcal.day.
## $ Pigmeat..kcal.day.
## $ Poultry.Meat..kcal.day.
## $ Vegetal.Products....kcal.day.
## $ Alcoholic.Beverages..kcal.day.
## $ Apples..kcal.day.
## $ Bananas..kcal.day.
## $ Beans..kcal.day.
## $ Cereals...Excluding.Beer..kcal.day.
## $ Coconut.Oil..kcal.day.
## $ Coffee..kcal.day.
## $ Fruits...Excluding.Wine..kcal.day.
## $ Nuts..kcal.day.
## $ Olive.Oil..kcal.day.
## $ Palm.Oil..kcal.day.
## $ Potatoes..kcal.day.
## $ Pulses..kcal.day.
## $ Rice..Milled.Equivalent...kcal.day.
## $ Rice..Paddy.Equivalent...kcal.day.
## $ Roots...Tuber.Dry.Equiv..kcal.day.
## $ Soyabean.Oil..kcal.day.
## $ Starchy.Roots..kcal.day.
## $ Sugar...Sweeteners..kcal.day.
## $ Sugar..Raw.Equivalent...kcal.day.
## $ Sugar..Raw.Equivalent..kcal.day.
## $ Sugar..Refined.Equiv..kcal.day.
## $ Vegetable.Oils..kcal.day.
## $ Vegetables..kcal.day.
## $ Wheat..kcal.day.
## $ Wine..kcal.day.
## $ Gross.national.income.per.capita..PPP.international...
## $ Population.annual.growth.rate....
## $ Population.in.urban.areas....
## $ Population.median.age..years...2006
## $ Population.proportion.over.60......2006
## $ Population.proportion.under.15......2006
## $ Total.fertility.rate..per.female.
## $ Per.capita.recorded.alcohol.consumption..litres.of.pure.alcohol..among.adults....1
## $ Population.with.sustainable.access.to.improved.drinking.water.sources.....rural
## $ Population.with.sustainable.access.to.improved.drinking.water.sources.....total
## $ Population.with.sustainable.access.to.improved.drinking.water.sources.....urban
## $ Population.with.sustainable.access.to.improved.sanitation.....rural
## $ Population.with.sustainable.access.to.improved.sanitation.....total
## $ Population.with.sustainable.access.to.improved.sanitation.....urban
## $ Prevalence.of.current.tobacco.use.among.adults....15.years......both.sexes..2005
## $ Prevalence.of.current.tobacco.use.among.adults....15.years......female..2005
## $ Prevalence.of.current.tobacco.use.among.adults....15.years......male..2005
## $ Mean.total.cholesterol..men..mg.dl...2005
## $ Mean.total.cholesterol..female..mg.dl...2005
## $ Diabetes.crude.prevalence..adults.aged.20.to.79....
## $ Systolic.blood.pressure..adults.aged.15.and.above..men..mmHg.
## $ Systolic.blood.pressure..adults.aged.15.and.above..female..mmHg.
## $ Obesity.prevalence..men....
## $ Obesity.prevalence..female....
## $ Adult.mortality.rate..probability.of.dying.between.15.to.60.years.per.1000.populat
## $ Adult.mortality.rate..probability.of.dying.between.15.to.60.years.per.1000.populat
## $ Adult.mortality.rate..probability.of.dying.between.15.to.60.years.per.1000.populat
## $ Age.standardized.mortality.rate.for.cancer..per.100.000.population...2002
## $ Age.standardized.mortality.rate.for.cardiovascular.diseases..per.100.000.populatio
## $ Age.standardized.mortality.rate.for.injuries..per.100.000.population...2002
## $ Age.standardized.mortality.rate.for.non.communicable.diseases..per.100.000.populat
## $ Healthy.life.expectancy..HALE..at.birth..years..both.sexes
## $ Healthy.life.expectancy..HALE..at.birth..years..female
## $ Healthy.life.expectancy..HALE..at.birth..years..male
## $ Incidence.of.tuberculosis..per.100.000.population.per.year.
## $ Infant.mortality.rate..per.1.000.live.births..both.sexes
## $ Infant.mortality.rate..per.1.000.live.births..female
## $ Infant.mortality.rate..per.1.000.live.births..male
## $ Life.expectancy.at.birth..years..both.sexes
## $ Life.expectancy.at.birth..years..female
## $ Life.expectancy.at.birth..years..male
## $ Maternal.mortality.ratio..per.100.000.live.births...2005
## $ Neonatal.mortality.rate..per.1.000.live.births...2004
## $ Prevalence.of.tuberculosis..per.100.000.population.
## $ Under.5.mortality.rate..probability.of.dying.by.age.5.per.1000.live.births..both.s
## $ Under.5.mortality.rate..probability.of.dying.by.age.5.per.1000.live.births..female
## $ Under.5.mortality.rate..probability.of.dying.by.age.5.per.1000.live.births..male
## [list output truncated]


Data on dietary habits

Variables 4 through 52 in the data describe the consumption of a variety of foods, measured in kcal/day (the summary variables Protein, Fats and Carbohydrates are in g/day).

We store these variables in the object (data.frame) food.

We use this subset of the data to analyze dietary habits through a cluster analysis.

food=world[,4:52]
str(food)

## 'data.frame': 86 obs. of 49 variables:
## $ Energy..kcal.day. : int 2860 2980 3120 3740 2200 2960 3640 2220
## $ Protein..g.day. : int 96 94 107 111 48 87 92 57 72 83 ...
## $ Fats..g.day. : int 86 100 134 162 25 99 162 58 58 93 ...
## $ Carbohydrates..g.day. : num 426 426 372 460 446 ...
## $ Animal.Products....kcal.day. : int 813 823 1033 1219 65 766 1120 397 344 67
## $ Animal.Fats..kcal.day. : int 49 72 124 320 5 140 404 73 24 55 ...
## $ Bovine.Meat..kcal.day. : int 62 342 142 59 5 103 54 113 32 130 ...
## $ Butter..Ghee..kcal.day. : int 11 28 62 102 3 65 119 1 10 9 ...
## $ Cheese..kcal.day. : int 50 90 107 193 0 18 165 6 36 2 ...
## $ Eggs..kcal.day. : int 22 24 23 49 3 47 45 11 13 24 ...
## $ Fats..Animals..Raw..kcal.day. : int 38 42 61 200 2 71 246 72 12 46 ...
## $ Fish..Seafood..kcal.day. : int 8 10 30 21 19 24 0 4 7 10 ...
## $ Freshwater.Fish..kcal.day. : int 1 1 3 6 17 2 0 1 1 4 ...
## $ Honey..kcal.day. : int 2 1 8 12 0 2 4 0 2 0 ...
## $ Meat..kcal.day. : int 196 475 492 488 12 295 286 247 84 378 ..
## $ Milk...Excluding.Butter..kcal.day. : int 524 222 331 336 22 242 376 47 206 198 ..
## $ Milk..Whole..kcal.day. : int 465 127 172 131 21 168 117 36 162 192 ..
## $ Mutton...Goat.Meat..kcal.day. : int 33 8 97 6 3 1 13 14 5 3 ...
## $ Offals..Edible..kcal.day. : int 13 17 30 2 1 15 7 12 7 7 ...
## $ Pelagic.Fish..kcal.day. : int 5 1 7 9 0 10 0 2 4 1 ...
## $ Pigmeat..kcal.day. : int 52 34 107 358 0 154 137 49 29 102 ...
## $ Poultry.Meat..kcal.day. : int 48 83 143 60 3 34 74 66 17 141 ...
## $ Vegetal.Products....kcal.day. : int 2059 2135 2100 2512 2127 2118 2513 1822
## $ Alcoholic.Beverages..kcal.day. : int 35 102 152 278 0 108 204 39 165 73 ...
## $ Apples..kcal.day. : int 17 26 18 67 0 31 21 3 15 4 ...
## $ Bananas..kcal.day. : int 11 19 20 15 6 3 1 88 16 53 ...
## $ Beans..kcal.day. : int 28 1 1 3 2 0 6 2 31 162 ...
## $ Cereals...Excluding.Beer..kcal.day.: int 1299 1050 719 891 1802 1017 783 830 1278
## $ Coconut.Oil..kcal.day. : int 0 1 16 11 5 7 78 0 0 0 ...
## $ Coffee..kcal.day. : int 1 1 3 7 0 0 8 3 3 2 ...
## $ Fruits...Excluding.Wine..kcal.day. : int 129 86 121 168 13 54 72 166 75 112 ...
## $ Nuts..kcal.day. : int 8 2 31 39 2 12 47 28 12 2 ...
## $ Olive.Oil..kcal.day. : int 17 2 34 13 0 0 29 0 1 2 ...
## $ Palm.Oil..kcal.day. : int 0 0 112 15 52 0 61 0 0 20 ...
## $ Potatoes..kcal.day. : int 57 80 85 113 36 317 151 102 134 28 ...
## $ Pulses..kcal.day. : int 30 9 9 6 40 0 19 24 46 165 ...
## $ Rice..Milled.Equivalent...kcal.day.: int 68 41 91 51 1598 52 43 207 12 371 ...
## $ Rice..Paddy.Equivalent...kcal.day. : int 68 41 91 51 1598 52 43 207 12 371 ...
## $ Roots...Tuber.Dry.Equiv..kcal.day. : int 57 100 87 113 42 317 151 168 134 129 ...
## $ Soyabean.Oil..kcal.day. : int 2 43 17 89 48 20 112 11 9 251 ...
## $ Starchy.Roots..kcal.day. : int 57 100 87 113 42 317 151 168 134 129 ...
## $ Sugar...Sweeteners..kcal.day. : int 193 406 423 437 59 288 522 282 251 550 .
## $ Sugar..Raw.Equivalent...kcal.day. : int 187 337 407 404 29 279 488 280 237 529 .
## $ Sugar..Raw.Equivalent..kcal.day. : int 191 405 415 424 59 285 517 282 249 549 .
## $ Sugar..Refined.Equiv..kcal.day. : int 187 337 407 404 29 279 488 280 237 529 .
## $ Vegetable.Oils..kcal.day. : int 174 311 435 442 131 237 545 207 153 321
## $ Vegetables..kcal.day. : int 94 51 67 61 10 62 124 46 107 29 ...
## $ Wheat..kcal.day. : int 1166 914 559 617 180 608 718 387 529 388
## $ Wine..kcal.day. : int 6 59 39 55 0 6 54 0 3 2 ...


1 Check the variables

There appear to be duplications in the data:

1. Meat, Bovine.Meat, Pigmeat, ... Is there overlapping information?

2. Energy, Protein, Fats, ... Are they summaries of other variables? (again, overlap)

3. Rice (Paddy Equivalent) and Rice (Milled Equivalent) appear to be the same variable (judging by their values).

4. …
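The visual check described in the list can also be done programmatically. The following is a minimal sketch, assuming a small toy data frame (`toy`, a hypothetical name) built from the first five values that the `str(food)` output shows for three variables; it is not part of the original lab.

```r
# Toy data frame: first five values of three variables as printed by str(food).
# The object name `toy` and the shortened column names are illustrative only.
toy <- data.frame(
  Rice.Milled = c(68, 41, 91, 51, 1598),
  Rice.Paddy  = c(68, 41, 91, 51, 1598),
  Wheat       = c(1166, 914, 559, 617, 180)
)

# duplicated() applied to the list of columns flags every column that is an
# exact copy of an earlier one
dup <- duplicated(as.list(toy))
dup.names <- names(toy)[dup]
dup.names
```

On the full data the same idea would be `names(food)[duplicated(as.list(food))]`. This only catches exact copies; partial overlap (for example Meat versus Bovine.Meat) would need a different check, such as inspecting correlations.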


The cleaned food data

We remove the redundant variables from food.

choose=c(7:14,16:36,38:42,46:49)
food=food[,choose]
str(food)

## 'data.frame': 86 obs. of 38 variables:
## $ Bovine.Meat..kcal.day. : int 62 342 142 59 5 103 54 113 32 130 ...
## $ Butter..Ghee..kcal.day. : int 11 28 62 102 3 65 119 1 10 9 ...
## $ Cheese..kcal.day. : int 50 90 107 193 0 18 165 6 36 2 ...
## $ Eggs..kcal.day. : int 22 24 23 49 3 47 45 11 13 24 ...
## $ Fats..Animals..Raw..kcal.day. : int 38 42 61 200 2 71 246 72 12 46 ...
## $ Fish..Seafood..kcal.day. : int 8 10 30 21 19 24 0 4 7 10 ...
## $ Freshwater.Fish..kcal.day. : int 1 1 3 6 17 2 0 1 1 4 ...
## $ Honey..kcal.day. : int 2 1 8 12 0 2 4 0 2 0 ...
## $ Milk...Excluding.Butter..kcal.day. : int 524 222 331 336 22 242 376 47 206 198 ..
## $ Milk..Whole..kcal.day. : int 465 127 172 131 21 168 117 36 162 192 ..
## $ Mutton...Goat.Meat..kcal.day. : int 33 8 97 6 3 1 13 14 5 3 ...
## $ Offals..Edible..kcal.day. : int 13 17 30 2 1 15 7 12 7 7 ...
## $ Pelagic.Fish..kcal.day. : int 5 1 7 9 0 10 0 2 4 1 ...
## $ Pigmeat..kcal.day. : int 52 34 107 358 0 154 137 49 29 102 ...
## $ Poultry.Meat..kcal.day. : int 48 83 143 60 3 34 74 66 17 141 ...
## $ Vegetal.Products....kcal.day. : int 2059 2135 2100 2512 2127 2118 2513 1822
## $ Alcoholic.Beverages..kcal.day. : int 35 102 152 278 0 108 204 39 165 73 ...
## $ Apples..kcal.day. : int 17 26 18 67 0 31 21 3 15 4 ...
## $ Bananas..kcal.day. : int 11 19 20 15 6 3 1 88 16 53 ...
## $ Beans..kcal.day. : int 28 1 1 3 2 0 6 2 31 162 ...
## $ Cereals...Excluding.Beer..kcal.day.: int 1299 1050 719 891 1802 1017 783 830 1278
## $ Coconut.Oil..kcal.day. : int 0 1 16 11 5 7 78 0 0 0 ...
## $ Coffee..kcal.day. : int 1 1 3 7 0 0 8 3 3 2 ...
## $ Fruits...Excluding.Wine..kcal.day. : int 129 86 121 168 13 54 72 166 75 112 ...
## $ Nuts..kcal.day. : int 8 2 31 39 2 12 47 28 12 2 ...
## $ Olive.Oil..kcal.day. : int 17 2 34 13 0 0 29 0 1 2 ...
## $ Palm.Oil..kcal.day. : int 0 0 112 15 52 0 61 0 0 20 ...
## $ Potatoes..kcal.day. : int 57 80 85 113 36 317 151 102 134 28 ...
## $ Pulses..kcal.day. : int 30 9 9 6 40 0 19 24 46 165 ...
## $ Rice..Paddy.Equivalent...kcal.day. : int 68 41 91 51 1598 52 43 207 12 371 ...
## $ Roots...Tuber.Dry.Equiv..kcal.day. : int 57 100 87 113 42 317 151 168 134 129 ...
## $ Soyabean.Oil..kcal.day. : int 2 43 17 89 48 20 112 11 9 251 ...
## $ Starchy.Roots..kcal.day. : int 57 100 87 113 42 317 151 168 134 129 ...
## $ Sugar...Sweeteners..kcal.day. : int 193 406 423 437 59 288 522 282 251 550 .
## $ Vegetable.Oils..kcal.day. : int 174 311 435 442 131 237 545 207 153 321
## $ Vegetables..kcal.day. : int 94 51 67 61 10 62 124 46 107 29 ...
## $ Wheat..kcal.day. : int 1166 914 559 617 180 608 718 387 529 388
## $ Wine..kcal.day. : int 6 59 39 55 0 6 54 0 3 2 ...
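To see which columns the index vector keeps and which it drops, the arithmetic can be checked directly. This is a small sketch that uses only the `choose` vector from the lab; the name `dropped` is introduced here for illustration.

```r
# Index vector used in the lab to select the columns of food (49 columns)
choose <- c(7:14, 16:36, 38:42, 46:49)

# Number of columns kept: should agree with the 38 variables reported by str(food)
n.kept <- length(choose)

# Positions that are NOT selected, i.e. the columns treated as redundant
dropped <- setdiff(1:49, choose)

n.kept
dropped
```

Read against the `str(food)` listing of 49 variables, the dropped positions 1 to 6 are the summaries from Energy to Animal.Fats, 15 is Meat, 37 is Rice (Milled Equivalent), and 43 to 45 are the raw and refined sugar equivalents. Note that Roots...Tuber.Dry.Equiv and Starchy.Roots show the same first ten values in the output, so some overlap may remain after this selection.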


2 Clustering with k-means

Let us quickly evaluate several values of k by plotting k against the ratio SSW / SST (within-cluster sum of squares over total sum of squares).

st.food=scale(food)   # standardize each variable (mean 0, sd 1)
# k-means clustering loop
ratiowss=vector()
for (i in 2:30){
  km=kmeans(st.food,i,nstart=50)
  ratiowss[i]=km$tot.withinss/km$totss
}
dt=data.frame("K"=2:30,"ratiowss"=ratiowss[2:30])
ggplot(dt,aes(x=K,y=ratiowss))+geom_line(size=3)

The plot gives no clear indication of the number of groups; there may be many.
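To make the ratio SSW / SST concrete, the sketch below recomputes both sums of squares by hand and compares them with the components returned by `kmeans`. It runs on simulated toy data (two hypothetical, well-separated groups), not on the food data; the names `x`, `km.toy`, `sst` and `ssw` are illustrative.

```r
set.seed(1)
# Toy data: two simulated groups in 2 dimensions, then standardized
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
x <- scale(x)
km.toy <- kmeans(x, centers = 2, nstart = 10)

# SST: squared distances of all points from the overall centroid
sst <- sum(scale(x, scale = FALSE)^2)

# SSW: squared distances of each point from the centroid of its own cluster
ssw <- sum((x - km.toy$centers[km.toy$cluster, ])^2)

all.equal(sst, km.toy$totss)
all.equal(ssw, km.toy$tot.withinss)
ssw / sst   # the quantity plotted against k
```

The ratio always lies between 0 and 1 and can only decrease (up to the randomness of the starting points) as k grows, which is why one looks for a bend in the curve rather than for its minimum.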


K=7 clusters?

Let us analyze the case k=7 in detail to gain more information.

set.seed(1)
km=kmeans(st.food,7,nstart=50)
km

## K-means clustering with 7 clusters of sizes 25, 20, 8, 10, 12, 9, 2
##
## Cluster means:
## Bovine.Meat..kcal.day. Butter..Ghee..kcal.day. Cheese..kcal.day.
## 1 0.32989097 0.97331797 1.3370428
## 2 -0.04616923 -0.37423100 -0.6170505
## 3 -0.34668805 0.05005711 -0.3388631
## 4 -0.80878720 -0.71539835 -0.7111165
## 5 0.94149481 -0.24355357 -0.3200256
## 6 -0.77951292 -0.77908292 -0.7490346
## 7 -0.37241740 -0.08020678 -0.3406861
## Eggs..kcal.day. Fats..Animals..Raw..kcal.day. Fish..Seafood..kcal.day.
## 1 0.86876834 0.9721236 0.4105214
## 2 -0.51299141 -0.2773722 -0.5053352
## 3 -0.04621414 -0.6696613 -0.3617156
## 4 0.07140877 -0.5979779 0.9067984
## 5 0.05902741 -0.1011835 -0.6136138
## 6 -1.21350123 -0.7479965 -0.4256301
## 7 -0.79528641 0.2637976 2.4317222
## Freshwater.Fish..kcal.day. Honey..kcal.day.
## 1 0.1471113 0.78747214
## 2 -0.3871934 -0.51401317
## 3 -0.2142599 0.02472802
## 4 1.3779202 -0.66996457
## 5 -0.6813789 -0.23519098
## 6 0.2329817 -0.37696498
## 7 -0.9596626 1.65512902
## Milk...Excluding.Butter..kcal.day. Milk..Whole..kcal.day.
## 1 1.0535298 0.32838020
## 2 -0.4229140 -0.08508669
## 3 -0.3021864 -0.14812903
## 4 -1.0287812 -0.92110764
## 5 0.5812926 1.22441443
## 6 -1.2292098 -1.10512312
## 7 -0.5436417 -0.42926379
## Mutton...Goat.Meat..kcal.day. Offals..Edible..kcal.day.
## 1 0.1696441 0.1493913
## 2 -0.2607088 -0.1720812
## 3 0.2005357 -0.1580227
## 4 -0.3949278 -0.4204492
## 5 0.3691971 0.3730788
## 6 -0.3989520 -0.6891241
## 7 1.2391349 3.4503428
## Pelagic.Fish..kcal.day. Pigmeat..kcal.day. Poultry.Meat..kcal.day.
## 1 0.1889683 1.1434802 0.47555592
## 2 -0.5080530 -0.5858025 -0.07232933
## 3 -0.3086798 -0.8932905 0.63235391
## 4 0.9676226 -0.1354736 -0.49943904
## 5 -0.4787488 -0.1880691 -0.36535016
## 6 -0.2798989 -0.8142066 -1.06658298
## 7 3.2470704 0.6073976 1.73834829
## Vegetal.Products....kcal.day. Alcoholic.Beverages..kcal.day.
## 1 0.3033270 1.1694372
## 2 -0.2773750 -0.4859975
## 3 1.5493774 -1.0598159
## 4 -0.1677874 -0.6173812
## 5 -0.4022675 0.0728212
## 6 -0.7528943 -0.6614677
## 7 -0.5747812 0.1078569
## Apples..kcal.day. Bananas..kcal.day. Beans..kcal.day.
## 1 0.9366070 -0.11602552 -0.3689216
## 2 -0.7080348 0.29999252 0.9464497
## 3 0.3971556 -0.38524806 -0.3513856
## 4 -0.7191422 -0.04896357 -0.3765610
## 5 0.3508746 -0.45510426 -0.3354701
## 6 -0.9264811 -0.24824327 0.1709316
## 7 -0.5562331 4.08392411 -0.3210015
## Cereals...Excluding.Beer..kcal.day. Coconut.Oil..kcal.day.
## 1 -0.71526346 -0.006741647
## 2 -0.03225038 -0.063084068
## 3 1.47838912 -0.385689864
## 4 0.76943553 0.441271471
## 5 0.29245338 -0.468991830
## 6 -0.24036759 -0.312485106
## 7 -1.17050324 4.271647325
## Coffee..kcal.day. Fruits...Excluding.Wine..kcal.day. Nuts..kcal.day.
## 1 1.0535729 0.257912843 0.8367175
## 2 -0.3863959 0.003010899 -0.6365760
## 3 -0.5321418 0.944180840 0.9379361
## 4 -0.6195893 -0.788933794 -0.6365760
## 5 -0.4835598 -0.683285121 -0.3377103
## 6 -0.6293057 -0.285818187 -0.4748206
## 7 1.6540458 2.299818650 -0.4991186
## Olive.Oil..kcal.day. Palm.Oil..kcal.day. Potatoes..kcal.day.
## 1 0.6164991 -0.3428790 0.7463559
## 2 -0.3174581 0.2483901 -0.4593402
## 3 0.1664928 0.1818258 -0.2292374
## 4 -0.3278656 0.3309446 -0.8565629
## 5 -0.2971767 -0.6505372 0.8151124
## 6 -0.3287552 0.8299079 -0.7908761
## 7 -0.2958424 -0.4113025 -0.8680144
## Pulses..kcal.day. Rice..Paddy.Equivalent...kcal.day.
## 1 -0.5909183 -0.6156692
## 2 0.9182791 0.1290478
## 3 0.5015319 -0.1319391
## 4 -0.4611223 1.9996180
## 5 -0.7911267 -0.6622619
## 6 0.7783029 0.2516673
## 7 -0.2524306 -0.2238785
## Roots...Tuber.Dry.Equiv..kcal.day. Soyabean.Oil..kcal.day.
## 1 -0.1431194 0.1467725
## 2 -0.2107852 0.4359844
## 3 -0.5908237 0.5243360
## 4 -0.4999269 -0.0941256
## 5 -0.1047665 -0.5604882
## 6 1.9954806 -0.7791927
## 7 0.4087093 -0.9519196
## Starchy.Roots..kcal.day. Sugar...Sweeteners..kcal.day.
## 1 -0.1431194 0.8414865
## 2 -0.2107852 0.2490932
## 3 -0.5908237 -0.1025423
## 4 -0.4999269 -0.6526871
## 5 -0.1047665 -0.2690550
## 6 1.9954806 -1.6746178
## 7 0.4087093 -0.1857987
## Vegetable.Oils..kcal.day. Vegetables..kcal.day. Wheat..kcal.day.
## 1 0.7143137 0.6079541 0.3275307
## 2 -0.2000841 -0.7375136 -0.5415179
## 3 0.3354790 1.2793650 1.5343514
## 4 -0.2970543 -0.3366515 -0.9221915
## 5 -0.4046467 0.2992595 0.9555595
## 6 -0.7252155 -0.9854983 -1.2817166
## 7 -1.0933747 -1.0193077 -0.1710360
## Wine..kcal.day.
## 1 1.10091380
## 2 -0.54867585
## 3 -0.59679222
## 4 -0.59679222
## 5 0.02471093
## 6 -0.61238549
## 7 -0.29606489
##
## Clustering vector:
## [1] 5 5 1 1 4 5 1 2 5 2 5 6 1 5 4 2 2 1 1 1 2 2 3 1 6 2 1 1 4 5 1 6 1 2 2
## [36] 1 1 2 4 3 3 1 2 4 2 2 6 6 6 4 1 2 2 5 3 1 1 6 2 2 2 4 1 1 5 5 7 7 3 4
## [71] 6 2 1 4 1 1 6 4 2 3 3 5 3 1 1 5
##
## Within cluster sum of squares by cluster:
## [1] 605.59446 367.94875 135.06515 174.00272 232.60861 163.25349 68.95847
## (between_SS / total_SS = 45.9 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss"
## [5] "tot.withinss" "betweenss" "size" "iter"
## [9] "ifault"
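The percentage reported in the printout can be reproduced from the numbers it shows. The sketch below assumes only that the data were standardized with `scale()`, in which case the total sum of squares equals (n - 1) times the number of variables; the object names are illustrative.

```r
# Within-cluster sums of squares copied from the kmeans printout
withinss <- c(605.59446, 367.94875, 135.06515, 174.00272,
              232.60861, 163.25349, 68.95847)

n <- 86   # countries
p <- 38   # standardized variables, each with sample variance 1

# For data standardized with scale(), every column contributes (n - 1) to SST
totss <- (n - 1) * p

# Share of the total variability explained by the partition, in percent
between.pct <- 100 * (1 - sum(withinss) / totss)
round(between.pct, 1)
```

The rounded value agrees with the 45.9 % shown as between_SS / total_SS, so with k=7 slightly more than half of the variability is still within the clusters.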


Comments

Under Cluster means we find the coordinates of the centroid of each group.

The centroids lie in a 38-dimensional space.

The values are standardized, so the centroid coordinates are expressed in standard deviations from the overall mean.

For example, the first group has above-average consumption of bovine meat (0.33 SD), butter (0.97 SD) and cheese (1.34 SD), where the average is the overall mean across all countries. This does not mean it is the highest among the groups: for bovine meat, group 5 is higher (0.94 SD).
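A centroid coordinate in standard deviations can be translated back to the original units with the centering and scaling values that `scale()` stores as attributes. This is a minimal sketch on a toy vector (the first five Cheese values shown by `str(food)`), so the resulting numbers are illustrative and are not those of the full data set.

```r
# Toy sample: first five Cheese values (kcal/day) from the str(food) output
cheese <- c(50, 90, 107, 193, 0)

z <- scale(cheese)                 # standardized values, as in st.food
ctr <- attr(z, "scaled:center")    # mean used for centering
sdv <- attr(z, "scaled:scale")     # standard deviation used for scaling

# A coordinate of 1.34 SD corresponds, in kcal/day, to mean + 1.34 * sd
back <- ctr + 1.34 * sdv
back

# Sanity check: undoing the standardization recovers the original values
restored <- as.numeric(z * sdv + ctr)
```

On the lab's objects the same attributes are available on `st.food`, one value per column, so a whole row of `km$centers` could be converted in the same way.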

There are too many values to inspect one by one, so it is worth carrying out an exploratory analysis of the results.


3 EDA of the results (k = 7)

To carry out an EDA, we put the centroid values in a data.frame, which we will call dt.

dt=data.frame(km$centers)
dt

## Bovine.Meat..kcal.day. Butter..Ghee..kcal.day. Cheese..kcal.day.
## 1 0.32989097 0.97331797 1.3370428
## 2 -0.04616923 -0.37423100 -0.6170505
## 3 -0.34668805 0.05005711 -0.3388631
## 4 -0.80878720 -0.71539835 -0.7111165
## 5 0.94149481 -0.24355357 -0.3200256
## 6 -0.77951292 -0.77908292 -0.7490346
## 7 -0.37241740 -0.08020678 -0.3406861
## Eggs..kcal.day. Fats..Animals..Raw..kcal.day. Fish..Seafood..kcal.day.
## 1 0.86876834 0.9721236 0.4105214
## 2 -0.51299141 -0.2773722 -0.5053352
## 3 -0.04621414 -0.6696613 -0.3617156
## 4 0.07140877 -0.5979779 0.9067984
## 5 0.05902741 -0.1011835 -0.6136138
## 6 -1.21350123 -0.7479965 -0.4256301
## 7 -0.79528641 0.2637976 2.4317222
## Freshwater.Fish..kcal.day. Honey..kcal.day.
## 1 0.1471113 0.78747214
## 2 -0.3871934 -0.51401317
## 3 -0.2142599 0.02472802
## 4 1.3779202 -0.66996457
## 5 -0.6813789 -0.23519098
## 6 0.2329817 -0.37696498
## 7 -0.9596626 1.65512902
## Milk...Excluding.Butter..kcal.day. Milk..Whole..kcal.day.
## 1 1.0535298 0.32838020
## 2 -0.4229140 -0.08508669
## 3 -0.3021864 -0.14812903
## 4 -1.0287812 -0.92110764
## 5 0.5812926 1.22441443
## 6 -1.2292098 -1.10512312
## 7 -0.5436417 -0.42926379
## Mutton...Goat.Meat..kcal.day. Offals..Edible..kcal.day.
## 1 0.1696441 0.1493913
## 2 -0.2607088 -0.1720812
## 3 0.2005357 -0.1580227
## 4 -0.3949278 -0.4204492
## 5 0.3691971 0.3730788
## 6 -0.3989520 -0.6891241
## 7 1.2391349 3.4503428
## Pelagic.Fish..kcal.day. Pigmeat..kcal.day. Poultry.Meat..kcal.day.
## 1 0.1889683 1.1434802 0.47555592
## 2 -0.5080530 -0.5858025 -0.07232933
## 3 -0.3086798 -0.8932905 0.63235391
## 4 0.9676226 -0.1354736 -0.49943904
## 5 -0.4787488 -0.1880691 -0.36535016
## 6 -0.2798989 -0.8142066 -1.06658298
## 7 3.2470704 0.6073976 1.73834829
## Vegetal.Products....kcal.day. Alcoholic.Beverages..kcal.day.
## 1 0.3033270 1.1694372
## 2 -0.2773750 -0.4859975
## 3 1.5493774 -1.0598159
## 4 -0.1677874 -0.6173812
## 5 -0.4022675 0.0728212
## 6 -0.7528943 -0.6614677
## 7 -0.5747812 0.1078569
## Apples..kcal.day. Bananas..kcal.day. Beans..kcal.day.
## 1 0.9366070 -0.11602552 -0.3689216
## 2 -0.7080348 0.29999252 0.9464497
## 3 0.3971556 -0.38524806 -0.3513856
## 4 -0.7191422 -0.04896357 -0.3765610
## 5 0.3508746 -0.45510426 -0.3354701
## 6 -0.9264811 -0.24824327 0.1709316
## 7 -0.5562331 4.08392411 -0.3210015
## Cereals...Excluding.Beer..kcal.day. Coconut.Oil..kcal.day.
## 1 -0.71526346 -0.006741647
## 2 -0.03225038 -0.063084068
## 3 1.47838912 -0.385689864
## 4 0.76943553 0.441271471
## 5 0.29245338 -0.468991830
## 6 -0.24036759 -0.312485106
## 7 -1.17050324 4.271647325
## Coffee..kcal.day. Fruits...Excluding.Wine..kcal.day. Nuts..kcal.day.
## 1 1.0535729 0.257912843 0.8367175
## 2 -0.3863959 0.003010899 -0.6365760
## 3 -0.5321418 0.944180840 0.9379361
## 4 -0.6195893 -0.788933794 -0.6365760
## 5 -0.4835598 -0.683285121 -0.3377103
## 6 -0.6293057 -0.285818187 -0.4748206
## 7 1.6540458 2.299818650 -0.4991186
## Olive.Oil..kcal.day. Palm.Oil..kcal.day. Potatoes..kcal.day.
## 1 0.6164991 -0.3428790 0.7463559
## 2 -0.3174581 0.2483901 -0.4593402
## 3 0.1664928 0.1818258 -0.2292374
## 4 -0.3278656 0.3309446 -0.8565629
## 5 -0.2971767 -0.6505372 0.8151124
## 6 -0.3287552 0.8299079 -0.7908761
## 7 -0.2958424 -0.4113025 -0.8680144
## Pulses..kcal.day. Rice..Paddy.Equivalent...kcal.day.
## 1 -0.5909183 -0.6156692
## 2 0.9182791 0.1290478
## 3 0.5015319 -0.1319391
## 4 -0.4611223 1.9996180
## 5 -0.7911267 -0.6622619
## 6 0.7783029 0.2516673
## 7 -0.2524306 -0.2238785
## Roots...Tuber.Dry.Equiv..kcal.day. Soyabean.Oil..kcal.day.
## 1 -0.1431194 0.1467725
## 2 -0.2107852 0.4359844
## 3 -0.5908237 0.5243360
## 4 -0.4999269 -0.0941256
## 5 -0.1047665 -0.5604882
## 6 1.9954806 -0.7791927
## 7 0.4087093 -0.9519196
## Starchy.Roots..kcal.day. Sugar...Sweeteners..kcal.day.
## 1 -0.1431194 0.8414865
## 2 -0.2107852 0.2490932
## 3 -0.5908237 -0.1025423
## 4 -0.4999269 -0.6526871
## 5 -0.1047665 -0.2690550
## 6 1.9954806 -1.6746178
## 7 0.4087093 -0.1857987
## Vegetable.Oils..kcal.day. Vegetables..kcal.day. Wheat..kcal.day.
## 1 0.7143137 0.6079541 0.3275307
## 2 -0.2000841 -0.7375136 -0.5415179
## 3 0.3354790 1.2793650 1.5343514
## 4 -0.2970543 -0.3366515 -0.9221915
## 5 -0.4046467 0.2992595 0.9555595
## 6 -0.7252155 -0.9854983 -1.2817166
## 7 -1.0933747 -1.0193077 -0.1710360
## Wine..kcal.day.
## 1 1.10091380
## 2 -0.54867585
## 3 -0.59679222
## 4 -0.59679222
## 5 0.02471093
## 6 -0.61238549
## 7 -0.29606489


EDA - continued

For a better view, transpose the data.frame just created.

tdt=t(dt)
tdt

## 1 2 3 ## Bovine.Meat..kcal.day. 0.329890974 -0.046169227 -0.34668805 ## Butter..Ghee..kcal.day. 0.973317975 -0.374231004 0.05005711 ## Cheese..kcal.day. 1.337042802 -0.617050500 -0.33886310 ## Eggs..kcal.day. 0.868768338 -0.512991406 -0.04621414 ## Fats..Animals..Raw..kcal.day. 0.972123552 -0.277372209 -0.66966126 ## Fish..Seafood..kcal.day. 0.410521440 -0.505335175 -0.36171562 ## Freshwater.Fish..kcal.day. 0.147111285 -0.387193352 -0.21425993 ## Honey..kcal.day. 0.787472140 -0.514013175 0.02472802 ## Milk...Excluding.Butter..kcal.day. 1.053529776 -0.422914031 -0.30218635 ## Milk..Whole..kcal.day. 0.328380197 -0.085086688 -0.14812903 ## Mutton...Goat.Meat..kcal.day. 0.169644062 -0.260708836 0.20053573 ## Offals..Edible..kcal.day. 0.149391344 -0.172081217 -0.15802265 ## Pelagic.Fish..kcal.day. 0.188968255 -0.508052983 -0.30867979 ## Pigmeat..kcal.day. 1.143480184 -0.585802544 -0.89329048 ## Poultry.Meat..kcal.day. 0.475555919 -0.072329333 0.63235391 ## Vegetal.Products....kcal.day. 0.303327021 -0.277374993 1.54937743 ## Alcoholic.Beverages..kcal.day. 1.169437229 -0.485997506 -1.05981586 ## Apples..kcal.day. 0.936606994 -0.708034798 0.39715560 ## Bananas..kcal.day. -0.116025517 0.299992516 -0.38524806 ## Beans..kcal.day. -0.368921587 0.946449740 -0.35138562 ## Cereals...Excluding.Beer..kcal.day. -0.715263459 -0.032250379 1.47838912 ## Coconut.Oil..kcal.day. -0.006741647 -0.063084068 -0.38568986 ## Coffee..kcal.day. 1.053572931 -0.386395940 -0.53214178 ## Fruits...Excluding.Wine..kcal.day. 0.257912843 0.003010899 0.94418084 ## Nuts..kcal.day. 0.836717456 -0.636575989 0.93793609 ## Olive.Oil..kcal.day. 0.616499079 -0.317458076 0.16649281 ## Palm.Oil..kcal.day. -0.342878963 0.248390072 0.18182582 ## Potatoes..kcal.day. 0.746355907 -0.459340243 -0.22923736 ## Pulses..kcal.day. -0.590918349 0.918279065 0.50153189 ## Rice..Paddy.Equivalent...kcal.day. -0.615669171 0.129047804 -0.13193911 ## Roots...Tuber.Dry.Equiv..kcal.day. 
-0.143119350 -0.210785158 -0.59082372 ## Soyabean.Oil..kcal.day. 0.146772472 0.435984362 0.52433602 ## Starchy.Roots..kcal.day. -0.143119350 -0.210785158 -0.59082372 ## Sugar...Sweeteners..kcal.day. 0.841486503 0.249093223 -0.10254234 ## Vegetable.Oils..kcal.day. 0.714313713 -0.200084147 0.33547902 ## Vegetables..kcal.day. 0.607954119 -0.737513573 1.27936499 ## Wheat..kcal.day. 0.327530728 -0.541517897 1.53435142 ## Wine..kcal.day. 1.100913805 -0.548675851 -0.59679222 ## 4 5 6 ## Bovine.Meat..kcal.day. -0.80878720 0.94149481 -0.7795129 ## Butter..Ghee..kcal.day. -0.71539835 -0.24355357 -0.7790829 ## Cheese..kcal.day. -0.71111649 -0.32002560 -0.7490346 ## Eggs..kcal.day. 0.07140877 0.05902741 -1.2135012 ## Fats..Animals..Raw..kcal.day. -0.59797792 -0.10118349 -0.7479965 ## Fish..Seafood..kcal.day. 0.90679842 -0.61361379 -0.4256301 ## Freshwater.Fish..kcal.day. 1.37792018 -0.68137894 0.2329817 ## Honey..kcal.day. -0.66996457 -0.23519098 -0.3769650 ## Milk...Excluding.Butter..kcal.day. -1.02878125 0.58129262 -1.2292098

Page 18: Emanuele Taufer - UniTrentotaufer/Labs/L-10_Cluster_Analysis_Food-DF.pdf · 11/18/2017 Cluster analysis - Dati World (1)

11/18/2017 Cluster analysis - Dati World (1)

file:///C:/Users/emanuele.taufer/Google%20Drive/2%20CORSI/3%20SQG/Labs/L-10_Cluster_Analysis_Food-DF.html#(1) 18/47

## Milk..Whole..kcal.day.              -0.92110764  1.22441443 -1.1051231
## Mutton...Goat.Meat..kcal.day.       -0.39492781  0.36919714 -0.3989520
## Offals..Edible..kcal.day.           -0.42044923  0.37307876 -0.6891241
## Pelagic.Fish..kcal.day.              0.96762261 -0.47874879 -0.2798989
## Pigmeat..kcal.day.                  -0.13547360 -0.18806915 -0.8142066
## Poultry.Meat..kcal.day.             -0.49943904 -0.36535016 -1.0665830
## Vegetal.Products....kcal.day.       -0.16778745 -0.40226747 -0.7528943
## Alcoholic.Beverages..kcal.day.      -0.61738122  0.07282120 -0.6614677
## Apples..kcal.day.                   -0.71914224  0.35087460 -0.9264811
## Bananas..kcal.day.                  -0.04896357 -0.45510426 -0.2482433
## Beans..kcal.day.                    -0.37656102 -0.33547013  0.1709316
## Cereals...Excluding.Beer..kcal.day.  0.76943553  0.29245338 -0.2403676
## Coconut.Oil..kcal.day.               0.44127147 -0.46899183 -0.3124851
## Coffee..kcal.day.                   -0.61958928 -0.48355983 -0.6293057
## Fruits...Excluding.Wine..kcal.day.  -0.78893379 -0.68328512 -0.2858182
## Nuts..kcal.day.                     -0.63657599 -0.33771027 -0.4748206
## Olive.Oil..kcal.day.                -0.32786562 -0.29717670 -0.3287552
## Palm.Oil..kcal.day.                  0.33094462 -0.65053719  0.8299079
## Potatoes..kcal.day.                 -0.85656294  0.81511241 -0.7908761
## Pulses..kcal.day.                   -0.46112228 -0.79112666  0.7783029
## Rice..Paddy.Equivalent...kcal.day.   1.99961800 -0.66226191  0.2516673
## Roots...Tuber.Dry.Equiv..kcal.day.  -0.49992687 -0.10476653  1.9954806
## Soyabean.Oil..kcal.day.             -0.09412560 -0.56048816 -0.7791927
## Starchy.Roots..kcal.day.            -0.49992687 -0.10476653  1.9954806
## Sugar...Sweeteners..kcal.day.       -0.65268711 -0.26905500 -1.6746178
## Vegetable.Oils..kcal.day.           -0.29705432 -0.40464666 -0.7252155
## Vegetables..kcal.day.               -0.33665152  0.29925947 -0.9854983
## Wheat..kcal.day.                    -0.92219149  0.95555953 -1.2817166
## Wine..kcal.day.                     -0.59679222  0.02471093 -0.6123855
##                                               7
## Bovine.Meat..kcal.day.              -0.37241740
## Butter..Ghee..kcal.day.             -0.08020678
## Cheese..kcal.day.                   -0.34068608
## Eggs..kcal.day.                     -0.79528641
## Fats..Animals..Raw..kcal.day.        0.26379762
## Fish..Seafood..kcal.day.             2.43172224
## Freshwater.Fish..kcal.day.          -0.95966261
## Honey..kcal.day.                     1.65512902
## Milk...Excluding.Butter..kcal.day.  -0.54364171
## Milk..Whole..kcal.day.              -0.42926379
## Mutton...Goat.Meat..kcal.day.        1.23913493
## Offals..Edible..kcal.day.            3.45034283
## Pelagic.Fish..kcal.day.              3.24707045
## Pigmeat..kcal.day.                   0.60739764
## Poultry.Meat..kcal.day.              1.73834829
## Vegetal.Products....kcal.day.       -0.57478120
## Alcoholic.Beverages..kcal.day.       0.10785686
## Apples..kcal.day.                   -0.55623310
## Bananas..kcal.day.                   4.08392411
## Beans..kcal.day.                    -0.32100151
## Cereals...Excluding.Beer..kcal.day. -1.17050324
## Coconut.Oil..kcal.day.               4.27164732
## Coffee..kcal.day.                    1.65404578
## Fruits...Excluding.Wine..kcal.day.   2.29981865
## Nuts..kcal.day.                     -0.49911859
## Olive.Oil..kcal.day.                -0.29584240
## Palm.Oil..kcal.day.                 -0.41130255
## Potatoes..kcal.day.                 -0.86801441
## Pulses..kcal.day.                   -0.25243056
## Rice..Paddy.Equivalent...kcal.day.  -0.22387855
## Roots...Tuber.Dry.Equiv..kcal.day.   0.40870927
## Soyabean.Oil..kcal.day.             -0.95191956
## Starchy.Roots..kcal.day.             0.40870927
## Sugar...Sweeteners..kcal.day.       -0.18579867
## Vegetable.Oils..kcal.day.           -1.09337472


## Vegetables..kcal.day.               -1.01930769
## Wheat..kcal.day.                    -0.17103600
## Wine..kcal.day.                     -0.29606489


EDA - continued

Group 1 shows consumption well above the mean for many foods.

The values in the data.frame tdt are hard to read directly, so it is worth turning to graphical tools.

It can also be interesting to find, for each food, which group consumes the most of it. The code below does this.

maxims=vector()
for (i in 1:nrow(tdt)){maxims[i]=which.max(tdt[i,])}
# for each row of tdt, the code above gives the number of the column
# (i.e. the group) with the highest value
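The same result can be obtained without an explicit loop. The following is a minimal sketch, assuming that tdt is a numeric matrix with foods in rows and groups in columns; the small matrix `toy` is invented for illustration and is not the course data.

```r
# Toy stand-in for tdt: 3 foods (rows) x 3 groups (columns); values are made up
toy <- matrix(c( 0.5, -1.0, 0.2,
                -0.3,  1.2, 0.1,
                 0.0,  0.4, 2.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("Cheese", "Rice", "Bananas"), c("1", "2", "3")))

# Loop version, following the slides
maxims_loop <- integer(nrow(toy))
for (i in seq_len(nrow(toy))) maxims_loop[i] <- which.max(toy[i, ])

# Vectorized version: apply which.max to every row
maxims_apply <- unname(apply(toy, 1, which.max))
```

Both vectors hold, for each food, the index of the group whose centroid is largest.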


We put all the variables created so far into data.frames so that they can be analyzed with the graphical tools.

tdt=as.data.frame(tdt,keep.rownames = TRUE)
tdt$Aliments<-row.names(tdt)
colnames(tdt)<-c("G1","G2","G3","G4","G5","G6","G7","Aliments")
world=as.data.frame(world)
world$Cluster=km$cluster

Note that the column holding the food names in the data.frame tdt is column 8.

Note that the column holding the country names in the data.frame world is column 1.


Analysis of group 1

Foods with the highest consumption compared with the other groups

tdt[maxims==1,8]

##  [1] "Butter..Ghee..kcal.day."
##  [2] "Cheese..kcal.day."
##  [3] "Eggs..kcal.day."
##  [4] "Fats..Animals..Raw..kcal.day."
##  [5] "Milk...Excluding.Butter..kcal.day."
##  [6] "Pigmeat..kcal.day."
##  [7] "Alcoholic.Beverages..kcal.day."
##  [8] "Apples..kcal.day."
##  [9] "Olive.Oil..kcal.day."
## [10] "Sugar...Sweeteners..kcal.day."
## [11] "Vegetable.Oils..kcal.day."
## [12] "Wine..kcal.day."

Countries in the group

world[km$cluster==1,1]

##  [1] Australia                Austria
##  [3] Belgium                  Canada
##  [5] Cyprus                   Czech Republic
##  [7] Denmark                  Estonia
##  [9] Finland                  France
## [11] Germany                  Greece
## [13] Hungary                  Iceland
## [15] Italy                    Malta
## [17] Netherlands              New Zealand
## [19] Poland                   Portugal
## [21] Spain                    Sweden
## [23] Switzerland              United Kingdom
## [25] United States of America
## 86 Levels: Albania Argentina Australia Austria Bangladesh ... Uzbekistan


Plot of the foods with the highest consumption

## order the foods by the centroid values in the group and plot
tdt$Aliments <- with(tdt, factor(tdt$Aliments, levels=tdt[order(-tdt$G1), ]$Aliments))
ggplot(tdt[tdt$G1>0,],aes(Aliments,G1,fill=Aliments))+geom_bar(stat="identity")+ggtitle("Cluster 1: Alimenti
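The ordering step is repeated for every group, so it could be wrapped in a small helper. The sketch below rests on assumptions: `order_foods` is a hypothetical name, `toy` mimics the layout of tdt (one G column per group plus Aliments) with invented values, and the centroids are assumed to come from standardized data, as the name st.food suggests, so that values above 0 mean above-average consumption.

```r
# Toy stand-in for tdt; values are made up for illustration
toy <- data.frame(G1 = c(0.8, -0.4, 1.5),
                  G2 = c(-0.2, 1.1, 0.3),
                  Aliments = c("Cheese", "Beans", "Wine"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: order the factor levels by decreasing centroid value
# in the chosen group, then keep only the foods with a positive centroid
order_foods <- function(d, group) {
  d$Aliments <- factor(d$Aliments, levels = d$Aliments[order(-d[[group]])])
  d[d[[group]] > 0, ]
}

g1 <- order_foods(toy, "G1")
levels(g1$Aliments)  # foods sorted by decreasing G1 centroid
```

The returned data.frame can then be passed to ggplot with aes(Aliments, G1, fill=Aliments) and geom_bar(stat="identity"), as in the slides; the bars follow the order of the factor levels.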


Analysis of group 2

Foods with the highest consumption compared with the other groups

tdt[maxims==2,8]

## [1] Beans..kcal.day.  Pulses..kcal.day.
## 38 Levels: Cheese..kcal.day. ... Cereals...Excluding.Beer..kcal.day.

Countries in the group

world[km$cluster==2,1]

##  [1] Bolivia             Brazil              Colombia
##  [4] Costa Rica          Dominican Republic  Ecuador
##  [7] Fiji                Guatemala           Haiti
## [10] India               Jamaica             Kenya
## [13] Lesotho             Mauritius           Mexico
## [16] Pakistan            Paraguay            Peru
## [19] South Africa        Trinidad and Tobago
## 86 Levels: Albania Argentina Australia Austria Bangladesh ... Uzbekistan


Plot of the foods with the highest consumption

## order the foods by the centroid values in the group and plot
tdt$Aliments <- with(tdt, factor(tdt$Aliments, levels=tdt[order(-tdt$G2), ]$Aliments))
ggplot(tdt[tdt$G2>0,],aes(Aliments,G2,fill=Aliments))+geom_bar(stat="identity")+ggtitle("Cluster 2: Alimenti


Analysis of group 3

Foods with the highest consumption compared with the other groups

tdt[maxims==3,8]

## [1] Vegetal.Products....kcal.day.       Cereals...Excluding.Beer..kcal.day.
## [3] Nuts..kcal.day.                     Soyabean.Oil..kcal.day.
## [5] Vegetables..kcal.day.               Wheat..kcal.day.
## 38 Levels: Beans..kcal.day. Pulses..kcal.day. ... Vegetables..kcal.day.

Countries in the group

world[km$cluster==3,1]

## [1] Egypt                     Iran, Islamic Republic of
## [3] Israel                    Morocco
## [5] Saudi Arabia              Tunisia
## [7] Turkey                    United Arab Emirates
## 86 Levels: Albania Argentina Australia Austria Bangladesh ... Uzbekistan


Plot of the foods with the highest consumption

## order the foods by the centroid values in the group and plot
tdt$Aliments <- with(tdt, factor(tdt$Aliments, levels=tdt[order(-tdt$G3), ]$Aliments))
ggplot(tdt[tdt$G3>0,],aes(Aliments,G3,fill=Aliments))+geom_bar(stat="identity")+ggtitle("Cluster 3: Alimenti


Analysis of group 4

Foods with the highest consumption compared with the other groups

tdt[maxims==4,8]

## [1] Freshwater.Fish..kcal.day.         Rice..Paddy.Equivalent...kcal.day.
## 38 Levels: Vegetal.Products....kcal.day. ... Alcoholic.Beverages..kcal.day.

Countries in the group

world[km$cluster==4,1]

## [1] Bangladesh  China       Gambia      Indonesia   Japan
## [6] Malaysia    Philippines Senegal     Sri Lanka   Thailand
## 86 Levels: Albania Argentina Australia Austria Bangladesh ... Uzbekistan


Plot of the foods with the highest consumption

## order the foods by the centroid values in the group and plot
tdt$Aliments <- with(tdt, factor(tdt$Aliments, levels=tdt[order(-tdt$G4), ]$Aliments))
ggplot(tdt[tdt$G4>0,],aes(Aliments,G4,fill=Aliments))+geom_bar(stat="identity")+ggtitle("Cluster 4: Alimenti


Analysis of group 5

Foods with the highest consumption compared with the other groups

tdt[maxims==5,8]

## [1] Bovine.Meat..kcal.day. Milk..Whole..kcal.day. Potatoes..kcal.day.
## 38 Levels: Rice..Paddy.Equivalent...kcal.day. ...

Countries in the group

world[km$cluster==5,1]

##  [1] Albania                Argentina              Belarus
##  [4] Bosnia and Herzegovina Bulgaria               Chile
##  [7] Georgia                Mongolia               Romania
## [10] Russian Federation     Ukraine                Uzbekistan
## 86 Levels: Albania Argentina Australia Austria Bangladesh ... Uzbekistan


Plot of the foods with the highest consumption

## order the foods by the centroid values in the group and plot
tdt$Aliments <- with(tdt, factor(tdt$Aliments, levels=tdt[order(-tdt$G5), ]$Aliments))
ggplot(tdt[tdt$G5>0,],aes(Aliments,G5,fill=Aliments))+geom_bar(stat="identity")+ggtitle("Cluster 5: Alimenti


Analysis of group 6

Foods with the highest consumption compared with the other groups

tdt[maxims==6,8]

## [1] Palm.Oil..kcal.day.                Roots...Tuber.Dry.Equiv..kcal.day.
## [3] Starchy.Roots..kcal.day.
## 38 Levels: Milk..Whole..kcal.day. ... Pulses..kcal.day.

Countries in the group

world[km$cluster==6,1]

## [1] Cameroon     Ethiopia     Ghana        Liberia      Madagascar
## [6] Malawi       Nigeria      Sierra Leone Tanzania
## 86 Levels: Albania Argentina Australia Austria Bangladesh ... Uzbekistan


Plot of the foods with the highest consumption

## order the foods by the centroid values in the group and plot
tdt$Aliments <- with(tdt, factor(tdt$Aliments, levels=tdt[order(-tdt$G6), ]$Aliments))
ggplot(tdt[tdt$G6>0,],aes(Aliments,G6,fill=Aliments))+geom_bar(stat="identity")+ggtitle("Cluster 6: Alimenti


Analysis of group 7

Foods with the highest consumption compared with the other groups

tdt[maxims==7,8]

## [1] Fish..Seafood..kcal.day.           Honey..kcal.day.
## [3] Mutton...Goat.Meat..kcal.day.      Offals..Edible..kcal.day.
## [5] Pelagic.Fish..kcal.day.            Poultry.Meat..kcal.day.
## [7] Bananas..kcal.day.                 Coconut.Oil..kcal.day.
## [9] Coffee..kcal.day.                  Fruits...Excluding.Wine..kcal.day.
## 38 Levels: Roots...Tuber.Dry.Equiv..kcal.day. ... Sugar...Sweeteners..kcal.day.

Countries in the group

world[km$cluster==7,1]

## [1] Saint Lucia Samoa
## 86 Levels: Albania Argentina Australia Austria Bangladesh ... Uzbekistan


Plot of the foods with the highest consumption

## order the foods by the centroid values in the group and plot
tdt$Aliments <- with(tdt, factor(tdt$Aliments, levels=tdt[order(-tdt$G7), ]$Aliments))
ggplot(tdt[tdt$G7>0,],aes(Aliments,G7,fill=Aliments))+geom_bar(stat="identity")+ggtitle("Cluster 7: Alimenti


World map

library(rworldmap)
jd=joinCountryData2Map(world,joinCode="NAME",nameJoinColumn = "Countries")

FALSE 86 codes from your data successfully matched countries in the map
FALSE 0 codes from your data failed to match with a country code in the map
FALSE 157 codes from the map weren't represented in your data

mapCountryData(jd, nameColumnToPlot="Cluster", catMethod="categorical",colourPalette = "rainbow",addLegend =


Comments

The map appears to identify correctly several areas of the world with similar diets.

One could repeat the analysis with a larger number of groups to see whether the world's different diets are represented more precisely (for example, this solution shows no sign of a "Mediterranean diet").
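One common way to judge whether more groups help is to track the total within-group sum of squares as the number of groups grows. The following is a minimal sketch on simulated data; the matrix `x` only stands in for the standardized food data (st.food), and the range 1 to 10 is an arbitrary choice.

```r
set.seed(1)  # k-means starts from random centers; fix the seed for reproducibility

# Simulated standardized data standing in for st.food (not the World data)
x <- scale(matrix(rnorm(60 * 4), ncol = 4))

# Total within-group sum of squares for k = 1, ..., 10
ks  <- 1:10
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)

# Look for the "elbow" where adding groups stops paying off
plot(ks, wss, type = "b", xlab = "Number of groups k",
     ylab = "Total within-group sum of squares")
```

On the real data, `x` would be replaced by st.food; a k beyond which the curve flattens is a reasonable candidate for the number of groups.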

If this solution is considered satisfactory, the groups could be analyzed further through the principal components (PCs), trying to identify general features of the diets (for example in relation to calorie intake, fats, etc.).
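A possible starting point for the principal-component analysis is sketched below with base R's prcomp. The data are simulated and the choice of 3 groups is arbitrary; on the real data `x` would be st.food and the colours would come from km$cluster.

```r
set.seed(1)

# Simulated stand-in for st.food: 40 "countries", 5 standardized "foods"
x  <- scale(matrix(rnorm(40 * 5), ncol = 5))
km <- kmeans(x, centers = 3, nstart = 20)

# Principal components of the standardized data
pc <- prcomp(x)

# Share of variance explained by each component
var_share <- pc$sdev^2 / sum(pc$sdev^2)

# Countries on the first two components, coloured by k-means group
plot(pc$x[, 1:2], col = km$cluster, pch = 19)
```

The loadings in pc$rotation show which foods drive each component, which is where general features of the diets could be read off.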


4 Hierarchical clustering: single linkage (min)

We now try hierarchical clustering, comparing it with the k-means algorithm and comparing different linkage methods with one another.

rownames(st.food)<-world$Countries
distance=dist(st.food)
hc=hclust(distance,method="single")


Dendrogram

plot(hc)
rect.hclust(hc, k = 8)


Getting the groups

# get cluster IDs
groups <- cutree(hc, k = 8)
Groups=as.data.table(groups,keep.rownames = TRUE)
setnames(Groups,"rn","Counties")
Groups[groups==1,]

##                      Counties groups
##  1:                   Albania      1
##  2:                 Australia      1
##  3:                   Austria      1
##  4:                Bangladesh      1
##  5:                   Belarus      1
##  6:                   Belgium      1
##  7:                   Bolivia      1
##  8:    Bosnia and Herzegovina      1
##  9:                    Brazil      1
## 10:                  Bulgaria      1
## 11:                  Cameroon      1
## 12:                    Canada      1
## 13:                     Chile      1
## 14:                  Colombia      1
## 15:                Costa Rica      1
## 16:                    Cyprus      1
## 17:            Czech Republic      1
## 18:                   Denmark      1
## 19:        Dominican Republic      1
## 20:                   Ecuador      1
## 21:                     Egypt      1
## 22:                   Estonia      1
## 23:                  Ethiopia      1
## 24:                      Fiji      1
## 25:                   Finland      1
## 26:                    France      1
## 27:                    Gambia      1
## 28:                   Georgia      1
## 29:                   Germany      1
## 30:                    Greece      1
## 31:                 Guatemala      1
## 32:                     Haiti      1
## 33:                   Hungary      1
## 34:                   Iceland      1
## 35:                     India      1
## 36:                 Indonesia      1
## 37: Iran, Islamic Republic of      1
## 38:                    Israel      1
## 39:                     Italy      1
## 40:                   Jamaica      1
## 41:                     Japan      1
## 42:                     Kenya      1
## 43:                   Lesotho      1
## 44:                   Liberia      1
## 45:                Madagascar      1
## 46:                    Malawi      1
## 47:                  Malaysia      1
## 48:                     Malta      1
## 49:                 Mauritius      1
## 50:                    Mexico      1


## 51:                   Morocco      1
## 52:               Netherlands      1
## 53:                   Nigeria      1
## 54:                  Pakistan      1
## 55:                  Paraguay      1
## 56:                      Peru      1
## 57:               Philippines      1
## 58:                    Poland      1
## 59:                  Portugal      1
## 60:                   Romania      1
## 61:        Russian Federation      1
## 62:              Saudi Arabia      1
## 63:                   Senegal      1
## 64:              Sierra Leone      1
## 65:              South Africa      1
## 66:                     Spain      1
## 67:                 Sri Lanka      1
## 68:                    Sweden      1
## 69:               Switzerland      1
## 70:                  Tanzania      1
## 71:                  Thailand      1
## 72:       Trinidad and Tobago      1
## 73:                   Tunisia      1
## 74:                    Turkey      1
## 75:                   Ukraine      1
## 76:      United Arab Emirates      1
## 77:            United Kingdom      1
## 78:  United States of America      1
## 79:                Uzbekistan      1
##                      Counties groups


Hierarchical clustering: Ward's method

rownames(st.food)<-world$Countries
distance=dist(st.food)
hc=hclust(distance,method="ward.D2")


Dendrogram

plot(hc)
rect.hclust(hc, k = 8)


Getting the groups

# get cluster IDs
groups <- cutree(hc, k = 8)
Groups=as.data.table(groups,keep.rownames = TRUE)
setnames(Groups,"rn","Country")
Groups[groups==1,]

##                    Country groups
##  1:                Albania      1
##  2:              Argentina      1
##  3:                Bolivia      1
##  4: Bosnia and Herzegovina      1
##  5:                 Brazil      1
##  6:               Bulgaria      1
##  7:                  Chile      1
##  8:               Colombia      1
##  9:             Costa Rica      1
## 10:     Dominican Republic      1
## 11:                Ecuador      1
## 12:                   Fiji      1
## 13:                Georgia      1
## 14:                  India      1
## 15:                Jamaica      1
## 16:              Mauritius      1
## 17:                 Mexico      1
## 18:               Mongolia      1
## 19:               Pakistan      1
## 20:               Paraguay      1
## 21:           Saudi Arabia      1
## 22:           South Africa      1
## 23:    Trinidad and Tobago      1
## 24:             Uzbekistan      1
##                    Country groups


Group 2

Groups[groups==2,]

##                      Country groups
##  1:                Australia      2
##  2:                  Belarus      2
##  3:                   Canada      2
##  4:                   Cyprus      2
##  5:           Czech Republic      2
##  6:                  Estonia      2
##  7:                  Finland      2
##  8:                  Iceland      2
##  9:                   Israel      2
## 10:                    Malta      2
## 11:              New Zealand      2
## 12:                   Poland      2
## 13:                  Romania      2
## 14:       Russian Federation      2
## 15:                   Sweden      2
## 16:                  Ukraine      2
## 17:     United Arab Emirates      2
## 18:           United Kingdom      2
## 19: United States of America      2


World map

world=data.frame(world,"Cluster.Ward"=groups)
jd=joinCountryData2Map(world,joinCode="NAME",nameJoinColumn = "Countries")

FALSE 86 codes from your data successfully matched countries in the map
FALSE 0 codes from your data failed to match with a country code in the map
FALSE 157 codes from the map weren't represented in your data

mapCountryData(jd, nameColumnToPlot="Cluster.Ward", catMethod="categorical",colourPalette = "rainbow",addLeg


Comments

Single linkage does not seem appropriate for the problem at hand: with k = 8 it places 79 of the 86 countries in group 1, leaving the other seven groups with one country each.
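A quick way to spot this kind of imbalance is to tabulate the group sizes produced by each linkage method. The sketch below uses simulated data, so the sizes it prints say nothing about the World data; on the real data `d` would be dist(st.food).

```r
set.seed(1)

# Simulated standardized data (not the World data)
x <- scale(matrix(rnorm(50 * 4), ncol = 4))
d <- dist(x)

# Group sizes with k = 8 under the two linkage methods
sizes_single <- table(cutree(hclust(d, method = "single"),  k = 8))
sizes_ward   <- table(cutree(hclust(d, method = "ward.D2"), k = 8))

sizes_single
sizes_ward
```

Single linkage tends to chain observations into one large group plus a few singletons, while Ward's method tends to give more balanced sizes.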

Ward's method gives a result very similar to k-means. Unlike k-means, it seems to identify a Mediterranean diet. If we want to use k-means, it is worth using more than 7 groups.
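The similarity between two partitions can be checked with a cross-tabulation of the group labels. This is a minimal sketch on simulated data, with 4 groups chosen arbitrarily for both methods; on the real data the two label vectors would be world$Cluster and world$Cluster.Ward.

```r
set.seed(1)

# Simulated standardized data (not the World data)
x <- scale(matrix(rnorm(50 * 4), ncol = 4))

km   <- kmeans(x, centers = 4, nstart = 20)
hc   <- hclust(dist(x), method = "ward.D2")
ward <- cutree(hc, k = 4)

# Rows: k-means groups; columns: Ward groups
agreement <- table(kmeans = km$cluster, ward = ward)
agreement
```

Group numbers are arbitrary labels, so agreement shows up as one dominant cell per row rather than as counts on the diagonal.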

Hierarchical clustering lets us vary the number of groups by "cutting" the dendrogram at a suitable height (with horizontal lines).
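In R, cutree accepts either the number of groups (k) or the height of the cut (h). The sketch below, on simulated data, shows that the two are equivalent when the height falls between the right pair of merges; the choice of 3 groups is arbitrary.

```r
set.seed(1)

# Simulated standardized data (not the World data)
x  <- scale(matrix(rnorm(30 * 3), ncol = 3))
hc <- hclust(dist(x), method = "ward.D2")

# Cut by number of groups ...
by_k <- cutree(hc, k = 3)

# ... or by height: any height between the merge that produces 3 groups
# and the merge that produces 2 groups gives the same partition
n    <- nrow(x)
h    <- mean(hc$height[c(n - 3, n - 2)])
by_h <- cutree(hc, h = h)

plot(hc)
abline(h = h, lty = 2)  # the horizontal "cut"
```

Moving the dashed line up or down changes how many branches it crosses, and therefore the number of groups.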

The k-means algorithm, by providing the group centroids, allows a more in-depth analysis of the groups' characteristics.
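The centroids are stored in the centers component of the object returned by kmeans. The sketch below uses simulated data with invented column names; it assumes that a table laid out like tdt (foods in rows, groups in columns) is the transpose of the centers, which the slides shown here do not confirm.

```r
set.seed(1)

# Simulated standardized data with made-up food names (not the World data)
x <- scale(matrix(rnorm(40 * 3), ncol = 3,
                  dimnames = list(NULL, c("Cheese", "Rice", "Wine"))))
km <- kmeans(x, centers = 3, nstart = 20)

# Centroids: one row per group, one column per variable
km$centers

# Transposed: foods in rows, groups in columns
tdt_toy <- t(km$centers)

# Each centroid is the mean of the observations assigned to that group
by_hand <- apply(x, 2, function(v) tapply(v, km$cluster, mean))
```

Because the data are standardized, a positive entry means the group consumes more of that food than the overall average.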