The Hierarchical Agglomerative Clustering with Gower index: a methodology for automatic design of OLAP cube in ecological data processing context

Lucile Sautot, Bruno Faivre, Ludovic Journaux, Paul Molin

To cite this version: Lucile Sautot, Bruno Faivre, Ludovic Journaux, Paul Molin. The Hierarchical Agglomerative Clustering with Gower index: a methodology for automatic design of OLAP cube in ecological data processing context. Ecological Informatics, Elsevier, 2014, pp.1-14. <10.1016/j.ecoinf.2014.07.011>. <hal-01060817>

HAL Id: hal-01060817
https://hal.archives-ouvertes.fr/hal-01060817
Submitted on 12 Sep 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


The Hierarchical Agglomerative Clustering with Gower index: a methodology for automatic design of OLAP cube in ecological data processing context

Lucile Sautot (a,c,d,*), Bruno Faivre (a), Ludovic Journaux (b), Paul Molin (c)

(a) UMR CNRS/uB 6282 Biogéosciences, Université de Bourgogne, 6 bd Gabriel 21000 Dijon, France
(b) Laboratoire Informatique, Electronique et Image, UFR Sciences et Techniques, Université de Bourgogne, allée Alain Savary 21000 Dijon, France
(c) DSIP, Agrosup Dijon, 26 bd Petitjean 21000 Dijon, France
(d) AgroParisTech, 19 avenue du Maine 75732 Paris, France

Abstract

OLAP systems can improve ecological studies. Ecology studies, follows and analyzes phenomena across space and time and according to several parameters, and OLAP systems let ecologists browse large datasets. One focus of current research on OLAP systems is the automatic design of OLAP cubes and data warehouse schemas. This kind of work makes OLAP technology accessible to people who are not Information Technology experts. But to be efficient, automatic OLAP building must take various cases into account.

Moreover, OLAP technology is based on the concept of hierarchy. Hierarchical clustering methods are therefore often used by OLAP system designers.

In this article, we propose using hierarchical agglomerative clustering with a metric that comes from ecological studies (the Gower similarity index) to automatically build hierarchical dimensions in an OLAP cube. With this similarity index we can perform hierarchical clustering on heterogeneous datasets that contain both qualitative and quantitative variables.

We present a prototype of an automatic system that builds a dimension for an OLAP cube, and we measure the performance of this system according to the number of clustered individuals and the number of variables used for clustering. From these measurements we can estimate performance on a large dataset.

Using the Gower index in hierarchical agglomerative clustering thus makes it possible to handle heterogeneous datasets with missing values when automatically building an OLAP cube. With this methodology, we can build new dimensions based on hierarchies in the data that are not evident. Data mining methods can complement expert knowledge during the design of an OLAP cube, because these methods can reveal the inherent structure of the data.

Keywords: OLAP; Hierarchical Agglomerative Clustering; Bird Population; Automatic Design

Introduction: use data mining for OLAP cube design

Since 1993, OLAP (On Line Analytical Processing) systems have been proposed to improve the decision-making process through the analysis of large datasets (Codd et al., 1993). This kind of software is designed to explore multidimensional data easily and quickly (Rivest et al., 2005). The word OLAP can be associated with a process, a kind of system or a kind of data (Jerbi et al., 2009). A basic Relational OLAP (ROLAP) system architecture consists of (i) a relational Data Base Management System (DBMS) that stores data in accordance with the data warehousing paradigm; (ii) an OLAP server that implements the multidimensional model and OLAP operators on top of the DBMS; (iii) an OLAP client that combines and synchronizes tabular and graphical displays and allows query building; (iv) an ETL tool that extracts data from heterogeneous sources, transforms them and loads them into a data warehouse.

In this paper, we focus on the design of the OLAP schema, which Usman et al. define as a collection of database objects, including tables, views, indexes and synonyms (Usman et al., 2010).

Several research works suggest models for OLAP schemas that either rely on existing models (Entity/Relationship, Object-Oriented, ...) or introduce new models (Lehner, 1998; Nguyen and Tjoa, 2000; Pedersen and Jensen, 1998; Tsois et al., 2001). Regardless of the methods chosen by the authors to define the rules of their models, these models are based on three concepts of multidimensional modeling: measures, dimensions and hierarchies (Jerbi et al., 2009).

*Corresponding author. Email address: l.sautot@agrosupdijon.fr

Preprint submitted to Elsevier, 22 June 2014


Measures are defined as dynamic and dependent variables (Nguyen and Tjoa, 2000). They quantify the objects covered by the analysis, called "facts". A fact often describes an event (for example, sales) that occurs within the organization that uses the decision-making system. The organization wishes to explain the fact (Wehrle et al., 2005).

Dimensions are defined as static and independent variables (Nguyen and Tjoa, 2000) that correspond to the analysis axes. A dimension guides the queries, which provide several views on the data (Wehrle et al., 2005).

The dimensions of an OLAP schema can contain one or more hierarchies. Hierarchies give structure to the dimensions: the data of a dimension can be categorized according to various characteristics. Users of an OLAP system are usually interested in aggregated data (for example, the average sales for some geographical areas). Thus hierarchies define aggregation levels of the data (Mahboubi et al., 2012; Markl et al., 1999; Sarawagi et al., 1998). Each level of a hierarchy contains descriptors, named "attributes" (Romero and Abello, 2010). These attributes describe each member of each level.

To design an OLAP cube, we have to determine:

• What are the measures? That is, what is the phenomenon we want to study and how do we measure it? For each measure, we have to choose an aggregation function: do we use sum, average or count to combine values?

• What are the dimensions? That is, what are the axes of our analysis? Which parameters do we want to consider to explain variations in the measure? For each dimension, we have to determine hierarchies, i.e. how data are organized within the dimension, and attributes for the dimension members.
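As an illustration of these two questions, the sketch below aggregates a measure along one level of a dimension hierarchy. The fact rows, the station-to-sector mapping and the function name `roll_up` are hypothetical and are not taken from the study; the sketch only shows that the same facts give different aggregates depending on the aggregation function chosen.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical facts: (species, station, year, abundance).
facts = [
    ("blackbird", "S1", 1990, 2.0),
    ("blackbird", "S2", 1990, 1.0),
    ("kingfisher", "S1", 1990, 0.5),
    ("blackbird", "S1", 1996, 3.0),
]

# Hypothetical hierarchy on the station dimension: station -> river sector.
station_to_sector = {"S1": "upstream", "S2": "upstream"}


def roll_up(facts, level_of, aggregate):
    """Group the measure by (year, hierarchy level) and aggregate each group."""
    groups = defaultdict(list)
    for _species, station, year, abundance in facts:
        groups[(year, level_of[station])].append(abundance)
    return {key: aggregate(values) for key, values in groups.items()}


# The same facts aggregated with two different aggregation functions.
totals = roll_up(facts, station_to_sector, sum)
averages = roll_up(facts, station_to_sector, mean)
```

Here `totals` and `averages` are indexed by (year, sector); choosing between them is exactly the choice of aggregation function mentioned in the first question.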

OLAP technology is of interest to more and more fields, especially biology. An OLAP cube provides very easy navigation in a data set, the possibility to build cross tabulations to analyze the data, and the possibility to monitor a complex phenomenon, such as pollution of a bay (Mahboubi et al., 2013) or growth of a forest (Miquel et al., 2002). But biologists generally do not have the skills to build and manage an OLAP system.

The high level of skill required is thus an obstacle to the democratization of OLAP systems. Our objective in this article is to propose an OLAP system that is able to automatically organize hierarchies in a dimension. With this kind of system, OLAP design can become an automatic task that ultimately does not require specific IT skills.

To begin, we identified the types of automatic or semi-automatic approaches that are used to design a data warehouse or an OLAP cube. Three types of approaches can be used to design a data warehouse (Cravero and Sepúlveda, 2014; Tebourski et al., 2013): (i) methods based on user specifications, or demand-driven approaches; (ii) methods based on available data, or data-driven approaches; (iii) mixed methods, or hybrid approaches.

As an example of demand-driven methods, we cite the work of Jovanovic et al., who developed a methodology for designing a data warehouse (Jovanovic et al., 2014). This method is iterative: at each step, the system searches for the data that best correspond to the information required by the user in terms of dimensions or facts. Data are modeled with an ontology.

Moreover, several other authors have proposed systems based on a hybrid approach:

• Romero and Abello offer a hybrid methodology to build a multidimensional schema from a relational database (Romero and Abello, 2010).

• Abdelhedi et al. have developed a prototype called CASE to build an OLAP cube with a hybrid method (Abdelhedi et al., 2011). The design is driven by both the data sources and the user specifications.

• As in many current works, Thenmozhi and Vivekanandan propose an automatic system to build the schema of a data warehouse from an ontology (Thenmozhi and Vivekanandan, 2013).

Finally, the following authors have worked on automatic data-driven systems that use data mining to build a data warehouse or an OLAP cube:

1. Eder et al. apply data mining algorithms such as auto-regression, auto-correlation, regression or fast Fourier transform to the data in a data warehouse (Eder et al., 2003). Their goal is to automatically detect structural changes in a data warehouse, such as deleting, adding or merging members in a hierarchy.

2. Usman (Usman et al., 2010; Usman and Pears, 2010) provides a methodology to automatically design OLAP schemas and data warehouses with hierarchical clustering. This author proposes a complete system to build OLAP systems from data sets. The system proposed by Usman et al. uses hierarchical agglomerative clustering to pre-process the data. After that, the system identifies facts and dimensions in the clustered data. This system is able to build star schemas, snowflake schemas and constellation schemas.

3. Rehman et al. propose a system to dynamically build hierarchies based on data from Twitter (Rehman et al., 2012). This paper is of interest in two respects: a) the cube is built on original data, namely messages posted by users on a social network; b) data mining is used to dynamically build hierarchies: thanks to data mining, the categories of network users described in the hierarchies are updated automatically.


Moreover, the following authors use clustering algorithms to dynamically build or modify hierarchies in an OLAP cube:

1. Messaoud et al. propose a new OLAP operator named OPAC that aggregates facts referring to complex objects, such as images (Messaoud et al., 2004). This operator is based on a hierarchical clustering algorithm. The prototype proposed by these authors incorporates a module to evaluate the quality of aggregations.

2. Favre, Bentayeb and Boussaid (Favre et al., 2006) suggest considering rules defined by the users while browsing an OLAP system. These rules are used to dynamically change the data warehouse schema. The system proposed by Favre et al. has a stable part and a dynamic part. The stable part corresponds to a basic OLAP schema with a star schema. From this basis, each user can define rules to build hierarchies in each dimension. These hierarchies, which depend on the user rules, constitute the dynamic part of the system.

3. In 2008, Bentayeb proposes creating new levels in a hierarchy with the K-means algorithm (Bentayeb, 2008). Thereafter, Bentayeb and Khemiri propose in 2013 (Bentayeb and Khemiri, 2013) an operator called ProCK which, as in the work of Hubert and Teste (Hubert and Teste, 2009), lets the user dynamically change the hierarchies during navigation. This operator uses a K-means algorithm modified to take into account the constraints defined by the user, and makes it possible to define new levels in a hierarchy.

4. Teste and Hubert propose in 2009 a new operator that allows the user to dynamically change the hierarchies within the OLAP cube during navigation (Hubert and Teste, 2009).

5. Leonhardi et al. let the user create new dimensions during navigation (Leonhardi et al., 2010). These authors propose to extend OLAP cube exploration by providing the user with data mining algorithms that are applied to data selected in the warehouse.

On the other hand, Ceci et al. use hierarchical clustering to integrate continuous variables as dimensions in an OLAP schema (Ceci et al., 2011). Their tool uses a modified BIRCH algorithm. It discretizes a continuous dimension so that the user can perform the conventional cube querying operations: Roll-up and Drill-down. These authors use data mining to incorporate into an OLAP cube new data whose type is poorly suited to it.

These works present several interesting aspects. First, they suggest an a posteriori modeling of the OLAP schema, performed by the user or by an algorithm. Furthermore, they offer the user the possibility to build their own OLAP schema, or to build an OLAP schema according to the structure of the data itself. This article is inspired by these viewpoints: we build a system that lets users build their own OLAP schema with a data mining method.

In a biological study, measures and dimensions are clearly identified. But the data that describe a dimension do not necessarily have an apparent hierarchical structure:

• The dimension can contain several quantitative variables and not only categories.

• The variables are heterogeneous: the data set can contain quantitative variables, nominal variables and binary variables.

• The data set can contain blank values.

The previous works presented above automatically build OLAP systems with hierarchies, using data sets with binary and quantitative variables. We propose to supplement these works with a similarity index that comes from ecological analysis, the Gower index.

In this article we provide a methodology to automatically build a hierarchy from a biological data set that contains heterogeneous variables. Our approach is as follows:

• In the first part, we introduce the data set that we use and its features.

• In the second part, we present several a priori OLAP schemas and their limitations.

• In the third part, we first explain how our system works. We present hierarchical agglomerative clustering and define the clustering parameters needed to apply it to our data set. Next we explain what the Gower index is and why it is of interest.

• In the fourth part, we evaluate the memory and computation time required according to the amount of data processed.

• Finally, we conclude on the operation and performance of the system and present our future work.


1. A data set from a large ecological study

Our data set comes from a census program for nesting birds along the Loire River (France) (Frochot et al., 2003). STORI (Suivi Temporel des Oiseaux nicheurs en Rivière: Temporal Monitoring of Nesting Birds in River Valley) is a wide research program that studies bird populations along rivers. The objective of this program is to observe temporal and spatial changes in bird populations. One hundred ninety-eight points were defined along the river in the framework of this program. At each point the birds were identified with the IPA (Indice Ponctuel d'Abondance: Punctual Abundance Index) method (Blondel et al., 1981) during four census campaigns (1990, 1996, 2002 and 2011). Bird abundances were described by a semi-quantitative abundance index.

One of the main objectives of STORI is to study global and local factors that explain these changes. In this context, the evolution of environments along the Loire River between 1990 and 2011 was described at each point in parallel with the IPA data, to find correlations between these populations and their environment.

The data set can be summarized by:

• A measure: bird abundances, which can be aggregated with a sum or an average.

• Three dimensions to analyze the abundance: species, time and space.

In this context, we build an OLAP system to manage and store these data. How our system works is described in section 3. We build a data warehouse with a star schema and an OLAP schema with three dimensions. But the spatial dimension of the OLAP schema raises problems, which are explained below.

To explain bird abundances we try to establish correlations between birds and landscapes. At each point, the river and the valley are described for several years. In fact, many variables are defined for only one campaign. Moreover, all kinds of variables are present: there are continuous variables, discrete variables, nominal variables and ordinal variables. The variables that describe landscapes are presented in the table below.

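| Variable type | Subtype | 1990 | 1996 | 2002 | 2011 |
|---|---|---|---|---|---|
| Quantitative | Continuous | 8 | 0 | 97 | 44 |
| Quantitative | Discrete | 7 | 7 | 7 | 10 |
| Qualitative | Ordinal | 5 | 0 | 0 | 1 |
| Qualitative | Nominal | 7 | 2 | 4 | 6 |
| Qualitative | Binary | 5 | 0 | 0 | 3 |

Table 1: Number of variables used for landscape and river description according to the year.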

This dimension has three interesting features:

• There is no intrinsic hierarchy in the description of the environment along the river: apart from keys and station identifiers, only two station attributes (out of 110) are linked by a functional dependency.

• Its attributes are heterogeneous.

• Its attributes are not defined for all campaigns.

Consequently, we propose to automatically build a hierarchy for this dimension, because it has no explicit hierarchy and we want to offer biologists the possibility of building their own OLAP schema.

In this article, we focus on this spatial dimension, and our objective is to generalize the results that we obtain with these data.

2. A priori OLAP schema design: what are the limitations?

In the previous section, we presented the data set that we use in this study. The ideal OLAP schema to analyze these data is a three-dimensional schema with the abundance measurements as facts, a dimension that describes the species, a dimension that records the year of the bird census and a dimension that describes the census stations (Figure 1). With this structure we can perform the analysis that is of interest in this ecological study: ecologists want to characterize spatio-temporal changes in bird populations along the Loire River.


[Figure 1 (diagram): the fact "Abundance" linked to three dimensions: Species (Name, Diet, Migration, ...), Year (Year) and Station (Name, Geographical coordinates, Ornithological zonation, ...).]

Figure 1: The dimensions of our analysis

But we have described some features of the data set that rule out a simple three-dimensional schema. The spatial dimension, which describes the environment along the Loire River, is strongly correlated with the time dimension. The description of the environment is time dependent because:

• The values of some attributes that describe the stations change over time.

• Many attributes are not measured for all years.

Several data warehouse models may be proposed to take into account this correlation between the spatial dimension and the time dimension. The following solutions are presented at the conceptual level, using the MultiDimER notation (Malinowski and Zimanyi, 2006). Details of this notation are summarized in the Appendix.

The first solution is a fact constellation schema (Figure 2). With this solution, there are two fact tables: a fact table for abundances according to species, stations and years, and a fact table for environment descriptions according to stations and years. This is the most elegant solution, and it optimizes data storage. But crossing abundance data with environment data requires querying two independent cubes. Moreover, qualitative variables cannot be stored in a fact table.

The second solution is a star schema (Figure 3). With this solution, there is one fact table for abundances according to species, time and stations. But the data that describe the spatial dimension are related to time, so each station is duplicated for each census campaign. Station n°1 in 1990 and the same station n°1 in 1996 are therefore not considered to be the same object in the OLAP cube. With this solution, the spatial consistency of the data set is lost.

The third solution is a fact constellation schema (Figure 4). This kind of solution was proposed by Miquel et al. in 2002 (Miquel et al., 2002). With this solution, we build a fact table for each census campaign. Each yearly fact table is linked to the "species" dimension and to a yearly "stations" dimension. The main disadvantage of this solution is the loss of the temporal consistency of the data set.


[Figure 2 (diagram): a fact table "Biodiversity facts" (measure: Abundance) with the dimensions Station (Name, GPS coordinates), Time (Year) and Species (Name, Thermic index, ...), the Species dimension carrying Diet and Migration (migratory behaviour) levels; and a second fact table "Landscape facts" (Stream, Riparian forest width).]

Figure 2: A fact constellation schema with a fact table for abundances and a fact table for environment description

[Figure 3 (diagram): a fact table "Biodiversity facts" (measure: Abundance) with the dimensions Station (Name, GPS coordinates, Stream (1990), Stream (1996), Riparian forest width (1990), ...), Time (Year) and Species (Name, Thermic index, ...), the Species dimension carrying Diet and Migration (migratory behaviour) levels.]

Figure 3: A star schema with a time-dependent spatial dimension


[Figure 4 (diagram): one fact table per census year, "Biodiversity facts (1990)" and "Biodiversity facts (1996)" (measure: Abundance), each with its own station dimension, Station (1990) (Name, GPS coordinates, Stream (1990), Riparian forest width (1990), ...) and Station (1996) (Name, GPS coordinates, Stream (1996), ...), together with the Species dimension (Name, Thermic index, ...) carrying Diet and Migration (migratory behaviour) levels.]

Figure 4: A fact constellation schema with a fact table for each census year

Finally, none of these three solutions provides a perfect schema (Table 2). We therefore propose in this article a solution that builds a single spatial dimension, which gives the three-dimensional cube shown in Figure 1. To obtain a spatial dimension with a coherent hierarchy, we use a clustering method. This kind of method can detect structure in a data set. With a clustering method we can propose a prototype that automatically builds a dimension for an OLAP cube.

| | Solution 1 | Solution 2 | Solution 3 |
|---|---|---|---|
| Solution description | Fact constellation schema with a fact table for abundances and a fact table for environment descriptions | Star schema with a time-dependent spatial dimension | Fact constellation schema with a fact table with abundances for each census year |
| Limitations of the solution | Crossing between abundance data and environment data requires querying two cubes. Qualitative environmental variables cannot be stored. | Spatial consistency of the dataset is lost. | Temporal consistency is lost. |

Table 2: Summary of the limitations of each solution

3. Proposal: an automatic hierarchy design for OLAP schema based on a clustering method

To make sections 3 and 4 easier to follow, we first clarify some vocabulary. In a clustering context, "individuals" are the items to be classified, and "variables" are the descriptors of individuals. Variables are used to run the clustering algorithm and to measure a distance between individuals. In this article, the clustering algorithm is run in an OLAP context and is used to build a hierarchy. Thus, in sections 3 and 4, "individual" is a synonym of "dimension member" and "variable" is a synonym of "attribute".

3.1. Prototype operation

3.1.1. General operation of the prototype

We built a prototype that is able to extract the relevant data from a data warehouse and to design and publish a new hierarchy in a dimension. The system performs hierarchical clustering on a table in a database and deduces the organization of the hierarchy from the clustering process. Next it updates the OLAP schema, the dimension table in the data warehouse and the XML description of the OLAP cube.

The system works in several steps (the step numbers match the numbers in Figure 5):

1. The system retrieves data and metadata from the database. The data that the system uses are: the data that describe the dimension, the data type (text or numeric) of each variable in the dimension, and the relationship between the facts and the processed dimension.

2. The system identifies the type of each variable. This identification is required because computing a hierarchical agglomerative clustering needs knowledge of the type of each variable. The variable types can be supplied by the user: they can be requested from the user or recorded as metadata in the data warehouse. Otherwise, the type of a variable can be determined automatically from the type of data (text or numeric) and the number of values. This second option is explained in subsection 3.1.4.

3. The system performs the hierarchical agglomerative clustering with the Gower index (see subsections 3.1.2 and 3.1.3).

4. Based on the result of the hierarchical clustering, the system creates a table in the data warehouse. The first column identifies the points and every other column is a level in the hierarchical clustering. The first column is thus the lowest level of the hierarchy and a primary key; its values are used as foreign keys in the fact table. This step updates the OLAP schema. In our case each row is a census point along the river (section 1).

5. Based on the result of the hierarchical clustering, the system updates the XML file that describes the OLAP cube with the new, computed hierarchy. The XML file specifies the data organization in the cube and the metadata. Once the cube has been created, it is published on the OLAP server.

6. After the new hierarchy has been created in the data warehouse and the new cube has been published, the users of the OLAP system can use the new cube through the dedicated interface.
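Step 2 of the list above can be sketched as follows. This is only an illustration under stated assumptions: the rule used by the prototype belongs to subsection 3.1.4 and is not reproduced here, so the function name `infer_variable_type`, the threshold `max_codes` and the decision rule itself are hypothetical. The sketch uses only the two clues named in step 2, the data type (text or numeric) and the number of distinct values.

```python
def infer_variable_type(values, max_codes=5):
    """Guess whether a column is 'qualitative' or 'quantitative'.

    Hypothetical rule, for illustration only:
    - text columns are qualitative;
    - numeric columns holding only a few distinct integer codes are
      treated as qualitative (e.g. 0/1 flags or small category codes);
    - every other numeric column is quantitative.
    Missing values (None) are ignored.
    """
    present = [v for v in values if v is not None]
    distinct = set(present)
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if not is_numeric:
        return "qualitative"
    if len(distinct) <= max_codes and all(float(v).is_integer() for v in distinct):
        return "qualitative"
    return "quantitative"


# Hypothetical columns of a station dimension.
land_cover = ["forest", "meadow", None, "forest"]
has_dam = [0, 1, 1, None, 0]
forest_width = [12.5, 3.0, 48.2, 7.7]
```

A rule of this kind can misclassify a short numeric column, which is one reason to let the user confirm or override the inferred types, as step 2 allows.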


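[Figure 5 (diagram): the data warehouse, our system, the OLAP server and the OLAP interface. Our system reads the dimensional data, identifies the variable types and runs the hierarchical agglomerative clustering; the new hierarchical dimension and the cube are then produced and made available through the OLAP server and the OLAP interface. The arrows are numbered 1 to 6 as in the list of steps.]

Figure 5: The working of our prototype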
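Step 4 of the prototype (one table whose first column is the primary key and whose other columns are hierarchy levels) can be sketched with an in-memory SQLite database. The table names, column names, point identifiers and cluster labels below are hypothetical, and the real prototype may use a different DBMS and layout; the sketch only shows how such a table lets the fact table roll up to a computed level.

```python
import sqlite3

# Hypothetical clustering result: for each census point, the cluster it
# belongs to at two cut levels of the tree (coarse, then fine).
hierarchy_rows = [
    ("P001", "A", "A1"),
    ("P002", "A", "A1"),
    ("P003", "A", "A2"),
    ("P004", "B", "B1"),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    """CREATE TABLE station_hierarchy (
           point_id TEXT PRIMARY KEY,   -- lowest level of the hierarchy
           level_1  TEXT NOT NULL,      -- coarsest clusters
           level_2  TEXT NOT NULL       -- finer clusters
       )"""
)
conn.execute(
    """CREATE TABLE abundance_facts (
           species   TEXT,
           year      INTEGER,
           abundance REAL,
           point_id  TEXT REFERENCES station_hierarchy(point_id)
       )"""
)
conn.executemany("INSERT INTO station_hierarchy VALUES (?, ?, ?)", hierarchy_rows)
conn.executemany(
    "INSERT INTO abundance_facts VALUES (?, ?, ?, ?)",
    [
        ("blackbird", 1990, 2.0, "P001"),
        ("blackbird", 1990, 1.0, "P003"),
        ("blackbird", 1990, 4.0, "P004"),
    ],
)

# Roll-up of the measure to the coarsest level of the computed hierarchy.
rollup = conn.execute(
    """SELECT h.level_1, SUM(f.abundance)
       FROM abundance_facts AS f
       JOIN station_hierarchy AS h USING (point_id)
       GROUP BY h.level_1
       ORDER BY h.level_1"""
).fetchall()
```

Because each level is a plain column, adding or removing a level of the hierarchy only changes the dimension table, not the fact table.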

3.1.2. Focus on the clustering method: hierarchical agglomerative clustering

When designing an OLAP schema, hierarchies are classically built by hand. An automatic system needs an algorithm to build hierarchies. We propose using hierarchical agglomerative clustering. Hierarchical clustering has been used in OLAP systems to improve query performance (Markl et al., 1999) or to design OLAP schemas (Usman et al., 2010).

Hierarchical agglomerative clustering is an unsupervised clustering method (i.e. no training is needed). Its aim is to build a hierarchy in order to find groups in the data. In a hierarchical agglomerative clustering, each branch of the resulting hierarchy is a cluster. The method has several steps (Tuffery, 2011):

1. Calculation of distances between individuals.
2. Choice of the two nearest individuals.
3. Aggregation of the two nearest individuals into a cluster. The cluster is now considered an individual.
4. Go back to step 1 and loop while there is more than one individual.

The results of a hierarchical agglomerative clustering can be shown as a tree that represents the distances between the individuals (Jain et al., 1999).
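The four steps can be sketched as a naive loop. This is a minimal illustration, not the prototype's implementation: the function names are hypothetical, the individuals are assumed to be distinct hashable values, and clusters are compared with the unweighted average of all pairwise distances. The list of merges it returns is the information a tree of the clustering would display.

```python
from itertools import combinations


def average_linkage(left, right, distance):
    """Unweighted average of all distances between two clusters."""
    total = sum(distance(a, b) for a in left for b in right)
    return total / (len(left) * len(right))


def agglomerate(items, distance):
    """Naive agglomerative clustering; returns the merges in order.

    Each merge is (left_cluster, right_cluster, distance_between_them).
    Assumes the items are distinct.
    """
    clusters = [(item,) for item in items]  # every individual starts alone
    merges = []
    while len(clusters) > 1:
        # Steps 1 and 2: compute distances and pick the two nearest clusters.
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], distance),
        )
        merges.append((left, right, average_linkage(left, right, distance)))
        # Step 3: the merged cluster replaces its two parts.
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
        # Step 4: loop while more than one cluster remains.
    return merges


# Hypothetical one-dimensional individuals with the absolute difference
# as distance.
merges = agglomerate([1.0, 2.0, 6.0, 7.0, 20.0], lambda a, b: abs(a - b))
```

The loop recomputes every pairwise average at each iteration, which keeps the sketch short but is far slower than the update formulas used by library implementations.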

To perform a hierarchical agglomerative clustering, we have to define:

• A metric to measure the distance between individuals.

• A method to aggregate individuals into clusters.

The difficulty with our data set is the qualitative variables. With qualitative variables we cannot represent a cluster by the centroid of its members. To measure the distance between two clusters, we therefore calculate the average of all distances between the individuals of one cluster and the individuals of the other, i.e. we use unweighted average linkage. Several linkage methods can be used: unweighted average distance (UPGMA), furthest distance, shortest distance and weighted average distance (WPGMA). We use UPGMA because, with no prior knowledge of the data structure, this linkage appears to be the best summary of the distance between two clusters (Kojadinovic, 2004).
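To make the difference between these linkage methods concrete, the sketch below computes each of them on small hypothetical clusters of numbers, with the absolute difference as distance. The function names are illustrative only. WPGMA is written as its usual update rule: the distance from a cluster k to the merge of i and j is the plain mean of the two previous distances, whatever the sizes of i and j.

```python
def pairwise(left, right, distance):
    """All distances between a member of `left` and a member of `right`."""
    return [distance(a, b) for a in left for b in right]


def upgma(left, right, distance):
    """Unweighted average distance: mean over all pairs of individuals."""
    d = pairwise(left, right, distance)
    return sum(d) / len(d)


def furthest(left, right, distance):
    """Furthest distance (complete linkage)."""
    return max(pairwise(left, right, distance))


def shortest(left, right, distance):
    """Shortest distance (single linkage)."""
    return min(pairwise(left, right, distance))


def wpgma_update(d_ki, d_kj):
    """WPGMA distance from cluster k to the merge of clusters i and j."""
    return (d_ki + d_kj) / 2


def dist(a, b):
    return abs(a - b)


# Hypothetical clusters.
i, j, k = (1.0,), (6.0, 7.0), (20.0,)

upgma_k_ij = upgma(k, i + j, dist)                                 # (19 + 14 + 13) / 3
wpgma_k_ij = wpgma_update(upgma(k, i, dist), upgma(k, j, dist))    # (19 + 13.5) / 2
```

With clusters of unequal size, UPGMA gives every individual the same weight, whereas WPGMA gives every sub-cluster the same weight, so the two values differ here.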

The distance between two individuals must combine quantitative and qualitative variables. Traditional metrics such as the Manhattan, Euclidean or Minkowski distance are not suited to a mixed data set.


We therefore suggest measuring the distances between individuals with a similarity index that comes from biology: the Gower similarity index (subsection 3.1.3).

3.1.3. Focus on distance measurement: the Gower index

The Gower index is designed to measure the similarity between two individuals that are described by heterogeneous variables (Gower, 1971). It is a classical similarity index, often used in ecological studies and in modeling work (Segurado and Araujo, 2004; Westphal et al., 2007). The Gower index is calculated as follows:

• $I_1$ and $I_2$ are two individuals.

• $N$ is the number of variables used to define the individuals.

• $w_i$ is a weight. If variable n°i is not defined for $I_1$ or $I_2$, then $w_i = 0$. Else $w_i = 1$.

• $S_i(I_1, I_2)$ depends on the type of variable n°i, called $V_i$:

– If variable n°i is qualitative then:

* If $V_i(I_1) = V_i(I_2)$ then $S_i(I_1, I_2) = 1$,
* Else $S_i(I_1, I_2) = 0$

– If variable n°i is quantitative then: $S_i(I_1, I_2) = 1 - \frac{|V_i(I_1) - V_i(I_2)|}{\max(V_i) - \min(V_i)}$

These terms are combined in the following equation:

$$S_G(I_1, I_2) = \frac{\sum_{i=1}^{N} w_i \, S_i(I_1, I_2)}{\sum_{i=1}^{N} w_i}$$

Some features of the Gower index deserve detail. First, the Gower index is a similarity index: a value close to 1 between two individuals means that they are very similar.

Secondly, the Gower index is built as a weighted average. We calculate a similarity value between the two individuals for each variable, and the Gower index is the weighted average of these per-variable similarities. The index distinguishes qualitative and quantitative variables. A qualitative variable is treated as a boolean: if the two individuals are in the same class, the similarity is equal to 1; otherwise it is equal to 0. For a quantitative variable, we calculate the distance between the two individuals as the absolute value of their difference. This absolute difference is divided by the range of the variable (the difference between its maximum and its minimum), so that the result does not depend on the scale of the variable. Finally, the fraction is subtracted from 1, which gives the similarity between the two individuals for that variable.

We can now calculate the similarity between two individuals for each variable, but we still need to define a weight for each variable. The weights are used to handle missing values. When we calculate the Gower index between two individuals, a variable is sometimes undefined for one of them. In that case the variable receives a weight of 0 and is excluded from the calculation. Moreover, the weights can express the importance of each variable: a user who wants to give more importance to a variable can set its weight accordingly.
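The weighted average described above can be sketched in a few lines of Python. This is an illustrative sketch and not the authors' implementation: the function name `gower_similarity`, the dictionary-based layout, the use of `None` for a missing value and the two example individuals are all assumptions made for the example.

```python
def gower_similarity(a, b, kinds, ranges, weights=None):
    """Gower similarity between two individuals a and b (dicts name -> value).

    kinds[name] is "qualitative" or "quantitative"; ranges[name] is the
    (min, max) pair of a quantitative variable. None marks a missing
    value, which gives the variable a weight of 0 for this pair.
    """
    weights = weights or {name: 1.0 for name in kinds}
    numerator = 0.0
    denominator = 0.0
    for name, kind in kinds.items():
        va, vb = a.get(name), b.get(name)
        if va is None or vb is None:
            continue  # w_i = 0: the variable is excluded
        if kind == "qualitative":
            s = 1.0 if va == vb else 0.0
        else:
            lo, hi = ranges[name]  # assumes hi > lo
            s = 1.0 - abs(va - vb) / (hi - lo)
        numerator += weights[name] * s
        denominator += weights[name]
    if denominator == 0:
        raise ValueError("no variable is defined for both individuals")
    return numerator / denominator


# Hypothetical individuals: salinity is missing for the first one.
kinds = {"altitude": "quantitative", "substratum": "qualitative",
         "salinity": "quantitative"}
ranges = {"altitude": (0, 1000), "salinity": (0, 35)}
first = {"altitude": 200, "substratum": "sand", "salinity": None}
second = {"altitude": 700, "substratum": "sand", "salinity": 10}

similarity = gower_similarity(first, second, kinds, ranges)
weighted = gower_similarity(first, second, kinds, ranges,
                            weights={"altitude": 3, "substratum": 1,
                                     "salinity": 1})
print(similarity, weighted)
```

In the second call the altitude counts three times as much as the substratum, which illustrates how a user-defined weight changes the result.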

We now calculate the Gower index on an example (Table 3, Table 4 and Table 5).

The following table describes the variables that we use in this example:

| Variable name | Variable type | Minimum value | Maximum value |
|---|---|---|---|
| Altitude | Quantitative | 0 | 1410 |
| Confluence | Qualitative | - | - |
| Bank | Qualitative | - | - |
| Current | Qualitative | - | - |
| Substratum | Qualitative | - | - |
| Aquatic vegetation | Qualitative | - | - |
| Salinity | Quantitative | 0 | 35 |
| Slope | Quantitative | 0 | 120 |
| Valley width | Quantitative | 0 | 2950 |

Table 3: Variables used for the example


The following table describes two stations with the previous variables:

| Variable name | Station n°1 | Station n°11 |
|---|---|---|
| Altitude | 1410 | 899 |
| Confluence | No | No |
| Bank | 0 | 1-15 |
| Current | <10 | 10-25 |
| Substratum | mud and silt | blocks |
| Aquatic vegetation | 0 | 1-15 |
| Salinity | 0 | 0 |
| Slope | 120 | 3.6 |
| Valley width | 0.2 | 11 |

Table 4: Individuals used for the example

The following table shows the terms of the formula for the calculation of the similarity index:

| Variable name | $w_i$ | $S_i$ |
|---|---|---|
| Altitude | 1 | 0.64 |
| Confluence | 1 | 1 |
| Bank | 1 | 0 |
| Current | 1 | 0 |
| Substratum | 1 | 0 |
| Aquatic vegetation | 1 | 0 |
| Salinity | 1 | 1 |
| Slope | 1 | 0.03 |
| Valley width | 1 | 0.99 |
| Sum | 9 | 3.66 |

The similarity between station n°1 and station n°11 is then:

$$S_G = \frac{\sum w_i S_i}{\sum w_i} = \frac{3.66}{9} \simeq 0.41$$

Table 5: Calculation of Gower index of similarity between two stations
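As a check, the quantitative entries of Table 5 can be recomputed from Table 3 and Table 4 with the formula $S_i = 1 - |V_i(I_1) - V_i(I_2)| / (\max(V_i) - \min(V_i))$. The subscript labels below are only shorthand for the variable names.

```latex
\begin{align*}
S_{\text{Altitude}} &= 1 - \frac{|1410 - 899|}{1410 - 0} = 1 - \frac{511}{1410} \approx 0.64 \\
S_{\text{Salinity}} &= 1 - \frac{|0 - 0|}{35 - 0} = 1 \\
S_{\text{Slope}} &= 1 - \frac{|120 - 3.6|}{120 - 0} = 1 - \frac{116.4}{120} = 0.03 \\
S_{\text{Valley width}} &= 1 - \frac{|0.2 - 11|}{2950 - 0} = 1 - \frac{10.8}{2950} \approx 0.996
\end{align*}
```

The valley width similarity is reported as 0.99 in Table 5. Among the qualitative variables, only Confluence takes the same value for both stations, so it is the only one with $S_i = 1$. With either rounding the sum is about 3.66 and the index is about 0.41.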

3.1.4. Focus on the determination of a variable type

In our system, the user states whether each variable is quantitative or qualitative. But if the number of variables is very large, or if this information is missing, the system could find the type of a variable by itself. The type of a variable depends on the type of the data (text or number) and on the number of distinct values it takes (Table 6). Two cases are very easy to solve:

1. If the data are numbers and the number of values is approximately equal to the number of individuals, then the variable is quantitative.

2. If the data are text and the number of values is much smaller than the number of individuals, then the variable is qualitative.

Two cases are more problematic:

1. If the data are text and the number of values is approximately equal to the number of individuals, the question is: does the comparison between two character strings make sense? If it does, a similarity between two values can be calculated. Otherwise the variable is probably a primary key, i.e. a unique name for each individual. A primary key brings no benefit to the clustering process, so this type of variable is excluded from it.

2. If the data are numbers and the number of values is smaller than the number of individuals, then the variable can be either a qualitative variable recorded with numbers or a discrete quantitative variable.

In these two problematic cases, the system can ask the user what the type of the variable is.


| Data type | Number of values ≈ Number of individuals | Number of values << Number of individuals |
|---|---|---|
| Text | Primary key | Qualitative |
| Number | Quantitative | ? |

Table 6: How to determine the type of a variable?

The problem is: what is the maximum number of values for a qualitative variable encoded with numeric data? To solve this problem, we use several data sets to build a decision tree. To find the threshold for our data set, we have to consider a learning variable set that has the same characteristics as our variable set.

Therefore, we have built a data set that contains qualitative and quantitative variables and, like our data set, 198 individuals. We have built it from external datasets taken from the UCI Machine Learning Repository (Bache and Lichman, 2013). We chose multivariate datasets, i.e. datasets which contain qualitative and quantitative variables. These datasets contain data about:

• Physical measurements of Abalone¹

• Census income²

• Steel annealing data³

• Ward's Automotive Yearbook⁴

• Cylinder bands in rotogravure printing⁵

• Horse disease⁶

• Housing⁷.

Our data set contains 198 individuals, so we select 198 individuals in each dataset from the UCI Machine Learning Repository. Each item used for the learning is a variable, and for the learning phase we want to consider variables that are not in our environmental and ornithological data set. Building the learning variable set is therefore very time consuming. We have limited the learning variable set so that its number of variables has the same order of magnitude as that of our data set. With 129 variables, the learning variable set is quite similar to our data.

We build a decision tree with the 129 variables from the external datasets (Rokach et al., 2008). A decision tree is a classification method that has the advantage of automatically providing explicit rules. The rules of our decision tree are presented in Figure 6.

¹ Warwick J. Nash, Tracy L. Sellers, Simon R. Talbot, Andrew J. Cawthorn and Wes B. Ford, "The Population Biology of Abalone (Haliotis species) in Tasmania - Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait", Marine Resources Division, Marine Research Laboratories - Taroona, Department of Primary Industry and Fisheries - Tasmania (1994).

² Ron Kohavi, "Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid", in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (1996).

³ No reference is associated with this dataset.

⁴ D. Kibler, D.W. Aha and M. Albert, "Instance-based prediction of real-valued attributes", Computational Intelligence 5 (1989), pp. 51-57.

⁵ B. Evans and D. Fisher, "Overcoming process delays with decision tree induction", IEEE Expert 9, 1 (1994), pp. 60-66.

⁶ No reference is associated with this dataset.

⁷ D. Harrison and D.L. Rubinfeld, "Hedonic prices and the demand for clean air", J. Environ. Economics & Management 5 (1978), pp. 81-102.


Figure 6: Decision tree to decide if a variable is quantitative or qualitative

If we apply this decision tree (Figure 6) to our data set, 10 variables out of 110 are misclassified. These ten variables are quantitative variables with a very small number of values, which the decision tree considers qualitative. This kind of error (a quantitative variable treated as a qualitative variable) is not a serious problem: identical values are still processed correctly, and the algorithm only neglects the similarity between two close values. On the other hand, a qualitative variable treated as a quantitative variable is a serious problem, because the calculations performed by the algorithm then have no meaning.

In conclusion, we can automatically determine whether a variable is qualitative or quantitative from metadata such as the data type and the number of values. But the classification is not totally reliable. We therefore recommend the following cautious rule:

• If the data type is text then the variable is qualitative.

• If the data type is numeric:

– If the number of values is higher than 6 then the variable is quantitative.

– If the number of values is lower than or equal to 6 then the type of the variable is uncertain and the system must ask the user for it.
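The rule above is simple enough to be written directly as a function. The sketch below is only an illustration: the function name `guess_variable_type`, the `"ask user"` return value and the use of `None` for missing values are assumptions, and text columns that are really primary keys are not detected.

```python
def guess_variable_type(values, threshold=6):
    """Return "qualitative", "quantitative" or "ask user" for one column.

    values: list of raw values; None marks a missing value.
    threshold: largest number of distinct numeric values that is still
    treated as uncertain (6 in the rule above).
    """
    present = [v for v in values if v is not None]
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in present
    )
    if not is_numeric:
        return "qualitative"
    if len(set(present)) > threshold:
        return "quantitative"
    return "ask user"


print(guess_variable_type(["mud and silt", "blocks", "mud and silt"]))
print(guess_variable_type([1410, 899, 120, 3.6, 0.2, 11, 35, 2950]))
print(guess_variable_type([0, 1, 1, 0, None]))
```

Keeping the threshold as a parameter makes it easy to replace 6 with a value learned from another set of variables.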

3.2. Comparison between a priori schema and calculated schema

We detailed several a priori OLAP schemas and their limitations in section 2. The schema that we obtain with the prototype is presented in Figure 7. The new schema is a star schema, with a structure similar to the one shown in Figure 3. The fact table contains the bird abundances. It is linked to three dimensions: the species dimension, which describes the bird species, the temporal dimension and the new dimension. In our example, the new dimension is a spatial dimension. It contains a hierarchy, which is the result of the hierarchical agglomerative clustering. The new schema has the same structure as the natural dimensionality of the data set.

A calculated hierarchy is presented in Figure 8.


[Figure 7 diagram not reproduced. Labels recovered from the diagram: Biodiversity facts (Abundance); Station (Name, GPS coordinates); Time (Year); Species (Name, Thermic index, ...); Diet (Name); Migration (Name); hierarchies "Diet" and "Migratory behaviour"; Level 1 (Name), Level 2 (Name), Level 3 (Name).]

Figure 7: A star schema with the new hierarchical dimension

Figure 8: One hierarchy built by the system

4. System performances

In the context of this study, we work with a dimension that contains approximately 200 objects (the census points along the Loire River, see section 1). But OLAP systems are designed to manage large quantities of data. We therefore measure the performance of our system in order to predict the calculation time and the memory needed with a larger data set.

The system performance can be measured in two ways:

• The time needed to calculate the hierarchy with the Gower index.

• The number of levels of the resulting hierarchy. This number of levels matches the number of columns of the table that represents the new calculated hierarchy in the database. The number of levels is thus an estimate of the memory needed to store the hierarchy.

The calculation time and the number of levels were measured as functions of the number of individuals and of the number of variables used to build the hierarchy. The amount of input data is reflected in these two parameters, and we can expect the form of their impact to be independent of the computer configuration.


In Figure 9 we show the number of levels as a function of the number of individuals and as a function of the number of variables. About these graphs, we note that:

• The theoretical minimum number of levels follows a logarithmic function of the number of individuals (Devroye, 1986).

• The measured number of levels as a function of the number of individuals stays close to this minimum and shows the same asymptotic behavior.

• By contrast, the number of variables has no effect on the number of levels.
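To make the logarithmic lower bound concrete, the sketch below compares the smallest possible height of a binary tree with n leaves, which is the base-2 logarithm of n rounded up, with the logarithmic fit reported in Figure 9 (y = 2.1701 ln(x) - 0.4112). The function names are hypothetical, and taking the height as the number of edges on the longest root-to-leaf path is an assumption about how levels are counted.

```python
import math


def minimum_height(n_individuals):
    """Smallest possible height of a binary tree with n_individuals leaves.

    (n - 1).bit_length() equals ceil(log2(n)) for n >= 1 and avoids
    floating-point rounding.
    """
    return (n_individuals - 1).bit_length()


def fitted_height(n_individuals):
    """Logarithmic fit reported in Figure 9: y = 2.1701 ln(x) - 0.4112."""
    return 2.1701 * math.log(n_individuals) - 0.4112


for n in (10, 50, 100, 198):
    print(n, minimum_height(n), round(fitted_height(n), 1))
```

Both quantities grow with the logarithm of the number of individuals, which is why doubling the data adds only about one level to the lower bound.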

To model the number of levels as a function of the number of individuals, the two best models are a power function and a logarithmic function. Although the power function has a higher coefficient of determination (R² = 0.54) than the logarithmic function (R² = 0.47), we believe that the logarithmic function is more relevant, because we know that the minimum follows a logarithmic function.

Moreover, the best model for the number of levels as a function of the number of variables is a quadratic function. But the x² coefficient and the x coefficient are very close to 0, so we can expect the number of variables to have very little impact on the number of levels. The coefficient of determination of this model is very low (R² = 0.02).

We note that the coefficients of determination are low for each estimation of the number of levels.

Thus the hierarchical agglomerative clustering performed with the Gower index as distance measurement produces binary trees whose height depends on the number of individuals. The average height of these binary trees is very close to the minimum height. The memory needed to record the hierarchy is therefore close to the minimum.

[Figure 9 plots not reproduced. First panel: number of levels (height) versus number of individuals (0 to 200), showing the measured height, the minimum of height, a logarithmic fit y = 2.1701 ln(x) - 0.4112 (R² = 0.4741) and a power fit y = 2.3994 x^0.2943 (R² = 0.5414). Second panel: number of levels (height) versus number of variables (0 to 120), with a quadratic fit y = -0.0004 x² + 0.0516 x + 9.8333 (R² = 0.0247).]

Figure 9: Height of the hierarchy according to number of individuals and according to number of variables

In Figure 10 we show the calculation time as a function of the number of individuals and of the number of variables. We note that:

• The calculation time is a linear function of the number of variables.

• The calculation time is a quadratic function of the number of individuals.


The complete model, which expresses the calculation time as a linear function of the number of variables and a quadratic function of the number of individuals, is:

$$t(v, M) = b_1 M^2 + b_2 M + b_3 M^2 v + b_4 M v + b_5 v + b_6$$

In this formula, $t$ is the estimated calculation time, $M$ is the number of individuals, $v$ is the number of variables and the $b_i$, with $i$ in $\{1, 2, 3, 4, 5, 6\}$, are coefficients that depend on the configuration of the computer which performs the hierarchy calculation.

We perform a stepwise linear regression to determine the coefficients. The coefficients that can be statistically considered equal to zero are removed. We obtain the following formula:

$$t(v, M) = (b_1 + b_3 v) M^2 + b_2 M + b_6$$

With the computer that we use for the performance tests, we obtain $b_1 = 1.83 \times 10^{-3}$, $b_2 = -1.06 \times 10^{-6}$, $b_3 = 1.51 \times 10^{-5}$ and $b_6 = 1.15$. The correlation coefficient between this model and the measured calculation time is equal to 99.7%. In Figure 11 we show the measured calculation time and the model given above. The estimation reproduces well the changes of the calculation time with the number of individuals and the number of variables.
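The reduced model can be evaluated directly, for example to extrapolate to a larger dimension. The sketch below simply encodes the formula and the coefficients quoted in the text; the function name `estimated_time` is hypothetical, the result is assumed to be in seconds (the unit of the time axes in the figures), and the coefficients are only meaningful for the authors' test machine.

```python
# Coefficients reported in the text for the authors' test machine.
B1 = 1.83e-3
B2 = -1.06e-6
B3 = 1.51e-5
B6 = 1.15


def estimated_time(v, m):
    """Estimated calculation time for v variables and m individuals.

    Implements t(v, M) = (b1 + b3 * v) * M**2 + b2 * M + b6.
    """
    return (B1 + B3 * v) * m ** 2 + B2 * m + B6


for m in (10, 50, 100):
    print(m, round(estimated_time(10, m), 2))
```

Because the dominant term is proportional to M², doubling the number of individuals roughly quadruples the variable part of the estimate.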

[Figure 10 plots not reproduced. First panel: time (s), axis from 0 to 5, versus number of variables (0 to 100), with one curve per number of individuals (10, 20, 30, 50). Second panel: time (s), axis from 0 to 30, versus number of individuals (0 to 100), with one curve per number of variables (10, 20, 30, 50, 100).]

Figure 10: Calculation time according to the number of individuals and the number of variables


[Figure 11 plot not reproduced. Vertical axis: calculation time (s), ticks from -2 to 23. Horizontal axis: number of variables (first row: 10, 20, 30, 50, 100, repeated for each number of individuals) and number of individuals (second row: 10, 20, 30, 50, 100). Series: measured data and model.]

Figure 11: Calculation time according to number of variables (first row of X axis) and to number of individuals (second row of X axis) and an estimation of calculation time

These performance tests have been performed on the following configuration:

• The computer has an Intel® Core™ 2 Duo processor and 4 GB of RAM.

• The operating system (OS) is Windows 7, 32-bit (© Microsoft Corporation).

• The prototype runs on MATLAB® 2011 (© MathWorks).

Discussion

Discussion about the system that we have proposed

In this part, we discuss the proposed system and suggest perspectives to improve the prototype. First, we discuss the clustering method. Secondly, we discuss the use of the Gower index. Thirdly, we discuss a perspective on cluster characterization.

The use of hierarchical agglomerative clustering

We use hierarchical agglomerative clustering, which provides a complete hierarchy of the data. But the prototype also works with another clustering algorithm, such as the K-means algorithm, and can therefore be used with several clustering algorithms. It would be interesting to compare hierarchical and simple clustering algorithms, in order to know which type of clustering method is more efficient for building a new hierarchy in an OLAP schema.

Secondly, we use the unweighted average distance as a linkage method, but there are several linkage methods. The linkage method could be chosen by the user if they have knowledge about their data set. Otherwise, we could propose several hierarchies to the user, obtained with several linkage methods, and let the user choose their favorite hierarchy. There are two ways to show the hierarchies to the user: the system can present the results of hierarchical agglomerative clustering with different parameters, or the system can give the user the possibility to test the new cube (Bimonte et al., 2013).


The use of the Gower index

Using the Gower index to perform a hierarchical agglomerative clustering raises some questions.

First, to perform a hierarchical agglomerative clustering with the Gower index, we need to know the type of each variable. In subsection 3.1.4, we suggest a way to determine the type of a variable automatically. But this method is not perfect and there is a risk of error. In our case we obtain approximately 10% of errors. However, we identify two types of error, and with our data set we obtain only the less problematic one. Thus the type of a variable should be determined by an algorithm or directly by the user, and the database must store the metadata that indicate the type of the variable.

Secondly, we can question the calculation of the Gower index. A hierarchical agglomerative clustering with the Gower index makes it possible to build a hierarchy from a data set with mixed types. But the Gower index poses two problems:

• First, the processing of a variable depends on the type of the variable. Thus we are not sure that all the variables have the same weight in the calculation of the Gower index.

• Second, the presence of qualitative variables prevents the calculation of a centroid or an average individual. Thus the comparison between two clusters can be problematic.

Thus the Gower index permits the integration of qualitative variables in a clustering methodology, but these qualitative variables must be used cautiously.

Finally, the calculation of the Gower index requires knowledge about the type of the variables (qualitative or quantitative). But there is a third variable type: ordinal variables. Ordinal variables are qualitative variables whose classes are ordered. For example, an ordinal variable can take the values {very low, low, medium, high, very high}. This variable is qualitative, but we know that the value 'very low' is closer to 'low' than to 'very high'. A distance can therefore be calculated between two values of this variable. For the moment, the Gower index as used here is not defined for ordinal variables, which are treated as qualitative variables. It would be interesting to define the Gower index for ordinal variables, but the automatic detection of ordinal variables would be difficult.
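One possible way to handle an ordinal variable, which is not part of the method described here, is to replace each class by its rank and then apply the quantitative formula to the ranks. The sketch below illustrates this idea on the example values above; the function name `ordinal_similarity` is hypothetical, and treating the classes as equally spaced is a modelling assumption.

```python
LEVELS = ["very low", "low", "medium", "high", "very high"]


def ordinal_similarity(a, b, levels=LEVELS):
    """Similarity in [0, 1] between two values of an ordered variable.

    Each value is replaced by its rank in levels; the absolute rank
    difference is divided by the largest possible difference and
    subtracted from 1, as for a quantitative variable.
    """
    rank = {level: position for position, level in enumerate(levels)}
    return 1.0 - abs(rank[a] - rank[b]) / (len(levels) - 1)


print(ordinal_similarity("very low", "low"))
print(ordinal_similarity("very low", "very high"))
```

With this choice, 'very low' is more similar to 'low' than to 'very high', whereas the purely qualitative treatment gives a similarity of 0 in both cases.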

How can the calculated clusters be characterized?

The final point of this discussion of our prototype is about cluster characterization. With a data mining method, we determine a hierarchy in the data. But after this calculation, the clusters should be characterized, so that the system can find a label for each cluster. We can expect that a statistical method could find such labels. We now outline one possible approach.

We define four main clusters in our data from the hierarchy in Figure 8. We perform statistical tests to determine which variables are related to the clusters: a Chi² test for each qualitative variable and an ANOVA test for each quantitative variable. In Figure 12, the variables significantly related to the clusters have a p-value under the significance level of 5%. We can see in this figure that the land cover of aquatic environment (MIAQ) and the land cover of urban area (URBA) are not significantly related to the clusters at the 5% significance level. All other variables are significantly related to the clusters.
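The two test statistics mentioned above can be sketched in plain Python. The code below only computes the statistics (the Pearson chi-square and the one-way ANOVA F); converting them into p-values requires the chi-square and F distributions, typically taken from a statistics library, which is left out here. The function names and the toy data are hypothetical and are not the data of this study.

```python
def chi_square(table):
    """Pearson chi-square statistic for a clusters x classes table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return sum(
        (observed - r * c / total) ** 2 / (r * c / total)
        for row, r in zip(table, row_totals)
        for observed, c in zip(row, col_totals)
    )


def anova_f(groups):
    """One-way ANOVA F statistic for a quantitative variable split by cluster."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (between / df_between) / (within / df_within)


# Hypothetical counts of a two-class variable in two clusters.
chi2 = chi_square([[10, 20], [20, 10]])
# Hypothetical values of a quantitative variable in three clusters.
f_value = anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(chi2, 3), round(f_value, 3))
```

A large statistic relative to its degrees of freedom corresponds to a small p-value, i.e. to a variable that differs between the clusters.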

If we consider a significantly related variable, we can characterize each cluster. For example, the maximum height of riparian forest is close to 0 m for the stations of cluster n°1 and between 10 and 35 m for the stations of cluster n°4 (Figure 13). According to Figure 13, cluster n°1 is characterized by low values of the maximum height of riparian forest, clusters n°2 and n°3 by medium values, and cluster n°4 by high values. In this figure, the red line represents the median and there is a notch around the median. If the notches of two boxplots do not overlap, we can conclude that the medians differ with 95% confidence.

If this kind of methodology is developed and automated, the system could find a label for each data cluster.


[Figure 12 chart not reproduced. Vertical axis: p, from 0.00 to 0.30, with the significance level of p as a reference. Horizontal axis: the variables, grouped into qualitative and quantitative, in the order CONF, COUR, FRAG, GRVL, STRAT, SUBS, VEAQ, ALTI, CULT, DFOR, DVIL, FORB, FORP, HARI, HMAX, IURB, LARB, LARI, LARR, LARV, LDIV, MIAQ, NLIN, NMIL, PENT, PRAI, ROCH, SALI, URBA, VALI.]

Figure 12: p-values of statistical tests for each variable used to build the hierarchy

Figure 13: Values of the maximum height of riparian forest (HMAX, in meters) according to the clustering results

Discussion about the system performances

In this part, we discuss the performance of the proposed system and suggest perspectives to improve it. We have made choices about the data mining method used to calculate the new hierarchy, and these choices have a strong impact on the calculation time of a new hierarchy.

First, hierarchical agglomerative clustering provides a complete hierarchy of the data. But the system could work with another clustering method, such as the K-means clustering algorithm. A simpler clustering method may offer better calculation performance. But we know that with an algorithm such as K-means, the calculated hierarchy will be simple, with only one level. Thus, improving performance with a simpler algorithm produces a simpler hierarchy. The question is: when is hierarchical agglomerative clustering worthwhile? In other words, when does it provide an interesting hierarchy (neither too simple nor too complex) that justifies the high calculation time?

Secondly, our clustering algorithm is not optimized. We think that the performance of our prototype can be improved, because several steps of the calculation can be parallelized.

The calculation time can therefore be considerably improved.

Conclusion

In this article, we presented a method to automatically build new hierarchies in a dimension with a clustering algorithm. The prototype that we have built is able to design and publish a new OLAP schema and a new OLAP cube from a table of a data warehouse.

Our system loads the data from a data warehouse. Next, the system calculates a hierarchy with a hierarchical agglomerative clustering. But the data sets used in ecology often contain both qualitative and quantitative variables, and they can contain missing values. To manage such a data set and perform a hierarchical agglomerative clustering, we use a similarity index to characterize the distance between two records. This similarity index is the Gower index, an index that comes from ecology. The Gower index can mix qualitative and quantitative variables, so it permits the comparison between individuals that are described by heterogeneous variables. Moreover, the Gower index manages missing values. To compare two individuals, it calculates a weighted average of similarities. A similarity is calculated for each variable, with a formula that depends on the type of the variable (qualitative or quantitative). The weights apply to the variables and are used to manage missing values.

Using the Gower index requires identifying the type of each variable. This identification can be entrusted to the user. But the type of a variable can also be determined by an algorithm from the data type (text or numeric) and the number of values. To automate the decision about the type of a variable, we construct a decision tree with external data sets. The decision tree classifies the variable according to the data type (text or numeric) and the number of values. We identify a threshold on the number of values: if the data type is numeric and the number of values is lower than 6, then the variable is qualitative; if the data type is numeric and the number of values is higher than 6, then the variable is quantitative.

After the calculation of the new hierarchy, the system builds a new dimension in the data warehouse and publishes the cube on the OLAP server with an XML file.

Thus, with this kind of method, we can build a hierarchy based on the structure of the data when the dimension contains heterogeneous data or when the data are not hierarchical.

We have measured the performance of our prototype: the calculation time and the memory needed to perform a hierarchical agglomerative clustering with the Gower index. We approximate the memory needed by the height of the binary tree produced by the hierarchical clustering algorithm. These performance measurements show that:

• The height of the calculated tree follows a logarithmic function of the number of individuals and is constant with respect to the number of variables.

• The calculation time follows a quadratic function of the number of individuals and a linear function of the number of variables.

The calculation time is not very satisfactory. Indeed, an algorithm performs well when its calculation time grows more slowly than a linear function, for instance logarithmically. The algorithm that we have written to calculate a hierarchy with the Gower index has a calculation time that is a quadratic function of the number of hierarchy members. But this algorithm is not optimized, and we expect that some calculations can be parallelized. The calculation time can therefore be improved.

In conclusion, data mining, and clustering methods in particular, make it possible to analyze the structure of the data. This structure can be used to build dimensions automatically in an OLAP cube. This type of analysis can resolve problems of OLAP cube modeling, in particular if the data set contains missing values or inconsistencies in space or time.


Appendix: MultiDimER notations

As a reminder, we provide the notations defined by Malinowski and Zimanyi (Malinowski and Zimanyi, 2006) to describe a data warehouse at the conceptual level. The following figure summarizes the notations:

Figure 14: Notations for multidimensional model: (a) level, (b) hierarchy, (c) cardinalities, (d) analysis criterion, and (e) fact relationship.

References

[1] Abdelhedi, F., Pujolle, G., Teste, O., Zurfluh, G., 2011. Computer-aided data-mart design, in: 13th International Conference on Enterprise Information Systems (ICEIS 2011).

[2] Bache, K., Lichman, M., 2013. UCI machine learning repository.

[3] Bentayeb, F., 2008. K-means based approach for OLAP dimension updates, in: 10th International Conference on Enterprise Information Systems (ICEIS), pp. 531–534.

[4] Bentayeb, F., Khemiri, R., 2013. Adapting OLAP analysis to users constraints through semantic hierarchies, in: Proceedings of the 15th International Conference on Enterprise Information Systems (ICEIS 2013), pp. 160–167.

[5] Bimonte, S., Edoh-Alove, É., Nazih, H., Kang, M.A., Rizzi, S., 2013. ProtOLAP: Rapid OLAP prototyping with on-demand data supply, in: Proceedings of the sixteenth international workshop on Data warehousing and OLAP, ACM. pp. 61–66.

[6] Blondel, J., Ferry, C., Frochot, B., 1981. Point counts with unlimited distance, in: Ralph and Scott (Eds.), Estimating Numbers of Terrestrial Birds. Studies in Avian Biology, volume 6, pp. 414–420.

[7] Ceci, M., Cuzzocrea, A., Malerba, D., 2011. OLAP over continuous domains via density-based hierarchical clustering, in: 15th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (KES 2011), pp. 559–570.

[8] Codd, E., Codd, S., Salley, C., 1993. Providing OLAP (on-line analytical processing) to user-analysts: An IT mandate. Codd and Date, Inc 32, 31.

[9] Cravero, A., Sepúlveda, S., 2014. Multidimensional design paradigms for data warehouses: A systematic mapping study. Journal of Software Engineering and Applications (JSEA) 7, 53–61.

[10] Devroye, L., 1986. A note on the height of binary search trees. Journal of the ACM (JACM) 33, 489–498.

[11] Eder, J., Koncilia, C., Mitsche, D., 2003. Automatic detection of structural changes in data warehouses, in: Proceedings of the 5th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2003), pp. 119–128.

[12] Favre, C., Bentayeb, F., Boussaid, O., 2006. A knowledge-driven data warehouse model for analysis evolution. Frontiers in Artificial Intelligence and Applications 143, 271.

[13] Frochot, B., Eybert, M., Journaux, L., Roché, J., Faivre, B., 2003. Nesting birds assemblages along the river Loire: result from a 12 years-study. Alauda 71, 179–190. Offprint.

[14] Gower, J., 1971. A general coefficient of similarity and some of its properties. Biometrics 27, 857–871.

[15] Hubert, G., Teste, O., 2009. Analyse multigraduelle OLAP, in: EGC 2009, pp. 241–252.

[16] Jain, A.K., Murty, M.N., Flynn, P.J., 1999. Data clustering: A review. ACM Computing Surveys 31, 264–322.

[17] Jerbi, H., Ravat, F., Teste, O., Zurfluh, G., 2009. Applying recommendation technology in OLAP systems, in: Enterprise Information Systems. Springer, pp. 220–233.

[18] Jovanovic, P., Romero, O., Simitsis, A., Abelló, A., Mayorova, D., 2014. A requirement-driven approach to the design and evolution of data warehouses. Information Systems. URL: http://dx.doi.org/10.1016/j.is.2014.01.004i.

[19] Kojadinovic, I., 2004. Agglomerative hierarchical clustering of continuous variables based on mutual information. Computational Statistics & Data Analysis 46, 269–294.

[20] Lehner, W., 1998. Modeling large scale OLAP scenarios, in: Advances in Database Technology - EDBT'98, volume 1377 of LNCS, Springer. pp. 153–167.

[21] Leonhardi, B., Mitschang, B., Pulido, R., Sieb, C., Wurst, M., 2010. Augmenting OLAP exploration with dynamic advanced analytics, in: 13th International Conference on Extending Database Technology (EDBT 2010).

[22] Mahboubi, H., Bimonte, S., Deffuant, G., Chanet, J.P., Pinet, F., 2013. Semi-automatic design of spatial data cubes from simulation model results. International Journal of Data Warehousing and Mining 9, 70–95.

[23] Mahboubi, H., Faure, T., Bimonte, S., Deffuant, G., Chanet, J.P., Pinet, F., 2012. New Technologies for Constructing Complex Agricultural and Environmental Systems. P. Papajorgji and F. Pinet. chapter A Multidimensional Model for Data Warehouses of Simulation Results. pp. 1–18.

[24] Malinowski, E., Zimanyi, E., 2006. Hierarchies in a multidimensional model: From conceptual modeling to logical representation. Data and Knowledge Engineering 59, 348–377.

[25] Markl, V., Ramsak, F., Bayer, R., 1999. Improving OLAP performance by multidimensional hierarchical clustering, in: Proc. of IDEAS 99, pp. 165–177.

[26] Messaoud, R.B., Boussaid, O., Rabaséda, S., 2004. A new OLAP aggregation based on the AHC technique, in: DOLAP 2004, ACM Seventh International Workshop on Data Warehousing and OLAP, pp. 65–72.

[27] Miquel, M., Bédard, Y., Brisebois, A., Pouliot, J., Marchand, P., Brodeur, J., 2002. Modeling multi-dimensional spatio-temporal data warehouses in a context of evolving specifications. International Archives of Photogrammetry Remote Sensing and Spatial Information Sciences 34, 142–147.

[28] Nguyen, T.B., Tjoa, A.M., 2000. An object oriented multidimensional data model for OLAP, in: Proc. of 1st Int. Conf. on Web-Age Information Management (WAIM), number 1846 in LNCS, Springer. pp. 69–82.

[29] Pedersen, T.B., Jensen, C.S., 1998. Multidimensional data modeling for complex data.

[30] Rehman, N.U., Mansmann, S., Weiler, A., Scholl, M.H., 2012. Discovering dynamic classification hierarchies in OLAP dimensions, in: ISMIS 2012: 20th International Symposium on Methodologies for Intelligent Systems, pp. 425–434.

[31] Rivest, S., Bédard, Y., Proulx, M.J., Nadeau, M., Hubert, F., Pastor, J., 2005. SOLAP technology: Merging business intelligence with geospatial technology for interactive spatio-temporal exploration and analysis of data. ISPRS Journal of Photogrammetry and Remote Sensing 60, 17–33.

[32] Rokach, L., Maimon, O., Miamon, O.Z., 2008. Data Mining with Decision Trees: Theory and Applications. volume 69 of Machine Perception and Artificial Intelligence. World Scientific Publishing Co.

[33] Romero, O., Abello, A., 2010. Automatic validation of requirements to support multidimensional design. Data & Knowledge Engineering 69, 917–942.

[34] Sarawagi, S., Agrawal, R., Megiddo, N., 1998. Discovery-driven exploration of OLAP data cubes, in: Proc. Int. Conf. of Extending Database Technology (EDBT'98), Springer-Verlag. pp. 168–182.

[35] Segurado, P., Araujo, M.B., 2004. An evaluation of methods for modelling species distributions. Journal of Biogeography 31, 1555–1568.

[36] Tebourski, W., Karâa, W.B.A., Ghezala, H.B., 2013. Semi-automatic data warehouse design methodologies: a survey. International Journal of Computer Science Issues (IJCSI) 10, 48–54.

[37] Thenmozhi, M., Vivekanandan, K., 2013. A tool for data warehouse multidimensional schema design using ontology. International Journal of Computer Science Issues (IJCSI) 10, 161–168.

[38] Tsois, A., Karayannidis, N., Sellis, T., 2001. MAC: Conceptual data modeling for OLAP, in: 3rd International Workshop on Design and Management of Data Warehouses (DMDW 2001), p. 2001.

[39] Tuffery, S., 2011. Data mining and statistics for decision making. John Wiley & Sons.

[40] Usman, M., Asghar, S., Fong, S., 2010. Data mining and automatic OLAP schema generation, in: Digital Information Management (ICDIM), 2010 Fifth International Conference on, IEEE. pp. 35–43.

[41] Usman, M., Pears, R., 2010. A methodology for integrating and exploiting data mining techniques in the design of data warehouses, in: Advanced Information Management and Service (IMS), 2010 6th International Conference on, IEEE. pp. 361–367.

[42] Wehrle, P., Miquel, M., Tchounikine, A., 2005. A model for distributing and querying a data warehouse on a computing grid, in: Proceedings of 11th International Conference on Parallel and Distributed Systems, IEEE. pp. 203–209.

[43] Westphal, M.I., Field, S.A., Possingham, H.P., 2007. Optimizing landscape configuration: A case study of woodland birds in the Mount Lofty Ranges, South Australia. Landscape and Urban Planning 81, 56–66.
