Cluster Analysis

Session 2 - Cluster


Types of Data Mining

Directed or Supervised data mining
Undirected or Unsupervised data mining

[Figure: 13 items, numbered 1 to 13, grouped at successive levels of clustering]
3 clusters: {1,3,6,7,9,11,13} {2,5,8,12} {4,10}
4 clusters: {1,3,6,7,9} {11,13} {2,5,8,12} {4,10}
6 clusters: {1,3,6,7,9} {11,13} {2,5,8} {12} {4} {10}

Which attribute to use for clustering?

Color combination & design
Size
Location
All of the above

Hierarchical Cluster - Example

Customer  Groceries  Toiletries
1         1200       300
2         1300       380
3         500        1800
4         450        1900
5         1350       1560
6         1400       1620
7         1550       1450

Cluster Example

[Figure: scatter plot of customers 1 to 7 by Groceries and Toiletries]

Where to stop?

Use dendrogram
Can you interpret the clusters?
Use different types of distances and clustering mechanisms, and see if you are getting similar constituents.

Vertical Icicle

[Figure: vertical icicle plot, Cluster No. (1 to 7) against Case, with cases in the order 4, 3, 7, 6, 5, 2, 1; built up one row at a time]

Beer Example

ID  BEER            CAL  SOD  ALC  COST
1   Budweiser       144  15   4.7  0.43
2   Schlitz         151  19   4.9  0.43
3   Lowenbrau       157  15   4.9  0.48
4   Kronenbourg     170  7    5.2  0.73
5   Heineken        152  11   5.0  0.77
6   Old Mil         145  23   4.6  0.28
7   Augsburger      175  24   5.5  0.40
8   Strohs          149  27   4.7  0.42
9   Miller lite     99   10   4.3  0.43
10  Bud light       113  8    3.7  0.44
11  Coors           140  18   4.6  0.44
12  Coors lite      102  15   4.1  0.45
13  Michelob light  135  11   4.2  0.50
14  Becks           150  19   4.7  0.76
15  Kirin           149  6    5.0  0.79
16  Pabst           68   15   2.3  0.38
17  Hamms           136  19   4.4  0.43
18  Heilemans       144  24   4.9  0.43
19  Olympia         72   6    2.9  0.46
20  Schlitz lite    97   7    4.2  0.47

Hierarchical Clustering

ID  BEER         CAL  SOD  ALC  COST
1   Budweiser    144  15   4.7  0.43
2   Schlitz      151  19   4.9  0.43
3   Lowenbrau    157  15   4.9  0.48
4   Kronenbourg  170  7    5.2  0.73
5   Heineken     152  11   5.0  0.77

Cluster - Linear Distances

Differences from Budweiser (Budweiser minus each beer):

             CAL  SOD  ALC   COST
Budweiser    0    0    0     0
Schlitz      -7   -4   -0.2  0
Lowenbrau    -13  0    -0.2  -0.05
Kronenbourg  -26  8    -0.5  -0.3
Heineken     -8   4    -0.3  -0.34

Squares of Distances from Budweiser

             CAL  SOD  ALC   COST    Total
Budweiser    0    0    0     0       0
Schlitz      49   16   0.04  0       65.04
Lowenbrau    169  0    0.04  0.0025  169.0425
Kronenbourg  676  64   0.25  0.09    740.34
Heineken     64   16   0.09  0.1156  80.2056

Euclidean Distances Between Beers (squared)

             Budweiser  Schlitz  Lowenbrau  Kronenbourg  Heineken
Budweiser    0.00       65.04    169.04     740.34       80.21
Schlitz      65.04      0.00     52.00      505.18       65.13
Lowenbrau    169.04     52.00    0.00       233.15       41.09
Kronenbourg  740.34     505.18   233.15     0.00         340.04
Heineken     80.21      65.13    41.09      340.04       0.00

Dendrogram

[Figure: dendrogram of the 20 beers, scale marked 20, 40, 60, 80, 100, axis label "Number of clusters". Leaf order as listed: 19 Olympia Gold Light, 16 Pabst Extra Light, 14 Becks, 15 Kirin, 5 Heineken, 4 Kronenbourg, 13 Michelob light, 12 Coors lite, 10 Bud light, 20 Schlitz lite, 9 Miller lite, 7 Augsburger, 18 Heilemans Old Style, 8 Strohs Bohemian Style, 6 Old Mil, 17 Hamms, 11 Coors, 2 Schlitz, 3 Lowenbrau, 1 Budweiser]

Vertical Icicle

[Figure: vertical icicle plot for the 20 beers, number of clusters (1 to 19) against Case]

Actual Clusters

Beer                   Cluster No.
Budweiser              1
Schlitz                1
Lowenbrau              1
Old Mil                1
Augsburger             1
Strohs Bohemian Style  1
Coors                  1
Hamms                  1
Heilemans Old Style    1
Kronenbourg            2
Heineken               2
Becks                  2
Kirin                  2
Miller lite            3
Bud light              3
Coors lite             3
Michelob light         3
Schlitz lite           3
Pabst Extra Light      4
Olympia Gold Light     4

Cluster Characteristics

K-Means Cluster

Define K Clusters
Define Cluster Centers
Assign each case to the cluster
Use the cluster centers
Recalculate Cluster Centers
Use Z-Scores

Case of An Electrical Appliances Company

A Large Database of Customers
Used Clustering for Identifying the Customer Profiles
Used IBM's Intelligent Miner
Clusters to Cover about 75% of the Customers

Customer Profile

Electrical Appliances Company
72000 Customers
Data collected over a period of two years
Data on the Purchaser and the Family
Data
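The squared distances in the five-beer hierarchical clustering example can be reproduced with a short script. The sketch below is only an illustration: the names `beers`, `squared_distance` and `distances` are made up here, and the raw, unscaled attributes are used exactly as in the slides, so calories dominate the totals. The final line picks the closest pair, which is the pair an agglomerative method would merge first if it were run on these five beers alone; it is not a claim about the full 20-beer result.

```python
from itertools import combinations

# First five beers from the slides: (CAL, SOD, ALC, COST)
beers = {
    "Budweiser":   (144, 15, 4.7, 0.43),
    "Schlitz":     (151, 19, 4.9, 0.43),
    "Lowenbrau":   (157, 15, 4.9, 0.48),
    "Kronenbourg": (170, 7, 5.2, 0.73),
    "Heineken":    (152, 11, 5.0, 0.77),
}


def squared_distance(a, b):
    """Sum of squared attribute differences (no square root, no scaling)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Squared distance for every unordered pair of beers
distances = {
    (p, q): squared_distance(beers[p], beers[q])
    for p, q in combinations(beers, 2)
}

for (p, q), d in distances.items():
    print(f"{p:12s} {q:12s} {d:8.2f}")

# The pair with the smallest distance would be joined first
closest_pair = min(distances, key=distances.get)
print("Closest pair:", closest_pair)
```

Because the attributes are on very different scales, standardizing each column to Z-scores before computing distances (as the K-Means slide suggests) would give cost and alcohol content more influence on the grouping.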