Combining Self-Organizing Map and clustering algorithms in industrial data analysis

Luca CANETTA, Naoufel CHEIKHROUHOU, Rémy GLARDON
Laboratory for Production Management and Processes
Institute for Production and Robotics
Swiss Federal Institute of Technology at Lausanne
Table of contents

- Introduction
- Motivation
- Clustering algorithms
- Self-Organizing Map (SOM)
- New hybrid two-stage approach
- Reference data set analysis
- Industrial data analysis
- Conclusion
Introduction

Clustering definition (Everitt, 1993): given a collection of n objects (individuals, animals, plants, etc.), each described by a set of p characteristics or variables (x_i in R^p), derive a useful division into a number of classes (k) on the basis of their degree of similarity. Both the number of classes and the properties of the classes are to be determined.

Clustering quality measures:
- High intra-cluster similarity (homogeneous clusters)
- Low inter-cluster similarity (well-separated clusters)
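These two quality measures can be made concrete on a toy data set. The sketch below (plain Python, with illustrative hand-made data, not data from the study) compares the average within-cluster distance against the average between-cluster distance; for a good clustering the former is much smaller than the latter.

```python
import math

# Two hand-made compact, well-separated 2-D clusters (illustrative only)
clusters = {
    "A": [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1)],
    "B": [(5.0, 5.0), (5.1, 5.2), (4.9, 5.1)],
}

def mean_intra(points):
    """Average pairwise Euclidean distance within one cluster (homogeneity)."""
    d = [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    return sum(d) / len(d)

def mean_inter(pa, pb):
    """Average pairwise Euclidean distance between two clusters (separation)."""
    d = [math.dist(p, q) for p in pa for q in pb]
    return sum(d) / len(d)

intra = max(mean_intra(pts) for pts in clusters.values())
inter = mean_inter(clusters["A"], clusters["B"])
print(intra < inter)  # → True: within-cluster distances << between-cluster
```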
[Figure: two example clusterings with four clusters each - compact clusters and well-separated clusters]
Motivation

Challenges:
- Database sizes and numbers
- Data complexity and heterogeneity
- Time responsiveness

Objective:
- Development of an effective and efficient clustering method
Clustering algorithms

Hierarchical (Everitt, 1993; Mangiameli et al., 1996):
- Rigid
- Complexity O(n²)
- Several methods (Single, Average, Complete, Ward)

Partitioning (Hartigan, 1975; Fung, 2001):
- Complexity O(n)
- Fixed number of clusters (k)
- Cluster seed (barycentre) choice influences clustering performance
- K-means

Traditional two-stage approaches (Punj and Stewart, 1983):
- Hierarchical (Average/Ward) + partitioning (K-means)
- Combined complexity
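As a minimal sketch of the partitioning step, the plain-Python Lloyd's K-means below (illustrative data and seeds, not the implementation used in the study) alternates between assigning points to the nearest centre and recomputing barycentres:

```python
import math

def kmeans(points, seeds, iters=20):
    """Minimal Lloyd's K-means: assign each point to the nearest centre,
    then recompute each centre as the barycentre of its group."""
    centres = [list(s) for s in seeds]
    for _ in range(iters):
        groups = [[] for _ in centres]
        for p in points:
            j = min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centre if a cluster empties out
                centres[j] = [sum(x) / len(g) for x in zip(*g)]
    return centres, groups

pts = [(0, 0), (0, 1), (1, 0), (8, 8), (8, 9), (9, 8)]
centres, groups = kmeans(pts, seeds=[(0, 0), (9, 9)])
print(centres)  # → barycentres near (0.33, 0.33) and (8.33, 8.33)
```

Starting from different seeds can converge to different partitions, which is exactly the seed-sensitivity noted on the slide.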
Self-Organizing Map (SOM)

[Figure: two-layer SOM architecture - input data, input layer, output layer of neurons]

Neural Network technique (Kohonen, 1995):
- Two (or more) layers
- Learning process:
  - Find the neuron closest (most similar) to the input data
  - Update the winning neuron and its neighbours
  - Analyse the next input vector
- Properties:
  - Reduced sensitivity to noise (outliers)
  - Visualisation of input data topology (2D-3D feature map)
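The learning loop above can be sketched as a tiny one-dimensional SOM. The grid size, learning rate, and neighbourhood radius below are illustrative choices, not those used in the study:

```python
import math
import random

random.seed(0)

# A tiny 1-D SOM: 5 neurons with 2-D weight vectors (illustrative sizes)
weights = [[random.random(), random.random()] for _ in range(5)]

def train_step(x, weights, lr=0.5, radius=1):
    """One SOM step: find the winning neuron, then pull it and its
    grid neighbours a fraction lr of the way toward the input x."""
    winner = min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))
    for j, w in enumerate(weights):
        if abs(j - winner) <= radius:       # neighbourhood on the 1-D grid
            for l in range(len(w)):
                w[l] += lr * (x[l] - w[l])  # Kohonen update rule
    return winner

# Present two repeated inputs; neurons self-organize toward them
for x in [(0.0, 0.0), (1.0, 1.0)] * 50:
    train_step(x, weights)
```

After training, some neurons sit near (0, 0) and others near (1, 1), so the map's prototype vectors summarize the input distribution.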
New hybrid two-stage approaches

Objectives:
- Increasing clustering robustness
- Decreasing the impact of outliers
- Improving computational efficiency
- Improving assignment quality

Process description:
Input data [n x p] -> SOM -> neuron prototype vectors [m x p], m << n -> clustering algorithms (K-means, Average, Ward, Single, Complete) -> K clusters
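A minimal sketch of the two-stage process, where hypothetical stage-1 prototype vectors and stage-2 labels stand in for actual SOM and K-means outputs; each raw data point inherits the cluster of its best-matching prototype:

```python
import math

# Stage-1 output (assumed): m = 4 SOM prototype vectors for p = 2 features
prototypes = [(0.1, 0.1), (0.3, 0.2), (5.0, 5.1), (5.2, 4.9)]
# Stage-2 output (assumed): K-means on the prototypes gives K = 2 clusters
proto_cluster = [0, 0, 1, 1]

def assign(x):
    """Map a raw data point to a cluster via its best-matching prototype."""
    bmu = min(range(len(prototypes)), key=lambda j: math.dist(x, prototypes[j]))
    return proto_cluster[bmu]

data = [(0.0, 0.2), (0.4, 0.1), (5.1, 5.0), (4.8, 5.3)]
print([assign(x) for x in data])  # → [0, 0, 1, 1]
```

Because the second stage clusters only the m prototype vectors instead of all n data points, the expensive algorithm runs on a much smaller, noise-reduced input.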
Approach features

- Similarity measure: Euclidean distance
- Data standardisation (min-max over each variable l):
  W_il = (x_il - min_j x_jl) / (max_j x_jl - min_j x_jl),   i, j = 1 ... n,  l = 1 ... p
- Clustering validation: Lbler index
  g(K) = h_e / h_o = (heterogeneity between groups) / (homogeneity within groups)
  K_opt = arg max_K [ g(K+1) - g(K) ]
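The min-max standardisation can be sketched directly from the formula; the data below are illustrative:

```python
def minmax_standardise(data):
    """Column-wise min-max scaling: W_il = (x_il - min_l) / (max_l - min_l),
    so every variable is rescaled to span [0, 1]."""
    cols = list(zip(*data))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(x - l) / (h - l) for x, l, h in zip(row, lo, hi)] for row in data]

data = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
print(minmax_standardise(data))
# → [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]  (each column now spans [0, 1])
```

Without this step, variables with large ranges (e.g. monthly quantity vs. unit price) would dominate the Euclidean distance.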
Iris data set analysis

- 150 instances, 4 dimensions
- 3 clusters (equal sizes), no overlapping clusters

Findings:
- Only the approaches using Single linkage are not robust (they fail to determine the correct number of clusters)
- K-means assignment quality benefits the most from SOM data pre-treatment
- A hybrid two-stage approach, SOM + K-means, is the best-performing method
Abalone data set analysis

- 4177 instances, 8 dimensions
- 3 clusters, overlapping clusters

Findings:
- Hierarchical algorithms, if not preceded by SOM, are not robust
- Average, Single, SOM + Single and SOM + Average have unsatisfactory assignment quality
- SOM + Single, SOM + Average and all purely hierarchical algorithms are therefore disregarded
Abalone data set analysis (continued)

- SOM + Ward is not robust
- SOM + hierarchical methods have poor assignment quality
- SOM + K-means is the best-performing method
Industrial data analysis

Wilt data: description of the purchasing and utilisation characteristics of industrial components, clustered with the SOM + K-means method

- 1020 instances
- 7 dimensions:
  1. Commonality
  2. Average delivery time
  3. Variation coefficient (VC) of the delivery time
  4. Average monthly quantity
  5. Variation coefficient of the monthly quantity
  6. Unit price
  7. Utilisation frequency

[Chart: Lbler index g(k) for the Wilt data, k = 2 ... 8; the index selects 4 clusters]
Industrial data characterisation

- Cluster A (416): frequent use; lowest delivery time; lowest VC of delivery time
- Cluster B (200): highest price; lowest quantity
- Cluster C (108): lowest commonality; highest VC of quantity
- Cluster D (296): lowest price; highest commonality; highest frequency; highest quantity; highest delivery time; highest VC of delivery time
Conclusion

Compared to traditional clustering algorithms, the new hybrid two-stage (SOM + clustering) methods have:
- Better robustness
- Better efficiency:
  - SOM + Average takes 8 times less time than Average
  - SOM + K-means takes 3.6 times less time than K-means
  - The K-means algorithm converges faster if preceded by SOM
- Comparable assignment quality

SOM + K-means turns out to be the most robust and best-performing method

- Strategic products management
SOM learning process

- An input x_i is analysed and the winning neuron j* is selected
- The weights of the winning neuron and its close neighbours (j in N_j*(d)) move toward x_i according to the Kohonen rule:
  w_j(t) = w_j(t-1) + a(d, t) * (x_i(t) - w_j(t-1))
- The training step is increased (t = t + 1) and another input x_i is analysed

[Figure: winning neuron and its neighbours on the output grid]
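A worked instance of the rule. The exact form of the learning factor a(d, t) is not given on the slide, so the decaying form below (shrinking with training step t and grid distance d) is an assumption for illustration:

```python
import math

def alpha(d, t, a0=0.5, tau=100.0):
    """Assumed learning factor: decays with training step t and is weaker
    for neurons farther (grid distance d) from the winner."""
    return a0 * math.exp(-t / tau) / (1 + d)

def update(w, x, d, t):
    """Kohonen rule: w(t) = w(t-1) + a(d, t) * (x - w(t-1))."""
    a = alpha(d, t)
    return [wl + a * (xl - wl) for wl, xl in zip(w, x)]

# Winning neuron (d = 0) at the first step (t = 0): moves halfway toward x
print(update([0.0, 0.0], [1.0, 1.0], d=0, t=0))  # → [0.5, 0.5]
```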
Comparison results

Clustering quality for the Iris data set, correctly classified data (%):

|                | K-means | Ward  | Average | Complete | Single |
| Traditional    | 88.00   | 87.33 | 88.00   | 83.33    | 67.33  |
| SOM two-stage  | 92.67   | 88.00 | 82.67   | 88.87    | 66.67  |

SOM alone: 84.67 %
Robustness: 6/6, 0/6, 6/6, 6/6, 0/6
Comparison results

Clustering quality for a subset of the Abalone data set, correctly classified data (%):

|                | K-means | Ward  | Average | Complete | Single |
| Traditional    | 51.10   | 40.26 | 48.61   | 51.09    | 37.22  |
| SOM two-stage  | 51.22   | 52.70 | 37.00   | 52.39    | 37.00  |

SOM alone: 51.22 %
Robustness: 6/6, 0/6, 6/6, 6/6, 0/6
Comparison results

Clustering quality for the entire Abalone data set:

| Method              | Correct (%) | Robustness |
| Traditional K-means | 50.30       | 6/6        |
| SOM                 | 52.00       | 6/6        |
| SOM + K-means       | 51.45       | 6/6        |
| SOM + Ward          | 45.39       | 0/6        |
| SOM + Complete      | 49.68       | 6/6        |
Industrial data clustering

- Cluster A (416): used frequently; shortest delivery time; least variable delivery time
- Cluster B (200): expensive items; smallest quantity
- Cluster C (108): smallest commonality; highest VC of quantity
- Cluster D (296): lowest unit price; highest commonality, frequency and quantity; highest and most variable delivery time
Part-machine matrix (Burbidge, 1971)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
A X X
B X X X X X X X X
C X X X X X
D X X X X X X X
E X X X X X X X X X X X X
F X X X X X X X X X X X X X X X X X X
G X X X
H X X X X X X X X X X X X X X X X X X X
I X X X X X X X X X X
J X X X X X X X
K X X X X X X
L X X X X X
M X X
N X X X X
O X X X X X X
P X X X X X X X X
43 parts being manufactured using 16 machines
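For clustering, each part can be encoded as a binary machine-usage vector over the 16 machines, so that standard distance measures apply. The sketch below uses two hypothetical parts with made-up machine sets, not rows read off the Burbidge matrix:

```python
import math

machines = list("ABCDEFGHIJKLMNOP")  # the 16 machines (rows of the matrix)

def encode(used):
    """Binary machine-usage vector for one part: 1 if the part visits
    that machine, 0 otherwise."""
    return [1.0 if m in used else 0.0 for m in machines]

# Two hypothetical parts (illustrative machine sets only)
u = encode({"A", "C", "F"})
v = encode({"C", "F", "H"})
print(math.dist(u, v))  # → sqrt(2): the parts differ on machines A and H
```

Parts with small mutual distances share most of their machines and are natural candidates for the same manufacturing cell.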
SOM + K-means: part grouping

3 clusters:
- 5; 8; 9; 14; 15; 16; 19; 21; 23; 29; 33; 41; 43
- 2; 4; 7; 10; 17; 18; 28; 32; 37; 38; 40; 42
- 1; 3; 6; 11; 12; 13; 20; 22; 24; 25; 26; 27; 30; 31; 34; 35; 36; 39

5 clusters:
- A: 5; 8; 9; 14; 15; 16; 19; 21; 23; 29; 33; 41; 43
- B: 2; 4; 10; 18; 28; 32; 37; 38; 40; 42
- C: 3; 11; 20; 22; 24; 27; 30
- D: 6; 7; 17; 34; 35; 36
- E: 1; 12; 13; 25; 26; 31; 39

Completing each part within a single manufacturing cell can require including the same machine in more than one cell.
SOM + K-means: part grouping (Cluster C)

D E O H F A B I P C G J K L M N
1 X X X X
3 X X X
6 X X
11 X X
12 X X X
13 X X X
17 X X X
20 X X
22 X
24 X X X X
25 X X
26 X
27 X X X
30 X X
31 X X
34 X X
35 X X
36 X
39 X X
SOM + K-means: part grouping (Clusters A & B)

D E O H F A B I P C G J K L M N
5 X X X
8 X X X
9 X X X X
14 X X X X
15 X X
16 X
19 X X X X X
21 X X X X
23 X X X X
29 X X
33 X X X
41 X X X
43 X X X X
2 X X X X X X
4 X
7 X X X
10 X X X
18 X X
28 X X X
32 X X X X
37 X X X X X X
38 X X X X
40 X X X