IJRET: International Journal of Research in Engineering and Technology    eISSN: 2319-1163 | pISSN: 2321-7308
Volume: 02 Issue: 09 | Sep-2013, Available @ http://www.ijret.org

different clusters. Clustering techniques are applied in many application areas, such as pattern recognition [12], data mining [13], and machine learning [14]. Clustering algorithms can be broadly classified as hard, fuzzy, possibilistic, and probabilistic [15], each of which has its own special characteristics. Clustering methods have many applications, ranging from unsupervised learning in neural networks to pattern recognition, classification analysis, artificial intelligence, image processing, and machine vision. Several clustering methods have been widely studied and successfully applied in image segmentation [16-22].

In this paper, hard and fuzzy clustering schemes are introduced to obtain the optimal threshold in speech segmentation. The conventional hard clustering method restricts each point of the data set to exactly one cluster, so the membership values are either 0 or 1. The K-Means clustering algorithm is widely used as a hard clustering method. On the other hand, the fuzzy clustering method allows an object to belong to several clusters at the same time, but with different degrees of membership between 0 and 1. The fuzzy C-means algorithm is the most popular fuzzy clustering method used in pattern classification.

3.1 K-Means Clustering Algorithm

K-Means clustering is an unsupervised learning algorithm, first used by James MacQueen in 1967 [23] and first proposed by Stuart Lloyd in 1957 as a technique for pulse-code modulation [24]. It is also referred to as Lloyd's algorithm [25]. Here, a modified version of the algorithm is used to compute the threshold from the speech spectrogram.
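To make the hard/fuzzy contrast above concrete, here is a small illustrative sketch (the helper names, NumPy usage, and sample values are our own assumptions, not from the paper): hard clustering gives a point a 0/1 membership in exactly one cluster, while fuzzy clustering gives degrees in [0, 1] that sum to 1.

```python
import numpy as np

def hard_membership(x, centers):
    # Hard (K-Means-style) membership: 1 for the nearest center, 0 elsewhere.
    d = np.abs(centers - x)
    u = np.zeros(len(centers))
    u[np.argmin(d)] = 1.0
    return u

def fuzzy_membership(x, centers, m=2.0):
    # Fuzzy (FCM-style) membership: degrees in [0, 1] that sum to 1,
    # larger for nearer centers (m is the fuzziness index).
    d = np.abs(centers - x) + 1e-12          # avoid division by zero
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

centers = np.array([0.0, 5.0, 10.0])
print(hard_membership(4.0, centers))   # nearest center (5.0) gets membership 1
print(fuzzy_membership(4.0, centers))  # all centers get a degree; they sum to 1
```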
The algorithm classifies a given data set into a certain number of clusters (i.e., k clusters) based on a distance measure, defines k centers, one for each cluster, and finally computes the threshold. The algorithm is an iterative process that repeats until no data point moves to a new cluster, as shown in Figure-1. It aims at minimizing an objective function, also known as the squared error function, given by:

J = \sum_{i=1}^{k} \sum_{j=1}^{c_i} ( \|x_j - v_i\| )^2

where \|x_j - v_i\| is the Euclidean distance between a data point x_j and the cluster center v_i, c_i is the number of data points in the ith cluster, and k is the number of cluster centers.

The algorithmic steps for calculating a threshold using K-Means clustering are given below. Let x = {x1, x2, x3, ..., xn} be the set of data values in the speech spectrogram.

1) Assign the number of clusters, k = 3, and define the 3 cluster centers. Let v = {v1, v2, v3} be the set of 3 cluster centers.
2) Calculate the distance (d_ij) between the ith data point and the jth cluster center.
3) Assign each data point to the closest cluster, i.e., the one at minimum distance.
4) Recalculate each new cluster center using:

v_i = (1/c_i) \sum_{j=1}^{c_i} x_j

where c_i represents the number of data points in the ith cluster.
5) Recalculate the distance between each data point and the newly obtained cluster centers.
6) If no data point was reassigned, stop; otherwise repeat from step (3).
7) Calculate the desired threshold as the average of the 3 cluster centers.

Figure 1. K-Means clustering algorithm - calculating the desired threshold.

3.2 FCM Clustering Algorithm

The algorithm was developed by J. C. Bezdek in 1973 for pattern classification [26], first reported by J. C. Dunn in 1974 [27], and later used by Bezdek for pattern recognition [28]. FCM is a generalization of K-Means.
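As a rough illustration, the K-Means threshold procedure of steps 1-7 above can be sketched in Python (NumPy, the initialization over the data range, and the sample values are illustrative assumptions, not part of the original method):

```python
import numpy as np

def kmeans_threshold(data, k=3, max_iter=100):
    """Sketch of steps 1-7: cluster 1-D spectrogram values with K-Means,
    then return the average of the k cluster centers as the threshold."""
    data = np.asarray(data, dtype=float)
    # Step 1: define k initial cluster centers (here: spread over the data range).
    centers = np.linspace(data.min(), data.max(), k)
    labels = np.full(len(data), -1)
    for _ in range(max_iter):
        # Steps 2-3 (and 5): assign each point to the nearest center.
        d = np.abs(data[:, None] - centers[None, :])
        new_labels = d.argmin(axis=1)
        # Step 6: stop when no point changes cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: recompute each center as the mean of its members.
        for j in range(k):
            members = data[labels == j]
            if members.size:
                centers[j] = members.mean()
    # Step 7: the threshold is the average of the k cluster centers.
    return centers.mean(), centers

values = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9, 9.8, 10.1, 10.0]
threshold, centers = kmeans_threshold(values)
print(round(threshold, 3))  # 5.05
```

With three well-separated groups in the toy data, the three centers converge to the group means and the threshold is their average.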
While K-Means assigns each object to exactly one cluster, FCM allows clusters to be fuzzy sets, so that each object belongs to all clusters with varying degrees of membership, subject to the restriction that the sum of all membership degrees for any given data point is equal to 1. FCM assigns membership to each object on the basis of the distance between the cluster center and the object: the nearer an object is to a cluster center, the higher its membership in that cluster. The algorithm is an iterative process, as shown in Figure-2. After each iteration, the memberships and cluster centers are updated, and finally a threshold is computed from the cluster centroids.

The algorithmic steps for calculating a threshold using fuzzy C-means clustering are given below. Let x = {x1, x2, x3, ..., xn} be the set of data values in the speech spectrogram.

1) Assign the number of clusters, c = 3, and define the cluster centers. Let v = {v1, v2, v3} be the set of 3 cluster centers.
2) Calculate the Euclidean distance (d_ij) between the ith data point and the jth cluster center.
3) Update the fuzzy membership function \mu_{ij} using:

\mu_{ij} = 1 / \sum_{k=1}^{c} ( d_{ij} / d_{ik} )^{2/(m-1)},  i = 1, 2, ..., n;  j = 1, 2, ..., c

4) Compute the fuzzy centers v_j using:

v_j = \sum_{i=1}^{n} (\mu_{ij})^m x_i / \sum_{i=1}^{n} (\mu_{ij})^m,  j = 1, 2, ..., c

where m is the fuzziness index, m \in [1, \infty).
5) Repeat from step (2) until the minimum 'J' value is achieved or \|U^{(k+1)} - U^{(k)}\| < \epsilon, where U = (\mu_{ij}) is the fuzzy membership matrix and k is the iteration step.
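Under the same illustrative assumptions as before (NumPy, 1-D spectrogram values, a random initial membership matrix), the FCM update loop might be sketched as follows; this version uses only the \|U_new - U\| < eps stopping criterion:

```python
import numpy as np

def fcm_threshold(data, c=3, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Sketch of the FCM steps: alternately update centers v_j and
    memberships u_ij until the membership matrix stabilises, then
    return the average of the c centers as the threshold."""
    rng = np.random.default_rng(seed)
    x = np.asarray(data, dtype=float)
    n = len(x)
    # Random initial membership matrix U; each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Step 4: v_j = sum_i u_ij^m x_i / sum_i u_ij^m
        um = U ** m
        v = (um * x[:, None]).sum(axis=0) / um.sum(axis=0)
        # Steps 2-3: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        d = np.abs(x[:, None] - v[None, :]) + 1e-12   # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # Step 5: stop when the membership matrix stops changing.
        if np.linalg.norm(U_new - U) < eps:
            U = U_new
            break
        U = U_new
    # Threshold from the cluster centroids, as in the K-Means case.
    return v.mean(), v

values = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9, 9.8, 10.1, 10.0]
threshold, centers = fcm_threshold(values)
```

Because every center is a membership-weighted mean of the data, the resulting threshold always lies within the range of the input values.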