
132 Int. J. Systems, Control and Communications, Vol. 7, No. 2, 2016

Copyright © 2016 Inderscience Enterprises Ltd.

A fuzzy logic-based clustering algorithm for network optimisation

Navdeep Singh* GMUADM, Infosys Ltd., Pune, India *Corresponding author

Rakesh Kumar Department of CSE, NITTTR, Chandigarh, India

Abstract: High-dimensional data occur very frequently in underwater wireless sensor networks, and, unfortunately, classical clustering techniques do not work well for such high-dimensional networks of arbitrary shape. This is mainly because clustering techniques are highly parameterised. Data aggregation using traditional clustering techniques is therefore a problem for high-dimensional networks with arbitrary shapes, and existing clustering algorithms cannot be used to solve this problem or to improve clustering performance. Many researchers have employed fuzzy logic-based clustering schemes in the past, and we analyse these techniques to identify research trends. In this paper, a fuzzy-based clustering ensemble algorithm is proposed in which sensor nodes are characterised on the basis of major and minor criteria. The algorithm groups the sensor nodes into multiple clusters, and a clustering precision measure is defined to assess its efficacy. The algorithm is also able to distinguish between all the sensor nodes present in the network. Other existing clustering algorithms for data aggregation are reviewed and compared with the proposed technique.

Keywords: fuzzy set theory; UWSN; sensor nodes; network optimisation; data aggregation; data clustering; trapezoidal fuzzy numbers; linguistic variables.

Reference to this paper should be made as follows: Singh, N. and Kumar, R. (2016) ‘A fuzzy logic-based clustering algorithm for network optimisation’, Int. J. Systems, Control and Communications, Vol. 7, No. 2, pp.132–150.

Biographical notes: Navdeep Singh received his BTech in Computer Science and Engineering from MM University, India, and is currently working as a Systems Engineer at Infosys Ltd. His research interests include algorithm design, wireless networking and mobile systems, and digital communication.

Rakesh Kumar received his BTech in Computer Science and Engineering from Punjab Technical University, Jalandhar, India, MTech in Information Technology from Guru Gobind Singh Indraprastha University, New Delhi, India and PhD degree from NIT Kurukshetra, India in 2004, 2008 and 2015 respectively. At present, he is working as an Assistant Professor in the Department of Computer Science and Engineering of NITTTR, Chandigarh, India. He has been teaching for the last 11 years and has guided 20 MTech dissertations. He has 40 international/national conference/journal publications to his credit. He is an active member of IAENG, IDES and UACEE. His key areas of interest include MANET, VANET, simulation and modelling, discrete mathematical structures, and formal languages and automata.

1 Introduction

Network optimisation plays a crucial role in network planning and design. A well-designed network improves network efficiency and makes data aggregation easier; optimising a network has therefore become a vital objective. Clustering, as shown in Figure 1, is a technique that groups sensor nodes into several small groups known as clusters. The cluster heads are responsible for aggregating data from the individual nodes of their cluster and routing it to the desired location. Many clustering problems have been studied in the past; they usually arise in systems where data aggregation and data partitioning are necessary. Most existing algorithms are based on geometric procedures, and the k-means algorithm proposed by MacQueen (1967) is the most popular of these. Tran et al. (2006) and Tritcler et al. (2005) proposed model-based algorithms, which do not generalise well across scientific domains. Clustering high-dimensional data is particularly challenging, and these algorithms often perform poorly on such data.

For a dataset A, the purpose of clustering analysis is to divide A into clusters such that the data sensed by sensor nodes within one cluster are as similar as possible, while the data in different clusters are as dissimilar as possible. For fuzzy logic-based clustering, data aggregation techniques can be used to partition the data for better clustering performance, and the clustering process is run with different initial parameters. The SCOLPE algorithm proposed by Ong et al. (2004) can be used to aggregate data with categorical attributes; some clustering schemes, therefore, apply only to categorical attributes. Although many clustering algorithms have been designed and used in numerous applications, clustering is still considered a challenging process. The reason is that each clustering algorithm is limited in some respect, and none of them can handle all clustering problems. The strengths of one algorithm may therefore compensate for the weaknesses of another, so it is sensible to combine them so that they work collectively. This combination of different clustering schemes is known as a clustering ensemble, and it produced the most accurate and effective clustering in the studies by Fred and Leitao (2003) and Greene and Cunningham (2004).
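One common way to combine base clusterings into an ensemble is co-association (evidence accumulation) consensus: record, for every pair of nodes, the fraction of base clusterings that put them in the same cluster, and merge pairs that co-occur in a majority of runs. The sketch below is a minimal illustration under that assumption; it is not necessarily the combination rule used in the studies cited above, and the 0.5 threshold and the function names are hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of base clusterings that place nodes i and j in the same cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    runs = len(labelings)
    return [[1.0 if i == j else counts[i][j] / runs for j in range(n)] for i in range(n)]


def consensus_clusters(labelings, threshold=0.5):
    """Merge node pairs whose co-association exceeds the (hypothetical) threshold."""
    matrix = co_association(labelings)
    n = len(matrix)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the union-find trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For three base clusterings of five nodes, `consensus_clusters([[0, 0, 1, 1, 2], [0, 0, 1, 1, 1], [1, 1, 0, 0, 0]])` groups nodes 2, 3 and 4 together because nodes 2 and 4 (and 3 and 4) share a cluster in two of the three runs. Cluster labels are only compared for equality within a run, so the base algorithms need not agree on label names.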

For a high-dimensional network, there is an enormous amount of data to be clustered. To cluster sensor nodes reasonably in such a network, multiple parameters should be taken into account, including geographical location, node mobility, flow and environment. However, certain network attributes cannot be measured directly and efficiently, and, given the high dimensionality of the data, classical clustering schemes may not function effectively. It is therefore desirable to design a clustering algorithm that accounts for both the volume of network data and the dynamic locations of the nodes. It is also necessary to understand the network characteristics and analyse clustering schemes before optimising a high-dimensional network. With a proper clustering technique, a large sensor network can be decomposed into clusters whose sensor nodes are as similar as possible, for example in geographical location or data storage. However, as the number of nodes in a sensor network increases, the network optimisation problem becomes more challenging, and network optimisation should therefore be undertaken through clustering and data aggregation. This not only improves network efficiency but also reduces network overhead. This paper proposes a method to overcome sparseness in data clustering and compares it with existing approaches to show that it is efficient.

The rest of the paper is organised as follows. Section 1 has briefly introduced underwater sensor networks and clustering. Section 2 reviews related studies. Section 3 establishes sensor node clustering on the basis of fuzzy set theory, setting out the methodology, the required definitions and the fuzzy clustering algorithm. Section 4 evaluates the efficacy of the proposed scheme using clustering precision. Section 5 compares similar approaches, and Section 6 concludes the paper with remarks on future work.

Figure 1 Clustering and routing in underwater sensor networks (see online version for colours)

Source: Khalil et al. (2012)

2 Related work

With advances in technology such as efficient network optimisation algorithms, data aggregation and routing have become key approaches to network optimisation. As a result, a large amount of information about the sensor nodes is available. Important knowledge can be extracted from it using appropriate and effective data mining algorithms, for example pattern growth approaches such as PSP, proposed by Masseglia et al. (1998) to identify sequential patterns. A sensor network clustering algorithm divides the underwater sensor network into multiple clusters, and within each cluster the sensor nodes share common behaviours. A system can therefore develop a network optimisation strategy per cluster to retain the existing sensor nodes: rather than treating each sensor node individually, it can allocate its limited resources to particular clusters for cost and resource optimisation. In the past decade, a number of clustering techniques have been developed. For example, Heinzelman et al. (2002) proposed a low-energy adaptive clustering hierarchy in which sensor nodes are allowed to stay in a sleep state for long periods to save power, whereas the cluster head always stays awake to receive and transmit data. Yu and Wong (2006) proposed a clustering algorithm for heterogeneous data that introduces a fuzzy distance function and the fractal correlation dimension to identify a cluster among an arbitrary group of sensor nodes. For clustering heterogeneous data streams, Chau et al. (2006) proposed the UK-means algorithm, which minimises the distance between the cluster head and the sensor nodes; however, it cannot cluster data with categorical attributes. LuMicro (Zhang et al., 2009) is a clustering technique based on a two-way selection mechanism for improving cluster quality. Ong et al. (2004) proposed SCOLPE, which adopts micro-clusters and introduces the cluster histogram but can only cluster data with categorical attributes. Improving on this, He et al. (2008) proposed a divide-and-conquer strategy to cluster data streams with both categorical and numerical attributes.

In a high-dimensional, large-scale network, clustering the sensor nodes based on their characteristics is difficult. Sensor node similarity is affected by various parameters, such as the underwater environment, node mobility and node decay. Most of these parameters are hard to measure quantitatively because they are arbitrary and discrete and are usually obtained from human assumptions. Fuzzy theory is considered the most appropriate tool for handling such generality and ambiguity. Many researchers have introduced and applied fuzzy theory to tackle ambiguity, for example Zadeh (1965) with the membership function, Bezdek et al. (1981) for clustering analysis and Tamura et al. (1973) for pattern recognition. In fuzzy theory, attributes are evaluated with basic linguistic terms such as ‘Good’ and ‘Bad’. Several typical forms of fuzzy numbers exist, such as interval, triangular and trapezoidal fuzzy numbers; trapezoidal fuzzy numbers are considered the general form and make it easy to process the basic evaluation terms. Many fuzzy methods have been adopted in network operations and network clustering across different research fields. Tassa and Cohen (2013) proposed a centralised approach for solving clustering problems. Khaji and Mehrjoo (2014) proposed a genetic algorithm. Li et al. (2013) presented a new maximum lifetime routing algorithm, UCLF. Huruiala et al. (2010) proposed a hierarchical routing algorithm based on evolutionary algorithms that employs fuzzy logic to handle uncertainties in WSNs. Kim et al. (2008) used the Mamdani method as the fuzzy inference technique.

Few existing studies address sensor node clustering schemes for network optimisation. As noted earlier, sensor node clustering is an intermediate stage in the optimisation of a wireless sensor network. Guo et al. (2010) proposed a distributed clustering scheme for heterogeneous wireless sensor networks that elects cluster heads based on the ratio between the residual energy of each node and the average energy of the network, and then clusters similar sensor nodes with the respective cluster heads. Smaragdakis et al. (2004) and Lindsey and Raghavendra (2002) proposed techniques with two types of nodes based on initial energy. Both are energy-oriented and therefore work well for small-scale networks. However, both may suffer from several issues: given the complexity of modern wireless sensor networks, more resources are needed to measure the similarity between sensor nodes, which reduces the cost savings. Moreover, classical clustering techniques and data aggregation algorithms cannot handle high-dimensional networks with a high concentration of sensor nodes. In addition, node mobility can cause nodes to leave their cluster and enter another, acquiring uncertain attributes. Hence, the heterogeneity of sensor node attributes should be incorporated into the clustering process.
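The residual-energy criterion attributed above to Guo et al. (2010) can be sketched as follows. This is a minimal illustration, not their published procedure: it assumes that the k nodes with the highest ratio of residual energy to network-average energy become cluster heads (k is a hypothetical parameter) and that every other node joins its geographically nearest head.

```python
import math


def elect_cluster_heads(residual_energy, k):
    """Indices of the k nodes with the highest residual-to-average energy ratio."""
    average = sum(residual_energy) / len(residual_energy)
    ratios = [energy / average for energy in residual_energy]
    ranked = sorted(range(len(ratios)), key=lambda i: ratios[i], reverse=True)
    return sorted(ranked[:k])


def assign_to_heads(positions, heads):
    """Map every node index to its nearest cluster head (a head maps to itself)."""
    return {
        i: min(heads, key=lambda h: math.dist(position, positions[h]))
        for i, position in enumerate(positions)
    }
```

With residual energies `[0.9, 0.4, 0.8, 0.3]` the average is 0.6, so nodes 0 and 2 (ratios 1.5 and about 1.33) become the two heads, and nodes 1 and 3 join whichever head is closer.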

The literature reviewed above does not provide a suitable solution for optimising and clustering mobile networks. In this paper, we design an effective algorithm based on fuzzy set theory to help close this research gap. The methodology of the proposed algorithm is divided into several steps. First, the structure is analysed as a hierarchy and each level is rated according to its importance. Next, a fuzzy method is developed and a fuzzy clustering algorithm is proposed to evaluate sensor node attributes. Then, the clustering precision is evaluated to determine the efficacy of the proposed algorithm. Finally, the proposed algorithm is compared with existing clustering algorithms.

3 Clustering algorithm

The methodology follows a three-phase process:

1 a structure is established based on hierarchical analysis

2 linguistic variables are defined for evaluating the sensor node clusters and are then transformed into trapezoidal fuzzy numbers so that they can be processed easily and accurately

3 a sensor node clustering technique is developed for the high-dimensional network based on fuzzy theory.

3.1 Structure establishment based on hierarchical analysis

In this section, we establish a structure based on a hierarchical analysis of the sensor nodes and data for clustering the high-dimensional network. The characteristics of the sensor network are determined by the parameters shown in Figure 2, which previous studies consider the most prominent; further parameters can also be accommodated by this structure. Based on these parameters, each cluster can be evaluated, and the aggregate precision can be obtained by evaluating the precision for each individual parameter. These parameters, with regard to the efficiency of the algorithm, are elaborated as follows:


a Number of clusters: To optimise a network, various studies have adopted techniques that produce a number of clusters. The number of clusters is proportional to the number of cluster heads. Increasing the number of cluster heads decreases the intra-cluster distance between the cluster head and the sensor nodes, resulting in energy-efficient communication, as presented by Nam et al. (2011). The number of clusters is therefore a critical parameter for network efficiency.

b Intra-cluster communication: Related work such as the intra-cluster routing algorithm proposed by Adeel et al. (2010) shows that, to maintain the efficiency of the sensor network, either one-to-one routing or multi-hop routing is adopted for intra-cluster communication: nodes in the region close to the cluster head can use one-to-one routing, and the other nodes use multi-hop routing. This is proportional to the number of nodes and the number of cluster heads.

c Node mobility: In underwater sensor networks, node mobility is one of the most important parameters. It is not realistic to assume that sensor nodes and cluster heads remain stationary under water, and mobility has clear negative impacts on wireless communication: it may cause a node to exit its current cluster and enter another. Hence, the association between sensor nodes and clusters needs to be maintained dynamically.

d Node types: Initial energy plays an important role in network efficiency. Therefore, different types of nodes with different initial energy, energy consumption rates, etc., are selected. Nodes with high initial energy are more often selected as cluster heads than ordinary nodes with the same capability.

e Cluster-head selection: Because energy drains faster in cluster heads, selecting them so as to minimise the network’s energy consumption is challenging. The selection is based on pertinent criteria such as connectivity, distance to the sink (as shown in Figure 2), cost of communication, maximum number of neighbours and mobility. Cluster heads may be picked deterministically, probabilistically or on the basis of the criteria listed above.

f Overlapping: The sensor network can be divided into numerous overlapping clusters with a specified average overlapping precision. For overlapping clustering, each node is in one of three states: cluster head, overlapping node or normal node, where an overlapping node is associated with several overlapping clusters. As a prominent parameter, overlapping is responsible for the robustness of the sensor network.
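Three of the parameters above lend themselves to a small numerical illustration: the effect of the number of cluster heads on intra-cluster distance (a), one-to-one versus multi-hop intra-cluster communication (b), and the cluster-head, overlapping and normal node states (f). The sketch assumes 2-D node coordinates, Euclidean distance and a single radio range shared by all nodes; the function names and the extra 'unassigned' state for nodes that reach no head are illustrative assumptions, not part of the original method.

```python
import math
from collections import deque


def mean_distance_to_head(nodes, heads):
    """(a) Average distance from each node to its nearest cluster head."""
    return sum(min(math.dist(n, h) for h in heads) for n in nodes) / len(nodes)


def hops_to_head(positions, head, radio_range):
    """(b) Breadth-first search: hop count from every reachable node to the head."""
    hops = {head: 0}
    queue = deque([head])
    while queue:
        current = queue.popleft()
        for other, position in enumerate(positions):
            if other not in hops and math.dist(positions[current], position) <= radio_range:
                hops[other] = hops[current] + 1
                queue.append(other)
    return hops


def routing_mode(hops, node):
    """One hop means one-to-one routing; more hops mean multi-hop routing."""
    if node not in hops:
        return "unreachable"
    if hops[node] == 0:
        return "cluster-head"
    return "direct" if hops[node] == 1 else "multi-hop"


def node_states(positions, heads, radius):
    """(f) Cluster head, overlapping (reaches 2+ heads), normal (1 head) or unassigned."""
    states = {}
    for i, position in enumerate(positions):
        if i in heads:
            states[i] = "cluster-head"
            continue
        reachable = sum(1 for h in heads if math.dist(position, positions[h]) <= radius)
        if reachable >= 2:
            states[i] = "overlapping"
        elif reachable == 1:
            states[i] = "normal"
        else:
            states[i] = "unassigned"
    return states
```

For ten nodes spaced one unit apart on a line, a single central head gives a mean distance of 2.5, while two heads at positions 2 and 7 give 1.2, in line with item (a).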

These parameters also help to build a structure, whose elements are elaborated as follows:

a Wireless sensor network environment: In underwater sensor networks, the environment is a very critical criterion for clustering. Parameters such as node decay, node mobility and battery drainage depend on the underwater environment.


b Network compatibility: This attribute measures the similarity of the data sensed by a given number of sensor nodes. The higher the compatibility, the more efficiently the network can aggregate data.

c Geographical location: Nodes placed adjacent to each other can be served together by the same cluster head. If the sink is close to a particular set of nodes, those nodes can be grouped into a single cluster to communicate with it. Many other studies also cluster nodes based on geographical location.

d Inter-cluster routing: Routing is the most prominent aspect underlying clustering. Different routing techniques are used for clustering depending on the parameters, so that sensor nodes can be grouped together if they follow the same path to the sink. Routing data from the cluster heads to the sink, or between different clusters, is inter-cluster routing.

e Intra-cluster routing: As discussed above, routing the sensed data from the sensor nodes to their cluster head, within a cluster, is termed intra-cluster routing.

f Control manners: The clustering process can be controlled in a centralised, distributed or hybrid manner, depending on the criteria.

g Execution nature: The clustering process can be executed in an iterative, variable or probabilistic manner.

h Convergence time: Convergence time measures how quickly a group of routers reaches a state in which they all have the same topological information about the network. It can be further categorised as variable, constant or deterministic.

i Guarantee of connectivity: Guarantee of connectivity measures the probability that a connection is working in a healthy state.

j Load balancing: Load balancing improves performance by distributing the transmitted data across multiple cluster heads, which prevents the battery of any single cluster head (CH) from draining.

k Quality of service: Quality of service is a quantitative measure of several aspects of the network such as bandwidth, throughput and availability.

l Fault-tolerance: Fault-tolerance is the property that enables a system to continue operating in the event of the failure of one or more elements in the network.

3.2 Defining linguistic variables and their transformation

In this paper, we adopt fuzzy theory to transform the classification into numerical values and then transform the output into trapezoidal fuzzy numbers. Following Xu et al. (2011), a trapezoidal fuzzy number of type (a, b, c, d) with non-negative parameters is assumed, which has applications in both nonlinear and fuzzy linear approaches, and it is represented as $\tilde{A} = (a, b, c, d)$. The association function of the trapezoidal fuzzy number $\tilde{A}$ can then be calculated as:


$$f_{\tilde{A}}(x) = \begin{cases} 0, & x \le a \\ \dfrac{x - a}{b - a}, & a \le x < b \\ 1, & b \le x < c \\ \dfrac{d - x}{d - c}, & c \le x < d \\ 0, & x \ge d \end{cases} \qquad (1)$$

where a, b, c, d are real numbers. Because the clustering problem in a large-scale, high-dimensional sensor network is arbitrary in nature, the parameters are treated as linguistic variables. Table 1 gives the linguistic terms used to rate the clustering criteria discussed above, based on the study by Sun and Deng (2006).

Table 1 Linguistic terms and corresponding trapezoidal fuzzy numbers

Linguistic term      Trapezoidal fuzzy number
Absolutely good      (1, 0.98, 0.95, 0.92)
Very good            (0.95, 0.92, 0.86, 0.82)
Good                 (0.85, 0.75, 0.67, 0.63)
Fair                 (0.63, 0.60, 0.58, 0.55)
Poor                 (0.58, 0.55, 0.52, 0.49)
Very poor            (0.52, 0.45, 0.42, 0.39)
Absolutely poor      (0.40, 0.37, 0.33, 0.30)
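The association function of equation (1) and the ratings of Table 1 can be sketched together. Table 1 lists each term's four values in descending order, whereas equation (1) assumes a ≤ b ≤ c ≤ d, so the sketch sorts each row before use; that ordering is our reading of the table, and the function names are illustrative.

```python
# Table 1 as printed: each term's four values are listed in descending order.
TABLE_1 = {
    "Absolutely good": (1.0, 0.98, 0.95, 0.92),
    "Very good": (0.95, 0.92, 0.86, 0.82),
    "Good": (0.85, 0.75, 0.67, 0.63),
    "Fair": (0.63, 0.60, 0.58, 0.55),
    "Poor": (0.58, 0.55, 0.52, 0.49),
    "Very poor": (0.52, 0.45, 0.42, 0.39),
    "Absolutely poor": (0.40, 0.37, 0.33, 0.30),
}


def to_trapezoid(term):
    """(a, b, c, d) for a linguistic term, sorted so that a <= b <= c <= d (assumed reading)."""
    return tuple(sorted(TABLE_1[term]))


def trapezoid_membership(x, a, b, c, d):
    """Association function f(x) of the trapezoidal fuzzy number (a, b, c, d), equation (1)."""
    if not a <= b <= c <= d:
        raise ValueError("expected a <= b <= c <= d")
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising edge; a < x < b guarantees b > a
    if x < c:
        return 1.0                # plateau between b and c
    return (d - x) / (d - c)      # falling edge; c <= x < d guarantees d > c
```

For example, `trapezoid_membership(0.7, *to_trapezoid("Good"))` returns 1.0 because 0.7 lies on the plateau between 0.67 and 0.75. Checking the branches in the order of equation (1) means no division by zero occurs even for degenerate trapezoids with a = b or c = d.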

The following mathematical definitions are required to carry out the clustering algorithm.

Figure 2 Process of clustering (see online version for colours)

3.3 Development of clustering algorithm

Every criterion discussed above needs to be evaluated, and each sub-level evaluation is mapped to the corresponding higher-level criterion. Next, the clustering algorithm is run to group the sensor nodes into a number of clusters. Finally, the clustering precision is developed and evaluated to determine the number of clusters to be maintained and the number of sensor nodes mapped to each cluster.

$N = \{N_i \mid i = 1, 2, 3, \ldots, n\}$

where $N_i$ is sensor node $i$ and $n$ is the total number of sensor nodes;

$\mu^{1}_{t,i} \quad (t = 1, 2, 3, \ldots, n;\ i = 1, 2, 3, \ldots, n)$

is the association degree aggregated from the sub-criteria to the major criteria;

$K^{2}_{u,t} \quad (u = 1, 2, 3, \ldots, d;\ t = 1, 2, 3, \ldots, m) \qquad (2)$

is the evaluated value for sub-criterion $t$ of major criterion $m$ with decision precision $u$.

$A = \{A_u \mid u = 1, 2, 3, \ldots, n\}$

where $A_u$ represents each initial cluster and $n$ is the total number of clusters;

$B = \{B_w \mid w = 1, 2, 3, \ldots, n\}$

where $B_w$ represents each final cluster and $n$ is the total number of clusters. Let $F$ be the fuzzy concept set and $x$ a sample belonging to the fuzzy set $F$, where

$F(x)$ is the set of samples whose association degrees satisfy $\rho_m(w) \le \rho_m(x)$,

where $\rho_m$ is the association degree of fuzzy concept $m$. Mathematically, this can be expressed as:

$$F(x) = \{\, w \in S \;;\; \rho_m(w) \le \rho_m(x) \,\} \qquad (3)$$

$$L_m(x) = \frac{\sum_{w \in F(x)} \rho_m(w)}{\sum_{w \in S} \rho_m(w)} \qquad (4)$$

where $L_m(x)$ measures the degree to which sample $x$ belongs to the simple fuzzy concept $m$:

$$\mu_F(x) = \inf_{m \in F} L_m(x) \in [0, 1] \qquad (5)$$

where $\mu_F(x)$ is the association degree of fuzzy concept $F$. Let $S$ be the sample set of sensor nodes in a network and $M$ the set of simple fuzzy concepts on $S$, where $F \subseteq M$. Using these terms, we define the association entropy function $E(F)$ and the association coefficient function $D(F)$ as:

$$E(F) = \sum_{x \in S} \mu_F(x) \ln \mu_F(x) \qquad (6)$$

$$D(F) = \left(\frac{1}{n}\sum_{x \in S} \mu_F(x)\right) \ln\left(\frac{1}{n}\sum_{x \in S} \mu_F(x)\right) \qquad (7)$$


The evaluation index is then defined as $V = E(F)/D(F)$, the ratio of the association entropy function to the association coefficient function. The smaller the evaluation index, the better the fuzzy concept $F$ describes the sample set $S$.
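The evaluation index can be computed directly from the association degrees $\mu_F(x)$ of the samples in $S$. The sketch below assumes $E(F) = \sum_x \mu_F(x)\ln\mu_F(x)$ and $D(F) = \bar{\mu}\ln\bar{\mu}$, where $\bar{\mu}$ is the mean association degree (our reading of equations (6) and (7)), adopts the usual convention $0 \ln 0 = 0$, and raises an error when $D(F) = 0$, a case the text does not discuss. Function names are illustrative.

```python
import math


def association_entropy(mu):
    """E(F): sum of mu(x) * ln mu(x) over the samples, with 0 * ln 0 taken as 0."""
    return sum(m * math.log(m) for m in mu if m > 0)


def association_coefficient(mu):
    """D(F): mean(mu) * ln(mean(mu)) (assumed reading of equation (7))."""
    mean = sum(mu) / len(mu)
    return mean * math.log(mean) if mean > 0 else 0.0


def evaluation_index(mu):
    """V = E(F) / D(F); a smaller value means F describes the samples better."""
    d = association_coefficient(mu)
    if d == 0:
        raise ValueError("D(F) is zero, so the evaluation index is undefined")
    return association_entropy(mu) / d
```

A concept that assigns every sample the same degree 0.5 gives V = 2.0, whereas the crisper degrees [0.9, 0.1] give V ≈ 0.94, so the crisper concept is preferred, in line with the statement above.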

We divide the network clustering into three phases. In the first phase, the evaluation sub-criteria defined in equation (2) are mapped to the major criteria. In the second phase, the clustering algorithm is executed to group the sensor nodes into different clusters. Finally, in the third phase, the clustering precision is designed and evaluated to determine the appropriate number of clusters and the sensor nodes associated with each cluster.

According to equation (2), $K^{2}_{u,t}$ is the evaluation value for a sub-criterion. Similarly, $O^{2}_{u,t}$ is the evaluation value for a major criterion. These evaluation values can be further expressed as trapezoidal fuzzy numbers:

$K^{2}_{u,t} = \left(a^{2}_{u,t}, b^{2}_{u,t}, c^{2}_{u,t}, d^{2}_{u,t}\right)$ and

$O^{2}_{u,t} = \left(\theta^{2}_{u,t}, h^{2}_{u,t}, j^{2}_{u,t}, k^{2}_{u,t}\right)$, respectively.

Let ‘+’ and ‘×’ denote vector addition and multiplication, respectively. The evaluation index can then be expressed in full as:

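$$Y_{t,i} = \frac{1}{m \times t} \times \sum_{u=1}^{m}\left[\left(K^{2}_{u,t} \times O^{2}_{u,t}\right) + \left(K^{2}_{u,t} \times O^{2}_{u,t}\right) + \cdots + \left(K^{2}_{u,t} \times O^{2}_{u,t}\right)\right] \qquad (8)$$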

Using $P(Y) = \frac{1}{6}(a + 2b + 2c + d)$, as discussed by Chou (2009) and Liu and Jin (2012), we can define the association degree of the sensor node as:

$$\mu^{1}_{t,i} = \frac{1}{6}\left(A^{2}_{u,t} + 2B^{2}_{u,t} + 2C^{2}_{u,t} + D^{2}_{u,t}\right) \qquad (9)$$
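The weighted aggregation of equation (8) and the graded mean $P(Y) = (a + 2b + 2c + d)/6$ behind equation (9) can be sketched with trapezoids stored as 4-tuples and ‘+’ and ‘×’ applied component-wise. We assume that the repeated $K \times O$ terms in equation (8) are the summands for $u = 1, \ldots, m$ and expose the normalising factor $1/(m \times t)$ through two parameters; these readings and all names are assumptions.

```python
from typing import Sequence, Tuple

Trapezoid = Tuple[float, float, float, float]


def t_add(p: Trapezoid, q: Trapezoid) -> Trapezoid:
    """Component-wise addition of two trapezoidal fuzzy numbers."""
    return tuple(x + y for x, y in zip(p, q))


def t_mul(p: Trapezoid, q: Trapezoid) -> Trapezoid:
    """Component-wise multiplication of two trapezoidal fuzzy numbers."""
    return tuple(x * y for x, y in zip(p, q))


def aggregate(K: Sequence[Trapezoid], O: Sequence[Trapezoid], m: int, t: int) -> Trapezoid:
    """Y = 1/(m*t) * sum over u of K_u x O_u (sub-criterion values weighted by major-criterion values)."""
    total: Trapezoid = (0.0, 0.0, 0.0, 0.0)
    for k_u, o_u in zip(K, O):
        total = t_add(total, t_mul(k_u, o_u))
    return tuple(value / (m * t) for value in total)


def graded_mean(y: Trapezoid) -> float:
    """P(Y) = (a + 2b + 2c + d) / 6: the crisp association degree of a trapezoid."""
    a, b, c, d = y
    return (a + 2 * b + 2 * c + d) / 6
```

For instance, the ‘Good’ rating of Table 1, reordered as (0.63, 0.67, 0.75, 0.85), has a graded mean of 0.72. Because $P(Y)$ weights the inner points b and c twice, it is unchanged if a trapezoid is written in reverse order.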

Let $f_t$ be the attribute of sample $x_i$ with association degree $\mu$. The attribute $f_t$ can be decomposed into sub-attributes such as $m_{t,1}, m_{t,2}, m_{t,3}$ and $m_{t,4}$, whose association degrees are $\rho_{m_{t,1}}, \rho_{m_{t,2}}, \rho_{m_{t,3}}$ and $\rho_{m_{t,4}}$, respectively.

We perform the following steps in order to calculate the fuzzy attributes for each sample:

First, $L_m(x)$ is calculated for all the attributes using equation (4). The fuzzy concept of the association fuzzy values can then be expressed as

$$\yen_{t,s}(x_i) = \max\left\{\mu_{m_{t,1}}(x_i),\ \mu_{m_{t,2}}(x_i),\ \mu_{m_{t,3}}(x_i),\ \mu_{m_{t,4}}(x_i)\right\} \qquad (10)$$

Using this fuzzy concept, the evaluation index is obtained as the ratio of the association entropy function to the association coefficient function for the result of equation (10):

$$V_{\yen_{t,s}(x_i)} = \frac{E\left(\yen_{t,s}(x_i)\right)}{D\left(\yen_{t,s}(x_i)\right)} \qquad (11)$$

Since $\yen_{t,s}(x_i)$ is a fuzzy set, it can also be written as:

$$\yen_{t,s}(x_i) = \left\{\yen_{t,s}(x_i)\right\}, \quad p = 1, 2, \ldots, m;\ i = 1, 2, \ldots, n \qquad (12)$$

where $m$ is the total number of major criteria and $n$ is the total number of sensor nodes. The association entropy function and the association coefficient function are calculated using the definitions in equations (6) and (7), respectively. Return to equation (13) and continue the recursion until $V_{\yen_{t,s}(x_i)} \ge V_{\yen_{t,s}(x_i)}$.

$\mu_{\alpha}(x_i) = \inf_{m \in X_i} L_m(x_i) \in [0, 1]$, where $\alpha$ corresponds to the smallest value;

$\mu_{\beta}(x_i) = \inf_{m \in X_i} L_m(x_i) \in [0, 1]$, where $\beta$ corresponds to the second smallest value.

Using the sub-criteria and the association degree from equation (3), the association degrees can be written as:

$\rho_m(X_i) = \mu(X_i)$,

where

$X_{t,i} = \max\left\{\mu_{X_i}, \mu_{X_{i+1}}, \mu_{X_{i+2}}, \ldots\right\}$

Consider two attributes δ and γ. First, compute the smallest association degree $\mu_\delta(x_i)$ with respect to attribute δ, and then the second smallest association degree $\mu_\gamma(x_i)$ with respect to attribute γ. The association degrees for both attributes are defined by equation (5).

Similarly, the evaluation index for each of δ and γ is calculated as the ratio of the association entropy function to the association coefficient function. If the evaluation index for δ is greater than or equal to that for γ, δ is eliminated and the evaluation index is calculated for the remaining attributes. The loop continues until the evaluation index for δ is less than that for γ, at which point the fuzzy attributes of each sample have been obtained.
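The elimination loop above can be sketched as follows. We read δ as the remaining attribute with the smallest association degree for the sample under consideration and γ as the one with the second smallest, and compute each attribute's evaluation index from its association degrees over all samples, with $E = \sum \mu \ln \mu$ and $D = \bar{\mu}\ln\bar{\mu}$; these readings and the function names are assumptions rather than the authors' implementation.

```python
import math


def evaluation_index(mu):
    """V = E / D with E = sum(mu * ln mu) and D = mean(mu) * ln(mean(mu))."""
    entropy = sum(m * math.log(m) for m in mu if m > 0)
    mean = sum(mu) / len(mu)
    coefficient = mean * math.log(mean) if mean > 0 else 0.0
    if coefficient == 0:
        raise ValueError("association coefficient is zero")
    return entropy / coefficient


def select_fuzzy_attributes(memberships, sample):
    """memberships maps each attribute to its association degrees over all samples."""
    remaining = dict(memberships)
    while len(remaining) >= 2:
        # delta: smallest degree for this sample; gamma: second smallest
        ranked = sorted(remaining, key=lambda attr: remaining[attr][sample])
        delta, gamma = ranked[0], ranked[1]
        if evaluation_index(remaining[delta]) >= evaluation_index(remaining[gamma]):
            del remaining[delta]  # delta describes the samples no better than gamma
        else:
            break  # V(delta) < V(gamma): stop eliminating
    return sorted(remaining)
```

With degrees `{"a": [0.5, 0.5], "b": [0.9, 0.1], "c": [0.95, 0.05]}` and sample 0, attribute a is eliminated first (V = 2.0 against 0.94), then b (0.94 against about 0.57), leaving only c.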

Based on the fuzzy sample relation matrix presented by Liu (1998a, 1998b), different association degrees can be identified from the diagonal values and the other values in the matrix. Let the diagonal values be denoted a (a = 1, 2, 3, …, n), with the diagonal value incremented at each step. The corresponding samples can be grouped into one or more clusters, and the remaining samples into other clusters. This loop continues until a = n. Every iteration generates initial clusters $C'_1, C'_2, C'_3, \ldots, C'_n$.
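Grouping samples by thresholding a fuzzy relation matrix is commonly implemented as a λ-cut: two samples fall in the same cluster when a chain of matrix entries at or above the threshold a links them, which gives the same partition as a λ-cut of the max-min transitive closure of a reflexive, symmetric relation. The sketch below assumes such a symmetric matrix R with ones on the diagonal and uses union-find; it illustrates the general technique rather than Liu's exact procedure, and the names are hypothetical.

```python
def lambda_cut_clusters(R, a):
    """Samples i and j share a cluster if a chain of entries >= a links them."""
    n = len(R)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if R[i][j] >= a:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def sweep_thresholds(R):
    """Partitions for each distinct off-diagonal value, from coarse to fine."""
    n = len(R)
    values = sorted({R[i][j] for i in range(n) for j in range(i + 1, n)})
    return [(a, lambda_cut_clusters(R, a)) for a in values]
```

As the threshold rises, clusters can only split, never merge, which matches the pattern in Section 4 where larger values of a yield more clusters.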

To obtain the final clusters, we divide the samples into clusters according to the diagonal values and the clustering precision, as pictured in Figure 4. For each initial cluster $C'_1, C'_2, C'_3, \ldots, C'_n$, we calculate the weighted association degree using equation (5), which yields the final clusters $C_1, C_2, C_3, \ldots, C_n$. Finally, in the third phase, we calculate the clustering precision, which is used to evaluate the clustering result and its efficiency. The clustering precision can be expressed as:


( )( )

2

2

( 1) ( )

2

p p

p p

c mm pp x P m P

δ c m me p kp x P m P

cX c ρ x oCP

xp x o o

⊆

⊆

− −=

⎛ ⎞−⎜ ⎟⎝ ⎠

∑ ∑ ∑∑ ∑ ∑

ε ε

ε ε

(13)

where the numerator is the degree of dispersion between the clusters, obtained by evaluating and comparing the attributes, and the denominator is the closeness of the attributes of different nodes within each cluster. Hence, the clustering precision is better (that is, $CP_\delta$ is smaller) when the closeness within each cluster is greater and the inter-cluster dispersion is smaller.

4 Performance evaluation

Consider an underwater sensor network of 20 nodes. Using the diagonal values and the other values in the matrix, initial clusters are generated at every iteration. For each value of the diagonal element, $CP_\delta$ is evaluated using equation (13). This produces the initial clustering results shown in Figure 3 and the following clustering precision values:

1 When a = 0.6881, $CP_\delta = 5.41$, there are two clusters:

$C'_1 = \{N_{13}, N_{14}, N_{15}\}$,

$C'_2 = \{N_{1}, N_{2}, N_{3}, N_{4}, N_{5}, N_{6}, N_{7}, N_{8}, N_{9}, N_{10}, N_{11}, N_{12}, N_{16}, N_{17}, N_{18}, N_{19}, N_{20}\}$.

2 When a = 0.7380, $CP_\delta = 4.61$, there are three clusters:

$C'_1 = \{N_{13}, N_{14}, N_{15}\}$,

$C'_2 = \{N_{16}, N_{17}, N_{18}, N_{19}\}$,

$C'_3 = \{N_{1}, N_{2}, N_{3}, N_{4}, N_{5}, N_{6}, N_{7}, N_{8}, N_{9}, N_{10}, N_{11}, N_{12}, N_{20}\}$.

3 When a = 0.7848, $CP_\delta = 2.88$, there are four clusters:

$C'_1 = \{N_{13}, N_{14}, N_{15}\}$,

$C'_2 = \{N_{16}, N_{17}, N_{18}, N_{19}\}$,

$C'_3 = \{N_{4}, N_{5}, N_{6}, N_{7}\}$,

$C'_4 = \{N_{1}, N_{2}, N_{3}, N_{8}, N_{9}, N_{10}, N_{11}, N_{12}, N_{20}\}$.

4 When a = 0.8334, $CP_\delta = 3.33$, there are five clusters:

$C'_1 = \{N_{13}, N_{14}, N_{15}\}$,

$C'_2 = \{N_{16}, N_{17}, N_{18}, N_{19}\}$,

$C'_3 = \{N_{4}, N_{5}, N_{6}, N_{7}\}$,

$C'_4 = \{N_{8}, N_{9}, N_{10}\}$,

$C'_5 = \{N_{1}, N_{2}, N_{3}, N_{12}, N_{20}\}$.

5 When a = 0.8574, $CP_\delta = 3.80$, there are six clusters:

$C'_1 = \{N_{13}, N_{14}, N_{15}\}$,

$C'_2 = \{N_{16}, N_{17}, N_{18}, N_{19}\}$,

$C'_3 = \{N_{4}, N_{5}, N_{6}, N_{7}\}$,

$C'_4 = \{N_{8}, N_{9}, N_{10}\}$,

$C'_5 = \{N_{1}, N_{2}, N_{3}\}$,

$C'_6 = \{N_{12}, N_{20}\}$.

For each initial cluster obtained, we can calculate the weighted association degree using equation (5). As a result, we can obtain the final clusters and the clustering precision as:

1 When a = 0.6881, $CP_\delta = 5.02$, there are two clusters:

$C_1 = \{N_{13}, N_{14}, N_{15}\}$,

$C_2 = \{N_{1}, N_{2}, N_{3}, N_{4}, N_{5}, N_{6}, N_{7}, N_{8}, N_{9}, N_{10}, N_{11}, N_{12}, N_{16}, N_{17}, N_{18}, N_{19}, N_{20}\}$.

2 When a = 0.7380, $CP_\delta = 4.01$, there are three clusters:

$C_1 = \{N_{13}, N_{14}, N_{15}\}$,

$C_2 = \{N_{16}, N_{17}, N_{18}, N_{19}, N_{20}\}$,

$C_3 = \{N_{1}, N_{2}, N_{3}, N_{4}, N_{5}, N_{6}, N_{7}, N_{8}, N_{9}, N_{10}, N_{11}, N_{12}\}$.

3 When a = 0.7848, $CP_\delta = 2.41$, there are four clusters:

$C_1 = \{N_{13}, N_{14}, N_{15}\}$,

$C_2 = \{N_{16}, N_{17}, N_{18}, N_{19}, N_{20}\}$,

$C_3 = \{N_{4}, N_{5}, N_{6}, N_{7}, N_{8}\}$,

$C_4 = \{N_{1}, N_{2}, N_{3}, N_{9}, N_{10}, N_{11}, N_{12}\}$.

4 When a = 0.8334, $CP_\delta = 2.49$, there are five clusters:

$C_1 = \{N_{13}, N_{14}, N_{15}\}$,

$C_2 = \{N_{16}, N_{17}, N_{18}, N_{19}, N_{20}\}$,

$C_3 = \{N_{4}, N_{5}, N_{6}, N_{7}, N_{8}\}$,

$C_4 = \{N_{9}, N_{10}\}$,

$C_5 = \{N_{1}, N_{2}, N_{3}, N_{12}, N_{20}\}$.


The smaller the clustering precision, the more effective the network clustering. Analysing the final clustering results, CPδ = 2.41 is the smallest value, indicating that the network clustering is most effective when a = 0.7848.
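The selection rule reduces to picking the threshold with the smallest final CPδ. The Python sketch below restates it using the final values reported above and adds a sanity check that the selected final clusters partition N1 to N20; the helper `is_partition` is our own illustration, not part of the paper's method.

```python
# Final clustering precision for each threshold a (values reported above).
final_cp = {0.6881: 5.02, 0.7380: 4.01, 0.7848: 2.41, 0.8334: 2.49}

# A smaller CPδ indicates more effective clustering, so take the minimum.
best_a = min(final_cp, key=final_cp.get)

# Final clusters for a = 0.7848, as listed above (node indices only).
best_clusters = [
    {13, 14, 15},
    {16, 17, 18, 19, 20},
    {4, 5, 6, 7, 8},
    {1, 2, 3, 9, 10, 11, 12},
]


def is_partition(clusters, n):
    """True if the clusters are pairwise disjoint and jointly cover nodes 1..n."""
    members = [node for cluster in clusters for node in cluster]
    return len(members) == len(set(members)) and set(members) == set(range(1, n + 1))


print(best_a, final_cp[best_a])      # 0.7848 2.41
print(is_partition(best_clusters, 20))  # True
```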

Figure 3 Clustering precision for initial cluster formation (see online version for colours)

5 Comparative study

Recent clustering algorithms are compared with the proposed algorithm to demonstrate its effectiveness. Buttyan and Schaffer (2010) presented PANEL (Figure 5) for clustering wireless sensor networks. Jung et al. (2007) presented another clustering scheme, CCS (Figure 6), which considers the location of the sink to improve performance and extend the network lifetime. In 2010, Wang reported a clustering precision of 3.25 with six clusters. Later, Li reported a clustering precision of 3.79 with four clusters. All the clustering results are shown in Figures 5 and 6. Similarly, in 2012, Wang reported a clustering precision of 2.83 with seven clusters. The clustering precision for the initial and final cluster formation is plotted in Figures 3 and 4, respectively.

The clustering precision takes into account both the inter-cluster and the intra-cluster distance, and can therefore be used to evaluate the effectiveness of clustering results: the smaller the clustering precision, the more effective the clustering algorithm. With the lowest clustering precision, the presented clustering procedure yields more effective results than the other five algorithms. Another reason for its effectiveness is that it can handle any number of attributes dynamically and split numerous characteristics into generalised criteria. Hence, this approach is able to distinguish between the sensor nodes. Its ability to distinguish between different types of nodes also helps to estimate the resources required by each type of sensor node. Table 2 compares numerous clustering algorithms.
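Equation (12), which defines CPδ, is not reproduced in this section, so the sketch below uses a hypothetical stand-in rather than the paper's formula: the mean distance of points to their own cluster centroid divided by the smallest distance between two centroids. It only illustrates why a measure combining intra-cluster and inter-cluster distance becomes smaller when clusters are tighter and farther apart. The function names and the toy 2-D points are assumptions.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: List[Point]) -> Tuple[float, ...]:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[k] for p in points) / len(points) for k in range(dim))


def compactness_separation_ratio(clusters: List[List[Point]]) -> float:
    """Hypothetical stand-in, NOT the paper's CPδ (equation (12)).

    Mean point-to-own-centroid distance divided by the smallest
    centroid-to-centroid distance. Requires at least two clusters.
    """
    centroids = [centroid(pts) for pts in clusters]
    # Intra-cluster term: how spread out each cluster is around its centroid.
    intra_dists = [math.dist(p, c) for pts, c in zip(clusters, centroids) for p in pts]
    intra = sum(intra_dists) / len(intra_dists)
    # Inter-cluster term: distance between the closest pair of centroids.
    inter = min(math.dist(a, b) for a, b in combinations(centroids, 2))
    return intra / inter


tight = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
loose = [[(0.0, 0.0), (0.0, 4.0)], [(3.0, 0.0), (3.0, 4.0)]]
print(compactness_separation_ratio(tight))  # 0.05
print(compactness_separation_ratio(loose))
```

The tight, well-separated layout scores lower than the loose, overlapping one, matching the reading that a smaller value means more effective clustering.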


Table 2 Comparison table

| Clustering algorithm | No. of clusters | Clustering precision | Time complexity | Handles high-dimensional data | Accuracy |
|---|---|---|---|---|---|
| UCLF (Li et al., 2013) | 6 | 3.25 | O(N²) | Yes | About 99% |
| Credibility-based hierarchical agglomerative clustering (Lau et al., 2010) | 4 | 3.73 | O(N) | Yes | 96% |
| AVC (Lao et al., 2012) | 7 | 2.83 | O(N²) | Yes | Correctly classify into 4 groups |
| CLARA (Kaufman and Rousseeuw, 1990) | NA | NA | O(K(40 + K)² + K(N – K)) | No | Accurate but 15 times slower than BIRCH |
| DBSCAN (Ester et al., 1996) | NA | NA | O(N log N) | No | Correctly classify, no notion of noise |
| BIRCH (Zhang et al., 1993) | NA | NA | O(N) | Yes | Correctly classify the instances into groups |
| K-means clustering approaches | Depends upon the approach and number of data points | NA | O(NKd) | No | 69% |
| Fuzzy c-means | Depends upon the approach and number of data points | NA | O(N) | No | 98% |
| Hierarchical clustering | Depends upon the approach and number of data points | NA | O(N²) | No | 96% |
| ROCK (Guhe et al., 2000) | 21 | NA | O(N log N) | Yes | Almost 99% |
| Proposed approach | 4 | 2.41 (lowest clustering precision) | O(N) | Yes | Correctly classify into 4 groups |


To summarise, the presented approach produces more favourable and effective results by taking into account numerous attributes associated with each sensor node. An appropriate clustering approach can improve the uniformity of inter-cluster and intra-cluster distances, reduce network complexity, enable effective data aggregation and reduce convergence time.

Figure 4 Clustering precision for final cluster formation (see online version for colours)

Figure 5 Panel for clustering the network by Buttyan and Schaffer (see online version for colours)

Figure 6 Concentric clustering scheme by Jung (see online version for colours)


6 Conclusions

Wireless sensor network clustering for a high-dimensional complex network is of critical importance. It is pertinent to high-dimensional network optimisation because it takes numerous attributes into account, producing favourable results. Many studies have provided several well-established results, and a number of factors influence the clustering process. This paper presents an approach that clusters sensor nodes with similar characteristics under a hierarchical structure. The structure divides each sensor node's attributes into major and minor criteria. Linguistic variables are used to represent each criterion, and each linguistic variable is then transformed into a trapezoidal fuzzy number to enhance accuracy and standardisation. On this basis, we propose a clustering approach that quantifies the impact of all criteria. A clustering precision is defined to measure inter-cluster and intra-cluster distance. Other research in the area is analysed graphically and compared with the proposed approach. Compared with these studies, the proposed approach achieves the lowest clustering precision and outperforms the other approaches. It can handle multiple attributes per sensor node in an underwater wireless sensor network and, in comparison with previous approaches, distinguishes between sensor nodes more effectively. Furthermore, the proposed approach can be extended to solve clustering problems in other domains. The importance of clustering in underwater wireless sensor networks is demonstrated. Future research could incorporate further factors into the approach.
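The conversion of linguistic ratings into trapezoidal fuzzy numbers can be illustrated with a short Python sketch. The linguistic scale, the criterion names, the weights and the vertex-style distance below are all assumptions chosen for illustration; the paper's own scale and aggregation formulas are defined elsewhere and may differ. Each node's ratings are aggregated component-wise into a single trapezoid, and the distance between two nodes' trapezoids indicates how dissimilar the nodes are.

```python
import math
from typing import Dict, Tuple

# (a1, a2, a3, a4) with a1 <= a2 <= a3 <= a4
Trapezoid = Tuple[float, float, float, float]

# Hypothetical linguistic scale (assumption, not the paper's table).
SCALE: Dict[str, Trapezoid] = {
    "low": (0.0, 0.0, 0.2, 0.4),
    "medium": (0.2, 0.4, 0.6, 0.8),
    "high": (0.6, 0.8, 1.0, 1.0),
}


def aggregate(ratings: Dict[str, str], weights: Dict[str, float]) -> Trapezoid:
    """Weighted component-wise average of the trapezoids for a node's criteria."""
    total = sum(weights[c] for c in ratings)
    acc = [0.0, 0.0, 0.0, 0.0]
    for criterion, term in ratings.items():
        w = weights[criterion] / total
        for k, value in enumerate(SCALE[term]):
            acc[k] += w * value
    return (acc[0], acc[1], acc[2], acc[3])


def vertex_distance(x: Trapezoid, y: Trapezoid) -> float:
    """Square root of the mean squared difference of the four parameters."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / 4)


# Hypothetical criterion weights and node ratings.
WEIGHTS = {"energy": 0.5, "depth": 0.3, "link_quality": 0.2}
node_a = aggregate({"energy": "high", "depth": "medium", "link_quality": "high"}, WEIGHTS)
node_b = aggregate({"energy": "low", "depth": "medium", "link_quality": "low"}, WEIGHTS)
print(node_a, node_b, vertex_distance(node_a, node_b))
```

Here node A aggregates to approximately (0.48, 0.68, 0.88, 0.94) and node B to approximately (0.06, 0.12, 0.32, 0.52), so their distance is about 0.49; nodes with similar ratings would fall much closer together and be grouped into the same cluster.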

References

Adeel, A., Abid, A. and Sohail, J. (2010) ‘Energy aware intra cluster routing for wireless sensor networks’, International Journal of Hybrid Information Technology, Vol. 3, No. 1, pp.1–6.

Bezdek, C., Gunderson, C.R. and Watson, J. (1981) ‘Detection and characterization of cluster substructure – linear structure, fuzzy c-varieties and convex combinations thereof’, SIAM J. Appl. Math., Vol. 40, No. 2, pp.358–372.

Buttyán, L. and Schaffer, P. (2010) ‘PANEL: position-based aggregator node election in wireless sensor networks’, International Journal of Distributed Sensor Networks, Vol. 2010, Article ID 679205, pp.1–15.

Chau, M., Cheng, R., Kao, B. and Ng, J. (2006) ‘Uncertain data mining: an example in clustering location data’, Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006), Lecture Notes in Computer Science, Vol. 3918, pp.199–204.

Chou, C.C. (2009) ‘Integrated short-term and long-term MCDM model for solving location selection problems’, Journal of Transportation Engineering, Vol. 135, No. 11, pp.880–893.

Ester, M., Kriegel, H., Sander, J. and Xu, X. (1996) ‘A density-based algorithm for discovering clusters in large spatial databases with noise’, Proc. 2nd Int. Conf. Knowledge Discovery and Data Mining (KDD’96), pp.226–231.

Fred, L.N. and Leitao, M.N. (2003) ‘A new cluster isolation criteria based on dissimilarity increments’, IEEE Trans. Pattern Anal. Machine Intelligence, Vol. 25, No. 8, pp.944–958.

Greene, D. and Cunningham, P. (2006) Efficient Ensemble Methods for Document Clustering, Technical Report, Department of Computer Science, Trinity College Dublin, pp.1–6.

Guhe, S., Rastogi, R. and Shim, K. (2000) ‘ROCK: a robust clustering algorithm for categorical attributes’, Inf. Syst., Vol. 25, No. 5, pp.345–366.


Guo, L-Q., Xie, Y., Yang, C-H. and Jing, Z-W. (2010) ‘Improvement on LEACH by combining adaptive cluster head election and two-hop transmission’, Proceedings of the Ninth International Symposium on Distributed Computing and Applications to Business, Engineering and Science – 2010.

He, Z., Xu, X. and Deng, S. (2008) Clustering Mixed Numeric and Categorical Data: A Cluster Ensemble Approach, pp.1–6, arXiv Computer Science e-prints.

Heinzelman, W.B., Chandrakasan, A.P. and Balakrishnan, H. (2002) ‘An application specific protocol architecture for wireless micro sensor networks’, IEEE Trans. Wireless Communication, Vol. 1, No. 4, pp.660–670.

Huruiala, P.C., Urzica, A. and Gheorghe, L. (2010) ‘Hierarchical routing protocol based on evolutionary algorithms for wireless sensor networks’, 9th RoEduNet IEEE International Conference, pp.387–392.

Jung, S., Han, Y. and Chung, T. (2007) ‘The concentric clustering scheme for efficient energy consumption in the PEGASIS’, Proceedings of the 9th International Conference on Advanced Communication Technology, pp.260–265.

Kaufman, L. and Rousseeuw, P. (1990) Finding Groups in Data: An Introduction to Cluster Analysis, pp.223–226, Wiley, Applied Probability and Statistics Series, New York, NY.

Khaji, N. and Mehrjoo, M. (2014) ‘Crack detection in a beam with an arbitrary number of transverse cracks using genetic algorithms’, Journal of Mechanical Science and Technology, Vol. 8, No. 3, pp.823–836.

Khalil, I.M., Gadallah, Y., Hayajneh, M. and Khreishah, A. (2012) ‘An adaptive OFDMA-based MAC protocol for underwater acoustic wireless sensor networks’, Sensors, Vol. 7, pp.8782–8805, MDPI, Basel, Switzerland.

Kim, J-M., Park, S-H., Han, Y-J. and Chung, T-M. (2008) ‘CHEF: cluster head election mechanism using fuzzy logic in wireless sensor networks’, Proc. 10th Int. Conf. Advanced Communication Technology ICACT, Vol. 1, pp.654–659.

Lao, Y., Wu, Y., Wang, Y. and McAllister, K. (2012) ‘Fuzzy logic-based mapping algorithm for improving animal-vehicle collision data’, Journal of Transportation Engineering, Vol. 138, No. 5, pp.520–526.

Lau, H.C.W., Jiang, Z.Z., Ip, W.H. and Wang, D.W. (2010) ‘A credibility-based fuzzy location model with Hurwicz criteria for the design of distribution systems in B2C e-commerce’, Computers and Industrial Engineering, Vol. 59, No. 4, pp.873–886.

Li, J., Liao, G., Wang, F. and Li, J. (2013) ‘Maximum lifetime routing based on fuzzy set theory in wireless sensor networks’, JSW, Vol. 8, No. 9, pp.2321–2328.

Lindsey, S. and Raghavendra, C.S. (2002) ‘PEGASIS: power-efficient gathering in sensor information systems’, Proceedings of the IEEE Aerospace Conference, Vol. 3, pp.1125–1130.

Liu, P.D. and Jin, F. (2012) ‘A multi-attribute group decision-making method based on weighted geometric aggregation operators of interval-valued trapezoidal fuzzy numbers’, Applied Mathematical Modelling, Vol. 38, No. 1, pp.2498–2509.

Liu, X.D. (1998a) ‘The fuzzy sets and systems based on AFS structure, EI algebra and EII algebra’, Fuzzy Sets and Systems, Vol. 95, No. 2, pp.179–188.

Liu, X.D. (1998b) ‘The fuzzy theory based on AFS algebras and AFS structure’, Journal of Mathematical Analysis and Applications, Vol. 217, No. 2, pp.459–478.

MacQueen, J.B. (1967) ‘Some methods for classification and analysis of multivariate observations’, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp.281–297.

Masseglia, F., Cathala, F. and Poncelet, P. (1998) ‘The PSP approach for mining sequential patterns’, Proceedings of the Second European Symposium on Principles of Data Mining and Knowledge Discovery, pp.176–184.

Nam, C.S., Han, Y.S. and Shin, D.R. (2011) ‘Multi-hop routing-based optimization of the number of cluster-heads in wireless sensor networks’, Sensors, Vol. 11, No. 3, pp.2875–2884.


Ong, K-L., Li, W., Ng, W-K. and Lim, E-P. (2004) ‘SCLOPE: an algorithm for clustering data streams of categorical attributes’, Lecture Notes in Computer Science, Vol. 3181, pp.209–218.

Smaragdakis, G., Matta, I. and Bestavros, A. (2004) ‘SEP: a stable election protocol for clustered heterogeneous wireless sensor networks’, Proceedings of the 2nd International Workshop on Sensor and Actor Network Protocols and Applications (SANPA ‘04), pp.251–261.

Sun, D. and Deng, Y. (2006) ‘Determine discounting coefficient in data fusion based on fuzzy ART neural network’, Proceedings of the Third International Conference on Advances in Neural Networks, Vol. 1, pp.1286–1292.

Tamura, S., Higuchi, S. and Tanaka, K. (1973) ‘Pattern classification based on fuzzy relations’, IEEE Trans. Syst. Man Cybern., Vol. SMC-3, pp.98–102.

Tassa, T. and Cohen, D.J. (2013) ‘Anonymization of centralized and distributed social networks by sequential clustering’, IEEE Transactions on Knowledge and Data Engineering, Vol. 25, No. 2, pp.2–8.

Tran, T.N., Wehrens, R. and Buydens, L.M.C. (2006) ‘SMIXTURE: a strategy of mixture models clustering of multivariate images’, Journal of Chemometrics, Vol. 19, No. 11, pp.607–614.

Tritchler, D., Fallah, S. and Beyene, J. (2005) ‘A spectral clustering method for microarray data’, Computational Statistics and Data Analysis, Vol. 49, No. 1, pp.63–76.

Xu, Z.Y., Shang, S.C., Qian, W.B. and Shu, W.H. (2011) ‘A method for fuzzy risk analysis based on the new similarity of trapezoidal fuzzy numbers’, Expert Systems with Applications, Vol. 37, No. 3, pp.1920–1927.

Yu, Z. and Wong, H-S. (2006) ‘Mining uncertain data in low-dimensional subspace’, The 18th International Conference on Pattern Recognition (ICPR’06), IEEE.

Zadeh, L.A. (1965) ‘Fuzzy sets’, Information and Control, Vol. 8, No. 3, pp.338–353.

Zhang, C., Gao, M. and Zhou, A. (2009) ‘Tracking high quality clusters over uncertain data streams’, Proceedings of the IEEE International Conference on Data Engineering, Vol. 1, pp.1641–1648.

Zhang, T., Ramakrishnan, R. and Livny, M. (1993) ‘BIRCH: an efficient data clustering method for very large databases’, Proc. ACM SIGMOD Conf. Management of Data, pp.103–114.
