International Symposium on Intelligent Information Technology Application Workshops

978-0-7695-3505-0/08 $25.00 © 2008 IEEE

DOI 10.1109/IITA.Workshops.2008.21

[IEEE 2008 International Symposium on Intelligent Information Technology Application Workshops (IITAW) - Shanghai, China (2008.12.21-2008.12.22)] 2008 International Symposium on Intelligent


Web Services Dynamic Discovery Based on Modified CLIQUE Algorithm

Lixin Shen (1), Yan Chen (1), Zhiguo Wang (1), Weihong Yu (1), Sen He (2), Shoudong Zhang (3)

(1) College of Transportation and Management, Dalian Maritime University, Dalian, Liaoning, 116026, China

(2) College of Computer and Information Management, Jilin Institute of Chemical Technology, Jilin, Jilin, 132022, China

(3) JiLin Bureau of Land and Resources, Jilin, Jilin, 132022, China

[email protected], [email protected]

Abstract

Web services are regarded as an effective means for organizations to establish more agile and versatile collaborations with one another. Dynamic service discovery, however, both burdens consumers and prevents them from enjoying high quality of service. A key issue in dynamic service discovery is how to select high-quality web services from among those with similar or identical functionality. This paper provides a novel approach based on a modified CLIQUE algorithm to discover desired web services. In this method, web services with similar or identical functions are classified by an index of service quality. The paper adopts the key performance index (KPI) as the index of service quality.

1. Introduction

As web services proliferate, there may be many candidate services that have similar or identical functionality but differ in quality of service. Dynamically discovering these services both burdens consumers and prevents them from enjoying high quality of service. Although various approaches address the discovery issue, this paper identifies two commonly used ones: keyword-based approaches and ontology-based reasoning approaches. The keyword-based mechanism is one of the dominant techniques for web service discovery and matching. However, because it relies on syntactic keyword (string) matching, its accuracy is low. Being based on term frequency analysis, it is insufficient in the majority of cases because it fails to consider the semantic concepts hidden behind the service descriptions. The ontology-based reasoning approach uses ontologies to annotate elements of web services, aiming not only to capture information on the structure and semantics of a domain but also to enable software agents to make inferences at the concept level. When selecting web services, defining their semantic information makes it possible to find appropriate web services and to compose them to meet the functional requirements [1-3]. This approach is a good solution at the semantic level, but it cannot select among web services that have similar or identical functions and differ in quality.

This paper provides a novel approach based on a modified CLIQUE algorithm to discover web services. The approach divides web services with similar or identical functions into candidate web service sets by an index of service quality. The paper adopts the key performance index (KPI) as the evaluation index of service quality.

2. Algorithm Analysis

Cluster analysis is an important approach in data mining. When selecting web services, cluster analysis must be carried out on a multidimensional database that holds a large volume of complex data. Compared with other cluster analysis algorithms, grid-based (mesh-based) algorithms are more efficient, because they operate on cells rather than on individual data points, working on the grid structure that partitions the data space. Typical algorithms include STING [4], STING+ [5], WaveCluster [6], CLIQUE [7] and GDILC [8]. CLIQUE is a subspace clustering algorithm for multidimensional data, which can find clusters in subspaces of the multidimensional data.

2.1. Description of the Modified Algorithm

The original algorithm uses a fixed ξ value. If ξ is too small, the cells are too large and the clustering results are inaccurate. Conversely, if ξ is too large, the cells are too small. As the dimension increases, the number of cells increases rapidly, so that the program may become intractable. One possible solution is to reduce the number of candidate cells when the subspace dimension is lower. In a low-dimensional space, ξ is relatively small and the cells are larger, so a cell should be considered dense only when it contains more data. As the dimension increases, the cells become smaller and smaller, so the threshold for judging a cell dense should be reduced accordingly. Based on these considerations, ξ should be an increasing function of the dimension and σ a decreasing function of the dimension. According to the experimental results, we set:

$\xi_i = 2^{(i-1)/2}, \quad \sigma_i = 0.32 - 0.02i, \quad i \ge 1$

This choice preserves accuracy while keeping the running speed of the program from being affected by the growth in the number of cells.

2.2. Modified CLIQUE Algorithm Flow

$D_k$ is the set of k-dimensional dense cells; $C_{k+1}$ is the set of candidate (k+1)-dimensional dense cells; HDC is the highest dimension level that contains clusters.

Input: starting subspace s; cluster dimension d.

Cfs(s, d)
{
    For k = 1 to n, k++
    {
        ξ_k = 2^((k-1)/2); σ_k = 0.32 - 0.02k;
        FindDenseCells(D_k);              // find D_k
        DoCandidateDenseCells(C_{k+1});   // calculate C_{k+1}
        If (IsNotNull(C_{k+1}))
        {
            FormDenseCells(D_{k+1});
        }
        Else
        {
            HDC = s;
            Do
            {
                Denote the subspace with an undirected graph;
                Traverse the connected components of the graph with DFS;
                Store the clusters found in this subspace;
                Do
                {
                    Form the description of the clusters;
                    Tag = whether clusters remain to be found;
                } While (Tag);
                Tag = true or false;
            } While (Tag);
        }
    }
}
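The level-wise part of this flow (find the dense cells, join them into candidates, keep the dense candidates) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that every record has already been mapped to sector indices on a fixed grid, that a cell is dense when the fraction of records it holds reaches σ_k = 0.32 - 0.02k, and it does not model the ξ schedule at all. The names `sigma`, `support` and `dense_units` are hypothetical. Because the threshold falls as k grows, building candidates only from dense lower-dimensional cells is a pruning heuristic rather than an exact search.

```python
from collections import Counter


def sigma(k):
    """Density threshold at dimension level k (schedule 0.32 - 0.02k from the text)."""
    return 0.32 - 0.02 * k


def support(unit, rows):
    """Fraction of rows that fall inside a unit, given as a set of (dim, sector) pairs."""
    hits = sum(all(row[d] == s for d, s in unit) for row in rows)
    return hits / len(rows)


def dense_units(rows):
    """Level-wise search: return {k: set of dense k-dimensional units}."""
    n_dims = len(rows[0])
    # Level 1: count every (dimension, sector) pair that occurs in the data.
    counts = Counter((d, row[d]) for row in rows for d in range(n_dims))
    current = {frozenset([cell]) for cell, c in counts.items()
               if c / len(rows) >= sigma(1)}
    levels = {}
    k = 1
    while current:
        levels[k] = current
        # Candidates: unions of two dense k-units that span k+1 distinct dimensions.
        candidates = set()
        for a in current:
            for b in current:
                u = a | b
                if len(u) == k + 1 and len({d for d, _ in u}) == k + 1:
                    candidates.add(u)
        k += 1
        # Keep only the candidates that are dense at the new level's threshold.
        current = {u for u in candidates if support(u, rows) >= sigma(k)}
    return levels


# Five records over two dimensions, already expressed as sector indices.
rows = [(1, 2), (1, 2), (1, 2), (1, 3), (0, 0)]
levels = dense_units(rows)
```

On this toy input the search stops at level 2, the highest level with a dense cell, which plays the role of HDC in the flow above.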

2.3. Complexity Analysis

Step 1: Acquire each maximal rectangular region R.

Let |R| denote the number of dense cells included in R. The greedy search takes $O(|R|)$ time. Suppose that S is the subspace in which R is located, k is the number of dimensions of S, and n is the number of dense cells in S. The algorithm must examine each cell to confirm whether R is part of the cluster. In addition, it must visit each neighboring cell of R to ensure that R is maximal. The number of neighboring cells is less than $2k|R|$. Because each new maximal region covers at least one dense cell that has not yet been covered, the greedy algorithm will find at most $O(n)$ new regions. Since $O(|R|) = O(n)$ cells must be visited for every region, the time needed is at most $O(n^2)$. Suppose that there are n dense cells and only one cluster in subspace S, bounded by two parallel hyperplanes and a cylinder. Because the hyperplanes are not parallel to any dimension, $O(n^{(k-1)/k})$ cells of the cluster touch a hyperplane. Each of these cells must be included in a maximal region, and the size of each such region is also $O(n^{(k-1)/k})$. Because each region must reach the other hyperplane, the greedy algorithm must visit $O(n^{2(k-1)/k})$ dense cells.

Step 2: Find the minimum coverage.

The regions must be sorted, at a cost of $O(n \log n)$. $|R_i|$ cells are visited in each region $R_i$, so the total time is $O(n^2)$.
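One way to read Step 2 is as a removal heuristic of the kind used in the original CLIQUE description: order the regions by size, then drop each region whose cells are all covered by the remaining regions. The sketch below is an illustration under that assumption, not the authors' code. The function name `minimal_cover` and the representation of a region as a set of cell coordinates are hypothetical.

```python
def minimal_cover(regions):
    """Greedy removal: drop the smallest regions whose cells are all covered elsewhere."""
    kept = sorted((set(r) for r in regions), key=len)  # ordering step, O(n log n)
    i = 0
    while i < len(kept):
        # Union of the cells of every region except the one under test.
        others = set().union(*(r for j, r in enumerate(kept) if j != i))
        if kept[i] <= others:
            kept.pop(i)   # redundant: every cell is covered by another region
        else:
            i += 1
    return kept


# Three overlapping regions over a 2 x 2 block of dense cells.
a = {(0, 0), (0, 1)}
b = {(0, 1), (1, 1)}
c = {(0, 0), (0, 1), (1, 0), (1, 1)}
cover = minimal_cover([a, b, c])
```

Here the two small regions are redundant because the large one already covers all of their cells, so only the large region remains in the cover.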

To sum up, the time complexity of the three steps of the algorithm is:

$O(C^k + mk) + O(n \log n) + O(n) + O(n^2) = O(C^k + mk) + O(n^2)$

Here k is the dimensionality of the highest-dimensional dense subspace, m is the number of records in the database, and n is the number of dense cells.

3. Key Performance Index (KPI)

Web services (WS) with an OWL-S description are mainly discovered by retrieving the input and output parameters of the service profile. In an actual business environment, users may not know which parameters are to be input and output; they know only the goal and principle of partner selection. In the commercial area, the KPI is used to evaluate partners [9-10]. Considering this, the paper adds a non-functional KPI description to the service profile.

Definition: WS = (Fun, KPIpub, KPIpri) [11]

Fun is the set of functions of the service. Because different industries (organizations) have different KPIs, the paper proposes two KPI sets: KPIpub and KPIpri. KPIpub is common, while KPIpri is particular to the industry. Either KPIpri or KPIpub may be the empty set. If KPIpub is empty while KPIpri is non-empty, the WS is heavily influenced by the user and the user's empirical values. Conversely, if KPIpub is non-empty while KPIpri is empty, the WS cannot reflect the user's aspirations. The paper therefore considers only web services for which both KPI sets are non-empty.
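The definition can be pictured as a small record type. The sketch below is only an illustration: the class name `WebService`, its field names, and the choice of dictionaries keyed by KPI label are assumptions, not part of the paper.

```python
from dataclasses import dataclass, field


@dataclass
class WebService:
    """WS = (Fun, KPIpub, KPIpri); the field names here are illustrative only."""
    fun: set
    kpi_pub: dict = field(default_factory=dict)   # common KPIs, e.g. P1..P10
    kpi_pri: dict = field(default_factory=dict)   # industry-specific KPIs, e.g. S1..S5

    def is_candidate(self):
        """True only when both KPI sets are non-empty, as the paper requires."""
        return bool(self.kpi_pub) and bool(self.kpi_pri)


ws = WebService({"transport"}, {"P1": 150, "P2": 65}, {"S1": 15})
pub_only = WebService({"transport"}, {"P1": 150})
```

With this representation, `ws.is_candidate()` holds, while `pub_only` is excluded because its KPIpri set is empty.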

4. Example Analysis

In order to verify the effectiveness of the approach, the paper applies the algorithm to a transport enterprise. The result of clustering is the sets of candidate web services with the same function but different quality of service. In the example, each dimension of the system is divided into ξ = 10 sectors. There are ξ × k = 160 cells in the one-dimensional subspaces. After traversing the database, we have 81 dense cells, so the occupancy of one-dimensional dense cells is about 50%. There would have been 24000 cells in the two-dimensional subspaces, but starting from the one-dimensional dense cells there are 3109 two-dimensional candidate cells, an occupancy of 12.95%. Finally, there are 28 dense cells of 16 dimensions, far fewer than $10^{16}$, the actual number of 16-dimensional cells. The time consumption of $2^{16}$ is greatly reduced, and it is roughly linear in the data volume.
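The counts in this paragraph can be checked with a few lines of arithmetic. In the sketch below, k = 16 is inferred from ξ × k = 160 with ξ = 10, and the figure 24000 is reproduced by counting ordered pairs of dimensions, k(k-1)ξ². That reading is an assumption, since the text does not say how the figure was obtained.

```python
xi, k = 10, 16                      # sectors per dimension, number of dimensions
cells_1d = xi * k                   # one-dimensional cells
occupancy_1d = 81 / cells_1d        # 81 dense one-dimensional cells were found
cells_2d = k * (k - 1) * xi ** 2    # two-dimensional cells, ordered dimension pairs
occupancy_2d = 3109 / cells_2d      # share of cells that remain candidates
cells_16d = xi ** k                 # cells in the full 16-dimensional space

print(cells_1d, cells_2d, cells_16d)
print(f"{occupancy_1d:.1%} {occupancy_2d:.2%}")
```

The computed occupancies agree with the "about 50%" and 12.95% quoted in the text.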

Table 1. Logistic Industry KPIpri [11]

No   KPIpri Name
S1   mean of transportation time (day)
S2   ability of key process (CPk)
S3   mean square deviation of transportation time (day)
S4   assurance of transportation quality (1-5)
S5   minimum of delivery amount (each)

Table 2. Transport Industry KPIpub [11]

No   KPIpub Name
P1   total cost per unit (yuan/each)
P2   market scale (%)
P3   competitive environment (1-5)
P4   strategic alignment (1-5)
P5   evaluation result
P6   reaction rate of customer requirement (%)
P7   reaction time of customer requirement (day)
P8   expected value of delivery amount (each)
P9   market structure (Gini coefficient)
P10  quality system (1-5)

Table 3. Part of KPIpub Source Data

No   P1   P2  P3  P4  P5    P6   P7  P8    P9    P10
1    150  65  4   4   0.8   90   30  2000  0.65  4
2    50   65  5   4   0.88  90   30  2000  0.65  4
3    150  65  5   3   0.95  90   30  1000  0.65  4
4    145  65  5   4   0.85  90   30  2000  0.65  4
5    130  70  2   3   1     90   30  200   0.85  5
6    130  70  4   3   0.45  100  15  2000  0.85  3
7    130  70  4   2   1     100  15  200   0.85  5
8    130  70  4   5   0.7   100  15  200   0.85  5
9    130  70  4   5   0.75  100  35  200   0.85  5
10   130  70  4   5   0.8   100  15  200   0.85  5
...

Table 4. Part of KPIpri Source Data

No   S1  S2    S3  S4  S5
1    15  1.33  5   3   400
2    15  1.67  5   4   400
3    15  1.67  3   4   200
4    15  1.33  5   3   400
5    10  1.67  1   5   50
6    10  1.67  1   5   50
7    10  1.67  1   5   50
8    10  1.67  1   5   50
9    30  1.67  1   5   50
10   10  1.67  5   5   50
...

Table 5. The interval from top to bottom sector (ξ=10)

No  P1   P2    P3   P4   P5    P6   P7  P8    P9    P10  S1    S2   S3  S4   S5
1   0    0.1   1    2    0.01  60   15  200   0.1   2    10    0    0   0    50
2   20   10.1  1.5  2.4  0.11  65   20  700   0.19  2.4  12.5  0.2  1   0.6  250
3   40   20.1  2    2.8  0.21  70   25  1200  0.28  2.8  15    0.4  2   1.2  450
4   60   30.1  2.5  3.2  0.31  75   30  1700  0.37  3.2  17.5  0.6  3   1.8  650
5   80   40.1  3    3.6  0.41  80   35  2200  0.46  3.6  20    0.8  4   2.4  850
6   100  50.1  3.5  4    0.51  85   40  2700  0.55  4    22.5  1    5   3    1050
7   120  60.1  4    4.4  0.61  90   45  3200  0.64  4.4  25    1.2  6   3.6  1250
8   140  70.1  4.5  4.8  0.71  95   50  3700  0.73  4.8  27.5  1.4  7   4.2  1450
9   160  80.1  5    5.2  0.81  100  55  4200  0.82  5.2  30    1.6  8   4.8  1650
10  180  90.1  5.5  5.6  0.91  105  60  4700  0.91  5.6  32.5  1.8  9   5.4  1850
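Mapping a raw KPI value to its sector can be sketched with a binary search over the sector boundaries. The sketch assumes that each entry of Table 5 is the lower bound of its sector and that sectors are numbered from 1 as in the "No" column. Both are readings of the table rather than statements from the paper, and the helper name `sector` is hypothetical.

```python
from bisect import bisect_right

# Lower bounds of the ten P1 sectors, as read from the P1 column of Table 5.
P1_LOWER_BOUNDS = [0, 20, 40, 60, 80, 100, 120, 140, 160, 180]


def sector(value, lower_bounds):
    """Return the 1-based number of the sector whose lower bound is the largest one <= value."""
    return bisect_right(lower_bounds, value)


# P1 values taken from the first rows of Table 3.
p1_sectors = [sector(v, P1_LOWER_BOUNDS) for v in (150, 50, 145, 130)]
```

A value below the first lower bound maps to 0 under this convention, which a caller would have to treat as out of range.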

Table 6. The result of clustering

Cluster  P1   P2   P3  P4  P5   P8  P9   S2  S5
1        6-8  6    8   7   6-9  0   8    8   0
1        7    6    6   5   9    4   6    8   0
1        7    6    6   5   7-8  9   6    6   1
2        8    8-9  0   0   0    9   8-9  3   9
2        6    8    6   0   0    0   8    3   0
3        7    6    8   7   1    0   8    8   0
4        6    6    8   2   3    0   8    3   0
5        8    6    8   7   5    0   8    8   0
6        7    0    2   0   3-4  9   0    8   3-4

5. Conclusion

Based on an analysis of the problem of web services that have the same or similar functions but different quality of service, the paper introduces a non-functional KPI description into the service profile. This brings web service selection closer to actual business operation. A modified CLIQUE algorithm, which changes how the subspace is divided, is used to discover the web services. The method divides web services with similar or identical functions into candidate web service sets by rank of service quality. The method has been verified in a virtual logistics system. Our future work is the dynamic composition of web services and the ontology of KPI.

References

[1] Kunal Verma, et al. METEOR-S WSDI: A Scalable P2P Infrastructure of Registries for Semantic Publication and Discovery of Web Services. http://lsdis.cs.uga.edu/lib/download/VSS+03-TM06-003-METEOR-E-WSDI.pdf

[2] Jinghai Rao, Xiaomeng Su. A Survey of Automated Web Service Composition Methods. http://www.cs.cmu.edu/~jinghai/papers/survey_rao.pdf

[3] Massimo Paolucci, et al. Semantic Matching of Web Services Capabilities. http://www.cs.cmu.edu/%7Esoftagent/papers/ISWC2002.pdf

[4] W. Wang, J. Yang, R. Muntz (1997). STING: A Statistical Information Grid Approach to Spatial Data Mining. Proceedings of the 23rd VLDB Conference, Athens, Greece.

[5] W. Wang, J. Yang, R. Muntz (1999). STING+: An Approach to Active Spatial Data Mining. Proceedings of the 15th International Conference on Data Engineering, pp. 116-125.

[6] G. Sheikholeslami, S. Chatterjee, A. Zhang (2000). WaveCluster: A Wavelet-Based Clustering Approach for Spatial Data in Very Large Databases. VLDB J. No. 8, pp. 289-304.

[7] R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan (1998). Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. In Proc. of the 1998 ACM-SIGMOD Conf. on the Management of Data, pp. 94-105.

[8] G. Karypis, E. H. Han, V. Kumar (1999). CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling. Computer, Vol. 32, No. 8, pp. 68-75.

[9] KPI Project Management Group, chaired by Keith Folwell (2004). Construction Industry Key Performance Index. Constructing Excellence, London, UK.

[10] Yinhua Gu, Huiqi Wang, Yaqian Zhang (2008). Reference Review and the Study of KPI. Neijiang Technology, pp. 26-27.

[11] Linxin Shen, Zhiguo Wang, Jun Zhai (2008). Research on the Key Issue of Logistics System Integration in the Web. The 2008 International Conference on Information Management, Innovation Management and Industrial Engineering, 2008.
