[IEEE 2013 International Conference on Cloud & Ubiquitous Computing & Emerging Technologies (CUBE) - Pune, India (2013.11.15-2013.11.16)] 2013 International Conference on Cloud & Ubiquitous


Web Service Discovery based on Semantic Description

S. Naveen kumar¹, Dr. P. Pabitha², A.K. Mansoor Ahamed³

Department of Computer Technology, MIT Campus, Anna University, Chennai 600044, India 1. [email protected]

2. [email protected] 3. [email protected]

Abstract—Web services play an important role in fields such as e-commerce and e-business platforms. Their rapid development and adoption stem from their interoperability and portability: they support machine-to-machine interaction over a network and allow machines running on different platforms to interoperate without any additional processing. Given the enormous number of web services developed and published, finding the most relevant web service for a user’s need is crucial. For a long time, the traditional approach used for discovering relevant web services has been the keyword-based approach. Other approaches are semantic-based or syntax-based. However, many web services exist without proper semantic descriptions, so many web services that are highly relevant to a user request remain undiscovered during the web service discovery process. The proposed approach concentrates on web service matching based on the semantic descriptions of the web services registered in the Universal Description Discovery and Integration (UDDI) registry.

Index Terms—Keyword-based approach; Semantic-based approach; Semantic description; UDDI; Web Service; Web Service Discovery

I. INTRODUCTION

A vast number of web services are developed and published over the network, with or without a proper explicit description. As a result, a service requestor must spend a lot of time finding the web services appropriate to their needs. A web service is a piece of code that may be created by anybody; once it is exposed over the network, its functionality can be used by anyone on that network. People access web services for different purposes, such as personal and business use, and notably in the military for transferring organizational messages in encrypted form. Hence, a web service must be selected carefully and must be appropriate for the service requestor.

Universal Description Discovery and Integration (UDDI) is a public, distributed, XML-based registry in which descriptions of web services are stored so that others can access the functionality of those web services. Registering a web service is called advertising. Service providers publish web services in the UDDI registry in their own manner, and this diversity makes it difficult to find web services based on their functionality. Semantic web technology is a promising approach for automated web service discovery. Various discovery approaches exist for semantic web services that carry semantically tagged descriptions (e.g. OWL-S, WSDL-S), but these approaches have several limitations, listed below:

(I) It is impractical to expect all new services to have semantically tagged descriptions.

(II) The descriptions of existing web services are based only on WSDL and do not have any associated semantic meaning.

It is not realistic to expect the service requestor to have knowledge of all domains. Specifically, the service requestor may not be aware of all the terms related to the service request, so many services relevant to the request may not be considered in the service discovery process. To simplify the process of finding an appropriate web service, UDDI introduced predefined categories under which web services providing similar functionality are grouped. For example, web services providing e-commerce functionality are arranged under one predefined category. Because these categories reflect the classifications chosen by service providers, there is a need to categorize web services based on their functional semantics instead.

II. SEMANTIC WEB

Web services make it possible to expose applications as web applications. A web service is a combination of a program module and data. It is a software system designed to support machine-to-machine interaction over the

2013 International Conference on Cloud & Ubiquitous Computing & Emerging Technologies

978-0-4799-2235-2/13 $26.00 © 2013 IEEE

DOI 10.1109/CUBE.2013.44

network. No separate software is required on the client side, but the server side requires an HTTP server and a SOAP server. A WSDL file is enough for a service provider to publish a web service and for a service requestor to invoke it. Web services are platform independent and language independent, and hence portable. Because all communication is in XML, web services allow applications from different sources to communicate with each other without time-consuming custom coding. The conventional World Wide Web is not intelligent: processing a user request requires several human interventions, because each term in the request has to be interpreted with its contextual meaning. For example, Web 2.0 does not know that the words ‘heat’ and ‘temperature’ are closely related in meaning; it simply returns services based on keyword matching. To address this, Tim Berners-Lee, the founder of the World Wide Web Consortium (W3C), introduced the semantic web.

Figure 1. Web Service Architecture

The idea behind the semantic web is to enable machines to reason about information on the network. The Semantic Web is a mesh of information linked up in such a way as to be easily processable by machines on a global scale. It can be thought of as an efficient way of representing data on the World Wide Web, or as a globally linked database. Automated reasoning over it faces several challenges. The first is vastness: the World Wide Web contains many billions of pages, existing technology has not yet been able to eliminate all semantically duplicated terms, and any automated reasoning system will have to deal with truly huge inputs. The second is vagueness: imprecise concepts such as “temperature” or “heat” arise from the vagueness of user queries, of concepts represented by content providers, of matching query terms to provider terms, and of trying to combine different knowledge bases with overlapping but subtly different concepts. The third is uncertainty: precise concepts may have uncertain values. For example, a patient might present a set of symptoms that correspond to a number of distinct diagnoses, each with a different probability. Probabilistic reasoning techniques are generally employed to address uncertainty.

III. WEB SERVICE DISCOVERY

In Web 2.0, the user request is matched by keyword-based matching on the keywords extracted from the request. However, keyword-based matching returns a large number of services, which is a challenge because the user needs to find the exact services that satisfy the request. To address this problem, semantic-based matching was introduced, in which semantic information about a web service is added to the WSDL file of that web service [10]. By using the WSDL file along with the semantic description, the web services that are more relevant to the user request are found and returned to the service requestor.
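The difference between the two matching styles can be illustrated with a minimal sketch. The registry contents, the `RELATED_TERMS` table (a hand-made stand-in for an ontology), and both function names are assumptions made for illustration; they are not part of any UDDI or WSDL API.

```python
# Hypothetical registry: service name -> short textual description.
SERVICES = {
    "WeatherInfo": "returns the current temperature for a city",
    "HeatIndexCalc": "computes heat index from humidity",
    "BookStore": "search and order books online",
}

# Hypothetical related-term table standing in for an ontology lookup.
RELATED_TERMS = {
    "heat": {"heat", "temperature"},
    "temperature": {"temperature", "heat"},
}


def keyword_match(query, services):
    """Return services whose description contains every query word literally."""
    words = query.lower().split()
    return sorted(
        name for name, desc in services.items()
        if all(w in desc.lower().split() for w in words)
    )


def semantic_match(query, services, related):
    """Like keyword_match, but a query word also matches any related term."""
    words = query.lower().split()
    hits = []
    for name, desc in services.items():
        tokens = set(desc.lower().split())
        # A word with no entry in the table only matches itself.
        if all(tokens & related.get(w, {w}) for w in words):
            hits.append(name)
    return sorted(hits)
```

With the query `heat`, the literal matcher finds only `HeatIndexCalc`, while the related-term matcher also returns `WeatherInfo`, whose description mentions `temperature`.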

Another problem with web service discovery is that UDDI, the registry in which web services are registered, holds the WSDL documents of a large number of web services. Discovering a web service among them based on a user request is a time-consuming process. The access time can be reduced by introducing efficient clustering and matchmaking algorithms. The clustering algorithm chosen should form clusters based on the functionality of the web services. For example, web services that provide weather forecasting information are grouped into a single cluster [1]. In this way, the domain of the web service matching the user request can be found with minimal access time.

Figure 2. Web Service Discovery Process

After the set of web services from the specific domain has been retrieved based on the user request, only the most relevant web services need to be returned. This can be achieved with an efficient matchmaking algorithm that matches the user request against the web service parameters to find the web services that best satisfy the service requestor. By using machine learning techniques, efficient web service matching based on semantic information and ontological representations can be achieved.

Machine learning techniques allow complex computations on vast amounts of data and infer interesting underlying patterns or trends within the data, which can then be used for classifying or grouping future data similar to the existing data. In other words, machines can be trained to react to unknown future requests. The matching problem above can be formulated as a classification problem. Case-based reasoning was found to be better suited to this task, as it compares fragments of the given input with a collection of previous inputs.
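As a rough illustration of case-based reasoning applied to this classification problem, the sketch below stores past requests together with the domain that satisfied them and reuses the domain of the most similar stored case. The `CaseBase` class, the choice of Jaccard similarity, and the sample cases are all assumptions made for this example; the paper does not specify a similarity measure.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class CaseBase:
    """Hypothetical store of (request terms, domain) cases."""

    def __init__(self):
        self.cases = []

    def retain(self, terms, domain):
        """Remember a solved request for later reuse."""
        self.cases.append((frozenset(terms), domain))

    def retrieve(self, terms):
        """Return (domain, similarity) of the most similar stored case."""
        if not self.cases:
            return None, 0.0
        best_terms, best_domain = max(
            self.cases, key=lambda case: jaccard(terms, case[0])
        )
        return best_domain, jaccard(terms, best_terms)


case_base = CaseBase()
case_base.retain({"weather", "forecast", "city"}, "weather")
case_base.retain({"book", "order", "price"}, "e-commerce")
```

A new request with the terms `forecast`, `city`, and `rain` shares two of four distinct terms with the first case, so it is assigned to the `weather` domain with similarity 0.5.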

IV. RELATED WORK

Lamiaa Mostafa et al. [9] propose an approach for keyword extraction in which the text, once loaded by the Loader, passes through three main phases: the Parser, the Stopword remover, and the Stemmer. In the first phase, the parser breaks larger units of data into smaller units called tokens, producing a list of words. In the second phase, the stopword remover finds and removes words that carry less meaning than the keywords, leaving the relevant words for the stemmer to work on. In the third phase, stemming reduces derived words to their stems. Finally, the words with the highest frequency are selected as keywords.
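The three phases can be sketched as a small pipeline. This is only an illustration under stated assumptions: the stopword list is a tiny hypothetical sample, and `stem` is a naive suffix stripper written for this example, not the stemmer used in [9].

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "for", "to", "is", "are", "in", "on"}


def parse(text):
    """Phase 1: break the text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Phase 2: drop words that carry little meaning."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    """Phase 3: naive suffix stripping (an assumption, not a real stemmer)."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def extract_keywords(text, top_n=3):
    """Return the top_n most frequent stems as keywords."""
    stems = [stem(t) for t in remove_stopwords(parse(text))]
    return [word for word, _ in Counter(stems).most_common(top_n)]
```

For a text such as "The service forecasts weather. Weather forecasting services are forecasting the weather.", the two most frequent stems are `forecast` and `weather`.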

An approach for automatic text document categorization is proposed by Antonie and Zaïane [2], in which a classifier is built from a training set given as input. The training set is first preprocessed, which involves stopword removal and word stemming, with terms assessed by their TF/IDF (Term Frequency/Inverse Document Frequency) values. After preprocessing, the association-based classifier is built. The associations between terms are identified using the Apriori algorithm, which finds the patterns with very strong association. Association rules are then constructed from these patterns, and documents are classified under their class labels according to the rules.
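The frequent-pattern step can be illustrated with a compact Apriori sketch. It covers only the mining of frequent term sets, not rule construction or the classifier of [2]; the function name, the use of an absolute support count, and the sample documents are assumptions for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets seen in at least
    min_support transactions, found level by level."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemset candidates ...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... and prune any candidate that has an infrequent (k-1)-subset.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical documents, each reduced to its set of terms.
DOCS = [
    {"web", "service", "discovery"},
    {"web", "service"},
    {"web", "ontology"},
    {"service", "discovery"},
]
```

With a minimum support of 2, the term pair `web` and `service` is frequent (it occurs in two documents), whereas `web` and `discovery` is not, so the three-term set is pruned without being counted.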

Zhou et al. [8] propose clustering Data Providing (DP) services using a refined fuzzy C-means algorithm. A DP service vector is assigned to one or multiple clusters with certain degrees of membership. By grouping similar services into one cluster and partitioning different services into different clusters, the capability of a service search engine is improved significantly. DP services are described based on a domain ontology DO = (C, D, TP, OP, SC, SP). The clustering of DP services is performed in three steps: (1) represent DP services as vectors, (2) compute the distance between two DP service vectors as required by the fuzzy C-means algorithm, and (3) apply the fuzzy clustering method to group DP service vectors into clusters.
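Steps (2) and (3) can be illustrated with the membership formula of standard fuzzy C-means. This is not the refined variant of [8]; the Euclidean distance, the fuzzifier value `m = 2`, and the function name are assumptions made for this sketch.

```python
import math


def memberships(point, centers, m=2.0):
    """Degree of membership of `point` in each cluster (standard fuzzy
    C-means formula). Requires m > 1; the degrees sum to 1."""
    dists = [math.dist(point, center) for center in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]
```

For example, a service vector at `(1, 0)` with cluster centers at `(0, 0)` and `(4, 0)` lies at distances 1 and 3, which gives membership degrees of 0.9 and 0.1. A full clustering run would alternate this step with recomputing the centers until the memberships stabilize.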

Samah Fodeh et al. [5] examine the importance of polysemous and synonymous nouns in clustering and develop an approach for measuring the information gain from disambiguating nouns in an unsupervised learning setting. The goal of this work is to show a different utility for word disambiguation: feature reduction that maintains, and even improves, clustering quality, together with identification of the theme of a document from the selected features.

Stefan Dietze et al. [4] proposed a semantic matching algorithm that matches web services to the user request by improving the semantic distance between terms. Semantic-distance-based algorithms consider only four matching degrees, ranked as (1) exact, (2) plugin, (3) subsumes, and (4) fail. They do not consider other factors such as binary relations, similar relations, and false positives. The improved matching algorithm calculates semantic distance by assigning a weight to the relationship between each pair of concepts, which overcomes these drawbacks. It is based on four relations: (1) generalization, (2) specialization, (3) binary relation, and (4) similar relation.
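The four matching degrees can be illustrated on a toy concept hierarchy. The `PARENT` table and the exact conditions for each degree follow one common convention and are assumptions made for this sketch; they are not taken from [4].

```python
# Hypothetical concept hierarchy: concept -> parent concept (None for a root).
PARENT = {
    "Sedan": "Car",
    "Car": "Vehicle",
    "Truck": "Vehicle",
    "Vehicle": None,
    "Book": None,
}


def ancestors(concept):
    """All concepts that are more general than `concept`."""
    found = []
    node = PARENT.get(concept)
    while node is not None:
        found.append(node)
        node = PARENT.get(node)
    return found


def match_degree(requested, advertised):
    """Classify how an advertised concept relates to a requested one."""
    if requested == advertised:
        return "exact"
    if advertised in ancestors(requested):
        return "plugin"    # the advertised concept is more general
    if requested in ancestors(advertised):
        return "subsumes"  # the requested concept is more general
    return "fail"
```

Under this convention, a request for `Sedan` matched against an advertisement for `Vehicle` is a plugin match, and a request for `Car` against `Book` fails. Because sibling concepts such as `Car` and `Truck` also fall under fail, a scheme restricted to these four degrees cannot express the similar relations mentioned above.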

Hui Guo et al. [6] presented a technique that is reported to significantly improve the quality of web service matching by (1) automatically generating ontologies based on web service descriptions and (2) using these ontologies to guide the mapping between web services. This approach differs from earlier service matching approaches by considering the relationships between words rather than treating them as a bag of unrelated words [6]. The main idea of this ontology-learning approach is to capture the relationships between the words contained in a tag, and to match tags if the words are similar (a dictionary-based approach) and the relationships are equivalent. The first step is to capture the relationships between the words in a tag and save them in an ontology [6]. The next step is to find the tag matches given the matches between concepts at the ontology level.

S. Colucci et al. [14] proposed an approach for semantic matching of web services based on Description Logics. The approach categorizes matches as exact match, potential match (when request and offer, though not identical, are compatible), and partial match (when one or more inconsistencies are present), and ranks matches within each category. A matchmaking infrastructure should receive and store advertisement descriptions from both demanders and suppliers and, as new demands or supplies are submitted dynamically, find the most satisfying matches and return them. Knowledge representation, and Description Logics (DL) in particular, can provide a uniform treatment of knowledge from suppliers and demanders by modeling both as generic concepts to be matched.

Aviv Segev et al. [11] proposed an ontology bootstrapping process for web services. Ontology bootstrapping, which aims at automatically generating concepts and their relations in a given domain, is a promising technique for ontology construction. The proposed technique uses two methods: Term Frequency/Inverse Document Frequency (TF/IDF) and web content extraction. The bootstrapping process integrates the results of both methods and applies a third method that validates the concepts using the service free-text descriptor, thereby offering a more accurate definition of ontologies. TF/IDF is a common mechanism in information retrieval (IR) for generating a robust set of representative keywords from a corpus of documents.
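As a brief illustration of TF/IDF weighting, the sketch below scores each term by its relative frequency within a document multiplied by the logarithm of the inverse document frequency. This is one common formulation among several and is not necessarily the exact variant used in [11]; the function name and the sample corpus are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per
    document, using tf = count / doc length and idf = ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical corpus of tokenized service descriptions.
CORPUS = [
    ["weather", "forecast", "service"],
    ["book", "order", "service"],
    ["weather", "alert", "service"],
]
```

In this corpus the term `service` appears in every description and therefore receives a weight of zero, while `forecast`, which appears in only one description, outweighs `weather`, which appears in two. This is the property that makes TF/IDF useful for picking representative keywords.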

Michael C. Jaeger et al. [3] propose an algorithm that ranks the matching degree of service descriptions expressed in OWL-S. Different matching degrees are obtained based on the contravariance of the input and output types of the requested and advertised services. The approach includes user-customizable modules to enhance the matching algorithm, a design decision made in favor of a client-side scenario. Of the three basic elements of OWL-S, the authors concentrate on the service profile. The matching algorithm is divided into four stages. The first three are (a) the matching of inputs, (b) the matching of outputs, and (c) the matching of the service category, and the algorithm determines the match for each of these stages individually. The results are aggregated in a fourth stage (d), where user-defined constraints or functionality can complete the matching result. The simplest approach is to add the ranks of the first three stages and complete the sum with the result of the user-defined module(s).
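The simplest aggregation described above can be sketched as follows. The numeric scores in `DEGREE_SCORE`, the dictionary layout of an advertisement, and the `prefers_free` module are hypothetical choices made for illustration; they are not the rank values or interfaces defined in [3].

```python
# Hypothetical numeric rank for each per-stage matching degree.
DEGREE_SCORE = {"exact": 3, "plugin": 2, "subsumes": 1, "fail": 0}


def rank_advertisement(advert, user_modules=()):
    """Stage (d): add the ranks of stages (a)-(c), then add the result of
    each user-defined module."""
    stage_sum = sum(
        DEGREE_SCORE[advert[stage]]
        for stage in ("inputs", "outputs", "category")
    )
    return stage_sum + sum(module(advert) for module in user_modules)


def prefers_free(advert):
    """Example user-defined module: reward services that cost nothing."""
    return 1 if advert.get("price", 0) == 0 else 0


advert_a = {"inputs": "exact", "outputs": "plugin", "category": "exact", "price": 0}
advert_b = {"inputs": "subsumes", "outputs": "fail", "category": "exact", "price": 5}
```

Here `advert_a` scores 8 from the three stages plus 1 from the user module, while `advert_b` scores 4, so `advert_a` is ranked first.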

Yanan Hao et al. [12] propose a web service discovery strategy that, given a textual description of services, uses a new schema matching algorithm to support the matching of web service operations. The matching algorithm captures not only the structure but also the semantic information of schemas. The key part is a schema tree matching algorithm, which employs a new cost model to compute tree edit distances. An XML schema can be modeled as a tree of labeled nodes, and tree edit distance is an effective way to describe the difference between two trees. Generally, the tree edit operations include (a) node removal, (b) node insertion, and (c) node relabeling. A sequence of such operations can be represented by a mapping between the two trees, so measuring the similarity between two XML schema trees amounts to finding a mapping with minimum cost. The cost of each edit operation involved in the mapping therefore needs to be computed first.
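To make the three edit operations concrete, the sketch below computes the edit distance between two ordered labeled trees with the classic forest recursion and memoization. It uses unit costs by default, not the cost model of [12], and the tuple-based tree representation and sample schemas are assumptions made for this example.

```python
from functools import lru_cache


def tree_edit_distance(t1, t2, cost_del=1, cost_ins=1, cost_rel=1):
    """Edit distance between two ordered labeled trees.
    A tree is a tuple (label, children), where children is a tuple of trees."""

    @lru_cache(maxsize=None)
    def forest_dist(f, g):
        if not f and not g:
            return 0
        if not g:
            # Remove the rightmost root of f; its children take its place.
            _, kids = f[-1]
            return forest_dist(f[:-1] + kids, g) + cost_del
        if not f:
            # Insert the rightmost root of g.
            _, kids = g[-1]
            return forest_dist(f, g[:-1] + kids) + cost_ins
        (label_f, kids_f), (label_g, kids_g) = f[-1], g[-1]
        return min(
            forest_dist(f[:-1] + kids_f, g) + cost_del,   # node removal
            forest_dist(f, g[:-1] + kids_g) + cost_ins,   # node insertion
            forest_dist(kids_f, kids_g)                   # map root to root
            + forest_dist(f[:-1], g[:-1])
            + (0 if label_f == label_g else cost_rel),    # node relabeling
        )

    return forest_dist((t1,), (t2,))


# Two hypothetical schema trees.
SCHEMA_A = ("order", (("id", ()), ("item", ())))
SCHEMA_B = ("order", (("id", ()), ("product", ()), ("price", ())))
```

Turning `SCHEMA_A` into `SCHEMA_B` takes one relabeling (`item` to `product`) and one insertion (`price`), so the distance is 2 under unit costs. A semantic cost model could, for instance, charge less for relabeling between related terms.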

Marco Luca Sbodio et al. [7] propose a web service discovery mechanism that uses SPARQL as the formal query language for describing and matching the preconditions and postconditions of a web service, along with its goals and the goals of the service requestor agents. The queries may be evaluated to check whether the preconditions hold and to analyze the postconditions that arise on execution of the web service, thereby determining whether the given web service matches the user’s goal and should be discovered for the user agents. The authors also discuss optimization of these tasks to use the available resources efficiently.

V. PROPOSED SYSTEM

The main objective of the proposed framework is to enhance web service matching so as to discover the relevant web services requested by the user. This can be achieved by using the semantic descriptions of the web services. The semantic information is extracted from the service description documents registered in the UDDI registry. Keyword-based discovery does not return exactly the web services requested by the user; the proposed idea instead categorizes the UDDI registry based on service functionality in offline mode [1], which reduces the overhead of the matchmaking process. The system also proposes to improve the matchmaking process by offering the user suggestions whenever the domain is unidentifiable. Using these suggestions, the matchmaking algorithm will be able to return at least partially matched web services to satisfy the service requestor.

Figure 3. Overview of the Proposed Approach

The proposed system concentrates on web service matching based on the semantic descriptions of the web services registered in the UDDI registry. Web services on the network are categorized by functionality using their own semantic descriptions, and this categorization is performed offline at the UDDI. Better matching with relevant web services is achieved by enhancing the service request, and this enhancement is carried out with the help of ontological descriptions.

As the initialization step, the user request for a web service is received via the user interface, and the terms required for matching web services through their semantic descriptions are extracted from the request. A term parser extracts terms from the service description document of each web service and from the ontology database for web service matchmaking. Using these extracted terms, clustering is performed for categorization: the registry in which service providers advertise their web services is categorized by web service functionality using a clustering algorithm. The matchmaking algorithm then matches the terms from the user request against the ontologies attached to each cluster, and semantic ranking is performed between the terms so that highly ranked web services are returned to the user. This conceptual framework, which is yet to be implemented, proposes to achieve these goals by applying a machine learning technique [13] called case-based learning, which produces new solutions based on the solutions of previously handled problems.
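Since the framework is conceptual, the following is only a minimal sketch of how request enhancement, cluster selection, and ranking might fit together. The `ONTOLOGY` and `CLUSTERS` tables, the overlap-based scoring, and all function names are assumptions made for illustration and do not represent an implementation by the authors.

```python
# Hypothetical ontology: term -> related terms used to enhance a request.
ONTOLOGY = {"heat": {"temperature"}, "forecast": {"prediction"}}

# Hypothetical offline categorization: cluster -> {service: description terms}.
CLUSTERS = {
    "weather": {
        "CityTemperature": {"temperature", "city"},
        "RainForecast": {"rain", "forecast", "city"},
    },
    "e-commerce": {
        "BookOrder": {"book", "order", "price"},
    },
}


def enhance_request(terms, ontology):
    """Add ontologically related terms to the request terms."""
    expanded = set(terms)
    for term in terms:
        expanded |= ontology.get(term, set())
    return expanded


def select_cluster(terms, clusters):
    """Pick the cluster whose vocabulary overlaps the request most,
    or None when no cluster shares any term (domain unidentifiable)."""
    def overlap(name):
        vocabulary = set().union(*clusters[name].values())
        return len(terms & vocabulary)

    best = max(clusters, key=overlap)
    return best if overlap(best) > 0 else None


def rank_services(terms, services):
    """Order services by the number of terms shared with the request."""
    scored = [(len(terms & svc_terms), name) for name, svc_terms in services.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ordered if score > 0]


def discover(request_terms):
    """Enhance the request, select a cluster, and rank its services."""
    terms = enhance_request(request_terms, ONTOLOGY)
    cluster = select_cluster(terms, CLUSTERS)
    if cluster is None:
        return None, []
    return cluster, rank_services(terms, CLUSTERS[cluster])
```

A request with the terms `heat` and `city` is enhanced with `temperature`, falls into the `weather` cluster, and ranks `CityTemperature` above `RainForecast`. A request that matches no cluster returns `None`, which is the point at which the framework would offer suggestions to the user.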

VI. REFERENCES

[1] Aabhas V. Paliwal, Basit Shafiq, Jaideep Vaidya, “Semantics-Based Automated Service Discovery”, IEEE Transactions on Services Computing, Vol. 5, No. 2, pp. 260-275, 2012.

[2] M.-L. Antonie and O.R. Zaïane, “Text Document Categorization by Term Association”, Proc. IEEE International Conf. Data Mining (ICDM ’02), pp. 19-26, 2002.

[3] Jaeger, Michael C., Gregor Rojec-Goldmann, Christoph Liebetruth, Gero Mühl, and Kurt Geihs, “Ranked matching for service descriptions using OWL-S”, in Kommunikation in Verteilten Systemen (KiVS), pp. 91-102, Springer Berlin Heidelberg, 2005.

[4] Dietze, Stefan, Alessio Gugliotta, and John Domingue, “Exploiting metrics for similarity-based semantic web service discovery”, in Proc. of IEEE International Conference on Web Services (ICWS), pp. 327-334, 2009.

[5] Samah Fodeh, Bill Punch, “On ontology-driven document clustering using core semantic features”, Springer-Verlag London Limited, 2011.

[6] Hui Guo, Anca Ivan, Rama Akkiraju, Richard, “Learning Ontologies to Improve the Quality of Automatic Web Service Matching”, IEEE International Conference on Web Services, 2007.

[7] Sbodio, Marco Luca, David Martin, and Claude Moulin, “Discovering Semantic Web services using SPARQL and intelligent agents”, Web Semantics: Science, Services and Agents on the World Wide Web, Vol. 8, No. 4, pp. 310-328, 2010.

[8] Zhou, Sellami, Walid, “Clustering and Managing Data Providing Services Using Machine Learning Technique”, 7th International Conference on Semantics, Knowledge and Grids, 2011.

[9] Lamiaa Mostafa, “Web Page Keyword Extraction Using Term Extraction”, International Journal of Computer Theory and Engineering, Vol. 5, No. 1, February 2013.

[10] Junghans, M., Agarwal, S., and Studer, R., “Towards practical semantic web service discovery”, The Semantic Web: Research and Applications, Springer Berlin Heidelberg, pp. 15-29, 2010.

[11] Segev, Aviv, and Quan Z. Sheng, “Bootstrapping ontologies for web services”, IEEE Transactions on Services Computing, Vol. 5, No. 1, pp. 33-44, 2012.

[12] Hao, Yanan, and Yanchun Zhang, “Web services discovery based on schema matching”, in Proc. of the Thirtieth Australasian Conference on Computer Science, Vol. 62, pp. 107-113, Australian Computer Society, Inc., 2007.

[13] Feifan Liu, Deana Pennell, “Unsupervised Approaches for Automatic Keyword Extraction Using Meeting Transcripts”, Annual Conference of the North American Chapter of the ACL, pp. 620-628, Boulder, Colorado, June 2009.

[14] Colucci, Simona, Tommaso Di Noia, Eugenio Di Sciascio, F. Donini, and Marina Mongiello, “Logic Based Approach to web services discovery and matchmaking”, in Proc. of the E-Services Workshop at ICEC’03, 2003.
