
1939-1374 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSC.2014.2300499, IEEE Transactions on Services Computing


Designing High Performance Web-Based Computing Services to Promote Telemedicine Database Management System

Ismail Hababeh1, Issa Khalil2, and Abdallah Khreishah3

1: Computer Engineering & Information Technology, German-Jordanian University, Jordan, [email protected]

2: Qatar Computing Research Institute (QCRI), Qatar Foundation, Doha, Qatar, [email protected]

3: Department of Electrical & Computer Engineering, New Jersey Institute of Technology, USA, [email protected]

Corresponding Author: Issa Khalil, Email: [email protected]

Abstract

Many web computing systems run real-time database services whose information changes continuously and expands incrementally. In this context, web data services play a major role and drive significant improvements in monitoring and controlling information truthfulness and data propagation. Currently, web telemedicine database services are of central importance to distributed systems. However, the increasing complexity and rapid growth of real-world healthcare applications place a heavy burden on the database administrative staff. In this paper, we build integrated web data services that satisfy fast response time for large-scale tele-health database management systems. Our focus is on database management with application scenarios in dynamic telemedicine systems that increase care admissions and decrease care difficulties such as distance, travel, and time limitations. We propose a three-fold approach based on data fragmentation, database web-site clustering, and intelligent data distribution. This approach reduces the amount of data migrated between web sites during application execution, achieves cost-effective communications during application processing, and improves application response time and throughput. The proposed approach is validated internally by measuring the impact of our computing service techniques on performance features such as communications cost, response time, and throughput. External validation is achieved by comparing the performance of our approach to that of other techniques in the literature. The results show that our integrated approach significantly improves the performance of web database systems and outperforms its counterparts.

Keywords: Web Telemedicine Database Systems (WTDS); database fragmentation; data distribution; sites clustering.


1. Introduction

The rapid growth and continuous change of the real world software applications have provoked researchers to propose

several computing services’ techniques to achieve more efficient and effective management of web telemedicine

database systems (WTDS). Significant research progress has been made in the past few years to improve WTDS

performance. In particular, databases as a critical component of these systems have attracted many researchers. The web

plays an important role in enabling healthcare services like telemedicine to serve inaccessible areas where there are few

medical resources. It offers easy and global access to patients' data without having to interact with them in person, and it provides fast channels to consult specialists in emergency situations. Different kinds of patient information such as

ECG, temperature, and heart rate need to be accessed by means of various client devices in heterogeneous

communications environments. WTDS enable high quality continuous delivery of patient’s information wherever and

whenever needed. Several benefits can be achieved by using web telemedicine services including: medical consultation

delivery, transportation cost savings, data storage savings, and mobile applications support that overcome obstacles

related to the performance (e.g. bandwidth, battery life, and storage), security (e.g. privacy, and reliability), and

environment (e.g. scalability, heterogeneity, and availability). The objectives of such services are to: (i) develop large

applications that scale as the scope and workload increases, (ii) achieve precise control and monitoring on medical data to

generate high telemedicine database system performance, (iii) provide large data archive of medical data records,

accurate decision support systems, and trusted event-based notifications in typical clinical centers.

Recently, many researchers have focused on designing web medical database management systems that satisfy certain

performance levels. Such performance is evaluated by measuring the amount of relevant and irrelevant data accessed

and the amount of transferred medical data during transactions’ processing time. Several techniques have been proposed

in order to improve telemedicine database performance, optimize medical data distribution, and control medical data

proliferation. These techniques assume that high performance for such systems can be achieved by improving at least one of the database web management services, namely database fragmentation, data distribution, web sites clustering,

distributed caching, and database scalability. However, the intractable time complexity of processing a large number of medical transactions and managing a huge number of communications makes the design of such methods a non-trivial task.

Moreover, none of the existing methods consider the three-fold services together which makes them impracticable in the

field of web database systems. Additionally, using multiple medical services from different web database providers may


not fit the needs for improving the telemedicine database system performance. Furthermore, the services from different

web database providers may not be compatible, or in some cases may increase the processing time because of constraints on the network [1]. Finally, there has been a lack of tools that support the design, analysis, and cost-effective deployment of web telemedicine database systems [1].

Designing and developing fast, efficient, and reliable integrated techniques that can handle a huge number of medical transactions on a large number of web healthcare sites in near-optimal polynomial time is a key challenge in the area of WTDS. Data fragmentation, web sites clustering, and data allocation are the main components of the WTDS that continue to create great research challenges, as the underlying optimization problems are NP-complete.

To improve the performance of medical distributed database systems, we incorporate data fragmentation, web sites

clustering, and data distribution computing services together in a new web telemedicine database system approach. This

new approach intends to decrease data communication and increase system throughput, reliability, and data availability.

The decomposition of web telemedicine database relations into disjoint fragments allows database transactions to be

executed concurrently and hence minimizes the total response time. Fragmentation typically increases the level of

concurrency and, therefore, the system throughput. The benefits of generating telemedicine disjoint fragments cannot be realized unless these fragments are distributed over the web sites in a way that reduces the communication cost of database

transactions. Database disjoint fragments are initially distributed over logical clusters (a group of web sites that satisfy a

certain physical property, e.g. communications cost). Distributing database disjoint fragments to clusters where a benefit

allocation is achieved, rather than allocating the fragments to all web sites, has an important impact on database system

throughput. This type of distribution reduces the number of communications required for query processing in terms of

retrieval and update transactions; it always has a significant impact on the web telemedicine database system performance. Moreover, distributing disjoint fragments among the web sites where they are needed most improves database

system performance by minimizing the data transferred and accessed during the execution time, reducing the storage

overheads, and increasing availability and reliability as multiple copies of the same data are allocated.

Database partitioning techniques aim at improving database systems throughput by reducing the amount of irrelevant

data packets (fragments) to be accessed and transferred among different web sites. However, data fragmentation raises

some difficulties, particularly when web telemedicine database applications have contradictory requirements that prevent the decomposition of the relation into mutually exclusive fragments. Applications whose views are defined on more than


one fragment may suffer performance degradation. In this case, it might be necessary to retrieve data from two or more

fragments and take their join, which is costly [31]. Data fragmentation technique describes how each fragment is derived

from the database global relations. Three main classes of data fragmentation have been discussed in the literature;

horizontal [2][3], vertical [4][5], and hybrid [6][7]. Although there are various schemes describing data partitioning, few

are known for the efficiency of their algorithms and the validity of their results [33].

The Clustering technique identifies groups of network sites in large web database systems and discovers better data

distributions among them. This technique is considered to be an efficient method that has a major role in reducing the

amount of transferred and accessed data during processing database transactions. Accordingly, clustering techniques help

in eliminating the extra communications costs between web sites and thus enhance distributed database system

performance [32]. However, the assumptions on the web communications and the restrictions on the number of network

sites, make clustering solutions impractical [16][31]. Moreover, some constraints about network connectivity and

transaction processing time bound the applicability of the proposed solutions to a small number of clusters [9][10].

Data distribution describes the way of allocating the disjoint fragments among the web clusters and their respective sites

of the database system. This process addresses the assignment of each data fragment to the distributed database web

sites [8][17][18][21]. Data distribution related techniques aim at improving distributed database systems performance.

This can be accomplished by reducing the number of database fragments that are transferred and accessed during the

execution time. Additionally, data distribution techniques attempt to increase data availability, elevate database

reliability, and reduce storage overhead [11][27]. However, the restrictions on database retrieval and update frequencies

in some data allocation methods may negatively affect the fragments distribution over the web sites [20].

In this work, we address the previous drawbacks and propose a three-fold approach that manages the computing web

services that are required to promote telemedicine database system performance. The main contributions are:

Develop a fragmentation computing service technique by splitting telemedicine database relations into small disjoint

fragments. This technique generates the minimum number of disjoint fragments that would be allocated to the web

servers in the data distribution phase. This in turn reduces the data transferred and accessed through different web

sites and accordingly reduces the communications cost.


Introduce a high speed clustering service technique that groups the web telemedicine database sites into sets of

clusters according to their communications cost. This helps in grouping the web sites that are more suitable to be in

one cluster to minimize data allocation operations, which in turn helps to avoid allocating redundant data.

Propose a new computing service technique for telemedicine data allocation and redistribution services based on

transactions’ processing cost functions. These functions guarantee the minimum communications cost among web

sites and hence accomplish better data distribution compared to allocating data to all web sites evenly.

Develop a user-friendly experimental tool to perform services of telemedicine data fragmentation, web sites

clustering, and fragments allocation, as well as assist database administrators in measuring WTDS performance.

Integrate telemedicine database fragmentation, web sites clustering, and data fragments allocation into one scenario

to accomplish ultimate web telemedicine system throughput in terms of concurrency, reliability, and data availability.

We call this scenario the Integrated-Fragmentation-Clustering-Allocation (IFCA) approach. Figure 1 depicts the

architecture of the proposed telemedicine IFCA approach.

Figure 1: IFCA Computing Services Architecture. The architecture comprises seven phases: (1) requesting data for processing; (2) defining queries; (3) executing queries and generating data set records; (4) executing the fragmentation technique and generating disjoint fragments; (5) executing the clustering technique and generating clusters; (6) executing the allocation technique on clusters and allocating fragments to clusters; and (7) executing the allocation technique on sites and allocating fragments to sites.

In Figure 1, the data request is initiated from the telemedicine database system sites. The requested data is defined as

SQL queries that are executed on the database relations to generate data set records. Some of these data records may be

overlapped or even redundant, which increases the I/O transaction processing time and hence the system communications

overhead. To solve this problem, we execute the proposed fragmentation technique which generates telemedicine

disjoint fragments that represent the minimum number of data records. The web telemedicine database sites are

grouped into clusters by using our clustering service technique in a phase prior to data allocation. The purpose of this

clustering is to reduce the communications cost needed for data allocation. Accordingly, the proposed allocation service


technique is applied to allocate the generated disjoint fragments at the clusters that show positive benefit allocation.

Then the fragments are allocated to the sites within the selected clusters. The database administrator is responsible for

recovering any site failure in the WTDS.

The remainder of the paper is organized as follows. Section 2 summarizes the related work. Basic concepts of the web

telemedicine database settings and assumptions are discussed in Section 3. Telemedicine computation services and

estimation model are discussed in Section 4. Experimental results and performance evaluation are presented in

Section 5. Finally, in Section 6, we draw conclusions and outline the future work.

2. Related Work

Many research works have attempted to improve the performance of distributed database systems. These works have

mostly investigated fragmentation, allocation and sometimes clustering problems. In this section, we present the main

contributions related to these problems, discuss and compare their contributions with our proposed solutions.

2.1. Data Fragmentation

With respect to fragmentation, the unit of data distribution is a vital issue. A relation is not appropriate for distribution as

application views are usually subsets of relations [31]. Therefore, the locality of applications' accesses is defined on derived subsets of relations. Hence it is important to divide the relation into smaller data fragments and consider them for

distribution over the network sites. The authors in [8] considered each record in each database relation as a disjoint

fragment that is subject to allocation to the distributed database sites. However, a large number of database fragments is

generated in this method, causing a high communication cost for transmitting and processing the fragments. In contrast

to this approach, the authors in [11] considered the whole relation as a fragment; since not all records of the fragment have to be retrieved or updated, a selectivity matrix that indicates the percentage of a fragment accessed by a transaction is proposed. However, this research suffers from data redundancy and fragment overlapping.

2.2. Clustering Web Sites

Clustering service technique identifies groups of networking sites and discovers interesting distributions among large web

database systems. This technique is considered as an efficient method that has a major role in reducing transferred and

accessed data during transactions processing [9]. Moreover, grouping distributed network sites into clusters helps to


eliminate the extra communication costs between the sites and then enhances the distributed database system

performance by minimizing the communication costs required for processing the transactions at run time.

In a web database system environment where the number of sites has expanded tremendously and amount of data has

increased enormously, the sites are required to manage these data and should allow data transparency to the users of the

database. Moreover, to have a reliable database system, the transactions should be executed very fast in a flexible load

balancing database environment. When the number of sites in a web database system increases to a large scale, the

problem of supporting high system performance with consistency and availability constraints becomes crucial. Different

techniques could be developed for this purpose; one of them is web sites clustering.

Grouping web sites into clusters reduces communications cost and then enhances the performance of the web database

system. However, clustering network sites is still an open problem, and finding the optimal solution to it is NP-complete [12]. Moreover, in the case of a complex network where large numbers of sites are connected to each other, a huge

number of communications are required, which increases the system load and degrades its performance.

The authors in [13] have proposed a hierarchical clustering algorithm that uses similarity upper approximation derived

from a tolerance (similarity) relation based on rough set theory, and that does not require any prior information about the

data. The presented approach results in rough clusters in which an object is a member of more than one cluster. Rough

clustering can help researchers to discover multiple needs and interests in a session by looking at the multiple clusters

that a session belongs to. However, in order to carry out rough clustering, two additional requirements, namely, an

ordered value set of each attribute and a distance measure for clustering need to be specified [14]. Clustering coefficients

are needed in many approaches in order to quantify the structural network properties. In [15], the authors proposed

higher order clustering coefficients defined as probabilities that determine the shortest distance between any two

nearest neighbors of a certain node when neglecting all paths crossing this node. The outcomes of this method show

that the average shortest distance in the node’s neighborhood is smaller than all network distances. However,

independent constant values and natural logarithm function are used in the shortest distance approximation function to

determine the clustering mechanism, which results in generating a small number of clusters.

2.3. Data Allocation (Distribution)

Data allocation describes the way of distributing the database fragments among the clusters and their respective sites in

distributed database systems. This process addresses the assignment of network node(s) to each fragment [8]. However,


finding an optimal data allocation is an NP-complete problem [12]. Distributing data fragments among database web sites

improves database system performance by minimizing the data transferred and accessed during execution, reducing the

storage overhead, and increasing availability and reliability where multiple copies of the same data are allocated.

Many data allocation algorithms are described in the literature. The efficiency of these algorithms is measured in terms of

response time. Authors in [19] proposed an approach that handles the full replication of data allocation in database

systems. In this approach, a database file is fully copied to all participating nodes through the master node. This approach

distributes the sequences among fragments with a round-robin strategy over an input set already ordered by size, so that the number of sequences and the number of characters at each fragment are similar. However, this replicated schema does not achieve any performance gain when the number of nodes increases. When the number of input sequences is not determined in advance, the replication model may not be the best solution and other

fragmentation strategies have to be considered. In [20], the author has addressed the fragment allocation problem in web

database systems. He presented integer programming formulations for the non-redundant version of the fragment allocation problem. This formulation is extended to address problems that have both storage and processing capacity constraints. In this method, the constraints essentially state that there is exactly one copy of each fragment across all sites, which increases the risk of data inconsistency and unavailability in case of any site failure. However, the fragment

size is not addressed while the storage capacity constraint is one of the major objectives of this approach. In addition, the

retrieval and update frequencies are not considered in the formulations; they are assumed to be the same, which affects

the fragments distribution over the sites. Moreover, this research is limited by the fact that none of the approaches

presented have been implemented and tested on a real web database system.

A dynamic method for data fragmentation, allocation, and replication is proposed in [25]. The objective of this approach is

to minimize the cost of access, re-fragmentation, and reallocation. The DYFRAM algorithm of this method examines accesses

for each replica and evaluates possible re-fragmentations and reallocations based on recent history. The algorithm runs at

given intervals, individually for each replica. However, data consistency and concurrency control are not considered in

DYFRAM. Additionally, DYFRAM doesn’t guarantee data availability and system reliability when all sites have negative

utility values. In [28], the authors present a horizontal fragmentation technique that is capable of taking a fragmentation

decision at the initial stage, and then allocates the fragments among the sites of DDBMS. A modified matrix MCRUD is

constructed by placing predicates of attributes of a relation in rows and applications of the sites of a DDBMS in columns.


Attribute locality precedence (ALP), the importance value of an attribute with respect to the sites of the distributed database, is

generated as a table from MCRUD. However, when all attributes have the same locality precedence, the same fragment

has to be allocated in all sites, and a huge data redundancy occurs. Moreover, the initial values of frequencies and weights

don’t reflect the actual ones in real systems, and this may affect the number of fragments and their allocation accordingly.

The authors in [29] presented a method for modeling the distributed database fragmentation by using UML 2.0 to

improve applications performance. This method is based on a probability distribution function where the execution

frequency of a transaction is estimated mainly by the most likely time. However, the most likely time is not determined to

distinguish the priorities between transactions. Furthermore, no performance evaluations are performed and no

significant results are generated from this method. A database tool shown in [30] addresses the problem of designing

DDBs in the context of the relational data model. Conceptual design, fragmentation issues, as well as the allocation

problem are considered based on other methods in the literature. However, this tool doesn’t consider the local

optimization of the fragment allocation problem over the distributed network sites. In addition, many design parameters

need to be estimated and entered by designers where different results may be generated for the same application case.

Our fragmentation approach circumvents the problems associated with the aforementioned studies by introducing a

computing service technique that generates disjoint fragments, avoids data redundancy and considers that all records of

the fragment are retrieved or updated by a transaction. By such a technique, less communication costs are required to

transmit and process the disjoint database fragments. In addition, by applying the clustering service technique, the

complexity of the allocation problem is significantly reduced. Therefore, the intractable fragment allocation problem is turned into fragment distribution among the clusters followed by replication among the related sites.

2.4. Commercial Outsource Databases

This section investigates current commercial outsource databases and compares them with our IFCA. The commercial

outsource databases support an enormous data storage architecture that distributes storage and processing across several servers. Such an architecture can be used to address web database system performance and scalability requirements.

Amazon Dynamo [34] is a cloud distributed database system with high availability and scalability. In contrast to

traditional relational distributed database management systems, Dynamo only supports database applications with

simple read and write queries on data records. However, in Dynamo availability is more important than consistency.

Accordingly, at certain times data consistency can't be guaranteed. MongoDB [35] is an open source, document-oriented


cloud distributed database developed by the 10gen software company. MongoDB is a highly available and scalable system

that supports search by fields in the documents. However, MongoDB uses a special query optimizer to make queries more

efficient, and executes different query plans concurrently, which increases the search time complexity.

Google BigTable [36] is a cloud distributed database system built on Google file system and a highly available and

persistent distributed lock service. The BigTable data model is designed as a distributed multidimensional (row key,

column key, timestamps) sorted map, and implemented on three parts: tablet servers, client library, and master server.

However, the failure of the master server results in the failure of the whole database system, and thus another server

usually backs up the master server. This incurs additional costs to the cloud distributed database system in addition to

the costs of operating and maintaining the new servers.

Due to their contrast in priorities, architecture, data consistency, and search time complexity compared to our IFCA, the

previous outsource databases can’t optimize and secure telemedicine data in terms of data fragmentation, clustering

web sites, and data distribution altogether as our approach successfully does. Table 1 compares our approach with

outsource approaches in terms of integrity, reliability, communications overhead, manageability, data consistency,

security, and performance evaluation.

Table 1. Comparison between existing methods in the literature and the proposed approach

Methods [8], [11]:
- Integrity: No; the majority do not apply clustering and allocation techniques.
- Reliability: Poor, due to fragment redundancy and restrictions on the final number of fragments.
- Communication overhead: High traffic to multiple sites, and some delays are tolerated.
- Manageability: Difficult, due to fragment overlapping and redundancy.
- Data consistency: Low, due to the outcomes of the fragmentation.
- Security: Secured.
- Performance evaluation: Intractable complexity is a barrier to performance.

Methods [13], [15]:
- Integrity: No; the majority do not apply fragmentation and allocation techniques.
- Reliability: High, because each site is considered in multiple clusters.
- Communication overhead: High, due to site duplication in different clusters.
- Manageability: Not easy as the number of sites increases and shortest-distance algorithms must be found.
- Data consistency: Inconsistencies are not tolerated.
- Security: Secured.
- Performance evaluation: Poor, due to redundant data in multiple sites.

Methods [19], [20], [25], [28], [29], [30]:
- Integrity: No; the majority do not apply fragmentation and clustering techniques.
- Reliability: Poor, due to fault-tolerance risk.
- Communication overhead: Low, due to the small number of data allocations.
- Manageability: Easy for each site, until there is a need to share data across sites.
- Data consistency: Consistent within each site, but not across sites.
- Security: Secured.
- Performance evaluation: No guarantees of high performance as the number of sites becomes intractable.

Methods [34], [35], [36]:
- Integrity: No; they have their own strategies for data allocation but do not apply fragmentation and clustering techniques.
- Reliability: High, due to the control of the cloud services provider.
- Communication overhead: Low, as the cost is per use in a unit of time.
- Manageability: Easy, since it is supported by the cloud service provider.
- Data consistency: High, as it is the responsibility of the cloud service provider.
- Security: Not secured, since it is controlled by the cloud services provider.
- Performance evaluation: Acceptable level of performance, due to the additional processing required for securing data.

IFCA (proposed):
- Integrity: Yes; fully integrated fragmentation, clustering, and allocation.
- Reliability: High, since data is available at the most beneficial sites.
- Communication overhead: Low, since data is placed where the least communication cost holds.
- Manageability: Easy, because the complete data life cycle can be controlled by the provided techniques.
- Data consistency: High, due to the allocation technique, which keeps data consistent.
- Security: Secured.
- Performance evaluation: High, as the three-fold techniques guarantee that the minimum data records are allocated to the sites with the least communication costs.


3. Telemedicine IFCA Assumptions and Definitions

Incorporating the database fragmentation, web database sites' clustering, and data fragment allocation computing service techniques in one scenario distinguishes our approach from other approaches. The functionality of such an approach depends on the settings, assumptions, and definitions that identify the WTDS implementation environment, to guarantee its efficiency and continuity. Below is the description of the IFCA settings, assumptions, and definitions.

3.1. Web Architecture and Communications Assumptions

The telemedicine IFCA approach is designed to support a web database provider with computing services that can be

implemented over multiple servers, where the data storage, communication and processing transactions are fully

controlled, costs of communication are symmetric, and the patients’ information privacy and security are met. We

propose fully connected sites on a web telemedicine heterogeneous network system with different bandwidths; 128 kbps,

512 kbps, or multiples. In this environment, some servers are used to execute the telemedicine queries triggered from

different web database sites. A few servers run the database programs and perform the fragmentation-clustering-

allocation computing services while the other servers are used to store the database fragments. Communications cost

(ms/byte) is the cost of loading and processing data fragments between any two sites in WTDS. To control and simplify

the proposed web telemedicine communication system, we assume that communication costs between sites are

symmetric and proportional to the distance between them. Communication costs within the same site are neglected.

3.2. Fragmentation and Clustering Assumptions

Telemedicine queries are triggered from web servers as transactions to determine the specific information that should be

extracted from the database. Transactions include, but are not limited to: read, write, update, and delete. To control the

process of database fragmentation and to achieve data consistency in the telemedicine database system, IFCA

fragmentation service technique partitions each database relation according to the Inclusion-Integration-Disjoint

assumptions where the generated fragments must contain all records in the database relations, the original relation

should be able to be formed from its fragments, and the fragments should be neither repeated nor intersecting. The logical clustering decision is defined as a logical value that specifies whether a web site is included or excluded from a

certain cluster, based on the communications cost range. The communications cost range is defined as a value (ms/byte)

that specifies how much time is allowed for the web sites to transmit or receive their data to be considered in the same

cluster; this value is determined by the telemedicine database administrator.
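The Inclusion-Integration-Disjoint conditions above can be stated operationally. The following Python sketch is ours, not part of the paper; the function and variable names (satisfies_ifca_fragmentation_rules, relation_records, fragments) are illustrative, and fragments are represented simply as sets of record identifiers.

# Illustrative check of the Inclusion-Integration-Disjoint conditions of
# Section 3.2; names are ours, not the paper's.
def satisfies_ifca_fragmentation_rules(relation_records, fragments):
    relation = set(relation_records)
    frags = [set(f) for f in fragments]
    union_of_frags = set().union(*frags)
    # Inclusion: every record of the relation appears in some fragment.
    inclusion = relation <= union_of_frags
    # Integration: the original relation can be re-formed from its fragments,
    # i.e. the union of the fragments adds no records outside the relation.
    integration = union_of_frags == relation
    # Disjointness: fragments are neither repeated nor intersecting.
    disjoint = all(not (frags[i] & frags[j])
                   for i in range(len(frags))
                   for j in range(i + 1, len(frags)))
    return inclusion and integration and disjoint

# Example: records {1..6} split into three disjoint fragments satisfies the rules.
assert satisfies_ifca_fragmentation_rules({1, 2, 3, 4, 5, 6}, [{1, 2}, {3, 4}, {5, 6}])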


3.3. Fragments Allocation Assumptions

The allocation decision value ADV is defined as a logical value (1, 0) that determines the fragment allocation status for a

specific cluster. The fragments that achieve allocation decision value of (1) are considered for allocation and replication

process. The advantage of this assumption is that more communication cost is saved because the fragments are located in the same place where they are processed, hence improving the WTDS performance.

On the other hand, the fragments that have an allocation decision value of (0) are considered for the allocation process only in order to ensure data availability and fault tolerance in the WTDS. In this case, each fragment should be allocated to at

least one cluster and one site in this cluster. The allocation decision value ADV is assumed to be computed as the result of

the comparison between the cost of allocating the fragment to the cluster and the cost of not allocating the fragment to

the same cluster. The allocation cost function is composed of the following sub-cost functions that are required to

perform the fragment transactions locally: cost of local retrieval, cost of local update to maintain consistency among all

the fragments distributed over the web sites, and cost of storage, or cost of remote update and remote communications

(for remote clusters that do not have the fragment and still need to perform the required transactions on that fragment).

The non-allocation cost function consists of the following sub-cost functions: cost of local retrieval and cost of remote

retrievals required to perform the fragment transactions remotely when the fragment is not allocated to the cluster.
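The ADV comparison just described can be summarized in a few lines of code. This is a hedged sketch, not the paper's implementation: the cost arguments are assumed to have been pre-computed for one fragment and one candidate cluster using the sub-cost functions formalized in Section 4, and the parameter names are ours.

# Illustrative sketch of the allocation decision value (ADV) of Section 3.3.
# All cost arguments are assumed pre-computed (in time units) for one fragment
# and one candidate cluster; names are ours, not the paper's.
def allocation_decision_value(local_retrieval, local_update, storage,
                              remote_update, remote_comm,
                              remote_retrieval_if_not_allocated):
    # Cost of allocating the fragment to the cluster.
    cost_allocate = (local_retrieval + local_update + storage
                     + remote_update + remote_comm)
    # Cost of not allocating it: local retrieval plus the remote retrievals
    # needed to answer the cluster's transactions from other clusters.
    cost_not_allocate = local_retrieval + remote_retrieval_if_not_allocated
    # ADV = 1 when allocating is the cheaper (beneficial) choice, 0 otherwise.
    return 1 if cost_allocate < cost_not_allocate else 0

# Example with invented costs: allocation is the beneficial choice here.
print(allocation_decision_value(2.0, 1.5, 0.5, 0.0, 0.0, 6.0))   # prints 1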

4. Telemedicine IFCA Computation Services and Estimation Model

In the following subsections, we present our IFCA and provide mathematical models of its computations’ services.

4.1. Fragmentation Computing Service

To control the process of database fragmentation and maintain data consistency, the fragmentation technique partitions

each database relation into data set records that guarantee data inclusion, integration and non-overlapping. In a WTDS,

neither complete relation nor attributes are suitable data units for distribution, especially when considering very large

data. Therefore, it is appropriate to use data fragments that would be allocated to the WTDS sites. Data fragmentation is

based on the data records generated by executing the telemedicine SQL queries on the database relations. The

fragmentation process goes through two consecutive internal processes: (i) Overlapped and redundant data records

fragmentation and (ii) Non-overlapped data records fragmentation.


The fragmentation service generates disjoint fragments that represent the minimum number of data records to be

distributed over the web sites by the data allocation service. The proposed fragmentation Service architecture is

described through Input-Processing-Output phases depicted in Figure 2. Based on this fragmentation service, the global

database is partitioned into disjoint fragments. The overlapped and redundant data records fragmentation process is

described in Table 2. In this algorithm, database fragmentation starts with any two random data fragments (i,j) with

intersection records and proceeds as follows: If intersection exists between the two fragments, three disjoint fragments

will be generated: Fk = Fi ∩ Fj, Fk+1 = Fi − Fk, and Fk+2 = Fj − Fk. The first contains the common records, the second contains the

records in the first fragment but not in the second fragment, and the third contains the records in the second fragment

but not in the first. The intersecting fragments are then removed from the fragments list. This process continues until no

more intersecting fragments or data set records still exist for this telemedicine relation, and so for the other relations.

Figure 2: Data Fragmentation Service Architecture. Input: the set of telemedicine database relations and the set of data records generated by the telemedicine queries. Processing: the fragmentation technique for overlapped and redundant data records, followed by the fragmentation technique for non-overlapped data records. Output: the set of disjoint fragments.

Table 2: Overlapped and redundant data records fragmentation Algorithm

Step 1: Set 1 to I; K = F.size()
Step 2: Do steps (3-18) until I > F.size()
Step 3: Set 1 to J
Step 4: Do steps (5-16) until J > F.size()
Step 5: If I ≠ J and Fi, Fj ∈ F go to (6); Else, add 1 to J and go to step (15)
Step 6: If Fi ∩ Fj ≠ Ø do steps (7-14); Else, add 1 to J and go to step (14)
Step 7: Add 1 to K
Step 8: Create new fragment Fk = Fi ∩ Fj and add it to F
Step 9: Create new fragment Fk+1 = Fi − Fk and add it to F
Step 10: Create new fragment Fk+2 = Fj − Fk and add it to F
Step 11: Delete Fi
Step 12: Delete Fj
Step 13: Set F + 1 to J
Step 14: End If
Step 15: End If
Step 16: Loop
Step 17: Add 1 to I
Step 18: Loop
Step 19: Add 1 to R
Step 20: Loop
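The splitting rule of Table 2 can be prototyped in a few lines. The sketch below is ours, not the paper's listing: each data set or fragment is represented as a Python set of record identifiers, and an intersecting pair Fi, Fj is repeatedly replaced by Fi ∩ Fj, Fi − (Fi ∩ Fj), and Fj − (Fi ∩ Fj) until all fragments are disjoint. The record numbers in the demo are taken from the patient-relation example discussed below (Table 4).

# Sketch (ours) of the overlapped/redundant data records fragmentation of
# Table 2: intersecting fragments are repeatedly split into the intersection
# and the two set differences until all fragments are pairwise disjoint.
def make_disjoint(data_sets):
    frags = [set(s) for s in data_sets if s]
    changed = True
    while changed:
        changed = False
        for i in range(len(frags)):
            for j in range(i + 1, len(frags)):
                common = frags[i] & frags[j]
                if common:
                    pieces = [common, frags[i] - common, frags[j] - common]
                    # drop Fi and Fj, keep the non-empty derived fragments
                    frags = ([f for k, f in enumerate(frags) if k not in (i, j)]
                             + [p for p in pieces if p])
                    changed = True
                    break
            if changed:
                break
    return frags

# Demo with the record ids of the patient-relation example (Table 4):
t1 = {39, 41, 56, 63, 72, 85, 97}   # patients medicated by Doctor #1
t2 = {31, 56, 72, 85, 97}           # patients who took medicine #4
t3 = {17, 24, 41, 63, 97}           # patients who paid more than $1000
print(sorted(sorted(f) for f in make_disjoint([t1, t2, t3])))
# -> [[17, 24], [31], [39], [41, 63], [56, 72, 85], [97]], the six disjoint fragments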

The non-overlapped data records fragmentation process is shown in Table 3. The fragments newly derived in Table 2, together with the non-overlapping fragments, result in totally disjoint fragments. The non-overlapped fragments that

do not intersect with any other fragment in Table 2 are considered in the final set of the relation disjoint fragments. For

example, consider that the transactions triggered from the telemedicine database sites hold queries that require

extracting records from different telemedicine database relations. Let us assume three transactions for patient relation

(T1, T2, T3) on sites S1, S2, and S3, respectively. T1 defines patients who are medicated by Doctor #1, T2 defines patients who

took medicine #4, and T3 defines Patients who paid more than $1000. The data set 1 in patient relation intersects with


data set 2 in the same relation. This results in new disjoint fragments: F1, F2, and F3. Data sets 1 and 2 are omitted.

Fragment F1 and data set 3 in the same relation are intersected. This results in the new disjoint fragments F4, F5, and F6.

Then F1 and the data set 3 are omitted, and the next intersection is between F2 and F6. The result is new disjoint

fragments; F7, F8, and F9. Then F2 and F6 are omitted. The final disjoint fragments generated from the transactions over

patient relations are: F3, F4, F5, F7, F8, and F9. Table 4 depicts the fragmentation process over patient relation.

Table 3: Non-overlapped Data Records Fragmentation Algorithm

Step 21: Set 1 to I; K = F.size()
Step 22: Do steps (23-34) until I > F.size()
Step 23: Set 1 to J
Step 24: Do steps (25-32) until J > F.size()
Step 25: If I ≠ J and Fi, Fj ∈ F go to (26); Else add 1 to J and go to step (32)
Step 26: If Fi ∩ Fj = Ø do steps (27-32)
Step 27: Add 1 to K
Step 28: Create new fragment Fk = Rj − ∪F
Step 29: End If
Step 30: If Fk ≠ Ø add Fk to F
Step 31: End If
Step 32: Loop
Step 33: Add 1 to I
Step 34: Loop

Table 4: Fragmentation process over patient relation

Initial data sets:
- Data set 1 | Site S1 | Transaction T1 | Patient (1) | Records 39, 41, 56, 63, 72, 85, 97
- Data set 2 | Site S2 | Transaction T2 | Patient (1) | Records 31, 56, 72, 85, 97

Fragments derived from data sets 1 and 2:
- F1 | Sites S1, S2 | T1 ∩ T2 | Patient (1) | Records 56, 72, 85, 97
- F2 | Site S1 | T1 | Patient (1) | Records 39, 41, 63
- F3 | Site S2 | T2 | Patient (1) | Records 31

Next data set:
- Data set 3 | Site S3 | Transaction T3 | Patient (1) | Records 17, 24, 41, 63, 97

Fragments derived from F1 and data set 3:
- F4 | Sites S1, S2, S3 | (T1 ∩ T2) ∩ T3 | Patient (1) | Records 97
- F5 | Sites S1, S2 | T1 ∩ T2 | Patient (1) | Records 56, 72, 85
- F6 | Site S3 | T3 | Patient (1) | Records 17, 24, 41, 63

Fragments derived from F2 and F6:
- F7 | Sites S1, S3 | T1 ∩ T3 | Patient (1) | Records 41, 63
- F8 | Site S1 | T1 | Patient (1) | Records 39
- F9 | Site S3 | T3 | Patient (1) | Records 17, 24

4.2. Clustering Computing Service

The benefit of generating database disjoint fragments can't be realized unless it enhances the performance of the

WTDS. As the number of database sites becomes too large, supporting high system performance with consistency and

availability constraints becomes crucial. Several techniques are developed for this purpose; one of them consists of

clustering web sites. However, grouping sites into clusters is still an open problem whose optimal solution is NP-complete

[12]. Therefore, developing a near-optimal solution for grouping web database sites into clusters helps to eliminate the

extra communication costs between the sites during the process of data allocation. Clustering web telemedicine database

sites speeds up the process of data allocation by distributing the fragments at the clusters that accomplish benefit

allocation rather than distributing the fragments once at all web sites. Thus, the communication costs are minimized and

the WTDS performance is improved. In this work, we introduce a high speed clustering service based on the least average

communication cost between sites. The parameters used to control the input/output computations for generating

clusters and determining the set of sites in each are computed as follows:

Communications Cost between sites CC(Si,Sj) = data creation cost + data transmission cost between Si,Sj.

Communication Cost Range CCR (ms/byte), which is determined by the telemedicine database system administrator.



Clustering Decision Value (cdv):

cdv(S_i, S_j) = \begin{cases} 1 & \text{if } CC(S_i, S_j) \le CCR \text{ and } i \ne j \\ 0 & \text{if } CC(S_i, S_j) > CCR \text{ and } i \ne j \end{cases} \qquad (1)

Accordingly, if cdv(Si,Sj) is equal to 1, then the sites Si,Sj are assigned to one cluster, otherwise they are assigned to

different clusters. If site Si can be assigned to more than one cluster, it will be considered for the cluster of the least

average communication cost. Based on this clustering service, we develop the clustering algorithm shown in Table 5.

Table 5: Clustering Algorithm

Input: Matrix of communication cost between sites CC(Si,Sj)

CCR: communication cost range; N: List of WTDS sites;

Output: CDV(Sn,Sn) Clustering Decision Values Matrix

Step 1: For I = 1, N.size(), do steps (2) - (8)

Step 2: For J = 1, N.size(), do steps (3) - (7)

Step 3: If I ≠ J and CC(Si,Sj) ≤ CCR, go to step (4)

Else, go to step (5)

Step 4: Set 1 to both CDV(Si,Sj) and CDV(Sj,Si) , go to step 6

Step 5: Set 0 to both CDV(Si,Sj) and CDV(Sj,Si)

Step 6: End IF; Step 7: End For; Step 8: End For; Step 9: Stop
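A compact sketch of how the CDV values of Eq. (1) can be turned into clusters is given below. It is ours, not the paper's listing, and it encodes one possible reading of the rule stated above: a site joins a cluster only if its communication cost to every current member is within CCR, and a site that qualifies for several clusters joins the one with the least average communication cost.

# Sketch (ours) of the clustering service of Eq. (1) / Table 5.
def cluster_sites(cc, ccr):
    """cc: symmetric cost matrix (list of lists); ccr: communication cost range."""
    n = len(cc)
    clusters = []                      # each cluster is a list of site indices
    for s in range(n):
        candidates = []
        for idx, cl in enumerate(clusters):
            # cdv(s, m) = 1 for every member m requires cc[s][m] <= ccr
            if all(cc[s][m] <= ccr for m in cl):
                avg = sum(cc[s][m] for m in cl) / len(cl)
                candidates.append((avg, idx))
        if candidates:
            _, best = min(candidates)  # least average communication cost wins
            clusters[best].append(s)
        else:
            clusters.append([s])       # no qualifying cluster: start a new one
    return clusters

# Tiny example (costs in ms/byte), CCR = 2.5: sites 0-2 group together,
# site 3 is too far from all of them and forms its own cluster.
costs = [[0, 1, 2, 6],
         [1, 0, 2, 7],
         [2, 2, 0, 8],
         [6, 7, 8, 0]]
print(cluster_sites(costs, 2.5))       # [[0, 1, 2], [3]]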

The communications cost within and between clusters is required for the fragment allocation phase where fragments are

allocated to web clusters and then to their respective web sites. The optimal way to compute this cost is to find the

cheapest path between the clusters which is an NP-complete problem [12]. Instead, we use the cluster symmetry average

communication cost, as it has been shown to be a fast, reliable, and efficient method for the computation of the fragments'

allocation and replication service in many heterogeneous web telemedicine database environments. To test the validity of

our clustering service technique, and based on the assumptions on the communications costs between sites (symmetric

between any two sites, zero within the same site, and proportional to the distance between the sites), we collect a real

sample of communication costs (multiple of 0.125 ms/byte) between 12 web sites (hospitals) and present it in Table 6.

We then apply the clustering service in our IFCA tool to the communication costs in Table 6 with a clustering range of 2.5.

Table 6: The communication costs between web sites

Site # site1 site2 site3 site4 site5 site6 site7 site8 site9 site10 site11 site12

site1 0 6 11 10 7 8 9 9 12 12 11 12

site2 6 0 10 10 2 3 4 2 7 8 6 7

site3 11 10 0 6 6 7 8 7 2 3 10 11

site4 10 10 6 0 8 7 6 7 9 6 1 1

site5 7 2 6 8 0 1 2 2 7 6 9 12

site6 8 3 7 7 1 0 1 2 8 8 6 8

site7 9 4 8 6 2 1 0 3 6 6 7 6

site8 9 2 7 7 2 2 3 0 8 7 6 6

site9 12 7 2 9 7 8 6 8 0 1 6 7

site10 12 8 3 6 6 8 6 7 1 0 7 6

site11 11 6 10 1 9 6 7 6 6 7 0 2

site12 12 7 11 1 12 8 6 6 7 6 2 0


Figure 4 depicts the experimental results that generate the clusters and their respective sites. It is inferred from the figure

that sites 1 and 3 are located in different clusters because the communication costs of the other sites can't match the

communications cost range with them. Moreover, the number of clusters increases as the communication cost between sites increases or the communication cost range becomes smaller.

Figure 4: Clustering telemedicine web sites

4.3. Data Allocation and Replication

Data allocation techniques aim at distributing the database fragments on the web database clusters and their respective

sites. We introduce a heuristic fragment allocation and replication computing service to perform the processes of

fragments allocation in the WTDS. Initially, all fragments are subject to allocation to all clusters that need these

fragments at their sites. If the fragment shows positive allocation decision value (i.e. allocation benefit greater than zero)

for a specific cluster, then the fragment is allocated to this cluster and tested for allocation at each of its sites, otherwise

the fragment is not allocated to this cluster. This fragment is subsequently tested for replication in each cluster of the

WTDS. Accordingly, the fragment that shows positive allocation decision value for any WTDS cluster will be allocated at

that cluster and then tested for allocation at its sites. Consequently, if the fragment shows positive allocation decision

value at any site of cluster that already shows positive allocation decision value, then the fragment is allocated to that

site, otherwise, the fragment is not allocated. This process is repeated for all sites in each cluster that shows positive

allocation decision value. Figure 5 illustrates the structure of our data allocation and replication technique.

In case a fragment shows a negative allocation decision value at all clusters, the fragment is allocated to the cluster that holds the least average communications cost, and then to the site that achieves the least communications cost with the other


sites in the current cluster. To better understand the computation of the query processing cost functions, a mathematical model is used below to formulate these cost functions.

Figure 5: Data Allocation and Replication Technique

The allocation decision value (ADV) is computed as the logical result of comparing two compound cost components: the cost of allocating the fragment to the cluster and the cost of not allocating the fragment to the same cluster. The cost of allocating the fragment Fi issued by the transaction Tk to the cluster Cj, denoted CA(Tk,Fi,Cj), is defined as the sum of the following sub-costs: the local retrievals and updates issued by the transaction Tk to the fragment Fi at cluster Cj, the storage occupied by the fragment Fi in the cluster Cj, and the remote updates and remote communications sent from remote clusters. The mathematical model for each sub-cost function is detailed below.

Cost of local retrievals issued by the transaction Tk to the fragment Fi at cluster Cj. This cost is determined by the frequency and cost of retrievals. The frequency of retrievals is the average number (≥0) of retrieval transactions that occur on each fragment at all sites of a certain cluster. The cost of retrievals is the average cost of the retrieval transactions that occur on each fragment at all sites of a specific cluster, set by the telemedicine database system administrator in time units (milliseconds, microseconds, etc.) per byte. The cost of local retrievals is computed as the product of the average cost of local retrievals over all sites (m) of cluster Cj and the average frequency of local retrievals issued by the transaction Tk to the fragment Fi over all sites of cluster Cj.

$CLR(T_k, F_i, C_j) = \frac{\sum_{q=1}^{m} CLR(T_k, F_i, C_j, S_q)}{m} \times \frac{\sum_{q=1}^{m} FREQLR(T_k, F_i, C_j, S_q)}{m}$  (2)
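To illustrate how this average-of-averages is evaluated, the short Python sketch below computes equation (2) for a hypothetical cluster; the per-site retrieval costs are taken from Table 8, while the retrieval frequencies and the function name are invented for illustration.

```python
def clr(per_site_costs, per_site_freqs):
    """Equation (2): (average per-site retrieval cost) x (average per-site retrieval frequency)."""
    m = len(per_site_costs)
    return (sum(per_site_costs) / m) * (sum(per_site_freqs) / m)

# Retrieval costs of cluster 1 sites 5-7 (Table 8) and hypothetical retrieval frequencies
print(clr([0.0002, 0.0001, 0.0004], [120, 80, 100]))   # ~0.000233 * 100 = ~0.0233
```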

Cost of local updates issued by the transaction Tk to the fragment Fi at cluster Cj. This cost is determined by the

frequency and cost of updates. The frequency of updates is computed as the average number of update transactions

that occur on each fragment at all sites of a certain cluster. The cost of updates is the average cost of update

transactions that occur on each fragment at all sites of a specific cluster, and is determined by the WTDS

administrator in time units per byte. The cost of local updates is computed as the multiplication of the average cost of



local updates for all sites (m) at cluster Cj and the average frequency of local updates issued by the transaction Tk to

the fragment Fi for all sites at cluster Cj.

$CLU(T_k, F_i, C_j) = \frac{\sum_{q=1}^{m} CLU(T_k, F_i, C_j, S_q)}{m} \times \frac{\sum_{q=1}^{m} FREQLU(T_k, F_i, C_j, S_q)}{m}$  (3)

Cost of storage occupied by the fragment Fi in the cluster Cj. This cost is determined by the storage cost and the fragment size. The storage cost is the cost of storing one byte of data at a certain site in a specific cluster. The fragment size is the storage occupied by the fragment in bytes. The cost of storage is computed as the product of the average storage cost over all sites (m) of cluster Cj and the size of the fragment Fi.

$SCP(T_k, F_i, C_j) = \frac{\sum_{q=1}^{m} SCP(T_k, F_i, C_j, S_q)}{m} \times F_{size}(T_k, F_i)$  (4)
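As a concrete check of equation (4), the sketch below uses the average storage cost of cluster 1 from Table 8 and the 1,024-byte fragment of Table 10; the variable names are ours, and the result matches the SCP value reported in Table 10.

```python
avg_storage_cost = (0.000007 + 0.000003 + 0.000005) / 3   # Table 8, cluster 1 (sites 5, 6, 7)
fragment_size = 1024                                       # bytes; fragment 1 in Table 10
scp = avg_storage_cost * fragment_size                     # equation (4)
print(round(scp, 4))                                       # 0.0051, as reported in Table 10
```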

Cost of remote updates issued by the transaction Tk to the fragment Fi sent from all clusters (n) in the WTDS except

the current cluster Cj. This cost is determined by the cost of local updates and the frequency of remote updates. The

frequency of remote updates is the number of update transactions that occur on each fragment at remote web

clusters in the WTDS. The cost of remote updates is the product of the average cost of local updates at cluster Cq and the average frequency of remote updates issued by the transaction Tk to the fragment Fi at each remote cluster in the WTDS.

$CRU(T_k, F_i, C_j) = \sum_{q=1,\, q \neq j}^{n} CLU(T_k, F_i, C_q) \times \frac{\sum_{s=1}^{m} FREQRU(T_k, F_i, C_q, S_s)}{m}$  (5)

The update ratio is used in the evaluation of the cost of remote communications and represents the estimated percentage of update transactions in the web telemedicine database system. It is computed by dividing the sum of local update frequencies over all WTDS sites by the sum of all local retrieval and update frequencies over all sites.

$U_{ratio} = \frac{\sum_{q=1}^{m} FREQLU(T_k, F_i, C_j, S_q)}{\sum_{q=1}^{m} FREQLR(T_k, F_i, C_j, S_q) + \sum_{q=1}^{m} FREQLU(T_k, F_i, C_j, S_q)}$  (6)

Cost of remote communications issued for the transaction Tk to the fragment Fi sent from remote clusters (n) in the WTDS. This cost is the communication cost required to update the fragment from remote clusters. It is computed as the product of the average cost of remote communication, the average frequency of local updates, and the update ratio.

$CRC(T_k, F_i, C_j) = \frac{\sum_{q=1,\, q \neq j}^{n} \sum_{s=1}^{m} CRC(T_k, F_i, C_q, S_s)}{m} \times \frac{\sum_{q=1}^{m} FREQLU(T_k, F_i, C_j, S_q)}{m} \times U_{ratio}$  (7)

According to the definition of the cost of allocating fragments, and based on the previous allocation sub-costs, the

mathematical model that represents the cost of fragment allocation to a certain cluster in the WTDS is computed as:


$CA(T_k, F_i, C_j) = CLR(T_k, F_i, C_j) + CLU(T_k, F_i, C_j) + SCP(T_k, F_i, C_j) + CRU(T_k, F_i, C_j) + CRC(T_k, F_i, C_j)$  (8)

To complete the computation of ADV, the cost of not allocating the fragment to a certain cluster (i.e., retrieving the fragment remotely) must be defined and computed. The cost of not allocating the fragment Fi issued by the transaction Tk to the cluster Cj, denoted CN(Tk,Fi,Cj), is defined as the sum of the following sub-costs: the cost of local retrievals issued by the transaction Tk to the fragment Fi at cluster Cj (computed in equation 2), and the cost of remote retrievals.

The retrieval ratio is used in the evaluation of the cost of remote retrievals and represents the estimated percentage of retrieval transactions in the WTDS. It is computed by dividing the sum of local retrieval frequencies over all WTDS sites by the sum of all local retrieval and update frequencies over all WTDS sites.

$R_{ratio} = \frac{\sum_{q=1}^{m} FREQLR(T_k, F_i, C_j, S_q)}{\sum_{q=1}^{m} FREQLR(T_k, F_i, C_j, S_q) + \sum_{q=1}^{m} FREQLU(T_k, F_i, C_j, S_q)}$  (9)
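Since the two ratios share the same denominator and always sum to one, they can be computed together. The sketch below uses the per-cluster average frequencies reported in Table 9 purely for illustration; the variable names are ours.

```python
freq_lr = [16700, 11800, 14700]   # average retrieval frequencies (Table 9, clusters 1-3)
freq_lu = [5300, 4000, 3400]      # average update frequencies (Table 9, clusters 1-3)

u_ratio = sum(freq_lu) / (sum(freq_lr) + sum(freq_lu))   # equation (6)
r_ratio = sum(freq_lr) / (sum(freq_lr) + sum(freq_lu))   # equation (9)
print(round(u_ratio, 3), round(r_ratio, 3))              # 0.227 0.773; the two ratios sum to 1
```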

Cost of remote retrievals issued by the transaction Tk to the fragment Fi sent from remote clusters (n) in the WTDS.

The cost of remote retrievals is determined by retrieval ratio, retrieval frequency, and communications cost that

represents the communications between all clusters. This cost is computed as the product of the average cost of

communications between all clusters, the average frequency of remote retrieval, and the retrieval ratio.

$CRR(T_k, F_i, C_j) = \frac{\sum_{q=1,\, q \neq j}^{n} \sum_{s=1}^{m} CCC(T_k, F_i, C_q, S_s)}{m} \times \frac{\sum_{q=1,\, q \neq j}^{n} \sum_{s=1}^{m} FREQRR(T_k, F_i, C_q, S_s)}{m} \times R_{ratio}$  (10)

Based on the previous sub-costs of not allocating, the mathematical model that represents the cost of not allocating a fragment to a certain cluster in the WTDS is computed as:

$CN(T_k, F_i, C_j) = CLR(T_k, F_i, C_j) + CRR(T_k, F_i, C_j)$  (11)

Accordingly, the allocation decision value ADV determines the allocation status of the fragment at the cluster and its sites.

ADV is computed as a logical value from the comparison between the cost of allocating the fragment to the cluster

CA(Tk,Fi,Cj ) and the cost of not allocating the fragment to the same cluster CN(Tk,Fi,Cj).

$ADV(T_k, F_i, C_j) = \begin{cases} 1 & \text{if } CN(T_k, F_i, C_j) \geq CA(T_k, F_i, C_j) \\ 0 & \text{if } CN(T_k, F_i, C_j) < CA(T_k, F_i, C_j) \end{cases}$  (12)

Therefore, if CN(Tk,Fi,Cj) is greater than or equal to CA(Tk,Fi,Cj), the allocation status ADV(Tk,Fi,Cj) indicates a positive allocation benefit and the fragment is allocated to that cluster; otherwise, CN(Tk,Fi,Cj) is less than CA(Tk,Fi,Cj), ADV(Tk,Fi,Cj) indicates no allocation benefit, and the fragment is not allocated to that cluster. To formalize the fragment allocation procedure, we developed the allocation and replication algorithm shown in Table 7.


Table 7: Fragments Allocation and Replication Algorithm

Input: T:List of database transactions; F:List of disjoint fragments (section 3.1); C:List of clusters in the WTDS (section 3.2)

Preparation:

Step 1: Set 1 to k

Step 2: Do steps (3-32) until k >T

Step 3: Set 1 to i

Step 4: Do steps (5-30) until i >F

Step 5: Set False to allocation flag

Step 6: Set 1 to j

Step 7: Do steps (8-26) until j >C

Processing:

{Module 1: Computing costs of allocating and costs of not allocating fragment components}

{Module 2: Computing fragment cost of allocation, cost of not allocation, and allocation decision value}

{Module 3: Fragment allocation where negative decision value is achieved}

Output: The List of fragments that are allocated to each web cluster

End.

Module 1: Computing costs of allocating and costs of not allocating fragment components, steps 8-17

{Step 8:initialize Cost of Remote Update to 0; initialize Cost of Remote Communication to 0; initialize Cost of Remote Retrieval to 0

Step 9: Set 1 to y

Step 10: Do steps (11-17) until y >C

Step 11: If y ≠ j Then Do steps (12-15); Else Go to step (15)

Step 12: Cost of Remote Update(y) = Cost of Remote Update(y -1) + (Cost of Local Update * Average Frequency of Remote Update)

Step 13: Cost of Remote Communication(y) = Cost of Remote Communication(y -1) +

(Average Cost of Remote Communication * Average Frequency of Local Update * Uratio)

Step 14: Cost of Remote Retrieval(y) = Cost of Remote Retrieval(y -1) +

(Cost of Communication between Clusters * Frequency of Remote Retrieval * Rratio)

Step 15: End If

Step 16: Add 1 to y

Step 17: Loop}

Module 2: Compute the fragment's cost of allocation, cost of not allocating, and allocation decision value. In this module the total cost of allocation and the total cost of not allocating each fragment are computed, as well as the allocation decision value that determines whether the fragment is allocated to or cancelled from the cluster, steps 18-26.

{Step 18: Cost of Allocation = Cost of Local Retrieval + Cost of Local Update + Cost of Storage + Cost of Remote Update + Cost

of Remote Communication

Step 19: Cost of Not allocating = Cost of Local Retrieval + Cost of Remote Retrieval

Step 20: Allocation Decision Value= (Cost of Not allocating >= Cost of Allocation)

Step 21: If Allocation Decision Value = True Then Go to step (22); otherwise; Go to step (23)

Step 22: Allocate the fragment to the current cluster ; Set True to allocation flag; Go to step (24)

Step 23: Cancel the fragment from the current cluster

Step 24: End If

Step 25: Add 1 to j

Step 26: Loop}

Module 3: Fragment allocation where a negative decision value is obtained. To maintain data availability, fragments that are not allocated to any cluster according to their allocation decision value are allocated to the cluster with the least communication cost, steps 27-32.

{Step 27: If allocation flag = False Then Allocate the fragment to the cluster that has the least communication cost

Step 28: End If ; Step 29: Add 1 to i

Step 30: Loop

Step 31: Add 1 to k

Step 32: Loop}
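The compact Python rendering below follows the control flow of the algorithm in Table 7. The cost callables stand for equations (8) and (11) and the cluster's average communication cost; all names are ours, and the sketch omits Module 1's incremental remote-cost accumulation and the subsequent per-site allocation pass.

```python
def allocate_fragments(transactions, fragments, clusters,
                       cost_alloc, cost_not_alloc, avg_comm_cost):
    """Sketch of the fragment allocation and replication loop of Table 7.
    cost_alloc(t, f, c)     -> CA(Tk, Fi, Cj), equation (8)
    cost_not_alloc(t, f, c) -> CN(Tk, Fi, Cj), equation (11)
    avg_comm_cost(c)        -> the cluster's average communication cost"""
    allocation = {c: [] for c in clusters}
    for t in transactions:                              # steps 1-2
        for f in fragments:                             # steps 3-4
            allocated = False                           # step 5
            for c in clusters:                          # steps 6-7
                ca = cost_alloc(t, f, c)                # step 18
                cn = cost_not_alloc(t, f, c)            # step 19
                if cn >= ca:                            # steps 20-22: positive decision value
                    if f not in allocation[c]:
                        allocation[c].append(f)
                    allocated = True
                # steps 23-24: otherwise the fragment is cancelled from this cluster
            if not allocated:                           # steps 27-28: keep the data available
                fallback = min(clusters, key=avg_comm_cost)
                if f not in allocation[fallback]:
                    allocation[fallback].append(f)
    return allocation
```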

Once the fragments are allocated and replicated at the web clusters, investigating the benefit of allocating the fragments at the individual sites of each cluster becomes crucial. To get better WTDS performance, fragments should be allocated


over the sites of clusters where a clear allocation benefit is achieved or where data availability is required. The same allocation and replication algorithm is applied in this case, taking into consideration the costs between the sites of each cluster. For example, consider fragment 1 for allocation in clusters 1, 2, and 3, respectively. Based on Equations 2 through 12, which determine the allocation decision value, Tables 8 through 10 depict the calculations of this process.

Table 8: Cluster 1 costs of retrieval, update, and storage

Cluster # Sites # Cost of retrieval Cost of update Cost of storage

1 5 0.0002 0.004 0.000007

1 6 0.0001 0.003 0.000003

1 7 0.0004 0.006 0.000005

1 Average 0.00023 0.0043 0.000005

Table 9: Average # of retrieval, update frequencies, and communication costs

Cluster # Avg. # of Retrievals Avg. # of Remote Retrievals Avg. # of Updates Avg. # of Remote Updates Avg. cost of comm. Avg. cost of remote comm.

1 16700 26500 5300 7400 0.25 0.69

2 11800 31400 4000 8700 0.21 0.57

3 14700 28500 3400 9300 0.20 0.74

Table 10: Computation of fragment allocation decision value

Frag. # Cluster # CA(Tk,Fi,Cj): CLR CLU SCP CRU CRC CN(Tk,Fi,Cj): CLR CRR ADV(Tk,Fi,Cj) Allocation Result

1 (1024 B) 1 3.84 22.79 0.0051 31.82 1499.37 3.84 13292.4 CA(Tk,Fi,Cj) < CN(Tk,Fi,Cj) Allocated

The allocation decision value for fragment 1 will be computed in the same way for the other clusters and for cluster 1 sites

(5, 6, and 7) to determine which cluster/site can allocate the fragment and which one will not.
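As a sanity check of equations (8), (11), and (12) on the figures in Table 10, the small Python snippet below reproduces the allocation decision for fragment 1 at cluster 1; the variable names are ours.

```python
# Sub-costs of fragment 1 at cluster 1, taken from Table 10
clr, clu, scp, cru, crc = 3.84, 22.79, 0.0051, 31.82, 1499.37
crr = 13292.4

ca  = clr + clu + scp + cru + crc    # equation (8):  ~1557.83
cn  = clr + crr                      # equation (11): 13296.24
adv = cn >= ca                       # equation (12): True -> allocate

print(round(ca, 2), round(cn, 2), adv)   # 1557.83 13296.24 True
```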

4.4. Complexity of Computation

The time complexity of our approach is bounded by the time complexity of the following incorporated entities: defining fragments, clustering web sites, allocating fragments, and computing average retrieval and update frequencies. The time complexity of each entity is computed as follows:

The complexity of defining fragments is O(R*(R.size()-1)^2), where R is the number of relations, and R.size() is the average current number of records in each relation.

The complexity of clustering web sites is O((N^2-N) log2 N), where N is the number of sites in the WTDS.

The complexity of fragments allocation is O(R.size()*R*S), where R.size() is the average current number of records in

each relation, R is the number of relations, and S is the number of sites.

The complexity of computing average retrieval and update frequencies is O(F*S*A*A.size()), where F is the number of

the fragments in the database, S is the number of sites, A is the number of applications at each site, and A.size() is the

average current number of records in each application.


Based on the complexity computations of the previous incorporated entities, the time complexity of this approach is bounded by O(R*(R.size()-1)^2 + (N^2-N) log2 N + R.size()*R*S + F*S*A*A.size()).

5. Experimental Results and Performance Evaluation

To analyze the behavior of the proposed computing service techniques, we develop an IFCA software tool that is used to validate the telemedicine database system performance. The tool is not only experimental; it also supports knowledge extraction and decision making. Using this tool, we test the feasibility of the three proposed computing services' techniques. The experimental results and the performance evaluation of our approach are discussed in the following sections.

5.1. Evaluation of Fragmentation Service

The internal validity of our fragmentation service technique is tested on multiple relations of a WTDS. We apply the fragmentation computing service in our IFCA tool to a set of data records obtained from eight different transactions (Table 11). Figure 6 depicts the experimental results: the generated disjoint fragments and their respective records.

Table 11: Transactions and their accessed data records

Transaction # Record # Transaction # Record #

1 1,2,3,7,9 5 18,17

2 11,18,20 6 7,8,9,10,13,14,15,16

3 1 7 13,16

4 1,…,21 8 18,19

Figure 6: Generating Database fragments

Figure 6 shows that our fragmentation computing service achieves the optimal data fragmentation in the WTDS and generates the minimum number of fragments for each relation. This helps in reducing the communications cost and the data record storage, hence improving WTDS performance. We now evaluate the fragmentation performance in

terms of fragments storage size reduction. Let Fperr represent the fragmentation performance percentage of the relation

r, Rdrsr represent the relation data records size generated from the database queries, and Rfsr represent the relation final

fragment data records size, then Fperr is computed as: Fperr = (Rdrsr - Rfsr) / Rdrsr (13)

For example, when fragmentation is performed for relation 1 on data record sets generated from the database queries with a total size of 115 kb, and the final set of fragments has a total size of 33 kb, the storage considered for this relation is reduced from 115 kb to 33 kb with no data redundancy. This improves system performance by reducing the data storage size by about 71%.
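Substituting these figures into equation (13) confirms the reduction:

$Fper_1 = \frac{115\,\mathrm{kb} - 33\,\mathrm{kb}}{115\,\mathrm{kb}} \approx 0.71$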


To externally validate our approach, we implemented the two fragmentation techniques proposed by Ma et al. [8] and Huang et al. [12] and compared their performance with that of our approach. Figure 7 shows the number of fragments against the relation number for the three techniques. The results show that the methods in [8] and [12] produce more, larger fragments due to their fragmentation assumptions (see Section 2), whereas our computing service generates a smaller number of small-size fragments. Therefore, our fragmentation technique significantly outperforms the approaches proposed by Ma et al. [8] and Huang et al. [12].

Figure 7: Database fragmentation performances

Figure 8: Clustering performance comparisons

5.2. Evaluation of Clustering Service

To evaluate the performance accomplished by grouping the telemedicine web sites running under our clustering service

technique, we introduce a mathematical model to calculate the performance gain in terms of the reduced communication

costs that can be saved from clustering web sites. The clustering performance gain is computed as the result of the

reduced costs of communications divided by the sum of communications costs between sites. The reduced

communications costs are defined as the difference between the sum of costs required for each site to communicate with remote sites in the web system and the sum of costs required for each cluster to communicate with remote clusters. The performance evaluation of this mathematical model is expressed as follows. Let CPE denote the clustering performance evaluation, CC(Si,Sj) denote the communication cost between any two sites Si and Sj, and CC(Ci,Cj) denote the communication cost between any two clusters Ci and Cj, where i,j = 1,2,3, …, n. CPE is defined as:

$CPE = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} CC(S_i, S_j) - \sum_{i=1}^{n}\sum_{j=1}^{n} CC(C_i, C_j)}{\sum_{i=1}^{n}\sum_{j=1}^{n} CC(S_i, S_j)}$  (14)



For example, consider a telemedicine database system simulated on 10 web sites grouped in 4 clusters, where each site communicates with the other 9 sites and each cluster communicates with the other 3 clusters, taking into consideration the communications within the cluster itself. For simplicity, we assume the sum of communication costs between sites is 697.5 ms/byte and between clusters is 181.42 ms/byte. By applying CPE (Equation 14), the communications cost is reduced by about 74% of the original communications cost.
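Substituting these figures into equation (14):

$CPE = \frac{697.5 - 181.42}{697.5} \approx 0.74$

i.e., clustering saves roughly 74% of the original inter-site communications cost.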

To externally validate our clustering service, we implement the clustering techniques of Kumar et al. [13] and Franczak et al. [15] and compare them with our clustering service in terms of the number of generated clusters against the number of web sites (Figure 8). The results in Figure 8 show that our clustering service generates more clusters, each containing a smaller number of sites; hence it induces lower communication costs within the clusters. The other techniques generate fewer clusters, each containing a larger number of sites, and thus induce higher communication costs within the clusters. Figure 8 also shows that the number of generated clusters increases with the number of network sites for both [13] and our approach. In contrast, the clustering approach in [15] generates fewer clusters because its approximation function uses a natural logarithm, which in turn maximizes the number of sites in each cluster and increases the communications cost.

5.3. Evaluation of Fragment Allocation Service

To evaluate our fragment allocation computing service, we use the set of disjoint fragments obtained from our fragmentation service, the set of clusters and their sites generated by our clustering service, and the average numbers of retrievals and updates at each site (Table 12-a). We apply the allocation service in our IFCA software tool using the site costs of storage, retrieval, and update depicted in Table 12-b. The results are shown in Figure 9.

Table 12-a: Fragments retrieval and update frequencies Table 12-b: Costs of storage, retrieval, and update


We propose the following mathematical model to evaluate the performance of the fragment allocation and replication service. Let IAFS represent the initial size of the fragments allocated at site S, FAFS represent the final size of the fragments allocated at site S, n represent the maximum number of sites in the current cluster, and APEC represent the fragment allocation and replication performance evaluation at cluster C. APEC is then computed as:

$APEC_C = \frac{\sum_{S=1}^{n} IAF_S - \sum_{S=1}^{n} FAF_S}{\sum_{S=1}^{n} IAF_S}$  (15)

Figure 9: Fragments allocation at clusters

For example, assume that 35 fragment allocations with a total size of 91 kb are initially assigned to the WTDS clusters, and that the final allocation produced by our approach contains 14 fragments with a total size of 27 kb. The storage gain of our allocation technique is then (91 − 27)/91 ≈ 70%, which in turn reduces the average computation time of fragment allocation. Our proposed allocation technique is compared with the allocation methods by Ma et al. [8] and Menon et al. [20]. These two approaches allocate large numbers of fragments to each site and hence consume more storage, which in turn increases the average computation time of fragment allocation in each cluster. Figure 10 illustrates the effectiveness of our fragment allocation and replication technique in terms of average computation time compared to the fragment allocation methods in [8] and [20]. The figure shows that our fragment allocation and replication technique incurs the least average computation time required for fragment allocation. This is because our clustering technique produces more clusters with a small number of web sites each, which reduces the average computation time of fragment allocation in each cluster. We infer from Figure 8 that the average computation time of fragment allocation increases as the number of web sites in the cluster increases. Most importantly, the figure shows that our technique outperforms the ones in [8] and [20].


5.4. Evaluation of Web Servers Load and Network Delay

The telemedicine database system network workload involves queries from a number of database users who access the same database simultaneously. The amount of data required per second and the time in which the queries should be processed depend on the database servers running on such a network. For simplicity, twelve web sites grouped in 4 clusters are considered for the network simulation to evaluate the performance of a telemedicine database system. The performance factors under evaluation are the server load and the network delay, which are simulated in OPNET [24].

Figure 10: Fragments allocation after clustering

Figure 11: The WTDS network topology

The proposed network simulation for building the WTDS is represented by web nodes. Each node is a stack of two 3Com SuperStack II 1100 and two SuperStack II 3300 chassis (3C_SSII_1100_3300) with four slots (4s), 52 auto-sensing Ethernet ports (ae52), 48 Ethernet ports (e48), and 3 Gigabit Ethernet ports (ge3). The center node model is a 3Com switch, the periphery node model is an internet workstation (Sm_Int_wkstn), and the link model is 10BaseT. The center nodes are connected to internet servers (Sm_Int_server) by 10BaseT, and the servers are in turn connected to a router (Cisco 2514, node 15) by 10BaseT. Figure 11 depicts the network topology of the simulated WTDS, which consists of 12 web site nodes (0, 1, 2, 3, 4, 9, 10, 11, 16, 17, 18, 20) and 4 clusters (6, 13, 14, and 22) connected to internet servers (7 and 8) that are also connected to a router (Cisco 2514, node 15) by 10BaseT.

The following subsections evaluate the effect of server load and network delay on the distributed database performance.

5.4.1. Server Load

The server load determines the throughput of the server in bits per second (bits/sec). Figure 12 shows the servers' load comparison between our approach and the approaches of Kumar et al. [13] and Franczak et al. [15]. The servers are denoted by the cluster legends C1, C2, C3, and C4, respectively. The results of the comparison are drawn from Figures 11 and 12 and are summarized in Table 13. Table 13 shows that the server load increases as the number of represented nodes (sites)



decreases; this is due to the load being distributed over the nodes, assuming that all nodes have the same capacity. Conversely, the server load decreases as the number of represented nodes increases. The results indicate that the network is almost balanced throughout the processing of the web services. However, some variations are expected due to possible variations in the number of transactions processed on each node. Figure 13 shows the maximum load (bits/sec) on the servers' clusters for our proposed clustering technique and the clustering methods in [13] and [15]. The results clearly show that our approach outperforms the approaches in [13] and [15].

Figure 12 : WTDS servers’ load comparison

Figure 13: Servers’ Max. load for clustering techniques

Table 13: WTDS Servers’ load comparison

Server # Represented cluster Center node in network topology Represented nodes in network topology Load well below (bits/sec)

1 C1 22 20 1300

2 C2 6 0, 1, 2 , 3 , 4 500

3 C3 13 9, 10, 11 750

4 C4 14 16, 17, 18 600

5.4.2. Network Delay

The network delay is the delay caused by the transaction traffic on the web database system servers. It is defined as the maximum time (in milliseconds) required for the network system to reach the steady state. Figure 14 shows the network delay caused by the WTDS servers represented by the legends (C1, C2, C3, C4). The figure indicates that the web database system reaches the steady state after 0.065 milliseconds. It also shows that the network delay is lower when the web sites are distributed over 4 servers compared to the delay incurred when all web sites connect to 1, 2, or 3 servers. Figure 15 shows the maximum network delay caused by the web servers against the distance range for our clustering technique and the techniques in [13] and [15]. Note that the network delay in our approach is always less than in [13] and [15]; this is due to the better clustering computations of our technique.

5.5. Threat to Validity

Threats to external validity limit the ability to generalize the results of the experiments to industrial practice. To avoid such threats in the evaluation of our approach, we have compared each of our proposed computing services'



techniques (fragmentation, clustering, and data allocation) with two other similar techniques proposed by different groups of researchers. Each of our proposed techniques is implemented alongside two comparable techniques well accepted by the database systems community, and their performance is compared. The results show that our fragmentation, clustering, and allocation techniques outperform their counterparts proposed in the literature.

Figure 14: The WTDS network delay

Figure 15: Max. Net. delay for clustering techniques

6. Conclusion

In this work, we proposed a new approach to promote WTDS performance. Our approach integrates three enhanced

computing services’ techniques namely, database fragmentation, network sites clustering and fragments allocation. We

develop these techniques to solve technical challenges such as distributing data fragments among multiple web servers, handling failures, and making tradeoffs between data availability and consistency. We propose an estimation model to

compute communications cost which helps in finding cost-effective data allocation solutions. The novelty of our approach

lies in the integration of web database sites clustering as a new component of the process of WTDS design in order to

improve performance and satisfy a certain level of quality in web services.

We perform both external and internal evaluation of our integrated approach. In the internal evaluation, we measure the

impact of using our techniques on WTDS and web service performance measures like communications cost, response

time and throughput. In the external evaluation, we compare the performance of our approach to that of other

techniques in the literature. The results show that our integrated approach significantly improves service requirement satisfaction in web systems. This conclusion calls for further investigation and experiments; therefore, as future work we plan to evaluate our approach on larger-scale networks involving a large number of sites over the cloud. We will consider applying different types of clustering and introducing search-based techniques to perform more intelligent data redistribution. Finally, we intend to address the security concerns that need to be handled over data fragments.



References

[1] Jui-chien Hsieh and Meng-Wei Hsu. “A cloud computing based 12-lead ECG telemedicine service,” BMC Medical Informatics and

Decision Making, 2012. pp 12- 77.

[2] Tamhanka, A. and Ram, S. “Database Fragmentation and Allocation: An Integrated Methodology and Case Study,” IEEE

Transactions on Systems, Man. and Cybernetics-Part A. Systems and Humans. 1998, V. 28(3), pp. 288 – 305.

[3] Borzemski, L. “Optimal Partitioning of a Distributed Relational Database for Multistage Decision-Making Support systems,”

Cybernetics and Systems Research. 1996, V. 2(13), pp. 809-814.

[4] Son, J. and Kim, M. “An Adaptable Vertical Partitioning Method in Distributed Systems,” The Journal of Systems and Software,

2004 V. 73(3), pp. 551 – 561.

[5] Lim, S. and Ng, Y. “Vertical Fragmentation and Allocation in Distributed Deductive Database Systems,” The Journal of Information

Systems, 1997, V. 22(1), pp. 1-24.

[6] Agrawal, S.; Narasayya, V. and Yang, B. “Integrating Vertical and Horizontal Partitioning into Automated Physical Database Design,”

ACM SIGMOD 2004, Paris, France, pp. 359-370.

[7] Navathe, S.; Karlapalem, K. and Minyoung, R. “A mixed fragmentation methodology for initial distributed database design,”

Journal of Computer and Software Engineering. 1995, V. 3(4), pp.395-425.

[8] Ma, H.; Schewe, K. and Wang, Q. “Distribution design for higher-order data models,” Data and Knowledge Engineering, 2007, V.

60, pp. 400-434.

[9] Yee, W.; Donahoo, M. and Navathe, S. “A Framework for Server Data Fragment Grouping to Improve Server Scalability,”

Intermittently Synchronized Databases – CIKM 2000.

[10] Jain, A.; Murty, M. and Flynn, P. “Data Clustering: A Review,” ACM Computing Surveys. 1999, V. 31(3),pp. 264-323.

[11] Lepakshi Goud. “Achieving Availability, Elasticity and Reliability of the Data Access in Cloud Computing,” International Journal of

Advanced Engineering Sciences and Technologies, Vol. 5(2), 150 -155.

[12] Huang, Y. and Chen, J. “Fragment Allocation in Distributed Database Design,” Journal of Information Science and Engineering,

2001 V. 17, pp. 491-506.

[13] Kumar, P.; Krishna, P.; Bapi, R. and Kumar, S. “Rough Clustering of Sequential Data,” Data and Knowledge Engineering, 2007, V.63,

pp. 183-199.

[14] Voges, K.; Pope, N. and Brown, M. “Cluster Analysis of Marketing Data Examining Online Shopping Orientation: A comparison of

K-means and Rough Clustering Approaches,” H.A. Abbass, R.A. Sarker, C.S. Newton (Eds.), Heuristics and Optimization for

Knowledge Discovery, Idea Group Publishing, Hershey. 2002, pp. 207 – 224.

[15] Fronczak, A.; Holyst, J.; Jedynak, M. and Sienkiewicz, J. “Higher Order Clustering Coefficients in Barabasi-Albert Networks,” Physica A: Statistical Mechanics and its Applications, 2002, V. 316(1-4), pp. 688-694.

[16] Halkidi, M.; Batistakis, Y. and Vazirgiannis, M. “Clustering algorithms and Validity Measures,” Proceedings of the SSDBM

Conference 2001.

[17] Ishfaq A.; Karlapalem, K. and Kaiso, Y. “Evolutionary Algorithms for Allocating Data in Distributed Database Systems,” Distributed

and Parallel Databases, Kluwer Academic Publishers. 2002, v.11, pp.5-32.

[18] Danilowicz, C. and Nguyen, N. “Consensus Methods for Solving Inconsistency of Replicated Data in Distributed Systems,”

Distributed and Parallel Databases. 2003, V.14, pp. 53-69.

[19] Costa, R. and Lifschitz, S. “Database Allocation Strategies for Parallel BLAST Evaluation on Clusters,” Distributed and Parallel

Databases, 2003, V.13, pp. 99-127.

[20] Menon, S. “Allocating Fragments in Distributed Databases,” IEEE Transactions on Parallel and Distributed Systems, 2005, Vol. 16-7,

pp. 577-585.

[21] Daudpota, N. “Five Steps to Construct a Model of Data Allocation for Distributed Database Systems,” Journal of Intelligent

Information Systems: Integrating Artificial Intelligence and Database Technologies. 1998, V. 11(2), pp.153-68.

[22] Microsoft SQL Server 2012. Available from: <http://www.microsoft.com/sql/editions/express/default.mspx> [Accessed 20th

October, 2012].

[23] MySQL 5.6. Available from: <http://www.mysql.com/> [Accessed 21st October, 2013].

[24] OPNET IT Guru Academic, OPNET Technologies, Inc. 2003. Available from: <http://www.opnet.com/university_program/itguru_academic_edition/> [Accessed 27th August, 2013].

[25] Hauglid, J.; Ryeng, N. and Norvag, K. “DYFRAM: Dynamic Fragmentation and Replica Management in Distributed Database

Systems,” Distributed and Parallel Databases, 2010, V. 28, pp. 157–185.

[26] Wang, Z.; Li, T.; Xiong, N. and Pan, Y. “A Novel Dynamic Network Data Replication Scheme Based on Historical Access Record and

Proactive Deletion,” Journal of Supercomputing – Springer DOI: 10.1007/s11227-011-0708-z. Published online 19 October 2011.


[27] Khan, S. and Ahmad, I. “Replicating Data Objects in Large Distributed Database Systems: An Axiomatic Game Theoretic

Mechanism Design Approach,” Distributed and Parallel Databases, 2010, V. 28(2-3), pp. 187-218.

[28] Khan, S. and Hoque, L. “A New Technique for Database Fragmentation in Distributed Systems,” International Journal of

Computer Applications V. 5(9), 2010, pp. 20-24.

[29] Jagannatha, S.; Mrunalini, M.; Kumar, T. and Kanth, K. “Modeling of Mixed Fragmentation in Distributed Database Using UML 2.0,”

IPCSIT, V. 2 , 2011, pp. 190 – 194.

[30] Morffi, A.; Gonzalez, C.; Lemahieu, W. and Gonzalez, L. “SIADBDD: An Integrated Tool to Design Distributed Databases,” Revista

Facultad de Ingenieria Universidad de Antioquia ISSN (Version impresa): 0120-6230, No. 47 March, 2009, pp. 155-163.

[31] Özsu, M. T. and Valduriez, P. “Principles of Distributed Databases,” 3rd edition, 2011, Springer, ISBN 978-1-4419-8833-1

[32] Mao, G.; Gao, M. and Yao, W. “An Algorithm for Clustering XML Data Stream Using Sliding Window,” The Third International

Conference on Advances in Databases, Knowledge, and Data Applications, 2011, pp. 96-101.

[33] Paixão, M. P.; Silva, L. and Elias G. “Clustering Large-Scale, Distributed Software Component Repositories,” The Fourth

International Conference on Advances in Databases, Knowledge, and Data Applications, 2012, pp. 124-129.

[34] Decandia, G.; Hastorun, D.; Jampani, M.; Kakulapati, G., Lakshman, A.; Pilchin, A.; Sivasubramanian, S.; Vosshall, P. and Vogels, W.

“Dynamo: Amazon’s Highly Available Key-Value Store,” ACM Symposium on Operating Systems Principles, 2007, pp. 205–220.

[35] http://en.wikipedia.org/wiki/MongoDB. Accessed on 12th November, 2013.

[36] Chang, F.; Dean, J.; Ghemawat, S.; Hsieh, W. C.; Wallach, D. A.; Burrows, M.; Chandra, T.; Fikes, A. and Gruber, R. E. “Bigtable: A Distributed Storage System for Structured Data,” Operating Systems Design and Implementation, 2006, pp. 205–218.

Ismail Hababeh holds a PhD degree in Computer Science from Leeds Metropolitan University - U.K.

He also holds a Master's degree in Computer Science from Western Michigan University, USA, and a Bachelor's degree in Computer Science from the University of Jordan. Dr. Hababeh's research areas of particular interest include, but are not limited to, the following: Distributed Databases, Cloud

Computing, Network Security, and Systems Performance.

Issa Khalil received his B.Sc. and M.S. degrees from Jordan University of Science and Technology in

1994 and 1996, and the PhD degree from Purdue University, USA, in 2007, all in Computer Engineering.

Immediately thereafter, he joined the College of Information Technology (CIT) of the United Arab

Emirates University (UAEU) where he was promoted to associate professor in September 2011. In June

2013 Khalil joined Qatar Computing Research Institute (QCRI) as a senior scientist with the cyber

security group. Khalil's research interests span the areas of wireless and wire-line communication

networks. He is especially interested in security, routing, and performance of wireless Sensor, Ad Hoc

and Mesh networks. Khalil’s recent research interests include malware analysis, advanced persistent

threats, and ICS/SCADA security. Dr. Khalil served as the technical program co-chair of the 6th International Conference

on Innovations in Information Technology, and was appointed as a Technical Program Committee member and reviewer

for many international conferences and journals. In June 2011, Khalil was granted the CIT outstanding professor award for

outstanding performance in research, teaching, and service.

Abdallah Khreishah received his Ph.D. and M.S. degrees in Electrical and Computer Engineering from

Purdue University in 2010 and 2006, respectively. Prior to that, he received his B.S. degree with honors

from Jordan University of Science & Technology in 2004. In Fall 2012, he joined the ECE department of

New Jersey Institute of Technology as an Assistant Professor. His research spans the areas of network

coding, wireless networks, cloud computing, and network security.