
IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 16, NO. 4, JULY 2012 745

Optimizing Medical Data Quality Based on Multiagent Web Service Framework

Ching-Seh Wu, Ibrahim Khoury, and Hemant Shah

Abstract—One of the most important issues in e-healthcare information systems is optimizing the quality of medical data extracted from distributed and heterogeneous environments, which can substantially improve diagnostic and treatment decision making. This paper proposes a multiagent web service framework based on service-oriented architecture for the optimization of medical data quality in e-healthcare information systems. Based on the design of this framework, an evolutionary algorithm (EA) for the dynamic optimization of medical data quality is proposed. The framework consists of two main components. First, an EA dynamically optimizes the composition of medical processes into an optimal task sequence according to specific quality attributes. Second, a multiagent framework discovers, monitors, and reports any inconsistency between the optimized task sequence and the actual medical records. To demonstrate the proposed framework, experimental results for a breast cancer case study are provided. Furthermore, to show the performance of our algorithm, a comparison with other works from the literature is presented.

Index Terms—Component composition, e-healthcare, medical information system, multiobjective optimization, web services.

I. INTRODUCTION

IT WAS reported that between 44000 and 98000 deaths occur annually as a consequence of medical errors within American hospitals alone [1], and the U.S. National Association of Boards of Pharmacy reports that as many as 7000 deaths occur in the U.S. each year because of incorrect prescriptions [2]. The World Health Organization (WHO) reported in the article “Medical Error in Top Ten Killers: WHO” that unintended medical errors are a major threat to patient safety [3]. There is therefore a strong need to improve access to new healthcare methods, and delivering healthcare has become a significant challenge. To meet these demands, healthcare systems have increasingly looked to information technology to scale resources, reduce queues, avoid errors, and bring modern treatments to remote communities.

Many medical information systems have been proposed in the literature to assist in managing and advising medical treatments and to prevent medical errors of any type. From the individualized-care point of view, for clinicians to make the best diagnosis and treatment decisions, all relevant health information about the patient needs to be available and transparently accessible to them regardless of where it is stored. Moreover, computer-aided tools are now essential for interpreting patient-specific data in order to determine the most suitable therapy from the diagnosis, but existing systems lack collaborative ability because they employ different design methods [4].

Manuscript received August 24, 2011; revised December 21, 2011; accepted April 5, 2012. Date of publication May 16, 2012; date of current version July 5, 2012.

C. Wu and I. Khoury are with the Department of Computer Science and Engineering, Oakland University, Rochester Hills, MI 48309 USA (e-mail: [email protected]; [email protected]).

H. Shah is with Henry Ford Healthcare System, Rochester Hills, MI 48309 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TITB.2012.2195498

Many researchers have applied service-oriented architecture (SOA) to handle the distributed environment of e-healthcare information systems [5], with the objective of providing better healthcare systems to users. Web service technology is a widely accepted implementation of SOA. Following the definitions and specifications of web services, any organization, company, or even individual developer able to deliver such functional entities can register and publish service components in a Universal Description, Discovery, and Integration (UDDI) registry for public use. Web services can be as simple as a single transaction, e.g., querying a medical record, or as complex as multiservice applications, e.g., business-to-business supply chain management systems [6].

In the dynamic optimization of medical data quality, information about suitable medical data service components needs to be acquired from the many medical data service providers whose components are registered in a UDDI registry repository. The next step is to negotiate with the different providers in order to integrate suitable medical data components.

The optimization of medical data selection is successful when the multiple objectives set by a medical data service requester, such as the reliability of medical data components, diagnosis results, and consultation cycles, are met [7]–[9]. To improve medical data quality, the medical task sequence, or workflow, needs to be optimized; for this purpose, we propose an evolutionary algorithm (EA). EAs have been applied as search algorithms to find optimal solutions to combinatorial problems. They borrow the principle of “survival of the fittest” from the natural environment to generate survivors, i.e., the optimal solutions, for a given problem.

To establish the basis of the evolutionary computing (EC) field, several studies were reviewed. The principles of EC theory are based on Darwin’s theory of natural selection and are used to solve real-world problems [10]. EAs have been successfully applied to optimizing solutions in a variety of domains. The strength of EC techniques comes from the stochastic strategy of their search operators. The major components in EC are search operators acting on a population of chromosomes. EC was developed to solve complex problems that were not easy to solve with existing algorithms.

1089-7771/$31.00 © 2012 IEEE

This paper applies the SOA and web service concepts described previously to put forward a model of multiple intelligent agents (IAs) that assists in improving medical data quality in a distributed e-healthcare information system environment and that is able to optimize the medical task sequence according to data quality aspects. Furthermore, to improve the accuracy of doctors’ diagnoses, many methods for medical diagnostic and treatment advice systems have been developed to assist doctors in decision making, such as rule-based reasoning, fuzzy inference, and neural networks [8], [11], [12].

Intelligent agents are another approach that researchers have taken to provide assistance in different domains, such as business processes, remote education services, and project management [13]–[15]. The objectives of this research are to design and develop medical data quality models and to develop the methodologies and algorithms of our multiagent framework to assist in monitoring and optimizing data quality for e-healthcare information systems.

The rest of this paper is organized as follows. Section II reviews related work. Section III describes the preliminary aspects of our study, focusing on medical data quality in terms of data extraction. In Section IV, the static and dynamic medical data quality models, designed using unified modeling language (UML) notation, are implemented in healthcare IAs that monitor and keep track of the medical data recording and extraction process. In Section V, an EA is proposed to optimize medical data quality in distributed medical data environments. Section VI examines a breast cancer case study and presents experimental results obtained with the EA. To demonstrate that our EA improves performance and accuracy, Section VII compares it with a penalty-based genetic algorithm (GA). Finally, Section VIII concludes the paper.

II. RELATED WORK

Few researchers in the current literature have used web services and SOA technology to improve medical data quality, whereas many have tried to optimize quality of service (QoS) attributes for web service composition. However, none of them has tried to improve medical data quality, and thereby support doctors’ decision making, by optimizing the medical task sequence rather than the service components. In this section, we categorize related work into two groups: SOA-based healthcare systems and QoS-based optimization algorithms.

A. SOA-Based Healthcare Systems

Gong and Chen proposed a healthcare information integration and sharing platform based on SOA. The platform supports the integration, development, and operation of a full spectrum of healthcare applications. It is an information system based on SOA and on the HL7 v3 reference information model. It provides a suite of foundational services to support the development of a wide range of healthcare domain applications, and it can help software developers build new clinical applications and share information between existing systems [16].

Kart et al. described a distributed e-healthcare system that uses SOA as a basis for designing, implementing, deploying, invoking, and managing healthcare services. The system they developed provides support for physicians, nurses, pharmacists, and other healthcare professionals, as well as for patients and medical devices, and can be used to monitor patients [17].

In [18], the authors present a service-oriented approach to healthcare information integration. The integration service management framework discussed in their work provides an effective approach for dealing with the configuration, execution, and management of services developed to address the problem of system integration. The authors discuss an example of integrating a radiology information system and a picture archiving and communications system in a framework prototype to illustrate the feasibility and benefits of the service-oriented approach.

Chu’s work illustrates how clinical objects from an object-oriented clinical information system can be mapped to the SOA structure. A set of clinical object components was designed based on the functional requirements defined in a clinical information system. The clinical objects encapsulate the data management requirements and business rules of the clinical system. Two base clinical object classes, the information services object (ISO) and the data services object (DSO), were defined in this process. All higher level (specialized) objects are constructed from these base classes and inherit their functionality [19].

Omar and Bendiab discuss an experimental scenario for an e-health monitoring system that uses SOA as a model for deploying, discovering, integrating, implementing, managing, and invoking e-health services. The authors argue that such a model could help the healthcare industry develop cost-efficient and dependable healthcare services. In their model, they adopted multiple regression analysis for prediction and implemented a web service based on it [20].

Kart et al. proposed a prototype distributed e-healthcare system that uses SOA to enforce basic software architecture principles and to provide interoperability between the different computing platforms and applications that communicate with each other. The clinic, pharmacy, and patient modules provide the actual services of the system. The devices accessing these modules can be desktop or server computers, personal digital assistants, or smart phones; they can also be electronic medical devices, such as blood pressure monitors. The system provides user-friendly interfaces for busy healthcare professionals and patients, and because security and privacy are particularly important in this area, the prototype was designed with security and privacy in mind [10].

To sum up, some researchers have used SOA and web service technology for healthcare systems; most of them present the healthcare system as distributed services that can be composed into one application, and others use SOA to share medical information between distributed healthcare systems. However, none of them has used SOA to extract data from distributed systems and then optimize the data to find the best medical task sequence, which can substantially improve medical data quality and support medical decision making. This work provides a unique solution for improving medical data quality using a multiagent web service framework.

B. QoS-Based Optimization Algorithm

Ai and Tang proposed QoS-aware web service composition with interservice dependences and conflicts using a penalty-based GA, and evaluated the performance of the algorithm by simulation. The fitness function gives more penalties to chromosomes that violate more constraints, so infeasible chromosomes are less likely to be selected into the new generation [21].
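To make the penalty idea concrete, the following Python sketch scores a chromosome, encoded here as one selected service index per task, by its summed QoS minus a fixed penalty for each violated constraint. The QoS values, the two constraints, and the linear penalty weight are hypothetical; this illustrates the general penalty-based scheme described for [21], not Ai and Tang’s actual fitness function.

```python
from typing import Callable, List, Sequence

Chromosome = Sequence[int]                 # chromosome[t] = index of the service chosen for task t
Constraint = Callable[[Chromosome], bool]  # returns True when the constraint is satisfied


def penalized_fitness(
    chromosome: Chromosome,
    qos_scores: List[List[float]],
    constraints: List[Constraint],
    penalty_weight: float = 1.0,
) -> float:
    """Summed QoS of the chosen services minus a penalty per violated constraint.

    The more constraints a chromosome violates, the lower its fitness, so
    infeasible chromosomes are less likely to be selected for the next generation.
    """
    raw = sum(qos_scores[task][service] for task, service in enumerate(chromosome))
    violations = sum(1 for is_satisfied in constraints if not is_satisfied(chromosome))
    return raw - penalty_weight * violations


# Hypothetical example: two tasks with two candidate services each.
qos = [[0.9, 0.7], [0.6, 0.8]]
constraints: List[Constraint] = [
    # dependence: service 0 for task 0 requires service 0 for task 1
    lambda c: not (c[0] == 0 and c[1] != 0),
    # conflict: service 1 for task 0 cannot be combined with service 1 for task 1
    lambda c: not (c[0] == 1 and c[1] == 1),
]

for candidate in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(candidate, round(penalized_fitness(candidate, qos, constraints), 2))
```

With these numbers, [0, 1] has the highest raw QoS (1.7) but violates the dependence constraint and drops to 0.7, so the feasible [0, 0] (1.5) ranks first.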

Bahadori et al. proposed a GA for web service composition; to increase the performance of the algorithm, they used Tabu search, which guides the search away from local optima. Tabu search has obtained optimal and near-optimal solutions to a wide variety of classical and practical problems in many applications [22].

Tang and Ai proposed a new hybrid GA for the optimal web service selection problem. The hybrid GA has been implemented and evaluated, and the results show that it outperforms penalty- and repair-based GAs when the number of web services and the number of constraints are large. The algorithm also takes into account dependences and conflicts between web services according to given constraints [23].

Canfora and Di Penta proposed a lightweight approach for QoS-aware service composition that uses GAs for optimal QoS estimation. The paper also presents an algorithm for the early triggering of service replanning: the required replanning is triggered as soon as possible during service execution [24].

Zeng et al. proposed a global planning approach to optimally select component services during the execution of a composite service. Service selection is formulated as an optimization problem that can be solved using efficient linear programming methods. Experimental results show that this global planning approach outperforms approaches in which the component services are selected individually for each task of a composite service [25].
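The difference between global and per-task selection shows up even on a tiny instance. The sketch below is hypothetical: it uses brute-force enumeration in place of the linear programming formulation of [25], which is only practical for a handful of tasks, and the quality/cost values and the budget are invented for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical candidates per task: (quality score, cost).
candidates: List[List[Tuple[float, float]]] = [
    [(0.95, 8.0), (0.80, 3.0)],  # task 0
    [(0.90, 7.0), (0.85, 4.0)],  # task 1
]
BUDGET = 11.0  # global constraint on the total cost of the composite service


def total(cands, choice: Sequence[int], field: int) -> float:
    """Sum one field (0 = quality, 1 = cost) over the chosen services."""
    return sum(cands[task][i][field] for task, i in enumerate(choice))


def local_selection(cands) -> List[int]:
    """Pick the highest-quality service for each task independently."""
    return [max(range(len(opts)), key=lambda i: opts[i][0]) for opts in cands]


def global_selection(cands, budget: float) -> List[int]:
    """Enumerate every composition and keep the best one that satisfies the budget."""
    best, best_quality = [], float("-inf")
    for choice in product(*(range(len(opts)) for opts in cands)):
        if total(cands, choice, 1) <= budget and total(cands, choice, 0) > best_quality:
            best, best_quality = list(choice), total(cands, choice, 0)
    return best


local = local_selection(candidates)
best = global_selection(candidates, BUDGET)
print("local:", local, "cost", total(candidates, local, 1))
print("global:", best, "cost", total(candidates, best, 1))
```

Choosing the best service for each task independently yields [0, 0] with a cost of 15, which breaks the budget of 11, whereas the global search returns [1, 0] (quality 1.70, cost 10).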

AnFeng et al. presented a composition model capable of composing web services across wide area networks, integrating interface-based service composition with peer-to-peer technologies and spanning trees. The model forms a novel web service composition overlay network with peer-to-peer technologies and then associates nodes in the same web service composition domain to form the web service composition network, according to a domain ontology and its reasoning ability [26].

Xiangwei and Zhicai present an independent global-constraints-aware web service composition method based on colored Petri nets (CPNs) and GAs. First, a CPN modeling method is proposed that can describe multiattribute, multiconstraint relations and the associations between component services. Second, exploiting the properties of CPNs, a GA is used to search for a legal firing sequence in the CPN model and for the composite service corresponding to that sequence. Restricting the search to legal firing sequences of the Petri net greatly shrinks the GA search space for service composition. Theoretical analysis and experimental results indicate that this method has both a lower computation cost and a higher success ratio of service composition [27].

Lee et al. proposed an advanced SOA model using web services based on an IA platform. By integrating the two technologies, they aim to support interoperability in service-oriented, large-scale domains. Their architecture, termed “AgWebs” (web services based on an intelligent agent platform), contains several components that operate on internal registries to maintain records of all registered services, both agent-platform and web based. These services can then be seamlessly invoked via the AgWebs client, whether an agent service invokes a web service or vice versa [28].

To sum up, most researchers in this field have used EAs for optimal web service composition; however, some of the proposed algorithms need high processing resources, and some are suitable for web service composition but not for optimizing medical data because of the large volume of medical data. In this paper, we propose an EA, based mainly on Pareto dominance, that can optimize medical data such as diagnostic and treatment processes according to multiobjective QoS attributes. Our algorithm can find a set of optimal solutions and alternative solutions based on a distance ranking function.
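The distance ranking function is not specified at this point in the paper. As a minimal sketch, assuming every objective has been normalized to [0, 1] with larger values better (costs negated or inverted beforehand), Pareto dominance and a Euclidean distance-to-ideal ranking could look like this; the objective values are hypothetical.

```python
import math
from typing import List, Sequence

# Objective vectors normalized to [0, 1], oriented so that larger is better (assumed).
Objectives = Sequence[float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """a Pareto-dominates b: no worse in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(solutions: List[Objectives]) -> List[int]:
    """Indices of the solutions that no other solution dominates."""
    return [
        i for i, s in enumerate(solutions)
        if not any(dominates(other, s) for j, other in enumerate(solutions) if j != i)
    ]


def rank_by_distance(solutions: List[Objectives], front: List[int]) -> List[int]:
    """Order front members by Euclidean distance to the ideal point (best value per objective)."""
    ideal = tuple(max(values) for values in zip(*(solutions[i] for i in front)))
    return sorted(front, key=lambda i: math.dist(solutions[i], ideal))


# Hypothetical (reliability, accuracy) scores of four candidate task sequences.
solutions = [(0.90, 0.70), (0.80, 0.85), (0.70, 0.60), (0.85, 0.80)]
front = pareto_front(solutions)
print(front, rank_by_distance(solutions, front))
```

Solution 2 is dominated and discarded; among the three nondominated solutions, solution 3 is closest to the ideal point (0.90, 0.85), so under this assumed ranking it would be the optimal choice and the following ones the alternatives.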

III. PRELIMINARIES OF MEDICAL DATA QUALITY

Data quality has many different aspects. In Table I, these aspects are grouped into two categories of dimensions: measurable and intangible. The main focus of this research is the measurable accuracy dimension of data quality. The accuracy of medical data in this study refers to how faithfully the medical data produced by the data extraction process, during the healthcare governance cycle shown in Fig. 1, represent reality. To obtain an accurate set of medical data for healthcare consultation, this study designs healthcare IAs to monitor and track the data extraction process.

A. Healthcare Governance Cycle

To design IAs that monitor and keep track of medical data processing, the healthcare governance cycle is illustrated in Fig. 1. Within the healthcare consultation, the general practitioner (GP), such as a family doctor, uses a networked healthcare maintenance organization to find relevant healthcare knowledge for the treatment.

The healthcare data from each consultation are stored in a medical database. The medical database records information in a concise format with compressed, detailed clinical coding that includes symptoms, diagnostic results, treatments, prescriptions, and other medical information for the consultation. When a particular piece of medical information is retrieved for a further consultation or reference, the compressed medical data must be extracted into a format understandable to GPs.


TABLE I
ASPECTS OF THE DATA QUALITY

Fig. 1. Healthcare consultation governance cycle.

Feedback on medical data quality is then collected to improve patient care in the next iteration of healthcare consultation.

The major concern for medical data quality arises from the data extraction process. One of the major challenges in the healthcare domain is extracting comprehensible knowledge from medical diagnosis data, and data accuracy and consistency must be maintained during extraction. To ensure that the data extraction process maintains good quality, up-to-date medical information, medical data quality models are created for the subsequent design of healthcare IAs.

B. Modeling the Medical Data Quality Using UML

A saying from software engineering is “If you can model it, you can implement it” [29]. We have designed the class diagram, the activity diagram, the use case diagram, and the sequence diagram in UML to model the static view of medical data quality and the dynamic behavior of the medical data extraction process. By developing these models, we are able to look into the details of the medical data recording/retrieval process as well as the data extraction process. This helps us design the multiple IAs that monitor and track the data recording/retrieval and extraction processes to assist in medical data quality improvement.

Fig. 2. Medical data quality modeling—use case diagram.

A use case diagram is a tool for modeling the features and functions of an information system. The use case diagram in Fig. 2 shows that our system consists of medical data quality features such as data extraction, data migration, data cleaning, data integration, data processing, and data analysis; data analysis involves feedback and quality assessment. This use case diagram is the first step toward defining the behavior of medical data quality in a healthcare information process. In this study, we focus only on the medical data quality of data extraction; the remaining medical data quality issues specified in Fig. 2 are reserved for future study and development.

The class diagram of the medical data quality model in Fig. 3 contains all classes/objects associated with medical data processes and queries. Each class/object in the model is used to generate data quality metrics/attributes that IAs use to track and monitor medical data. When the data extraction process is conducted, the medical information in the classes/objects is collected in the medical knowledge base for inference by the IAs.

The activity diagram in Fig. 4 models the activities and tasks involved in the medical data extraction process. These key activities include using a hospital query language for hospital information systems, the medical data recording process, and a looping process for data updates. The activity diagram helps us design the internal monitoring process of healthcare IAs.

The sequence diagram of the medical data model in Fig. 5 shows the process sequence of medical data extraction and refines the process definition from the activity diagram. A healthcare IA uses the UML models defined above to identify the quality items to be monitored.

Fig. 3. Medical quality modeling—class diagram.

Based on the preceding system models, the next section proposes artificial IAs for discovering, monitoring, and reporting any inconsistencies in the medical records.

IV. OPEN DESIGN OF AN IA PROTOTYPE

Once both the dynamic and static models of medical data quality had been designed, the IA was developed according to these models to monitor and keep track of medical data recording and processing. Our IA design is an open and collaborative infrastructure: every agent has the same structure, which enables the agents to communicate with each other.

An agent, as illustrated in Fig. 6, consists of three service components: a collaboration service, a quality monitor service, and a reporting service.

The collaboration service enables agents to plug and play medical web forms for portable medical records and to plug and play data workflows, medical protocols, and clinical guidelines in a distributed heterogeneous medical information environment; it also provides information exchange among IAs. Two existing tools can be used to implement IAs: the Java Expert System Shell (JESS) and the Java Agent Development Framework (JADE).

The interior implementation of services for an agent was carried out using JESS, and the exterior communication behavior in a distributed e-healthcare environment was carried out using JADE. In general, healthcare IAs have been developed to assist e-healthcare information systems in the following medical data quality improvement activities.

1) Update the medical knowledge base.
2) Define criteria for healthcare data queries.
3) Determine whether a threshold value of data quality has been reached.
4) Optimize data quality for accurate diagnosis.
5) Keep track of patients’ healthcare profiles.
6) Communicate with other agents.

Fig. 4. Medical data quality modeling—activity diagram.

Fig. 5. Medical data quality modeling—sequence diagram.

Fig. 6. Open-module design of an intelligent agent.

Fig. 7. Intelligent agent inference structure.

JESS was used to develop and implement the interior inference process of IAs, as specified in Fig. 7. The IA can take inputs from healthcare experts and transform them into healthcare knowledge for the inference engine to make consultation judgments. The healthcare IA was developed with the features described in Fig. 8. It can take inputs from medical events such as the patient’s medical history, diagnostic knowledge from experts, and medical symptoms. Its interior features include updating medical knowledge, defining criteria for hospital queries, determining whether the data quality threshold value has been reached, keeping track of patients’ medical profiles, and communicating with other agents.

Fig. 8. Healthcare IA.

A. Rule-Based Decision Making in IAs

The IA takes the optimized healthcare task sequence from the optimization algorithm and can detect any healthcare process in the medical data that does not follow this optimized task sequence.

All healthcare task sequences should be consistent with the optimized task sequence. The IA monitors this consistency according to specific rules written in JESS, such as the following example.

In the aforementioned example, three global variables are defined in lines 1, 2, and 3. The optimized task sequence is a list of variables holding the optimized medical task sequence composition generated by the EA.


ALTTaskSequence is the alternative optimized web service composition solution generated by the EA. The task sequence represents a medical task sequence composition extracted from the patient history or from healthcare medical records.

In line 4, the IA first compares the extracted task sequence with the optimized task sequence for the web service composition. If the optimized task sequence is not consistent with the extracted task sequence, then in line 7 the agent compares the extracted task sequence with the alternative optimized task sequence.

In lines 9 and 10, the agent reports an inconsistency if the extracted task sequence follows neither optimized solution provided by the EA.
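As a hedged illustration of the decision logic just described (not the authors’ JESS rule), the same three-way check can be written in Python. The function name, the returned labels, and the breast cancer task names are hypothetical.

```python
from typing import Optional, Sequence

TaskSequence = Sequence[str]


def check_consistency(
    extracted: TaskSequence,
    optimized: TaskSequence,
    alternative: Optional[TaskSequence] = None,
) -> str:
    """Classify a task sequence extracted from medical records against the EA output."""
    # Step 1: compare with the optimized task sequence.
    if list(extracted) == list(optimized):
        return "consistent"
    # Step 2: fall back to the alternative optimized task sequence, if one exists.
    if alternative is not None and list(extracted) == list(alternative):
        return "consistent-with-alternative"
    # Step 3: neither matches, so the agent reports an inconsistency.
    return "inconsistent"


# Hypothetical task names for a breast cancer workflow.
optimized = ["mammogram", "biopsy", "surgery", "chemotherapy"]
alternative = ["mammogram", "biopsy", "chemotherapy", "surgery"]
extracted = ["mammogram", "surgery", "biopsy"]
print(check_consistency(extracted, optimized, alternative))  # inconsistent
```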

In the second rule-based example, the agent calculates the quality attributes of accuracy, completeness, consistency, and timeliness according to predefined metrics, as follows.

Lines 1–8 define the quality attributes. Each QoS attribute can be calculated for a task sequence according to the aggregation equations defined in Section VII. The following JESS code compares every QoS attribute of the optimized task sequence with that of the extracted task sequence.

In lines 9–16, the IA compares the QoS attributes of the optimized task sequence with those of the extracted task sequence; if the extracted task sequence scores higher than the optimized one, the agent sends this information to the GA to recalculate the metrics and reoptimize the composition.
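A minimal Python sketch of this second rule follows. It assumes that the four attributes have already been aggregated into normalized scores where higher is better, and that exceeding the optimized sequence on any single attribute is enough to trigger reoptimization; both are assumptions, since this section does not fix the aggregation equations or the exact JESS condition. The class and function names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class QualityAttributes:
    """Aggregated, normalized scores in [0, 1]; higher is better for every attribute (assumed)."""
    accuracy: float
    completeness: float
    consistency: float
    timeliness: float


def attributes_exceeding(extracted: QualityAttributes, optimized: QualityAttributes) -> List[str]:
    """Names of the attributes on which the extracted sequence beats the optimized one."""
    return [
        f.name for f in fields(QualityAttributes)
        if getattr(extracted, f.name) > getattr(optimized, f.name)
    ]


def needs_reoptimization(extracted: QualityAttributes, optimized: QualityAttributes) -> bool:
    """Assumed trigger: any attribute on which the extracted sequence scores higher."""
    return bool(attributes_exceeding(extracted, optimized))


# Hypothetical aggregated scores.
optimized = QualityAttributes(accuracy=0.92, completeness=0.88, consistency=0.90, timeliness=0.75)
extracted = QualityAttributes(accuracy=0.90, completeness=0.91, consistency=0.85, timeliness=0.70)
print(attributes_exceeding(extracted, optimized), needs_reoptimization(extracted, optimized))
```

Here only completeness is higher in the extracted sequence, which under the assumed trigger is enough to send the case back to the optimizer.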

JADE, on the other hand, is a framework that coordinates and manages communication between multiple agents on different platforms. Its communication architecture offers flexible and efficient messaging: JADE creates and manages a private queue of incoming agent communication language (ACL) messages for each agent.

Basically, each agent is implemented as a single task, but agents often need to execute tasks in parallel. To achieve multitasking, JADE schedules tasks in a lightweight and effective way by placing them into containers and running them as a queue. JADE provides the shell of the agent and guarantees communication between agents, while JESS is the engine of the agent that performs all the necessary reasoning.

Fig. 9. Optimizing medical data quality framework in a distributed medical data environment.
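The following toy Python sketch illustrates the two JADE ideas mentioned above: a private inbox of messages per agent, and cooperative scheduling in which each agent takes the next task from its queue, runs one step, and requeues it until the task reports completion. It is a generic illustration, loosely analogous to JADE’s one-shot and cyclic behaviours; it does not use or mimic the JADE Java API, and the agent names and message content are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict

# A task runs one step for its agent and returns True once it has finished.
Task = Callable[["Agent"], bool]


class Agent:
    """Toy agent with a private message inbox and a queue of cooperative tasks."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.inbox: Deque[dict] = deque()  # private queue of incoming messages
        self.tasks: Deque[Task] = deque()

    def add_task(self, task: Task) -> None:
        self.tasks.append(task)

    def step(self) -> None:
        """Round-robin: run the next task once, then requeue it unless it has finished."""
        if self.tasks:
            task = self.tasks.popleft()
            if not task(self):
                self.tasks.append(task)


class Platform:
    """Delivers messages into receivers' inboxes and gives each agent one turn per round."""

    def __init__(self) -> None:
        self.agents: Dict[str, Agent] = {}

    def register(self, agent: Agent) -> None:
        self.agents[agent.name] = agent

    def send(self, receiver: str, message: dict) -> None:
        self.agents[receiver].inbox.append(message)

    def run(self, rounds: int) -> None:
        for _ in range(rounds):
            for agent in self.agents.values():
                agent.step()


# Hypothetical usage: a monitoring agent reports to a reporting agent.
platform = Platform()
monitor, reporter = Agent("monitor"), Agent("reporter")
platform.register(monitor)
platform.register(reporter)
received = []


def report_once(agent: Agent) -> bool:  # one-shot task: finishes after a single step
    platform.send("reporter", {"sender": agent.name, "content": "inconsistency detected"})
    return True


def read_inbox(agent: Agent) -> bool:  # cyclic task: never finishes
    while agent.inbox:
        received.append(agent.inbox.popleft()["content"])
    return False


monitor.add_task(report_once)
reporter.add_task(read_inbox)
platform.run(rounds=2)
print(received)  # ['inconsistency detected']
```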

Having introduced the modeling design of medical data quality and described the multiagent framework in detail, we now need an optimization algorithm to optimize data quality. This algorithm should be able to minimize some quality attributes, such as cost and time, while maximizing others, such as reliability and accuracy. In the next section, an EA is proposed to optimize medical data quality while meeting users’ QoS requirements.

V. OPTIMIZATION OF MEDICAL DATA QUALITY

One of the most important concerns of this study is medical data selection for quality improvement in a distributed e-healthcare information environment. Satisfying data quality in such an environment rests on analyzing and constructing the medical data service task sequence, automating the composition and optimization of suitable medical data components, and reusing medical data components. To satisfy the data quality criteria, we propose the framework in Fig. 9, which integrates an IA, a medical data repository section, and several modules into the SOA.

EAs have been applied as search algorithms to find optimal medical data in the distributed e-healthcare information environment, as specified in Fig. 9, and to find optimal compositions of web services [30].

Furthermore, the composition problem in the medical workflow is similar to web service selection and composition optimization [31], [32]. “Survival of the fittest” [33] is a principle of the natural environment that the medical data selection algorithm uses to generate survivors, i.e., the optimal data selection in the distributed healthcare environment.

The original principles of EC theory are based on Darwin’s theory of natural selection and are used to solve real-world problems [5]. EAs have been successfully applied to optimizing solutions in a variety of domains [5]. The strength of EC techniques comes from the stochastic strategy of their search operators. The major components in EC are search operators acting on a population of chromosomes. EC was developed to solve complex problems that were not easy to solve with existing algorithms [5], [9].

Fig. 10. Design of the evolution process of medical data selection.

The method the algorithm uses to progress the search from ancestors to offspring is a collective learning process: species information is collected during the evolutionary process, and the offspring that inherit good genes from their parents survive the competition. This is the first characteristic of EAs. Second, the generation of descendants is handled by the search operators, crossover and mutation, which explore variations in species information in order to generate offspring. Crossover operators exchange information between mating partners, whereas a mutation operator, which mutates a single gene with a very small probability, changes the genetic material within an individual. Finally, the third characteristic that defines EAs is the evaluation scheme, which decides which individuals survive. The evaluation scheme is the most diverse of the three characteristics because different domains use different objectives to select solutions. It can be as simple as a binary good-or-bad decision or as complex as a nonlinear assessment that uses multiple mathematical equations to weigh tradeoffs between multiple objectives.
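The three characteristics above (survival of good genes through selection, crossover and mutation, and an evaluation scheme) can be sketched with a minimal binary GA in Python. The operator choices (one-point crossover, bit-flip mutation, binary tournament selection, elitism) and the toy fitness (number of 1 bits) are assumptions for illustration, not the specific operators of the proposed EA.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]
Fitness = Callable[[Chromosome], float]


def one_point_crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Exchange genetic information between two mating partners at a random cut point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def bit_flip_mutation(c: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Flip each gene independently with a small probability."""
    return [1 - g if rng.random() < rate else g for g in c]


def tournament_select(pop: List[Chromosome], fitness: Fitness, rng: random.Random, k: int = 2) -> Chromosome:
    """Evaluation scheme: the fitter of k randomly drawn individuals is selected."""
    return max(rng.sample(pop, k), key=fitness)


def evolve(fitness: Fitness, length: int, pop_size: int = 20,
           generations: int = 30, rate: float = 0.02, seed: int = 0) -> Chromosome:
    """Run a simple generational GA with elitism and return the best chromosome found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=fitness)]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1 = tournament_select(pop, fitness, rng)
            p2 = tournament_select(pop, fitness, rng)
            for child in one_point_crossover(p1, p2, rng):
                children.append(bit_flip_mutation(child, rate, rng))
        pop = children[:pop_size]
    return max(pop, key=fitness)


# Toy fitness: number of 1 bits ("OneMax"), standing in for a quality score.
initial_best = evolve(sum, length=16, generations=0)
final_best = evolve(sum, length=16, generations=30)
print(sum(initial_best), sum(final_best))
```

Because the best individual is always carried over unchanged, the best fitness after 30 generations is never lower than that of the initial population. In the medical setting, `fitness` would instead score a decoded task sequence against the quality attributes.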

The design objective of this study was to develop an EC-based process, incorporated into the current web service transaction procedure (see Fig. 10), to search the solution space for optimal medical data quality. The space was created by collecting information about data service components through UDDI registries for the optimization of medical data web service composition. This type of evolutionary process has also been developed and tested in requirements engineering to search for optimal quality solutions for system specifications [33].

The fundamental design of the EC-based process in this study focused on the definition of the medical data search space, chromosome structure design, objective function definitions, and the quality fitness assessment algorithm. In general, to apply the process to medical data web service composition, the major steps are defined as follows.

1) Collecting the medical data of component registrants: the size of the medical data search space is determined by the number of component registrants collected from available UDDI registries. Therefore, it is very important to obtain information about all available medical data locations/components from component registration agents. Information describing the service components can be collected from a component library, as specified in [34]. The communication protocol is based on a set of application programming interface messages (i.e., UDDI 3.0 and later).

2) Modeling medical data resources from different providers: medical data service components are classified and organized into database tables based on the functionalities and characteristics of the requested medical data service. The workflow of the medical data service can be modeled with the scenario-based method used in previous sections to describe the task steps required to complete medical data web service applications [33].

3) Applying the sequence of medical data composition and chromosome encoding/decoding: the task sequence of medical data to be optimized is defined. A subtask service in a task sequence can be defined as

$$\{\text{Component}_i^{j},\ \text{Subtask}_j\}$$

where $\text{Component}_i^{j}$ is a candidate component $i$ for subtask $j$, and it is assumed that one subtask can be completed by one medical data service component. Using the collected information on medical data component registrants, a web service task sequence is transformed into a binary string, i.e., a quality solution is encoded into a chromosome. The chromosome mapping mechanism uses a hierarchical structure [33] for encoding and decoding between the task sequence and the chromosome.

4) Quality fitness assessment: to evaluate the quality of medical data optimization, multiple parameters or attributes are used in the metrics to evaluate performance and quality. The metric measurement focuses on the different aspects that data quality criteria require, and it is a key element in evaluating the performance and quality of medical data optimization.
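Step 3 can be illustrated with a simple flat binary encoding in which each subtask gets a fixed-width bit field holding the index of its selected component. This Python sketch is a hypothetical simplification, not the hierarchical mapping of [33]; the stage names, component names, and the modulo rule for out-of-range codes are all assumptions.

```python
import math

# Hypothetical candidates per subtask (placeholders, not the contents of Table II).
STAGES = [
    ("Test", ["test_A", "test_B", "MRI"]),
    ("Treatment", ["surgery", "treatment_B"]),
    ("Alternative treatment", ["alt_A", "bioflavonoid", "alt_C", "alt_D"]),
]


def field_width(n_candidates):
    """Bits needed to index n_candidates components (at least one bit)."""
    return max(1, math.ceil(math.log2(n_candidates)))


def encode(choices, stages=STAGES):
    """Encode choices[j], the index of the component chosen for subtask j, as bits."""
    bits = []
    for (_, comps), idx in zip(stages, choices):
        bits.extend(int(b) for b in format(idx, f"0{field_width(len(comps))}b"))
    return bits


def decode(bits, stages=STAGES):
    """Map a chromosome back to (subtask, component) pairs."""
    sequence, pos = [], 0
    for subtask, comps in stages:
        width = field_width(len(comps))
        code = int("".join(str(b) for b in bits[pos:pos + width]), 2)
        # Bit patterns beyond the candidate list wrap around (assumed repair rule).
        sequence.append((subtask, comps[code % len(comps)]))
        pos += width
    return sequence
```

For example, `decode(encode([2, 0, 1]))` yields the sequence MRI, surgery, bioflavonoid. A flat layout keeps crossover simple, at the cost of wasted codes when a subtask's candidate count is not a power of two.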

To demonstrate the effectiveness and efficiency of the process, we used a breast cancer diagnosis and treatment case study. The next section presents the case study and the experimental results for the optimization algorithm.

VI. CASE STUDY

In this section, we present an e-healthcare case study that is used in the remainder of the paper. To demonstrate the efficiency of the EA, the quality of breast cancer medical data is optimized in this case study. The main goal is to find the optimal solutions for diagnostics, treatments, and alternative treatments according to multiobjective medical data quality metrics. The EA was implemented using MATLAB 7.


Fig. 11. Breast cancer web service task sequence.

TABLE II TESTS AND TREATMENT TYPES

Fig. 11 shows the medical data task sequence for breast cancer. After a physical examination in which the patient has been found to have breast cancer, the next step in the sequence is choosing a series of tests that the patient will undergo before treatment is provided [35].

Table II shows the test, treatment, and alternative treatment types for breast cancer. Treatments and alternative treatments are decided by the doctor based on the test results. The optimization process supports this decision making by providing an optimal set of solutions ranked according to QoS attributes, information that can be very helpful in assisting doctors' decisions.

We focus on a set of data quality dimensions provided by IAs, namely, accuracy, consistency, completeness, and timeliness, which are the focus of most authors [36], [37]. They are defined as follows.

1) Accuracy: the data should correspond to reality or to verifiable medical resources.

2) Completeness: all required information has to be represented completely.

3) Consistency: data should be represented without repetition.

4) Timeliness: medical data should be stored and updated constantly.
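The paper does not specify how these dimensions are computed (the experiments below use randomly generated metric values). As a purely hypothetical illustration, the Python sketch below scores a record as follows: completeness as the fraction of required fields present, consistency as the fraction of non-repeated entries, accuracy as the fraction of fields that match a verified reference, and timeliness as the record's age in days, a cost-type value consistent with timeliness being minimized in the optimization. The field names and scoring rules are assumptions.

```python
from datetime import date

# Hypothetical schema; a real system would take this from its data model.
REQUIRED_FIELDS = ("patient_id", "diagnosis", "test", "treatment")


def completeness(record):
    """Fraction of required fields that are present and non-empty."""
    present = sum(1 for f in REQUIRED_FIELDS if record.get(f))
    return present / len(REQUIRED_FIELDS)


def consistency(entries):
    """1.0 when no entry is repeated; lower as duplicates accumulate."""
    return len(set(entries)) / len(entries) if entries else 1.0


def accuracy(record, reference):
    """Fraction of verifiable fields whose value matches a trusted reference."""
    checked = [f for f in REQUIRED_FIELDS if f in reference]
    if not checked:
        return 0.0  # nothing could be verified
    return sum(record.get(f) == reference[f] for f in checked) / len(checked)


def timeliness(last_updated, today):
    """Days since the last update (smaller is better)."""
    return (today - last_updated).days
```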

However, the proposed EA can be used with any type of quality attribute; the algorithm is suitable for multiobjective quality attributes and multidomain environments. The optimization algorithm should maximize accuracy, consistency, and completeness and minimize timeliness. After optimizing the medical data, the optimization algorithm provides the IA with an optimized task sequence. The IA uses the task sequence to monitor the medical data for any inconsistency. If an inconsistency is found, the agent reports the incident to management.

Fig. 12. Fittest task sequence representation in terms of consistency, accuracy, and completeness metrics in 3-D solution space.
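The monitoring step can be pictured as a step-by-step comparison between the optimized task sequence and the tasks actually recorded for a patient. The Python sketch below is a hypothetical simplification (the paper's IAs were built with JESS and JADE); the function names, the positional comparison, and the report format are assumptions.

```python
def find_inconsistencies(optimized, recorded):
    """Return (step, expected, observed) wherever the two sequences differ.

    None marks a missing or unexpected step. Hypothetical sketch only.
    """
    issues = []
    for step in range(max(len(optimized), len(recorded))):
        expected = optimized[step] if step < len(optimized) else None
        observed = recorded[step] if step < len(recorded) else None
        if expected != observed:
            issues.append((step, expected, observed))
    return issues


def report(issues):
    """Format the incidents an agent would pass on to management."""
    return [f"step {s}: expected {e!r}, found {o!r}" for s, e, o in issues]
```

For instance, comparing the optimized sequence MRI, surgery, bioflavonoid against a record containing only MRI and a different treatment flags step 1 as a mismatch and step 2 as missing.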

In the next section, we provide detailed experimental results that show the optimization process. Furthermore, we present a comparison between our algorithm and an optimization algorithm proposed in the literature.

A. Experimental Results

We implemented the proposed algorithm and used experimental medical data for testing. As described previously, a breast cancer case study was used with multiobjective medical data quality attributes: accuracy, completeness, consistency, and timeliness. The algorithm was able to maximize accuracy, completeness, and consistency while minimizing timeliness.

Fig. 12 shows the efficiency of the algorithm in maximizing the three-objective solution space of accuracy, consistency, and completeness. Each point represents a combination of medical tasks that completes the diagnostic and treatment process. The fittest task sequence, which our algorithm reaches after 12 generations, is indicated with an arrow.

Figs. 13 and 14 show how an EA can solve multiobjective problems by using combinations of different quality metrics with different target points. For example, the algorithm was able to maximize accuracy, consistency, and completeness while minimizing timeliness.

The algorithm was able to find an optimal medical task sequence composition that can be used to support doctors' decision making. It returns a fittest (optimal) medical task sequence that meets the healthcare organization's medical data quality requirements.

Fig. 15 shows that the algorithm reached the optimal solution within 12 generations. This convergence indicates the applicability of EC algorithms to optimizing a medical data task sequence according to QoS attributes. The algorithm finds the optimized solution and also gives alternative solutions for testing, treatment, and alternative treatment. The simulation was tested with a set of data quality metrics randomly generated in MATLAB, and the results demonstrate that the metric combinations were optimized by the EC-based process.

Fig. 13. Fittest task sequence representation in terms of timeliness, consistency, and completeness metrics in 3-D solution space.

Fig. 14. Fittest task sequence representation in terms of timeliness, accuracy, and completeness metrics in 3-D solution space.

Fig. 15. Fitness representation of optimal service composition versus generations.

Simulation results of the algorithm for the top three optimized solutions for breast cancer are shown in Table III. The EA finds the optimal solution and also gives alternative solutions for tests, treatments, and alternative treatments; the fitness value, computed between the target and the solution, is used as a ranking value to assist doctors' decisions. According to the simulation results, the optimal solution with the highest fitness is MRI, surgery, and bioflavonoid, as presented in row 1; the alternative solutions are given in rows 2 and 3.

TABLE III OPTIMIZED FITTEST SOLUTIONS

The proposed EA was able to optimize medical data efficiently and effectively. To show that our algorithm, which uses an evolutionary technique and Pareto dominance, can provide better performance and accuracy, the next section compares it with an EA from the literature. The comparison focuses mainly on the fitness function, which is a major component of every EA.
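Pareto dominance, which the algorithm relies on, can be stated compactly: solution a dominates solution b if a is at least as good on every objective and strictly better on at least one. A minimal Python sketch follows, assuming the four data quality dimensions of the case study, with accuracy, consistency, and completeness maximized and timeliness minimized; the function names and sample values are hypothetical.

```python
def dominates(a, b, maximize):
    """True if objective vector a Pareto-dominates b.

    maximize[k] is True for objectives to maximize and False for those to minimize.
    """
    no_worse = all((x >= y) if mx else (x <= y)
                   for x, y, mx in zip(a, b, maximize))
    strictly_better = any((x > y) if mx else (x < y)
                          for x, y, mx in zip(a, b, maximize))
    return no_worse and strictly_better


def pareto_front(solutions, maximize):
    """Solutions not dominated by any other (a solution never dominates itself)."""
    return [s for s in solutions
            if not any(dominates(other, s, maximize) for other in solutions)]


# Objective order: accuracy, consistency, completeness, timeliness.
MAXIMIZE = (True, True, True, False)
```

With the hypothetical vectors (0.9, 0.8, 0.7, 5), (0.8, 0.7, 0.6, 6), and (0.95, 0.6, 0.9, 3), the second is dominated by the first, while the first and third trade off against each other and both remain on the front.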

VII. COMPARISON OF ALGORITHMS

In this section, we compare our EA, which relies on a distance-based fitness (DBF) function, with the penalty-based fitness that many researchers in the literature have used. The comparison aims to show that using the DBF function with Pareto dominance can improve the performance and accuracy of the EA. In this comparison, we used four quality-of-service (QoS) attributes: reliability, availability, reputation, and cost. The metrics are described as follows.

Reliability: Unlike other quality factors, software reliability cannot be measured directly. As specified in [38], software reliability is defined in statistical terms as “the probability of failure-free operation of a computer program in a specified environment for a specified time.” For simplicity, the simple reliability measure defined in [39] was adopted for each service component. The measure is defined as follows:

$$\text{MTBF} = \text{MTTF} + \text{MTTR} \tag{1}$$

where MTBF is the mean time between failures, MTTF is the mean time to failure, and MTTR is the mean time to repair.

Availability: The service availability objective was adopted in this design. The availability of a service request is the product of the availabilities of all service components organized for that request [40]. The equation is defined as follows:

$$A_{\text{ServiceAvailability}} = \prod_{\text{all components } i} A_i \tag{2}$$

Reputation: The service reputation objective was adopted in this design. The reputation is the sum of the reputations of all service components organized for a service request, divided by the number $n$ of those components [41]. The equation is defined as follows:

$$R_{\text{ServiceReputation}} = \frac{1}{n} \sum_{\text{all components } i} R_i \tag{3}$$

Cost: The service cost objective was adopted in this design. In this study, only the sum of the costs of all service components organized for a service request was considered. The summation equation is defined as follows:

$$C_{\text{ServiceCost}} = \sum_{\text{all components } i} C_i \tag{4}$$
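Equations (1)–(4) translate directly into code. The Python sketch below assumes each candidate component is described by a small record holding its MTTF, MTTR, availability, reputation, and cost; the `Component` type and its field names are hypothetical. The paper does not state how per-component reliability is aggregated across a composition, so only the per-component MTBF of (1) is computed.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Component:
    """Hypothetical QoS record for one registered service component."""
    mttf: float          # mean time to failure
    mttr: float          # mean time to repair
    availability: float  # A_i in [0, 1]
    reputation: float    # R_i
    cost: float          # C_i


def mtbf(c: Component) -> float:
    return c.mttf + c.mttr  # eq. (1)


def aggregate(components: list[Component]) -> dict:
    """Aggregate QoS of the components organized for one service request."""
    n = len(components)
    return {
        "availability": prod(c.availability for c in components),  # eq. (2)
        "reputation": sum(c.reputation for c in components) / n,   # eq. (3)
        "cost": sum(c.cost for c in components),                   # eq. (4)
    }
```

Multiplying availabilities assumes that every component must be up for the request to succeed, which is why the composite availability can only fall as components are added.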

To show the efficiency of our Pareto-based algorithm, we compare it with the penalty-based algorithm [21]. The penalty GA creates a random initial population and uses one-point crossover and mutation. The penalty algorithm defines the fitness as follows:

$$\text{Fitness}(x) = \begin{cases} 0.5 + 0.5 \times F_{\text{obj}}(x), & \text{if } V(x) = 0 \\ 0.5 \times F_{\text{obj}}(x) - \dfrac{V(x)}{V_{\max}}, & \text{otherwise.} \end{cases}$$

In our comparison, we do not consider any constraint conflicts; we assume that there are no conflicts between web services, so the case $V(x) = 0$ is used.

For the objective function, they used the following equation:

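$$F_{\text{obj}}(x) = \sum_{i=1}^{n} \left( \frac{Q_i^{\max} - Q_i(x)}{Q_i^{\max} - Q_i^{\min}} \times W_i \right) + \sum_{i=1}^{k} \left( \frac{Q_i(x) - Q_i^{\min}}{Q_i^{\max} - Q_i^{\min}} \times W_i \right)$$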

where the first part of the equation minimizes QoS attributes such as time and cost, and the second part maximizes QoS attributes such as availability, reliability, and reputation.
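The penalty-based fitness can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of [21]: minimized attributes are normalized as $(Q_i^{\max} - Q_i(x))/(Q_i^{\max} - Q_i^{\min})$ and maximized ones as $(Q_i(x) - Q_i^{\min})/(Q_i^{\max} - Q_i^{\min})$, an attribute with $Q_i^{\max} = Q_i^{\min}$ is scored 1 (an assumed convention), the weights are assumed to sum to 1, and the function and parameter names are hypothetical.

```python
def penalty_objective(q, q_min, q_max, weights, minimize):
    """Weighted sum of normalized QoS scores, F_obj(x); higher is better.

    minimize[i] is True for attributes such as time or cost.
    """
    total = 0.0
    for qi, lo, hi, w, is_min in zip(q, q_min, q_max, weights, minimize):
        span = hi - lo
        if span == 0:
            score = 1.0  # assumed convention when all candidates tie
        elif is_min:
            score = (hi - qi) / span
        else:
            score = (qi - lo) / span
        total += score * w
    return total


def penalty_fitness(f_obj, violation=0.0, v_max=1.0):
    """With weights summing to 1, F_obj lies in [0, 1], so feasible solutions
    score at least 0.5 and solutions with a positive violation score below it."""
    if violation == 0:
        return 0.5 + 0.5 * f_obj
    return 0.5 * f_obj - violation / v_max
```

Because the comparison assumes no constraint conflicts, only the `violation == 0` branch is exercised there.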

In our algorithm, a random initial population is created. We use multiple crossover points, with two crossover points for every task in the task sequence, and one-point mutation with a very low probability. The DBF function is defined as follows:

$$\text{Fitness} = \sqrt{(AQ_1 - T_1)^2 + \cdots + (AQ_n - T_n)^2}$$

where $AQ_i$ is the $i$th aggregated QoS attribute of a task sequence, such as cost, reliability, or time, and $T_i$ is the corresponding target specified by the user. The fitness of every chromosome in the solution space is thus the Euclidean distance between its aggregated QoS attributes and the user's target, so a smaller value indicates a closer match. For maximum optimization, the target can be set to 1 for maximized QoS attributes such as reliability and availability and to 0 for minimized QoS attributes such as time and cost.
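A minimal Python sketch of the DBF function, and of ranking candidate task sequences by it, follows. It assumes each candidate's aggregated QoS values are already normalized to [0, 1] so that the 1/0 targets described above are meaningful; the candidate names and values are hypothetical.

```python
import math


def dbf_fitness(aggregated, target):
    """Euclidean distance between aggregated QoS values AQ and targets T."""
    return math.sqrt(sum((aq - t) ** 2 for aq, t in zip(aggregated, target)))


def rank_by_distance(candidates, target, top=3):
    """Names of the `top` candidates closest to the target (best first)."""
    return sorted(candidates,
                  key=lambda name: dbf_fitness(candidates[name], target))[:top]


# Target order: availability, reliability, reputation (maximize -> 1), cost (minimize -> 0).
TARGET = (1.0, 1.0, 1.0, 0.0)
```

Returning the top few candidates rather than only the best one matches the claim that the algorithm supplies ranked alternatives when the optimal solution is unavailable or the QoS requirements change.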

Figs. 16–19 show the comparison between the two algorithms according to availability, cost, reliability, and reputation, respectively. The simulation data show that the DBF function reached the optimal availability solution in 11 generations, whereas the penalty GA needed 15 generations, which means that our algorithm can reach the optimal solution in less time and with fewer computational resources. Furthermore, the experimental data show that both algorithms eventually reached the optimum for all four attributes. However, the DBF evaluates fewer generations, which means fewer operations such as crossover and mutation.

Fig. 16. DBF and penalty GA comparison according to availability.

Fig. 17. DBF and penalty EA comparison according to cost.

Fig. 18. DBF and penalty EA comparison according to reliability.

Both algorithms were able to minimize cost and maximize reliability, reputation, and availability. However, our algorithm was also able to provide a set of alternative solutions as well as near-optimal solutions ranked according to the distance function; these solutions can be used when the optimal solution is not available or the QoS requirements change.

According to these results, our algorithm was able to find a set of optimal solutions with good performance. The quick convergence illustrates the applicability and efficiency of the EA in optimizing a medical data task sequence.


Fig. 19. DBF and Penalty GA comparison according to reputation.

VIII. CONCLUSION

To integrate heterogeneous healthcare information systems and to support decision making in healthcare organizations, this paper presented a multiagent framework based on SOA for healthcare information systems. The study starts by creating static and dynamic models for medical data quality in terms of data extraction, so that the domain of objects and processes is defined. The open design of the healthcare IAs follows the definitions in the medical data quality models. This design enables an IA to provide external communication for collaborative services, internal inference shells for monitoring and tracking the data extraction process, and a report-printing service.

To solve the problem of data selection and quality optimization in a distributed e-healthcare environment, an evolutionary computing algorithm was integrated into the SOA of web services. In the SOA, the healthcare IA also plays a major role as the service agent for medical data registration and data request services. This multiagent framework has been developed using JESS and JADE. The system will be deployed in practice and integrated with the e-healthcare information systems of our local hospitals.

The experimental results show that the proposed EA was able to optimize the medical data workflow on experimental medical data for a breast cancer case study. The results indicate that the algorithm functions accurately and efficiently across multiple domains with multiobjective QoS attributes. Furthermore, a comparison between the proposed fitness function and the penalty fitness function was presented. The results show that the proposed algorithm found the optimal solution in fewer generations, which means better performance.

In this study, we have focused on medical data extraction for SOA-based healthcare systems. As future work, we will expand the system to cover other medical aspects in the proposed UML model, such as data migration, cleaning, and processing. Furthermore, the optimized workflow will be used to discover and compose web services as software components to satisfy medical needs from different providers.

REFERENCES

[1] L. T. Kohn, J. M. Corrigan, and M. S. Donaldson, To Err Is Human: Building a Safer Health System. Washington, DC: National Academies Press, 2000.

[2] T. A. Branigan. (2002, Feb.). “Medication errors and board practice,” Colorado State Board of Pharmacy News, National Association of Boards of Pharmacy. [Online]. Available: www.nabp.net/ftp-files/newsletters/CO/CO022002.pdf

[3] M. Iyer, “Medical Error in Top Ten Killers: WHO,” The Times of India, Apr. 20, 2011.

[4] R. Sokolowski, “Expressing health care object in XML,” in Proc. IEEE 8th Int. Workshops Enabling Technol.: Infrastructure Collaborative Enterprises, Palo Alto, CA, 1999, pp. 341–342.

[5] D. A. Menasce, “Web server software architectures,” IEEE Internet Comput., vol. 7, no. 6, pp. 78–81, Nov./Dec. 2003.

[6] Web Services Organization. (2004). [Online]. Available: http://www.webservices.org

[7] I. Y. Ko and R. Neches, “Composing web services for large-scale tasks,” IEEE Internet Comput., vol. 7, no. 5, pp. 52–59, Sep./Oct. 2003.

[8] Y. Hayashi and R. Setiono, “Combining neural network predictions for medical diagnosis,” Comput. Biol. Med., vol. 32, no. 4, pp. 237–246, 2002.

[9] Business Process Execution Language for Web Services Importer/Exporter Technology. (2004). [Online]. Available: http://www.ibm.com

[10] F. Kart, L. E. Moser, and P. M. Melliar-Smith, “Building a distributed E-healthcare system using SOA,” IT Professional, vol. 10, no. 2, pp. 24–30, Mar./Apr. 2008.

[11] E. Papageorgiou, C. Stylios, and P. Groumpos, “A combined fuzzy cognitive map and decision trees model for medical decision making,” in Proc. 28th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., 2006, pp. 6117–6120.

[12] M. C. Tsai, P. Dev, L. J. Leifer, and K. L. Melmon, “Cooperative medical decision making and learning by the sharing of web-based electronic notebooks and logs,” in Proc. 12th IEEE Symp. Comput.-Based Med. Syst., 1999, pp. 90–95.

[13] C.-S. Wu, W.-C. Chang, and I. K. Sethi, “A metric-based multi-agent system for software project management,” in Proc. IEEE/ACIS 8th Int. Conf. Comput. Inf. Sci., 2009, pp. 3–8.

[14] P. M. Devamalar, V. T. Bai, N. Murali, and S. K. Srivatsa, “Design of real time web centric intelligent health care diagnostic system using object oriented modeling,” in Proc. 2nd Int. Conf. Bioinformat. Biomed. Eng., 2008, pp. 1665–1671.

[15] F. Bellifemine, A. Poggi, and G. Rimassa, “Developing multi-agent systems with a FIPA-compliant agent framework,” Softw.—Pract. Exp., vol. 31, pp. 103–128, 2001.

[16] Y.-G. Gong and X. Chen, “Healthcare information integration and shared platform based on service-oriented architectures,” in Proc. 2nd Int. Conf. Signal Process. Syst., 2010, vol. 2, pp. V2-523–V2-527.

[17] F. Kart, M. Gengxin, L. E. Moser, and P. M. Melliar-Smith, “A distributed e-healthcare system based on the service oriented architecture,” in Proc. IEEE Int. Conf. Service Comput., 2007, pp. 652–659.

[18] W. Wang, M. Wang, and S. Zhu, “Healthcare information system integration: A service oriented approach,” in Proc. Int. Conf. Services Syst. Services Manage., 2005, vol. 2, pp. 1475–1480.

[19] S. C. Chu, “From component-based to service oriented software architecture for healthcare,” in Proc. 7th Int. Workshop Enterprise Netw. Comput. Healthcare Ind., 2005, pp. 96–100.

[20] W. M. Omar and T. A. Bendiab, “E-health support services based on service-oriented architecture,” IT Professional, vol. 8, no. 2, pp. 35–41, Mar./Apr. 2006.

[21] L. Ai and M. Tang, “A penalty-based genetic algorithm for QoS-aware web service composition with inter-service dependencies and conflicts,” in Proc. Int. Conf. Comput. Intell. Modelling Control Autom., 2008, pp. 738–743.

[22] S. Bahadori, S. Kafi, K. Zamani far, and M. R. Khayyambashi, “Optimal web service composition using hybrid GA-tabu search,” J. Theoretical Appl. Inf. Technol., vol. 9, no. 1, pp. 10–15, 2009.

[23] M. Tang and L. Ai, “A hybrid genetic algorithm for the optimal constrained web service selection problem in web service composition,” in Proc. IEEE Congr. Evol. Comput., 2010, pp. 1–8.

[24] G. Canfora and M. Di Penta, “A lightweight approach for QoS-aware service composition,” presented at the 2nd Int. Conf. Service Oriented Comput., New York, 2004.

[25] L. Zeng, B. Benatallah, M. Dumas, J. Kalagnanam, and Q. Z. Sheng, “Quality driven web services composition,” presented at the 12th Int. Conf. World Wide Web, Budapest, Hungary, 2003.

[26] L. AnFeng, C. ZhiGang, He Hui, and G. WeiHua, “Treenet: A web services composition model based on spanning tree,” in Proc. 2nd Int. Conf. Pervasive Comput. Appl., 2007, pp. 618–623.

[27] L. Xiangwei and X. Zhicai, “Independent global constraints web service composition optimization based on color petri net,” in Proc. Int. Conf. Comput. Intell. Natural Comput., Jun. 6–7, 2009, vol. 2, pp. 217–220.

[28] S.-H. Lee, K.-H. Choi, H.-J. Shin, and D.-R. Shin, “Ag Webs: Web services based on intelligent agent platform,” in Proc. 9th Int. Conf. Adv. Commun. Technol., 2007, vol. 1, pp. 353–356.

[29] A. Perkins, “Business rules = meta-data,” in Proc. Int. Conf. Technol. Object-Oriented Language Syst., 2000, pp. 285–294.

[30] F. Lecue and N. Mehandjiev, “Seeking quality of web service composition in a semantic dimension,” IEEE Trans. Knowl. Data Eng., vol. 23, no. 6, pp. 942–959, Jun. 2011.

[31] H. Tong, J. Cao, S. Zhang, and M. Li, “A distributed algorithm for web service composition based on service agent model,” IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 12, pp. 2008–2021, Dec. 2011.

[32] J. El Haddad, M. Manouvrier, and M. Rukoz, “TQoS: Transactional and QoS-aware selection algorithm for automatic web service composition,” IEEE Trans. Services Comput., vol. 3, no. 1, pp. 73–85, Jan./Mar. 2010.

[33] W. C. Chang, “Optimising system requirements with evolutionary algorithms,” Dept. Computation, The University of Manchester Institute of Science and Technology, Manchester, U.K., 2004.

[34] J. Yang, “Web service componentization,” Commun. ACM, vol. 46, pp. 35–40, 2003.

[35] National Cancer Institute. (2011). [Online]. Available: http://www.cancer.gov/cancertopics/treatment/breast

[36] C. Batini, C. Cappiello, C. Francalanci, and A. Maurino, “Methodologies for data quality assessment and improvement,” ACM Comput. Surveys, vol. 41, pp. 1–52, 2009.

[37] L. L. Pipino, Y. W. Lee, and R. Y. Wang, “Data quality assessment,” Commun. ACM, vol. 45, pp. 211–218, 2002.

[38] R. S. Pressman, Software Engineering: A Practitioner’s Approach. New York: McGraw-Hill, 2004.

[39] J. D. Musa, A. Iannino, and K. Okumoto, Engineering and Managing Software With Reliability Measures. New York: McGraw-Hill, 1987.

[40] X. Mei, F. Zheng, A. Jiang, and S. Li, “QoS aggregation evaluation of web services composition with transaction,” in Proc. Int. Conf. Inf. Technol. Comput. Sci., 2009, vol. 2, pp. 151–155.

[41] M. Li, T. Deng, H. Sun, H. Guo, and X. Liu, “GOS: A global optimal selection approach for QoS-aware web services composition,” in Proc. 5th IEEE Int. Symp. Service Oriented Syst. Eng., 2010, pp. 7–14.

Ching-Seh Wu received the M.S. degree from the U.S. Air Force Institute of Technology, Dayton, OH, in 1993, and the Ph.D. degree from Texas A&M University, College Station, in 2000, both in computer science.

In 2010, he joined the Department of Computer Science and Engineering, Oakland University, Rochester Hills, MI. His past 20 years of experience in software development and his current research interests have focused on both theoretical and practical issues of software engineering, web services, and cloud computing. He has published more than 40 peer-reviewed papers and developed software systems such as an e-healthcare information system, an airborne early warning and control system for national defense, an air traffic management and control system, enterprise resource planning, and smartphone applications.

Ibrahim Khoury received the B.S. degree in business information systems from The University of Jordan, Amman, Jordan, in 2007, and the M.S. degree in information, network and computer security from the New York Institute of Technology, New York, in 2009. He is currently working toward the Ph.D. degree in computer science at Oakland University, Rochester Hills, MI.

His research interests include software engineering, web services, e-healthcare systems, cloud computing, and software testing.

Hemant Shah received the graduate degree in medicine (M.B.B.S.) with specialization (Master of Surgery) in obstetrics and gynecology from Jabalpur University, Jabalpur, India.

He did his Biomedical Informatics Fellowship at the National Library of Medicine, Bethesda, MD. He is currently a Senior Informatics Researcher at Henry Ford Health System, Rochester Hills, MI. He is the author of the Proteus model for intelligent clinical processes. His main research interests include systems for knowledge engineering by domain experts for clinical workflows and clinical decision support, semantics in medical data, and service-oriented architecture for healthcare systems.
