Journal of Visual Languages and Computing · 2018-05-13 · 2. Data mining and data warehousing for EBM In the context of EBM, data mining and data ware-housing provide tools to acquire


Journal of Visual Languages and Computing 25 (2014) 858–867


Short Paper

A decision support system for Evidence Based Medicine

Giuseppe Polese

Department of Management & Information Technology, University of Salerno, Italy

Article info

Article history:
Received 25 September 2014
Accepted 30 September 2014
Available online 19 October 2014

Keywords:
EBM
Decision Support Systems
Data mining
Data warehousing

http://dx.doi.org/10.1016/j.jvlc.2014.09.013
1045-926X/© 2014 Elsevier Ltd. All rights reserved.

☆ This paper has been recommended for acceptance by Shi Kho Chang.
E-mail address: [email protected]

Abstract

We present a decision support system to let medical doctors analyze important clinical data, like patients' medical history, diagnosis, or therapy, in order to detect common patterns of knowledge useful in the diagnosis process. The underlying approach mainly exploits case-based reasoning (CBR), which is useful to extract knowledge from previously experienced cases. In particular, we used sequence data mining to detect common patterns in patients' histories and to highlight the effects of medical practices, based on evidence.

We also exploited data warehousing techniques, such as OLAP queries to let medical doctors analyze diagnoses along several measures, and recent visual data integration approaches and tools to effectively support the complex task of integrating and reconciling data from different medical data sources. In addition, due to the massive presence of textual information within the clinical records of many hospitals, text mining techniques have been devised. In particular, we performed lexical analysis of free text in order to extract discriminatory terms and to derive encoded information. Finally, the system provides user-friendly mechanisms to manage the protection of confidential medical data.

System validation has been performed, mainly focusing on usability issues, by running experiments based on a large database from a primary public hospital.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Evidence based medicine (EBM) is the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients [28]. The goal of evidence-based medicine is to complement the existing clinical decision making processes with the best available research evidence and patient values. Furthermore, EBM provides a direction and rationale for clinicians to manage their patients, and a means to integrate clinical expertise with the best available research evidence. It has been mainly used in the evaluation of clinical therapy effectiveness, especially due to the success of Randomized Controlled Trials (RCTs).


An effective application of EBM requires five important steps: (i) problem definition, (ii) identification of the best available evidence, (iii) critical evaluation, (iv) integration of the identified evidence with patient values, (v) evaluation of the whole process. However, EBM has several limitations. First, it requires performing large-scale trials, which require high expenses, prodigious labor, and sophisticated infrastructures, so they cannot be performed frequently [1]. Secondly, prediction of the outcome is required at the time of planning, and optimization of all the possible factors is not easy. Thirdly, a combinatorial explosion problem may be encountered when all drugs are used in a given clinical situation. Fourthly, the different backgrounds of patients can make results controversial, even when almost identical large-scale trials are carried out [1,2]. Lastly and importantly, large-scale trials cannot produce or even predict new treatments.

To tackle the complexity of EBM processes, decision support systems (DSS) are being increasingly used. In this paper we present a decision support system for EBM, which exploits sequence data mining, and advanced data integration and data warehousing techniques [16,32,34], together with text mining techniques to process textual information within legacy clinical records [6,29,30]. Moreover, the system embeds modules enabling the visual specification of data privacy policies [15], since in this context it is vital to protect confidential medical data. The proposed system can be used to support decisions about the candidate therapies that can be applied to a new patient, by providing medical doctors with information regarding the treatments of past patients, based on the analysis of a large number of patient records.

The system is the final output of a publicly funded research project developed in cooperation between University of Salerno and Healthware, a company specialized in the production of health care information systems. In particular, the system has been integrated within an existing health care information system produced by Healthware, namely the Healthware NetCare, which was conceived to support medical staff in the achievement of day-to-day medical operations. Finally, we present a user study to validate the usability of the current system prototype.

The paper is organized as follows. In Section 2 we introduce some principles of data mining and data warehousing in the context of EBM, whereas related works are presented in Section 3. In Section 4 we discuss the approaches underlying the proposed system. The latter is described in Section 5. A user study is presented in Section 6. Finally, conclusions are given in Section 7.

2. Data mining and data warehousing for EBM

In the context of EBM, data mining and data warehousing provide tools to acquire medical data, to extract relevant information from them, and to make this knowledge available to all the people involved in health care.

Decision support systems for EBM can vary in their scope. The simplest systems are fed by data concerning diseases and best practice guidelines to support care delivery. In addition, more sophisticated systems include various clinical internal and external data sources to support further decision making in the area of business management, staff management, and so forth. Relevant data sources in clinical decision support systems for evidence-based medicine purposes are:

• Evidence-based guidelines (in the form of rules).
• Clinical data (patient data, pharmaceutical data, medical treatments, length of stay).
• Administrative data (staff skills, overtime, nursing care hours, staff sick leave).
• Financial data (treatment costs, drug costs, staff salaries, accounting, cost-effectiveness studies).
• Organizational data (room occupation, facilities, equipment).

One of the most relevant applications used to extract knowledge from medical data sources is data mining.

In particular, one of the most frequently encountered instances of data mining in the context of EBM is data clustering. The latter involves grouping the data into classes or clusters so that objects within a cluster have high similarity, while objects from different clusters are dissimilar.

The data mining approach underlying the proposed system is based on Sequence Clustering, which enabled us to detect interesting and hidden knowledge on clinical activities.

The most relevant application fields for data warehousing in the area of evidence-based medicine are:

1. The generation process of the evidence-based guidelines.
2. The clinicians at the point of care delivery, by making evidence-based rules available.
3. The monitoring of clinical treatment pathways.
4. The administrative and management tasks, by providing evidence-based knowledge, as well as diverse organizational and financial data.

The medical data warehouse we have built contains data from patients' medical records as well as evidence-based guidelines. These data are prepared and made available for querying and analysis at will.

Often, clinical management is interested in finding out which treatments and medications led to more rapid and cheaper patient convalescence. Data mining and OLAP analytical functions support business decision makers in creating the most effective business strategies satisfying both patient expectations and financial potential.

2.1. Visual data mining

Visual data mining exploits data visualization techniques to help humans in the identification of possible patterns and structures in complex data. It can be seen as a hypothesis-generating process: the user generates a hypothesis about relationships and patterns in the data [10,18].

Visual data mining has several advantages over automatic data mining methods. It leads to faster results with a higher degree of human confidence in the findings, because it is intuitive and requires less understanding of complex mathematical and computational background than automatic data mining. This makes visual data mining suitable for decision support in EBM, since it increases the participation of the medical doctor in the decision process. Moreover, visual data mining is effective when little is known about the data and the exploration goals are vague, since these can be adjusted during the exploration process. It can provide a qualitative overview of the data and it can allow unexpectedly detected phenomena to be pointed out and explored using further quantitative analysis [19].

The visual data mining process starts by forming the criteria about the visualizations to choose and the attributes to display. These criteria are formulated according to the exploration task. The user recognizes patterns in open visualizations and selects a subset of items s/he is interested in. The result of this selection is a restriction of the search space, which may show new patterns to the end user. The whole process can then be repeated on the selected subset of data items. Alternatively, new visualizations can be added. The process continues until the user is satisfied with the result, which represents a solution to her/his initial problem. The user has full control over the exploration process, by interacting with the visualizations.

Visual data mining has been used in a number of disciplines. Some examples include detecting telephone call frauds by a combination of directed graph drawings and barplots [8], classifications based on parallel coordinate plots [17], and temporal medical data analysis by means of 3D parallel histograms [7].

2.2. Combining automatic and visual data mining

The efficient extraction of hidden knowledge requires skilled application of complex algorithms and visualization tools, which must be applied in an intelligent and thoughtful manner, based on intermediate results and background knowledge. The whole Knowledge Discovery in Databases (KDD) process is therefore difficult to automate, as it requires high-level intelligence. By merging automatic and visual mining, the flexibility, creativity, and knowledge of a human are combined with the storage capacity and computational power of the computer. A combination of both automatic and visual mining in one system permits a faster and more effective KDD process [3,18,20–23,25,26].

3. Related work

In this section we briefly describe approaches and systems supporting evidence based medical decisions, some of which exploit one or more of the techniques addressed in our proposal.

In [1] Abidi et al. describe an Integrated Clinical Evidence System designed to augment the typical literature based clinical evidence with additional technology-mediated clinical evidence. They propose a technology-enriched strategy to exploit advanced knowledge management, data mining, case based reasoning, and internet technologies within traditional evidence based medicine systems, in order to derive all clinical evidences with heterogeneous modalities.

The four steps in incorporating the best available research evidence in decision making are presented in the research project described in [9]. The authors formulate the following steps: asking answerable questions; accessing the best information; appraising the information for validity and relevance; and applying the information to patient care. Furthermore, they state that applying evidence-based medicine to individual patients requires drawing a balance sheet of benefits and harms based on research and individual patient data.

Table 1. Selected features.

Patient id                     Patient sex
Pathological medical history   Physiological medical history
Family medical history         Risk factors
Clinical problems              Medical activities

In [33] Wu et al. show growing evidence indicating that the integration of clinical decision support into the computer-based patient record can reduce medical errors, enhance patient safety, decrease unwanted practice variations, and improve patient outcomes.

Clinical Pathways are the subject of research by the DRG Research Group of Roeder et al. at the Universitätsklinikum Münster [13]. They investigated eight different international DRG systems on the basis of data from cardiac surgery and concluded that the Australian AR-DRG system excellently matches levels of complexity. Thus, it provides a good basis for the German R-DRG system, which will serve for the reimbursement of all in-patient cases, according to the German Ministry of Health.

Stolba et al. [31] propose a federated data warehouse approach for evidence-based medicine, in order to achieve better data security. Depersonalization and pseudonymization are used to ensure data privacy for sensitive patient data.

Data mining in the area of clinical pathways is the subject of research performed by Lin et al. [24]. They propose a data mining technique to discover the time dependency pattern of clinical pathways for treating brain stroke. The aim of their research is to discover patterns of process execution sequences and to identify the dependency relations between activities in a majority of cases.

With respect to these proposals, ours tries to provide a more general solution, by combining data mining, data warehousing, and techniques for extracting knowledge from textual data, which still abound in many legacy medical records.

4. The underlying approaches

The main approach underlying the proposed system falls in the category of Knowledge Discovery in Databases (KDD) [14]. The detailed KDD process and the other approaches underlying the proposed system are described in the following subsections.

4.1. Data selection

In the data selection phase we have selected significant data from a large database of an infectious diseases department of a big public hospital, which contained data collected over a period of about 10 years. This has led to the selection of the features described in Table 1.

4.2. Data pre-processing

Table 1 (continued). Selected features.

Patient nationality               …
Epidemiological medical history   …
Diagnosis                         …
Pharmacological Therapy

Fig. 1. The classification process.

We have accomplished an extensive data pre-processing phase. This phase served to identify and solve problems like data duplication, inconsistencies between logically associated values, missing data, unexpected use of one or more fields, and inconsistent values possibly due to different conventions and abbreviations, or to data entry errors. During this phase, it was necessary to build a data dictionary to correct writing errors and to provide consistent synonyms or abbreviations.
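The data dictionary used in this phase can be pictured as a lookup from observed spellings to a canonical term. The following Python sketch is purely illustrative: the dictionary entries and the names `DATA_DICTIONARY`, `normalize_value`, and `deduplicate` are assumptions made for the example and are not taken from the system.

```python
# Hypothetical data dictionary: observed variant -> canonical term.
DATA_DICTIONARY = {
    "hiv+": "hiv positive",
    "hiv pos.": "hiv positive",
    "hcv": "hepatitis c",
    "epatitis c": "hepatitis c",  # example of a data entry error
}


def normalize_value(raw):
    """Trim, lowercase, and collapse whitespace, then map known variants."""
    cleaned = " ".join(raw.strip().lower().split())
    return DATA_DICTIONARY.get(cleaned, cleaned)


def deduplicate(values):
    """Normalize values and drop duplicates, keeping first-seen order."""
    seen = []
    for value in values:
        normalized = normalize_value(value)
        if normalized not in seen:
            seen.append(normalized)
    return seen
```

Under these assumptions, `deduplicate(["HCV", "Hepatitis  C", "HIV+"])` collapses the first two spellings into a single canonical entry.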

A problem that required a particular effort was the presence of free text fields containing significant information. In fact, data stored in the source database were inconsistent, often redundant, and uncoded. This problem was solved by combining several approaches, including natural language processing and data classification approaches. In particular, the system has been equipped with a basic module to process simple free text, and an advanced module based on conceptual dependency [29,30]. The basic module performs lexical analysis of free text in order to extract the most discriminatory terms and to derive encoded information. We have first used it to identify the discriminating terms in the free text. Then, we have built a stop-word list, consisting of all the frequently used and non-discriminatory terms in the text (e.g. articles, conjunctions, or prepositions). Next, each term matching a stop-word has been deleted in order to maintain only discriminating terms. The advantages gained by using a stop-word list were basically two: reduced space required for storing information, and overall improvement of data quality. For this phase we did not need to use the module based on conceptual dependency. The latter will be described at the end of this section.
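The stop-word filtering performed by the basic module can be sketched as follows. This is a minimal illustration rather than the module itself: the stop-word list is a small hypothetical English sample, and the function names are assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be built from the corpus.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "with", "in", "to", "for"}


def discriminating_terms(text):
    """Tokenize free text on letters and digits, then drop stop-words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def term_frequencies(notes):
    """Count the surviving terms over a collection of free-text notes."""
    counts = Counter()
    for note in notes:
        counts.update(discriminating_terms(note))
    return counts
```

Term counts of this kind are one simple way to see which terms remain after filtering; how the system actually ranks discriminatory terms is not specified here.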

As for data classification, the analysis of the source database showed the presence of data quality problems, such as null values of attributes representing standard coding of descriptive attributes (e.g. the association between a diagnosis and its ICD-IX code). Data Mining and Text Mining techniques were applied in order to build a classification system capable of detecting the code associated with a description when missing. A sketch of the classification process is depicted in Fig. 1.
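The idea of recovering a missing code from its textual description can be illustrated with a very simple nearest-match classifier. The text does not state which classifier the system uses, so the token-overlap (Jaccard) similarity below is a stand-in chosen for brevity; the codes `CODE-A`, `CODE-B`, `CODE-C` and the threshold are placeholders, not real coding values.

```python
def tokens(description):
    """Split a description into a set of lowercase word tokens."""
    return set(description.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_code(description, labelled, min_similarity=0.3):
    """Return the code of the most similar labelled description, or None."""
    query = tokens(description)
    best_code, best_score = None, 0.0
    for known_description, code in labelled:
        score = jaccard(query, tokens(known_description))
        if score > best_score:
            best_code, best_score = code, score
    return best_code if best_score >= min_similarity else None


# Hypothetical training pairs (description, placeholder code).
LABELLED = [
    ("bacterial pneumonia", "CODE-A"),
    ("viral hepatitis", "CODE-B"),
    ("sepsis", "CODE-C"),
]
```

Returning `None` below the threshold keeps the record flagged as uncoded instead of forcing a poor match.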

4.3. Data transformation

Some data also needed to be transformed to suit our data mining model. The result of the transformation phase is a data set to be subsequently processed through a sequence clustering algorithm. To this end, we needed to construct a table of clinical cases, but not all of them could be described through a single row of data. For example, a clinical case might need two tables to be represented, one containing patient information, and another containing patient diagnoses, with a one-to-many relationship, which is represented through a nested table. The data in a nested table can be used both for training the data mining model and for predicting therapeutic activities. For example, in the constructed model we considered two columns corresponding to two nested tables: one, corresponding to the Diagnosis nested table, contains a list of patient diagnoses, whereas the other, corresponding to the Therapy nested table, contains information on patient therapies. In this scenario, we have used the patient's diagnosis and therapy information as input of the prediction process, in order to foresee a suitable therapy based on the clinical history of the patient and of similar patients.
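The nested-table representation can be sketched in Python as one record per patient holding two lists, one for each one-to-many relationship. The class and field names below are illustrative assumptions, not the schema used in the system.

```python
from dataclasses import dataclass, field


@dataclass
class ClinicalCase:
    patient_id: str
    sex: str
    diagnoses: list = field(default_factory=list)  # nested table: one entry per diagnosis
    therapies: list = field(default_factory=list)  # nested table: one entry per therapy


def build_cases(patients, diagnosis_rows, therapy_rows):
    """Fold flat one-to-many rows into a single case per patient."""
    cases = {pid: ClinicalCase(pid, sex) for pid, sex in patients}
    for pid, diagnosis in diagnosis_rows:
        cases[pid].diagnoses.append(diagnosis)
    for pid, therapy in therapy_rows:
        cases[pid].therapies.append(therapy)
    return cases
```

Each `ClinicalCase` then plays the role of a single row of the case table, with the diagnosis and therapy lists standing in for the nested tables.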

4.4. The data mining strategy

The data mining strategy we have used is based on a sequence clustering algorithm. It finds the most common sequences by grouping those that are similar or identical. In particular, it finds clusters of clinical cases containing similar paths in a sequence, instead of finding clusters of clinical cases containing similar attributes.
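To make the notion of grouping similar sequences concrete, the sketch below clusters sequences of clinical activities by edit distance. It is not the sequence clustering algorithm used in the system, which is not detailed here; it is a deliberately simple greedy scheme, and the similarity threshold and function names are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of events."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1 means identical sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cluster_sequences(sequences, threshold=0.6):
    """Greedy clustering: join the first cluster whose representative
    (its first member) is similar enough, otherwise start a new cluster."""
    clusters = []
    for seq in sequences:
        for cluster in clusters:
            if similarity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters
```

The point of the example is that cases are compared by the order of their events (their path), not by a bag of attributes.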

The application of the sequence clustering algorithm required the design of a data set focused on the concept of Patient and information related to him/her. Furthermore, it has been necessary to define roles of data to determine how the algorithm works and what it predicts. In order to design the data mining model it was also necessary to undergo a conceptual and a logical design phase, so as to represent an abstraction of the phenomenon to be analyzed. The design model focuses on the concept of Patient and on the data sets capturing all important details on him/her. Table 2 shows the schema of the data set on which we have applied our sequence clustering algorithm.

4.5. The theory of conceptual dependency

The theory of conceptual dependency (CD theory, for short) is a pictorial formalism developed by Roger Schank in the 1970s [29] for representing complex events by elementary ones. Introduced for natural language understanding purposes, it has subsequently also been used for visual language understanding [6].


A CD representation of an event (also called conceptualization) is composed of objects belonging to four classes linked together by rules. The classes of CD objects are ACT (actions), PP (Picture Producers), AA (Action Aiders), and PA (Picture Aiders). The class ACT contains eleven elementary actions, like, for instance, PTRANS (Physical TRANSfer), indicating a position transfer of an object, or GRASP, representing the act of grasping an object by an actor. The class PP contains humans, animals, or objects. Classes AA and PA specify more precisely the semantics of the objects involved in the conceptualization. Examples of rules are PP-ACT or PP⟨� ⟩PA. The first rule states that PP is the agent of the ACT, whereas the second one states that PP is in the state PA.

Reasoning tasks on CD representations have been implemented as LISP programs.
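Although the reasoning tasks in the system are implemented in LISP, the structure of a conceptualization can be sketched in Python for illustration. The classes below are a simplified assumption: only the two ACTs named in the text are listed, the state attribute stands in for a PA, and Action Aiders are omitted.

```python
from dataclasses import dataclass
from typing import Optional

# Two of the elementary actions named in the text; the full ACT class has eleven.
KNOWN_ACTS = {"PTRANS", "GRASP"}


@dataclass(frozen=True)
class PictureProducer:
    """PP: a human, animal, or object; `state` plays the role of a PA."""
    name: str
    state: Optional[str] = None


@dataclass(frozen=True)
class Conceptualization:
    """An event: a PP acting as agent of an ACT, optionally on another PP."""
    actor: PictureProducer
    act: str
    obj: Optional[PictureProducer] = None

    def __post_init__(self):
        if self.act not in KNOWN_ACTS:
            raise ValueError(f"unknown ACT: {self.act}")


def describe(c):
    """Render a conceptualization as 'actor ACT object'."""
    target = f" {c.obj.name}" if c.obj else ""
    return f"{c.actor.name} {c.act}{target}"
```

For instance, a nurse moving a patient would be a `Conceptualization` whose actor is the nurse, whose ACT is `PTRANS`, and whose object is the patient.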

Table 2. Dataset schema.

Patient | Medical history | Risk factors | Diagnosis | Clinical problems | Medical activities | Pharmacologic Therapy

Fig. 2. System architecture.

4.6. The theory of scripts

Scripts were introduced by Schank and Abelson [30] to model stereotypical situations of daily life. As an example, they can be used to describe a surgery, or a patient undergoing a therapy. They are frame-like knowledge structures representing prototypical knowledge, and contain sequences of scenes involving a set of objects (Props) and a set of people (Roles). Scenes contain general actions aiming to reach the same goal. Typical reasoning tasks with scripts are:

• understanding: To interpret (understand) the occurred events using the system's own knowledge.
• reasoning about occurred events: To make inferences about occurred events. Moreover, script knowledge can be used to fill in missing information about occurred events, using default values. Scripts can also be used to foresee events.
• answering questions about occurred events.
• summarization: Summarize by selecting only the most relevant events.

From the computational point of view, script based understanding means to select a script and to fill its slots with the information coming from the input.
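Slot filling with default values can be sketched as follows. The `Script` class, the `THERAPY` script, and its slots and scenes are hypothetical examples introduced only to illustrate the mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Script:
    name: str
    roles: dict = field(default_factory=dict)   # slot name -> default filler (None if no default)
    scenes: list = field(default_factory=list)  # ordered prototypical scenes


def instantiate(script, observed):
    """Fill the script's slots from the input, using defaults for missing ones."""
    unknown = set(observed) - set(script.roles)
    if unknown:
        raise KeyError(f"slots not in script: {sorted(unknown)}")
    return {slot: observed.get(slot, default) for slot, default in script.roles.items()}


# Hypothetical script for a patient undergoing a therapy.
THERAPY = Script(
    name="therapy",
    roles={"patient": None, "doctor": "attending physician", "drug": None},
    scenes=["prescribe", "administer", "re-evaluate"],
)
```

When the input names only the patient and the drug, the `doctor` slot falls back to its default filler, which mirrors the use of script knowledge to supply information missing from the occurred events.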

5. The proposed system

The knowledge discovery modules have been implemented as a component of a pre-existing Information System called Healthware NetCare. It is a complex set of applications conceived to support medical staff in the achievement of day-to-day medical operations, including the management of research protocols, the analysis of data for scientific and administrative purposes, and the information sharing for EMR interoperability. Additional modules are responsible for processing CD forms and scripts.

From an architectural standpoint, the proposed system is composed of a client side and a server side. The latter implements Data Management services, whereas the client side is composed of a set of software modules implementing the business logic, the presentation logic, and the user interface.

Fig. 3. Conceptual schema of the Healthcare Data Warehouse.

5.1. System architecture

The knowledge discovery modules of the system are structured according to a classical three tier architecture (Fig. 2).

5.1.1. Client tier

The client tier is composed of a web application representing the user interface. In particular, the user might perform clustering analysis, explore resulting clusters, and perform exploratory data analysis on Standard Therapeutic Protocols and their applications. Operations requested by the user are handled by the RequestDispatcher component, which relies on the HTTP protocol to access the intermediate tier services.

In order to enhance user experience, we adopted the SVG technology to provide visual and interactive representations of clusters and protocols. The user might explore data using several detail levels, zooming in and out of the chart, and interacting with its elements. S/he might also access detailed information in a hyper-textual fashion.

5.1.2. Logic tier

The Logic Tier is composed of a set of software components implementing the services provided in the application. In particular, at this level we find the two main services: Data Mining and Visualization. Data Mining services are provided by the DataMiningService component, which is based on a Sequence Clustering Algorithm. The Data Mining Service is responsible for the analysis of clinical records in order to find evidence in the diagnostic-therapeutic processes of patients. Visualization Services are provided by the VisualizationService module, which translates EMR, data on clinical activities, and data mining results into a chart expressed in the DOT language, and subsequently performs SVG translation through the well-known GraphViz software. The resulting SVG chart is returned as output to the client tier.
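The translation of mining results into DOT can be illustrated with a short sketch that turns the activity sequences of a cluster into a directed graph whose edges are labelled with transition frequencies. The function name and the choice of edge labels are assumptions; the actual layout produced by the VisualizationService is not described here.

```python
from collections import Counter


def transitions_to_dot(sequences, graph_name="cluster"):
    """Render activity-to-activity transitions as a DOT digraph,
    labelling each edge with how often the transition occurs."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    lines = [f"digraph {graph_name} {{"]
    for (src, dst), n in sorted(counts.items()):
        lines.append(f'  "{src}" -> "{dst}" [label="{n}"];')
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be saved to a file and rendered with the GraphViz command line, for example `dot -Tsvg cluster.dot -o cluster.svg`.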

Another relevant component is the OLAP module, which is responsible for supporting analytic queries. Thanks to the services provided by this module, the user might analyze diagnoses along several measures: re-evaluation number, duration, patients, therapy, and so forth.
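An analytic query of this kind can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_diagnosis`, `dim_patient`, `dim_therapy`, `re_evaluations`, `duration_days`) and the sample rows are invented for the illustration and do not reproduce the system's warehouse schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_patient (patient_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE dim_therapy (therapy_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_diagnosis (
    patient_id INTEGER REFERENCES dim_patient(patient_id),
    therapy_id INTEGER REFERENCES dim_therapy(therapy_id),
    re_evaluations INTEGER,
    duration_days INTEGER
);
-- Made-up sample rows, for illustration only.
INSERT INTO dim_patient VALUES (1, 'IT'), (2, 'IT'), (3, 'FR');
INSERT INTO dim_therapy VALUES (10, 'therapy A'), (20, 'therapy B');
INSERT INTO fact_diagnosis VALUES (1, 10, 2, 7), (2, 10, 4, 9), (3, 20, 1, 5);
""")


def rollup_by_therapy(connection):
    """Aggregate the fact measures along the therapy dimension."""
    return connection.execute("""
        SELECT t.name, COUNT(*), SUM(f.re_evaluations), AVG(f.duration_days)
        FROM fact_diagnosis AS f
        JOIN dim_therapy AS t ON t.therapy_id = f.therapy_id
        GROUP BY t.name
        ORDER BY t.name
    """).fetchall()
```

Grouping by a different dimension attribute, such as the patient's country, only requires changing the join and the `GROUP BY` clause.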

Fig. 4. Logic schema of the Healthcare Data Warehouse.

CD forms and scripts are located in this tier. The former are responsible for interpreting textual information within clinical records. Scripts can be used to complement the knowledge discovery process, for example to capture anomalous health situations and to produce logical consequences to be submitted to the medical doctor. When an anomaly is identified, an alert signal is produced, and a set of scripts is used, which is ruled by three operations: selection, activation, and application.

The selection operation selects one or more scripts from the knowledge base. These are candidates for understanding the current situation.

The activation is the operation deciding the script (among the selected ones) to be used in the current situation. Once activated, a script is applied, i.e., its slots are filled with the objects, roles, and actions from the situation being processed. At the end of the application the script becomes instantiated, and it returns a description of the meaning of the processed situation. An activated script foresees the actions that will take place, and therefore it is useful to detect anomalous and suspicious clinical situations.
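The three operations can be sketched as three small functions. Everything in this example is a hypothetical simplification: the knowledge base `SCRIPTS`, the use of trigger terms for selection, and the choice of the largest trigger overlap for activation are assumptions, since the criteria actually adopted by the system are not detailed in the text.

```python
# Hypothetical knowledge base: script name -> trigger terms and foreseen actions.
SCRIPTS = {
    "antibiotic_therapy": {
        "triggers": {"infection", "fever"},
        "expected": ["prescribe", "administer", "re-evaluate"],
    },
    "surgery": {
        "triggers": {"fracture", "lesion"},
        "expected": ["admit", "operate", "discharge"],
    },
}


def select(findings):
    """Selection: scripts sharing at least one trigger with the findings."""
    return [name for name, s in SCRIPTS.items() if s["triggers"] & findings]


def activate(candidates, findings):
    """Activation: choose the candidate with the largest trigger overlap."""
    if not candidates:
        return None
    return max(candidates, key=lambda n: len(SCRIPTS[n]["triggers"] & findings))


def apply_script(name, observed_actions):
    """Application: compare observed actions with the foreseen ones and
    report foreseen actions that never occurred (possible anomalies)."""
    expected = SCRIPTS[name]["expected"]
    missing = [action for action in expected if action not in observed_actions]
    return {"script": name, "missing": missing, "alert": bool(missing)}
```

In this simplified reading, a foreseen action that never occurs (here, a missing re-evaluation) is what gets reported to the medical doctor as a suspicious situation.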


5.1.3. Data tier

The Data Tier implements the data model of the application. Its main service is the Integration Service, which enables the Logic Tier modules to control the access to clinical data. This module also contains a package enabling the visual specification of data privacy policies, in order to facilitate the specification and implementation of depersonalization and pseudonymisation of data to guarantee privacy for sensitive patient data [15].

The Healthcare Data Warehouse is located in this tier to provide the OLAP Query and Visualization Services. We identified the concept of “Diagnosis” as the main subject, and defined the “number of re-evaluations” as the fact measure, and “patient”, “therapy”, “date”, and “protocol” as the analysis dimensions. Each of these dimensions is characterized by a set of dimensional attributes, like “age” or “country” for patients, as shown in the conceptual schema in Fig. 3.

Afterwards, the conceptual schema has been transformed into a star schema (see Fig. 4). Main efforts have been devoted to the analysis and reconciliation of the operational data sources [4,5,11], which has first led to the conceptual reconciled schema shown in Fig. 5, and then to the logical reconciled schema.

Fig. 5. Conceptual schema of the reconciled layer.

6. System usability

A test was carried out to evaluate the usability of the implemented system, and in particular of the data mining and data warehouse modules. The test was accomplished through one-to-one sessions (i.e., with a supervisor for each subject), using the think-aloud technique. A group of 5 students of our medical school with heterogeneous computer skills was recruited, most of whom were not familiar with EBM tools and data mining techniques.

Table 3. The usability questionnaire.

Category             Id      Question
General evaluation   Q.2.1   The tool provides a nice user interface
                     Q.2.2   Using the tool is simple
                     Q.2.3   The aroused feeling by the tool use is satisfactory
Special judgment     Q.3.1   The user interface is pleasant
                     Q.3.2   The tool is simple to use
                     Q.3.3   The tool proposes specific error messages
Tool learning        Q.4.1   Learning to use the tool is simple
                     Q.4.2   The required time to use the tool is appropriate
                     Q.4.3   Remembering the commands and their use is simple
                     Q.4.4   The number of steps to carry out a task is appropriate
                     Q.4.5   The time to examine medical records and therapy protocols is appropriate
                     Q.4.6   The number of steps to compare protocol applications is appropriate
                     Q.4.7   Suggested Therapeutic Activities are correct
Information grant    Q.5.1   Icon names and objects have a clear meaning
                     Q.5.2   Each set of operations produces a predictable result

Fig. 6. The boxplots for the answers of the usability questionnaire.

6.1. Experiment design

All the subjects underwent an introductory course of 60 min on the system and its visual notations. Subsequently, they were asked to use the tool for 20 min with the possibility of invoking tutor support. After that, they were asked to use the system to examine medical records of several patients, without the possibility of invoking tutor support.

After accomplishing the task, they were asked to fill in a questionnaire to provide information on the usability they perceived. The questions composing the usability questionnaire were organized into five categories (see Table 3). Subjects' expertise and their general reaction in terms of satisfaction degree were evaluated through questions in the categories Subject Background (not shown in the table) and General Evaluation, respectively. Questions in the Special Judgment category aimed at assessing the perceived usability with respect to the graphical user interface. The Tool Learning category aimed at evaluating the satisfaction degree in mastering the tools. Finally, the information provided by the tool during its usage was evaluated through the questions in the Information Grant category. For all these questions subjects were asked to provide feedback based on a Likert scale [27]: 1 (strongly agree), 2 (agree), 3 (neutral), 4 (disagree), and 5 (strongly disagree).

6.2. Results

The data collected from the usability questionnaire are visually summarized in the box-plots of Fig. 6. This figure shows a good distribution of the answers for each question from the questionnaire.

The General Evaluation was fairly good for the majority of the subjects, as the boxes for questions Q.2.1, Q.2.2, and Q.2.3 reveal. Nearly all of the subjects had a good reaction concerning the system usage, as the boxes for questions Q.3.1, Q.3.2, and Q.3.3 show. However, the subject ID2 expressed a better judgment on the error messages proposed by the tool (see Q.3.3 box). Moreover, on the perceived simplicity of the tool learning, a good agreement has been achieved. Indeed, on question Q.4.4 the subjects expressed a worse judgment. However, as the box for question Q.4.4 shows, the number of steps required to accomplish a task was considered appropriate. Finally, also on the Information Grant category a good agreement level has been achieved. In particular, the subjects expressed a better judgment on question Q.5.2 (i.e., each set of operations produces a predictable result).

7. Conclusion

We have presented a health-care decision supportsystem, which has been used experimentally on a largedatabase of clinical records, in order to show how datawarehousing and data mining can effectively support


evidence-based medicine. The proposed system enablesmedical guide lines identification by exploiting evidencebased clinical history of a patient, standard protocols, andother patients histories. A user study has been presentedto provide an early system validation focusing on usabilityaspects.

A broader validation focusing on the quality of the information provided by the system and the results of inferences is currently under way. In particular, in addition to the medical data sources we had available, we also needed the active involvement of medical doctors to assist us in writing CD forms and scripts for specific medical domains, and to help us evaluate the quality of inference results. To this end, with the help of medical doctors from our medical school, we are currently integrating data sources concerning a specific disease (Sepsis), for which we are also writing CD forms and scripts. This will also allow us to perform more sophisticated interpretations of textual information contained within medical records, for which we are also investigating the use of sketch recognition strategies, due to the handwritten text abounding in legacy clinical records [12].

References

[1] S.S.R. Abidi, S.R. Abidi, A case for supplementing evidence base medicine with inductive clinical knowledge: Towards a technology-enriched integrated clinical evidence system, in: Proceedings of the Fourteenth IEEE Symposium on Computer-Based Medical Systems, CBMS '01, IEEE Computer Society, Washington, DC, USA, 2001, pp. 5–10.

[2] M. Ankerst, Visual data mining (Ph.D. thesis), Ludwig Maximilians Universität, München, Germany, 2000.

[3] P. Buono, M. Costabile, Visualizing association rules in a framework for visual data mining, Lecture Notes in Computer Science: Integrated Publication and Information Systems to Information and Knowledge Environments, Springer Verlag, Berlin, Heidelberg, 2005, pp. 221–231.

[4] L. Caruccio, V. Deufemia, M. Moscariello, G. Polese, Data integration by conceptual diagrams, in: Proceedings of 25th International Conference on Database and Expert Systems Applications (DEXA 2014), ACM, Munich, Germany, 2014, pp. 131–135.

[5] L. Caruccio, V. Deufemia, G. Polese, Visual data integration based on description logic reasoning, in: Proceedings of 18th International Database Engineering & Applications Symposium (IDEAS), ACM, Porto, Portugal, 2014, pp. 19–28.

[6] S. Chang, S. Orefice, G. Polese, M. Tucci, A methodology and interactive environment for iconic language design, Int. J. Hum.–Comput. Stud. 41 (5) (1994) 683–716.

[7] L. Chittaro, C. Combi, G. Trapasso, Data mining on temporal data: a visual approach and its clinical application to hemodialysis, J. Vis. Lang. Comput. 14 (2003) 591–620.

[8] K. Cox, S. Eick, G. Wills, Brief application description – visual data mining: recognizing telephone calling fraud, Data Min. Knowl. Discov. 1 (1997) 225–231.

[9] J.C. Craig, L.M. Irwig, M.R. Stockler, Evidence-based medicine: useful tools for decision making, Med. J. Aust. 174 (5) (2001) 248–253.

[10] U. Demšar, Data mining of geospatial data: combining visual and automatic methods (Ph.D. thesis), Department of Urban Planning and Environment, School of Architecture and the Built Environment, Royal Institute of Technology (KTH), 2006.

[11] V. Deufemia, M. Giordano, G. Polese, G. Tortora, A visual language-based system for extraction-transformation-loading development, Softw.: Pract. Exp. (2013), http://dx.doi.org/10.1002/spe.220.1.

[12] V. Deufemia, M. Risi, G. Tortora, Sketched symbol recognition using latent dynamic conditional random fields and distance-based clustering, Pattern Recognit. 47 (3) (2014) 1159–1171.

[13] N. Roeder, P. Hensen, D. Hindle, N. Loskamp, H.J. Lakomek, Clinical pathways: effective and efficient inpatient treatment, DRG-Research-Group, Universitätsklinikum Münster 74 (12) (2003) 1149–1155, http://dx.doi.org/10.1007/s00104-003-0754-z.

[14] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, From data mining to knowledge discovery in databases, AI Mag. (1996) 37–54.

[15] M. Giordano, G. Polese, Visual computer-managed security: a framework to support the development of access control in enterprise applications, IEEE Software 30 (5) (2013) 62–69.

[16] W.H. Inmon, Building the Data Warehouse, John Wiley & Sons, Inc., New York, NY, USA, 1992.

[17] A. Inselberg, Visualization and data mining of high-dimensional data, Chemometrics and Intelligent Laboratory Systems 60 (2002) 147–159.

[18] D.A. Keim, W. Müller, H. Schumann, Visual data mining, in: D. Fellner, R. Scopigno (Eds.), STAR Proceedings of Eurographics 2002, Saarbrücken, Germany, Eurographics Association, Eurographics 02 STAR, September 2002.

[19] D.A. Keim, C. Panse, M. Sips, S.C. North, Pixel based visual data mining of geo-spatial data, Comput. Graph. 28 (3) (2004) 327–344.

[20] S. Kimani, S. Lodi, T. Catarci, G. Santucci, C. Sartori, Vidamine: a visual data mining environment, J. Vis. Lang. Comput. 15 (2004) 37–67.

[21] I. Kopanakis, N. Pelekis, H. Karanikas, T. Mavroudkis, Visual Techniques for the Interpretation of Data Mining Outcomes, Springer Verlag, Berlin, Heidelberg, 2005, pp. 25–35.

[22] I. Kopanakis, B. Theodoulidis, Visual data mining modeling techniques for the visualization of mining outcomes, J. Vis. Lang. Comput. 1 (14) (2003) 543–589.

[23] M. Kreuseler, H. Schumann, A flexible approach for visual data mining, Trans. Vis. Comput. Graph. 8 (1) (2002) 39–51.

[24] F.-r. Lin, S.-c. Chou, S.-m. Pan, Y.-m. Chen, Mining time dependency patterns in clinical pathways, in: Proceedings of the 33rd Hawaii International Conference on System Sciences, vol. 5, HICSS '00, IEEE Computer Society, Washington, DC, USA, 2000, pp. 5015.

[25] G. Manco, C. Pizzuti, D. Talia, Eureka!: an interactive and visual knowledge discovery tool, J. Vis. Lang. Comput. 15 (2004) 1–35.

[26] H. Miller, J. Han, An overview, Geographic Data Mining and Knowledge Discovery, Taylor and Francis, London, New York, 2001, pp. 3–32.

[27] A.N. Oppenheim, Questionnaire Design, Interviewing, and Attitude Measurement, new ed. Martin's Press, London, 1992.

[28] D.L. Sackett, W.M.C. Rosenberg, M.J.A. Gray, B.R. Haynes, S.W. Richardson, Evidence based medicine: what it is and what it isn't, BMJ 312 (7023) (1996) 71–72.

[29] R.C. Schank, Conceptual Information Processing, North-Holland Publishing Co, Amsterdam, 1975.

[30] R.C. Schank, R. Abelson, Script Plans Goals and Understanding, Lawrence Erlbaum Associates, 1977.

[31] N. Stolba, M. Banek, A.M. Tjoa, The security issue of federated data warehouses in the area of evidence-based medicine, in: Proceedings of the First International Conference on Availability, Reliability and Security, IEEE Computer Society, Washington, DC, USA, 2006, pp. 329–339.

[32] N. Stolba, A.M. Tjoa, The relevance of data warehousing and data mining in the field of evidence-based medicine to support healthcare decision making, in: K. Ardil (Ed.), Computer Science, vol. 11, Enformatika. Vortrag: ICCS 2006, Prag, Czech Republic, 2006, pp. 12–17.

[33] R. Wu, W. Peters, M.W. Morgan, The next generation of clinical decision support: linking evidence to best practice, J. Health Inf. Manag. 16 (4) (2002) 50–55.

[34] N. Ye, Introduction, in: The Handbook of Data Mining, Lawrence Erlbaum Associates Publishers, 2003.
