Exploring Clinical Care Processes Using Visual and Data Analytics: Challenges and Opportunities



    A Position and Demo Paper

    Vikas Kumar (1,3), Hyunwoo Park (1,4), Rahul C. Basole (1,2), Mark Braunstein (1,2), Minsuk Kahng (3), Duen Horng Chau (3), Daniel A. Hirsh (5,6), Nicoleta Serban (4), James Bost (3), Burton Lesnick (5), Beth Schissel (5,6), Acar Tamersoy (3), Michael Thompson (5)

    (1) Tennenbaum Institute, Georgia Tech
    (2) School of Interactive Computing, Georgia Tech
    (3) School of Computational Science & Engineering, Georgia Tech
    (4) School of Industrial & Systems Engineering, Georgia Tech
    (5) Children's Healthcare of Atlanta
    (6) Pediatric Emergency Medicine Associates, LLC, Atlanta, Georgia

    ABSTRACT

    Healthcare big data is being widely touted as a potential resource for curbing costs and improving outcomes. However, numerous challenges remain for leveraging this data to its full potential. In this position paper we identify the difficulties that characterize clinical data, based on our experiences working with pediatric asthma data from Children's Healthcare of Atlanta. The specific dataset we explored includes administrative items, medications, lab results, clinical respiratory scores (outcome), timestamps, and demographic information from 5,785 emergency department (ED) visits for asthma exacerbations. We argue that new data and visual analytic techniques are needed that are specifically tailored for solving challenges in healthcare, and we propose characteristics that these techniques should support and give our design rationale. To demonstrate how a tool that embodies these desirable features may be designed, we introduce AsthmaFlow, an interactive visual analytics prototype tool that helps clinicians explore and understand the processes involved in pediatric asthma emergency department care.

    Categories and Subject Descriptors

    Human-centered computing [Visualization]; Information systems [Data mining]; Applied computing [Health care information systems]

    General Terms

    Design, Algorithms

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
    KDD'14: Data Science for Social Good Workshop, New York City, USA
    Copyright 2014 ACM X-XXXXX-XX-X/XX/XX ...$15.00.

    Keywords

    Visual analytics, asthma, emergency care, pediatric hospital

    1. INTRODUCTION

    Achieving cost-effective healthcare is one of the most pressing problems facing society today. Many industry experts point to the incoming tsunami of electronic health record (EHR) data as a possible resource for helping to solve this problem [5, 7]. However, many obstacles remain for leveraging this data to its full potential. A central question we are exploring is how we can overcome the challenges posed by healthcare data to discover care patterns that produce optimal outcomes at the lowest cost [3].

    To illustrate the magnitude of this problem, consider the many types of information that are recorded for a typical patient entering the hospital today: demographic information including race, gender, age, payment type, and numerous other variables; a clinical description of the problem, including a chief complaint, history of the present illness, past medical history, review of systems, and physical examination results, each of which may include dozens of structured and unstructured data elements; laboratory tests including bloodwork, other diagnostic tests, and imaging studies; and prescribed medications with specific dosages and instructions. The timestamps of all of these events are recorded in the EHR, but this may be done after the fact. These data are also sparse and may even be missing: one patient may have many imaging studies while another has none. The challenge is to analyze and visualize this complex data set to yield insights about clinical care and to inform further investigations.

    We describe some of the issues that we have faced when attempting to understand the underlying clinical processes, based on our experience working with pediatric asthma data from the Children's Healthcare of Atlanta (Children's) emergency department. To solve these challenges we require a tool that combines data mining and analytics with effective visualization.

    Our main contributions include:

    1. Identifying the challenges facing healthcare analytics, specifically when understanding clinical care processes.

    2. Proposing data and visual analytic solutions for these challenges.

    3. Demonstrating AsthmaFlow, our prototype tool, in the context of possible solutions.

    The structure of this paper is as follows. In Section 2 we discuss the factors that make healthcare data difficult to work with. In Section 3 we identify specific features that will be required of new tools for analyzing and visualizing care processes effectively, and we introduce AsthmaFlow. In Section 4 we review previous attempts at using data and visual analytics to understand clinical care processes. Finally, we conclude in Section 5 with future directions and implications for research.

    2. CHALLENGES FOR UNDERSTANDING CLINICAL CARE PROCESSES

    In order to understand the techniques that are needed in this area of healthcare analytics, we must first discuss the challenges presented by healthcare data. We acknowledge the substantial body of prior work in machine learning and process mining analytics in healthcare; a comprehensive review is beyond the scope of this paper and interested readers are referred to key work [4, 10, 12]. Our contribution is that we focus on the synthesis of visual and data analytics to help clinicians understand entire clinical processes. This is an important, but sparsely explored, area of research that is relevant to helping clinicians, who will be the ultimate beneficiaries of our work. Below, we highlight the major challenge areas our research aims to address (Figure 1).

    2.1 Large data

    Clinical databases are often quite large, requiring storage space in the petabyte range. They are large not only patient-wise (the 2011 Mayo Clinic data repository contained over 1.1 million patient registrations [17]) but also in terms of the data elements stored, as discussed in the introduction. With the large number of attributes such as age, sex, and race comes the curse of dimensionality: when patient populations are divided using multiple filters, the resulting sample sets are very small and not amenable to statistical testing. Furthermore, visual analytic tools have a finite array of features (e.g. x-position, y-position, color) with which to represent these hundreds of variables. Finally, the data may be distributed throughout the electronic records of various departments, including billing, administration, clinical care, and even medical devices. Assembling and organizing these large and distributed datasets can be difficult and time consuming.
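The shrinking-sample effect described above can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes, hypothetically, that visits are spread evenly across the levels of every filter attribute, and the attribute names and level counts are invented for the example. Only the figure of 5,785 visits comes from the dataset described in this paper.

```python
from math import prod


def expected_cell_size(n_visits: int, levels_per_filter: dict[str, int]) -> float:
    """Average number of visits per subgroup once every filter is applied,
    assuming (hypothetically) an even spread across attribute levels."""
    return n_visits / prod(levels_per_filter.values())


# Hypothetical attributes and level counts, chosen only for illustration.
filters = {"age_group": 5, "sex": 2, "race": 6, "triage_level": 5, "payer": 4}

applied: dict[str, int] = {}
for name, levels in filters.items():
    applied[name] = levels
    size = expected_cell_size(5785, applied)
    print(f"{len(applied)} filter(s): about {size:.1f} visits per subgroup")
```

Under these assumptions, five stacked filters leave fewer than five visits per subgroup on average, which is too few for most statistical tests. Real attribute distributions are uneven, so many subgroups would be smaller still.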

    2.2 Variable semantics and number

    Clinical data is incredibly variable, both semantically and numerically. Semantically, EHR data contains different types of variables and events. While a particular demographic trait (such as age) can be represented as one variable per patient, events are more complex. An outcome event may consist of a measure (e.g. HgbA1c) and a timestamp. A medication treatment event might require the medication name, dose, form and route of administration, along with multiple timestamps representing each time it is given. Administrative events might simply be a timestamp indicating, for example, when the patient was admitted. There are also many different data types: some variables are categorical, some ordinal, some continuous and some in date and time format [4].

    Figure 1: Our proposed challenges and corresponding solutions for visualizing and analyzing clinical care processes.

    Finally, the variables are not fixed in number: there might be a few or hundreds of medications administered to a patient during a single admission. This leads to a relational database structure requiring multiple tables, instead of the traditional one-table structure consisting of observations and attributes.
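A minimal sketch of such a multi-table structure is given below, using Python's built-in sqlite3 module. The table names, column names, and all inserted values are hypothetical and made up for illustration; they are not the schema or data of any system described in this paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per visit, but any number of medication rows per visit.
conn.executescript("""
CREATE TABLE visit (
    visit_id    INTEGER PRIMARY KEY,
    age_years   REAL,
    admitted_at TEXT
);
CREATE TABLE medication_event (
    event_id        INTEGER PRIMARY KEY,
    visit_id        INTEGER NOT NULL REFERENCES visit(visit_id),
    medication      TEXT,
    dose            TEXT,
    route           TEXT,
    administered_at TEXT
);
""")

# Made-up example rows.
conn.executemany(
    "INSERT INTO visit VALUES (?, ?, ?)",
    [(1, 6.5, "2013-01-05T10:00"), (2, 9.0, "2013-01-05T11:30")],
)
conn.executemany(
    "INSERT INTO medication_event "
    "(visit_id, medication, dose, route, administered_at) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "albuterol", "2.5 mg", "nebulized", "2013-01-05T10:20"),
        (1, "albuterol", "2.5 mg", "nebulized", "2013-01-05T10:40"),
        (1, "prednisolone", "20 mg", "oral", "2013-01-05T10:45"),
    ],
)

# Count medication events per visit; a LEFT JOIN keeps visits with none.
rows = conn.execute("""
    SELECT v.visit_id, COUNT(m.event_id)
    FROM visit v
    LEFT JOIN medication_event m ON m.visit_id = v.visit_id
    GROUP BY v.visit_id
    ORDER BY v.visit_id
""").fetchall()
print(rows)
```

The one-to-many link between the two tables is what a single flat table of observations and attributes cannot express without either repeating the visit columns or fixing a maximum number of medication columns in advance.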

    2.3 Irregularity

    A third challenge is irregularity, by which we mean both noisiness and incompleteness. Noisiness refers to the high frequency of errors in the data. This may occur because of incorrect logging of data into the EHR by care professionals, either at wrong timestamps or with incorrect values [12]. Noise is also present in the values themselves; for example, a lab test that is repeated multiple times in a patient will likely return different values.

    Additionally, clinical data is rarely complete; the data is recorded at irregular intervals and sometimes it is not recorded at all. The irregular intervals pose a problem for standard time series analysis. For example, consider a diabetic patient who has been hospitalized for high blood glucose. During the visit, the sequence of blood glucose measurements and that of medication administrations will be irregular with respect to time and to each other, making it difficult to determine how the medication affects the glucose level over time in that patient. Also, other medications may have been given during the treatment, further confounding the results. In clinical trials this problem is largely circumvented by conducting randomized controlled studies, in which the medications and the intervals between variables of interest are strictly regulated while determining the effect that the medication has on outcome. However, if we are to leverage the data collected in the clinical care environment, we require alternative solutions to this problem.
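To make the alignment problem in the glucose example concrete, the sketch below pairs each medication dose with the nearest measurement before and after it. This is only one simple, assumed way of lining up two irregular series; it is not a method proposed in this paper, it does nothing about confounding from other medications, and the function name and all values are hypothetical.

```python
from bisect import bisect_right


def bracket_measurements(measurements, dose_time):
    """Return (last measurement at or before dose_time, first measurement after it).

    measurements: list of (minutes_since_arrival, value) tuples sorted by time.
    Either element of the result is None when no such measurement exists.
    """
    times = [t for t, _ in measurements]
    i = bisect_right(times, dose_time)
    before = measurements[i - 1] if i > 0 else None
    after = measurements[i] if i < len(measurements) else None
    return before, after


# Made-up, irregularly spaced glucose readings (minutes, mg/dL) and dose times.
glucose = [(0, 310), (95, 280), (140, 190), (400, 150)]
insulin_doses = [30, 120, 450]

for t in insulin_doses:
    print(t, bracket_measurements(glucose, t))
```

Even in this toy case the gaps between a dose and its bracketing measurements range from minutes to hours, and the last dose has no follow-up measurement at all, which illustrates why standard time series methods do not apply directly.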

    2.4 Temporal richness

    Another obstacle that characterizes healthcare data is its rich temporal domain. The temporal domain also prevents the breakdown of data into simple rows and features onto which traditional clustering, classification, and prediction tasks can be applied. Instead, a typical patient may have hundreds of tests performed and medications administered in interleaving sequences. Moreover, the sequences may be of different lengths, precluding a simple item-by-item comparison. Events may be associated with multiple timestamps: for example, a medication event has a timestamp for when it was ordered and one for each time it was administered to the patient. In this case, should events be treated as intervals, or as separate sub-events (a similar question has been asked in [13])? The data includes not only the relative sequence of events, but also the actual time of each event, something that many current sequence mining algorithms do not consider. Finally, an additional problem with medical data is that of concurrent events [15]. In the clinical setting the care team may submit multiple lab tests and medication orders at once, in various combinations. This makes it hard to form true sequences. Should the resulting "super-events" be treated as one big event, or as separate events?

    Figure 2: A screenshot of AsthmaFlow. (a), (b), (c) are the filter criteria panels for patient (e.g. age, sex, race), provider, and care process characteristics (e.g. visit duration, charge, disposition), respectively. (d) is the main network visualization area. The clinical events are each represented as nodes, and edges represent the connections between consecutive events in a case. (e) is a collection of analytical tools including histograms and pie charts. (f) allows users to save and load filtered subsets of populations.
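The two possible treatments of concurrent events discussed in Section 2.4 can be sketched as follows. This is an illustrative sketch, not a recommendation of either option; the function name, its parameter, and the event labels are all hypothetical.

```python
from itertools import groupby


def to_sequence(events, merge_concurrent=True):
    """Turn (timestamp, label) pairs into a time-ordered sequence.

    merge_concurrent=True  -> events sharing a timestamp become one
                              "super-event", represented as a frozenset of labels.
    merge_concurrent=False -> concurrent events stay separate, in an arbitrary
                              (here alphabetical) order imposed by sorting.
    """
    ordered = sorted(events)
    sequence = []
    for ts, group in groupby(ordered, key=lambda e: e[0]):
        labels = [label for _, label in group]
        if merge_concurrent:
            sequence.append((ts, frozenset(labels)))
        else:
            sequence.extend((ts, label) for label in labels)
    return sequence


# Made-up events: three orders submitted together at minute 10.
events = [
    (10, "albuterol ordered"),
    (10, "chest x-ray ordered"),
    (25, "albuterol given"),
    (10, "steroid ordered"),
]

merged = to_sequence(events, merge_concurrent=True)
separate = to_sequence(events, merge_concurrent=False)
print(len(merged), len(separate))
```

Merging keeps the sequence short but multiplies the number of distinct event types, since every combination of orders becomes its own symbol. Keeping the events separate preserves a small vocabulary but imposes an ordering that the data does not actually contain.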

    3. DESIGN OF ANALYTIC TOOLS FOR HEALTHCARE

    Because of the unique and diverse set of challenges outlined earlier, it may be necessary to develop analytic tools that are specifically suited for EHR data. We propose that such new tools have certain characteristics (Figure 1). Previous work [4, 10] is not readily applicable here, since it does not consider certain aspects (e.g. temporal richness) that are present in clinical data.

    To exemplify such a tool we introduce AsthmaFlow (Figure 2), a prototype tool for analyzing and visualizing pediatric asthma care processes in the emergency department. AsthmaFlow was developed in a D3.js [6] environment, with the data stored on a lightweight server. We collaborated with Children's Healthcare of Atlanta (Children's), the largest provider of pediatric health services in the nation. Our data from Children's was in a relational database format and included administrative data, medications, lab results, clinical respiratory scores (CRS, the outcome measure), timestamps, charge data, and demographic characteristics for 5,785 ED visits for which asthma was the primary problem.
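The network view described in the Figure 2 caption (events as nodes, edges between consecutive events in a case) implies a simple underlying data structure. The sketch below shows one assumed way to derive it from per-visit event sequences. It is written in Python for brevity and is not AsthmaFlow's actual implementation, which is built with D3.js; the function name, case identifiers, and event labels are hypothetical.

```python
from collections import Counter


def build_transition_graph(cases):
    """Build node and edge counts from time-ordered event sequences.

    cases: dict mapping a case id to its list of event labels in time order.
    Returns (nodes, edges): nodes counts occurrences of each event, and
    edges counts how often event b directly follows event a, keyed by (a, b).
    """
    nodes, edges = Counter(), Counter()
    for events in cases.values():
        nodes.update(events)
        edges.update(zip(events, events[1:]))
    return nodes, edges


# Made-up example cases.
cases = {
    "visit-1": ["triage", "albuterol", "steroid", "discharge"],
    "visit-2": ["triage", "albuterol", "albuterol", "admit"],
}

nodes, edges = build_transition_graph(cases)
print(nodes.most_common(2))
print(edges[("triage", "albuterol")])
```

The counts could then drive visual encodings such as node size or edge thickness, although which encodings a tool uses is a separate design decision.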

    3.1 Fast data filtering

    Because researchers and clinicians may want to test multiple hypotheses in a short period of time or even on-the-fly, the optimal tools would be able to filter across various patient populations and perform analyses in close to real-time. The interface for AsthmaFlow allows for real-time filtering of patients by demographic characteristics including age, sex, race, triage status, and other variables (see Figure 2a-c).
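As a rough illustration of filtering a visit population on several characteristics at once, consider the sketch below. The `Visit` record, its fields, and the `filter_visits` helper are hypothetical names introduced for this example, and the data is made up; the sketch says nothing about how AsthmaFlow implements its filters or achieves real-time performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    visit_id: int
    age_years: float
    sex: str
    triage_level: int


def filter_visits(visits, **criteria):
    """Keep the visits for which every criterion holds.

    Each keyword names a Visit field and maps to a predicate on its value.
    """
    return [
        v for v in visits
        if all(pred(getattr(v, field)) for field, pred in criteria.items())
    ]


# Made-up example population.
visits = [
    Visit(1, 4.0, "F", 2),
    Visit(2, 11.5, "M", 3),
    Visit(3, 7.0, "F", 3),
]

subset = filter_visits(
    visits,
    age_years=lambda a: 5 <= a <= 12,
    sex=lambda s: s == "F",
)
print([v.visit_id for v in subset])
```

A linear scan like this is adequate for a few thousand visits; larger repositories would likely need indexing or precomputed aggregates to stay interactive.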

    3.2 Integrative

    Data relevant to health analytics is often distributed across different parts of the healthcare system such as the billing, pharmacy, and clinical care departments. Effective tools will need to draw information from these multiple sources in order to put forward all-encompassing solutions. As input, AsthmaFlow receives different types of information ranging from charge data to administrative timestamps to medications administered to each patient.
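One simple way to picture this integration step is to merge per-department records on a shared visit identifier. The sketch below assumes, hypothetically, that such an identifier exists in every source; the `integrate` helper, the source names, and all field names and values are invented for illustration.

```python
from collections import defaultdict


def integrate(**sources):
    """Merge per-department records into one dict per visit id.

    Each keyword argument is a source mapping visit_id -> dict of fields.
    Field names are prefixed with the source name to avoid collisions.
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for visit_id, fields in records.items():
            for key, value in fields.items():
                merged[visit_id][f"{source_name}.{key}"] = value
    return dict(merged)


# Made-up records from three hypothetical departmental systems.
billing = {1: {"charge_usd": 1200.0}, 2: {"charge_usd": 800.0}}
admin = {1: {"arrived_at": "10:00"}}
pharmacy = {1: {"n_doses": 3}, 2: {"n_doses": 1}}

records = integrate(billing=billing, admin=admin, pharmacy=pharmacy)
print(records[1])
print(records[2])
```

Visit 2 has no administrative record in this toy data, so its merged record simply lacks those fields, which is the kind of incompleteness discussed in Section 2.3.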

    3.3 Multifactorial

    As we have discussed, healthcare contains different types


    [5] D. Blumenthal. Stimulating the adoption of health information technology. New England Journal of Medicine, 360(15):1477-1479, 2009. PMID: 19321856.

    [6] M. Bostock, V. Ogievetsky, and J. Heer. D3 data-driven documents. Visualization and Computer Graphics, IEEE Transactions on, 17(12):2301-2309, 2011.

    [7] M. Braunstein. Contemporary Health Informatics. AHIMA Press, Chicago, Illinois, 2014.

    [8] C.-W. Cheng, N. Chanani, J. Venugopalan, K. Maher, and M. Wang. icuarm-an icu clinical decision support system using association rule mining. Translational Engineering in Health and Medicine, IEEE Journal of, 1:4400110-4400110, 2013.

    [9] P. Delias, M. Doumpos, P. Manolitzas, E. Grigoroudis, and N. Matsatsinis. Clustering healthcare processes with a robust approach. In 26th European Conference on Operational Research, 2013.

    [10] I. Kononenko. Machine learning for medical diagnosis: history, state of the art and perspective. Artificial Intelligence in Medicine, 23:89-109, 2001.

    [11] P. Liu, L. Lei, J. Yin, W. Zhang, W. Naijun, and E. El-Darzi. Healthcare data mining: Prediction inpatient length of stay. In Intelligent Systems, 2006 3rd International IEEE Conference on, pages 832-837, Sept 2006.

    [12] R. Mans, W. van der Aalst, R. Vanwersch, and A. Moleman. Process mining in healthcare: Data challenges when answering frequently posed questions. In R. Lenz, S. Miksch, M. Peleg, M. Reichert, D. Riaño, and A. ten Teije, editors, Process Support and Knowledge Representation in Health Care, volume 7738 of Lecture Notes in Computer Science, pages 140-153. Springer Berlin Heidelberg, 2013.

    [13] M. Monroe, R. Lan, H. Lee, C. Plaisant, and B. Shneiderman. Temporal event sequence simplification. Visualization and Computer Graphics, IEEE Transactions on, 19(12):2227-2236, Dec 2013.

    [14] C. Ordonez, E. Omiecinski, L. De Braal, C. Santana, N. Ezquerra, J. Taboada, D. Cooke, E. Krawczynska, and E. Garcia. Mining constrained association rules to predict heart disease. In Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, pages 433-440, 2001.

    [15] A. Perer and F. Wang. Frequence: Interactive mining and visualization of temporal frequent event sequences. In Proceedings of the 19th International Conference on Intelligent User Interfaces, IUI '14, pages 153-162, New York, NY, USA, 2014. ACM.

    [16] A. Rebuge and D. R. Ferreira. Business process analysis in healthcare environments: A methodology based on process mining. Inf. Syst., 37(2):99-116, Apr. 2012.

    [17] B. Techentin. Big data and graph analytics in a healthcare setting. In Supercomputing, 2012.

    [18] W. M. P. van der Aalst. Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer Publishing Company, Incorporated, 1st edition, 2011.