4 Intelligent Electronic Health Systems

David A. Clifton, Marco A. F. Pimentel, Katherine E. Niehaus, Lei Clifton, Timothy E. A. Peto, Derrick W. Crook, and Peter J. Watkinson

Telemedicine and Electronic Medicine

CONTENTS

4.1 Introduction
    4.1.1 Objectives
    4.1.2 Themes Considered in This Chapter
4.2 Theme I: Using the Broad Range of Data Sets within the EHR
    4.2.1 Case Study: Prediction of Bacterial Drug Susceptibility
    4.2.2 Features
    4.2.3 Supervised Learning Algorithms for the EHR
    4.2.4 Feature Selection
    4.2.5 Generalization
    4.2.6 Summary of Theme I
4.3 Theme II: Augmenting the EHR with Sensor Data
    4.3.1 Case Study: Early-Warning Systems
    4.3.2 Estimating Vital Signs with Probabilistic Models
    4.3.3 Learning Data Trajectories
    4.3.4 Similarity between Vital-Signs Trajectories
    4.3.5 Summary of Theme II
4.4 Theme III: EHRs in the Developing World
    4.4.1 Fusing Data from Noisy Time Series
4.5 Conclusions and Future Directions
Acknowledgments
References

4.1 Introduction

Healthcare systems worldwide are entering a new phase: ever-increasing quantities of complex, massively multivariate data concerning all aspects of patient care are starting to be routinely acquired and stored [1], throughout the life of a patient. This exponential growth in data quantities far outpaces the capability of clinical experts to cope, resulting in a so-called data deluge, in which the data are largely unexploited. There is huge potential for using advances in large-scale machine learning methodologies* to exploit the contents of these complex data sets by performing robust, scalable, automated inference to improve healthcare outcomes significantly by using patient-specific probabilistic models, a field in which there is little existing research [2] and which promises to develop into a new industry supporting the next generation of healthcare technology. Data integration across spatial scales, from molecular to population level, and across temporal scales, from fixed genomic data to a beat-by-beat electrocardiogram (ECG), will be one of the key challenges for exploiting these massive, disparate data sets.

* Sometimes termed “big-data” methods.

Electronic health records (EHRs) are being rapidly adopted for use by healthcare providers, particularly in hospital settings. These EHRs typically contain heterogeneous physiological data that have been deposited throughout routine care of patients and which may be of varying provenance and fidelity. These disparate data sets often represent a wide range of data types, acquired across a wide range of measurement scales; for example, we might encounter the following:

• Background demographic data, including categorical data (sex and ethnicity), discrete data (number of admissions to hospital and severity scores for long-term conditions), and continuous data (weight and age)

• Diagnostic data, describing long-term or acute conditions, and other similar clinical indicators

• Surgical and treatment data, describing interventions and their details

• Pharmacological data, describing treatments prescribed

• Imaging and radiological data of varying resolutions

• Low-rate physiological data, such as those measurements of the vital signs made manually by clinicians during routine clinical care of a patient (perhaps several times per day)

• Higher-rate physiological data, such as those parameters recorded by bedside monitors and ICU monitoring systems (perhaps several times per minute)

• Very high-rate physiological data, such as waveforms acquired by sensors (perhaps many times per second)

• Genomic data and proteomic data, which can be massively multivariate and which are often represented in binary or discrete format

Current interest [3–4] focuses on linking such disparate databases within the EHR by using conventional statistical methods. However, conventional methods do not scale to these incomplete, noisy, terabyte-scale, massively multivariate data sets. Additionally, the contents of these data sets, when linked together, may offer contradictory information concerning the patient, and these contradictory data typically confound conventional statistical methods. This chapter describes the construction of complex multiscale models for use with EHRs, for so-called intelligent EHRs, and their use in the extraction of clinically useful information from these very large healthcare data sets for the improvement of patient outcomes.

Models can be built using either a bottom-up approach (for example, using stochastic processes for modeling multivariate time-series data) or a top-down approach based on massively multivariate probabilistic models. In both cases, we describe how learning can take place within a machine learning (and, ideally, Bayesian) framework, which provides the optimal approach for quantifying the uncertainty associated with the noisy and missing data typical of large healthcare data sets.

4.1.1 Objectives

The primary objective of this overlapping area of medical and engineering research is to develop novel, automated methods for improving patient outcomes by exploiting the contents of the EHR (including genomic data), augmented with multivariate high-rate data acquired from wearable sensors and other hospital devices. Researchers in this field aim to, for example, (i) demonstrate a reduction in adverse outcomes for high-risk patients, including outcomes such as cardiac arrests, unscheduled admission to ICU, and death; (ii) provide automatic stratification of patients according to perceived risk or efficacy of treatment, allowing more effective use of healthcare resources; and (iii) construct novel patient-specific models that allow treatments to be tailored to the physiological condition of the individual patient, rather than relying on population-based, generic models.

In the work described in this chapter, these exemplar primary objectives are addressed in two complementary ways, which represent the two extremes of EHR analysis: (i) performing analyses across low-rate, but massively multivariate, data sets in the EHR, incorporating genomic and demographic background data, and (ii) performing online dynamical modeling of multivariate, high-rate time series of physiological data from patients who are at risk of deterioration, such as those in acute hospital wards.

4.1.2 Themes Considered in This Chapter

We will introduce exemplar research themes within the field of intelligent electronic health systems, considering a number of case studies to demonstrate the potential of such research:

• Theme I: using the broad range of data sets within the EHR, for improving understanding of infectious disease

• Theme II: augmenting the EHR with sensor data, for continuous monitoring of high-risk ambulatory patients

• Theme III: EHRs in the developing world, for improving access to affordable healthcare

4.2 Theme I: Using the Broad Range of Data Sets within the EHR

EHRs have great potential for improving our understanding of, and ability to tackle, large-scale problems in a manner previously impossible. One such example is the emerging global threat posed by infectious disease. The chief medical officer (CMO) of the UK noted that “infectious disease is as great a threat to national security as climate change” (with more than one new life-threatening disease identified each year) and that “the challenge in identifying future threats is not the acquisition of data, but in using these huge databases with new computer science methods” [5]. Tuberculosis infects one in three of the world’s population and claims over one million lives per year; deaths from infectious disease, including methicillin-resistant Staphylococcus aureus (MRSA) and Escherichia coli, account for as many deaths as road traffic accidents in Europe each year [6]. As the CMO also observed recently, no new general classes of antibiotics have been discovered since 1990, and resistance to existing antibiotics is far outpacing our ability to produce new drugs. This high death rate is exacerbated by the fact that conventional methods for identifying pathogens take up to 6 weeks to perform. There is, therefore, an urgent need to improve our ability to fight infectious disease, and it is in this respect that EHRs can play a key role.

EHRs provide the data required to develop predictive models that, once validated, can be used in clinical support systems (i.e., an intelligent EHR) to provide real-time information for clinicians. Such models will also inform our scientific understanding of patient response to bacterial infection and appropriate treatment strategies. The benefit of EHRs will only be further compounded when gene-sequencing platforms become more widely available for routine hospital sequencing of bacterial specimens [6]. Sequencing the bacterial genome provides immediate benefit to the patient by enabling quick identification of the bacterial species and known mechanisms of drug resistance, as well as enabling surveillance of hospital-acquired infections. Combining both genomic and patient-based data sources from the EHR will allow for prediction of changes to bacterial virulence, patient risk, drug resistance, etc. This interplay is illustrated in Figure 4.1.

Whole-genome sequencing (WGS), together with information typically contained in the EHR, such as patient admission, length of stay, and movement within the hospital, has already been used to investigate the spread of infectious outbreaks within the hospital [7]. Tools such as phylogenetic tree-building, which maps the relationship between bacterial isolates based upon their genetic differences, can illustrate whether cases of infection are most likely being spread within the hospital or if they came from outside the community. Such analysis is typically retrospective in nature. Several recent proof-of-concept studies have also illustrated the potential of using WGS to identify known resistance-conferring mechanisms within the bacterial genome [8–9]. This approach has been shown to have very competitive performance for detecting resistance with a minimal number of false alarms, as assessed against gold-standard phenotypic techniques.

FIGURE 4.1 Illustration of the feedback process implicit in using an intelligent EHR system. The EHR provides the data required to develop predictive algorithms (with outputs shown in circles), which can then be used as clinical support tools while continuing to be modified as new data are obtained. (Diagram labels: Electronic health record; Bacterial genome sequencing; Predictive algorithms; Drug susceptibility; Bacterial virulence; Detect rise of new, virulent strain; Assess spread of infection; Patient risk scores.)

Early prospective prediction of changes to bacterial virulence also becomes possible through the use of intelligent EHR systems. As described in Section 4.1, EHR data sets also typically contain information such as patient laboratory test results and medications prescribed. Using machine learning approaches, these disparate sources of information can be combined to form an underlying model of “normal” patient characteristics during a bacterial infection. Subtle departures from a normal trajectory, which may be difficult for clinicians to identify in the course of routine care, can be detected through such modeling techniques. This can allow the early detection of unusually virulent strains of bacteria and necessary escalations in treatment and infection containment. An applicable branch of machine learning for constructing such models of normality is that of novelty detection, particularly when examples of abnormality are few or when abnormality is poorly understood [10–11].
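The idea of a model of normality can be illustrated with a deliberately simple sketch. It assumes, hypothetically, that each patient is summarized by a short vector of laboratory values, and it models normality with independent Gaussians per feature; this is one elementary choice and is not claimed to be the approach of [10–11]. The variable names, example values, and threshold rule are all illustrative assumptions.

```python
from statistics import mean, stdev

def fit_normal_model(normal_rows):
    """Estimate per-feature mean and standard deviation from 'normal' examples only."""
    columns = list(zip(*normal_rows))
    return [(mean(c), stdev(c)) for c in columns]

def novelty_score(model, row):
    """Sum of squared z-scores; large values indicate departure from normality."""
    return sum(((v - m) / s) ** 2 for v, (m, s) in zip(row, model))

def fit_threshold(model, normal_rows):
    """Illustrative rule: flag anything scoring above every training example."""
    return max(novelty_score(model, r) for r in normal_rows)

# Hypothetical training data: (white cell count, C-reactive protein) for typical infections.
normal = [(11.0, 40.0), (12.5, 55.0), (10.2, 38.0), (13.1, 60.0), (11.8, 47.0)]
model = fit_normal_model(normal)
threshold = fit_threshold(model, normal)

# A new patient whose values lie far from the training examples.
new_patient = (19.5, 140.0)
is_novel = novelty_score(model, new_patient) > threshold
```

Only examples of normality are needed for training, which is why this family of methods suits settings in which abnormal cases are rare.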

Patient risk scores can also be determined through similar modeling techniques. The association between a bacterial genotype and its virulence can be established using supervised machine learning [12]; this type of analysis will be described in the case study to follow. Once a patient has acquired a bacterial infection and the bacterial genome has been obtained, the predicted virulence level could be used to produce a baseline patient risk score. As a patient remains in the hospital and more clinical information, such as additional lab test results, becomes available, the risk score can be updated to provide a real-time indicator of his or her infection severity.

4.2.1 Case Study: Prediction of Bacterial Drug Susceptibility

We illustrate how information from an intelligent EHR system, including bacterial genome sequencing, can be used to develop a predictive algorithm for the bacterial phenotype. We will examine Mycobacterium tuberculosis (MTB), a bacterial species with a very stable genome. As a report from the World Health Organization warns, bacterial drug resistance is becoming an imminent threat to global healthcare systems [13], making appropriate antibiotic prescription essential. While genetic mechanisms underlying resistance are well-known for some antibiotics, others are less well understood. For example, isoniazid is a drug with very well-characterized resistance mechanisms, and yet it is estimated that up to 20% of isoniazid drug resistance remains unexplained by known genotypic factors [14]. Furthermore, determining drug resistance using conventional phenotypic techniques (i.e., growing samples in a culture impregnated with antibiotics) can take up to 2 months for slow-growing bacteria such as MTB. Genome sequencing, with results available in the range of a few hours to a few days, offers a much faster alternative. Algorithms that can quickly and automatically process the entire sequenced bacterial genome to produce a resistance prediction would be very useful in clinical practice and are the focus of much current research. We therefore consider drug resistance as our outcome phenotype of interest in the following example.

There may be interactions between different genomic mutations, and machine learning provides an appropriate and principled method for examining the relationship between the genetic pattern of variation and the associated drug-resistance profile. Here, we explain how features may be generated from the genome and provide examples of machine learning classification algorithms that underpin intelligent EHRs in this context. We will describe an example set of data containing 1800 MTB isolates from the Midlands, UK. Figure 4.2 illustrates the steps involved in EHR-based analysis, which are described in detail below.

4.2.2 Features

With genomic data being increasingly present in EHRs, it is important for data scientists to understand the provenance and process by which such data are obtained, and to gain an appreciation of the vocabulary used in this discipline. A “read” of a genomic sequence will typically contain thousands of unaligned contiguous regions of DNA (also termed contigs). As MTB has a very stable genome, these contigs can be aligned to a reference MTB sequence (such as the H37Rv reference), using software such as Stampy [15]. Following this alignment, bases can be determined (“called”) using SAMtools [16] or similar software. There will be some regions in the genome that are sequenced with poor quality; this may be due to a small number of reads available, or repeated regions of DNA in the genome. Such bases are represented in the sequencing output as “null calls.”

As our goal in this example is to link genomic variation to the in vitro phenotypic response of the bacteria to antibacterial drugs, as described in the EHR, it makes sense to extract all of the sites where a single bacterial isolate differs from the reference. These differences between the isolate and the reference genome are possible mechanisms for resistance, and these variant sites are termed single-nucleotide polymorphisms (SNPs). As a first step in this example, we focus on 23 genes that are suspected of being involved in antibiotic resistance. For any given MTB strain, there is an average of 5 SNPs across these 23 genes. Many of these are “private” SNPs, meaning that they are not shared by other isolates in the sample. In this EHR example, with a data set of 1800 isolates, there were 1621 variant sites in total, including null calls. Removing null calls that most likely corresponded to the reference sequence (based upon the majority proportion of reads when the call is uncertain), and removing private mutations, resulted in 301 variant sites remaining for analysis. These SNPs will compose our feature set for the subsequent example.
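The extraction of variant sites and the removal of private SNPs described above can be sketched as follows. The reference fragment, isolate names, and sequences are invented for illustration, and null calls are handled in the simplest possible way (always treated as matching the reference) rather than by the read-proportion rule described in the text.

```python
from collections import Counter

REFERENCE = "ACGTACGTAC"  # hypothetical reference fragment

# Hypothetical aligned base calls per isolate; "N" marks a null call.
isolates = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACTTACGTAC",
    "iso3": "ACTTACGAAC",
    "iso4": "ANGTACGTAC",
}

def variant_sites(sequence, reference):
    """Return the set of (position, base) pairs that differ from the reference.
    Null calls are skipped, i.e. assumed here to match the reference."""
    return {(i, b) for i, (b, r) in enumerate(zip(sequence, reference))
            if b != r and b != "N"}

def build_features(isolates, reference):
    """Build a binary SNP feature vector per isolate, dropping private SNPs."""
    per_isolate = {name: variant_sites(seq, reference)
                   for name, seq in isolates.items()}
    counts = Counter(site for sites in per_isolate.values() for site in sites)
    # Keep only SNPs observed in more than one isolate.
    shared = sorted(site for site, n in counts.items() if n > 1)
    matrix = {name: [1 if site in sites else 0 for site in shared]
              for name, sites in per_isolate.items()}
    return shared, matrix

shared_sites, X = build_features(isolates, REFERENCE)
```

In this toy example, the G-to-T change at position 2 is shared by two isolates and is retained, whereas the change at position 7 occurs in one isolate only and is discarded as private.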

FIGURE 4.2 Illustration of the analysis steps that are involved in the process of predicting a bacterial phenotype from genotype data. The main text describes genomic terms used (e.g., contig). (Diagram labels: Sequence bacterial sample; .fastq file; Map contigs to reference (e.g., Stampy); .bam file; Make base calls (e.g., SAMtools); .vcf and .fasta files; Extract variant sites; Perform feature selection; Optimize classifier (e.g., SVM, LR, RF); Predict in unseen data; Grow bacteria in culture to obtain resistance profile.)

We note in passing that, while MTB represents one extreme as a stable, clonal bacterium (in the sense that one generation looks much like the last), other examples such as E. coli and Klebsiella pneumoniae are much more promiscuous. These bacteria tend to gain and lose plasmids, and therefore genes, very easily between successive generations. With such organisms, we might choose, therefore, a de novo assembly of the sequence, rather than comparison against a reference genome. Tools such as Velvet [17] may be used to perform this de novo assembly for entities with such widely varying genomes. For these bacteria, alternative feature options, such as using gene presence/absence, are often more appropriate: gene presence/absence involves the alignment of genes to known bacterial genes by using the Basic Local Alignment Search Tool (BLAST) [18]. Alternatively, a k-mer approach may be used, which steps through the sequence, defining each set of k bases (k is usually between 20 and 40) to be a feature. The k-mer approach is beneficial in that it should capture both gene presence/absence and SNP changes in the sequence. However, it leads to a very large number of features, which can be difficult to interpret. The k-mer approach has been used within a genome-wide association study (GWAS) context [19]. As presented here, SNPs, k-mers, and genes are all binary features.
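As a rough illustration of the k-mer approach, the sketch below steps through each sequence one base at a time and records which length-k substrings are present. The sequences and the small value of k are chosen only for readability; function names are illustrative.

```python
def kmers(sequence, k):
    """Set of all length-k substrings obtained by stepping through the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def kmer_features(sequences, k):
    """Binary presence/absence matrix over the k-mers seen in any sequence."""
    kmer_sets = [kmers(s, k) for s in sequences]
    vocab = sorted(set().union(*kmer_sets))
    rows = [[1 if km in ks else 0 for km in vocab] for ks in kmer_sets]
    return vocab, rows

# Toy sequences with k = 3; the text suggests k between 20 and 40 in practice.
vocab, rows = kmer_features(["ACGTAC", "ACGAAC"], k=3)
```

A single-base difference between the two toy sequences changes three of the four k-mers in each, which shows both why k-mers capture SNP-level variation and why the number of features grows so quickly.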

4.2.3 Supervised Learning Algorithms for the EHR

From the EHR, we can also obtain a set of phenotypic labels defining whether our isolates are resistant or susceptible to first-line tuberculosis antibiotic drugs. MTB phenotypic testing typically involves an initial screen for resistance by growing the bacteria (obtained from patient sputum) in liquid culture mycobacterium growth indicator tubes (MGITs). Positive samples are confirmed by growing the bacteria on Löwenstein–Jensen (LJ) media (LJ slopes) impregnated with antibiotics. This presents us with a supervised classification task: we have a set of features (our SNPs) to describe our bacteria and a set of labels (our phenotypic test results from the EHR) that we wish to predict. Supervised machine learning classification algorithms such as the support vector machine (SVM) and random forests (RFs) have been used for performing this classification.

We will consider a subset of N isolates to represent a training set of examples x_1, …, x_N with labels ℓ_1, …, ℓ_N, ℓ_i ∈ {−1, 1}, with 1 indicating drug resistance for a given drug and −1 indicating susceptibility. Each example x_i is composed of a vector of J binary features indicating the presence (x_ij = 1) or absence (x_ij = 0) of a given SNP.

The SVM is a classification algorithm that attempts to separate two groups by the widest margin possible. The hyperplane defining this separation is determined by maximizing the distance between it and the closest training points from each class, which are termed the support vectors. Through the so-called kernel trick, an SVM can be used to project data into a high-dimensional space, in which the classes may be linearly separable. Common kernels include the linear kernel and the radial basis function (i.e., Gaussian distance) kernels. We defer to Bishop [20] for a detailed description of this well-understood algorithm.
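For reference, the maximum-margin idea can be written in its standard hard-margin form, using the training examples and labels defined above. The weight vector w, offset b, feature map \phi, and kernel width parameter \gamma are notation introduced here for illustration; practical implementations usually use a soft-margin variant that tolerates some misclassified points.

```latex
% Hard-margin SVM: the margin width is 2 / \|w\|, so maximizing the margin
% is equivalent to minimizing \|w\|^2 subject to correct classification.
\min_{w,\, b} \; \frac{1}{2} \|w\|^{2}
\quad \text{subject to} \quad
\ell_i \left( w^{\top} \phi(x_i) + b \right) \ge 1, \qquad i = 1, \ldots, N

% Kernel trick: only inner products in the projected space are required.
k(x, x') = \phi(x)^{\top} \phi(x')

% Linear and radial basis function kernels.
k_{\mathrm{lin}}(x, x') = x^{\top} x', \qquad
k_{\mathrm{RBF}}(x, x') = \exp\!\left( -\gamma \, \|x - x'\|^{2} \right)
```

The support vectors are the training points for which the constraint holds with equality.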

RFs are ensemble learners, which means that the random forest prediction is based upon the votes of a committee of “weak” base learners. The base learner for an RF is a decision tree. Each of the decision trees in the random forest is formed from a (random) subset of features and a subset of examples from the data set. After all of the trees have been built, the classifier’s prediction is based upon majority voting of the trees. For problems involving genomic loci as features, building 40–400 trees and using a random selection of half of the features have been found to be useful starting points [21].
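The committee-and-vote structure can be sketched in a few lines. To keep the example short, each base learner here is a depth-1 tree (a "stump") on a single binary SNP, and the subset of examples is drawn by bootstrap sampling; both are simplifying assumptions, and a practical random forest would grow deeper trees. Function names and defaults are illustrative.

```python
import random

def train_stump(X, y, features):
    """Depth-1 tree: choose the feature and orientation with the fewest errors."""
    best = None
    for j in features:
        for sign in (1, -1):  # sign = 1 means "SNP present predicts resistant (+1)"
            errors = sum(1 for xi, yi in zip(X, y)
                         if (sign if xi[j] == 1 else -sign) != yi)
            if best is None or errors < best[0]:
                best = (errors, j, sign)
    _, j, sign = best
    return j, sign

def train_forest(X, y, n_trees=101, feature_fraction=0.5, seed=0):
    """Train each stump on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    m = max(1, int(feature_fraction * n_features))
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
        features = rng.sample(range(n_features), m)       # random feature subset
        forest.append(train_stump([X[i] for i in rows],
                                  [y[i] for i in rows], features))
    return forest

def predict_forest(forest, x):
    """Majority vote of the stumps; ties are broken in favour of +1."""
    votes = sum(sign if x[j] == 1 else -sign for j, sign in forest)
    return 1 if votes >= 0 else -1
```

An odd number of trees is used so that the vote cannot tie; the default of 101 trees and half of the features follows the starting points quoted in the text.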

There are many additional types of classifiers that have been used for EHR-based analysis in this field. For instance, logistic regression is a linear discriminative classifier that provides relatively easily interpretable weightings of features, and which is a commonly used tool in conventional medical statistics. The Bayesian product-of-marginals (BPM) model is a generative classifier that assumes that the input features are independent, conditional upon the class. Although this is a strong assumption, a fully Bayesian treatment of BPM also provides a probabilistic distribution over the probability of each SNP feature belonging to each class. It is often beneficial to compare the predictive performance across these different classifiers to understand how well the assumptions of each (e.g., linear combinations of features and independence of the features) are substantiated in the data. More details concerning such algorithms and their assumptions can be found in Goldstein et al. [21].
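The conditional-independence assumption can be made concrete with a small sketch for binary SNP features. It assumes a Beta(a, b) prior on the probability of each SNP within each class and, for brevity, uses only the posterior mean of that probability; a fully Bayesian treatment would retain the whole posterior distribution. The function names and the toy data in the usage note are illustrative.

```python
from math import log

def fit_bpm(X, y, a=1.0, b=1.0):
    """Per class: class prior and posterior-mean P(feature_j = 1 | class)
    under a Beta(a, b) prior on each feature probability."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        theta = [(sum(col) + a) / (n + a + b) for col in zip(*rows)]
        model[c] = (n / len(y), theta)
    return model

def log_joint(model, x, c):
    """log P(class) plus the sum of independent per-feature log-likelihoods."""
    prior, theta = model[c]
    return log(prior) + sum(log(t) if v == 1 else log(1.0 - t)
                            for v, t in zip(x, theta))

def predict_bpm(model, x):
    """Return the class with the highest joint log-probability."""
    return max(model, key=lambda c: log_joint(model, x, c))
```

For example, with `X = [[1, 0], [1, 1], [0, 0], [0, 1]]` and `y = [1, 1, -1, -1]`, the first SNP receives probability 0.75 in the resistant class and 0.25 in the susceptible class, while the second SNP is uninformative at 0.5 in both.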

4.2.4 Feature Selection

Using all variant sites across the genome, or a subset of the genome, results in a large number of features. Many of these SNPs may be irrelevant to the classification problem at hand. It may, therefore, be desirable to perform some form of feature selection to obtain a smaller, more parsimonious feature set, which often results in better-performing classifiers. Preprocessing steps that are often performed in such studies include removing features found within fewer than 5% of the isolates in the sample and removing features that appear to be in linkage disequilibrium (i.e., they co-occur in samples more often than would be expected by chance) [22–23].

Following any such preprocessing steps, there are three main categories of feature selection methods: filters, wrappers, and embedded methods. Filtering methods use criteria to select features, an independent step that is taken before performing classification. An example of a filtering method is the identification (and removal) of SNPs that are not significantly associated with the outcome of interest (for instance, using a chi-squared test and an adjusted p value for multiple comparisons). Wrapper methods use the machine learning algorithm itself to select relevant variables. A common example is SVM-based recursive feature elimination, in which a linear SVM is trained on a set of features and those that have the lowest weightings are removed, and the process repeats until predictive performance decreases by some margin.
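A filter of the kind described above might look like the following sketch, which tests each binary SNP against the resistance label with a Pearson chi-squared test on the 2×2 contingency table. The Bonferroni correction is used here as one possible adjustment for multiple comparisons, and no continuity correction is applied; both are assumptions made for brevity.

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]
    (rows: SNP present/absent; columns: resistant/susceptible)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def p_value_1dof(chi2):
    """Upper-tail probability of a chi-squared variable with one degree of freedom."""
    return erfc(sqrt(chi2 / 2.0))

def filter_snps(X, y, alpha=0.05):
    """Return indices of SNPs significantly associated with the label."""
    n_features = len(X[0])
    keep = []
    for j in range(n_features):
        a = sum(1 for x, l in zip(X, y) if x[j] == 1 and l == 1)
        b = sum(1 for x, l in zip(X, y) if x[j] == 1 and l == -1)
        c = sum(1 for x, l in zip(X, y) if x[j] == 0 and l == 1)
        d = sum(1 for x, l in zip(X, y) if x[j] == 0 and l == -1)
        # Bonferroni adjustment: compare against alpha / number of tests.
        if p_value_1dof(chi2_2x2(a, b, c, d)) < alpha / n_features:
            keep.append(j)
    return keep
```

Because the filter examines one SNP at a time, it cannot detect features that matter only in combination, which is one motivation for the wrapper and embedded methods.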

Embedded methods refer to classifiers that have some sort of automatic relevance determination (ARD) for features incorporated within the classification algorithm itself. For instance, the least absolute shrinkage and selection operator (LASSO) regularization method for logistic regression shrinks feature weightings down to zero if they are not found to be important in determining a classification. From a Bayesian perspective, this is equivalent to putting a zero-mean Laplacian prior on the feature weightings, meaning that the prior assumption is that the feature is not important until the training data show otherwise [24].
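The equivalence between LASSO and a Laplacian prior can be seen by writing out both objectives for logistic regression with labels \ell_i \in \{-1, 1\}. The weight vector w, regularization strength \lambda, and prior scale b are notation introduced here for illustration, and the intercept term is omitted for brevity.

```latex
% LASSO-regularized logistic regression
\hat{w}_{\mathrm{LASSO}}
  = \arg\min_{w} \; \sum_{i=1}^{N} \log\!\left( 1 + e^{-\ell_i\, w^{\top} x_i} \right)
    + \lambda \sum_{j=1}^{J} |w_j|

% Zero-mean Laplacian prior with scale b on each weight
p(w_j) = \frac{1}{2b} \exp\!\left( -\frac{|w_j|}{b} \right)

% Maximum a posteriori estimate: negative log-likelihood minus log-prior
% (constants that do not depend on w are dropped)
\hat{w}_{\mathrm{MAP}}
  = \arg\min_{w} \; \sum_{i=1}^{N} \log\!\left( 1 + e^{-\ell_i\, w^{\top} x_i} \right)
    + \frac{1}{b} \sum_{j=1}^{J} |w_j|
```

The two objectives coincide when \lambda = 1/b, so a narrower prior (smaller b) corresponds to stronger shrinkage toward zero.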

Feature selection becomes even more important when moving from an initial subset of genes to looking at SNPs found across the entire genome, at which point the number of features begins to require increasingly prohibitive computational resources.

4.2.5 Generalization

As in any machine learning setting, it is important to avoid overfitting the learned parameters to the training data set. Performance is commonly assessed by training on a subset of data (e.g., 80%) and testing on the remainder. Parameter tuning (as is required for SVM classifiers) can be accomplished by performing a grid search over the range of expected parameter values within cross-validation folds of the training set. Alternatively, Bayesian methods (as will be discussed in Section 4.3) often incorporate penalty terms within the training procedure that aim to avoid overfitting.
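The protocol described above (a held-out test set, with parameter tuning confined to cross-validation folds of the training set) can be sketched as follows. The scoring callback `evaluate` is a hypothetical placeholder for training a classifier with a given parameter on the training indices and returning its score on the validation indices.

```python
import random

def train_test_split(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and hold out a fraction for final testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(test_fraction * n))
    return idx[n_test:], idx[:n_test]

def k_folds(indices, k):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def grid_search(train_idx, param_grid, evaluate, k=5):
    """Return the parameter value with the best mean validation score.
    `evaluate(param, train, valid)` is a user-supplied scoring callback."""
    def mean_score(param):
        scores = [evaluate(param, tr, va) for tr, va in k_folds(train_idx, k)]
        return sum(scores) / len(scores)
    return max(param_grid, key=mean_score)
```

The test indices are never passed to `grid_search`, so the final performance estimate is not influenced by the parameter tuning.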

4.2.6 Summary of Theme I

Machine learning techniques can be used to create predictive classifiers for drug resistance based upon data contained within the EHR, including genomic data. A machine learning approach provides a predictive outcome, which is in contrast to traditional approaches, which are designed primarily to find genomic variants that have strong linear associations with the desired phenotype. Machine learning approaches also have the capability to identify important, possibly causative, variants through feature selection methods, although the interpretability of such weightings typically depends upon the type of classifier used. Furthermore, machine learning classifiers are well-equipped to handle nonlinear interactions between features. In the context of genomic data within the EHR, in which complex regulatory mechanisms are biologically plausible, this is an advantageous approach.

As this case study has illustrated, bacterial phenotype profiles of resistance from the EHR can be combined with the bacterial genome sequence to develop predictive algorithms for drug resistance. Once established, such an algorithm would ideally be continually updated as new data are collected; this is possible through automated EHR systems. Similar methods can be employed to predict other bacterial phenotypes, such as virulence. Ongoing research in this active field of EHR-based work will combine these phenotypes with clinical data to provide patient risk scores and identify the rise of new, more virulent strains.

4.3 Theme II: Augmenting the EHR with Sensor Data

As we have described in Section 4.1, the increasing use of electronic health records in healthcare systems, especially in hospitals, results in the acquisition and storage of large quantities of patient-confidential data [25]. Typically, the size and heterogeneity of these data mean that only elementary analysis is undertaken in most existing systems. Furthermore, there is a trend toward augmenting the EHR with diagnostic data from point-of-care devices, and physiological measurements (such as blood pressure, heart rate, and temperature) acquired from EHR-compatible medical devices. Automated methods for modeling and analyzing the data in these augmented EHRs are urgently required, such that clinicians may be provided with the results of inference for decision support. This theme introduces a representation of trajectories for the time-series data in the EHR, using Gaussian process (GP) regression, which may be used for the recognition of normal and abnormal patterns by generating a trajectory that provides representative physiological trends, even though the training examples may be unevenly sampled and noisy. The latter are key factors that must be dealt with in the analysis of real EHR data and which Bayesian systems, in particular, are well-equipped to address.

4.3.1 Case Study: Early-Warning Systems

An estimated 20,000 patients have unplanned admissions to the ICU in the UK each year, which could be avoided if physiological deterioration was identified early [26]. Such patients are at a high risk of morbidity and mortality (and have a 40% higher mortality rate than planned ICU admissions). The delay in detection of physiological deterioration is exacerbated by the fact that acute patients outside the ICU typically have their physiological data observed every 2–4 h [27]. There is a need for reliable, continuous data analysis between nurse observations, to provide early warning of deterioration and improve outcomes. Existing techniques [28] are limited by (i) not taking into account the dynamics of the time series of the physiological data in the EHR and (ii) comparing all patients (regardless of demographics or physiology) to a model constructed from a global population. This theme will illustrate the use of patient-specific probabilistic time-series analysis by using EHR data. Our description will cover Bayesian nonparametric processes, which offer a robust and principled framework for performing inference in the presence of incomplete and noisy data, as is typical for data within the EHR.

We introduce this theme with a framework for probabilistic analysis of time-series data within the EHR, which may be used for the functional characterization of vital-signs trajectories; that is, we will treat the time series within the EHR as being whole functions, rather than sequences of individual data points. We will demonstrate the utility of this approach by using Gaussian process models for discovering clusters in the functional data in an unsupervised manner, such that “prototype functions” corresponding to known and unknown modes of physiological behavior of postoperative patients may be revealed in the EHR. We demonstrate that our approach may be used to discriminate between “abnormal” trajectories corresponding to patients who deteriorate physiologically and are admitted to a higher level of care and those belonging to patients with a normal recovery. Such systems are immediately useful for providing early warning of deterioration with EHR data.

4.3.2 Estimating Vital Signs with Probabilistic Models

EHRs augmented with high-rate waveform data from patient-worn sensors are becoming possible with the increasing prevalence of high-bandwidth wireless networks in hospitals and other healthcare environments. For example, the ECG is now available from disposable “sticking plaster” patches and may then be transmitted to a central EHR system, while the photoplethysmogram (PPG) is available from lightweight mobile sensors [29]. Both sensors typically provide point estimates of a number of derived physiological variables, such as respiratory rate (one of the key indicators of impending physiological derangement), because both the ECG and PPG are modulated in several ways by the respiratory process [30]. However, the resulting estimates have no probabilistic interpretation and are typically considered to be frequently artifactual, limiting the efficacy of their use when combined with other data in the EHR, which is our ultimate goal in this theme.

Taking a probabilistic approach to estimating the vital signs, we can assume the modulation of these waveforms to take the form of a Gaussian process (GP) [31–32]. We consider the regression model y = f(x) + ε, which expresses a dependent variable y in terms of an independent variable x via a latent function f(x) and a noise term ε ~ N(0, σ²). The latent function f is described by a probability distribution over functions, f(x) ~ GP(m(x), k(x, x′)), where m(x) is the mean function of the distribution and k is a covariance function which describes the coupling between two values of the independent variable as a function of the (kernel) distance between them [33]. A process is a GP if the joint distribution of the output variable (e.g., respiratory rate) at different time points is multivariate Gaussian.*

* Note that this is not assuming that the time series of, for example, heart-rate values are drawn from a Gaussian distribution—only that the joint distribution over all time points is Gaussian. The practical consequence of this simplifying assumption is that the predictive distribution of the heart rates at any time point is univariate Gaussian, but where we note that the mean and variance of that Gaussian may vary with time.

The nature of the GP is such that, conditional on observed data, predictions can be made about the function values y* at any “test” location of the index set x*. We hereafter assume that the index set corresponds to time.

The posterior density for a test point x* is Gaussian,

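$$ y_* \mid x_*, \mathbf{x}, \mathbf{y} \sim \mathcal{N}\left( \bar{f}_*, \operatorname{var}[f_*] \right), \tag{4.1} $$

where the mean and variance are given by

$$ \bar{f}_* = k(\mathbf{x}, x_*)^{\mathsf{T}} \, k(\mathbf{x}, \mathbf{x})^{-1} \, \mathbf{y}, \tag{4.2} $$

$$ \operatorname{var}[f_*] = k(x_*, x_*) - k(\mathbf{x}, x_*)^{\mathsf{T}} \, k(\mathbf{x}, \mathbf{x})^{-1} \, k(\mathbf{x}, x_*), \tag{4.3} $$

respectively. Here, we have assumed for simplicity that the mean function is zero, which is a common modeling choice.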
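The posterior equations above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the authors' implementation: the squared-exponential kernel, the function names, and the small `noise_var` term added to the diagonal of k(x, x) (to account for the noise term in the regression model and to keep the linear solve stable) are all assumptions made for this example.

```python
import math

def sq_exp_kernel(a, b, amplitude=1.0, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    r = a - b
    return amplitude ** 2 * math.exp(-r * r / (2.0 * length_scale ** 2))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        s = sum(M[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def gp_posterior(x_train, y_train, x_test, kernel, noise_var=1e-6):
    """Zero-mean GP posterior mean and variance at one test input (Eqs. 4.2, 4.3)."""
    n = len(x_train)
    # k(x, x), with the assumed noise variance added on the diagonal
    K = [[kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [kernel(xi, x_test) for xi in x_train]   # k(x, x*)
    alpha = solve(K, y_train)                         # k(x, x)^{-1} y
    v = solve(K, k_star)                              # k(x, x)^{-1} k(x, x*)
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = kernel(x_test, x_test) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var
```

Calling `gp_posterior([0.0, 1.0, 2.0], [0.5, -0.2, 0.3], 1.0, sq_exp_kernel)` returns a mean close to the observed value at that time and a variance close to zero, whereas a test time far from all observations returns the prior (zero mean, variance equal to the squared amplitude).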

The covariance function encodes our assumptions concerning the structure of the time series that we wish to model. There exists a large class of well-suited covariance functions, many of which are parameterized by a length-scale parameter (which determines the typical timescale over which the time series varies) and an amplitude (which determines the typical amplitude of deviation from the mean).

A key advantage of the Bayesian framework is our ability to incorporate the significant quantity of prior clinical knowledge that we may have. In the case of estimating respiratory rate, our prior knowledge of the periodicity of the respiratory effect (in this case, the range of values of respiratory rate that are feasible for a patient) and the rate at which respiratory rate can be expected to change may be encoded within an appropriate covariance function,

$$ k(r) = \sigma_0^2 \exp\left\{ -\frac{\sin^2\left[ (2\pi / P_L)\, r \right]}{2\lambda^2} \right\}, \tag{4.4} $$

in which the hyperparameters σ0 and λ of the Bayesian model give the amplitude and length scale of the latent respiratory function (where r is the Euclidean distance between two time points of the input waveform). PL is the length of the period and is the key parameter for estimation of the respiratory rate. Taking a fully Bayesian approach, we assume the log posterior distribution of the hyperparameters to be multivariate Gaussian, allowing us to arrive at a distribution over PL and, thus, obtain a fully probabilistic estimate of the respiratory rate [34]. An example is shown in Figure 4.3, which shows an ECG time series, obtained from wearable sensors connected to an EHR and where the amplitude modulation caused by respiratory rate is explicitly modeled using a Gaussian process.
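As a small illustration of how the periodic covariance function of Equation 4.4 behaves, the sketch below evaluates it for a given time separation. The function and argument names are hypothetical and chosen for this example only; the formula follows Equation 4.4 as written.

```python
import math

def periodic_kernel(r, amplitude, length_scale, period):
    """Periodic covariance of Eq. (4.4) for a time separation r.

    amplitude    -> sigma_0
    length_scale -> lambda
    period       -> P_L (same time units as r)
    """
    s = math.sin(2.0 * math.pi * r / period)
    return amplitude ** 2 * math.exp(-(s * s) / (2.0 * length_scale ** 2))
```

Two time points separated by a whole number of periods are as strongly coupled as a point is with itself, so `periodic_kernel(0.0, 2.0, 1.0, 4.0)` and `periodic_kernel(4.0, 2.0, 1.0, 4.0)` both evaluate to the squared amplitude (up to floating-point rounding). This is what allows prior knowledge about the repeating respiratory modulation to be encoded in the model.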

The resulting probabilistic estimates of the vital signs may subsequently be used by further EHR-based models that fuse multiple vital signs with other data from the EHR, as described below.

4.3.3 Learning Data Trajectories

With probabilistic estimates of physiological data obtained where possible, as described above, we can subsequently combine these data with other time series available in the EHR (such as manual clinical observations of the vital signs) by describing a Gaussian process over a collection of the variables. We have previously proposed an extension of extreme value statistics (which is typically used to estimate the distribution of extrema for uni- or bivariate data spaces) to the infinite-dimensional case of functional data. We first construct a Gaussian process with a squared-exponential covariance function, the hyperparameters of which are optimized by maximizing the joint likelihood of the training data, where the training data are examples of “normal” time series. We can then define an “extreme function distribution” over the functional space corresponding to that Gaussian process [35], which effectively allows us to determine if a given function (time series) was generated from the normal model or not according to some given threshold probability. We have shown that time-series EHR data can be classified in this manner, allowing us to construct models of normality based on multivariate physiological time-series data, and then use these to detect potentially abnormal time series.

FIGURE 4.3 An example of the probabilistic estimation of the vital signs from waveform data in an augmented EHR in which the respiration process modulates the ECG (dark grey solid lines). The mean function of the Gaussian process (black dashed line) with a 2σ interval around the mean (light grey and shaded area) is superimposed over the peaks of the ECG (asterisks), which describe the respiratory waveform. One of the governing hyperparameters of the GP model used in this figure is PL, the respiratory period (which determines the respiratory rate), over which we have a distribution describing our confidence in the value of respiratory rate provided. Axes: ECG signal against time (secs).

According to physiological understanding and results from previous analyses of vital-signs data [36], two dominant length scales are apparent in the EHR data: the first corresponds to physiological changes from one day to the next and the second is associated with the periodicity of the data, known as the circadian rhythm, i.e., the variability between daytimes and nighttimes. Analysis shows that these effects are additive [36] and, hence, can be modeled by a sum of two covariance functions. Denoting r = ||xp − xq|| as the Euclidean distance between two values xp and xq of the independent variable, we model day-to-day variations by using a squared-exponential kernel,

$$ k_L(r) = \sigma_L^2 \exp\left( -\frac{r^2}{2\delta_L^2} \right), \tag{4.5} $$

with length scale δL and amplitude σL. To model daily variability, we multiply a squared-exponential kernel by a periodic covariance function:

$$ k_S(r) = \sigma_S^2 \exp\left( -\frac{r^2}{2\delta_S^2} \right) \exp\left\{ -\frac{\sin^2\left[ (2\pi / P_L)\, r \right]}{2} \right\}, \tag{4.6} $$

with length scale δS, amplitude σS, and period length PL. Combining both models by using the sum of covariance functions, we obtain

k(xp, xq|θk) = kL(xp, xq|σL, δL) + kS(xp, xq|σS, δS, PL), (4.7)

where θk = {σL, σS, δL, δS, PL} refers to the hyperparameters of the final covariance function, values for which were selected using a grid search in which we maximized the marginal log likelihood [33] and where the period PL was allowed to vary between 0.25 and 1 day.
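The combined covariance function and the grid search can be sketched as follows, using only the Python standard library. This is an illustrative simplification under stated assumptions, not the authors' code: only the period PL is searched while the other hyperparameters are held at arbitrary fixed values, an observation-noise variance `noise_var` is added to the diagonal, and all function names and default values are hypothetical.

```python
import math

def k_long(r, sigma_l, delta_l):
    """Day-to-day squared-exponential term, Eq. (4.5)."""
    return sigma_l ** 2 * math.exp(-r * r / (2.0 * delta_l ** 2))

def k_short(r, sigma_s, delta_s, period):
    """Daily (periodic) term, Eq. (4.6)."""
    s = math.sin(2.0 * math.pi * r / period)
    return (sigma_s ** 2 * math.exp(-r * r / (2.0 * delta_s ** 2))
            * math.exp(-(s * s) / 2.0))

def k_total(r, theta):
    """Sum of both terms, Eq. (4.7); theta = (sigma_l, sigma_s, delta_l, delta_s, period)."""
    sigma_l, sigma_s, delta_l, delta_s, period = theta
    return k_long(r, sigma_l, delta_l) + k_short(r, sigma_s, delta_s, period)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def log_marginal_likelihood(t, y, theta, noise_var=0.01):
    """Log marginal likelihood of a zero-mean GP with covariance k_total."""
    n = len(t)
    K = [[k_total(abs(t[i] - t[j]), theta) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    # Forward substitution: solve L z = y, so that y^T K^{-1} y = z^T z.
    z = []
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return (-0.5 * sum(v * v for v in z) - 0.5 * log_det
            - 0.5 * n * math.log(2.0 * math.pi))

def grid_search_period(t, y, periods, fixed=(1.0, 1.0, 5.0, 5.0)):
    """Return the candidate period with the highest log marginal likelihood."""
    sigma_l, sigma_s, delta_l, delta_s = fixed
    scored = [(log_marginal_likelihood(t, y, (sigma_l, sigma_s, delta_l, delta_s, p)), p)
              for p in periods]
    return max(scored)[1]
```

With times expressed in days, a call such as `grid_search_period(t, y, [0.25, 0.5, 0.75, 1.0])` restricts the candidate periods to the 0.25 to 1 day range described above. A fuller search would vary all five hyperparameters on the grid.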

4.3.4 Similarity between Vital-Signs Trajectories

A separate Gaussian process may be trained for each patient’s D time-series variables by using the approach described in the preceding section, which results in D univariate trajectories (for one patient). We can now compute a similarity metric to reveal common latent behavior in patients’ trajectories. Assuming that we have a number of Gaussian process models Xk, for k = 1, …, N and where each Xk is a set of D time-series EHR data y for one of N patients, we can compare a test time series X* = (t, y), for times t, to the D Gaussian processes Xk by using a local likelihood evaluated at point i,

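$$ \ell\left( X_i^*(t_i, y_i) \mid X_k \right) = -\log \prod_{j=1}^{D} p\left( y_{ij} \mid t_i, X_k^j \right), \tag{4.8} $$

and then define a global likelihood over the local likelihoods,

$$ L_k(X^*) = n^{-1} \sum_i \left[ \ell\left( X_i^*(t_i, y_i) \mid X_k \right) \right]. \tag{4.9} $$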

Finally, a global similarity may be obtained by normalizing this global likelihood,

Sk(X*) = L */Lk(X*), (4.10)

where the numerator is the self-global likelihood (i.e., evaluated for the test time series with respect to its own GP) and where Lk(X*) appears in the denominator because it is the negative log likelihood. If the test data X* are similar to process Xk, then Sk → 1; else, Sk → 0. Finally, we can then perform hierarchical clustering by using the similarities Sk between processes for all N patients to identify clusters of time-series data that are similar.
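The similarity computation can be sketched as below. This is a hedged illustration, not the authors' implementation: each patient model is represented, purely for this example, as a list of D hypothetical callables that map a time to a Gaussian predictive (mean, variance) pair, and the predictive density in the local likelihood is assumed to be univariate Gaussian for each variable.

```python
import math

def gaussian_neg_log_pdf(y, mean, var):
    """Negative log density of y under a univariate Gaussian."""
    return 0.5 * math.log(2.0 * math.pi * var) + (y - mean) ** 2 / (2.0 * var)

def local_likelihood(t_i, y_i, model):
    """Eq. (4.8): negative log of the product over the D variables.

    The log of a product is computed as a sum of logs.
    """
    return sum(gaussian_neg_log_pdf(y_ij, *predict(t_i))
               for y_ij, predict in zip(y_i, model))

def global_likelihood(times, values, model):
    """Eq. (4.9): average of the local likelihoods over the n observations."""
    total = sum(local_likelihood(t, y, model) for t, y in zip(times, values))
    return total / len(times)

def similarity(times, values, own_model, other_model):
    """Eq. (4.10): self-global likelihood divided by the likelihood under X_k."""
    return (global_likelihood(times, values, own_model)
            / global_likelihood(times, values, other_model))
```

The ratio behaves as described (close to 1 for similar processes, close to 0 for dissimilar ones) when both negative log likelihoods are positive, which holds when the predictive variances are not very small. The resulting N × N matrix of similarities could then be passed to any standard hierarchical clustering routine.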

We evaluate our example method by using a data set containing manual observations of vital signs acquired from 100 patients who have a normal recovery from cancer surgery in the Oxford University Hospitals National Health Service (NHS) Trust. These observations comprise measurements of heart rate, systolic blood pressure, temperature, blood oxygen saturation, and breathing rate made by clinical staff every hour or every 2 hours on the days immediately following surgery (depending on the patient’s condition) and approximately every 4 hours on the last few days of the patient’s stay on the postoperative ward. These patients were discharged home after their stay on the ward (median length of stay of 10 days).

The first day of patients’ vital-signs trajectories corresponds to the day on which surgery took place. We initially selected two vital signs (systolic blood pressure and temperature), D = 2, to determine the mean vital-signs trajectory and then determine the similarity between all trajectories by using the methods described in the preceding sections. Hierarchical clustering revealed four main functional clusters, the mean of the mean functions for each of which is plotted in Figure 4.4. Table 4.1 summarizes the number of patient trajectories included in each cluster. All prototype trajectories reveal the “expected” recovery from surgery, in which blood pressure rises then stabilizes and temperature decreases. However, the range of blood pressures covered in each prototype trajectory varies from one subgroup to the other.

In order to study the influence of the other vital signs in the patient trajectories, we performed the same analysis by using all vital signs (D = 5) and determined how many patients were assigned to the same cluster found in the previous analysis. The results summarized in Table 4.1 show that most patients (approximately 70%) were assigned to the same clusters, which suggests that the physiological trajectory for normal recovery is primarily determined by the temperature and systolic blood pressure, with other vital signs contributing a secondary level of additional information.
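The “approximately 70%” figure can be checked from the Overlap column of Table 4.1, assuming the denominator is the 100 patients in the cohort described above:

```latex
\frac{8 + 12 + 26 + 24}{100} = \frac{70}{100} = 0.70
```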

Figure 4.4g and h shows the trajectory of an example patient who deteriorated and was deemed by clinicians to be sufficiently abnormal for admission to an intensive care facility 9 days after surgery. It may be seen that the physiological trajectory of this patient is different from those of the patients who had a normal surgery recovery and, therefore, has low similarity Sk to all k normal prototypes.

FIGURE 4.4 (a–f) Functional prototypes revealed by similarity Sk for systolic blood pressure (SysBP) and temperature data, plotted as temperature (°C) against SysBP (mm Hg) and time (days); (g) two-dimensional (2-D) representation of the functional prototypes and the mean function, with 95% confidence band (solid and dashed black lines), of an abnormal patient, plotted as SysBP (mm Hg) against time (days). (Continued)

4.3.5 Summary of Theme II

We have described an example method by which vital-signs data from the EHR may be used to provide improved clinical understanding of patient condition. We have also shown that it is able to discriminate normal from abnormal physiological trajectories, which may be further extended by combining it with other recently proposed methods [35]. Ongoing work involves the incorporation of models in which the distributions over the process hyperparameters are nonstationary and where data from additional sources, particularly higher-rate data from wearable sensors, may be incorporated.

FIGURE 4.4 (CONTINUED) (h) Two-dimensional (2-D) representation of the functional prototypes and the mean function, with 95% confidence band (solid and dashed black lines), of an abnormal patient, plotted as temperature (°C) against time (days).

TABLE 4.1
Number of Patients Assigned to Each Cluster

| Cluster     | D = 2 (a) | D = 5 (b) | Overlap |
|-------------|-----------|-----------|---------|
| Prototype A | 12        | 14        | 8       |
| Prototype B | 20        | 19        | 12      |
| Prototype C | 34        | 35        | 26      |
| Prototype D | 34        | 35        | 24      |

(a) Temperature and systolic blood pressure.
(b) All five vital signs.

4.4 Theme III: EHRs in the Developing World

While the use of large-scale EHRs is becoming increasingly prevalent in healthcare systems (and, in some cases, is being mandated by national legislation), intelligent EHR systems also have great potential to improve the standard of, and access to, care in the developing world. OpenMRS [37] is one example of an open-source EHR that is frequently used in research for developing regions. Frameworks such as Sana/Moca [38] have been constructed that build upon open-source EHRs to allow direct access to medical records via mobile devices. The latter may be used for connection with physiological sensors, allowing healthcare workers to record data from patients and upload it directly to open-source EHRs [39]. Such studies typically use the data to perform screening for risk of long-term conditions, such as cardiovascular disease, allowing workers with low levels of healthcare training to perform elementary assessment of patients.

4.4.1 Fusing Data from Noisy Time Series

The research challenge for this branch of work is to provide analysis techniques (building on methods described in themes I and II) that are sufficiently robust in the face of the extreme sparsity and noise present in EHRs from resource-constrained countries. Using technology developed from large-scale studies of EHRs in developed-world healthcare systems [40–42], periods of missing or artifactual data may be analyzed in a principled manner, using probabilistic analyses of the kind described in this chapter.

An example is shown in Figure 4.5, which shows physiological data stored in an EHR after being acquired by physiological sensors, where the EHR also contains manual observations made by clinicians. A 4-day interval is shown, in which the patient experiences repeated episodes of tachycardia (elevated heart rate) and frequent desaturations (decreases in SpO2). The physiological EHR data acquired via sensors have been modeled using a multitask Gaussian process or MTGP [43–44]. Figure 4.5 shows extended periods of disconnection from the physiological sensors, particularly in the middle of the interval shown, where the MTGP is able to interpolate effectively to provide a principled estimate of the missing data.

MTGPs are a natural development of the univariate GPs considered in previous sections, in which multiple time series can be considered simultaneously. The covariance function for MTGPs may be written as

kMTGP(x, x′, l, l′) = kc(l, l′) × kt(x, x′), (4.11)

where kc and kt represent the correlation between two time series l and l′ and the temporal covariance functions within a time series at times x and x′, respectively. That is, the model explicitly takes into account the dynamics of each time series, as well as explicitly modeling how the various time series covary.
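The product structure of Equation 4.11 can be illustrated with a short sketch. The representation below is an assumption made for this example: the between-series term kc is stored as a small lookup matrix `task_cov` indexed by time-series label, and the temporal term kt is taken to be a unit-amplitude squared-exponential function. Neither choice is prescribed by the text, and all names are hypothetical.

```python
import math

def k_t(x, x_prime, length_scale=1.0):
    """Temporal covariance within a time series (assumed squared-exponential)."""
    r = x - x_prime
    return math.exp(-r * r / (2.0 * length_scale ** 2))

def k_mtgp(x, x_prime, l, l_prime, task_cov):
    """Eq. (4.11): between-series term multiplied by the temporal term."""
    return task_cov[l][l_prime] * k_t(x, x_prime)

def covariance_matrix(observations, task_cov):
    """Full covariance over (time, series_index) pairs.

    The series need not be sampled at the same times, so each observation
    carries its own time stamp and series label.
    """
    return [[k_mtgp(xa, xb, la, lb, task_cov) for (xb, lb) in observations]
            for (xa, la) in observations]
```

For example, with `task_cov = [[1.0, 0.8], [0.8, 1.0]]` and observations `[(0.0, 0), (0.0, 1), (1.0, 0)]`, two different series observed at the same time have covariance 0.8. This coupling is what lets a series with missing data borrow information from the series that are available.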

The task of learning with MTGPs, therefore, becomes estimating suitable values of the various hyperparameters for the kernels kc and kt. As before with the univariate GPs described in previous sections, this can be performed by optimizing the log marginal likelihood of the hyperparameters given the training data [44]. A toolbox for performing this optimization has been implemented by the authors of this chapter and accompanies the article by Dürichen et al. [45].

These methods have been formulated for online learning [46], such that models may be sequentially updated as new data arrive in the EHR. While such models are computationally demanding and, therefore, would be best suited to run on the servers that provide the EHR itself, some computation may be performed on local devices by using techniques developed to make processing more computationally efficient [47–48].

4.5 Conclusions and Future Directions

This chapter has highlighted current research directions in the growing field of intelligent EHR systems. We have presented an overview of existing research, with an emphasis on the machine learning techniques that offer a realistic means of tackling the many challenges that exist in the analysis of realistic, disparate, and often contradictory data in typical EHRs.

FIGURE 4.5 EHR data, with a multitask Gaussian process model tracking multiple physiological time series in parallel. The correlation between the various time series has been learned by the model, which is able to cope with periods of missing data by using principled interpolation from those time series that are available at any one time. Manual observations, made by a nurse, and continuous observations, made by physiological sensors, are shown by circles and solid lines, respectively. The underlying Gaussian process is shown by dashed lines, and its uncertainty is shown by the gray shaded regions. The horizontal axis shows midnight for four consecutive days; the vertical axes show the scales for heart rate (HR), breathing rate (BR), systolic blood pressure (BP), and SpO2 (%).

While advances in machine learning, such as those described, are critical to underpinning the technical success of projects involving intelligent EHR systems, it is important to note that this is just part of the complex task involved in implementing intelligent systems within healthcare practice. A key step in producing appropriate technology for solving clinical problems is an understanding of the data that are being modeled, and close collaboration between data scientists and clinicians is, therefore, a sine qua non for large-scale biomedical projects. We have described in this chapter those systems that we have found to be suitable for incorporation of the clinical prior knowledge that makes a computational system effective: these techniques are typically Bayesian, whereby clinical information can be explicitly incorporated within the model in two ways.

Firstly, appropriate systems for performing complex analyses (such as those involved in the production of intelligent EHR systems) are often nonparametric; that is, formally, the parameter space of the model is infinite. In practice, this typically means that the number of parameters in the model can grow with the number of data observed; this is a key feature of large-scale data analysis, in which one wishes to update models through time. In healthcare systems, as in the analysis of many other complex systems, it cannot be assumed that the patient is unchanging. Therefore, models need to be able to “grow” and evolve with the changing dynamics of the patient, perhaps as a patient recovers from a condition or perhaps as a patient’s condition begins to deteriorate with time. Bayesian nonparametric methods, such as the GP, allow this growth of the model with respect to the data, while allowing us to incorporate key clinical insights in the form of (for the GP) the covariance functions that we use. We showed, in the probabilistic estimation of vital signs, for example, that our knowledge of the respiratory process led to the use of periodic covariance functions, the hyperparameter of which corresponds to the quantity that we aim to estimate: the respiration rate.

Secondly, prior clinical knowledge can be directly incorporated by specifying prior distributions over our hyperparameters. This is a “strongly Bayesian” approach, in which a potentially infinite number of values of the hyperparameter are considered and where inference is typically performed such that the distributions over the hyperparameters are optimized. These distributions over the hyperparameters are often specified using hyper-hyperparameters; for example, our distribution over the respiration rate hyperparameter PL in the case described above could be chosen to be Gaussian, which will have its own mean and variance hyper-hyperparameters. These analyses are often no longer analytically tractable, and so approximation schemes are typically used, such as sampling-based methods, or deterministic approximations such as variational Bayesian methods [20,22].
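As a sketch of what such a prior implies, suppose the prior over PL is Gaussian with mean μ_P and variance s_P² (these two symbols are hypothetical names for the hyper-hyperparameters, introduced here for illustration only) and that the remaining hyperparameters are held fixed. The posterior over the period given data y then takes the form:

```latex
p(P_L \mid \mathbf{y}) \;\propto\; p(\mathbf{y} \mid P_L)\, \mathcal{N}\!\left(P_L;\, \mu_P,\, s_P^2\right),
\qquad
\log p(P_L \mid \mathbf{y}) = \log p(\mathbf{y} \mid P_L) - \frac{(P_L - \mu_P)^2}{2 s_P^2} + \text{const}.
```

Here p(y | PL) is the marginal likelihood of the GP. Because it depends on PL through the covariance function in a nonlinear way, the normalizing constant is generally not available in closed form, which is why sampling-based or variational approximations are needed.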

Future research for intelligent EHR systems proceeds on many fronts, worldwide: researchers are currently working on principled means for the large-scale fusion of data sets across the varying data types that exist in the EHR. A key theme is the integration of genomic, proteomic, and metabolomic data with other data types; we have described a case study in this chapter, applying such methods to infectious disease. Other researchers have considered fusion of “omic” data with images [49], with applications in cancer treatment. Fusion of gene expression data, including genomic and time-series data, has been performed [50]; this study used GPs for modeling the gene expression time-series data, similar to that which could be obtained within an EHR, along with multinomial models for the categorical data sets, including discretized gene expression levels. Other machine learning techniques have also been considered for the analysis of inpatient EHR data [51], in work that also considered the incorporation of data from other sources peripheral to the EHR, such as data from health insurance providers. The potentially transformative effect of intelligent health systems has been considered as a means of both reducing the cost of healthcare and improving patient outcomes [52], both inside and outside the hospital.

As more data scientists collaborate with clinical teams, this rapidly growing field of research has the promise to deliver the analysis tools required to underpin the next generation of healthcare technologies embedded within clinical practice as “intelligent EHRs.” With global challenges to the sustainability of healthcare systems worldwide, technology has a key role to play in bringing about this new generation of tools, and of clinicians who are best placed to use and exploit them.

Acknowledgments

The authors gratefully acknowledge the support of the Centre of Excellence in Medical Engineering funded by the Wellcome Trust and Engineering and Physical Sciences Research Council (EPSRC) under Grant Number WT 088877/Z/09/Z; the National Institute for Health Research (NIHR) Biomedical Research Centre, Oxford; the Research Councils United Kingdom (RCUK) Digital Economy Programme (Oxford Centre for Doctoral Training in Healthcare Innovation); and the Fundação para a Ciência e Tecnologia (FCT). DAC was funded by a Royal Academy of Engineering Fellowship and by Balliol College, Oxford, and the Balliol Interdisciplinary Institute. KEN was supported by the Rhodes Trust.

References

1. House of Commons Health Committee, United Kingdom, “The electronic patient record, report HC 422-I,” White Paper, 2007.

2. I. Buchan, J. Winn, and C. Bishop, “A unified modeling approach to data-intensive healthcare,” The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, Technical Report, 2009.

3. Secretary of State for Health, United Kingdom, “The government response to the health committee report on the electronic patient record, report Cm 7264,” White Paper, 2007.

4. Medical Research Council, “e-Health informatics research: Securing the UK as a world leader,” http://www.mrc.ac.uk/Ourresearch/ResearchInitiatives/E-HealthInformaticsResearch/, accessed: June 2014.

5. Annual Report of the UK chief medical officer, “Infections and the rise of antimicrobial resistance,” White Paper, 2011.

6. X. Didelot, R. Bowden, D. J. Wilson, T. E. Peto, and D. W. Crook, “Transforming clinical microbiology with bacterial genome sequencing,” Nature Reviews Genetics, vol. 13, no. 9, pp. 601–612, 2012.

7. C. U. Köser, M. T. Holden, M. J. Ellington, E. J. Cartwright, N. M. Brown, A. L. Ogilvy-Stuart, L. Y. Hsu, C. Chewapreecha, N. J. Croucher, S. R. Harris et al., “Rapid whole-genome sequencing for investigation of a neonatal MRSA outbreak,” New England Journal of Medicine, vol. 366, no. 24, pp. 2267–2275, 2012.

8. N. Stoesser, E. Batty, D. Eyre, M. Morgan, D. Wyllie, C. D. O. Elias, J. Johnson, A. Walker, T. Peto, and D. Crook, “Predicting antimicrobial susceptibilities for Escherichia coli and Klebsiella pneumoniae isolates using whole genomic sequence data,” Journal of Antimicrobial Chemotherapy, vol. 68, pp. 2234–2244, 2013.

9. N. Gordon, J. Price, K. Cole, R. Everitt, M. Morgan, J. Finney, A. Kearns, B. Pichon, B. Young, D. Wilson et al., “Prediction of Staphylococcus aureus antimicrobial resistance by whole-genome sequencing,” Journal of Clinical Microbiology, vol. 52, no. 4, pp. 1182–1191, 2014.

10. L. Tarassenko, D. Clifton, P. Bannister, S. King, and D. King, “Novelty detection,” Encyclopaedia of Structural Health Monitoring, pp. 653–675, 2009.

11. M. Pimentel, D. Clifton, L. Clifton, and L. Tarassenko, “A review of novelty detection,” Signal Processing, vol. 99, pp. 215–249, 2014.

12. M. Laabei, M. Recker, J. K. Rudkin, M. Aldeljawi, Z. Gulay, T. J. Sloan, P. Williams, J. L. Endres, K. W. Bayles, P. D. Fey et al., “Predicting the virulence of MRSA from its genome sequence,” Genome Research, vol. 24, no. 5, pp. 839–849, 2014. Also available on http://genome.cshlp.org/content/early/2014/04/02/gr.165415.113.full.pdf+html. Last visited on June 6, 2015.

13. World Health Organization, “Antimicrobial resistance: Global report on surveillance 2014,” World Health Organization, 2014, http://www.who.int/drugresistance/documents/surveillancereport/en/.

14. M. H. Hazbón, M. Brimacombe, M. B. del Valle, M. Cavatore, M. I. Guerrero, M. Varma-Basil, H. Billman-Jacobe, C. Lavender, J. Fyfe, L. García-García et al., “Population genetics study of isoniazid resistance mutations and evolution of multidrug-resistant Mycobacterium tuberculosis,” Antimicrobial Agents and Chemotherapy, vol. 50, no. 8, pp. 2640–2649, 2006.

15. G. Lunter and M. Goodson, “Stampy: A statistical algorithm for sensitive and fast mapping of illumina sequence reads,” Genome Research, vol. 21, no. 6, pp. 936–939, 2011.

16. H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, and R. Durbin, “The sequence alignment/map (SAM) format and SAMtools,” Bioinformatics, vol. 25, pp. 2078–2079, 2009.

17. D. R. Zerbino and E. Birney, “Velvet: Algorithms for de novo short read assembly using de Bruijn graphs,” Genome Research, vol. 18, no. 5, pp. 821–829, 2008.

18. Z. Zhang, S. Schwartz, L. Wagner, and W. Miller, “A greedy algorithm for aligning DNA sequences,” Journal of Computational Biology, vol. 7, nos. 1–2, pp. 203–214, 2000.

19. S. K. Sheppard, X. Didelot, G. Meric, A. Torralbo, K. A. Jolley, D. J. Kelly, S. D. Bentley, M. C. Maiden, J. Parkhill, and D. Falush, “Genome-wide association study identifies vitamin B5 biosynthesis as a host specificity factor in Campylobacter,” Proceedings of the National Academy of Sciences, vol. 110, no. 29, pp. 11923–11927, 2013.

20. C. M. Bishop, Pattern Recognition and Machine Learning. Berlin: Springer-Verlag, 2006.

21. B. A. Goldstein, A. E. Hubbard, A. Cutler, and L. F. Barcellos, “An application of random forests to a genome-wide association dataset: Methodological considerations and new findings,” BMC Genetics, vol. 11, no. 1, p. 49, 2010.

22. T. A. Manolio, F. S. Collins, N. J. Cox, D. B. Goldstein, L. A. Hindorff, D. J. Hunter, M. I. McCarthy, E. M. Ramos, L. R. Cardon, A. Chakravarti et al., “Finding the missing heritability of complex diseases,” Nature, vol. 461, no. 7265, pp. 747–753, 2009.

23. W. S. Bush and J. H. Moore, “Genome-wide association studies,” PLoS Computational Biology, vol. 8, no. 12, p. e1002822, 2012.

24. K. P. Murphy, Machine Learning: A Probabilistic Perspective. Cambridge: MIT Press, 2012.

25. T. Bonnici, D. Clifton, P. Watkinson, and L. Tarassenko, “The digital patient,” Clinical Medicine, vol. 3, no. 3, pp. 252–257, 2013.

26. National Patient Safety Association, “Safer care for acutely ill patients: Learning from serious accidents,” White Paper, 2007.

27. L. Tarassenko, D. Clifton, M. Pinsky, M. Hravnak, J. Woods, and P. Watkinson, “Centile-based early warning scores derived from statistical distributions of vital signs,” Resuscitation, vol. 82, no. 8, pp. 1013–1018, 2011.

28. M. Hravnak, M. de Vita, A. Clontz, L. Edwards, C. Valenta, and M. Pinsky, “Cardiorespiratory instability before and after implementing an integrated monitoring system,” Critical Care Medicine, vol. 39, no. 1, pp. 65–72, 2011.

29. G. Clifford and D. Clifton, “Annual review: Wireless technology in disease state management and medicine,” Annual Review of Medicine, vol. 63, pp. 479–492, 2012.

30. D. Meredith, D. Clifton, P. Charlton, J. Brooks, C. Pugh, and L. Tarassenko, “Photoplethysmographic derivation of respiratory rate: A review of relevant respiratory and circulatory physiology,” Journal of Medical Engineering and Technology, vol. 36, no. 1, pp. 60–66, 2012.

31. L. Clifton, D. Clifton, M. Pimentel, P. Watkinson, and L. Tarassenko, “Gaussian processes for personalised e-health monitoring with wearable sensors,” IEEE Transactions on Biomedical Engineering, vol. 60, no. 1, pp. 193–197, 2013.

32. L. Clifton, D. Clifton, M. Pimentel, P. Watkinson, and L. Tarassenko, “Predictive monitoring of mobile patients by combining clinical observations with data from wearable sensors,” IEEE Journal of Biomedical and Health Informatics, vol. 18, no. 3, pp. 722–730, 2014.

33. C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning. Cambridge: MIT Press, 2006.

34. M. Pimentel, D. Clifton, L. Clifton, and L. Tarassenko, “Probabilistic estimation of respiratory rate using Gaussian processes,” in IEEE Engineering in Medicine & Biology Conference, Osaka, Japan, 2013, pp. 2902–2905.

35. D. Clifton, L. Clifton, S. Hugueny, D. Wong, and L. Tarassenko, “An extreme function theory for novelty detection,” IEEE Journal of Selected Topics on Signal Processing, vol. 7, no. 1, pp. 28–37, 2013.

36. M. Pimentel, D. Clifton, L. Clifton, P. Watkinson, and L. Tarassenko, “Modelling physiological deterioration in post-operative patient vital-sign data,” Medical & Biological Engineering & Computing, vol. 51, no. 8, pp. 869–877, 2013.

37. C. Seebregts, B. Mamlin, P. Biondich, H. Fraser, B. Wolfe, D. Jazayeri, C. Allen, J. Miranda, E. Baker, N. Musinguzi, D. Kayiwa, C. Fourie, N. Lesh, A. Kanter, C. Yiannoutsos, and C. Bailey, “The OpenMRS implementers network,” International Journal of Medical Informatics, vol. 78, no. 11, pp. 711–720, 2009.

38. L. Celi, L. Sarmenta, J. Rotberg, A. Marcelo, and G. Clifford, “Mobile care (Moca) for remote diagnosis and screening,” Journal of Health Informatics in Developing Countries, vol. 3, no. 1, pp. 17–21, 2009.

39. M. Tian, “The simplified cardiovascular management in India and China study (Simcard),” http://www.georgeinstitute.org/projects/simplified-cardiovascular-management-in-india-and-china-study-simcard, 2014.

40. D. Clifton, D. Wong, L. Clifton, R. Pullinger, and L. Tarassenko, “A large-scale clinical validation of an integrated monitoring system in the emergency department,” IEEE Transactions on Information Technology in Biomedicine, vol. 17, no. 4, pp. 835–877, 2013.

41. L. Clifton, D. Clifton, Y. Zhang, P. Watkinson, L. Tarassenko, and H. Yin, “Probabilistic novelty detection with support vector machines,” IEEE Transactions on Reliability, vol. 62, no. 2, pp. 455–467, 2014.

42. D. Clifton, S. Hugueny, L. Clifton, and L. Tarassenko, “Extending the Generalised Pareto distribution for novelty detection in high-dimensional spaces,” Journal of Signal Processing Systems, vol. 74, pp. 323–339, 2014.

43. K. Yu, V. Tresp, and A. Schwaighofer, “Learning Gaussian processes from multiple tasks,” in Proceedings of the 22nd International Conference on Machine Learning, 2005, pp. 1012–1019.

44. E. V. Bonilla, K. M. A. Chai, and C. K. I. Williams, “Multi-task Gaussian process prediction,” in Advances in Neural Information Processing Systems (NIPS), 2008, pp. 153–160.

45. R. Dürichen, M. Pimentel, L. Clifton, and D. Clifton, “MTGP—A Matlab toolbox for multi-task Gaussian processes,” http://www.robots.ox.ac.uk/~davidc/code.php, 2014.

46. G. Pillonetto, F. Dinuzzo, and G. De Nicolao, “Bayesian online multitask learning of Gaussian processes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 193–205, 2010.

47. M. A. Álvarez and N. D. Lawrence, “Computationally efficient convolved multiple output Gaussian processes,” Journal of Machine Learning Research, vol. 12, pp. 1459–1500, 2011.

48. M. A. Osborne, S. J. Roberts, A. Rogers, and N. R. Jennings, “Real-time information processing of environmental sensor network data using Bayesian Gaussian processes,” ACM Transactions on Sensor Networks, vol. 9, no. 1, pp. 1–32, 2012.

49. J. Phan, C. Quo, C. Cheng, and M. Wang, “Multiscale integration of -omic, imaging, and clinical data in biomedical informatics,” IEEE Reviews in Biomedical Engineering, vol. 5, pp. 74–87, 2012.

50. P. Kirk, J. Griffin, R. Savage, Z. Ghahramani, and D. Wild, “Bayesian correlated clustering to integrate multiple datasets,” Bioinformatics, vol. 28, no. 24, pp. 3290–3297, 2012.

51. D. Neill, “Using artificial intelligence to improve hospital inpatient care,” IEEE Intelligent Systems, vol. 28, no. 2, pp. 92–95, 2013.

52. M. Pavel, H. Jimison, H. Wactlar, T. Hayes, W. Barkis, J. Skapik, and J. Kaye, “The role of technology and engineering models in transforming healthcare,” IEEE Reviews in Biomedical Engineering, vol. 6, pp. 156–177, 2013.
