High-Throughput Phenotyping from Electronic Health Records for Clinical and Translational Research
Jyotishman Pathak, PhD
Associate Professor of Biomedical Informatics
Department of Health Sciences Research
August 15th, 2013


Page 1

High-Throughput Phenotyping from Electronic Health Records for Clinical and Translational Research

Jyotishman Pathak, PhD Associate Professor of Biomedical Informatics Department of Health Sciences Research August 15th, 2013

Page 2

High-Throughput Phenotyping from EHRs

Outline
• Clinical phenotyping from electronic health records (EHRs)
  • eMERGE
  • SHARPn
• Standards-based approaches to EHR phenotyping
  • NQF Quality Data Model
  • JBoss® Drools business rules environment
  • PhenotypePortal.org
• On-going and future work

©2013 MFMER | slide-2

Page 3

High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-3 [Green et al. Nature 2011; 470:204-213]

Page 4

High-Throughput Phenotyping from EHRs [www.nhgri.gov] ©2013 MFMER | slide-4

Page 5

Phenotyping is still a bottleneck…

©2012 MFMER | slide-5 [Image from Wikipedia]

Page 6

High-Throughput Phenotyping from EHRs

EHR systems: United States 2008—2011

©2012 MFMER | slide-6

[Exhibit 8, "Changes in Adoption of Basic and Comprehensive EHRs," from Health Information Technology in the United States: Driving Toward Delivery System Change, 2012 (Chapter 1). Bar chart of adoption rate (0%-30%) by year, 2008-2011, for Any EHR, Basic EHR, and Comprehensive EHR; reported rates range from 1.5% to 26.6%.]

Source: DesRoches CM, Joshi M, Worzala C, Kralovec P, Jha AK. Small, non-teaching, and rural hospitals continue to be slow in adopting electronic health record systems. Health Aff (Millwood). 2012;31(5).

Page 7

High-Throughput Phenotyping from EHRs

Electronic health record (EHR)-driven phenotyping

• EHRs are becoming increasingly prevalent within the U.S. healthcare system
  • Meaningful Use is one of the major drivers
• Overarching goal
  • To develop high-throughput, semi-automated techniques and algorithms that operate on normalized EHR data to identify cohorts of potentially eligible subjects on the basis of disease, symptoms, or related findings, both retrospectively and prospectively

©2012 MFMER | slide-7

Page 8

High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-8

http://gwas.org [eMERGE Network]

Page 9

High-Throughput Phenotyping from EHRs

EHR-driven Phenotyping Algorithms - I

• Typical components
  • Billing and diagnosis codes
  • Procedure codes
  • Labs
  • Medications
  • Phenotype-specific covariates (e.g., demographics, vitals, smoking status, CASI scores)
  • Pathology
  • Radiology
• Organized into inclusion and exclusion criteria (a sketch follows below)

©2012 MFMER | slide-9

[eMERGE Network]
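To make the inclusion/exclusion idea concrete, here is a minimal, hypothetical Java sketch that checks a patient's coded data against illustrative value sets. The class names, codes, and criteria below are placeholders for illustration only and are not taken from any eMERGE algorithm.

    import java.util.Set;

    // Minimal sketch of evaluating inclusion/exclusion criteria over coded EHR data.
    // All value sets, codes, and field names below are hypothetical placeholders.
    public class PhenotypeCriteriaSketch {

        record Patient(Set<String> icd9Codes, Set<String> rxNormCodes) {}

        // Placeholder value sets (ICD-9 diagnosis codes, RxNorm medication codes).
        static final Set<String> DIAGNOSIS_VALUE_SET  = Set.of("250.00", "250.02");
        static final Set<String> MEDICATION_VALUE_SET = Set.of("860975");   // placeholder RxNorm code
        static final Set<String> EXCLUSION_DIAGNOSES  = Set.of("648.83");   // placeholder exclusion code

        static boolean isCase(Patient p) {
            boolean hasDiagnosis = p.icd9Codes().stream().anyMatch(DIAGNOSIS_VALUE_SET::contains);
            boolean onMedication = p.rxNormCodes().stream().anyMatch(MEDICATION_VALUE_SET::contains);
            boolean excluded     = p.icd9Codes().stream().anyMatch(EXCLUSION_DIAGNOSES::contains);
            // Inclusion: diagnosis code AND supporting medication; exclusion: any excluded diagnosis.
            return hasDiagnosis && onMedication && !excluded;
        }

        public static void main(String[] args) {
            Patient p = new Patient(Set.of("250.00"), Set.of("860975"));
            System.out.println("Case? " + isCase(p));
        }
    }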

Page 10

High-Throughput Phenotyping from EHRs

EHR-driven Phenotyping Algorithms - II

[Workflow diagram: EHR data is transformed (via NLP, SQL, mappings, and rules) into inputs for a phenotype algorithm, which is followed by visualization and evaluation.]

[eMERGE Network]

©2012 MFMER | slide-10

Page 11

High-Throughput Phenotyping from EHRs

Example: Hypothyroidism Algorithm

[Flowchart defining Case 1, Case 2, and Control populations from the following elements: ICD-9 codes for hypothyroidism; thyroid replacement medications; abnormal TSH/FT4 results; antibodies for TTG or TPO (anti-thyroglobulin, anti-thyroperoxidase); no secondary causes (e.g., pregnancy, ablation); no thyroid-altering medications (e.g., phenytoin, lithium); 2+ non-acute visits in 3 years; and no history of myasthenia gravis. Cases require the positive findings; controls require their absence.]

©2012 MFMER | slide-11

[Denny et al., ASHG, 2012; 89:529-542]

Page 12

High-Throughput Phenotyping from EHRs

Example: Hypothyroidism Algorithm

©2012 MFMER | slide-12 [Conway et al. AMIA 2011: 274-83]

[Diagram: the algorithm draws on drugs, labs, diagnoses, procedures, demographics, vitals, and NLP-derived data.]

Page 13

Phenotyping Algorithms

Data Categories used to define EHR-driven Phenotyping Algorithms

Phenotype | Clinical gold standard | EHR-derived phenotype | Validation (PPV/NPV) | Sensitivity (Case/Cntrl)
Alzheimer's Dementia | Demographics, clinical examination of mental status, histopathologic examination | Diagnoses, medications | 73%/55% | 37.1%/99%
Cataracts | Clinical exam finding (ophthalmologic examination) | Diagnoses, procedure codes | 98%/98% | 99.1%/93.6%
Peripheral Arterial Disease | Clinical exam finding (ankle-brachial index or arteriography) | Diagnoses, procedure codes, medications, radiology test results | 94%/99% | 85.5%/81.6%
Type 2 Diabetes | Laboratory tests | Diagnoses, laboratory tests, medications | 98%/100% | 100%/100%
Cardiac Conduction | ECG measurements | ECG report results | 97% (case-only algorithm) | 96.9% (case-only algorithm)

[eMERGE Network]
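For reference, the PPV/NPV and sensitivity figures above follow the standard confusion-matrix definitions; the short Java sketch below, with made-up counts, spells them out.

    // Standard confusion-matrix definitions behind the PPV/NPV and sensitivity figures above.
    public class ValidationMetrics {
        static double ppv(int tp, int fp)         { return (double) tp / (tp + fp); }  // positive predictive value
        static double npv(int tn, int fn)         { return (double) tn / (tn + fn); }  // negative predictive value
        static double sensitivity(int tp, int fn) { return (double) tp / (tp + fn); }  // recall for cases
        static double specificity(int tn, int fp) { return (double) tn / (tn + fp); }  // recall for controls

        public static void main(String[] args) {
            // Hypothetical chart-review counts, not taken from the eMERGE studies.
            int tp = 98, fp = 2, tn = 95, fn = 5;
            System.out.printf("PPV=%.2f NPV=%.2f Sens=%.2f Spec=%.2f%n",
                    ppv(tp, fp), npv(tn, fn), sensitivity(tp, fn), specificity(tn, fp));
        }
    }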

Page 14

High-Throughput Phenotyping from EHRs

“EHR Depth” plays an important role

[Wei et al. IJIM 2012 (Epub ahead of print)]

©2012 MFMER | slide-14

Page 15

High-Throughput Phenotyping from EHRs

Genotype-Phenotype Association Results

[Forest plot comparing observed vs. published odds ratios (x-axis approximately 0.5 to 5.0) by disease, gene/region, and marker: atrial fibrillation (rs2200733 and rs10033464, Chr. 4q25); Crohn's disease (rs11805303, IL23R; rs17234657 and rs1000113, Chr. 5; rs17221417, NOD2; rs2542151, PTPN22); multiple sclerosis (rs3135388, DRB1*1501; rs2104286, IL2RA; rs6897932, IL7RA); rheumatoid arthritis (rs6457617, Chr. 6; rs6679677, RSBN1; rs2476601, PTPN22); and type 2 diabetes (rs4506565, rs12255372, rs12243326, TCF7L2; rs10811661, CDKN2B; rs8050136, FTO; rs5219 and rs5215, KCNJ11; rs4402960, IGF2BP2).]

[Ritchie et al. AJHG 2010; 86(4):560-72]

©2012 MFMER | slide-15

Page 16

High-Throughput Phenotyping from EHRs

Key lessons learned from eMERGE

• Algorithm design and transportability
  • Non-trivial; requires significant expert involvement
  • Highly iterative process
  • Time-consuming manual chart reviews
  • Representation of "phenotype logic" is critical
• Standardized data access and representation
  • Importance of unified vocabularies, data elements, and value sets
  • Questionable reliability of ICD & CPT codes (e.g., billing the wrong code because it is easier to find)
  • Natural Language Processing (NLP) is critical

©2012 MFMER | slide-16

[Kho et al. Sc. Trans. Med 2011; 3(79): 1-7]

Page 17

High-Throughput Phenotyping from EHRs

SHARPn: Secondary Use of EHR Data

©2012 MFMER | slide-17

[Slide shows the first page of:]

Rea S, Pathak J, Savova G, Oniki TA, Westberg L, Beebe CE, Tao C, Parker CG, Haug PJ, Huff SM, Chute CG. Building a robust, scalable and standards-driven infrastructure for secondary use of EHR data: The SHARPn project. Journal of Biomedical Informatics, 2012. doi:10.1016/j.jbi.2012.01.009
(Intermountain Healthcare; Mayo Clinic; Children's Hospital Boston and Harvard Medical School; Agilex Technologies; University of Utah)

Keywords: Electronic health records; Meaningful use; Health information exchange; High-throughput phenotyping; Natural language processing

Abstract: The Strategic Health IT Advanced Research Projects (SHARP) Program, established by the Office of the National Coordinator for Health Information Technology in 2010, supports research findings that remove barriers for increased adoption of health IT. The improvements envisioned by the SHARP Area 4 Consortium (SHARPn) will enable the use of the electronic health record (EHR) for secondary purposes, such as care process and outcomes improvement, biomedical research and epidemiologic monitoring of the nation's health. One of the primary informatics problem areas in this endeavor is the standardization of disparate health data from the nation's many health care organizations and providers. The SHARPn team is developing open source services and components to support the ubiquitous exchange, sharing and reuse or 'liquidity' of operational clinical data stored in electronic health records. One year into the design and development of the SHARPn framework, we demonstrated end-to-end data flow and a prototype SHARPn platform, using thousands of patient electronic records sourced from two large healthcare organizations: Mayo Clinic and Intermountain Healthcare. The platform was deployed to (1) receive source EHR data in several formats, (2) generate structured data from EHR narrative text, and (3) normalize the EHR data using common detailed clinical models and Consolidated Health Informatics standard terminologies, which were (4) accessed by a phenotyping service using normalized data specifications. The architecture of this prototype SHARPn platform is presented. The EHR data throughput demonstration showed success in normalizing native EHR data, both structured and narrative, from two independent organizations and EHR systems. Based on the demonstration, observed challenges for standardization of EHR data for interoperable secondary use are discussed.

Page 18

High-Throughput Phenotyping from EHRs

Algorithm Development Process - Modified

[Workflow diagram: data transforms (NLP, SQL, mappings, rules) feed a phenotype algorithm with semi-automatic execution, followed by visualization and evaluation.]

©2012 MFMER | slide-18

• Standardized representation of clinical data
  • Create new and re-use existing clinical element models (CEMs)
• Standardized and structured representation of phenotype definition criteria
  • Use the NQF Quality Data Model (QDM)
• Conversion of structured phenotype criteria into executable queries
  • Use JBoss® Drools rules (DRLs)

[Welch et al., JBI 2012; 45(4):763-71]

Page 19

High-Throughput Phenotyping from EHRs

The SHARPn “phenotyping funnel”

©2012 MFMER | slide-19

[Funnel diagram: Mayo Clinic and Intermountain EHR data are normalized into CEMs; phenotype criteria modeled as QDMs are translated into DRLs and executed over the normalized data, yielding phenotype-specific patient cohorts.]

[Welch et al., JBI 2012; 45(4):763-71]

Page 20

High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-20

Page 21

©2012 MFMER | slide-21

SHARPn data normalization architecture - II

CEM CouchDB database with standardized and normalized patient data

[Welch et al., JBI 2012; 45(4):763-71]

Page 22

High-Throughput Phenotyping from EHRs

Algorithm Development Process - Modified

[Workflow diagram: data transforms (NLP, SQL, mappings, rules) feed a phenotype algorithm with semi-automatic execution, followed by visualization and evaluation.]

©2012 MFMER | slide-22

• Standardized representation of clinical data
  • Create new and re-use existing clinical element models (CEMs)
• Standardized and structured representation of phenotype definition criteria
  • Use the NQF Quality Data Model (QDM)

[Welch et al., JBI 2012; 45(4):763-71]

Page 23

High-Throughput Phenotyping from EHRs

Example algorithm: Hypothyroidism

©2012 MFMER | slide-23

Page 24

High-Throughput Phenotyping from EHRs

NQF Quality Data Model (QDM)

• Standard of the National Quality Forum (NQF)
• A structure and grammar to represent quality measures and phenotype definitions in a standardized format
• Groups of codes in a code set (ICD-9, etc.)
  • "Diagnosis, Active: steroid induced diabetes" using "steroid induced diabetes Value Set GROUPING (2.16.840.1.113883.3.464.0001.113)"
• Supports temporality & sequences (an illustrative sketch follows this slide)
  • AND: "Procedure, Performed: eye exam" > 1 year(s) starts before or during "Measurement end date"
• Implemented as a set of XML schemas
  • Links to standardized terminologies (ICD-9, ICD-10, SNOMED-CT, CPT-4, LOINC, RxNorm, etc.)

©2012 MFMER | slide-24
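To illustrate the constructs listed above, here is a small Java sketch of how a QDM-style criterion could be modeled in code. It is not the QDM/HQMF XML schema; the record types and the interpretation of the temporal phrase are assumptions made only for illustration, while the value-set OID is the one quoted on the slide.

    import java.time.LocalDate;

    // Illustrative object model only; this is NOT the QDM/HQMF XML schema, just a sketch of the
    // constructs the slide lists (coded criteria bound to value-set OIDs, temporal relations).
    public class QdmCriterionSketch {

        record ValueSet(String oid, String name) {}
        record DataCriterion(String category, String status, ValueSet valueSet) {}

        // "Diagnosis, Active: steroid induced diabetes" bound to the value-set GROUPING OID from the slide.
        static final DataCriterion STEROID_INDUCED_DIABETES = new DataCriterion(
                "Diagnosis", "Active",
                new ValueSet("2.16.840.1.113883.3.464.0001.113", "steroid induced diabetes Value Set GROUPING"));

        // One plausible reading of: AND: "Procedure, Performed: eye exam" > 1 year(s)
        // starts before or during "Measurement end date".
        static boolean eyeExamSatisfied(LocalDate procedureStart, LocalDate measurementEnd) {
            return procedureStart.isBefore(measurementEnd.minusYears(1));
        }

        public static void main(String[] args) {
            System.out.println(STEROID_INDUCED_DIABETES);
            System.out.println(eyeExamSatisfied(LocalDate.of(2011, 6, 1), LocalDate.of(2012, 12, 31)));
        }
    }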

Page 25

High-Throughput Phenotyping from EHRs

Example: Diabetes & Lipid Mgmt. - I

©2012 MFMER | slide-25

Page 26

High-Throughput Phenotyping from EHRs

Example: Diabetes & Lipid Mgmt. - II

©2012 MFMER | slide-26

Human readable HTML

Page 27

High-Throughput Phenotyping from EHRs

Example: Diabetes & Lipid Mgmt. - III

©2012 MFMER | slide-27

Page 28

High-Throughput Phenotyping from EHRs

Example: Diabetes & Lipid Mgmt. - IV

©2012 MFMER | slide-28

Page 29

High-Throughput Phenotyping from EHRs

Example: Diabetes & Lipid Mgmt. - V

©2012 MFMER | slide-29

Computer readable XML (based on HL7 RIM semantics)

Page 30

High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-30

[Thompson et al., AMIA 2012; (Epub ahead of print)]

[Slide shows page 6 of: Thompson WK, Rasmussen LV, Pacheco JA, Peissig PL, Denny JC, Kho AN, Miller A, Pathak J. An Evaluation of the NQF Quality Data Model for Representing Electronic Health Record Driven Phenotyping Algorithms. AMIA 2012. (Northwestern University; Marshfield Clinic; Vanderbilt University; Mayo Clinic)]

Table 2. Use of QDM Value Sets within eMERGE continuous measure algorithms (terminology and number of value sets used, paired in column order).
Height - Clinical information: Diagnosis, Laboratory, Medication, Patient Char., Physical Exam; Terminology (no. of value sets): ICD-9 (12), Grouped (2), LOINC (1), RXNORM (5), Grouped (1), SNOMED-CT (1), LOINC (1), SNOMED-CT (1), Grouped (1)
Serum lipid level - Clinical information: Diagnosis, Laboratory, Medication; Terminology (no. of value sets): ICD-9 (4), ICD-10 (2), SNOMED-CT (2), Grouped (3), LOINC (4), Grouped (1), RXNORM (8), Grouped (2)
Low HDL cholesterol level - Clinical information: Diagnosis, Laboratory, Medication; Terminology (no. of value sets): ICD-9 (4), ICD-10 (2), SNOMED-CT (2), Grouped (3), Keywords (1), LOINC (1), Grouped (1), RXNORM (7), Grouped (1)
QRS duration - Clinical information: Diagnosis, Diagnostic Study, Laboratory, Medication, Physical Exam; Terminology (no. of value sets): ICD-9 (2), UMLS CUI (1), LOINC (3), RXNORM (2), SNOMED-CT (1)

Table 3. Use of Boolean and temporal relationships within eMERGE algorithms implemented in the QDM.
Algorithm | Boolean Operators | Max Depth | Temporal Relationships
Diabetic retinopathy (Cases) | 8 | 2 | 0
Diabetic retinopathy (Controls) | 6 | 2 | 0
Height | 8 | 1 | 4
Serum lipid level | 6 | 1 | 5
Low HDL cholesterol level | 6 | 1 | 5
Peripheral arterial disease | 17 | 4 | 1
QRS duration | 28 | 4 | 8
Resistant hypertension (Cases) | 172 | 5 | 2
Resistant hypertension (Controls) | 26 | 4 | 5
Type 2 diabetes | 15 | 3 | 5
Cataract (Cases) | 9 | 3 | 2
Cataract (Controls) | 3 | 1 | 2

Abstract: The development of Electronic Health Record (EHR)-based phenotype selection algorithms is a non-trivial and highly iterative process involving domain experts and informaticians. To make it easier to port algorithms across institutions, it is desirable to represent them using an unambiguous formal specification language. For this purpose we evaluated the recently developed National Quality Forum (NQF) information model designed for EHR-based quality measures: the Quality Data Model (QDM). We selected 9 phenotyping algorithms that had been previously developed as part of the eMERGE consortium and translated them into QDM format. Our study concluded that the QDM contains several core elements that make it a promising format for EHR-driven phenotyping algorithms for clinical research. However, we also found areas in which the QDM could be usefully extended, such as representing information extracted from clinical text, and the ability to handle algorithms that do not consist of Boolean combinations of criteria.

Page 31

High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-31

[Thompson et al., AMIA 2012; (Epub ahead of print)]

[Slide shows page 7 of the same paper (Thompson et al., AMIA 2012).]

Table 4. Use of Natural Language Processing (NLP) as specified within eMERGE algorithms. Results indicate whether the algorithm mentions the use of NLP at all, required that negation be factored into the inclusion/exclusion criteria (e.g., presence/absence of "Patient does not have condition X"), required that no evidence of a concept be found (e.g., no presence of "Patient has X"), or required searching in specific document sections.
Algorithm | Uses NLP | Term Negation | No Evidence Of | Document Section Restriction
Diabetic retinopathy | Yes | Yes | No | Yes
Height | Yes | No | No | No
Serum lipid level | No | - | - | -
Low HDL cholesterol level | Yes | No | Yes | No
Peripheral arterial disease | Yes | No | No | No
QRS duration | Yes | Yes | Yes | Yes
Resistant hypertension | Yes | No | No | No
Type 2 diabetes | No | - | - | -
Cataract | Yes | No | No | No

Discussion

Implementing the selected eMERGE phenotyping algorithms using the MAT yielded insights on the suitability of this tool and the QDM overall for representing phenotyping algorithms for clinical research. In our estimation one of the core strengths of the QDM is its ability to represent phenotyping algorithms for both machine and human consumption. In addition to supporting the export of structured XML file descriptors of algorithms, the MAT also allows users to export a style sheet for executing an XSL transform of the XML source into human-readable HTML (Figure 2). The source XML permits the algorithm developer to add structured human-readable visual aids such as tables and lists that become part of the generated HTML, as well as hyperlinks to external images, flowcharts, and other supporting documents. In the eMERGE project we found these types of documentation to be an invaluable means for communicating algorithm logic to external institutions, and the ability of the MAT to support this is very welcome.

The representation of value sets within the QDM as lists external to the measure definition, which are assigned a globally unique identifier, allows for the re-use of value sets. Furthermore, the MAT facilitates the sharing of value sets across algorithms by providing a centralized, searchable repository of value sets for authors. This capability was employed in several of the eMERGE algorithms, indicating the usefulness of this feature for algorithm development. Shared value sets were used in several instances for the case vs. control population definitions within a single phenotyping algorithm (e.g., the Diabetic Retinopathy case and control populations shared diagnosis value sets). Sharing also occurred at a more significant level, such as diagnosis value sets between the Type 2 Diabetes and Diabetic Retinopathy algorithms, and laboratory value sets between the Low HDL and Serum Lipid Levels algorithms. We were also able to re-use pre-existing value sets that the MAT allowed us to discover with its search capability. For example, the Resistant Hypertension algorithm re-used a pre-existing grouped value set for "ACE Inhibitor or ARB medications" from the American Medical Association - Physician Consortium for Performance Improvement, and the Serum Lipids algorithm re-used the HDL, LDL, Total Cholesterol, and Triglycerides laboratory test value sets from the National Committee for Quality Assurance. This kind of re-use is another major benefit of using the QDM and MAT. One author alone used pre-existing value sets for more than a third of the total used across four phenotyping algorithms. In addition, the QDM requires strict definitions of value sets (code sets, code set version, a code and description for each entry), which are not imposed in the current free-text document-based representation of the eMERGE algorithms. These strict definitions impose high standards on the algorithm developer to perform more work up front to provide precise definitions, which in turn decreases potential downstream errors due to ambiguity and under-specification. It also provides the potential for future automation of the algorithms.

A limitation of the QDM, however, is the lack of support for sharing logic (whether partial components or entire algorithms) between algorithms. An example is within the eMERGE Diabetic Retinopathy algorithm, which utilized […]

Page 32

High-Throughput Phenotyping from EHRs

Algorithm Development Process - Modified

[Workflow diagram: data transforms (NLP, SQL, mappings, rules) feed a phenotype algorithm with semi-automatic execution, followed by visualization and evaluation.]

©2012 MFMER | slide-32

• Standardized representation of clinical data
  • Create new and re-use existing clinical element models (CEMs)
• Standardized and structured representation of phenotype definition criteria
  • Use the NQF Quality Data Model (QDM)
• Conversion of structured phenotype criteria into executable queries
  • Use JBoss® Drools rules (DRLs)

[Welch et al., JBI 2012; 45(4):763-71]

Page 33

High-Throughput Phenotyping from EHRs

JBoss® Drools rules management system

©2012 MFMER | slide-33

• Represents knowledge with declarative production rules
  • Origins in artificial intelligence expert systems
  • Simple "when <pattern> then <action>" rules specified in text files
• Separation of data and logic into separate components
• Forward-chaining inference model (Rete algorithm)
• Domain-specific languages (DSL)

Page 34

High-Throughput Phenotyping from EHRs

Example Drools rule

©2012 MFMER | slide-34

rule "Glucose <= 40, Insulin On"                                           // {Rule Name}
when
    $msg : GlucoseMsg( glucoseFinding <= 40, currentInsulinDrip > 0 )
    // {binding} : {Java Class}( {Class Getter Method} <= parameter, ... )
then
    glucoseProtocolResult.setInstruction( GlucoseInstructions.GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG );
    // {Java Class}.{Class Setter Method}( ... )
end
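As a rough idea of how such a rule is compiled and fired, the sketch below uses the Drools KIE API (Drools 6+; the SHARPn-era implementation may have used the earlier Drools 5 API). It assumes the fact classes GlucoseMsg and GlucoseProtocolResult exist, that the DRL declares glucoseProtocolResult as a global, and that the rule file is on the classpath as glucose.drl.

    import org.kie.api.KieServices;
    import org.kie.api.builder.KieFileSystem;
    import org.kie.api.runtime.KieContainer;
    import org.kie.api.runtime.KieSession;

    // Minimal sketch of compiling and firing a DRL like the one above (Drools 6+ KIE API).
    // GlucoseMsg and GlucoseProtocolResult are assumed to exist on the classpath, as on the slide.
    public class DroolsExecutionSketch {
        public static void main(String[] args) {
            KieServices ks = KieServices.Factory.get();
            KieFileSystem kfs = ks.newKieFileSystem()
                    .write(ks.getResources().newClassPathResource("glucose.drl")); // the rule shown above
            ks.newKieBuilder(kfs).buildAll();
            KieContainer container = ks.newKieContainer(ks.getRepository().getDefaultReleaseId());

            KieSession session = container.newKieSession();
            GlucoseProtocolResult result = new GlucoseProtocolResult();   // assumed result class
            session.setGlobal("glucoseProtocolResult", result);           // the rule sets its instruction here
            session.insert(new GlucoseMsg(35, 2.0));                      // assumed constructor: (glucoseFinding, currentInsulinDrip)
            session.fireAllRules();                                       // forward-chaining (Rete) evaluation
            session.dispose();
        }
    }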

Page 35

High-Throughput Phenotyping from EHRs ©2012 MFMER | slide-35

[Slide shows the first page of:]

Modeling and Executing Electronic Health Records Driven Phenotyping Algorithms using the NQF Quality Data Model and JBoss® Drools Engine

Dingcheng Li, PhD (1), Gopu Shrestha, MS (1), Sahana Murthy, MS (1), Davide Sottara, PhD (2), Stanley M. Huff, MD (3), Christopher G. Chute, MD, DrPH (1), Jyotishman Pathak, PhD (1)
(1) Mayo Clinic, Rochester, MN; (2) University of Bologna, Italy; (3) Intermountain Healthcare, Salt Lake City, UT

Abstract: With increasing adoption of electronic health records (EHRs), the need for formal representations for EHR-driven phenotyping algorithms has been recognized for some time. The recently proposed Quality Data Model from the National Quality Forum (NQF) provides an information model and a grammar that is intended to represent data collected during routine clinical care in EHRs as well as the basic logic required to represent the algorithmic criteria for phenotype definitions. The QDM is further aligned with Meaningful Use standards to ensure that the clinical data and algorithmic criteria are represented in a consistent, unambiguous and reproducible manner. However, phenotype definitions represented in QDM, while structured, cannot be executed readily on existing EHRs. Rather, human interpretation and subsequent implementation is a required step in this process. To address this need, the current study investigates the open-source JBoss® Drools rules engine for automatic translation of QDM criteria into rules for execution over EHR data. In particular, using the Apache Foundation's Unstructured Information Management Architecture (UIMA) platform, we developed a translator tool for converting QDM-defined phenotyping algorithm criteria into executable Drools rules scripts, and demonstrate their execution on real patient data from Mayo Clinic to identify cases for Coronary Artery Disease and Diabetes. To the best of our knowledge, this is the first study illustrating a framework and an approach for executing phenotyping criteria modeled in QDM using the Drools business rules management system.

Introduction: Identification of patient cohorts for conducting clinical and research studies has always been a major bottleneck and time-consuming process. Several studies, as well as reports from the FDA, have highlighted the issues in delays with subject recruitment and its impact on clinical research and public health. To meet this important requirement, increasing attention is being paid to leveraging electronic health record (EHR) data for cohort identification. In particular, with the increasing adoption of EHRs for routine clinical care within the U.S. due to Meaningful Use, evaluating the strengths and limitations of secondary use of EHR data has important implications for clinical and translational research, including clinical trials, observational cohorts, outcomes research, and comparative effectiveness research.

In the recent past, several large-scale national and international projects, including eMERGE, SHARPn and i2b2, have been developing tools and technologies for identifying patient cohorts using EHRs. A key component in this process is to define the "pseudocode" in terms of subject inclusion and exclusion criteria comprising primarily semi-structured data fields in the EHRs (e.g., billing and diagnosis codes, procedure codes, laboratory results, and medications). These pseudocodes, commonly referred to as phenotyping algorithms, also comprise logical operators and are, in general, represented in Microsoft Word and PDF documents as unstructured text. While algorithm development is a team effort, which includes clinicians, domain experts, and informaticians, informatics and IT experts often operationalize their implementation. A human intermediary between the algorithm and the EHR system is required for two main reasons: first, due to the lack of a formal representation or specification language for modeling the phenotyping algorithms, a domain expert has to interpret the algorithmic criteria. And second, due to the disconnect between how the criteria might be specified and how the clinical data are represented within an institution's EHR (e.g., algorithm criteria might use LOINC® codes for lab measurements, whereas the EHR data might be represented using a local or proprietary lab code system), a human interpretation is required to transform the algorithm criteria and logic into a set of queries that can be executed on the EHR data.

To address both these challenges, the current study investigates the Quality Data Model (QDM) proposed by the National Quality Forum (NQF) along with the open-source JBoss® Drools rules management system for the modeling and execution of EHR-driven phenotyping algorithms automatically. QDM is an information model and a grammar that is intended to represent data collected during routine clinical care in EHRs as well as the basic logic required to […]

[Li et al., AMIA 2012; (Epub ahead of print)]

Page 36

The “executable” Drools workflow

©2012 MFMER | slide-36

[Li et al., AMIA 2012; (Epub ahead of print)]

Page 37

High-Throughput Phenotyping from EHRs [Endle et al., AMIA 2012; (Epub ahead of print)]

http://phenotypeportal.org

Page 38

High-Throughput Phenotyping from EHRs

PhenotypePortal Architecture

©2013 MFMER | slide-38

Page 39

High-Throughput Phenotyping from EHRs

PhenotypePortal Demo

•  http://107.20.139.191:8080/htp2014/

©2013 MFMER | slide-39

Page 40

High-Throughput Phenotyping from EHRs

On-going/Future planned activities: 2013-14

• Measure Authoring Toolkit (MAT) integration
  • Seamless authoring capabilities
• API decoupling/additional REST interfaces
  • Invocation of services without a UI client
• Improve scalability and performance
  • No need to wait for long batch processing, as execution can be done in real time; as a result, multiple translators can be distributed to handle large patient sets
  • Batch processing can be done, and code has been written to integrate with Hadoop to parallelize execution across large patient sets (a simplified sketch follows this slide)
  • Additional optimizations are still possible, such as caching

©2013 MFMER | slide-40
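As a simplified illustration of the parallelization idea (without Hadoop), the sketch below partitions a patient list and evaluates each partition concurrently. PatientRecord and the stand-in phenotype predicate are hypothetical; a real implementation would fire Drools rules over normalized CEM data instead.

    import java.util.List;
    import java.util.concurrent.*;
    import java.util.function.Predicate;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    // Simplified (non-Hadoop) sketch of parallelizing phenotype evaluation across partitions
    // of a patient set. PatientRecord and the predicate are hypothetical stand-ins.
    public class ParallelPhenotypingSketch {

        record PatientRecord(String id) {}

        static List<String> evaluatePartition(List<PatientRecord> partition, Predicate<PatientRecord> phenotype) {
            // In the SHARPn setting this step would fire Drools rules over normalized CEM data;
            // here a plain predicate stands in for the rule session.
            return partition.stream().filter(phenotype).map(PatientRecord::id).collect(Collectors.toList());
        }

        public static void main(String[] args) throws Exception {
            List<PatientRecord> patients = IntStream.range(0, 10_000)
                    .mapToObj(i -> new PatientRecord("pt-" + i)).collect(Collectors.toList());
            Predicate<PatientRecord> phenotype = p -> p.id().hashCode() % 7 == 0; // stand-in criterion

            int partitions = Runtime.getRuntime().availableProcessors();
            int size = (patients.size() + partitions - 1) / partitions;
            ExecutorService pool = Executors.newFixedThreadPool(partitions);
            List<Future<List<String>>> futures = IntStream.range(0, partitions)
                    .mapToObj(i -> patients.subList(Math.min(i * size, patients.size()),
                                                    Math.min((i + 1) * size, patients.size())))
                    .map(part -> pool.submit(() -> evaluatePartition(part, phenotype)))
                    .collect(Collectors.toList());

            int cohortSize = 0;
            for (Future<List<String>> f : futures) cohortSize += f.get().size();
            pool.shutdown();
            System.out.println("Cohort size: " + cohortSize);
        }
    }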

Page 41

High-Throughput Phenotyping from EHRs

Sustainability plan

• NIH-funded R01 (July 2013—June 2017)
  • National infrastructure for EHR-driven phenotyping algorithm modeling and execution (PI: Pathak)
  • Expand HTP infrastructure to additional workflow engines (KNIME), repositories (i2b2), and analytical platforms (R)
• Mayo Clinical Informatics Program (Jan 2013-)
  • Library of cohort definitions for use in Mayo Clinic practice and research
  • Real-time phenotype/cohort dashboard

©2013 MFMER | slide-41

Page 42

High-Throughput Phenotyping from EHRs

Concluding remarks

• EHRs contain a wealth of phenotypes for clinical and translational research
• EHRs represent real-world data, and hence have challenges with interpretation, wrong diagnoses, and compliance with medications
  • Handling referral patients is even more challenging
• Standardization and normalization of clinical data and phenotype definitions is critical
• Phenotyping algorithms are often transportable between multiple EHR settings
  • Validation is an important component

©2012 MFMER | slide-42

Page 43

High-Throughput Phenotyping from EHRs

Acknowledgment

• Mayo HTP team

•  Intermountain HTP team

•  Erin Martin

©2013 MFMER | slide-43

Page 44

High-Throughput Phenotyping from EHRs

Thank You!

©2012 MFMER | slide-44

Page 45

High-Throughput Phenotyping from EHRs

Back-Up

©2013 MFMER | slide-45

Page 46

1.  Converts QDM to Drools

2.  Rule execution by querying the CEM database

3.  Generate summary reports
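A skeletal Java sketch of these three steps is shown below. Every interface here (QdmToDroolsTranslator, CemStore, ReportGenerator) is a hypothetical stand-in, not the actual SHARPn components.

    import java.util.List;

    // Skeletal sketch of the three steps above. Every type here is a hypothetical stand-in;
    // the actual SHARPn translator, CEM store, and report generator are not shown.
    public class QdmToDroolsPipelineSketch {

        interface QdmToDroolsTranslator { String toDrl(String qdmXml); }               // step 1
        interface CemStore { List<String> executeRules(String drl); }                  // step 2: matching patient IDs
        interface ReportGenerator { String summarize(List<String> patientIds); }       // step 3

        static String run(String qdmXml, QdmToDroolsTranslator t, CemStore store, ReportGenerator reports) {
            String drl = t.toDrl(qdmXml);                  // 1. convert QDM criteria into Drools rules
            List<String> cohort = store.executeRules(drl); // 2. fire rules against normalized CEM data
            return reports.summarize(cohort);              // 3. produce a summary report
        }

        public static void main(String[] args) {
            String report = run("<qdm/>",
                    qdm -> "rule \"stub\" when then end",
                    drl -> List.of("pt-1", "pt-2"),
                    ids -> "Cohort size: " + ids.size());
            System.out.println(report);
        }
    }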

Page 47

High-Throughput Phenotyping from EHRs ©2013 MFMER | slide-47

Page 48

High-Throughput Phenotyping from EHRs ©2013 MFMER | slide-48

Page 49

High-Throughput Phenotyping from EHRs ©2013 MFMER | slide-49

Page 50

High-Throughput Phenotyping from EHRs

Translator

• Distinguishing features
  • Operates on a real-time patient set and/or in batch processing mode
  • Drools rules can be reused and combined
  • Rules can give immediate feedback as new information becomes available, e.g., when new lab results or patient visits are added to the data
  • Easier integration into a clinical setting, because the translator can work on observed changes in data instead of only scheduled batch processes; clients would not have to trigger algorithm execution, as execution is triggered when new data is encountered, which minimizes points of integration
  • Open-source license, built on established projects such as popHealth and Cypress