
Electronic Health Records for Clinical Research

Deliverable 4.4: Final specification of EHR4CR semantic interoperability solutions

Version 1.0

Final

29/02/2016

Project acronym: EHR4CR Project full title: Electronic Health Records for Clinical Research Grant agreement no.: 115189 Budget: 16 million EURO Start: 01.03.2011 - End: 28.02.2015 Website: www.ehr4cr.eu

The EHR4CR project is partially funded by the IMI JU programme.

Coordinator:

Managing Entity:


Document description

Deliverable no: 4.4

Deliverable title: Final specification of EHR4CR semantic interoperability solutions

Description: This deliverable describes the final implementation of the EHR4CR semantic interoperability services provided to address the needs of any specific project intending to use any services defined as part of the three EHR4CR use cases (PFS, PRS or CTE). It describes the result of the activities executed during Task 4.3, Task 4.4 and Task 4.5.

Extension of Task 4.3 (Terminologies Services and Tools) to the context of CTE

Task 4.4 (Knowledge Models Mapping and Management Services and Tools) consisting in the design and implementation of tools supporting structural and terminological mapping between the EHR4CR Common Information Model and the models used in EHR/CDW systems and clinical research systems.

Task 4.5 (Knowledge Authoring) consisting in the design and implementation of tools supporting the creation and management of EHR4CR semantic resources

The deliverable describes the governance of the semantic interoperability platform, the adopted approach and overall specifications. It describes the EHR4CR standardization pipeline needed to fulfill the requirements of the EHR4CR use cases (PFS, PRS or CTE) and the EHR4CR semantic interoperability services (SIS) used during both the set up and execution phases of the EHR4CR use cases (PFS, PRS or CTE). It also describes a first evaluation of the EHR4CR standardization pipeline and semantic interoperability services, and the design and implementation of the EHR4CR Clinical Data Warehouse and of the extract-transform-load (ETL) process adopted for CDW population.

Status: Final

Version: 1.0 Date: 29/02/2016

Deadline:

Editors: C. Daniel, S. Hussain, E. Sadou, D. Ouagne, K. Forsberg, E. Zapletal, Mark Mc Gilchrist

Outputs: Type Description Communication Vehicle

Report … …

… … …

… … …

Other … …


Document history

Date | Revision | Author(s) | Changes
30/09/2014 | 0.1 | S. Hussain, C. Daniel | Table of content and first contributions
11/11/2014 | 0.7 | C. Daniel, D. Kalra | Draft of section Governance, process, responsibilities and roles
09/12/2014 | 0.8 | C. Daniel | Draft of section informatics infrastructure
09/01/2015 | 0.9 | C. Daniel | Draft of sections informatics infrastructure, evaluation, conclusion
06/04/2015 | 0.10 | C. Daniel | Input from Sebastian Mate and Mark McGilchrist
16/04/2015 | 0.11 | C. Daniel, S. Hussain, D. Ouagne, E. Sadou | Draft of glossary, specification of the semantic interoperability services
28/04/2015 | 0.12 | C. Daniel | Improved draft of glossary, semantic interoperability specification
05/05/2015 | 0.13 | C. Daniel, E. Zapletal | Draft section for structural mapping. Input from Eric Zapletal: EHR4CR Terminology Mapping Status Manager (TMSM) section
27/05/2015 | 0.14 | C. Daniel, E. Zapletal | Input from E. Zapletal: Structural mapping section
19/06/2015 | 0.15 | C. Daniel, S. Hussain, D. Ouagne, E. Sadou, M. McGilchrist | Section about the EHR4CR Clinical Data Warehouse
30/07/2015 | 0.16 | C. Daniel, S. Hussain |
22/11/2015 | 0.17 | C. Daniel, D. Ouagne, E. Sadou | Discussion & Conclusion
29/02/2016 | 1.0 | C. Daniel | Final review


Table of Contents

1 Introduction
  1.1 EHR4CR platform & use cases overview
  1.2 EHR4CR Semantic Interoperability Services overview
  1.3 Objective of the deliverable
    1.3.1 Outline of the deliverable
    1.3.2 WP4 deliverables interdependencies
  1.4 Reference documents
    1.4.1 Reference Documents
    1.4.2 Developed Documents
  1.5 Definitions and acronyms
    1.5.1 Definitions
    1.5.2 Acronyms

2 Semantic interoperability overall specification
  2.1 Approach
    2.1.1 What is a mediation model?
    2.1.2 Why do we need a mediation model?
    2.1.3 How to build the EHR4CR Common Information Model as mediation model?
  2.2 Governance
  2.3 Overview of the specifications
    2.3.1 Semantic interoperability requirements for patient identification based on eligibility criteria (use case 1 & 2)
    2.3.2 Semantic interoperability requirements for data extraction and form pre-population (use case 3)

3 Standardization pipeline
  3.1 Managing the EHR4CR Common Information Model
  3.2 Mapping local semantic resources to the EHR4CR Common Information Model
  3.3 Overview of the standardization process before the execution of any EHR4CR use case
    3.3.1 EHR4CR Common Information Model (CIM) management
    3.3.2 Central/local mapping management

4 Semantic interoperability resources
  4.1 EHR4CR Common Information Model (mediation model)
    4.1.1 What is the EHR4CR mediation model?
    4.1.2 How is the EHR4CR mediation model built and maintained?
    4.1.3 FHIR-based templates and data elements
    4.1.4 Terminologies/Ontologies
    4.1.5 Semantic Resource Repository
  4.2 Tools and services
    4.2.1 EHR4CR Common Information Model Editor (CIME)
    4.2.2 EHR4CR Terminology Mapping Suite (TMS)
    4.2.3 Structural mapping

5 EHR4CR semantic interoperability services (SIS)
  5.1 Introduction and Scope
  5.2 Service Definition Principles
  5.3 Comparison of the SIS/CTS2 Service Functional Models
  5.4 Structure of the SIS specification
  5.5 Implementation Considerations

6 EHR4CR Clinical Data Warehouse
  6.1 Introduction
  6.2 ETL process and guidance for user acceptance testing
  6.3 Mappings
  6.4 The Dundee 200 test data
  6.5 Future work

7 Evaluation of the semantic resources and services
  7.1 Evaluation framework
    7.1.1 The need of high quality query language and model
    7.1.2 The need of high quality mediation model (patient data model)
    7.1.3 The need of an efficient standardization pipeline within participants data providers
  7.2 Results
    7.2.1 Query model and language
    7.2.2 Mediation model
    7.2.3 Standardization pipeline for data providers

8 Conclusion
  8.1 The EHR4CR semantic interoperability platform
  8.2 Limits, related projects and perspectives
  8.3 References

9 Appendix
  9.1 List of clinical trials
  9.2 Detailed Functional Model for each of Interface Semantic Interoperability Services (SIS)
  9.3 Business Scenarios
    9.3.1 Scenario A: cts2:CodeSystem
    9.3.2 Scenario B: Scenarios about cts2:ValueSet
    9.3.3 Scenario C: Scenarios about hl7:Templates
    9.3.4 Detailed Functional Model for each Interface
  9.4 Semantic services used by SDM/ODM editor
    9.4.1 Introduction
    9.4.2 Usage of a SDM-ODM container
    9.4.3 SDM elements for patient recruitment
    9.4.4 SDM-ODM extension for third party SDM-ODM designer
    9.4.5 Global Definitions = protocol


1 Introduction

The EHR4CR (Electronic Health Records for Clinical Research) project aims to improve the efficiency and reduce the cost of conducting clinical trials, through better leveraging routinely collected clinical data in electronic healthcare records (EHRs) and using it at key points in trial design and execution life-cycle. The EHR4CR platform automates the reuse of EHR data stored in existing EHR systems or Clinical Data Warehouses (CDWs) and implements three use cases - protocol feasibility testing, patient identification and recruitment for clinical trials, supporting clinical trial execution and adverse event reporting.

Figure 1. EHR4CR services for reusing EHR data during key points in trial design and execution life-cycle: protocol feasibility services (PFS), patient identification and recruitment (PIR) and clinical trial execution and serious adverse event reporting (CTE).

The EHR4CR platform contributes to the automation of the clinical research process from the design of the protocol until the submission of the data to the regulatory agencies (see Figure 1). During this process the protocols and case report forms are key documents that are becoming more available in electronic formats in compliance with the relevant CDISC standards in both pharma companies and university research centers.

Figure 2. Automation and standardization of the clinical research process


EHR4CR services are demonstrated by 11 pilot hospitals in 5 European countries (see Figure 3).

Figure 3. EHR4CR services demonstrated by 10 clinical research (EFPIA) & 11 hospital pilot sites

1.1 EHR4CR platform & use cases overview

The EHR4CR platform is a loosely coupled service platform, which orchestrates independent services. The EHR4CR architecture designed in WP3 (Architecture and Integration) defines how the tools and services of WP4 (Semantic interoperability), WP5 (Data Protection, Privacy & Security) and WP6 (end-user Platform Services) integrate. Table 1 describes the WP6 services required to support the three pilot scenarios. The end-user WP6 services are built upon services defined in WP3-5 to properly access EHR/CDW systems.

Table 1. WP6 service tools support the three pilot scenarios: protocol feasibility, patient identification and recruitment, clinical trial execution and serious adverse event reporting. These services are built upon the services defined in WP3-5 to properly access the existing EHR/CDW systems.

Work package | Use case | Description | Services
WP6 | 1 - Protocol feasibility (PFS) | Leverage clinical data to design viable trial protocols and estimate recruitment | Distributed queries over heterogeneous EHRs or CDWs
WP6 | 2 - Patient identification & recruitment (PIR) | Detect patients eligible for trials and better utilize recruitment potential | Distributed queries over heterogeneous EHRs or CDWs; workflow execution
WP6 | 3 - Clinical trial execution and serious adverse event reporting (CTE) | Optimize clinical trial execution; re-use of clinical data to pre-populate eCRFs and adverse event reporting forms | Workflow execution; pre-population of forms (distributed queries over heterogeneous EHRs or CDWs)
WP4 | | | Semantic Interoperability Services (Resources & Terminology) (SIS)
WP5 | | | Access policy, pseudonymization/de-identification, patient content services

1.2 EHR4CR Semantic Interoperability Services overview

In this context the objective of the Semantic Interoperability Services provided by WP4 is to allow:

Clinicians in hospitals (data providers of the EHR4CR network), while using their own words, to simultaneously utilize the most appropriate reference codes for meaningful re-use of routinely collected clinical data in electronic healthcare records (EHRs) in the context of clinical research conducted at an international level.

Investigators of the EHR4CR network to use a semantically-enabled platform to efficiently perform sophisticated web searches across European hospitals, to find clinically relevant results that can help improve clinical research.

The clinical terms normally used by clinicians are usually mapped to – often local - coding terminologies used locally for care coordination and secondary use of the clinical content. These local coding terminologies do not necessarily match with international administrative and clinical reference terminologies – such as ICD-10-CM, SNOMED CT®, LOINC®, ATC, etc. – used within the EHR4CR European network. The aim is that clinicians in hospitals can go on capturing, storing and searching their clinical content according to local terminologies while providing to the EHR4CR users a cross-border access to this important clinical information according to international reference terminologies. In addition to maintaining a wide range of curated semantic resources (healthcare template/data elements/value sets and terminologies) the EHR4CR semantic interoperability platform also created tools and services to support the mapping between local terminologies used in the hospitals and reference terminologies used in EHR4CR queries.
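As a minimal sketch of the kind of terminology mapping described above, the following Python fragment translates local hospital codes into reference codes and back. The `Code` class, the `LOCAL-LAB` and `LOCAL-DX` code systems, the local codes and the mapping table are hypothetical illustrations and are not taken from the EHR4CR tools; the reference codes (LOINC 718-7, ICD-10 E11) serve only as examples.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Code:
    """A code qualified by the code system it belongs to."""
    system: str
    value: str


# Hypothetical mapping table maintained by one hospital:
# local code -> reference code used in network-wide queries.
LOCAL_TO_REFERENCE = {
    Code("LOCAL-LAB", "HGB"): Code("LOINC", "718-7"),
    Code("LOCAL-DX", "DIAB2"): Code("ICD-10", "E11"),
}


def to_reference(local: Code) -> Optional[Code]:
    """Return the reference code for a local code, or None if unmapped."""
    return LOCAL_TO_REFERENCE.get(local)


def to_local(reference: Code) -> list[Code]:
    """Return every local code mapped to the given reference code."""
    return [loc for loc, ref in LOCAL_TO_REFERENCE.items() if ref == reference]
```

In this sketch clinicians keep recording `HGB` or `DIAB2` locally, while a query written with reference codes can be rewritten per site through `to_local`. Unmapped local codes return `None`, which makes gaps in the mapping visible instead of silently dropping data.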


1.3 Objective of the deliverable

The objective of D4.4 Final specification of EHR4CR semantic interoperability solutions (M48) is to describe the final implementation of the EHR4CR semantic interoperability services (SIS) provided to address the needs of any specific project intending to use any services defined as part of the three EHR4CR use cases (PFS, PRS or CTE). D4.4 describes the result of the activities executed during Task 4.3, Task 4.4 and Task 4.5.

Extension of Task 4.3 (Terminologies Services and Tools) to the context of CTE

Task 4.4 (Knowledge Models Mapping and Management Services and Tools) consisting in the design and implementation of tools supporting structural and terminological mapping between the EHR4CR Common Information Model and the models used in EHR/CDW systems and clinical research systems.

Task 4.5 (Knowledge Authoring) consisting in the design and implementation of tools supporting the creation and management of EHR4CR semantic resources

1.3.1 Outline of the deliverable

Chapter 2 describes the governance of the semantic interoperability platform, the adopted approach and overall specifications. Chapter 3 describes the EHR4CR standardization pipeline needed to fulfill the requirements of the EHR4CR use cases (PFS, PRS or CTE). The process, responsibilities and user roles, on both the clinical research and hospital sides, are defined. Chapter 4 presents the information technology infrastructure (semantic resources and tools) developed to support the different actors. Chapter 5 describes the EHR4CR semantic interoperability services (SIS) used during both the set up and execution phases of the EHR4CR use cases (PFS, PRS or CTE).

Chapter 6 describes the design and implementation of the EHR4CR Clinical Data Warehouse and of the extract-transform-load (ETL) process adopted for CDW population.

Chapter 7 describes a first evaluation of the EHR4CR standardization pipeline and semantic interoperability services (SIS).

Chapter 8 provides the concluding statements of the deliverable.

1.3.2 WP4 deliverables interdependencies

D4.1 Inventory of information and knowledge models and Definition of EHR4CR Information Models (M12) describes activities executed during Task 4.1: Inventory of information and knowledge models - inventory is based on the information systems at the pilot sites, EFPIA partner preferred solutions and products and standards relevant to the domain – and Task 4.2: Definition of EHR4CR Information Models – Specification of EHR4CR standard representation for clinical data : knowledge models, core dataset, template and archetype registry/repository

D4.2 Design and implementation of semantic interoperability tools for PFS and PRS (M24) describes activities executed during Task 4.3: Terminologies Services and Tools – specification of minimal services (with limited tools support) for working with multiple terminologies (including reference terminologies and relevant ontologies): services for managing harmonized collection of terminologies in use across clinical trials and EHR systems, for terminology translation services (cross-mappings dealing with multi-lingual resources).

D4.3 Report on authored knowledge models (M36) results from activities executed during Task 4.5: Knowledge Authoring, consisting in authoring the specific clinical knowledge models required for all four EHR4CR scenarios across the selected disease areas.

D4.4 Final specification of EHR4CR semantic interoperability solutions (M48) is built on top of D4.1, D4.2 and D4.3 describing models and services based on the requirements of PFS, PRS and CTE use cases. D4.4 extends the scope of previous deliverables by describing the final EHR4CR normalization and the specification of additional services addressing the requirements of CTE use case (use case 3).

1.4 Reference documents

1.4.1 Reference Documents

No. | Name of Document | Author | Date
1 | EHR4CR_Protocol_Feasibility_SRS_v1.0 (freeze candidate) | T.Karakoyun, W.Kuchinke, C.Ohmann, C.Krauth | December 16, 2011
2 | EHR4CR Subject Recruitment SRS_v1.2 | T.Karakoyun, C.Krauth, M.Eckert, B.Braasch, B.Trinczek | November 16, 2013
3 | EHR4CR_Trial_Execution_SRS_v1.1 | T.Karakoyun, C.Krauth, M.Eckert | November 26, 2013
4 | CTS2 | |
5 | ISO 11179 | |
6 | ISO 21090 | |
7 | OHDSI - OMOP | |

1.4.2 Developed Documents

No. | Name of Document | Author | Date
1 | Terminology Mapping Editor_SRS | E.Sadou, S.Hussain |
2 | Semantic Interoperability Services (SIS)_SRS | D.Ouagne, E.Sadou |
3 | Local Workbench SDM extension_SRS | M.Neukum | July 22, 2014
4 | ODM-SDM Editor Extension_SRS | M.Neukum, D.Ouagne, E.Sadou | September 27, 2014

1.5 Definitions and acronyms

1.5.1 Definitions

The following table defines the key vocabulary terms used in the deliverable. Most of the definitions come from documents provided by organizations contributing to international efforts in the domain of semantic interoperability, such as the HL7 Version 3 Standard: Common Terminology Services (Draft Standard for Trial Use - DSTU Release 2, October 2009), the Semantic Health Net (SHN) European Network of Excellence, ISO 21090 and ISO 11179.

Table 2. List of Definitions


Term Definition and source

Information model

[SHN] Semantic artifact providing information structures, relationships, and constraints to represent data. The meaning they convey relies on the intuitive and common-sense understanding of natural language labels and descriptions, not a priori referring to any ontological foundation.

Note: In the healthcare domain, several decade-long, large-scale efforts of different standard definition organizations (SDOs) have focused on specifying both the syntax and the semantics of patient clinical information.

Information models in the domain of patient care: the HL7 and EN 13606 standards define the semantics of meta-structures and state the need for “layers of semantic expressiveness” including:

i) generic reference information models such as EN ISO 13606-1, the openEHR Reference Model, the HL7 Reference Information Model (RIM) or FHIR resources;

ii) more detailed meta-data models like CEN/ISO 13606 Archetypes/Templates*, HL7’s Detailed Clinical Models (DCMs) or HL7 FHIR resources and profiles that instantiate generic reference models and are tailored to the needs of structured data acquisition. It is important to note that these models require an associated robust data element model*, such as that defined by ISO 11179 (see footnote 1), and a data type model, such as that defined by ISO 21090;

iii) terminology models* such as ICD or SNOMED CT.

Information models in the domain of clinical research: the Clinical Data Interchange Standards Consortium (CDISC), a non-profit organization, has developed a number of standards for study design (SDM), study data collection, study data analysis (ADaM), and submission to the regulatory bodies (SDTM).

In the EHR4CR project, the semantic interoperability platform mediates between different particular information models:

Source information models of information systems used in hospital sites to collect clinical data during patient care (Electronic Healthcare Records (EHRs)) or to process it for secondary use (Clinical Data Warehouses (CDWs)).

Target information models of information systems (Clinical Trial Management Systems (CTMS), Clinical Data Management Systems (CDMS), Electronic Data Capture (EDC) systems) used in clinical research sites to collect information of clinical trials including clinical data from participating hospital sites.

Terminology model

Semantic artifact providing the description of domain entities. Terminology models vary from simple lists, hierarchies and multi-axial systems to ontologies. In this document, consistently with CTS2, we use the term code system (or terminology or vocabulary) to designate any terminology model (including ontologies) and adopt the SHN definition of ontologies.

Code system (terminology, vocabulary)

[CTS2] Managed collection of concept identifiers, usually codes, but sometimes more complex sets of rules and references. They are often described as collections of uniquely identifiable concepts with associated representations, designations, associations, and meanings. Examples of Code Systems include ICD-10, SNOMED CT, LOINC, etc. To meet the requirements of a Code System as defined by HL7, a given concept representation must resolve to one and only one meaning within the Code System. In the CTS2 terminology model, a Code System is represented by the Code System class.

Footnote 1: Home Page for ISO/IEC 11179 Information Technology -- Metadata registries. ISO/IEC 11179, Information Technology -- Metadata registries (MDR). 2013. http://metadata-standards.org/11179/ [accessed 09/24/15]

Ontology [SHN] Semantic artifact formally describing properties and relations of types of entities. Domain-independent categories, relations and axioms are typically provided by top-level ontologies, whereas the types of things specific to a domain of interest are represented by domain ontologies.

Mediation Model

[SHN] Semantic artifact used to mediate different particular information models implemented in different settings. A mediation model can be formally described within information models (e.g. HL7 FHIR resources, HL7 RIM-based models, openEHR, EN ISO 13606, etc.) or within an ontological framework, which is mostly independent of the specific layout of information models.


The EHR4CR Common Information Model (CIM) is an integrative semantic abstraction providing a homogeneous view that makes it possible to mediate across the heterogeneous patient-centric information models implemented in both hospital (source information models) and clinical research (target information models) sites for conducting clinical research studies. The EHR4CR Common Information Model (CIM) consists of a set of semantic resources used for clinical data standardization and query specification.

Semantic resource

Any element of an information model or a terminology model used to represent the meaning of health information stored, shared, exchanged and/or processed by information systems.


The common EHR4CR semantic resources consist of a shared set of templates and data elements, with their associated value sets and concepts, that makes it possible to mediate across heterogeneous representations of patient-centric health information. The common EHR4CR semantic resources are stored and maintained in a metadata registry framework extending ISO/IEC 11179 and are accessed through standardized interfaces: the EHR4CR semantic interoperability services (SIS).

Template Any detailed meta-data models that instantiate any generic reference meta-data model (such as EN ISO 13606-1, openEHR Reference Model, HL7 RIM, HL7 FHIR resources) and is tailored to the needs of structured data acquisition in a specific context

Note: HL7’s Clinical Document Architecture (CDA) is a high level template defining the structure and generic content for any type of clinical document. The derived Continuity of Care Document (CCD), applied to the CDA schema, is defined to produce a desired level of information structure and content for a particular purpose: a particular type of document.


EHR4CR Common Elementary Templates (CETs) consist of a set of FHIR-based computable elementary information models of elements used for query specification.

Data element (meta data)

[ISO11179] Any atomic element of an information model (generic information model or template) can be considered a data element.

Note The NCI has developed the Cancer Data Standards Repository (caDSR) initiative to standardize common data elements used in cancer research. Similarly CDISC has developed CDASH in order to represent a minimum set of core data elements defined across all research studies. CDISC Shared Health and Research Electronic Library (CSHARE) aims at building a global, accessible electronic library, which enables data element definitions beyond the scope of CDASH. NCI caDSR, CSHARE utilize the ISO/IEC 11179 standard as the semantic basis for the metadata repository (MDR) of common data elements.


EHR4CR Common Data Elements (CDEs) consist of granular, computable elementary information models of elements used for query specification.

Data type [ISO21090]

Concept [CTS2] Unitary mental representation of a real or abstract thing; an atomic unit of thought. It should be unique in a given Code System. Concepts as abstract, designation-independent representations of meaning are important for the design and interpretation of information models. Terminology best practices dictate that concepts are not deleted from code systems, but are instead deprecated or retired from use, although nothing in the model prevents this. A concept may have synonyms in terms of textual representation (as Terms or Designations).

Note: Concepts may be simple or compositional in nature. A compositional concept is one that contains more than one concept concatenated within it. Example: in SNOMED CT, the concept of “Malignant tumor of breast” is a combination of “Malignant neoplasm of primary, secondary, or uncertain origin” (morphologic abnormality) and the Finding site (attribute) Breast structure (body structure).

Term (Designation)

[CTS2] Representations of concepts. The designation identifier must uniquely map to a given text string, bitmap, etc. within the context of the containing concept. In some terminologies, every unique text string will have exactly one identifier, which means that the same identifier may occur under more than one concept. In other terminologies, there may be more than one identifier for a given text string, meaning that the identifier is unique to the concept. Service software must not assume either model. Example: in SNOMED CT, the concept of “fever” has the fully specified name of “fever (finding),” a preferred name of “fever,” and synonyms of “febrile” and “pyrexia.” These are all designations for the concept of “fever.”

Concept domain

[CTS2] Named category of like concepts (a semantic type) that will be bound to one or more attributes in a static model whose data types are coded. Concept domains exist to constrain the intent of the coded element while deferring the association of the element to a specific coded terminology until later in the model development process. Thus, concept domains are independent of any specific vocabulary or code system. Example: Concept domains represent an abstract conceptual space such as "countries of the world", "the gender of a person used for administrative purposes", “languages of the world”, etc.

Mapping (Association)

[CTS2] Binary relationships or linkages between concepts. An association links a source concept to a target concept, often implying a direction of the association from a source to a target. There is a separate concept version level representation to identify the associations supported within a specific version of a code system.

Value Set [CTS2] Uniquely identifiable set of valid concept representations, where any concept representation can be tested to determine whether or not it is a member of the value set. Value set complexity may range from a simple flat list of concept codes drawn from a single code system, to an unbounded hierarchical set of possibly post-coordinated expressions drawn from multiple code systems. A value set has a definition, which describes a set of codes referencing a collection of unique concept identifiers, and can be resolved to an expansion, which is a set of concept designations defined by the concept identifier. The collection of unique identifiers referenced by a value set is drawn from one or more code systems. Each of these identifiers is represented by a code.

Binding realm [CTS2] In HL7, all model instances must declare a particular binding realm based on the jurisdiction from which they originate, for which they are destined, or for some third jurisdiction by site-specific agreement. The declared binding realm applies to the entire model or specification artifact: it is not specific to individual elements of that model or artifact. A binding realm has a unique code and a steward. The name of the Binding Realm is carried in the model instance. In the interest of maximizing interoperability, interoperability spaces should be as large as possible: binding realms are preferred to be large-grained. A binding realm is used to provide and manage the bindings of value sets to reflect rules within a conformance space—e.g., a country.

Jurisdictional Domain

Any body that may define and manage its own code systems or concepts, including localization of a broader code system. It exists specifically to allow for localization of certain concept elements. A Jurisdictional Domain could be a country, group of countries, a territory (e.g. state), an SDO, an individual organization or even a department within an organization. A Binding Realm is represented as a Jurisdictional Domain in the CTS2 model. Jurisdictional Domain is intended to encompass the HL7 concept of Realm, but is broader in scope than an HL7 Realm. HL7 rules prohibit new codes being added to a code system locally, but do allow for additional concept relationships, concept properties and designations. (However, any organization could use the same model, interfaces etc. to define its own code systems for internal use.) This class provides the link to those classes to enable the localization to be recorded and managed.
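The relationships between several of the terms defined in Table 2 (code system, concept, designation, value set) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names (`Concept`, `CodeSystem`, `ValueSet`, `add`, `contains`, `expand`) and the value set identifier are hypothetical and are neither the CTS2 API nor the EHR4CR SIS interfaces. The SNOMED CT fever concept and its designations follow the example given in the Term (Designation) entry.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A concept identified by a code that is unique within its code system."""
    code_system: str
    code: str
    designations: tuple[str, ...] = ()  # textual representations of the concept


@dataclass
class CodeSystem:
    """Managed collection of concepts; one code resolves to one meaning."""
    name: str
    concepts: dict[str, Concept] = field(default_factory=dict)

    def add(self, code: str, *designations: str) -> Concept:
        if code in self.concepts:
            raise ValueError(f"code {code!r} already defined in {self.name}")
        concept = Concept(self.name, code, tuple(designations))
        self.concepts[code] = concept
        return concept


@dataclass
class ValueSet:
    """Identifiable set of (code system, code) pairs, testable for membership."""
    identifier: str
    members: set[tuple[str, str]] = field(default_factory=set)

    def contains(self, concept: Concept) -> bool:
        return (concept.code_system, concept.code) in self.members

    def expand(self, code_systems: dict[str, CodeSystem]) -> list[str]:
        """Resolve members to their first designation (assumes one exists)."""
        names = []
        for system, code in sorted(self.members):
            names.append(code_systems[system].concepts[code].designations[0])
        return names


# Illustrative data only.
snomed = CodeSystem("SNOMED CT")
fever = snomed.add("386661006", "Fever (finding)", "Fever", "Febrile", "Pyrexia")
findings = ValueSet("example-findings", {(fever.code_system, fever.code)})
```

Here `findings.contains(fever)` is the membership test mentioned in the Value Set definition, and `findings.expand({"SNOMED CT": snomed})` plays the role of resolving a value set definition to an expansion.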

1.5.2 Acronyms

Table 3. List of Abbreviations and Acronyms

Abbreviation/Acronym Definition

13606 CEN/ISO 13606 EHR-Communication standard

ADE Adverse Drug Event

ATC Anatomical Therapeutic Chemical

CCD Continuity of Care Document

CDA Clinical Document Architecture


CDE Common Data Element

CDW Clinical Data Warehouse

CEN European Committee for Standardization

CET Common Element Template

E2B (R2) ICH message standard based on HL7 for Individual Case Safety Reports

EHR Electronic Health Record

EMA European Medicines Agency

FDA Food and Drug Administration

FHIR Fast Healthcare Interoperability Resources

HL7 Health Level Seven

ICD The International Classification of Diseases

ICSR Individual Case Safety Report

IHE Integrating the Healthcare Enterprise

MDR Metadata Registry

OMOP Observational Medical Outcomes Partnership

2 Semantic interoperability overall specification

Chapter 2 provides a summary of the semantic interoperability approach, describes the governance of the EHR4CR semantic interoperability platform and provides an overview of the specifications.

2.1 Approach

The EHR4CR project developed a semantic interoperability platform providing a consistent integrative semantic abstraction on top of existing application representations. This abstraction makes it possible to mediate across heterogeneous applications, namely Electronic Health Records (EHRs) and Clinical Data Warehouses (CDWs), that store routinely collected clinical data at hospital sites.

2.1.1 What is a mediation model?

A mediation model provides a homogeneous view of the clinical data contained within disparate databases of data providers so that data users can access these data using a library of standard queries that have been written based on the mediation model.

2.1.2 Why do we need a mediation model?

Electronic health records (EHRs) support insurance reimbursement processes and clinical practice at the point of care. Each EHR system has its own logical organization and physical format, and the terminologies used to describe clinical information vary from source to source. Clinical Data Warehouses (CDWs) support secondary use of clinical data, allow users to generate evidence from a wide variety of sources, and support collaborative research across data sources both within and outside the hospitals. CDWs also implement various information models and terminology models. EHR4CR faces the challenge of improving semantic interoperability of clinical information in order to better leverage routinely collected clinical data in EHRs during the execution of clinical trials. The EHR4CR Common Information Model (CIM) is a standard-based, expressive and scalable mediation model*, allowing dynamic mappings between data structures and semantics for consistent interpretation of clinical data accessed from varying sources.


2.1.3 How to build the EHR4CR Common Information Model as mediation model?

Our approach is based on the realistic assumption that several standard semantic artefacts will continue to co-exist - namely information models (e.g. EN ISO 13606 information model and archetypes, openEHR, HL7 RIM, C-CDA and FHIR specifications, CDISC ODM, etc.) and terminologies/ontologies (e.g. LOINC, ATC, SNOMED CT, etc.) - alongside proprietary implementations for representing the content of health information in systems (EHR systems, CDWs, CTMS, EDC systems, etc.). Therefore, achieving broad-based, scalable and computable semantic interoperability across multiple domains and systems requires a consistent use of multiple standards, clinical information models and terminology models. The EHR4CR project provides a mediation model - the EHR4CR Common Information Model - consisting of a set of multilingual semantic resources based on multiple standards (see section about the EHR4CR Common Information Model). The EHR4CR project also proposes a standardization process allowing disparate information models and coding systems of participant sites to be harmonized to a standardized model and standard terminologies. Once hospital CDWs/EHRs are connected to the EHR4CR platform and source information models are mapped to the EHR4CR Common Information Model, distributed queries can be specified based on the EHR4CR Common Information Model and executed over heterogeneous sources. Routinely collected clinical data can then be used at different key points in the trial design and execution life cycle.

2.2 Governance

A governance body - the Semantic Interoperability Board - establishes the rules for the standardization process of health information within the jurisdictional domain of the EHR4CR network (described here as the Board2). The Board is in charge of the definition and efficient execution of the standardization process of health information in order to fulfil the requirements of the EHR4CR use cases3. The standardization process has to be useful for data users in addition to being manageable for data owners. Therefore the Board is in charge of:

Providing a high quality mediation model (EHR4CR Common Information Model) used to mediate data integration from different sources EHRs/CDWs and ensuring that hospital sites (data providers) provide high quality structural and terminology mappings between local semantic resources (source information models of EHRs/CDWs and local terminologies) and this mediation model.

Supporting data users in clinical research sites to set up query specification in the context of the EHR4CR use cases. Users need to represent eligibility criteria and/or clinical items of clinical trials based on the mediation model (EHR4CR Common Information Model).

The Board defined the standardization process as well as the responsibilities and roles of the participating actors.

2.3 Overview of the specifications

A first version of the EHR4CR semantic Interoperability platform has been designed and implemented to support the different actors in accomplishing their tasks within the standardization process and EHR4CR use case execution.

2 This board itself relies on the European Institute for Innovation through Health Data.

3 Three EHR4CR use cases (PFS, PRS and CTE), and potentially other services defined in the future


Tools and services are used for i) authoring and maintaining EHR4CR shared semantic resources and ii) supporting the definition of query specifications in the context of the EHR4CR use cases. The various use cases addressed by the EHR4CR project can be grouped into two high-level functional categories (see Table 4): patient identification based on pre-defined eligibility criteria (use cases 1 & 2) and extraction of patient-specific data for pre-populating individual forms of a research protocol (use case 3).

2.3.1 Semantic interoperability requirements for patient identification based on eligibility criteria (use case 1 & 2)

An investigator wants to identify patients in different healthcare facilities based on a set of predefined eligibility (inclusion/exclusion) criteria.

Scenario 1: Protocol Feasibility Service (PFS) In the context of feasibility studies, the investigator runs the queries on a central workbench. The EHR4CR queries return aggregated data (counts and percentages) that might be cross-tabulated by a number of key eligibility criteria. Data will be returned only if counts are sufficiently large to protect privacy (see Figure 4).

Scenario 2: Patient Recruitment Service (PRS) In the context of patient recruitment, once the clinical trial is set up (approvals obtained, clinical investigators recruited and contract completed), the investigator runs the queries on a local workbench. The EHR4CR queries return only a pseudonymized list of eligible patients. Based on local knowledge, the investigator may delete individuals from the list. Only treating physicians have access to a re-identified list of patients and may, when appropriate, invite patients to participate in the trial. No individual patient level data would be returned to the organization conducting the clinical research prior to patient consent.
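The privacy rule of Scenario 1 (aggregated counts are returned only when they are sufficiently large) can be sketched as a simple small-count suppression step. The threshold value and the function names below are assumptions made for illustration; the source does not state the actual threshold or how the platform implements this check.

```python
from typing import Dict, Optional

# Hypothetical minimum count; the real threshold is not given in this document.
MIN_CELL_COUNT = 10


def protect_count(count: int, threshold: int = MIN_CELL_COUNT) -> Optional[int]:
    """Return the count only if it is large enough, otherwise suppress it."""
    return count if count >= threshold else None


def feasibility_report(site_counts: Dict[str, int],
                       threshold: int = MIN_CELL_COUNT) -> Dict[str, Optional[int]]:
    """Apply suppression to the per-site counts of eligible patients."""
    return {site: protect_count(n, threshold) for site, n in site_counts.items()}


report = feasibility_report({"site_A": 42, "site_B": 3})
```

A suppressed value is represented here by `None`, so that a consumer can distinguish "too few patients to report" from a genuine count of zero.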

Patient-centric data can be accessed through a number of disparate CDWs whose source information models* have to be aligned with common templates* and data elements* of the EHR4CR Common Information Model* in which the inclusion/exclusion criteria (target information models*) are also expressed. During the set up phase:

o The user accesses the workbench of the EHR4CR platform to represent a list of free text eligibility criteria as query specifications based on the Common Information Model (CIM) and to execute the queries on the CDWs of the hospital sites. Semantic interoperability services (SIS) are used at the workbench, by the query builder of the EHR4CR platform to access the common EHR4CR Semantic Resources (templates, data elements, value sets, terminologies) in order to represent eligibility criteria using common templates (e.g. observations, procedures, medication statement, etc.) combined with Boolean operators and temporal constraints according to the query model.

o If needed, the user may require the creation of missing semantic resources (missing relevant templates for representing eligibility criteria). The Common Information Model (CIM) Editor is used to update the mediation model and the Terminology Mapping Editor (TME) is used to map local terminologies to central terminologies used during the authoring of the new templates and data elements.

During the execution phase, semantic interoperability services (SIS) are used at the endpoint of the EHR4CR platform to access terminology mappings.
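The representation of eligibility criteria as template-based criteria combined with Boolean operators and temporal constraints can be sketched as a small expression tree. This is only an illustrative structure: the class names, the template labels and the `CENTRAL:` codes are hypothetical and do not reproduce the EHR4CR query model.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    """One patient fact, already expressed with a common template and code."""
    template: str   # e.g. "condition", "observation", "medication_statement"
    code: str
    when: date


@dataclass(frozen=True)
class Criterion:
    template: str
    codes: FrozenSet[str]               # value set of accepted central codes
    within_days: Optional[int] = None   # temporal constraint relative to 'today'


@dataclass(frozen=True)
class And:
    parts: Tuple[object, ...]


@dataclass(frozen=True)
class Or:
    parts: Tuple[object, ...]


@dataclass(frozen=True)
class Not:
    part: object


def matches(node, facts, today: date) -> bool:
    """Evaluate a criteria tree against the facts of a single patient."""
    if isinstance(node, Criterion):
        for f in facts:
            if f.template != node.template or f.code not in node.codes:
                continue
            if node.within_days is None:
                return True
            age = today - f.when
            if timedelta(0) <= age <= timedelta(days=node.within_days):
                return True
        return False
    if isinstance(node, And):
        return all(matches(p, facts, today) for p in node.parts)
    if isinstance(node, Or):
        return any(matches(p, facts, today) for p in node.parts)
    if isinstance(node, Not):
        return not matches(node.part, facts, today)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Inclusion: diabetes diagnosis AND an HbA1c result in the last year;
# exclusion: any insulin medication statement.
query = And((
    Criterion("condition", frozenset({"CENTRAL:T2DM"})),
    Criterion("observation", frozenset({"CENTRAL:HBA1C"}), within_days=365),
    Not(Criterion("medication_statement", frozenset({"CENTRAL:INSULIN"}))),
))

today = date(2016, 1, 1)
patient_1 = [Fact("condition", "CENTRAL:T2DM", date(2010, 5, 1)),
             Fact("observation", "CENTRAL:HBA1C", date(2015, 9, 1))]
patient_2 = patient_1 + [Fact("medication_statement", "CENTRAL:INSULIN", date(2015, 3, 1))]
```

Here `matches(query, patient_1, today)` holds while `patient_2` is excluded by the negated criterion. At an endpoint, the central codes of each criterion would first be translated into local codes using the terminology mappings.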


Figure 4. Need of semantic interoperability for use case 1 (Protocol Feasibility) & 2 (Patient Recruitment)

2.3.2 Semantic interoperability requirements for data extraction and form pre-population (use case 3)

An investigator at a clinical research site wants to pre-populate clinical research or patient safety forms using patient data resident in a number of disparate EHRs or EHR extracts4. Query specifications are derived from the content of the form and executed against an EHR or EHR extract to pre-populate an instance of the defined form. The EHR4CR queries are run only if the patient has given full informed consent for participating in the clinical trial and for the extraction of data from his/her EHR.

Scenario 3: Clinical Trial execution (EHR data extraction and form pre-population) In the context of electronic data capture, during a visit, the clinical investigator opens the eCRF, which is automatically pre-populated with extracted data. All data extracted from the EHR are human validated before the eCRF is completed and finally submitted to the Clinical Research Organization (CRO) managing the data collection of the clinical research (see Figure 5). In the context of Adverse Drug Reaction (ADR) reporting, when a clinician documents symptoms, findings or results that are suggestive of a serious adverse drug reaction, the EHR4CR platform prompts the clinician to complete a patient safety form, which is automatically pre-populated with extracted data. All data automatically extracted from the EHR are human validated before the form is completed and finally submitted to the sponsor, CRO or regulatory agency (see Figure 5). The IHE Structured Data Capture (SDC)5 profile utilizes the IHE Retrieve Form for Data Capture (RFD) profile for retrieving and submitting forms in a standardized and structured format. In traditional RFD, form pre-population is done by a Form Manager system, such as a research electronic data capture (EDC) system, using data exported from the EHR. The IHE SDC profile introduces a second mode, called auto-population, where the EHR applies data directly to the form. In this approach, the data element definitions within the form are interpreted by the EHR system, and corresponding instance data are retrieved from the EHR database and applied to the form. SDC adds the concepts of i) a forms repository and the option of persistent forms based on the emerging ISO/IEC 19763-13 Meta-model for Framework Interoperability (MFI) form compliance model and ii) ISO/IEC 11179 Metadata Registries for the definition of Common Data Elements. This profile also supports optional use of the IHE Data Element Exchange (DEX) Profile for auto-populating and pre-populating forms.

4 EHR extracts can be considered as eSources

5 Based on the work of the US Office of the National Coordinator for Health Information Technology, Standards & Interoperability (S&I) Framework SDC Initiative. The IHE SDC profile consists of four new standards that enable EHRs to capture and store structured data: i) a standard for the Common Data Elements used to fill the specified forms or templates; ii) a standard for the structure or design of the form or template (container); iii) a standard for how EHRs interact with the form or template; iv) a standard to enable these forms or templates to auto-populate with data extracted from the existing EHR.

Figure 5. Different steps of the Clinical Trial execution. Actors of the IHE Structured Data Capture (SDC) profile (Form Filler, Form Manager, Form Receiver, and Form Archiver) exchange standard-based transactions. Semantic interoperability services (in green) support the process.

The different steps of the Clinical Trial execution use case are:

1. Study Manager (SM) creates a study in CDISC SDM format

2. SM creates or refines queries

3. The EDC manager imports the CDISC ODM file created by the sponsor, uses annotation tools to annotate the eCRF template data elements with the Central Data Element Repository and saves the result in the central workbench database as a study attribute

4. SM publishes the study for interest to a list of hospitals

5. Data Relation Manager (DRM) analyses the eCRF template (with the eCRF template visualization tool) and the patient recruitment queries, then gives a participation status (accept or decline)


6. DRM and Investigator set up the local service repository and the EDC system that will be used for the eCRF pre-population; they update the mapping to match local terminology (using the local mapping tool) and submit the new eCRF template to the central EDC manager with a representative dataset and all relevant information (transformation, translation, transcoding, calculation, …)

7. EDC Manager checks the site-specific mapping and dataset. If he is not satisfied with it, he returns it with comments; in this case, DRM and Investigator redo step 6 until the mapping is approved

8. When the mapping has been approved for a site, the SM can send "ready to go" to this site

9. Patient recruitment can start

10. CTE takes place. For each visit of a recruited patient, an ODM file is generated and imported using the eCRF import tool to pre-populate the eCRF. The result is checked by the investigator

11. Using the dashboard on the central workbench, SM can follow the course of the clinical trial
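The pre-population of an annotated form followed by human validation can be illustrated with the minimal sketch below. The `FormItem` structure, the item OIDs and the `CDE:` identifiers are hypothetical; the sketch does not reproduce the CDISC ODM format or the IHE SDC transactions, only the principle that items annotated with a common data element receive a value from the EHR and remain unvalidated until the investigator has checked them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FormItem:
    item_oid: str                    # hypothetical identifier of the form item
    data_element: Optional[str]      # annotation with a common data element, if any
    value: Optional[str] = None
    prepopulated: bool = False
    validated: bool = False


def auto_populate(items: List[FormItem], ehr_values: Dict[str, str]) -> List[FormItem]:
    """Fill annotated items for which the EHR holds a value."""
    for item in items:
        if item.data_element is not None and item.data_element in ehr_values:
            item.value = ehr_values[item.data_element]
            item.prepopulated = True
            item.validated = False   # extracted data still need human validation
    return items


def ready_to_submit(items: List[FormItem]) -> bool:
    """A form can be submitted only when every pre-populated item is validated."""
    return all(item.validated for item in items if item.prepopulated)


form = [
    FormItem("IT.WEIGHT", "CDE:BODY_WEIGHT"),
    FormItem("IT.SBP", "CDE:SYSTOLIC_BP"),
    FormItem("IT.COMMENT", None),            # not annotated: filled in manually
]
ehr = {"CDE:BODY_WEIGHT": "72 kg"}           # values retrieved for one patient visit

auto_populate(form, ehr)
```

After this call only the weight item carries a value, and `ready_to_submit(form)` stays false until the investigator marks that item as validated.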

Figure 6. The interaction diagram of the IHE Structured Data Capture (SDC) profile

Semantic interoperability requirements and overall specification

The model of the eCRF or adverse event reporting form (target Clinical Information Models*) has to be aligned with the mediation model (EHR4CR Common Information Model) so that query specifications based on the mediation model can be specified and run on disparate EHRs whose source Information Models* have also been aligned with the mediation model. During the set up phase:

o The CDISC SDM-ODM file of the clinical trial, used to generate the specification of eCRFs or AE reporting forms, needs to be annotated with EHR4CR Common Data Elements using the SDM-ODM editor extension. Semantic interoperability services (SIS) are used during the annotation process to access the common EHR4CR Semantic Resources (templates, data elements, value sets, terminologies) in order to represent eligibility criteria.

o The user accesses the EHR4CR platform to upload the annotated CDISC SDM-ODM file. The annotated file can then be used for query specification during the execution of the auto-population step of the IHE SDC profile.

o If needed, the user may require the creation of missing semantic resources (missing relevant templates for representing eligibility criteria). The Common Information Model (CIM) Editor is used to update the mediation model and the Terminology Mapping Editor (TME) is used to map local terminologies to central terminologies used during the authoring of the new templates and data elements.

During the execution phase, semantic interoperability services (SIS) are used at the endpoint of the EHR4CR platform to access terminology mappings.

Figure 7. The need of semantic interoperability for use case 3 (clinical trial execution)

A set of tools - Common Information Model (CIM) Editor, Terminology Mapping Editor (TME), Local Workbench SDM extension, SDM-ODM editor extension and semantic interoperability services (SIS) - has been designed and developed for:

Providing the set of shared EHR4CR semantic resources of the mediation model (EHR4CR Common Information Model) and ensuring that hospital pilot sites provide high quality structural and terminology mappings of the local semantic resources to the mediation model.

Supporting users in clinical research sites to set up query specification in the context of the EHR4CR use cases.


Figure 8. EHR4CR Semantic Interoperability platform: a set of EHR4CR Semantic Resources and Semantic Interoperability Services (SIS) are used during EHR4CR use case execution.

Table 4 summarizes the use of tools and semantic interoperability services (SIS) of the Semantic Interoperability platform during the set up and execution phases of the three EHR4CR use cases.

Table 4. Use of tools and semantic interoperability services (SIS) of the Semantic Interoperability platform

| Phases | Protocol Feasibility (PFS) | Patient Recruitment (PRS) | Clinical Trial Execution (CTE) / Adverse event Reporting (ADR) |
|---|---|---|---|
| Prerequisite | Feasibility study protocol | Clinical trial protocol (+/- CDISC SDM-ODM file) | Clinical trial protocol (+/- CDISC SDM-ODM file) |
| Setup phase of the use case | Manual pre-processing of free text eligibility criteria | (optional step) Uploading the CDISC SDM file including free text eligibility criteria using the [Local Workbench SDM extension]; manual pre-processing of free text eligibility criteria | Semantic annotation of CDISC SDM-ODM files using the [SDM-ODM editor extension] and semantic interoperability services [SIS] |
| Setup phase of the use case (all use cases) | Updating the EHR4CR Resources to cover the scope of the clinical trial: data element creation/update using the [CIM Editor] + terminology mapping & validation using [TME] | Same as PFS | Same as PFS |
| Setup phase of the use case (query specification) | Query specification (workbench) using [SIS] | Query specification (workbench) using [SIS] | Query specification using [SIS] |
| Execution phase of the use case | Query execution (endpoint) using [SIS] | Query refinement & execution (workbench & endpoint) using [SIS] | Query execution (auto-population transaction) using [SIS] |
| Outcome of the use case | Get the potential number of patients per site; identification of the potential sites to start a recruitment protocol | Contact sites and, if agreed, start recruitment; screening and update of the patient count | (Contact sites and, if agreed, start recruitment); real-time execution of auto-population of the eCRF or adverse event reporting form per patient/per visit; Source Data Verification by investigator |

3 Standardization pipeline

Chapter 3 describes the EHR4CR standardization process needed to address the semantic interoperability requirements for EHR4CR use case execution. The aim of this process is the standardization of structured EHR data into a unified, concept-based model. The workflow of the standardization process has been defined, as well as the responsibilities and roles of the participating actors. This chapter also provides the use case diagram of the semantic interoperability platform supporting the different actors - on both the clinical research and hospital sides - in accomplishing their tasks during the standardization process. The EHR4CR standardization process includes: i) the management of the mediation model (EHR4CR Common Information Model), and ii) the structural and terminology mappings between local semantic resources (source information models of EHRs/CDWs and local terminologies) and this mediation model. The structural and terminology mappings is used in two

3.1 Managing the EHR4CR Common Information Model

The EHR4CR Common Information Model has been developed, and can be extended, through a global consensus-based development process6 based upon both i) eligibility criteria and data items defined across a given set of specific clinical trials (bottom up approach) and ii) standards reference clinical information models (top down approach). The EHR4CR Common Information Model is developed and evolved through repeated cycles using a "Learning by Doing" approach. The board is in charge of the maintenance and quality assurance of the EHR4CR Common Information Model. Based on an initial subset of semantic resources produced during the timeframe of the EHR4CR project, the system will be iterated to address the needs of any new specific research study or project intending to use any of the EHR4CR services (PFS, PRS or CTE), and any additionally-defined services. The board is in charge of managing and prioritizing the requests for authoring/curating semantic resources – templates, data elements, value sets, concepts and associations - that may be submitted to extend the existing set of resources. Requests require actions from various actors that hold roles and responsibilities based on the configured governance workflow and that shall be executed within a predefined timeframe.

6 Defined consistently with the governance principles defined by CDISC SHARE


3.2 Mapping local semantic resources to the EHR4CR Common Information Model

The standardization pipeline addresses the organization of the mappings required as part of the set up of any specific project (clinical trial) intending to use any EHR4CR services (PFS, PRS or CTE) and potentially other services defined in the future. During the timeframe of the EHR4CR project (EHR4CR CIM version 0.1, version 0.2 & version 0.3), mappings were developed manually through repeated cycles using a "Learning by Doing" approach. The board will support the efficiency and quality of mappings that are developed by hospital sites between their local semantic resources and the EHR4CR Common Information Model. It will provide some centralized resources to support the mapping process and its quality assurance, including standardized mappings, educational resources and data quality assessment tools. The board will also advise on the necessary expertise and roles required within hospital sites to manage those mappings. The hospital itself will ultimately be responsible for the efficiency and quality of the mappings it defines and implements. The EHR4CR Service Provider and each hospital will negotiate and agree the point at which the mappings and their implementation are sufficiently satisfactory to enable platform services to become operational.

3.3 Overview of the standardization process before the execution of any EHR4CR use case

Figure 9. Standardization process

For each new study (clinical trial), authorized users of the EHR4CR platform (trial sponsors, hospitals, EHR4CR board), acting as the requester, submit the feasibility study (PFS) or clinical trial (PRS, CTE) protocol and the corresponding list of pre-processed eligibility criteria and data items. In collaboration with the requester7, the board identifies the need to create new semantic resources - templates, value sets, concepts, and terms - in order to cover the scope of the use case and assigns each request to one or more Reviewers. A Reviewer may add information to the request and can take action to approve or reject each of the new semantic resources created to fulfill the request. The resources are created under the responsibility of the board, which assigns each request to Author/Curator(s)8 responsible for developing the semantic resources corresponding to the request. A semantic resource is not published until the request has received approval from the Reviewer. The board is in charge of monitoring that the requests are addressed on time. As soon as a request is fulfilled and the corresponding new semantic resources - templates, value sets, concepts, terms - are available centrally (as part of the EHR4CR Common Information Model), the study sponsor and hospital sites of the network - especially those involved in the study - are notified in order to validate the resources and to check that the central terms used in the resources are mapped to their local terms. The board is responsible for ensuring that each hospital site involved in the study has been notified of the mappings that need to be developed in the context of the clinical trial. For each given project (clinical trial), the hospital site is in charge of identifying mappings that need to be authored/curated and assigns the development of these mappings to Author/Curator(s). The mappings are not used until the request has received all necessary approvals.

7 Ideally involving Investigators/Data managers in hospital sites
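The rule that mappings are used only once approved, and the check that central terms are mapped to local terms at a site, can be sketched as follows. The data structure, the approval states and all codes are hypothetical illustrations and do not describe the actual mapping tools.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Approval(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class TerminologyMapping:
    central_code: str
    local_code: str
    status: Approval


def local_codes_for(central_code: str,
                    mappings: Iterable[TerminologyMapping]) -> List[str]:
    """Return only the local codes whose mapping has been approved."""
    return sorted(m.local_code for m in mappings
                  if m.central_code == central_code and m.status is Approval.APPROVED)


def unmapped_central_codes(required: Iterable[str],
                           mappings: Iterable[TerminologyMapping]) -> List[str]:
    """Central codes needed by a study that have no approved local mapping yet."""
    mappings = list(mappings)
    return sorted(c for c in set(required) if not local_codes_for(c, mappings))


site_mappings = [
    TerminologyMapping("CENTRAL:HBA1C", "LAB-0451", Approval.APPROVED),
    TerminologyMapping("CENTRAL:HBA1C", "LAB-0452", Approval.PENDING),
    TerminologyMapping("CENTRAL:T2DM", "DX-E11", Approval.REJECTED),
]
study_codes = ["CENTRAL:HBA1C", "CENTRAL:T2DM"]
```

With these sample data, `unmapped_central_codes(study_codes, site_mappings)` lists the codes for which the hospital site still has to author or curate a mapping before the study can run.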

3.3.1 EHR4CR Common Information Model (CIM) management

The board defines the responsibilities and roles of the different actors using, or in charge of the maintenance and quality assurance of, the EHR4CR Common Information Model: semantic resource user, administrator, provider, author/curator, translator, reviewer, mapper and semantic enabled application developer. A Resource User is an actor such as a subject matter expert or terminologist. Resource User activities include, but are not limited to, querying for specific resources and browsing or comparing resources. Specializations of the Resource User actors are defined in the figure and table below. Depending on the kind of resources (terminology, value set, template/data element, mapping) or tasks (administration, authoring/curation, mapping, translation), one person can hold more than one role.

8 Investigators/Data managers in hospitals are probably best candidates for authoring data elements


Figure 10. Use cases of the semantic resource management

| Actor (Roles) | Description/Responsibilities | Tool | Organization |
|---|---|---|---|
| EHR4CR Semantic Resource Provider | The EHR4CR Semantic Resource Provider is the actor (individuals or organization) that is responsible for the development of the EHR4CR semantic resources, including external resources: templates (HL7 models, CEN 13606 archetypes, CDISC SDTM, CDISC CDASH, NCI CDE (caDSR), UCUM), value sets (provided by HL7/IHE) and code systems (ICD-10, ATC, SNOMED CT, LOINC, PathLex, etc.). | | EHR4CR Service Provider |
| EHR4CR Semantic Resource Administrator | The EHR4CR Resource Administrator is an actor responsible for ensuring the availability and overall maintenance of the EHR4CR semantic services. This includes, but is not limited to, loading content into the server and making available the required functionality to address the specific needs of users. | | EHR4CR Service Provider |
| EHR4CR Semantic Resource Author / Curator | The EHR4CR Resource Author / Curator is an actor, such as a subject matter expert or terminologist, who is responsible for maintaining EHR4CR semantic resources including, but not limited to, the development of new resources - templates, data elements, value sets, concepts and associations - that may be submitted to the Resource Provider as an extension of an existing set of resources. According to the type of resource that is authored/curated, we distinguish: (a) a Terminology Author / Curator, an actor who is responsible for maintaining new terminology content or the extension of an existing terminology with local concepts (Terminology Authors / Curators may not necessarily belong to the Terminology Provider's organization); (b) a Value set Author / Curator, an actor with specific domain knowledge, as well as expertise in controlled terminologies, who develops and maintains domain- or application-specific terminology value sets; (c) a Template/Data Element Author / Curator, an actor with specific domain knowledge, as well as expertise in information models and controlled terminologies, who develops and maintains domain- or application-specific templates and data elements. | CIM Editor (used manually through the user interface or import of txt or csv files) | EHR4CR Service Provider or Hospital sites |
| EHR4CR Semantic Resource Reviewer/Validator | The EHR4CR Semantic Resource Reviewer/Validator is an actor who is responsible for investigating requests and for validating new semantic resources. A reviewer may add information to the request and can take action to approve or reject the request. | CIM Editor | EHR4CR Service Provider or Hospital sites |
| EHR4CR Semantic Resource Human Language Translator | A Terminology Human Language Translator is an actor with domain knowledge who is also familiar with the languages and dialects which they are responsible for translating. | CIM Editor | EHR4CR Service Provider or Hospital sites |
| EHR4CR Terminology Mapper | An EHR4CR Terminology Mapper is an actor (human or system) that is responsible for creating or maintaining specialized associations, or mappings, between EHR4CR terminology and concepts from different code systems. An EHR4CR Terminology Mapper is in charge of validating and/or importing mappings provided by external providers (e.g. UMLS mappings between SNOMED CT/MedDRA, SNOMED CT/ICD-10 or SNOMED CT/NCI thesaurus, etc.). | CIM Editor | EHR4CR Service Provider |
| Semantic Enabled Application Developer | A Semantic Enabled Application Developer is an actor who is responsible for the development of software applications that make explicit use of different types of semantic resources: templates, data elements, value sets, concepts. Semantic resources are used through standard semantic services specified in the dedicated section of the deliverable. | Semantic interoperability services at EHR4CR workbench | EHR4CR workbench at pharma company/clinical research units |

All EHR4CR semantic resources may have one of the following statuses: draft, cancelled, active or deprecated. At creation, semantic resources are automatically published in Draft status. A resource in Draft status may be cancelled, or activated if the resource is validated. A resource in Active status may be deprecated.
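The status life cycle described above can be written down as a small state machine. This is a sketch of the stated rules only (Draft to Cancelled or Active, Active to Deprecated); the names `Status`, `ALLOWED` and `transition` are illustrative and are not taken from the CIM Editor.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    CANCELLED = "cancelled"
    ACTIVE = "active"
    DEPRECATED = "deprecated"


# Transitions stated in the text; Cancelled and Deprecated are treated as final.
ALLOWED = {
    Status.DRAFT: {Status.CANCELLED, Status.ACTIVE},
    Status.ACTIVE: {Status.DEPRECATED},
    Status.CANCELLED: set(),
    Status.DEPRECATED: set(),
}


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the change is not permitted."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


status = Status.DRAFT                         # resources are created in Draft status
status = transition(status, Status.ACTIVE)    # validated by a reviewer
status = transition(status, Status.DEPRECATED)
```

Treating Cancelled and Deprecated as terminal states is an assumption of this sketch; the text does not say whether such resources can be reactivated.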

3.3.2 Central/local mapping management

The Board also defines responsibilities and roles within hospital pilot sites of the EHR4CR network in the process of managing the mappings between local terminologies and the central terminologies used to mediate data integration in the EHR4CR project.

| Actor (Roles) | Description/Responsibilities | Tool | Organization |
|---|---|---|---|
| Hospital Semantic Resource Provider | The Hospital Provider is the actor (individuals or organization) that is responsible for the development of the semantic resources (including imported external resources) at the hospital. | | |
| Hospital Semantic Resource Administrator | The Hospital Resource Administrator is an actor responsible for ensuring the availability and overall maintenance of the resources of the hospital repository. This includes, but is not limited to, loading content into the server and making available the required functionality to address the specific needs of users. | (local terminology server) | Hospital site |
| Hospital Semantic Resource Author / Curator | The Hospital Resource Author / Curator is an actor, such as a subject matter expert or terminologist, who is responsible for maintaining semantic resources including, but not limited to, the development of new resources - templates, data elements, value sets, concepts and associations - that may be submitted to the Resource Provider as an extension of an existing set of resources. | (local terminology server) | Hospital site |
| Hospital Terminology Human Language Translator | The Hospital Terminology Human Language Translator is an actor with domain knowledge who is also familiar with the languages and dialects which they are responsible for translating. | (local terminology server) | Hospital site |
| Hospital Terminology Mapper | The Hospital Terminology Mapper is an actor (human or system) that is responsible for creating or maintaining specialized associations, or "mappings", between concepts from different code systems. | Terminology Mapping Suite (TMS) | Hospital site |
| Semantic Resource Reviewer/Validator | The Semantic Resource Reviewer/Validator is an actor who is responsible for investigating requests and for validating new semantic resources (including mappings). A reviewer may add information to the request and can take action to approve or reject the request. | Terminology Mapping Suite (TMS) | Hospital site |
| Semantic Enabled Application Developer | A Semantic Enabled Application Developer is an actor who is responsible for the development of software applications that make explicit use of different types of semantic resources: templates, data elements, value sets, concepts. Semantic resources are used through standard semantic services specified in the dedicated section of the deliverable. | Semantic interoperability services at Endpoint | EHR4CR endpoint at hospital site |

4 Semantic interoperability resources

A first version of the semantic interoperability platform supporting the different actors in accomplishing their tasks within the standardization process has been developed. This platform consists of a mediation model and a set of tools. This section presents the EHR4CR Common Information Model (CIM), i.e. the mediation model.

4.1 EHR4CR Common Information Model (mediation model)

4.1.1 What is the EHR4CR mediation model?

The EHR4CR Common Information Model (CIM) - the mediation model - is an integrative semantic abstraction providing a homogeneous view that makes it possible to mediate across the heterogeneous information models implemented in both hospital sites (source information models) and clinical research sites (target information models) for conducting clinical research studies. This mediation model consists of a set of semantic resources 9 used for clinical data standardization and query specification. The common EHR4CR semantic resources consist of a shared set of templates and data elements, with their associated value sets and concepts, that makes it possible to mediate across heterogeneous representations of patient-centric health information. The common EHR4CR semantic resources are stored and maintained in a metadata registry framework and are accessed through standardized interfaces: EHR4CR semantic interoperability services (SIS).
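The relationship between templates, data elements, value sets and concepts can be pictured with the minimal in-memory registry below. It is a hedged sketch: the class names, the `allowed_codes` lookup and the sample codes are hypothetical and do not describe the actual metadata registry or the SIS interfaces.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Concept:
    code_system: str
    code: str
    display: str


@dataclass(frozen=True)
class ValueSet:
    name: str
    concepts: Tuple[Concept, ...]


@dataclass(frozen=True)
class DataElement:
    name: str
    value_set: Optional[ValueSet] = None   # coded elements are bound to a value set


@dataclass(frozen=True)
class Template:
    name: str
    data_elements: Tuple[DataElement, ...]


class Registry:
    """Toy stand-in for a metadata registry holding shared templates."""

    def __init__(self) -> None:
        self._templates: Dict[str, Template] = {}

    def register(self, template: Template) -> None:
        self._templates[template.name] = template

    def get_template(self, name: str) -> Template:
        return self._templates[name]

    def allowed_codes(self, template_name: str, element_name: str) -> Set[str]:
        """Codes permitted for one data element of one template."""
        for element in self.get_template(template_name).data_elements:
            if element.name == element_name and element.value_set is not None:
                return {c.code for c in element.value_set.concepts}
        return set()


sex_codes = ValueSet("administrative_sex", (
    Concept("EXAMPLE", "F", "Female"),
    Concept("EXAMPLE", "M", "Male"),
))
registry = Registry()
registry.register(Template("demographics", (
    DataElement("sex", sex_codes),
    DataElement("birth_date"),
)))
```

A query builder could call a lookup such as `registry.allowed_codes("demographics", "sex")` to offer only valid coded values when a criterion is authored.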

9 Any element of an information model or a terminology model used to represent the meaning of health information stored, shared, exchanged and/or processed by information systems.


4.1.2 How is the EHR4CR mediation model built and maintained?

The EHR4CR Common Information Model (mediation model) has been developed, and can be extended, through a global consensus-based development process [10] in order to cover the scope of both i) eligibility criteria and data items identified from a given set of specific clinical trials (bottom-up approach) and ii) standard reference clinical information models (top-down approach). The EHR4CR Common Information Model is developed and evolves through repeated cycles using a "Learning by Doing" approach. A first iteration of the development of the EHR4CR Common Information Model (version 0), based on a bottom-up approach, covered the scope of 14 clinical trials selected to demonstrate the "Protocol Feasibility Services" (PFS) use case (EHR4CR CIM version 0.1). A second iteration covered the scope of 17 additional clinical trials selected to demonstrate the "Patient Recruitment Services" (PRS) use case (EHR4CR CIM version 0.2), and a third iteration covered the scope of 28 additional clinical trials selected to demonstrate the "Clinical Trial Execution" (CTE) use case (EHR4CR CIM version 0.3). Each new version of the EHR4CR Common Information Model has an extended scope and improved quality.

Table 5. List of versions of the EHR4CR Common Information Model (mediation model)

Versions: Version 0.1 | Version 0.2 | Version 0.3 | Version 1.0

Scope, Approach: BOTTOM UP (use case and clinical trials driven, see the list in Appendix) | BOTTOM UP + TOP DOWN (cross project harmonization) [11]

Scope, Content: PFS (n=14) | PFS+PRS (n=14+17=31) | PFS+PRS+CTE (n=14+17+28=59) | From additional pharma/hospital CT; from models (OMOP, CDISC SHARE) data elements

Semantic classes/categories: Demographics, Conditions, Diagnosis, Medications, Vital Signs, Results (lab, anatomic pathology), Procedures | HL7 CCD sections & UMLS semantic types | Cross project harmonization

Common Element Templates: Observation, Substance administration, Procedure | Patient, Encounter, Observation, Medication statement, Procedure | FHIR resources: Patient, Encounter, Condition, Observation, Medication statement, Procedure

Terminologies/Ontologies: ICD, SNOMED, LOINC, PathLex, ATC | ICD, SNOMED, LOINC, ATC, ICD-O, Pubcan, TNM, PathLex


[10] Defined consistently with the governance principles defined by CDISC SHARE.

[11] A cross project harmonization is being initiated as part of the Semantic Health Net Initiative. The Semantic Inter Operability task consists in identifying common fragments from models used in different European projects, e.g. EHR4CR, epSOS, EXPAND.


Mappings (Table 5, continued): Mappings between reference terminologies from UMLS (CUI) | Mapping between SNOMED CT and MedDRA, NCI-T, ICD-9, ICD-10, ICD-O

4.1.3 FHIR-based templates and data elements

The EHR4CR Common Information Model (CIM) consists of a set of multilingual semantic resources based on multiple standards (see Figures 11 and 12). The EHR4CR templates are based on FHIR resources (Patient, Encounter, Condition, Observation, Procedure and MedicationStatement) (see Table 6). FHIR-based resources were organized into categories based on HL7 CCD sections and UMLS semantic types: Demographics, Encounters, Advance directives, Problems, Family History, Social History, Alerts, Medications, Immunizations, Vital Signs, Results (lab, anatomic pathology), Procedures, Plan of Care, Lifestyle Choice, Ethical consideration. FHIR resources were enriched in order to fulfil the requirements of the project and represent the required semantic content. Specific value sets were defined for some data elements of the FHIR templates.

Figure 11. FHIR-based resources were organized into categories based on HL7 CCD sections and UMLS semantic types. For example, the clinical observable entity “Eastern Cooperative Oncology Group (ECOG) performance status” is defined using the template designed for clinical observations.

EHR4CR templates are composed of data elements that are bound to a set of international reference terminologies selected by the project: ICD, SNOMED-CT, LOINC, ATC, ICD-O, Pubcan, TNM, PathLex. These terminologies are, when possible, imported into the collaborative editor from the official source of the terminology provider in order to bind the EHR4CR resources to up-to-date terminologies. The terminology binding is done through the definition of value sets corresponding to the data elements of each template. Figure 12 illustrates the terminology binding done for the Observable entity: “ECOG performance status”. The EHR4CR editing tool supports faceted templates. We defined a limited set of generic templates (e.g. Observation) with facets, so that it is possible for each code of the template (e.g. Observable entity SCT/423740007/ECOG performance status) to define its corresponding value set (e.g. SCT/424122007/ECOG performance status finding).


As far as possible, we enriched and/or merged reference terminologies in order to build multilingual terminologies and value sets (at least in English and French and, when possible, in the four languages of the EHR4CR partners: English, French, German and Polish).

Figure 12. Screenshot of the EHR4CR collaborative editing tool. The clinical observable entity “Eastern Cooperative Oncology Group (ECOG) performance status” is defined using the template designed for clinical observations (see Table 6). Terminology binding: the data element “code” (DataType=ConceptDescriptor (CD)) is associated with a value set defined as a set of top-level SNOMED CT or LOINC codes, e.g. SCT/423740007/ECOG performance status. The data element “value” (DataType=ConceptDescriptor (CD)) is associated with a value set defined as a set of concepts (ordered children of SCT/424122007/ECOG performance status finding: 0/SCT/425389002-ECOG 0; 1/SCT/422512005-ECOG 1; 2/SCT/422894000-ECOG 2; 3/SCT/423053003-ECOG 3; 4/SCT/423237006-ECOG 4; 5/SCT/423409001-ECOG 5).
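The faceted terminology binding illustrated for the ECOG performance status can be sketched in a few lines of Python. This is only an illustrative sketch: the class names `Concept` and `FacetedObservation` and their methods are hypothetical and are not part of the EHR4CR editing tool; the codes are those quoted in the caption of Figure 12.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Concept:
    system: str  # terminology identifier, e.g. "SCT" for SNOMED CT
    code: str
    label: str


@dataclass
class FacetedObservation:
    """Hypothetical facet of the generic Observation template:
    one observable entity ("code") bound to its ordered value set ("value")."""
    observable: Concept
    value_set: List[Concept] = field(default_factory=list)

    def is_valid(self, value_code: str) -> bool:
        # A value is acceptable only if it belongs to the bound value set.
        return any(c.code == value_code for c in self.value_set)

    def rank(self, value_code: str) -> int:
        # Position of the value in the ordered value set (0 = first child).
        for position, concept in enumerate(self.value_set):
            if concept.code == value_code:
                return position
        raise ValueError(f"{value_code} is not in the value set")


ecog = FacetedObservation(
    observable=Concept("SCT", "423740007", "ECOG performance status"),
    value_set=[
        Concept("SCT", "425389002", "ECOG 0"),
        Concept("SCT", "422512005", "ECOG 1"),
        Concept("SCT", "422894000", "ECOG 2"),
        Concept("SCT", "423053003", "ECOG 3"),
        Concept("SCT", "423237006", "ECOG 4"),
        Concept("SCT", "423409001", "ECOG 5"),
    ],
)
```

With this structure, `ecog.is_valid("422512005")` checks that a recorded value belongs to the value set bound to the observable entity, and `ecog.rank(...)` exposes the ordering of the children.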

The current version of the EHR4CR CIM includes 6 FHIR-based templates (and 6 additional specialized templates) and a subset of 15 corresponding data elements. Table 6 describes the content scope of the templates. Four patient demographic data elements (gender, birth time, deceased indicator, and deceased time) are part of the Patient template. Four data elements (code, discharge disposition code, effective time, and length of stay) are part of the Encounter template. We distinguished two types of Conditions: diseases on the one hand, and signs and symptoms on the other. We defined 25 categories of diagnoses (including discharge diagnosis, primary diagnosis, secondary diagnosis, admitting diagnosis, etc.). Diseases are encoded using codes from a value set combining ICD-10 (n=12,318 codes) and a subset of SNOMED CT codes. In the current version we defined four specialized Observation templates covering clinical observable entities (n=26), vital signs (n=5), laboratory observable entities (n=2000) and anatomic pathology observable entities (n=80). Value sets corresponding to categorical observable entities were defined and populated with more than 1000 codes from SNOMED CT, ICD-O (Pubcan), TNM, PathLex and EHR4CR-T. As part of the Procedure template, we defined a small value set of SNOMED CT procedures (n=57). As part of the MedicationStatement template, we selected ATC (n=5,655 codes) as the value set attached to the data element consumableCode. The terminology binding of the EHR4CR CIM involves more than 21,500 concepts from internationally used reference terminologies. All the concepts are at least bilingual (English and French).

Table 6. Description and structure of the six core FHIR-based templates of the EHR4CR mediation model.


Columns: Template (nb. of data elements) | Template scope / Specialized template scope | Data element | Terminology binding (value set) | Nb. of concepts

Patient (n=4)
Template scope: A Patient is a uniquely identified person. Clinical statements attached to this Patient may be recorded within the source systems.
administrativeGenderCode | SCT gender types | 4
birthTime
deceasedInd
deceasedTime

Encounter (n=4)
Template scope: An Encounter occurrence corresponds to a period of time during which a Patient continuously receives medical services from one or more providers at a care site in a given setting within the health care system.
Code | SCT encounter types | 6
dischargeDispositionCode
effectiveTime
lengthOfStayQuantity

Condition (n=2)
Template scope: Conditions state the presence of a clinical disease, sign or symptom, etc.
Specialized template nonDiseaseCondition: corresponds to symptoms (observed by the patient) or signs (observed by a care provider).
Category | SCT condition types | 4
Code | Subset of SCT findings | 16
Specialized template diseaseCondition: inferred from medical claims data, textual clinical documents, data collected via forms (e.g. from a problem list), etc.
Category | SCT diagnostic types | 25
Code | diseases (ICD10 + subset of SCT diseases) | 12500

clinicalObservation (n=2)
Template scope: A (numerical or categorical) Observation is a sign or a symptom or the result of any procedure which is either observed by a Provider or reported by the Patient.
Specialized template clinicalObservation: records of measurements performed by a clinician at the bedside (including scores, grades, stages, etc.).
Name | subset of SCT observable entities | 26
Value | value sets specific to each categorical observable entity | 95
Specialized template vitalSignObservation: refers to blood pressure, body temperature, pulse rate and respiratory rate.
Name | subset of SCT vital signs | 5
Value
Specialized template laboratoryObservation: refers to laboratory tests.
Name | subset of LOINC codes (Top 2000) | 2000
Value | value sets specific to each categorical observable entity | >500
Specialized template anatomicPathologyObservation: records of measurements performed by a pathologist analyzing tissues/cells with a microscope (including scores, grades, stages, etc.).
Name | subset of LOINC codes (Top 80) | 80
Value | value sets specific to each categorical observable entity (e.g. ICD-O, TNM, etc.) | >500

Procedure (n=1)
Template scope: A Procedure occurrence corresponds to the record of an activity or process ordered by, or carried out by, a healthcare provider on the patient with a diagnostic or therapeutic purpose. Procedures are inferred from medical claims, computerized orders in EHRs, etc.
Code | subset of SCT procedures | 57

Medication Statement (n=2)
Template scope: A medication statement is inferred from clinical events associated with orders, prescriptions written, pharmacy dispensing, procedural administrations, and other patient-reported information. Medication includes medicines, vaccines, and large-molecule biologic therapies.
administrationUnitCode
consumableCode | ATC codes | 6000

The current limited set of FHIR-based templates allows the representation of the main textual clinical data (signs, symptoms, diseases, outcome, procedures, care plans, etc.). We defined context-dependent value sets for representing multiple views or contextual information (e.g. organ specific scores or histologic types, etc.).

4.1.4 Terminologies/Ontologies

As far as possible, existing resources are imported. Some of the external resources overlap (e.g. ICD-10 and SNOMED CT; MedDRA and SNOMED CT; NCI-T and SNOMED CT). Associations between these reference terminologies are available in UMLS. Some of the external resources need to be translated and/or extended, and the EHR4CR translations/extensions need to be captured and managed. Finally, some specific resources need to be created. An EHR4CR terminology was created to hold concepts that are in the scope of the project but do not exist in the selected reference terminologies. We integrated the UMLS CUI in order to allow multi-terminology binding.


Table 7. Summary of selected reference terminologies used in EHR4CR

Columns: Terminology/Ontology (Name, Provider, Availability, Steward/Custodian) | Description (General information, Technology, Use) | Conclusion (Use in the EHR4CR project & issues)

ICD-10 http://www.who.int/classifications/icd/en/ Developed by WHO, managed by a Revision Steering Group. Available to download free of charge by license for non-commercial research purposes

Terminology model: A multi-lingual first generation coded classification system, using a fixed subsumption hierarchy with a simple semantic list alphanumerically referenced.

Number of concepts: 14,400 concepts.

Format: Csv database files

Languages: 6 official WHO languages (Arabic, Chinese, English, French, Russian and Spanish) and a total of 42 languages

Use & scope: International standard diagnostic classification for all general epidemiological, many health management purposes and clinical use. All diseases, morbidity associated with pregnancy, childbirth and the puerperium, congenital malformations and abnormalities, a wide variety of signs, symptoms, abnormal findings and health complaints, factors influencing health status (e.g. social circumstances) and categories for external causes of injury or disease (e.g. poisoning, transport accidents).

Rationale: Selected as reference terminology due to broad international use Issues:

Licensing issue for commercial use

Non-standard ICD-10 extensions

Classification with a single axis subsumption hierarchy & limited value in expanding higher level concepts for searching or querying within the EHR4CR applications.

ICD-O-3 http://www.who.int/classifications/icd/adaptations/oncology/en/

Developed by WHO, managed by the Secretariat / WHO International Association of Cancer Registries c/o International Agency for Research on Cancer (Lyon, France) Available to download free of charge

Terminology model: A multi-axial classification of the site, morphology, behavior, and grading of neoplasms. The topography axis uses the ICD-10 classification of malignant neoplasms (except those categories which relate to secondary neoplasms and to specify morphological types of tumors) for all types of tumors, thereby providing greater site detail for non-malignant tumors than is provided in ICD-10. In contrast to ICD-10, the ICD-O includes topography for sites of hematopoietic and reticuloendothelial tumors. The morphology axis provides five-digit codes ranging from M-8000/0 to M-9989/3. The first four digits indicate the specific histological term. The fifth digit after the slash (/) is the behavior code, which indicates whether a tumor is malignant, benign, in situ, or uncertain (whether benign or malignant). A separate one-digit code is also provided for histologic grading (differentiation).

Number of concepts:

Format: Csv database files

Languages: Chinese, Czech, English, Finnish, Flemish/Dutch, French, German, Japanese, Korean, Portuguese, Spanish, Romanian, Turkish

Use & scope: Used principally in tumor or cancer registries for coding the site (topography) and the histology (morphology) of neoplasms, usually obtained from a pathology report.

Creation date: 1976 - Last date change: 2000

Rationale: Selected as reference terminology due to broad international use Issues:

Possible licensing issue for commercial use

Non-standard ICD-O extensions


Pubcan http://www.pubcan.org/

Developed by WHO, managed by the Secretariat / WHO International Association of Cancer Registries c/o International Agency for Research on Cancer (Lyon, France) Not available to download

Terminology model: A classification of pre-coordinated representation of the site, morphology, behavior, and grading of neoplasms based on ICD-O-3.

Number of concepts:

Format: no available export format

Languages: English

Use & scope: idem ICD-O-3

Creation date & Last date change:

Rationale: Selected as reference terminology, since built from the broadly internationally used ICD-O and useful for pre-coordinated representation of the site, morphology, behavior, and grading of neoplasms. Issues:

Possible licensing issue for commercial use

Classification with a single axis subsumption hierarchy & limited value in expanding higher level concepts for searching or querying within the EHR4CR applications.

LOINC (Logical Observation Identifiers Names and Codes ) http://loinc.org/international Developed by Regenstrief Institute, Inc., Indianapolis, USA. Free of charge to all users.

Terminology model: A multi-lingual second generation vocabulary and coding system.

Number of concepts: > 72,000 (including >50,000 laboratory terms)

Format: CSV format text file, Access database and release to release (Change File and Change Report).

Tooling and support: http://loinc.org/

Languages: English, German, Spanish, French, Chinese.

Use & scope: A set of universal names and ID codes for identifying laboratory test results or clinical observations. Usual categories of chemistry, hematology, serology, microbiology, toxicology; concepts for vital signs, hemodynamics, intake/output, EKG, obstetric ultrasound, cardiac echo, urologic imaging, gastroendoscopic procedures, pulmonary ventilator management, selected survey instruments (e.g. Glasgow Coma Score, PHQ-9 depression scale, CMS-required patient assessment instruments), and other clinical observations.

Creation date : 1994 - Last date change: 2014-12-22 (LOINC 2.50)

Rationale: Subsets of LOINC selected as reference terminology due to broad international use (more than 35,000 people in 163 countries) Issues:

Formal representation of observable entities but nonstandard data types and missing formal representation of value sets

The broad scope of LOINC requires a step by step import & mapping strategy based on subsets (e.g. LOINC Top 2000 results, LOINC Top 300 orders)

ATC (Anatomical Therapeutic Chemical Classification System) http://www.whocc.no/ http://www.whocc.no/atc_ddd_index/ Developed by WHO Collaborating Centre for Drug Statistics Methodology (Norwegian Institute of Public Health). Cost=€200 (No formal license needed when ATC system is an integrated part of a database).

Terminology model: Five-level classification

Number of concepts: 5717 concepts (4464 concepts are ATC 5th levels (substance level)).

Format: no available export format

Languages: English, Spanish and German

Use & scope: Classification of medicines according to their active substance(s). Medicines are divided into different groups according to the organ or system on which they act and their therapeutic, pharmacological and chemical properties. All medicinal substances that are active ingredients in licensed medicinal products internationally

Creation date & Last date change:

Rationale: Selected as reference terminology due to broad international use Useful when eligibility criteria reference medicines use by therapeutic category Issues:

Primarily designed for pharmaco-epidemiology

Does not contain the lower level medicinal product concepts (for example it contains "atenolol" but not "atenolol 50mg tablets")

Does not support any dosage representation or calculation

Substances formulated into a number of different types of medicinal products (for example hydrocortisone) may have several different ATC codes

Some national extensions to ATC include lower level concepts.

SNOMED CT http://www.ihtsdo.org/ Owned since April 2007 by the International Health Terminology Standards Development Organization (IHTSDO) Requires a license (national membership OR affiliate license)

Terminology model: A multilingual thesaurus with an ontological foundation. Concepts are organized into acyclic taxonomic (is-a) hierarchies. Concepts may have multiple parents. Created by the merger, expansion, and restructuring of the College of American Pathologists (CAP) SNOMED RT (Reference Terminology) and the UK National Health Service (NHS) Clinical Terms (also known as the Read codes)

Number of concepts: 311,000 concepts linked by approximately 1,360,000 links, called relationships (2011).

Format: EL++ formalism, incorporated into OWL 2 as an OWL 2 EL Profile

Tooling and support: http://browser.ihtsdotools.org/ Numerous online and offline browsers are available.

Languages: American English, British English and Spanish, with other translations under way or nearly completed in French, Dutch, Danish, and Swedish.

Use & scope: Body structure, Clinical finding, Environment or geographical location, Event, Observable entity, Organism, Pharmaceutical / biologic product, Physical force, Physical object, Procedure, Qualifier value, Record artifact, Situation with explicit context, Social context, Specimen, Staging and scales, Substance

Creation date : January 2002 - Last date change:

Rationale: Subsets of SNOMED CT are candidates for selection as reference terminology due to international use and the ontological foundation of SNOMED CT. License cost is a limitation in non SNOMED CT member countries. Issues:

Limited reasoning capabilities due to omissions and redundancies of semantic content (duplicate primitive and defined concepts)

Cost of the affiliate license for non SNOMED CT member countries.

PathLex (1.3.6.1.4.1.19376.1.8.2.1) http://www.ihe.net/Technical_Framework/upload/IHE_PAT_Suppl_APSR_Appendix_Value_Sets_2011_03_31.xls Developed by IHE and HL7 Anatomic Pathology in collaboration with the College of American pathologists (CAP) and ADICAP. Started in 2010

Terminology model: A lexicon unifying and supplementing other terminologies, such as Snomed CT, LOINC, ICD-O to ensure semantic consistency within and across standards (HL7 v2.5, HL7 v3, DICOM).

Format: csv files

Number of concepts: 1700 terms or expressions

Use & scope: published as part of the IHE content profile “Anatomic Pathology Structured Report” (APSR): 21 HL7 CDA templates (a generic template for APSR and 20 organ-specific templates including templates for cancer-specific organizers).

Rationale: Subsets of PathLex are candidates for selection as reference terminology to represent grades and scores in Anatomic Pathology (when LOINC or SNOMED CT codes are not available). Issues:

Limited scope (20 IHE CDA templates of structured reports)

TNM (UICC and AJCC staging systems) http://www.uicc.org/resources/tnm Maintained by the Union for International Cancer Control (UICC) & American Joint Committee on Cancer (AJCC) Purchased at Wiley Online Library http://onlinelibrary.wiley.com/doi/10.1002/9780471420194.tnms04.pub3/full

Terminology model: multi-axial classification

Format: pdf

Number of concepts:

Language: Arabic, Chinese, Czech, French, German, Italian, Japanese, Latvian, Polish, Portuguese, Russian, Turkish

Use & scope: assessing the diagnosis, treatment, and prognosis of patients with cancer. Internationally agreed-upon standards to describe and categorize cancer stages and progression. It contains important updated organ-specific classifications that oncologists and other professionals who manage patients with cancer can use to accurately classify tumors for staging, prognosis and treatment. The UICC TNM staging system is the common language in which oncology health professionals can communicate on the cancer extent for individual patients as a basis for decision making on treatment management and individual prognosis, but it can also be used to inform and evaluate treatment guidelines, national cancer planning and research.

Creation date : 1977 - Last date change: Edition 7 published 2009

Rationale: Selected as reference classification due to broad international use to aid treatment planning, provide an indication of prognosis, assist in the evaluation of treatment results, facilitate the exchange of information between treatment centers, contribute to continuing investigations of human malignancies, and support cancer control activities, including through cancer registries. Issues:

Cost of the classification.
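The structure of the ICD-O-3 morphology codes described in Table 7 (four digits for the histological term, followed by a slash and a one-digit behavior code) can be illustrated with a small parser. This is a minimal sketch: the function name `parse_morphology` and the returned dictionary layout are hypothetical, and only the four behavior categories named in Table 7 are labelled here; ICD-O-3 defines further behavior codes (e.g. /6 and /9) that this sketch reports as "other".

```python
import re

# Behavior codes named in the ICD-O-3 description of Table 7.
BEHAVIOR = {
    "0": "benign",
    "1": "uncertain whether benign or malignant",
    "2": "in situ",
    "3": "malignant",
}

# Accepts "M-8000/0", "M8000/0" or "8000/0".
_MORPHOLOGY = re.compile(r"^(?:M-?)?(\d{4})/(\d)$")


def parse_morphology(code: str) -> dict:
    """Split an ICD-O-3 morphology code into histology and behavior parts."""
    match = _MORPHOLOGY.match(code.strip())
    if match is None:
        raise ValueError(f"not a morphology code: {code!r}")
    histology, behavior = match.groups()
    return {
        "histology": histology,
        "behavior_code": behavior,
        "behavior": BEHAVIOR.get(behavior, "other"),
    }
```

For example, `parse_morphology("M-9989/3")` separates the histological term `9989` from the behavior code `3`.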

4.1.5 Semantic Resource Repository

The semantic resources are stored in a semantic metadata repository (MDR). We use the term metadata (literally "data about data") to distinguish “data collection structures” from the patient data that populate those structures, i.e. instance-level data. Metadata should be described using a well-defined metadata schema so as to represent the semantics of the instance data, and will include concepts and relationships as well as bindings to terminologies. A metadata schema may be expressed in a number of different languages and notations, e.g. HTML, XML, UML, RDF, etc. We used the international standard ISO/IEC 11179 to define metadata. This standard provides the definition of a "data element" registry, describing disembodied data elements. It is important to note that ISO/IEC 11179 covers just the definition of elements and does not dictate the persistence structures or retrieval strategies. In the healthcare domain, another ISO standard, ISO 21090, plays a key role in ISO/IEC 11179-based data element definitions since it provides the appropriate formal representation of the data type for the Data Element Concept and of any Value Domain data type. In particular, ISO 21090 provides a formal representation of the coded data types and addresses the binding with terminologies.
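The way an ISO/IEC 11179 data element pairs a data element concept with a value domain, whose datatype can be taken from ISO 21090, can be sketched as follows. The class and attribute names are illustrative assumptions and do not reproduce the schema of the EHR4CR repository; the example instance reuses the ECOG value set codes quoted with Figure 12.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str   # what is described, e.g. "Observation"
    property_name: str  # which characteristic, e.g. "value"


@dataclass(frozen=True)
class ValueDomain:
    datatype: str                              # ISO 21090 datatype name, e.g. "CD"
    value_set: Optional[FrozenSet[str]] = None  # permitted codes, if enumerated


@dataclass(frozen=True)
class DataElement:
    """Hypothetical registry entry: a concept plus its value domain."""
    name: str
    concept: DataElementConcept
    domain: ValueDomain

    def accepts(self, value: str) -> bool:
        # Non-enumerated domains accept any value of the right datatype;
        # datatype checking itself is out of scope for this sketch.
        if self.domain.value_set is None:
            return True
        return value in self.domain.value_set


ecog_value = DataElement(
    name="ECOG performance status value",
    concept=DataElementConcept("Observation", "value"),
    domain=ValueDomain(
        datatype="CD",
        value_set=frozenset(
            {"425389002", "422512005", "422894000",
             "423053003", "423237006", "423409001"}
        ),
    ),
)
```

Only definitions are modelled here, not storage or retrieval, which matches the scope of ISO/IEC 11179 noted above.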

4.2 Tools and services

Tools and services have been developed for supporting the different actors in accomplishing their tasks within the standardization pipeline. These tools support:

the management of the mediation model (CIM Editor)

the mapping between hospital local sources and the mediation model, which is used for clinical data transformation (standardization) during the ETL processes and/or for query transformation. We distinguish:

o terminology mapping between local terminologies used in hospital CDWs/EHRs and reference terminologies used in the mediation model (central EHR4CR Common Information Model) supported by the Terminology Mapping Suite (TMS)

o structural mapping between information models of CDWs (i2b2 and EHR4CR CDW) and the mediation model (done manually).

the terminology mapping between electronic Clinical Research Forms (eCRFs in CDISC ODM format) and the mediation model supported by the SDM-ODM CDISC Editor extension.

Table 8 provides an overview of the tooling developed for supporting the different actors in accomplishing their tasks within the standardization pipeline.

Table 8. List and description of the supportive tools and services

Columns: Type of tool/service | Users & role | Description | Availability (Yr2, Yr3, Yr4, Yr5)

Common Information Model Editor (CIME), web-based editor
Users & role: Service provider (Board)/Hospital sites
Description: Authoring/curating EHR4CR semantic resources
Availability: V0 (Yr2), V1 (Yr3), V1 (Yr4), V2 (Yr5)

Terminology Mapping Suite (TMS): Terminology Mapping Editor (TME) and Terminology Mapping Services, web-based editor
Users & role: Service provider (Board)
Description: Authoring/curating and validating terminology mapping (reference terminologies)
Users & role: Hospitals (mapping edition); Users (sponsors, investigators) (mapping validation)
Description: Authoring/curating and validating terminology mapping (local terminologies/EHR4CR terminologies)
Availability: V0, V1

Terminology Mapping Suite (TMS): EHR4CR Terminology Mapping Status Manager (TMSM)
Users & role: Hospitals (mapping status)
Description: Management of the mapping tasks (local terminologies/EHR4CR terminologies): access to current status and worklists
Availability: V0, V1

SDM-ODM editor extension
Users & role: Software CDISC ODM editor (semantic enabled application using semantic interoperability services for annotation of CDISC ODM files)
Description: Search/Access: search concept or browse concept hierarchies; get associated data type, value set, unit
Availability: V0

4.2.1 EHR4CR Common Information Model Editor (CIME)

4.2.1.1 Functional scope

EHR4CR CIM Editor allows the EHR4CR Semantic Resource Author / Curator to:

Browse the repository of EHR4CR semantic resources (Common Element Templates (e.g. observations, procedures, substance administrations, etc.), Common Data Elements, Value Sets and Terminologies)

Search for any type of EHR4CR semantic resources

Edit new Common Element Templates (e.g. observations, procedures, substance administrations, etc.), Data Elements and Value Sets and link them to reference terminologies (terminology binding)

Import semantic resources (Clinical Elementary Templates, Data Elements, Value Sets and terminologies) from external providers (e.g. UMLS, BioPortal, HL7, etc.)

Export any type of EHR4CR semantic resources (Clinical Elementary Templates, Data Elements, Value Sets, terminology binding and query specification) in standard formats (e.g. SKOS)

Create/modify the model of the EHR4CR semantic resources (Common Element Templates (e.g. observations, procedures, substance administrations, etc.), Common Data Elements, Value Sets and Terminologies)


Figure 13. Common Information Model (CIM) Editor: A collaborative editor for authoring/curation of various semantic resources: Templates, Data Elements, Value Sets, Terminologies.

In order to manage the terminology binding during the creation of new data elements or value sets the CIM Editor supports a searching interface allowing the selection of standard concepts from reference terminologies. Standard terminologies currently supported include HL7 vocabulary, ICD-10, LOINC, ATC, SNOMED CT, PathLex, ICD-O, and UCUM.

4.2.2 EHR4CR Terminology Mapping Suite (TMS)

4.2.2.1 Functional scope

The terminology mapping process supported by TMS is organized into two main workflows: i) the Terminology Loading workflow; and ii) the Terminology Mapping workflow.

4.2.2.1.1 Terminology Loading Workflow

The Terminology Mapper uploads the local value sets corresponding to the scope of the EHR4CR mediation model (i.e. a set of central value sets such as diagnosis codes, clinical findings or vital signs codes, procedure codes, etc. (see 4.1.4)) into the Terminology Mapping Editor (TME) using a predefined loading format.


Figure 14. Terminology loading workflow


4.2.2.1.2 Terminology Mapping Workflow

Figure 15. Terminology mapping workflow

The Terminology Mapper uses the Terminology Mapping Editor (TME) GUI to set up the scope of the mapping task selecting the local and central value sets to be mapped. In the defined scope, the Terminology Mapper uses the TME GUI to perform manual mappings. In addition, he/she use Terminology Mapping Services to run automatic processes for finding new mappings/validating mappings. At the end of the automatic process he/she displays the results (new mappings and/or erroneous mappings) and completes the mapping. Before validation, the mapping is at a draft status. Once validated by the Terminology Mapper, the mapping is at a frozen state and the system prevents any changes to it. When the mapping is available for review, the Mapping Reviewer revises the mappings corresponding to the request he has been assigned to. Once validated by the Mapping Reviewer, the mapping is at production state. Local mapping coverage evaluation is very important for data providers because mapping coverage has a direct impact on query performances: the wider the coverage is, the more accurate and sensitive the queries are. The Terminology Mapper uses the EHR4CR Terminology Mapping Status Manager (TMSM) to visualize the mapping coverage in his/her hospital. The Terminology Mapper identifies the value sets of the mediation model (e.g. diagnosis codes, clinical findings or vital signs codes, procedure codes, etc.) for which the mapping is missing or incomplete and the list of the central concepts of the value set that still need to be mapped. If any update occurs in any local value set, the new version has to be uploaded in TME. Thanks to the EHR4CR Terminology Mapping Status Manager (TMSM), the Terminology Mapper identify any need for updating existing mappings between local/central value sets after any change occurred in either a local or a central value set. 
The Terminology Mapper can use the Terminology Mapping Services to run automatic processes that update any existing mapping between a local and a central value set after a change has occurred in either the local or the central value set. Existing validated mappings are preserved during the execution of the automatic processes.
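The mapping life cycle described in this workflow (draft, then frozen after validation by the Terminology Mapper, then production after validation by the Mapping Reviewer) can be sketched as a small state machine. This is a minimal illustration only: the class and method names are hypothetical and are not part of the TME, and blocking edits in production state is an assumption (the text only states that frozen mappings cannot be changed).

```python
from enum import Enum


class MappingStatus(Enum):
    DRAFT = "draft"
    FROZEN = "frozen"
    PRODUCTION = "production"


class Mapping:
    """Hypothetical local-to-central mapping with the three states of the workflow."""

    def __init__(self, local_code, central_code):
        self.local_code = local_code
        self.central_code = central_code
        self.status = MappingStatus.DRAFT

    def edit(self, central_code):
        # Only draft mappings may be changed (lock in production is an assumption).
        if self.status is not MappingStatus.DRAFT:
            raise PermissionError(f"mapping is {self.status.value}; edits are blocked")
        self.central_code = central_code

    def validate_by_mapper(self):
        # Terminology Mapper validation: draft -> frozen.
        if self.status is not MappingStatus.DRAFT:
            raise ValueError("only a draft mapping can be frozen")
        self.status = MappingStatus.FROZEN

    def validate_by_reviewer(self):
        # Mapping Reviewer validation: frozen -> production.
        if self.status is not MappingStatus.FROZEN:
            raise ValueError("only a frozen mapping can go to production")
        self.status = MappingStatus.PRODUCTION
```

A reviewer-side call on a mapping that is still in draft raises an error, which mirrors the rule that review only starts once the Terminology Mapper has validated the mapping.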

4.2.2.2 Terminology Mapping Editor (TME) and Terminology Mapping Services

4.2.2.2.1 Tool description

The Terminology Mapping Editor (TME) provides terminology experts with an interactive and collaborative interface for defining mappings between terms from different value sets and/or terminologies. The TME presents the content of the two user-selected value sets in a hierarchical view and allows the user to (i) browse and search clinical terms in each of the two value sets, (ii) define 1-1 or 1-many mapping relations between terms across the two value sets and (iii) define the mapping type (exact match, narrow match, broad match, close match) (see Figure 16).


Figure 16. EHR4CR Terminology Mapping Editor

A set of Terminology Mapping Services is used to provide (i) an initial version of the mapping between a local and a central value set, (ii) a validation of any mapping between a local and a central value set, and (iii) an update of any existing mapping between a local and a central value set after a change has occurred in either the local or the central value set.

4.2.2.3 EHR4CR Terminology Mapping Status Manager (TMSM)

4.2.2.3.1 Tool description

The GraphicalMappingValidator (GMV) is a first version (v0) of the EHR4CR Terminology Mapping Status Manager (TMSM). It fetches data from three sources (the EHR4CR central terminology server, the local mapping server and the Clinical Data Warehouse; see 4.2.2.3.2), then aligns and formats the data into a FreeMind-compatible file that can be read and displayed by the FreeMind tool.


Figure 17. EHR4CR main categories displayed in the GraphicalMappingValidator tool

The GMV script retrieves the whole EHR4CR central terminology hierarchy and then adds the local items when an alignment is found in the local mapping server. To discover the mappings, the user navigates through the hierarchy by opening the intermediate levels.


Figure 18. Mapped central items (in green) and the corresponding mapped local items (in blue) for the "Drug or medicament" category

Mapped central items are displayed with a green background and local items with a blue background. A central item that is not yet mapped to a local item is displayed with a white background. The GMV can also display the mapping coverage (number of central items mapped / total number of central items) for each branch and for the overall hierarchy:


Figure 19. Mapping coverage statistics (nb_concepts, nb_mapped_concepts and mapping_coverage) displayed in the GMV tool

Figure 20. Mapping coverage statistics (details) for "Drug or medicament" category
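The coverage statistics named in Figure 19 (nb_concepts, nb_mapped_concepts and mapping_coverage) can be computed recursively over the central hierarchy. The sketch below is an illustration under stated assumptions, not the GMV implementation: the `Concept` structure is hypothetical, the branch root is assumed to count as one central item, and all codes other than ATC:J07BB are invented sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical central item with its mapped local items and child items."""
    code: str
    local_codes: list = field(default_factory=list)
    children: list = field(default_factory=list)


def coverage(node):
    """Return (nb_concepts, nb_mapped_concepts, mapping_coverage) for a branch."""
    total = 1                                  # the branch root itself (assumption)
    mapped = 1 if node.local_codes else 0      # mapped = at least one local item
    for child in node.children:
        child_total, child_mapped, _ = coverage(child)
        total += child_total
        mapped += child_mapped
    return total, mapped, mapped / total


branch = Concept("ATC:J07B", children=[
    Concept("ATC:J07BA"),
    Concept("ATC:J07BB", local_codes=["LOCAL:FLU1"]),
    Concept("ATC:J07BC", local_codes=["LOCAL:HEP1"]),
])
nb_concepts, nb_mapped_concepts, mapping_coverage = coverage(branch)
```

Calling `coverage` on the root of the whole hierarchy gives the overall figure, and calling it on any intermediate node gives the per-branch figure.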

The GMV can also display the number of patients and the number of observations in the CDW for each local item:


Figure 21. CDW counts for each local item {n= # of patients; p= # of observations} mapped to central item [ATC:J07BB ‘Influenza vaccines’].

The CDW counts enable an additional mapping validation because they show how many patients/observations can be theoretically addressed by the EHR4CR central items.

4.2.2.3.2 Technical implementation

The GraphicalMappingValidator (GMV) is based on the FreeMind software (http://freemind.sourceforge.net/wiki/index.php/Main_Page), which is known for organizing and managing data structures in a very simple manner. The main development effort in setting up this tool was generating a FreeMind-compatible file that contains all the information needed to audit and validate the local mapping. From a technical point of view, the GMV script merges data extracted from three EHR4CR components in order to generate the GMV FreeMind file:

The EHR4CR central terminology server in order to fetch the central items hierarchy

The local mapping server in order to fetch the existing local items

The Clinical Data Warehouse (CDW)

Figure 22. The GMV architecture
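The merge of the three sources into a FreeMind file can be sketched as follows. This is not the Groovy GMV script: the input dictionaries are hypothetical stand-ins for the central terminology server, the local mapping server and the CDW, the color values are arbitrary, and the `map`/`node` elements with `TEXT` and `BACKGROUND_COLOR` attributes reflect the FreeMind file format as assumed here.

```python
import xml.etree.ElementTree as ET

GREEN, BLUE = "#ccffcc", "#ccccff"   # arbitrary shades for mapped central / local items


def build_freemind_map(hierarchy, mappings, cdw_counts):
    """Merge three (hypothetical) inputs into a FreeMind-style XML string.

    hierarchy:  {central_code: [child central codes]}, entry point "ROOT"
    mappings:   {central_code: [local codes]}
    cdw_counts: {local_code: (nb_patients, nb_observations)}
    """
    fm_map = ET.Element("map", version="1.0.1")

    def add(parent, code):
        node = ET.SubElement(parent, "node", TEXT=code)
        local_codes = mappings.get(code, [])
        if local_codes:
            node.set("BACKGROUND_COLOR", GREEN)   # unmapped items keep the default background
        for local in local_codes:
            n, p = cdw_counts.get(local, (0, 0))
            ET.SubElement(node, "node",
                          TEXT=f"{local} {{n={n}; p={p}}}",
                          BACKGROUND_COLOR=BLUE)
        for child in hierarchy.get(code, []):
            add(node, child)

    add(fm_map, "ROOT")
    return ET.tostring(fm_map, encoding="unicode")


xml_text = build_freemind_map(
    {"ROOT": ["ATC:J07BB"]},
    {"ATC:J07BB": ["LOCAL:FLU1"]},
    {"LOCAL:FLU1": (12, 30)},
)
```

Writing `xml_text` to a file with the `.mm` extension would be the last step before opening it in FreeMind.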


The GMV is implemented as a Groovy (http://www.groovy-lang.org/) script that is available to the EHR4CR developer community in the Subversion repository at https://svn.custodix.com/ehr4cr/branches/2015-03-09_APHP_GraphicalMappingValidator/. Groovy is based on the Java language (so the GMV script can run on any platform that already supports Java) and offers scripting syntax shortcuts that speed up application development. As a temporary solution for evaluating and auditing the mapping, the GMV has benefited from this feature.

4.2.2.3.3 Tool limitations and future developments

The GMV tool is only a visualization tool: the mapping can be neither changed nor exported to the local mapping server. However, thanks to (1) the navigation features of FreeMind, (2) the color attributes (green/blue for mapped central/local items) and (3) the content statistics (mapping coverage, patient counts, observation counts), the user can easily get an overview of the mapping coverage.

4.2.3 Structural mapping

The EHR4CR “structural mapping” layer addresses the connection of the EHR4CR local components to different local clinical data warehouses. Clinical data warehouses may differ from one site to another in numerous respects, including:

Data storage model (HL7 based, star-schema based, snowflake based …)

Database engine: SQL based (ORACLE, MSSQL, POSTGRE …), noSQL based (Hadoop, Neo4J, …) or object oriented (Caché …)

The “structural mapping” layer takes all these possible configurations into account by explicitly defining, for a predefined set of ECLECTIC templates (i.e. the high-level medical objects), the database statements that the platform uses to retrieve the corresponding data from the clinical data warehouse. Each template (GENERAL, PROCEDURE, MEDICATION …) requires a predefined set of data elements that must be returned by the database statement (SQL based most of the time).

In some contexts, the template needs parameters (for example, to get the patient data for a given set of medication codes). The “structural mapping” layer therefore uses a temporary table (Q_CD) in which these parameters are stored, and some database statements use this table to fetch the template dataset. The database statements for all ECLECTIC templates are gathered in an XML-based configuration file.

During the project, two different clinical data warehouse engines were used:

1. The “native” EHR4CR based CDW

2. The i2b2 based CDW

For these 2 CDW flavors, a different “structural mapping” configuration file has been designed.
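The role of the temporary Q_CD table can be illustrated with a small, self-contained sketch. SQLite stands in for the site database engine, the OBSERVATION_FACT table is reduced to two columns, and the helper `run_template` is hypothetical. The Q_CD column names (NUM, CODESYSTEM, CODE) are taken from the i2b2 queries listed in this section, but the use of NUM as the index of the parameter is an assumption.

```python
import sqlite3

# Simplified template statement: joins facts to the parameters stored in Q_CD.
TEMPLATE_SQL = """
SELECT A.PATIENT_NUM, A.CONCEPT_CD, Q.NUM
FROM OBSERVATION_FACT A, Q_CD Q
WHERE A.CONCEPT_CD = Q.CODESYSTEM || ':' || Q.CODE
ORDER BY A.PATIENT_NUM
"""


def run_template(conn, codes):
    """Load the query parameters into Q_CD, then run the template statement."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS Q_CD (NUM INTEGER, CODESYSTEM TEXT, CODE TEXT)"
    )
    conn.execute("DELETE FROM Q_CD")   # parameters of a previous query are discarded
    conn.executemany(
        "INSERT INTO Q_CD (NUM, CODESYSTEM, CODE) VALUES (?, ?, ?)",
        [(i, system, code) for i, (system, code) in enumerate(codes)],
    )
    return conn.execute(TEMPLATE_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OBSERVATION_FACT (PATIENT_NUM INTEGER, CONCEPT_CD TEXT)")
conn.executemany(
    "INSERT INTO OBSERVATION_FACT VALUES (?, ?)",
    [(1, "ATC:A10AB"), (2, "ATC:J07BB"), (3, "ATC:A10AB")],   # invented sample facts
)
rows = run_template(conn, [("ATC", "A10AB")])
```

Keeping the parameters in a table rather than in the statement text means the same statement from the configuration file can be reused unchanged for any list of codes.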


4.2.3.1 ECLECTIC query templates and corresponding SQL queries

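Each entry below gives the category, the EHR4CR query template and its parameters, the EHR4CR template (where stated), the EHR4CR data element with an example query, and the SQL statement in i2b2 (where stated).

Category: Demographics
EHR4CR query template/parameters: Patient / Attribute Gender
EHR4CR data element and example query: SCT/263495000/Gender; DataType=CD; value set, e.g. SCT/248153007/Male
SQL statement in i2b2: Select sex_cd from patient_dimension

Category: Demographics
EHR4CR query template/parameters: Patient / Birth date
EHR4CR data element and example query: Birth date
SQL statement in i2b2: Select birth_date from patient_dimension

Category: Procedures
EHR4CR query template/parameters: Procedure / Attribute code
EHR4CR data element and example query: DataType=CD; value set (extension), e.g. SCT/90470006/Prostatectomy
SQL statement in i2b2: Select SUBSTR(CONCEPT_CD, INSTR(CONCEPT_CD, ':') + 1) from observation_fact

Category: Medication administered
EHR4CR query template/parameters: Medication / List of medication codes
EHR4CR template: Substance Administration
EHR4CR data element and example query: Attribute code: DataType=CD; value set (extension), e.g. ATC/A10AB/Insulins and analogues for injection, fast-acting

Category: Condition
EHR4CR query template/parameters: Existential Observation / List of observation codes
EHR4CR template: Observation
EHR4CR data element and example query: Attribute code: DataType=CD; value set SCT/304897000/Ability to comply with treatment. Attribute value: DataType=BL; Yes

Category: Observations and measurements with a physical value and unit
EHR4CR query template/parameters: Numeric Observation / List of observation codes
EHR4CR template: Observation
EHR4CR data element and example query: Attribute code: DataType=CD; value set (intension), e.g. LOINC/4548-4/Hemoglobin A1c/Hemoglobin.total in Blood. Attribute value: DataType=PQ, e.g. UCUM/%/percent
SQL statement in i2b2: Select SUBSTR(CONCEPT_CD, INSTR(CONCEPT_CD, ':') + 1), nval_num, units_cd from observation_fact

Category: Observations and measurements with a categorical value
EHR4CR query template/parameters: Coded Value Observation / List of observation codes
EHR4CR template: Observation
EHR4CR data element and example query: Attribute code: DataType=CD; value set (intension), e.g. SCT/424122007/ECOG performance status finding. Attribute value: DataType=CD; value set (extension), e.g. SCT/422512005/ECOG 1
SQL statement in i2b2: Select SUBSTR(CONCEPT_CD, INSTR(CONCEPT_CD, ':') + 1), tval_char from observation_fact

Category: Observations and measurements with an ordinal value
EHR4CR query template/parameters: Coded Ordinal Observation / List of observation codes
EHR4CR template: Observation
EHR4CR data element and example query: Attribute code: DataType=CD; value set (intension), e.g. SCT/424122007/ECOG performance status finding. Attribute value: DataType=CO; value set (extension), e.g. SCT/422512005/ECOG 1/1
SQL statement in i2b2: Select SUBSTR(CONCEPT_CD, INSTR(CONCEPT_CD, ':') + 1), concept_cd || '=' || tval_char from observation_fact

Category: Diagnoses
EHR4CR query template/parameters: Diagnosis / List of diagnosis codes
EHR4CR template: Observation
EHR4CR data element and example query: Attribute code: DataType=CD; value set (intension), e.g. SCT/89100005 Final diagnosis (discharge). Attribute value: DataType=CD; value set (extension), e.g. ICD10/I21/Acute myocardial infarction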

4.2.3.1 Structural mapping in i2b2 sites

Table 9. ECLECTIC query templates and corresponding SQL queries based on i2b2 CDW

Name Parameters SQL queries based on i2b2 CDW

generalQuery No SELECT S.PATIENT_NUM IDPATIENT, S.BIRTH_DATE DOB, S.SEX_CD GENDERCODE, '2.16.840.1.113883.6.96' AS GENDERCODESYSTEM FROM PATIENT_DIMENSION S WHERE S.PATIENT_NUM IS NOT NULL

deadQuery No SELECT S.PATIENT_NUM IDPATIENT, S.BIRTH_DATE DOB, S.DEATH_DATE EFFECTIVETIME, S.SEX_CD GENDERCODE, '2.16.840.1.113883.6.96' GENDERCODESYSTEM FROM PATIENT_DIMENSION S WHERE VITAL_STATUS_CD = 'N' AND S.PATIENT_NUM IS NOT NULL

medicationQuery List of CD SELECT S.PATIENT_NUM IDPATIENT, S.BIRTH_DATE DOB, S.SEX_CD GENDERCODE, '2.16.840.1.113883.6.96' GENDERCODESYSTEM, A.START_DATE EFFECTIVETIME, Q.NUM NUM FROM OBSERVATION_FACT A, PATIENT_DIMENSION S, Q_CD Q WHERE A.PATIENT_NUM = S.PATIENT_NUM AND A.CONCEPT_CD = Q.CODESYSTEM || ':' || Q.CODE ORDER BY S.PATIENT_NUM, A.START_DATE

procedureQuery List of CD SELECT S.PATIENT_NUM IDPATIENT, S.BIRTH_DATE DOB, S.SEX_CD GENDERCODE, '2.16.840.1.113883.6.96' GENDERCODESYSTEM, A.START_DATE EFFECTIVETIME, Q.NUM NUM FROM OBSERVATION_FACT A, PATIENT_DIMENSION S, Q_CD Q WHERE A.PATIENT_NUM = S.PATIENT_NUM AND A.CONCEPT_CD = Q.CODESYSTEM || ':' || Q.CODE ORDER BY S.PATIENT_NUM, A.START_DATE

existenceObservationQuery List of CD SELECT S.PATIENT_NUM IDPATIENT, S.BIRTH_DATE DOB, S.SEX_CD GENDERCODE, '2.16.840.1.113883.6.96' GENDERCODESYSTEM, A.START_DATE EFFECTIVETIME, Q.NUM NUM, SUBSTR(A.CONCEPT_CD, INSTR(A.CONCEPT_CD, ':') + 1) AS LOCALCODE FROM OBSERVATION_FACT A, PATIENT_DIMENSION S, Q_CD Q WHERE A.PATIENT_NUM = S.PATIENT_NUM AND A.CONCEPT_CD = Q.CODESYSTEM || ':' || Q.CODE ORDER BY S.PATIENT_NUM, A.START_DATE

diagnosisQuery List of CD SELECT S.PATIENT_NUM IDPATIENT, S.BIRTH_DATE DOB, S.SEX_CD GENDERCODE, '2.16.840.1.113883.6.96' AS GENDERCODESYSTEM, SUBSTR(A.CONCEPT_CD, INSTR(A.CONCEPT_CD, ':') + 1) LOCALCODE, A.START_DATE EFFECTIVETIME, Q.NUM NUM FROM OBSERVATION_FACT A, PATIENT_DIMENSION S, Q_CD Q WHERE A.PATIENT_NUM = S.PATIENT_NUM AND A.CONCEPT_CD = Q.CODESYSTEM || ':' || Q.CODE ORDER BY S.PATIENT_NUM, A.START_DATE

numericObservationQuery List of CD SELECT S.PATIENT_NUM IDPATIENT, S.BIRTH_DATE DOB, S.SEX_CD GENDERCODE, '2.16.840.1.113883.6.96' GENDERCODESYSTEM, A.START_DATE EFFECTIVETIME, SUBSTR(A.CONCEPT_CD, INSTR(A.CONCEPT_CD, ':') + 1) EVENTCODE, Q.CODESYSTEM EVENTCODESYSTEM, CASE WHEN A.VALTYPE_CD = 'N' THEN A.NVAL_NUM WHEN A.VALTYPE_CD = 'T' THEN TO_NUMBER(A.TVAL_CHAR) END AS PHYSVALUE, A.UNITS_CD PHYSUNITCODE, Q.NUM NUM, '2.16.840.1.113883.6.8' AS physUnitCodeSystem FROM OBSERVATION_FACT A, PATIENT_DIMENSION S, Q_CD Q WHERE A.PATIENT_NUM = S.PATIENT_NUM AND A.CONCEPT_CD = Q.CODESYSTEM || ':' || Q.CODE ORDER BY S.PATIENT_NUM, A.START_DATE

codedObservationQuery List of CD SELECT S.PATIENT_NUM IDPATIENT, S.BIRTH_DATE DOB, S.SEX_CD GENDERCODE, '2.16.840.1.113883.6.96' GENDERCODESYSTEM, A.START_DATE EFFECTIVETIME, A.TVAL_CHAR VALUECODE, Q.CODESYSTEM VALUECODESYSTEM, Q.NUM NUM FROM OBSERVATION_FACT A, PATIENT_DIMENSION S, Q_CD Q WHERE A.PATIENT_NUM = S.PATIENT_NUM AND A.CONCEPT_CD = Q.CODESYSTEM || ':' || Q.CODE ORDER BY S.PATIENT_NUM, A.START_DATE

codedOrdinalObservationQuery List of CD SELECT S.PATIENT_NUM IDPATIENT, S.BIRTH_DATE DOB, S.SEX_CD GENDERCODE, '2.16.840.1.113883.6.96' GENDERCODESYSTEM, A.START_DATE EFFECTIVETIME, SUBSTR(A.CONCEPT_CD, INSTR(A.CONCEPT_CD, ':') + 1) LOCALCODE, A.CONCEPT_CD || ( CASE WHEN A.VALTYPE_CD = 'T' THEN '=' || A.TVAL_CHAR WHEN A.VALTYPE_CD = 'N' THEN '=' || A.NVAL_NUM ELSE '' END ) AS VALUECODE, Q.CODESYSTEM AS VALUECODESYSTEM, NULL AS ORDINALVALUE, A.CONCEPT_CD AS ROOTCODE, Q.CODESYSTEM AS ROOTCODESYSTEM, Q.NUM AS NUM FROM OBSERVATION_FACT A, PATIENT_DIMENSION S, Q_CD Q WHERE A.PATIENT_NUM = S.PATIENT_NUM AND A.CONCEPT_CD = Q.CODESYSTEM || ':' || Q.CODE ORDER BY S.PATIENT_NUM, A.START_DATE
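Two expressions recur in the i2b2 queries of Table 9: the extraction of the local code from CONCEPT_CD with SUBSTR and INSTR, and the CASE expression that fills PHYSVALUE in numericObservationQuery. Their behavior can be mirrored in a few lines for readers less familiar with the SQL functions involved. The function names below are hypothetical, and the sketch assumes Oracle-style semantics for SUBSTR, INSTR and TO_NUMBER.

```python
def local_code(concept_cd):
    """Mirror of SUBSTR(CONCEPT_CD, INSTR(CONCEPT_CD, ':') + 1)."""
    # INSTR returns 0 when ':' is absent, so SUBSTR(..., 1) yields the whole string;
    # partition() lets us fall back to the full value in the same way.
    _, separator, tail = concept_cd.partition(":")
    return tail if separator else concept_cd


def phys_value(valtype_cd, nval_num, tval_char):
    """Mirror of the CASE expression that fills PHYSVALUE."""
    if valtype_cd == "N":
        return nval_num
    if valtype_cd == "T":
        return float(tval_char)   # like TO_NUMBER, this fails on non-numeric text
    return None                   # a CASE without ELSE yields NULL
```

For example, `local_code("LOINC:4548-4")` gives the code part only, which is what the queries return as LOCALCODE or EVENTCODE.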

4.2.3.1 Structural mapping in EHR4CR-CDW

Table 10. ECLECTIC query templates and corresponding SQL queries based on EHR4CR-CDW

Name Parameters SQL queries based on EHR4CR-schema CDW

generalQuery No SELECT s.id AS idPatient, s.birthTime AS DoB, s.administrativeGenderCode AS genderCode, s.administrativeGenderCodeSystem AS genderCodeSystem FROM Subject AS s WHERE s.id IS NOT NULL;

deadQuery No

medicationQuery List of CD SELECT s.id AS idPatient, s.birthTime AS DoB, s.administrativeGenderCode AS genderCode, s.administrativeGenderCodeSystem AS genderCodeSystem, a.id AS idSbam, a.effectiveTime AS effectiveTime, a.effectiveTimeLow AS effectiveTimeLow, a.effectiveTimeHigh AS effectiveTimeHigh, m.code AS materialCode, r.code AS routeCode FROM ((Administration AS a LEFT JOIN Subject AS s ON a.idSubject=s.id) LEFT JOIN V_CD AS m ON m.id=a.materialCode) LEFT JOIN V_CD AS r ON r.id=a.routeCode WHERE a.materialCode in (SELECT DISTINCT c.id FROM V_CD as c WHERE CONCAT(c.codeSystem, ':', c.code) in ('OID:ATC_Code',…)) ORDER BY s.id, effectiveTime;

procedureQuery List of CD SELECT s.id AS idPatient, s.birthTime AS DoB, s.administrativeGenderCode AS genderCode, s.administrativeGenderCodeSystem AS genderCodeSystem, p.id AS idProc, p.effectiveTime AS effectiveTime FROM (Procedures AS p LEFT JOIN Subject AS s ON p.idSubject=s.id) WHERE p.code in (SELECT DISTINCT c.id FROM V_CD as c WHERE CONCAT(c.codeSystem, ':', c.code) in ('OID:SNOMED-CT_Code',…)) ORDER BY s.id, effectiveTime;

existenceObservationQuery List of CD

diagnosisQuery List of CD SELECT s.id AS idPatient, s.birthTime AS DoB, s.administrativeGenderCode AS genderCode, s.administrativeGenderCodeSystem AS genderCodeSystem, o.id AS idObs, o.effectiveTime AS effectiveTime FROM (Observation AS o LEFT JOIN Subject AS s ON o.idSubject=s.id) where CONCAT(o.valueCodeSystem, ':', o.codevalue) in ('OID:ICD_code',…) and o.valueRefType='CD' ORDER BY s.id, effectiveTime;

numericObservationQuery List of CD SELECT s.id AS idPatient, s.birthTime AS DoB, s.administrativeGenderCode AS genderCode, s.administrativeGenderCodeSystem AS genderCodeSystem, o.id AS idObs, o.effectiveTime AS effectiveTime, d.code AS eventCode, d.codeSystem AS eventCodeSystem, o.physValue, o.physValueLow, o.physValueHigh, o.physUnit as physUnitCode FROM (Observation AS o LEFT JOIN Subject AS s ON o.idSubject=s.id) LEFT JOIN V_CD AS d ON o.code = d.id WHERE o.code in (SELECT DISTINCT c.id FROM V_CD as c WHERE CONCAT(c.codeSystem, ':', c.code) in ('OID:LOINC_Code',…)) and o.valueRefType='PQ' ORDER BY s.id, effectiveTime;

codedObservationQuery List of CD SELECT s.id AS idPatient, s.birthTime AS DoB, s.administrativeGenderCode AS genderCode, s.administrativeGenderCodeSystem AS genderCodeSystem, o.id AS idObs, o.effectiveTime AS effectiveTime, v.code AS valueCode, v.codeSystem AS valueCodeSystem FROM (Observation AS o LEFT JOIN Subject AS s ON o.idSubject=s.id) LEFT JOIN V_CD as v ON v.id = o.codeValue WHERE o.code in (SELECT DISTINCT c.id FROM V_CD as c WHERE CONCAT(c.codeSystem, ':', c.code) in ('OID:SNOMED-CT_Code')) and o.valueRefType='CD' ORDER BY s.id, effectiveTime;


5 EHR4CR semantic interoperability services (SIS)

Chapter 5 describes the EHR4CR semantic interoperability services (SIS) developed for semantically enabled applications.

The semantic interoperability services (SIS) are developed to support (i) the standardization process during the set-up phase, where SIS are available for the ETL tools used during EHR4CR CDW population (see section 6) and for the CDISC SDM-ODM editor extension used for the structural mapping between CDISC SDM-ODM and the mediation model, and (ii) the execution phase of EHR4CR use cases (PFS, PRS or CTE), where SIS are available for the EHR4CR query builder for query specification and for the EHR4CR endpoint for query transformation.

EHR4CR Semantic Interoperability Services (SIS) provide a standardized interface for the usage and management of semantic resources (terminologies, ontologies, value sets, data elements, templates) chosen by the users of the service in their deployment environment. These services consist of a modular, common and universally deployable set of behaviors which can be used to deal with a shared set of semantic resources. The services contribute to interoperability by fostering the authoring of semantic resources via their authoring profile and by supporting easy access to the foundational elements of shared semantics. This goal is realized via the expansion of the original functionality outlined in HL7’s Common Terminology Service – Release 2 (CTS2) Specification.

5.1 Introduction and Scope

The SIS have been developed following the Service Specification Development Framework Methodology. This methodology was proposed by HL7 and OMG for defining the Healthcare Services Specification Project specifications. The methodology sets out an overall process, and also defines the responsibilities of the Service Functional Model. Section 5.2.2 sets out the business context for this particular specification, but firstly it is important to understand the overall context within which this specification is written, i.e. its purpose from a methodology standpoint.

5.2 Service Definition Principles

The high-level principles regarding service definition that have been adopted by the Services Specification Project are as follows:

Service Specifications shall be well defined and clearly scoped, with well understood requirements and responsibilities.

Services should have a unity of purpose (e.g., fulfilling one domain or area) but services themselves may be composable.

Services will be specified sufficiently to address functional, semantic, and structural interoperability.

It must be possible to replace one conformant service implementation with another meeting the same service specification while maintaining functionality of the system.

A Service at the SFM level is regarded as a system component; the meaning of the term “(system) component” in this context is consistent with UML usage. A component is a modular unit with well-defined interfaces that is replaceable within its environment. A component can always be considered an autonomous unit within a system or subsystem. It has one or more provided and/or required interfaces, and its internals are hidden and inaccessible other than as provided by its interfaces. Each Service’s Functional Model defines the interfaces that the service exposes to its environment, and the service’s dependencies on services provided by other components in its environment. Dependencies in the Functional Model relate to services that have or may in future have a Functional Model at a similar level; detailed dependencies on low-level utility services should not be included, as that level of design is not in scope for the Functional Model. The manner in which services and interfaces are deployed, discovered, and so forth is outside the scope of the Functional Model.

Table 10 (continued). codedOrdinalObservationQuery List of CD Under development

5.3 Comparison of the SIS/CTS2 Service Functional Models

The goal of the SIS is to provide a standardized interface for the usage and management of semantic resources. The services contribute to interoperability by supporting easy access to the foundational elements of shared semantics. They also foster the authoring of semantic resources via their authoring profile. This goal is realized via the expansion of the original functionality outlined in HL7’s Common Terminology Service (CTS2) Specification, which defines the functional requirements of a set of service interfaces to allow the representation, access, and maintenance of semantic content either locally or across a federation of terminology service nodes.

Similarly to the CTS2 service, the SIS define both the expected behaviors of a semantic resource service and a standardized method of accessing semantic content. Terminologies provide the atomic building blocks of shared semantics: concepts. Other building blocks in the scope of SIS are value sets, data elements, templates and associations. The functional scope of SIS is broader than that of CTS2: SIS provide a modular, common and universally deployable set of behaviors which can be used to deal not only with a set of terminologies, value sets and associations but also with data elements and templates chosen by the users of the service in their deployment environment. This consistent approach to semantic content interaction benefits other business context services by providing a level of semantic interoperability that currently only exists in a limited form.

Although specified in this section to provide standalone capabilities for semantic content access and management purposes, SIS are used in conjunction with other services in EHR4CR use cases. This overview section reuses and extends parts of the CTS2 specifications.

5.4 Structure of the SIS specification

In order to provide for the maximum implementation flexibility, this functional model defines several enumerated functional profiles for SIS. These profiles serve to subset and focus the functionality of a SIS implementation to accomplish a targeted set of tasks. The functional profiles include:

SIS code system profile - The SIS code system profile outlines functional coverage necessary for a service to declare itself as being a conformant SIS service. The SIS code system profile includes capabilities for searching and querying code system content.

SIS value set profile - The SIS value set profile outlines functional coverage necessary for a service to declare itself as being a conformant SIS service. The SIS value set profile includes capabilities for searching and querying value set content.

SIS template profile - The SIS template profile outlines functional coverage necessary for a service to declare itself as being a conformant SIS service. The SIS template profile includes capabilities for searching and querying template content.
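The idea of functional profiles, where a service declares which subset of behaviors it supports and any conformant implementation can replace another, can be sketched with structural interfaces. Everything in this sketch is hypothetical: the method names are illustrative only and are not operations defined by the SIS or CTS2 specifications.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class CodeSystemProfile(Protocol):
    """Illustrative code system profile: search within a code system."""
    def search_concepts(self, code_system: str, text: str) -> list: ...


@runtime_checkable
class ValueSetProfile(Protocol):
    """Illustrative value set profile: search among value sets."""
    def search_value_sets(self, text: str) -> list: ...


class InMemoryTerminologyService:
    """Toy service that implements the code system profile only."""

    def __init__(self, concepts):
        self._concepts = concepts   # {code_system: {code: label}}

    def search_concepts(self, code_system, text):
        entries = self._concepts.get(code_system, {})
        return [code for code, label in entries.items() if text.lower() in label.lower()]


def supported_profiles(service):
    """List the profiles whose operations the service exposes."""
    profiles = {"code system": CodeSystemProfile, "value set": ValueSetProfile}
    return [name for name, profile in profiles.items() if isinstance(service, profile)]


service = InMemoryTerminologyService({"ATC": {"J07BB": "Influenza vaccines"}})
```

A client written against `CodeSystemProfile` would work with any other implementation exposing the same operation, which is the replaceability principle stated in section 5.2.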


5.5 Implementation Considerations

The SIS specification is an interface specification, not an implementation specification. As such, it is intended to facilitate the development of an implementable interoperability mechanism for terminology resources. SIS are intended to expose a single or multiple semantic sources for use by various applications that may or may not be within the same organization, providing a standardized method for semantic resources access.

SIS provide for semantic interoperability between organizations. While coded concepts from structured terminology can unambiguously identify the concept(s) being communicated, a standard way of structuring and communicating those coded entries is required. SIS can be used in an inter-organizational setting where each organization maintains its own security and application-specific provisions. SIS enable access to a high-availability or international standard terminology resource, made available to subscribers via a SIS interface.

Since semantic content is not static, SIS provide functionality to maintain and update semantic resources. Updates and update requests to semantic sources need to be reviewable and traceable over time. Often, a terminology source provider will want to maintain the “gold standard” or master release of a code system, so as to maintain a consistent standard terminology that can be used across multiple organizations and realms. Notwithstanding, users of any given source terminology may wish to extend that terminology for their own use, and may even wish to recommend the addition of those “local” extensions to the semantic resources repository provider to be included as part of the release.

SIS provide a mechanism that allows users to extend a given semantic resources repository, share those extensions with others, or feed those extensions back to the source provider in a structured format to be reviewed, modified as necessary, and fed into a SIS server as input to update the source terminology with the content contained in the change request.

6 EHR4CR Clinical Data Warehouse

Chapter 6 describes the design and implementation of the EHR4CR Clinical Data Warehouse (CDW) and of the extract-transform-load (ETL) of EHR data into the EHR4CR CDW.

6.1 Introduction

At the start of the project in 2011, only two sites (FAU and APHP) had existing warehouses that could connect directly to the platform, both of these being i2b2 warehouses (which are not HL7 compliant). The other sites therefore needed to deploy a warehouse and perform regular extraction-transformation-load (ETL) operations. EHR4CR chose not to nominate an existing warehouse (such as i2b2) for use with the EHR4CR platform, and as part of its commitment to these and future sites a project-specific warehouse was developed for installation by the sites, along with associated ETL guidance. (Existing i2b2 sites already had ETL in place.) The technology chosen for the EHR4CR project warehouse is relational, and its design complies with the HL7-compliant EHR4CR Information Model (IM, v2-r3 24/04/2012, represented in UML in Figure A). (The workbench Eligibility Criteria Model is translated directly to SQL by the endpoint software for both the EHR4CR and i2b2 warehouses using their respective query templates discussed in section 4.2.1.)


Figure A - EHR4CR Information Model represented in UML

Derivation of a Physical Model

The first step towards a project-specific relational warehouse was the generation of a physical model (PM) and associated database schema from the common information model (IM), or mediation model. This was achieved through the following series of decisions:

1. Unwanted attributes in the IM were dropped

2. The structure of the information model was used as the initial physical model

3. Foreign keys were introduced for inter-class relationships and embedded classes

4. The physical model was de-normalized to a degree to improve the efficiency of expected queries

5. The ClinicalStatementRelationship class was dropped

6. Mandatory columns were established

7. Support for specific relational technologies was introduced, e.g. MySQL, Oracle, MS-SQL

8. The final model was expressed as a database schema and provided as DDL

9. The final model was expressed as a UML diagram

10. Guidance was prepared and highlighted unresolved issues

Unwanted attributes in the IM were dropped

Since the IM is derived from existing HL7 classes, some attributes of these classes and of other referenced classes are not required for our purposes. For example, HL7 mood, status and priority codes were not required, nor were activity and availability time. In addition, it was agreed that negation would not be represented within the model and that we would rely solely on terminology for this. Other attributes dropped from the (abstract) ClinicalStatement class included: interruptibleInd, independentInd, derivationExpr and repeatNumber.

The structure of the information model was used as the initial physical model

Each major class of the information model (Observation, Procedure, SubstanceAdministration) and the associated classes (Subject, Encounter, Participation) were introduced as tables into the physical model. In addition, attributes of these classes that are classes in themselves were also introduced as tables, e.g. attributes of class type CD.

Foreign keys were introduced for inter-class relationships and embedded classes

All inter-class and embedded class relationships were represented by foreign keys within the appropriate table. For example:

a foreign key integer field idEncounter was inserted into the Observation table to establish the related Encounter table record.

a foreign key integer field code was inserted into the Observation table to establish the related CD table record holding details of the code, including value, code system, rubric, etc.

The physical model was de-normalized to a degree to improve the efficiency of expected queries

The resulting physical model had many related tables and foreign keys, and queries invariably required a number of joins, which would undermine the performance of the warehouse. Specific requests were made by WPG2 for some inter-table relationships to be removed and the ‘child’ table field structure to be brought within the ‘parent’ table. For example:

the PQ table storing physical quantities was removed and the fields brought within the Observation table, these being defined as physicalValue, physicalValueLow, physicalValueHigh, and physicalUnit

a table for effectiveTime was removed and the fields brought within various tables, these being defined as effectiveTime, effectiveTimeLow and effectiveTimeHigh with type DATETIME (in MySQL for example).
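The foreign key and de-normalization decisions can be made concrete with a reduced schema. The sketch below is an assumption-laden illustration rather than the project DDL: SQLite stands in for the vendor database, only a few columns are shown, the CD table is given the name V_CD used in the queries of Table 10, `displayName` is a hypothetical column for the rubric, and the physical quantity column names follow the prose above.

```python
import sqlite3

DDL = """
CREATE TABLE V_CD (
    id INTEGER PRIMARY KEY, code TEXT, codeSystem TEXT, displayName TEXT
);
CREATE TABLE Encounter (id INTEGER PRIMARY KEY);
CREATE TABLE Observation (
    id INTEGER PRIMARY KEY,
    idEncounter INTEGER REFERENCES Encounter(id),  -- inter-class relationship
    code INTEGER REFERENCES V_CD(id),              -- embedded CD class
    effectiveTime TIMESTAMP,                       -- former effectiveTime table
    effectiveTimeLow TIMESTAMP,
    effectiveTimeHigh TIMESTAMP,
    physicalValue REAL,                            -- former PQ table
    physicalValueLow REAL,
    physicalValueHigh REAL,
    physicalUnit TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute("INSERT INTO V_CD VALUES (1, '4548-4', 'LOINC', 'Hemoglobin A1c')")
conn.execute("INSERT INTO Encounter VALUES (10)")
conn.execute(
    "INSERT INTO Observation (id, idEncounter, code, effectiveTime, physicalValue, physicalUnit) "
    "VALUES (100, 10, 1, '2014-05-02 09:30:00', 7.2, '%')"   # invented sample record
)
# Value and unit are read from Observation itself; only the code needs a join.
row = conn.execute(
    "SELECT c.codeSystem, c.code, o.physicalValue, o.physicalUnit "
    "FROM Observation o JOIN V_CD c ON c.id = o.code"
).fetchone()
```

With a separate PQ table and a separate effectiveTime table, the same query would have needed two additional joins.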

With the exception of administrativeGenderCode in the Subject table, all codes are placed in the V_CD table, where all codes and their rubrics reside. One advantage of this is the ability to map site codes to central codes using this table alone, while retaining a chain from the central codes to the original codes.

The ClinicalStatementRelationship class was dropped

The ClinicalStatementRelationship class allows relationships of varying complexity to be established between records of the three main tables. This complexity is not generally useful for a warehouse given the nature of the likely queries, and it was therefore decided to drop this relationship. Since the Eligibility Criteria Model does not have computational capabilities, this has implications. For example, if we wish to query against a BMI derived from coincident height and weight observations, then the BMI record must be created by the ETL process; no other means are possible.

Mandatory columns were established

A crucial element of the physical model is to establish those fields which must be present for platform queries to execute meaningfully. For example, the model requires that subject birthTime is always available, as many queries will not execute without this variable. Sites with only age available will have to compute an estimated birthTime during ETL. In addition, effective time (all tables), subject ID (subject table) and organization ID (organization table, derived from the HL7 participation class) are mandatory and must be made available or generated during ETL. Nearly all sites can furnish this data. Where data is generally not available from sites, the relevant fields are defined as nullable; this covers fields that are optional, conditional or not required at this time, e.g. administration.doseQuantity (not required), observation.physValue (conditional) or observation.idEncounter (optional).

Support for specific relational technologies was introduced

Most sites will have procured ICT products from their preferred vendor, such as Microsoft or Oracle, or perhaps open source solutions. It is very unlikely that sites will wish to introduce an alternative vendor for EHR4CR; it is therefore important that the physical model can be implemented in a number of vendor technologies (and that the end-point software can deal with this). With regard to SQL syntax, vendor implementations introduce variations which must be catered for. End-point software will have to handle variations in data manipulation language (DML), while the physical model must handle variations in data definition language (DDL). The latter uses SQL-92 syntax almost exclusively. DDL variations include (but are not limited to):

Date and time syntax

Sequencing of Create Table statements

Optimization statements

Rather than provide separate DDL for each vendor, the DDL script has embedded guidance highlighting the substitutions to apply before creating the database. For example, for dates and times the timestamp data type is specified, but this should be substituted by DATETIME or DATE depending on the vendor platform used at a site.

The final model was expressed as a database schema and provided as DDL

The relational DDL for scripting the EHR4CR warehouse can be found in Appendix X. This file provides definitions and guidance for each table and field and their relationship to the Information Model classes. Fields are normally given names from the IM, and their type within the IM is also indicated. For example, within the Subject table we find the field birthTime which is of type TimeStamp in the physical model and class TS in the IM. Guidance is provided for obfuscation of this value where sites feel the need to do this, or where this is already the case within their source systems. For some fields mandatory coding is specified, e.g. for administrativeGender within the Subject table. General guidance on code mapping is provided by WP4 and the services available there.

The final model was expressed as a UML diagram

The UML diagram for the final data model is given in Figure B as output from Enterprise Architect version 9 (EA v9). This file can be found on the project SharePoint.
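The vendor substitution described for the DDL script (the generic timestamp type replaced by DATETIME or DATE) could be automated along the following lines. This is only a sketch: the DDL fragment, the idSubject column name and the assignment of a type to each vendor key are illustrative assumptions, not content of the EHR4CR DDL script.

```python
import re

# Assumed substitution table. The text only states that the generic
# timestamp type becomes DATETIME or DATE depending on the vendor;
# which key maps to which type is an illustrative assumption.
VENDOR_TIMESTAMP_TYPE = {
    "generic": "TIMESTAMP",
    "sqlserver": "DATETIME",
    "oracle": "DATE",
}

# Hypothetical fragment of generic DDL. birthTime and administrativeGender
# are named in the text; idSubject is an invented column name.
GENERIC_DDL = """CREATE TABLE Subject (
    idSubject INTEGER NOT NULL,
    birthTime TIMESTAMP NOT NULL,
    administrativeGender VARCHAR(16)
);"""


def adapt_ddl(ddl: str, vendor: str) -> str:
    """Replace the generic TIMESTAMP type with the type chosen for a vendor."""
    target = VENDOR_TIMESTAMP_TYPE[vendor]
    # Word boundaries leave identifiers that merely contain the word untouched.
    return re.sub(r"\bTIMESTAMP\b", target, ddl)


if __name__ == "__main__":
    print(adapt_ddl(GENERIC_DDL, "sqlserver"))
```

Keeping one generic script plus a small substitution table mirrors the choice made in the text of not maintaining a separate DDL file per vendor.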

Figure B - Enterprise Architect diagram for EHR4CR physical model.

6.2 ETL process and guidance for user acceptance testing

The ETL process to prepare site data for the EHR4CR warehouse for user acceptance testing (UAT) of the platform is shown in Figure C.

Figure C - ETL for EHR4CR native warehouse used in user acceptance testing. Note the operations at top to obfuscate the data within the resulting warehouse.

The project sites represent three different kinds of data source: hospital systems (6); individual clinic systems (2); and regional repositories of hospital data (3). Therefore, the domain coverage and size of the warehouses vary significantly. The majority of sites obfuscated their data because there was insufficient protection from the platform end-point software. End-point features thought absolutely necessary for release of pseudonymised, real data to the warehouse included: local query audit; local control of query execution; certification/validation of the software; and fuzzing/blocking of returned counts. This self-imposed obfuscation included the scrambling or shifting of event times and k-anonymization, sometimes making the data internally inconsistent from a domain perspective. In addition, sometimes for performance reasons, many sites did not provide complete data, but relatively small subsets, either random subsets or subsets tailored to one or more of the studies contained within the UAT. Also, some sites had data that was not coded and so did not pass through ETL into the warehouse. Each site must design an ETL suitable for its data. What is important is that the semantics of the target warehouse – structure and terminology – are clearly understood by all sites using that warehouse and that the ETL operations they perform yield the desired data records and site coding. ETL consists of three phases:

Extraction

During the extraction phase data is located and converted into a single format that is appropriate for further transformation. At this point, if necessary, the semantics of the data can be explored by profiling the data, i.e. enumerating the unique data values and parsing them into patterns that make the field readily comprehensible. Any documentation that offers definition or provenance of the data should then be consulted and any inter-field correlations checked or noted.

Rejection criteria/rules should then be defined and any rejections logged. Such criteria should include quality assessment as a minimum. Rejection rules should also include those data values considered confidential and where local information governance (IG) does not permit their release. Where IG does permit release this should be indicated using the confidentialityCode in the main tables of the warehouse. Finally, further rejections should take place if the data is to be added incrementally to the warehouse. The precise details of the latter will be site specific.
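The profiling step mentioned for the extraction phase (enumerating unique values and reducing them to patterns) can be illustrated with a minimal sketch. The sample values and the pattern alphabet (9 for a digit, A for a letter) are assumptions chosen for illustration; sites are free to profile their data differently.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Generalize a raw value: digits become 9, letters become A."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)  # keep punctuation and spaces as they are
    return "".join(out)


def profile(values):
    """Return (distinct value counts, pattern counts) for one source field."""
    distinct = Counter(values)
    patterns = Counter(value_pattern(v) for v in values)
    return distinct, patterns


# Hypothetical raw laboratory field that mixes value and unit.
sample = ["143 mmol/L", "138 mmol/L", "143 mmol/L", "n/a"]
distinct, patterns = profile(sample)

if __name__ == "__main__":
    for pattern, count in patterns.most_common():
        print(count, pattern)
```

Rare patterns, such as the single "A/A" entry here, are natural candidates for the rejection rules and rejection log discussed above.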

Transformation

During the transformation phase a structural transformation from the source data model to the warehouse model is performed. Any code mappings from source system to central system take place at query runtime. Structural transformation is usually straightforward, but can be more complex with one-to-many and many-to-one field mappings. An example of one-to-many would be a single field for a named laboratory result, such as a serum sodium value, transforming to multiple fields in the warehouse for code, value and unit (see Figure D). An example of a many-to-one field mapping might involve a field whose semantics depends on the value of another field.

Missing data also presents a choice for site ETL. For example, some sites do not have date-of-birth available and provide age instead. It is important that uniform guidance is given for calculating an approximation and ensuring that all event times in the longitudinal patient record are consistent with this.

An important part of the ETL is the generation of a pseudo-identifier for each patient within the warehouse. This must be defined in the Subject table and carefully applied to the 3 main fact tables. This subject id will not be exposed to the orchestrator or workbench, but will be processed by the end-point software. There must be support for de-identification of possible recruits. The EHR4CR warehouse uses a 4-byte integer to reference a subject. It is expected that the relationship of this subject id to the local hospital identifier is kept in a local mapping table that is not exposed to the central EHR4CR platform and may reside in a separate local component of the platform. The definition and naming of this table is under local control. Sources that implement k-anonymization will not provide the ability to identify patients for recruitment using the warehouse and must re-query the source EHRs.

Sites may record participations in clinical statements by providing site-unique organization IDs.
It is not anticipated that the warehouse will store details of individual medical practitioners, but only the organizations they work for. Organization size may vary from individual hospital clinics or wards to entire hospital groups. Organization IDs must be defined in the Organization table and carefully applied to the 3 main fact tables. It is important that the detail of the ETL processes which gave rise to a clinical statement record is available to end-users so that any subtleties can be considered. The warehouse model offers a field (ETLSpec) that indexes information of this kind in the table ETLSpec. This table is at present a placeholder for a fuller specification which will be developed in due course. In addition, the context in which data is collected can have a profound effect on interpretation. As well as ETLSpec a further field may be made available to cover information of this kind. Both these developments would involve additional tables being defined within the warehouse. Comprehensive guidance has yet to be developed and is the subject of future work. This will involve both general guidance and particular guidance and will evolve over time and will require continuous coordination and online support. Figure D shows an example transformation.
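The pseudo-identifier scheme described in this section (a 4-byte integer subject id whose link to the hospital identifier stays in a local mapping table) might be sketched as follows. The class and method names are hypothetical, since the text leaves the definition and naming of the mapping table under local control, and sequential allocation is only one possible choice.

```python
INT32_MAX = 2**31 - 1  # largest value of a signed 4-byte integer


class LocalSubjectMap:
    """Local-only table linking hospital identifiers to warehouse subject ids."""

    def __init__(self):
        self._by_local_id = {}
        self._next_id = 1

    def subject_id(self, local_id: str) -> int:
        """Return the existing pseudo-identifier or allocate a new one."""
        if local_id not in self._by_local_id:
            if self._next_id > INT32_MAX:
                raise OverflowError("subject id no longer fits in 4 bytes")
            self._by_local_id[local_id] = self._next_id
            self._next_id += 1
        return self._by_local_id[local_id]

    def local_id(self, subject_id: int) -> str:
        """Resolve a subject id back to the hospital id; for use inside the site only."""
        for local, sid in self._by_local_id.items():
            if sid == subject_id:
                return local
        raise KeyError(subject_id)


if __name__ == "__main__":
    mapping = LocalSubjectMap()
    print(mapping.subject_id("H-001"), mapping.subject_id("H-002"))
```

Sequential ids can reveal the order in which patients were loaded, so a site may prefer randomly drawn integers; the essential point taken from the text is that the table never leaves the site.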

Figure D - Example of structural and terminology mapping of Serum sodium 143mmol/L
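The one-to-many structural transformation of Figure D, in which a single source field holding "143mmol/L" becomes separate code, value and unit fields, can be sketched as below. Only physValue is a field name taken from the text; the source field name, the local code "NA" and the other target names are illustrative assumptions. The code stays a site code here because, as stated above, mapping to central codes happens at query runtime.

```python
import re

# Hypothetical link between a named source field and the site's local code.
SOURCE_FIELD_TO_LOCAL_CODE = {"serum_sodium": "NA"}

# A number, optional whitespace, then the unit as one token.
VALUE_WITH_UNIT = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*(\S+)\s*$")


def transform_lab_field(field_name: str, raw: str) -> dict:
    """Split one source lab field into warehouse code, value and unit fields."""
    match = VALUE_WITH_UNIT.match(raw)
    if match is None:
        # Unparseable values belong in the rejection log, not the warehouse.
        raise ValueError(f"unparseable result: {raw!r}")
    return {
        "code": SOURCE_FIELD_TO_LOCAL_CODE[field_name],  # site code, not central
        "physValue": float(match.group(1)),
        "unit": match.group(2),
    }


if __name__ == "__main__":
    print(transform_lab_field("serum_sodium", "143mmol/L"))
```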

Load

The complexity of the final (bulk) load is determined by whether the load is de novo or permits incremental additions of records and substitutions of corrected records. The details of this are left to the sites. Whatever method is used, the load method and timeliness of the source data should be known to the end-user.

Data quality

It is desirable to develop some queries, executed through the platform, that compute measures of quality within particular disease areas. This will be the subject of future work.

Automation

The overall ETL should contain features necessary for continuous or frequent operation:

a workflow with selectable processing components

processes and components that are efficient, scalable, and maintainable

scheduling

monitoring and alerting

a life cycle with audit and compliance

bulk extract and load

The above features list is common in ETL tools available today, which comprise both open source and proprietary offerings:

Open Source

Pentaho

Talend Open Studio

Scriptella

JasperETL from JasperSoft

CloverETL

Apatar

Proprietary

Pervasive Software

Astera Centerprise

Expressor

SAS Data Integration Server

SAP BusinessObjects Data Integrator

Integration Services of Microsoft SQL Server

IBM InfoSphere DataStage

Informatica PowerCenter

Oracle Data Integrator

Stone Bond Technologies

SnapLogic

A number of EHR4CR sites have experience of Talend Open Studio and have used this for their ETL.

6.3 Mappings

It should be noted that source-target terminology mapping services are not required at ETL for the EHR4CR warehouse. However, the necessary mappings must be developed in parallel using the services from WP4. In practice sites used their own tools to generate terminology mappings since the relevant tools were not yet available from WP4. For example, Dundee used a relatively simple mapping editor that generated mapping files that could be imported into the local terminology service (see Figure F).

Figure F: Terminology mapping editor used by Dundee to map site codes to central codes.
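To make the idea of a mapping file concrete, the sketch below loads site-to-central code mappings and uses them to translate a central code in a query into the site codes to search for. The CSV layout, the column names and the local codes are assumptions made for illustration; they do not describe the actual format produced by the Dundee editor.

```python
import csv
import io

# Hypothetical mapping file: one row per site code with its central code.
MAPPING_CSV = """site_system,site_code,central_system,central_code
LOCAL,NA,LOINC,2951-2
LOCAL,K,LOINC,2823-3
"""


def load_mappings(text: str) -> dict:
    """Index site codes by the central (system, code) pair they map to."""
    index = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["central_system"], row["central_code"])
        index.setdefault(key, []).append((row["site_system"], row["site_code"]))
    return index


def site_codes_for(index: dict, system: str, code: str) -> list:
    """Translate a central code used in a query into the site codes to search for."""
    return index.get((system, code), [])


if __name__ == "__main__":
    index = load_mappings(MAPPING_CSV)
    print(site_codes_for(index, "LOINC", "2951-2"))
```

Because the warehouse keeps site codes and the translation happens at query time, a corrected mapping takes effect without reloading the warehouse.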

6.4 The Dundee 200 test data

To facilitate the deployment and testing of end-point software at sites, a test dataset consisting of Dundee data relating to 200 diabetic patients is made available to other sites. This data is pseudonymised and partially obfuscated. The dataset comprises a total of 100k records and includes the data listed in Table A.

| Sample of 200 diabetics | Local coding | Central coding |
|---|---|---|
| Birth, death, gender | Local | S-CT |
| Height, weight, BMI | Local | S-CT |
| BP: systolic, diastolic | Local | S-CT |
| Laboratory (clinical chemistry) | Local | LOINC |
| Medication | BNF | ATC |
| Hospital episodes (diagnosis, procedure) | ICD10, OPCS4.x, national | ICD10, S-CT |

Table A - Data and coding used within the Dundee-200 test data. S-CT: SNOMED-CT; BNF: British National Formulary (drugs); OPCS 4.x: Office of Population Censuses and Surveys, Classification of Interventions and Procedures, v4.x; National: e.g. dischargeDispositionCode

6.5 Future work

While there is agreement on the structure of the EHR4CR CDW, performance measures, and conformance testing, there is still work to be done on:

Support for ETL provenance

Support for data context

Data quality measures

Site and platform resource requirements

Site skill sets

7 Evaluation of the semantic resources and services

Chapter 7 describes an evaluation of the EHR4CR standardization pipeline and semantic interoperability services (SIS). Our goal was twofold: first, to define a set of desiderata for developing a Common Information Model and computable eligibility queries; second, to use this conceptual framework to describe the strengths and limitations of the EHR4CR semantic platform.

Our approach consisted first of extending the desiderata for computable representations of electronic health records-driven phenotype algorithms proposed by Mo et al [Mo15] in order to propose a conceptual framework for comparing mediation approaches and semantic interoperability solutions developed by platforms supporting cross border research. Second, we instantiated the conceptual framework in the context of the EHR4CR project in order to evaluate how far the development of the mediation model and the standardization efforts met the expected requirements of the project. The conceptual framework of computable representations of phenotype algorithms consists of a set of requirements related to the three main components of a semantic interoperability platform: 1) query language and model, 2) patient data model (mediation model) and 3) standardization pipeline for data providers.

7.1 Evaluation framework

7.1.1 The need of high quality query language and model

There is a need to manage eligibility criteria in order to accelerate the development of new clinical research protocols and related clinical research documents (e.g., case report forms, data collection forms, training materials, etc.). Related efforts include EligWriter [Gennari01] and Design-a-Trial [Nammuni02], which supported the re-use of eligibility criteria during clinical trial protocol authoring, as well as ERGO [ERGO15] and ASPIRE [Niland07], which support eligibility criteria annotation. The definition of computable phenotype algorithms requires interoperability with patient data. The knowledge representation requirements for eligibility criteria in this context are more stringent, including highly expressive language(s) to achieve executable eligibility rules, a patient information model, and an appropriate clinical terminology to facilitate mapping from eligibility concepts to patient data.

7.1.2 The need of high quality mediation model (patient data model)

Since data sources are not designed with cross-domain integration as a primary focus, initiatives for integrating clinical care and clinical research data have often been limited to non-scalable, system (or vendor)-specific efforts [Cuggia11, El Fadly11, Schreiweis14]. In an expanding research landscape, cooperation infrastructures are now being built to allow research projects to reuse patient data from federated systems at many different sites in different countries, and therefore in multilingual settings. Non-standard, and often conflicting, vendor approaches to representing data pose challenges to infrastructure developers, who must build solutions to work with clinical data across multiple formats. Systems developed during the last decade in order to compute eligibility criteria, including GUIDE, GLIF3, SAGE, ERGO and CRFQ, largely adopted some form of Virtual Medical Record (VMR) [81Weng] based on the HL7 Reference Information Model (RIM) [Jenders97], which provides an abstraction layer on top of a real EHR. Nowadays HL7 FHIR specifications are gaining interest. Although there is no consensus in the medical informatics community regarding a standard patient information model, the development of HL7 FHIR shows promise to mitigate the classic site-specific data mapping problem. A controlled mediation model is required to support federated access to heterogeneous data sources.

7.1.3 The need of an efficient standardization pipeline within participants data providers

Beyond the creation and continuous extension of the standard-based mediation model, the process of harmonizing heterogeneous data sources, called “data standardization” in this paper, relies also on the capability of different actors in hospital sites to align the local structures and content of their EHR systems or Clinical Data Repositories to the mediation model. Few EHR systems or Clinical Data Repositories in hospitals implement standard reference models such as HL7 RIM, EN ISO 13606 or openEHR. Most of them rely on proprietary models. Furthermore, although the need for controlled vocabularies in EHR systems is widely recognized, system developers have often dealt with this need by creating ad hoc sets of controlled terms for use in their applications so that information in one system cannot be recognized and used by other systems. Differences between the controlled vocabularies of two systems exist even when both systems were created by the same developers. Therefore mapping local models and/or controlled vocabularies is a challenging and time consuming task for terminologists in participant hospitals. Efficient supportive mapping tools are required to enable terminologists to develop and maintain semantic mapping between the proprietary models and the mediation model.

7.2 Results

Table 11 provides the list of 23 requirements defined for the three main components involved in the definition of phenotype algorithms: 1) query language and model, 2) patient data model (mediation model) and 3) standardization pipeline for data providers. Table 11 also provides a qualitative evaluation of the strengths and limitations of the EHR4CR platform in the implementation of phenotype algorithms and its capacity to support the different actors in accomplishing their tasks during the data standardization process at both setup and execution phases of the EHR4CR use cases.

Table 11. Results of the evaluation of the EHR4CR semantic interoperability platform

| Requirement | Desiderata proposed by Mo et al. [Mo15] | EHR4CR use case |
|---|---|---|
| A-The need of high quality query language | | |
| Req A.1 Implement set operations and relational algebra for modeling phenotype algorithms, represent phenotype criteria with structured rules | Mo 2015; Req.4&5 | ++ |
| Req A.2 Support both human readable and computable representations of phenotype algorithms | Mo 2015; Req.3 | ++ |
| Req A.3 Support defining temporal relations between events | Mo 2015; Req.6 | ++ |
| Req A.4 Provide representations for text searching and natural language processing | Mo 2015; Req.8 | |
| Req A.5 Query language shall be generic and standard based | | |
| Req A.6 Query builder shall be intuitive | | ++ |
| Req A.7 Provide interfaces for external software algorithms | Mo 2015; Req.9 | |
| Req A.8 Maintain backward compatibility | Mo 2015; Req.10 | |
| B-The need of high quality patient data model (mediation model) | | |
| Req B.1 The mediation model shall be based on standard domain knowledge and reference models provided by standard development organizations that are and will be used by EHR vendors, clinicians, and government mandates (e.g. Meaningful Use Stage 3 in US). | | +++ |
| Req B.2 The mediation model shall use standard terminologies, ontologies and value sets that are multilingual and internationally used | Mo 2015; Req.7 | +++ |
| Req B.3 Support customization for the variability and availability of EHR data among sites. Possible use of internally defined extensions of existing standard terminologies (in order to add any missing concept or any missing description in any specific language) | Mo 2015; Req.2 | ++ |
| Req B.4 The mediation model shall use mappings between reference terminologies (e.g. SNOMED-MedDRA, SNOMED CT-NCI Thesaurus) in order to allow end users to access semantically equivalent content through different terminologies | | + |
| Req B.5 The mediation model shall be expressive enough to represent i) multimodal (sign, symptoms, diseases, outcomes, procedures, care plans, etc. as well as images, signals, etc.) and multi-scale clinical data including molecular findings such as genomics information; ii) specimen related information, family related information, etc. iii) multiple granularities, multiple consistent views, context representation | | |
| Req B.6 The mediation model shall be scoped to the needs of the users of the research network in the context of dedicated use cases but scalable and sustainable (designed to be rapidly and efficiently scoped to cover any new requirement, extensible in terms of structure and content) | | ++ |
| Req B.7 The mediation model shall be represented using standard formal languages allowing semantic reasoning (e.g. semantic web languages) in order to recognize redundancy or inconsistency | | |
| Req B.8 A robust version management process shall be provided for any type of semantic resource of the mediation model | | ++ |
| Req B.9 A dedicated tool is required for supporting the authors of the mediation model to efficiently create/update the semantic resources of the model. The editor shall support a collaborative editing process. The creation and update process shall be user-friendly and adapted to medical experts (through user interface, but also through import of simple csv files used to capture medical knowledge in a format that is understandable for medical experts). The editor shall allow the authors to create new semantic resources from standard terminologies (e.g. SNOMED CT, LOINC, ATC, ICD-O) or value sets. The standard resources are imported from the official terminology providers and up-to-date. | | ++ |
| Req B.10 The semantic resources shall be accessible to any application through standardized semantic services based on new web technologies, such as Representational State Transfer (REST)-based APIs/web services, recently adopted by HL7. | | +++ |
| C-The need of efficient standardization pipeline within data providers | | |
| Req C.1 Automatic mapping algorithms supporting terminologists in identifying corresponding concepts in the mediation model on one side and local models on the other side. These algorithms shall i) use the descriptions and synonyms of the concepts; ii) address multi lingual issues; iii) use existing mappings between reference terminologies (e.g. when local sources are mapped to a standard terminology which is not used in the mediation model (e.g. NCI Thesaurus), using the mapping between SNOMED CT and NCI Thesaurus to propose automatic mappings between local concepts and SNOMED CT concepts in the mediation model) | Mo 2015; Req.1 | |
| Req C.2 Natural language processing for semantic annotation of text | Mo 2015; Req.8 | |
| Req C.3 Formal representation of mappings and version management | | |
| Req C.4 Use case driven support for prioritizing the mapping effort. The terminologist needs to know within the list of the data elements of the mediation model that are not yet mapped to local data elements, the ones that need to be mapped in priority according to different criteria (e.g. data elements that are the most frequently used in distributed queries, data elements corresponding to a specific phenotype algorithm, etc.) | | |
| Req C.5 Mappings shall be accessible to any application through standardized semantic services based on new web technologies, such as Representational State Transfer (REST)-based APIs/web services, recently adopted by HL7. | | + |

7.2.1 Query model and language

In this section, we describe the characteristics of the EHR4CR Eligibility Criteria Model (EC Model) and the ECLECTIC language with regard to the 8 requirements stated in the “A-Query model and language” section of the conceptual framework.

Req A.1: Implement set operations and relational algebra for modeling phenotype algorithms, represent phenotype criteria with structured rules

The EHR4CR Eligibility Criteria Model (EC Model) is an extensible query model representing eligibility criteria in UML to meet the expressivity needs of computationally viable eligibility criteria. An ad-hoc language, ECLECTIC (Eligibility Criteria Language for European Clinical Trial Investigation and Construction), has been developed in order to ensure that it can express only queries that the object model can represent. The UML class diagram and language grammar are two alternative representations of the same model. The resultant object model, although hidden from the user, lies at the heart of the query engine and is key for model transformation and query serialization in different forms.

Req A.2: Support both human readable and computable representations of phenotype algorithms

ECLECTIC is also a human-readable serialization of the object hierarchy, which allows us to reason about the model and perform validation prior to implementation.

Req A.3: Support defining temporal relations between events

Basic temporal relationships are provided.

Req A.4: Provide representations for text searching and natural language processing

None.

Req A.5: Query language shall be generic and standard based

ECLECTIC is an ad-hoc query language.

Req A.6: Query builder shall be intuitive

Using the EHR4CR query builder (see Figure 24), a study manager can drag and drop data elements stored in the mediation model (marked as “1” in Figure 24) and logical and temporal operators (marked as “2” in Figure 24) in order to populate query templates designed to formally represent the eligibility criteria of the clinical trial (marked as “3” in Figure 24).

Figure 24. EHR4CR query builder demonstrating Protocol Feasibility Study module

Req A.7: Provide interfaces for external software algorithms: none

Req A.8: Maintain backward compatibility: not addressed
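To illustrate what Req A.1 (set operations) and Req A.3 (temporal relations) ask of a query model, the sketch below evaluates two criteria over a toy extract with plain Python sets. It is not ECLECTIC, whose syntax is not shown in this document; the data, the code labels and the function names are all hypothetical.

```python
from datetime import date

# Hypothetical warehouse extract: (subject id, code, effective date, value).
OBSERVATIONS = [
    (1, "diabetes", date(2010, 5, 1), None),
    (1, "hba1c", date(2012, 3, 1), 8.1),
    (2, "diabetes", date(2011, 7, 9), None),
    (2, "hba1c", date(2011, 1, 2), 9.0),
    (3, "hba1c", date(2012, 4, 4), 7.9),
]


def subjects_with(code, predicate=lambda value: True):
    """Subjects having at least one record with this code that passes the predicate."""
    return {sid for sid, c, _, v in OBSERVATIONS if c == code and predicate(v)}


def subjects_with_before(first_code, second_code):
    """Subjects with some first_code event strictly before a second_code event."""
    firsts = [(sid, d) for sid, c, d, _ in OBSERVATIONS if c == first_code]
    seconds = [(sid, d) for sid, c, d, _ in OBSERVATIONS if c == second_code]
    return {s1 for s1, d1 in firsts for s2, d2 in seconds if s1 == s2 and d1 < d2}


# Set intersection combines criteria; the temporal operator narrows the result.
inclusion = subjects_with("diabetes") & subjects_with("hba1c", lambda v: v > 7.5)
eligible = inclusion & subjects_with_before("diabetes", "hba1c")

if __name__ == "__main__":
    print(sorted(inclusion), sorted(eligible))
```

Subject 2 satisfies both criteria but is dropped by the temporal constraint, because the HbA1c measurement precedes the diabetes record.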

7.2.2 Mediation model

In this section, we describe the characteristics of the EHR4CR Common Information Model (CIM) with regard to the 10 desiderata stated in the previous section.

Req B.1: The mediation model shall be based on standard domain knowledge and reference models.

The EHR4CR Common Information Model (CIM) consists of a set of multilingual semantic resources based on multiple standards (FHIR resources organized into categories based on HL7 CCD sections and UMLS semantic types).

Req B.2: Use of standard terminologies, ontologies and value sets that are multilingual and internationally used

EHR4CR templates are composed of data elements that are bound to a set of international reference terminologies selected by the project: ICD, SNOMED-CT, LOINC, ATC, ICD-O, Pubcan, TNM, PathLex. As much as possible, we enriched and/or merged reference terminologies in order to build multilingual terminologies and value sets (in English and French at least and, when possible, in the four languages of the EHR4CR partners: English, French, German, and Polish).

Req B.3: Possible use of internally defined extensions of existing standard terminologies (in order to add any missing concept or any missing description in any specific language)

An EHR4CR terminology was created in order to define concepts that are in the scope of the project but do not exist in the selected reference terminologies.

Req B.4: Mappings between reference terminologies (e.g. SNOMED-MedDRA, SNOMED CT-NCI Thesaurus)

We integrated the UMLS CUI in order to allow multi-terminology binding.

Req B.5: Expressiveness

The current limited set of FHIR-based templates allows the representation of the main textual clinical data (signs, symptoms, diseases, outcome, procedures, care plans, etc.). We defined context-dependent value sets for representing multiple views or contextual information (e.g. organ specific scores or histologic types, etc.).

Req B.6: Scoped to the needs of the users of the research network in the context of dedicated use cases but scalable and sustainable (designed to be rapidly and efficiently scoped to cover any new requirement, extensible in terms of structure and content)

The EHR4CR mediation model (EHR4CR CIM) has been developed, and can be extended, through a global consensus-based development process in order to cover the scope of both i) eligibility criteria and data items identified from a given set of specific clinical trials (bottom up approach resulting in the creation of “useful data elements”) and ii) standard reference clinical information models or data elements (e.g. CDISC SHARE) (top down approach). Although scoped to the needs of the users of the EHR4CR platform in the context of the three use cases of the project (PFS, PRS or CTE), its structure ensures its scalability so that it can be extended in terms of both structure and content to cover any new need. The EHR4CR CIM was developed and evolved through repeated cycles using a "Learning by Doing" approach in order to cover the scope of 14 first clinical trials selected to demonstrate the PFS use case, then of 17 additional clinical trials (PRS use case) and finally of 28 additional clinical trials (CTE use case). Each new version of the EHR4CR CIM has an extended scope and improved quality.

Req B.7: Standard formal languages allowing semantic reasoning

The semantic resources are stored in a semantic metadata repository (MDR). The metadata schema is expressed in several languages, including RDF.

Req B.8: Version management

Version management is provided for any type of semantic resource (terminologies, value sets, data elements, templates).

Req B.9: Collaborative authoring tool

A tool was developed for authoring and maintaining the shared semantic resources of the mediation model.

Req B.10: Standard semantic services based on new web technologies

The semantic interoperability services (SIS) are developed to enable EHR4CR end-user services to access and consume the semantic resources of the mediation model (terminologies, value sets, data elements, templates) and the mappings. SIS are used at the workbench by the EHR4CR query builder for query specification (representation of free text eligibility criteria using the data elements of the mediation model) and at the EHR4CR endpoints for query transformation. This goal was realized via the expansion of the original functionality outlined in HL7’s Common Terminology Service – Release 2 (CTS2) Specification. The functional profiles of the SIS include capabilities for searching and querying code system content, value set content and template content. The technical specifications of the EHR4CR SIS rely on Representational State Transfer (REST)-based APIs/web services, recently adopted by HL7.

7.2.3 Standardization pipeline for data providers

In this section, we describe the characteristics of the EHR4CR standardization pipeline with regard to the 5 requirements stated in the “Standardization pipeline” section of the conceptual framework.

Req C.1: Automatic mapping algorithms

Once hospital clinical data repositories (CDRs) are connected to the EHR4CR platform, source information models need to be mapped to the EHR4CR CIM. In the current state, the concepts used in the definitions of the central data elements were manually mapped to corresponding local terms used in pilot sites. Supporting tools are still under development. The current version of the Terminology Mapping Editor (TME) has limited functionality: it allows the Terminology Mapper to upload subsets of local value sets and to create their mappings to central value sets defined within the EHR4CR CIM.

Req C.2: Natural language processing for semantic annotation of text: none

Req C.3: Formal representation of mappings and version management

Mappings are available in SKOS format. Version management is provided.

Req C.4: Use case driven support for prioritizing the mapping effort: none

Req C.5: Standard semantic services

Mappings are available through REST-based APIs/web services.

8 Conclusion

With the development of platforms enabling the use of routinely collected clinical data in the context of international research, scalable solutions for cross border and cross domain semantic interoperability need to be developed. An expression language, an underlying model of patient data and a codification of eligibility concepts are essential constructs of a formal knowledge representation for eligibility criteria. There is currently an intense focus on developing and maintaining shareable, multipurpose, high-quality computable phenotype algorithms in order to mediate between different EHR products and clinical research systems.

8.1 The EHR4CR semantic interoperability platform

The EHR4CR semantic interoperability platform fulfills, at least partially, most of the 23 requirements of the proposed conceptual framework. The mediation model is based on multiple standards: standard models (HL7 FHIR templates, ISO 21090, ISO 11179), standard value sets and terminologies. Integrating these multi-level standards is challenging; terminology binding is especially difficult, and contextual and versioning issues need to be addressed. We developed specific data structures, faceted templates, to strike a balance between complexity (a limited set of generic templates) and expressiveness (major scalability in terms of structure and content thanks to the facets). As much as possible, we enriched and/or merged reference terminologies in order to build multilingual terminologies and define multilingual value sets (at least in the four languages spoken by the EHR4CR partners: English, French, German, and Polish). An EHR4CR terminology was created to hold concepts that are in the scope of the project but do not exist in the selected reference terminologies. We developed a collaborative editing tool handling the management of any type of complex EHR4CR semantic resource (faceted templates, data elements, value sets, concepts from huge and complex terminologies, e.g. SNOMED CT) and of their relationships. We addressed the versioning issues for every type of resource, deriving CTS2 approaches for vocabulary updates. A Terminology Mapping Editor (TME), under development, enables participating EHRs to develop and maintain semantic mappings between their proprietary models and the mediation model. This tool is still in its infancy and does not yet fulfil the expected requirements (such as use-case-driven support for prioritizing the mapping effort, contextual terminology mapping, and automatic mapping algorithms addressing multilingual issues). The semantic resources (mediation models and mappings) are accessible to any component of the EHR4CR platform through standardized semantic services based on new web technologies, such as Representational State Transfer (REST)-based APIs/web services, recently adopted by HL7.

Our current mediation model does not fully fulfil some of the ten requirements. In the future, we are considering integrating terminology mappings between reference terminologies (e.g. mappings between SNOMED CT and MedDRA, NCI-T, ICD-9, ICD-10, ICD-O) in order to fully support multi-terminology binding. We are still working on the representation of multiple granularities, multiple consistent views, and context. We plan to evaluate the FHIR resources currently being developed in order to represent multi-scale clinical data including molecular findings such as genomic information. We still need to define complex templates allowing the combination of basic templates. Developing a smart user interface for searching and/or browsing within complex semantic resources remains problematic. We also plan to improve the collaborative editing of these resources by medical experts using the GUI and/or CSV files. We are also working on an improved distribution model (with three modes: full, snapshots and/or deltas).
Regarding the data standardization process in hospitals, we identified within the EHR4CR project the need for a governance body and process for ensuring the quality of the data standardization pipeline within the network. Since a set of complex and sometimes time-consuming activities is required on the hospital side at the connection phase (initial mapping to a core of semantic resources) and at the set-up phase of each new study (update of the mappings in the specific context of the study), it is important that those activities are well organized and properly synchronized with central efforts. Thus, it is not just a matter of the content scope of the semantic resources but also a matter of reaching agreements on how they are represented and accessed. The governance body and process will be especially important in the context of any operational use of the EHR4CR platform at a broader scale within an extended network.

8.2 Limits, related projects and perspectives

Expression languages employed to represent eligibility logic include ad hoc expressions (with or without the use of templates), the Arden Syntax, logic-based languages (i.e., PAL, SQL, and DL), object-oriented languages (i.e., GELLO), and temporal query languages (e.g., Asbru and Chronus II) [Weng10]. Ad hoc formalisms are functional in many existing systems and can provide interesting features regarding expressiveness. SQL-based queries on a clinical database are expressive but not extensible for knowledge re-use or inference. These mechanisms all suffer from a lack of scalability. Multiple query languages have been used for different types of logic within the same model or system. Ontologies are increasingly being used as common terminological resources to automatically reconcile data heterogeneity and implement large-scale, distributed data management systems. Ontology-aware query interfaces that are integrated with EHR systems can subsequently leverage the ontology annotations to support extensive query answering functionalities [Sahoo14].

Over the past decade, medical informatics researchers have been studying issues related to clinical information models associated with terminologies and have begun to articulate some requirements for “high quality” models [Ahn13, Weng10, Mo15]. Several efforts try to address the interoperability between the clinical research and patient care domains by building a common data model, where the interoperating systems are required to interact through this well-defined mediation model. In this top-down approach, agreement on a top-level knowledge model is imposed on the underlying data models of the interoperating parties for successful data exchange. Some projects adopting this top-down strategy proposed solutions that have been carried forward into practice, and new experience has been gained: OMOP CDM [Reisinger10], FDA Mini-Sentinel [Curtis12], I2B2-SHRINE [Kohane12, McMurry13], STRIDE [Lowe09], eMERGE [Pathak11, Newton13, Herr15], SHARPn [Rea12, Pathak13], the TRANSFoRm project [Delaney15] and other initiatives [Weng10, Sinaci13, Shivade14, Jiang15]. CDISC SHARE is an important initiative in addressing the interoperability between care and research domains through maintaining common data elements (CDEs) built upon BRIDG DAM, where they are annotated with CDISC data sets like CDASH and SDTM, and other CDISC terminologies [SHARE15]. CDISC SHARE CDEs will be considered for enriching the EHR4CR mediation model. In the SALUS project, Sinaci et al. also applied a comprehensive set of semantic web technologies with the commonly adopted MDR standard, ISO/IEC 11179. In addition, they built a federated semantic MDR framework and demonstrated that it was possible to semantically link disparate CDE definition efforts by different organizations [Sinaci13].
The EHR4CR project developed an instance of a platform providing communication, security and semantic interoperability services to the eleven participating hospitals located in five European countries and to ten pharmaceutical companies [Coorevits13, De Moor14]. According to an evaluation framework covering the query languages, mediation model and standardization pipeline, the EHR4CR semantic interoperability platform fulfills most of the requirements. Regarding the mediation model, some requirements remain problematic. The scope of the EHR4CR mediation model needs to be continuously adapted to the users' needs. Since the update can hardly be fully automated (e.g. through automatic coding of free-text clinical trial protocols), a collaborative editor needs to efficiently support the creation of new semantic resources scoped to any additional use case. Despite recent efforts, formal representation of multimodal and multi-level data supporting data interoperability across clinical research and care domains is still challenging. Terminology mapping in hospital sites is the major bottleneck of the data standardization pipeline. Supportive tools are still in their infancy. Semantic interoperability within a broad international research network reusing clinical data from EHRs requires a rigorous governance process to ensure the quality of the data standardization process.

8.3 References

1. Ahn S, Huff SM, Kim Y, Kalra D. Quality metrics for detailed clinical models. Int J Med Inform. 2013 May;82(5):408-17.

2. Coorevits P, Sundgren M, Klein GO, Bahr A, Claerhout B, Daniel C, et al. Electronic health records: new opportunities for clinical research. J Intern Med. 2013 Dec;274(6):547-560.

3. Cuggia M, Besana P, Glasspool D. Comparing semi-automatic systems for recruitment of patients to clinical trials. Int J Med Inform 2011;80:371–88.

4. Curtis LH et al. Design considerations, architecture, and use of the MiniSentinel distributed data system. Pharmacoepidem Drug Saf 2012;21:23–31.

5. Delaney BC, Curcin V, Andreasson A, Arvanitis TN, Bastiaens H, Corrigan D, Ethier JF, Kostopoulou O, Kuchinke W, McGilchrist M, van Royen P, Wagner P. Translational Medicine and Patient Safety in Europe: TRANSFoRm-Architecture for the Learning Health System in Europe. Biomed Res Int. 2015;2015:961526.

6. De Moor GD, Sundgren M, Kalra D, Schmidt A, Dugas M, Claerhout B, Karakoyun T, Ohmann C, Lastic PY, Ammour N, Kush R, Dupont D, Cuggia M, Daniel C, Thienpont G, Coorevits P. Using Electronic Health Records for Clinical Research: the Case of the EHR4CR Project. J Biomed Inform. 2014 Oct 18.

7. El Fadly A, Rance B, Lucas N, et al. Integrating clinical research with the Healthcare Enterprise: from the RE-USE project to the EHR4CR platform. J Biomed Inform 2011;44 Suppl 1:S94–102.

8. ERGO: a template-based expression language for encoding eligibility criteria, <http://128.218.179.58:8080/homepage/ERGO_Technical_Documentation.pdf>; 2009 [accessed 24.11.15].

9. Gennari J, Sklar D, Silva J. Cross-tool communication: from protocol authoring to eligibility determination. In: Proceedings of the AMIA symposium; 2001. p. 199–203.

10. Herr TM et al. Practical considerations in genomic decision support: The eMERGE experience. J Pathol Inform. 2015 Sep 28;6:50.

11. Jenders R, Sujansky W, Broverman C, Chadwick M. Towards improved knowledge sharing: assessment of the HL7 reference information model to support medical logic module queries. In: Proceedings of the AMIA annual fall symposium; 1997. p. 308–12.

12. Jiang G, Evans J, Oniki TA, Coyle JF, Bain L, Huff SM, Kush RD, Chute CG. Harmonization of detailed clinical models with clinical study data standards. Methods Inf Med. 2015 Jan 12;54(1):65-74.

13. Kohane IS, Churchill SE, Murphy SN. A translational engine at the national scale: informatics for integrating biology and the bedside. J Am Med Inform Assoc 2012;19:181–5.

14. Lowe HJ et al. STRIDE – an integrated standards-based translational research informatics platform. AMIA Annu Symp Proc 2009;2009:391–5.

15. McMurry AJ, Murphy SN, MacFadden D, Weber G, Simons WW, Orechia J, et al. SHRINE: enabling nationally scalable multi-site disease studies. PLoS ONE. 2013;8(3):e55811.

16. Mo et al. Desiderata for computable representations of electronic health records-driven phenotype algorithms. J Am Med Inform Assoc. 2015 Nov;22(6):1220-30.

17. Nammuni K, Pickering C, Modgil S, Montgomery A, Hammond P, Wyatt JC, et al. Design-a-trial: a rule-based decision support system for clinical trial design. Knowl-Based Syst 2004;17:121–9.

18. Newton KM, Peissig PL, Kho AN, Bielinski SJ, Berg RL, Choudhary V, et al. Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network. J Am Med Inform Assoc. 2013 Jun;20(e1):e147-154.

19. Niland J. ASPIRE: agreement on standardized protocol inclusion requirements for eligibility. In: An unpublished web resource; 2007.

20. Pathak J et al. Mapping clinical phenotype data elements to standardized metadata repositories and controlled terminologies: the eMERGE Network experience. J Am Med Inform Assoc 2011;18:376–86.

21. Pathak J, Bailey KR, Beebe CE, Bethard S, Carrell DC, Chen PJ, et al. Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium. J Am Med Inform Assoc. 2013 Dec;20(e2):e341-348.

22. Rea S, Pathak J, Savova G, Oniki TA, Westberg L, Beebe CE, et al. Building a robust, scalable and standards-driven infrastructure for secondary use of EHR data: the SHARPn project. J Biomed Inform. 2012 Aug;45(4):763–71.

23. Reisinger SJ et al. Development and evaluation of a common data model enabling active drug safety surveillance using disparate healthcare databases. J Am Med Inform Assoc 2010;17:652–62.

24. Sahoo SS, Lhatoo SD, Gupta DK, Cui L, Zhao M, Jayapandian C, Bozorgi A, Zhang GQ. Epilepsy and seizure ontology: towards an epilepsy informatics infrastructure for clinical research and patient care. J Am Med Inform Assoc. 2014 Jan-Feb;21(1):82-9.

25. Schreiweis B, Trinczek B, Köpcke F, Leusch T, Majeed RW, Wenk J, Bergh B, Ohmann C, Röhrig R, Dugas M, Prokosch HU. Comparison of electronic health record system functionalities to support the patient recruitment process in clinical trials. Int J Med Inform. 2014 Nov;83(11):860-8.


26. Shivade C, Raghavan P, Fosler-Lussier E, Embi PJ, Elhadad N, Johnson SB, Lai AM. A review of approaches to identifying patient phenotype cohorts using electronic health records. J Am Med Inform Assoc. 2014 Mar-Apr;21(2):221-30.

27. Sinaci AA, Laleci Erturkmen GB. A federated semantic metadata registry framework for enabling interoperability across clinical research and care domains. J Biomed Inform. 2013 Oct;46(5):784-794.

28. Weng C, Tu SW, Sim I, et al. Formal representation of eligibility criteria: a literature review. J Biomed Inform 2010;43:451–67.

9 Appendix

9.1 List of clinical trials

| CT_Id (CT_IdPharma) | CT_Pharma | CT_StartDate | CT_Name | CT_diseaseArea | CT_Description | CT_EHR4CRUseCase | CT_Hospitals |
| NCT00855166 | AstraZeneca |  |  | Diabetes | Evaluation of the Effect of Dapagliflozin in Combination With Metformin on Body Weight in Subjects With Type 2 Diabetes | CTE |  |
| NCT01032629 | Janssen |  | CANVAS | Diabetes | CANVAS - CANagliflozin cardioVascular Assessment Study | CTE |  |
| NCT01323660 | GSK |  |  | Respiratory | An Exercise Endurance Study to Evaluate the Effects of Treatment of Chronic Obstructive Pulmonary Disease (COPD) Patients With a Dual Bronchodilator: GSK573719/GW642444. Study B | CTE |  |
| NCT01345019 | Amgen |  |  | Oncology | Denosumab Compared to Zoledronic Acid in the Treatment of Bone Disease in Subjects With Multiple Myeloma | CTE |  |
| NCT01372410 | GSK |  |  | Respiratory | A Randomized, Double Blind, Placebo Controlled, Incomplete Block, Crossover, Dose Ranging Study to Evaluate the Dose Response of GSK573719 Administered Once or Twice Daily Over 7 Days in Patients With Chronic Obstructive Pulmonary Disease (COPD) (AC4115321) | CTE |  |
| NCT01381848 | Janssen |  |  | Infections | A Study of Doripenem in Infants Less Than 12 Weeks of Age | CTE |  |
| NCT01436110 | GSK |  |  | Respiratory | Clinical Study Evaluating Safety and Efficacy of Fluticasone Furoate and Fluticasone Propionate in People With Asthma | CTE |  |
| NCT01515423 | Janssen |  |  | Psychiatric | Study of Paliperidone Palmitate 3 Month and 1 Month Formulations for the Treatment of Patients With Schizophrenia | CTE |  |
| NCT01573624 | GSK |  |  | Respiratory | Evaluate the Safety, Efficacy and Dose Response of GSK573719 in Combination With Fluticasone Furoate in Subjects With Asthma (ILA115938) | CTE |  |
| NCT01618370 | Bayer |  |  | Oncology | Radium(223) Dichloride (Alpharadin) in Castration-Resistant (Hormone-Refractory) Prostate Cancer Patients With Bone Metastases | CTE |  |
| NCT01665144 | Novartis |  |  | Neuroscience | Exploring the Efficacy and Safety of Siponimod in Patients With Secondary Progressive Multiple Sclerosis (EXPAND) | CTE |  |
| NCT01684423 | Bayer |  |  | Cardiovascular | Oral Rivaroxaban in Children With Venous Thrombosis (EINSTEINJunior) | CTE |  |
| NCT01691521 | GSK |  |  | Respiratory | Efficacy and Safety Study of Mepolizumab Adjunctive Therapy in Subjects With Severe Uncontrolled Refractory Asthma | CTE |  |
|  |  |  |  | Cardiovascular | Study of the Safety and Efficacy of LCZ696 on Arterial Stiffness in Elderly Patients With Hypertension | CTE |  |
| NCT01706328 | GSK |  |  | Respiratory | A Study to Assess the Efficacy of Fluticasone Furoate/Vilanterol (FF/VI) Inhalation Powder 100/25 mcg Once Daily Compared With Fluticasone Propionate/Salmeterol Inhalation Powder 250/50 mcg Twice Daily in Subjects With Chronic Obstructive Pulmonary Disease (COPD) | CTE |  |
|  |  |  |  | Respiratory | Efficacy and Safety of QGE031 versus Placebo and Omalizumab in Patients Aged 18-75 Years With Asthma | CTE |  |
|  |  |  |  | Diabetes | A Study Comparing Cardiovascular Effects of Ticagrelor and Clopidogrel in Patients With Peripheral Artery Disease (EUCLID) | CTE |  |
| NCT01777334 | GSK |  |  | Respiratory | The Purpose of This Study is to Evaluate the Spirometric Effect (Trough FEV1) of Umeclidinium/Vilanterol 62.5/25 mcg Once Daily Compared With Tiotropium 18 mcg Once Daily Over a 24-week Treatment Period in Subjects With COPD | CTE |  |
|  |  |  |  | Ophthalmology | Efficacy and Safety of Two Treatment Regimens of 0.5 mg Ranibizumab Intravitreal Injections Guided by Functional and/or Anatomical Criteria, in Patients With Neovascular Age-related Macular Degeneration (OCTAVE) | CTE |  |
|  |  |  |  | Respiratory | QVA vs. Salmeterol/Fluticasone, 52-week Exacerbation Study | CTE |  |
| NCT01830543 | Janssen |  |  | Cardiovascular | A Study Exploring Two Strategies of Rivaroxaban (JNJ39039039; BAY-59-7939) and One of Oral Vitamin K Antagonist in Patients With Atrial Fibrillation Who Undergo Percutaneous Coronary Intervention (PIONEER AF-PCI) | CTE |  |
| NCT01867762 | Janssen |  |  | Respiratory | An Effectiveness and Safety Study of Inhaled JNJ 49095397 (RV568) in Patients With Moderate to Severe Chronic Obstructive Pulmonary Disease | CTE |  |
| NCT01980290 | Janssen |  |  | Infections | Telaprevir With Peginterferon Alfa & Ribavirin in Ex-People Who INject Drugs Infected by Genotype 1 Chronic Hepatitis C (INTEGRATE) | CTE |  |
| NCT02021370 | Bayer |  |  | Renal | Fixed Dose Correction / naïve and Pre Dialysis (Europe and Asia Pacific) (DIALOGUE 1) | CTE |  |
| NCT02040233 | Bayer |  |  | Cardiovascular | Multiple Dose Study in Heart Failure of BAY 1067197 (PARSiFAL) | CTE |  |
|  | Amgen |  |  | All | Data Element Standards | CTE |  |
|  | Lilly |  |  | All | ODM Library v2011 | CTE |  |
|  | Novartis |  |  | All | Data Element Frequency Count | CTE |  |
| NCT00345839 | Amgen |  | EVOLVE | Cardiovascular (Secondary Hyperparathyroidism; Chronic Kidney Disease) |  | PFS |  |
| NCT00439725 | Bayer |  |  | Cardiovascular |  | PFS |  |
| NCT00490139 | GSK |  |  |  |  | PFS |  |
|  |  |  |  | Neurology |  | PFS |  |
|  |  |  |  | Oncology |  | PFS |  |
| NCT00627640 | Merck |  |  | Neurology (Parkinson) |  | PFS |  |
| NCT00638690 | Janssen |  |  | Prostate cancer |  | PFS |  |
| NCT00715624 | Sanofi |  |  |  |  | PFS |  |
| NCT00796445 | GSK |  |  | Oncology |  | PFS |  |
|  |  |  |  | Cardiovascular (Acute Decompensated Heart Failure; Congestive Heart Failure) |  | PFS |  |
| NCT01018173 | Roche |  |  | Cardiovascular and Metabolic |  | PFS |  |
| NCT01039376 | GSK |  |  |  |  | PFS |  |
| NCT01468987 | Eli Lilly |  |  |  |  | PFS |  |
| NCT01308580 | Sanofi |  |  | Oncology | Cabazitaxel at 20 mg/m² Compared to 25 mg/m² With Prednisone for the Treatment of Metastatic Castration Resistant Prostate Cancer (PROSELICA) | PFS, CTE |  |
|  | Amgen | 20092013 |  |  |  | PRS |  |
|  | AstraZeneca | 2122013 |  |  |  | PRS | AP-HP, UNIVDUN |
|  | Bayer | 27092013 |  |  |  | PRS |  |
|  | Bayer | 27092013 |  |  |  | PRS | KCL |
|  | Bayer | 21112013 |  |  |  | PRS |  |
|  | JandJ | 13082013 |  |  |  | PRS |  |
|  | Novartis | 16122013 |  |  |  | PRS |  |
|  | Novartis | 9092013 |  |  |  | PRS | WWU |
|  | Novartis | 9092013 |  |  |  | PRS |  |
|  | Roche | 9092013 |  |  |  | PRS | FAU, WWU |
|  | Sanofi | 19092013 |  |  |  | PRS |  |
|  | Bayer | 27092013 |  |  |  | PRS |  |
|  | Roche | 9092013 |  |  |  | PRS |  |
|  | Sanofi |  | Proselica |  |  | PRS | FAU |
|  | Bayer |  |  |  |  | PRS | FAU |
|  | Sanofi |  | GetGoal Duo-2 |  |  | PRS | AP-HP, UNIVDUN |
|  | Novartis |  | OCTAVE |  |  | PRS | WWU, HUG |

9.2 Detailed Functional Model for each Interface of the Semantic Interoperability Services (SIS)

9.3 Business Scenarios

9.3.1 Scenario A: cts2:CodeSystem

9.3.1.1 Scenario A: cts2:CodeSystem Model

Figure a.1: UML Model


A data repository does not reference a cts2:CodeSystem (CS) directly. Because a CS changes over time (updates and changes), the data repository uses a cts2:CodeSystemVersion as its reference instead (see next figure).

A cts2:CodeSystemConceptVersion is an entity that is part of a cts2:CodeSystemVersion. Its IRI should carry the version information of the concept. The same concept present in two versions may keep the same official code (e.g. code = A03), but it must have two different IRIs, one per version (umls2013aa:ATC#A03 and umls2014aa:ATC#A03). The basic organization (hierarchy) is illustrated in Figure A.3.
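The versioning rule for concept identifiers can be sketched in Python as follows. This is a minimal illustration, not part of the EHR4CR specification: the class `CodeSystemVersion`, the helper `concept_version_iri` and the way the prefix is split into release and code system are assumptions; only the two compact identifiers come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeSystemVersion:
    """One release of a code system (hypothetical structure for illustration)."""
    release: str      # e.g. "umls2013aa"
    code_system: str  # e.g. "ATC"


def concept_version_iri(version: CodeSystemVersion, code: str) -> str:
    """Build a compact, version-specific identifier for a concept.

    The official code stays the same across releases, while the
    identifier changes with the release it belongs to.
    """
    return f"{version.release}:{version.code_system}#{code}"


atc_2013 = CodeSystemVersion("umls2013aa", "ATC")
atc_2014 = CodeSystemVersion("umls2014aa", "ATC")

iri_2013 = concept_version_iri(atc_2013, "A03")
iri_2014 = concept_version_iri(atc_2014, "A03")
```

Both identifiers share the official code A03 but differ in their release prefix, so a data repository that stores the identifier also records which version of the code system was used.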

9.3.2 Scenario B: Scenarios about cts2:ValueSet

9.3.2.1 Figure b.1: UML Model


9.3.3 Scenario C: Scenarios about hl7:Template


9.3.3.1 TemplateVersion Organization

TemplateVersions are organized (as ordered members) inside Categories. Categories can in turn be organized (as ordered members) inside other Categories. Figure A illustrates how such an organization is used to reach the different TemplateVersions inside an instance.

9.3.3.2 Template Illustration : General View.


9.3.3.3 Template Implementation Examples

9.3.3.3.1 Diagnosis Template Example

9.3.3.3.2 Observation Template Example

9.3.3.3.3 Procedure Template Example.


9.3.4 Detailed Functional Model for each Interface

9.3.4.1 Scenario A.: Search/query scenarios for cts2:CodeSystem.

9.3.4.1.1 Service A.1: Get a cts2:CodeSystemVersion.

Description From a unique id, the service returns the details of a cts2:CodeSystemVersion.

Input IRI or Code or Acronym of a cts2:CodeSystemVersion.

Output 1. Details (prefLabel, OID, IRI) of a cts2:CodeSystemVersion. 2. Link to the first-level cts2:CodeSystemConceptVersion (= skos:topConcept).

Precondition An id (IRI, code, Acronym) is known by the Client.

Postconditions Details of a cts2:CodeSystemVersion & its first level hierarchy.

Exception Conditions id is unknown by the System.

Aspects left to Technical Specification

Relationship to Levels of Conformance

Miscellaneous Notes

Other relevant content

Associated Scenario A, C
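The contract of Service A.1 can be sketched as an in-memory lookup. This is only an illustration under stated assumptions: the class names, field names and sample values (including the placeholder code and OID) are hypothetical, and the real service is exposed through the platform's web services rather than a Python dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class CodeSystemVersionRecord:
    """Details returned by the service (field names are illustrative)."""
    iri: str
    code: str
    acronym: str
    oid: str
    pref_label: str
    top_concept_iris: List[str] = field(default_factory=list)


class UnknownIdentifierError(LookupError):
    """Exception condition: the id is unknown to the system."""


class CodeSystemVersionIndex:
    """Resolves an IRI, code or acronym to a code system version."""

    def __init__(self, records: Iterable[CodeSystemVersionRecord]) -> None:
        self._by_id: Dict[str, CodeSystemVersionRecord] = {}
        for record in records:
            # Any of the three identifiers may be used as input.
            for key in (record.iri, record.code, record.acronym):
                self._by_id[key] = record

    def get(self, identifier: str) -> CodeSystemVersionRecord:
        try:
            return self._by_id[identifier]
        except KeyError:
            raise UnknownIdentifierError(identifier) from None


index = CodeSystemVersionIndex([
    CodeSystemVersionRecord(
        iri="umls2013aa:ATC",
        code="ATC-2013AA",   # placeholder code, not a real identifier
        acronym="ATC",
        oid="0.0.0.0",       # placeholder, not a registered OID
        pref_label="ATC (UMLS 2013AA)",
        top_concept_iris=["umls2013aa:ATC#A"],
    )
])
found = index.get("ATC")
```

The returned record carries the details and the links to the first-level concepts; an unknown identifier raises the error that corresponds to the exception condition of the service.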

9.3.4.1.2 Service A.2: Get a cts2:CodeSystemConceptVersion

cts2:CodeSystemConceptVersions are the organized concepts that compose a cts2:CodeSystemVersion.

Description From a unique id, the service returns the details of a cts2:CodeSystemConceptVersion.

Input 1. IRI or code of a cts2:CodeSystemConceptVersion

Output 1. Details (prefLabel,OID,IRI) of a cts2:CodeSystemConceptVersion

2. Links to potential narrower cts2:CodeSystemConceptVersion. [IF EXISTS]

Precondition 1. An id (IRI, code) is known by the Client.

Postconditions 1. A detailed cts2:CodeSystemConceptVersion.

Exception Conditions 1. id is unknown by the System.



Miscellaneous Notes


Associated Scenario A, C

9.3.4.2 Scenario B: Search/query scenarios for cts2:ValueSet

A cts2:ValueSetVersion is an ordered collection. A value set contains elements, which are named fhir:Concept.

9.3.4.2.1 Service B.1: Get a cts2:ValueSetVersion.

Description From a unique id, the service returns the details of a cts2:ValueSetVersion.

Input 1. IRI or OID of a cts2:ValueSetVersion

Output 1. Details & Content(prefLabel,OID,IRI) of a cts2:ValueSetVersion. [See Notes for details]

Precondition 1. An id (IRI, OID) is known by the Client.

Postconditions 1. A cts2:ValueSetVersion and its members as a Collection ( List)





Miscellaneous Notes The Members of Value Set are typed as fhir:Concept entities.


Associated Scenario B, C

9.3.4.2.2 Service B.2: Get a fhir:Concept

fhir:Concepts are the values contained in a cts2:ValueSetVersion. A fhir:Concept can link to either a cts2:CodeSystemConceptVersion or a cts2:CodeSystemVersion. It can contain a numeric VALUE representing the rank of the element inside the value set, and it can link to a cts2:DataElementVersion.

Description From a unique id, the service returns the details of a fhir:Concept.

Input 1. IRI or OID of a fhir:Concept.

Output 1. Details (prefLabel, OID, IRI) of a fhir:Concept.

2. Link to a CodeSystem element (a cts2:CodeSystemConceptVersion OR a cts2:CodeSystemVersion).

3. Link to a cts2:DataElementVersion. [IF EXISTS]

4. Rank inside the collection. [IF EXISTS]


Postconditions 1. A fhir:Concept and its relations.




Miscellaneous Notes


Associated Scenario B, C

9.3.4.3 Scenario C.: Search/query scenarios for hl7:Template

9.3.4.3.1 Service C.1: Get a Category.

A Category is an entity that organizes and classifies (in order) templates or other levels of Category. A Category is an ordered collection that can contain either an ordered list of sub-level Categories or an ordered list of TemplateVersions.

Description From a unique id, the service returns the details of a Category and its potential ordered members, which are either Categories or TemplateVersions.

Input 1. IRI or OID or ACRONYM of a Category.

Output 1. Details (prefLabel, OID, IRI, Acronym) of a Category.

2. Ordered Members = Categories [IF EXISTS] OR Ordered Members = TemplateVersions [IF EXISTS].

Precondition 1. An id (IRI, OID or ACRONYM) is known by the Client.

Postconditions 1. A Category & its potential ordered members are known by the client.





Miscellaneous Notes With this service, the client can find out whether the Category contains a sub-level of other Categories OR contains TemplateVersions.


Associated Scenario C
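The ordered Category structure returned by Service C.1 can be sketched as a small tree, together with a traversal that reaches every TemplateVersion in order. The class and function names below are illustrative assumptions; the sketch also does not enforce the rule that a Category holds either sub-Categories or TemplateVersions, which a real implementation may want to check.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class TemplateVersion:
    iri: str
    pref_label: str


@dataclass
class Category:
    iri: str
    pref_label: str
    # Ordered members: sub-level Categories or TemplateVersions.
    members: List[Union["Category", TemplateVersion]] = field(default_factory=list)


def iter_template_versions(category: Category) -> Iterator[TemplateVersion]:
    """Walk the Category tree depth-first, preserving member order."""
    for member in category.members:
        if isinstance(member, Category):
            yield from iter_template_versions(member)
        else:
            yield member


root = Category("cat:root", "All templates", [
    Category("cat:diagnosis", "Diagnosis", [
        TemplateVersion("tpl:diagnosis:1", "Diagnosis template v1"),
    ]),
    Category("cat:observation", "Observation", [
        TemplateVersion("tpl:observation:1", "Observation template v1"),
        TemplateVersion("tpl:observation:2", "Observation template v2"),
    ]),
])
reachable = [template.iri for template in iter_template_versions(root)]
```

A client would obtain the same result by calling Service C.1 repeatedly, once per Category, following the ordered members returned by each call.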

9.3.4.3.2 Service C.2: Get a TemplateVersion

A TemplateVersion corresponds to one unique version of an HL7:Template. An HL7:Template corresponds to a pattern of constraints defining a context that should be used to express some information. Typically, a template is used to create form fields (boolean, list, set of data, etc.). A template always contains a link to one DataElementVersion.

Description From a unique id, the service returns the details of a TemplateVersion.

Input 1. IRI or OID of a TemplateVersion

Output 1. Details (prefLabel,OID,IRI) of a TemplateVersion 2. Link to a DataElementVersion*


Postconditions 1. A templateVersion and its relationship to a dataElementVersion




Miscellaneous Notes



9.3.4.3.3 Service C.4: Get a DataElementVersion

Description From a unique id, the service returns the details of a DataElementVersion.

Input 1. IRI or OID of a DataElementVersion

Output 1. Details (prefLabel,OID,IRI) of a DataElementVersion [See Notes for details]

2. link to a ValueSetVersion [IF EXISTS]


Postconditions 1. A DataElementVersion and its relation to a potential ValueSetVersion.




Miscellaneous Notes The details of a Data Element should express the Conceptual Space Property (CODE, VALUE) and the Data Type (CD, CO, BOOLEAN, etc.; ISO data types).




9.3.4.3.4 Service C.5: Get a ValueSetVersion

cf. service B.1

9.3.4.3.5 Service C.6: Get a fhir:Concept

cf. service B.2

9.3.4.3.6 Service C.7: Get a cts2:CodeSystemVersion

cf. service A.1

9.3.4.3.7 Service C.8: Get a cts2:CodeSystemConceptVersion

cf. service A.2

9.4 Semantic services used by SDM/ODM editor

9.4.1 Introduction

The CDISC SDM (Study/Trial Design Model) 1.0 standard is based on the CDISC ODM (Operational Data Model) 1.3.1. Both are XML standards which allow machine-readable, interchangeable descriptions of the study design and the data collection. In the following sections the abbreviation SDM-ODM may occur to indicate this extension mechanism. The ODM 1.3.1 elements can be used within one of the three SDM constructs Structure (Arms, Activities, etc.), Workflow (decision points, branches, etc.) and Timing. Some SDM elements can be used as “standalone” definitions and re-used as ODM annotations. These are for example the Summary Parameters or the Inclusion/Exclusion criteria definitions.

The following chapters describe the use of the CDISC SDM-ODM standard within the EHR4CR scenario 2 and scenario 3 contexts and the integration into third party SDM-ODM editors.

9.4.2 Usage of a SDM-ODM container

The SDM-ODM standard holds the information required to electronically describe a study protocol. The standard allows partial definitions, which lets the EHR4CR project include only the elements needed to fulfill the scope of scenario 2 and scenario 3. Nevertheless, the Study or Data Manager should consider having a mechanism to complete the SDM-ODM representation created by third party SDM-ODM tools. The idea of a minimal SDM-ODM container is based on the CDISC example 2.3-ODMShell.xml, which can be downloaded from the CDISC website. The container structure can be represented with ODM element definitions only.

http://www.cdisc.org/study-trial-design


<ODM>
  <Study OID="SAMPLE_STUDY">
    <GlobalVariables>
      <StudyName>CDISC Study Design Prototype</StudyName>
      <StudyDescription>A sample study</StudyDescription>
      <ProtocolName>SDM (Prototype)</ProtocolName>
    </GlobalVariables>
    <MetaDataVersion>
      <Protocol>
      </Protocol>
    </MetaDataVersion>
  </Study>
</ODM>

Table 3 SDM-ODM container

The SDM element definitions are placed in <Protocol> tag and refer to the ODM definition.
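A third party tool could generate the minimal container of Table 3 programmatically. The following sketch uses Python's standard `xml.etree.ElementTree` module; the function name `build_sdm_odm_container` and its parameters are assumptions made for illustration, and XML namespaces are left out, as in Table 3.

```python
import xml.etree.ElementTree as ET


def build_sdm_odm_container(study_oid: str, study_name: str,
                            study_description: str,
                            protocol_name: str) -> ET.Element:
    """Create the minimal ODM structure that will host the SDM elements."""
    odm = ET.Element("ODM")
    study = ET.SubElement(odm, "Study", OID=study_oid)

    global_variables = ET.SubElement(study, "GlobalVariables")
    ET.SubElement(global_variables, "StudyName").text = study_name
    ET.SubElement(global_variables, "StudyDescription").text = study_description
    ET.SubElement(global_variables, "ProtocolName").text = protocol_name

    metadata_version = ET.SubElement(study, "MetaDataVersion")
    # SDM element definitions are added later inside <Protocol>.
    ET.SubElement(metadata_version, "Protocol")
    return odm


container = build_sdm_odm_container(
    "SAMPLE_STUDY",
    "CDISC Study Design Prototype",
    "A sample study",
    "SDM (Prototype)",
)
xml_text = ET.tostring(container, encoding="unicode")
```

The empty `Protocol` element is the insertion point for the SDM definitions; a production tool would also set the namespaces and root attributes shown in Table 6.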

9.4.3 SDM elements for patient recruitment

To reflect patient recruitment in scenario 2, the eligibility criteria are mandatory elements within the EHR4CR scope. In ODM, these criteria are defined as free-text conditions which return true or false. Each free-text criterion should have a unique OID within the SDM-ODM definition file. In combination with the ODM Study OID, a globally unique OID can be created. The use of multiple languages is possible and should be addressed.

<MetaDataVersion>
  …
  <ConditionDef Name="Informed consent obtained" OID="co_ic">
    <Description>
      <TranslatedText xml:lang="en">Written informed consent obtained.</TranslatedText>
    </Description>
  </ConditionDef>
</MetaDataVersion>

Table 4 Free text eligibility criteria

These ODM conditions can now be referenced by Inclusion/Exclusion SDM elements.


<sdm:InclusionExclusionCriteria>
  <Description>
    <TranslatedText xml:lang="en">Eligibility criteria</TranslatedText>
  </Description>
  <sdm:InclusionCriteria>
    <sdm:Criterion OID="crit_ic" Name="Written informed ..." ConditionOID="co_ic" />
  </sdm:InclusionCriteria>
  <sdm:ExclusionCriteria>
    <sdm:Criterion OID="crit_age" Name="Patient &lt;18" ConditionOID="co_age" />
  </sdm:ExclusionCriteria>
</sdm:InclusionExclusionCriteria>

Table 5 SDM Inclusion/Exclusion criteria representation
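Because each sdm:Criterion refers to an ODM ConditionDef through its ConditionOID, an editor may want to verify that every reference resolves before the file is uploaded. The sketch below shows one possible check using Python's standard library. The function name `unresolved_condition_oids` and the sample document are assumptions; the sample is not validated against the CDISC schemas and deliberately omits the ConditionDef for `co_age` to show a broken reference.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as used on the ODM root element of an SDM-ODM file.
ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
SDM_NS = "http://www.cdisc.org/ns/studydesign/v1.0"

SAMPLE = f"""<ODM xmlns="{ODM_NS}" xmlns:sdm="{SDM_NS}">
  <Study OID="SAMPLE_STUDY">
    <MetaDataVersion OID="mdv_1" Name="v1">
      <Protocol>
        <sdm:InclusionExclusionCriteria>
          <sdm:InclusionCriteria>
            <sdm:Criterion OID="crit_ic" Name="Written informed consent"
                           ConditionOID="co_ic"/>
          </sdm:InclusionCriteria>
          <sdm:ExclusionCriteria>
            <sdm:Criterion OID="crit_age" Name="Patient &lt;18"
                           ConditionOID="co_age"/>
          </sdm:ExclusionCriteria>
        </sdm:InclusionExclusionCriteria>
      </Protocol>
      <ConditionDef OID="co_ic" Name="Informed consent obtained">
        <Description>
          <TranslatedText xml:lang="en">Written informed consent obtained.</TranslatedText>
        </Description>
      </ConditionDef>
    </MetaDataVersion>
  </Study>
</ODM>"""


def unresolved_condition_oids(root: ET.Element) -> list:
    """Return the ConditionOIDs of criteria that have no matching ConditionDef."""
    defined = {cond.get("OID") for cond in root.iter(f"{{{ODM_NS}}}ConditionDef")}
    return [
        criterion.get("ConditionOID")
        for criterion in root.iter(f"{{{SDM_NS}}}Criterion")
        if criterion.get("ConditionOID") not in defined
    ]


missing = unresolved_condition_oids(ET.fromstring(SAMPLE))
```

Here `missing` lists `co_age`, the one condition that is referenced but not defined in the sample.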

The SDM elements like Structure, Workflow and Timing are of limited use in the EHR4CR scope. As XML based standards are also human-readable, the additional elements can be added using a normal text or advanced XML editor but more specialized SDM-ODM editors are advised.

9.4.4 SDM-ODM extension for third party SDM-ODM designer

A third party SDM-ODM editor should be able to create an SDM-ODM container and to load a valid SDM-ODM file in order to complete values such as the eligibility criteria before this file is uploaded into the EHR4CR platform. Therefore the tool needs a menu bar with entries to create, open and save an SDM-ODM file. A file dialog box should make it possible to browse the file system. The first time the file is created by the local workbench, the CreationDateTime attribute of the ODM tag should be completed in the ISO 8601 format, e.g. 2013-07-20T11:07:23+01:00. The FileType and Granularity as well as the XML namespaces are fixed attributes in this case.

<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:sdm="http://www.cdisc.org/ns/studydesign/v1.0"
     FileType="Transactional"
     Granularity="All"
     CreationDateTime="2013-07-20T11:07:23+01:00">
</ODM>

Table 6 First SDM-ODM output
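The CreationDateTime value can be produced with Python's standard `datetime` module, which emits ISO 8601 strings with a UTC offset for timezone-aware values. The helper name `new_odm_root` is an assumption for illustration; the fixed attribute values are those of Table 6.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"


def new_odm_root(created: datetime) -> ET.Element:
    """Create the ODM root element with its fixed attributes and timestamp."""
    if created.tzinfo is None:
        # Without an offset the timestamp would be ambiguous across sites.
        raise ValueError("CreationDateTime needs a timezone-aware datetime")
    return ET.Element(
        f"{{{ODM_NS}}}ODM",
        {
            "FileType": "Transactional",
            "Granularity": "All",
            "CreationDateTime": created.isoformat(timespec="seconds"),
        },
    )


# The timestamp used in Table 6, expressed with a +01:00 offset.
created = datetime(2013, 7, 20, 11, 7, 23, tzinfo=timezone(timedelta(hours=1)))
root = new_odm_root(created)
```

Rejecting naive datetimes is a design choice of this sketch: it keeps files created at hospitals in different time zones comparable.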

9.4.5 Global Definitions = protocol

The local workbench should be able to add and edit an SDM-ODM container (Table 3) with the following elements and attributes:

- File (OID, Description)
- Study (OID, Name, Description) and Protocol Name
- MetaDataVersion (OID, Name, Description)
- Supported Languages


Figure 1 SDM-ODM Global Definitions

9.4.5.1 Eligibility criteria and conditions

The CDISC SDM standard uses ODM ConditionDefs to describe the eligibility of a patient. These ODM ConditionDefs can also be referenced from ODM 1.3.1 ItemRefs as CollectionExceptionConditionOIDs. Therefore the conditions should be shown as separate elements in a list of their own. Via a context menu, the elements can be added, changed and deleted. The dialog box shown in Figure 2 helps the user to enter the free-text eligibility criteria for the languages defined in Figure 1.

Figure 2 Handling of ODM 1.3.1 ConditionDefs

The last step needed to fulfil the SDM-ODM requirements for the eligibility criteria is to relate the Inclusion/Exclusion criteria to the ConditionDefs. A table representation of the criteria can be filled with the condition OIDs via drag and drop.

Attributes such as the Criterion OID or Name can be entered in two separate groups, one for the Inclusion criteria and one for the Exclusion criteria. In the background, the XML structure shown in Table 5 (SDM Inclusion/Exclusion criteria representation) should be created. The created structure must be saved via File / Save File in the menu bar.

Figure 3 Eligibility criteria in SDM-ODM
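As an illustration of the structure that the editor builds in the background, the following Python sketch links criteria to ConditionDef OIDs. Table 5 is not reproduced in this section, so the element and attribute names used here (InclusionExclusionCriteria, InclusionCriteria, ExclusionCriteria, Criterion, ConditionOID) are assumptions that should be checked against that table; the function name build_criteria and the OID cr_age are hypothetical as well.

```python
import xml.etree.ElementTree as ET

SDM_NS = "http://www.cdisc.org/ns/studydesign/v1.0"


def build_criteria(inclusion, exclusion, known_condition_oids):
    """Return an InclusionExclusionCriteria element (assumed SDM names).

    inclusion / exclusion: lists of (criterion_oid, name, condition_oid).
    known_condition_oids: OIDs of the ConditionDefs defined in the file.
    """
    root = ET.Element(f"{{{SDM_NS}}}InclusionExclusionCriteria")
    groups = (("InclusionCriteria", inclusion), ("ExclusionCriteria", exclusion))
    for group_tag, rows in groups:
        group = ET.SubElement(root, f"{{{SDM_NS}}}{group_tag}")
        for oid, name, condition_oid in rows:
            # Refuse dangling references, mirroring a drag-and-drop source
            # that only offers existing ConditionDefs.
            if condition_oid not in known_condition_oids:
                raise KeyError(f"unknown ConditionDef OID: {condition_oid}")
            ET.SubElement(
                group,
                f"{{{SDM_NS}}}Criterion",
                {"OID": oid, "Name": name, "ConditionOID": condition_oid},
            )
    return root


criteria = build_criteria([("cr_age", "Age", "co_age")], [], {"co_age"})
```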

9.4.5.2 Queries

9.4.5.2.1 Query element exists

If a query element exists in the Query Builder, the final ECLECTIC query statement can be saved within the ODM metadata file using the FormalExpression XML tag.

9.4.5.2.2 Missing data elements

In order to create new elements, the study designer has to switch to the Data Element editor to update the EHR4CR central terminology.

The FormalExpression element should carry the Context attribute EHR4CR and contain the query as text inside the node.

<ConditionDef Name="Age" OID="co_age">
  <Description>
    <TranslatedText xml:lang="en">Age</TranslatedText>
  </Description>
  <FormalExpression Context="EHR4CR">QUERY</FormalExpression>
</ConditionDef>

Table 7 Add ECLECTIC to ODM conditions
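A minimal Python sketch of how an editor might store the query in a ConditionDef is given below. It is not the EHR4CR implementation: the helper name set_ehr4cr_query is hypothetical, XML namespaces are omitted for brevity, and the string QUERY is the placeholder from Table 7 rather than a real ECLECTIC statement.

```python
import xml.etree.ElementTree as ET


def set_ehr4cr_query(condition_def: ET.Element, query: str) -> ET.Element:
    """Store the query text in a FormalExpression with Context="EHR4CR".

    An existing EHR4CR expression is overwritten so that the ConditionDef
    never carries two expressions for the same context.
    """
    for expression in condition_def.findall("FormalExpression"):
        if expression.get("Context") == "EHR4CR":
            expression.text = query
            return expression
    expression = ET.SubElement(
        condition_def, "FormalExpression", {"Context": "EHR4CR"}
    )
    expression.text = query
    return expression


condition = ET.fromstring(
    '<ConditionDef Name="Age" OID="co_age">'
    '<Description><TranslatedText xml:lang="en">Age</TranslatedText></Description>'
    "</ConditionDef>"
)
set_ehr4cr_query(condition, "QUERY")
```

Calling the helper a second time replaces the stored query instead of appending another FormalExpression.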

9.4.5.3 EHR4CR Data Element annotation in ODM

In order to prepopulate the ODM ClinicalData, the ODM MetaData file should be enriched with the EHR4CR Data Element codes taken from the termApp. The mechanism used for this annotation is the CDISC ODM Alias XML tag.

The third-party SDM-ODM editor should be able to add Alias XML elements to an ODM ItemDef (ODM 1.2) and to CodeListItems (ODM 1.3.1). The Alias tag carries a context attribute and a value attribute. The editor should allow a globally defined context “EHR4CR” for the working study definition. The context has to be added manually.

Figure 4 Global definition of the EHR4CR context

Adding this special context establishes a link to the EHR4CR Meta Data repository. The editor should indicate whether the connection was successful, for example by setting the text color of the global Alias definition to green. Once the connection is established, the value attribute of an ItemGroupDef can be set to a classification, selected through a full-text search over all classifications available in the EHR4CR repository. The text field for the EHR4CR context updates whenever its content changes and shows the matching labels in a drop-down list.

Figure 5 Example of full text search

This label is set on ItemGroup level as the value attribute of the Alias for the EHR4CR context.

<ItemGroupDef OID="g_vs" Name="Vital Signs">
  <Alias Context="EHR4CR" Value="12-Vital Signs"/>
</ItemGroupDef>

The annotation of ItemDefs and codelists is described in the following sections.
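The search-and-annotate interaction can be sketched as follows. This is a simplified illustration under stated assumptions: the list CLASSIFICATIONS is a hypothetical in-memory stand-in for the EHR4CR repository (only "12-Vital Signs" appears in this document), the function names are hypothetical, and the Alias attribute names follow the snippet above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the classifications held in the EHR4CR repository.
CLASSIFICATIONS = ["12-Vital Signs", "07-Laboratory Findings", "03-Demographics"]


def search_labels(text, labels):
    """Case-insensitive match: every typed word must occur in the label.

    An empty search text matches every label.
    """
    words = text.lower().split()
    return [label for label in labels if all(w in label.lower() for w in words)]


def set_alias(element, value, context="EHR4CR"):
    """Create or update the Alias of the given context on an ODM element."""
    for alias in element.findall("Alias"):
        if alias.get("Context") == context:
            alias.set("Value", value)
            return alias
    return ET.SubElement(element, "Alias", {"Context": context, "Value": value})


group = ET.Element("ItemGroupDef", {"OID": "g_vs", "Name": "Vital Signs"})
matches = search_labels("vital", CLASSIFICATIONS)
set_alias(group, matches[0])
```

Re-running the search on every change of the text field, as the editor is expected to do, amounts to calling search_labels with the current field content.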

9.4.5.3.1 Annotation of quantitative elements

The value attribute of an ItemDef is also completed through a full-text search of the EHR4CR Data Element labels. In this case the value is the code and not the label. The EHR4CR service has to provide either the code, if applicable, or the label. The classification is used as a filter argument if it is set on the corresponding ItemGroupDef. This means that the service should return only the labels of the values within that classification when the search is used for ItemDefs or codelists.

<ItemDef OID="i_diabp" Name="Diastolic" DataType="…" Length="5">
  <Alias Context="EHR4CR" Value="<Code of Diastolic blood pressure>"/>
</ItemDef>

Table 8 EHR Data Element annotation in ODM
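The filtering behaviour expected from the service can be illustrated with a small Python sketch. The catalogue below is a hypothetical in-memory stand-in for the EHR4CR termApp: the codes DE-0001 to DE-0003, the second and third labels, the second classification and the function name search_data_elements are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataElement:
    code: str
    label: str
    classification: str


# Hypothetical catalogue entries; real codes come from the EHR4CR termApp.
CATALOGUE = [
    DataElement("DE-0001", "Diastolic blood pressure", "12-Vital Signs"),
    DataElement("DE-0002", "Systolic blood pressure", "12-Vital Signs"),
    DataElement("DE-0003", "Blood glucose", "07-Laboratory Findings"),
]


def search_data_elements(
    text: str,
    classification: Optional[str] = None,
    catalogue: Optional[List[DataElement]] = None,
) -> List[DataElement]:
    """Return data elements whose label contains the search text.

    If the parent ItemGroupDef carries a classification, it is passed in
    and restricts the result to elements of that classification.
    """
    entries = CATALOGUE if catalogue is None else catalogue
    needle = text.lower()
    return [
        entry
        for entry in entries
        if needle in entry.label.lower()
        and (classification is None or entry.classification == classification)
    ]


hits = search_data_elements("blood", classification="12-Vital Signs")
alias_value = hits[0].code  # the Alias value is the code, not the label
```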

A link to the corresponding measurement units is not possible with ODM 1.3.1. This is a limitation in the EHR4CR study setup context when multiple measurement units are referenced by one ItemDef.

Figure 6 ODM annotation of CD datatype

9.4.5.3.2 Annotation of qualitative elements

In the ODM standard, the Alias tag can be attached to CodeListItems as well as to ItemDef elements. This can be used for the EHR4CR data element type CO, as shown in Figure 7. It is advised to also annotate the ItemDef with the code of the parent element.

Figure 7 ODM annotation of CO data type

9.4.5.4 Prepopulation of eligibility criteria

Because CRF pre-population in EHR4CR scenario 3 uses the CDISC ODM standard, re-using the ODM ConditionDefs and relating the SDM InclusionExclusionCriteria to ODM ItemDefs seems an obvious step. Unfortunately, the standard does not provide a reference mechanism that could be used in scenario 3 to pre-populate the eligibility criteria as ODM elements. The existing link between ODM ItemDefs and ConditionDefs has a different meaning: when the condition referenced by the ItemRef attribute CollectionExceptionConditionOID evaluates to true, the item is NOT collected in the EDC system. In the SDM context, a true condition means that the Inclusion/Exclusion criteria are met and the patient is eligible. After the ODM metadata has been uploaded, a CDISC-compliant EDC system displays the ItemDefs as questionnaire items on the screen and waits for user interaction. After data collection, the values are stored in ItemData tags. If the InclusionExclusion criteria were linked to the ItemDefs, the EHR4CR platform could automatically create the ItemData ODM elements from the metadata already processed in scenario 2, as there is a 1:1 relation between ODM ItemDefs and SDM InclusionExclusion criteria. As long as the CDISC SDM-ODM standard does not provide such a link, mapping the ODM ItemData to the EHR record entries remains a manual process for the eligibility criteria. The third-party SDM-ODM editor must be able to load the SDM-ODM container in order to complete the ODM definitions.
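To make concrete what such a link would enable, the following Python sketch creates ItemData elements from already evaluated criteria. It is purely hypothetical: the mapping from criterion OIDs to ItemDef OIDs is exactly the reference that the standard does not provide, and the OIDs, the function name build_item_data and the encoding of results as "true"/"false" are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_item_data(
    criterion_to_item: Dict[str, str],
    criterion_results: Dict[str, bool],
) -> List[ET.Element]:
    """Create ItemData elements for criteria that have a linked ItemDef.

    criterion_to_item: hypothetical link from criterion OID to ItemDef OID.
    criterion_results: criterion OID mapped to whether the criterion was met.
    Criteria without a link are skipped and stay a manual mapping task.
    """
    items = []
    for criterion_oid, met in criterion_results.items():
        item_oid = criterion_to_item.get(criterion_oid)
        if item_oid is None:
            continue
        items.append(
            ET.Element(
                "ItemData",
                {"ItemOID": item_oid, "Value": "true" if met else "false"},
            )
        )
    return items


# Hypothetical OIDs: one criterion is linked, the other is not.
links = {"cr_age": "i_age_ok"}
results = {"cr_age": True, "cr_consent": False}
item_data = build_item_data(links, results)
```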