TRANSFoRm FP7-247787 D8.1 Report on Demonstration 1
Translational Research and Patient Safety in Europe
D8.1
Final Report on Demonstration 1
Work Package Number: 8
Work Package Title: Demonstration and Exploitation
Nature of Deliverable: Report
Dissemination Level: Confidential
Version: 1
Delivery Date From Annex 1: M51
Principal Authors: Ken Blogg, Peter Wagner
Contributing Authors: Karin Hek, Robert Verheij, Hilde Bastiaens, Mark
McGilchrist, Christopher Golby, Vasa Curcin
Partner Institutions: Quintiles, NIVEL, CPRD
This project has received funding from the European Union’s
Seventh Framework Programme for research, technological
development and demonstration under grant agreement no 247787
[TRANSFoRm].
Relevant background information:
The EU has a longstanding strategy to promote greater safety and productivity for EU
healthcare via advanced Information Communication Technologies (ICT). The
European Commission is partly funding the TRANSFoRm project as part of its efforts
to advance information and computer science to address these market challenges in
a European context. TRANSFoRm will develop a digital infrastructure that facilitates
the reuse of primary care real world electronic health records (eHR) data to improve
both patient safety and the conduct and volume of Clinical Research in Europe.
Once complete, TRANSFoRm will act as a common software platform that will allow
investigators to recruit patients and collect data in primary care for prospective studies.
In addition, retrospective study analyses can be executed, in which data can be
sourced and combined with data extracted from existing databases.
The TRANSFoRm project has received funding from the European Union’s Seventh
Framework Programme for research, technological development and demonstration [1].
Figure 1 below displays the TRANSFoRm platform. The cyclical layout of the concept
reflects the learning cycle that feeds into the technology platform at its centre.
Figure 1. The TRANSFoRm concept [1]
[1] http://www.transformproject.eu/
The TRANSFoRm platform involves the following components:
Query workbench – this is a generic tool for building a study description –
specifically in this situation for generating a query as a set of criteria with which
to identify potentially eligible patients from a generic data source.
Semantic mediator – an ontological code set mapping / building tool, which
allows for the generation of code sets (in the case of CPRD using Read codes
as the resulting nomenclature) to enable the identification of events within the
data source under scrutiny.
Data Node Connector – a utility to enable application of a generic query to a
particular data source.
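The interplay of the three components can be sketched in a few lines of Python. This is a minimal illustration only, not the TRANSFoRm implementation: the class names, the `count_eligible` function, the concept names and the placeholder codes (which are not real Read codes) are all assumptions introduced for the example. A generic query is a set of inclusion and exclusion criteria, a code-set mapping stands in for the output of the semantic mediator, and a connector-style function applies the query to one data source and returns only a patient count.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Criterion:
    concept: str   # generic clinical concept (hypothetical name)
    include: bool  # True = inclusion criterion, False = exclusion criterion


@dataclass
class Query:
    criteria: List[Criterion] = field(default_factory=list)


# Hypothetical code sets of the kind a semantic mediator might produce.
# The codes are placeholders, not real Read codes.
CODE_SETS: Dict[str, Set[str]] = {
    "diabetes": {"CODE_A1", "CODE_A2"},
    "pregnancy": {"CODE_B1"},
}


def count_eligible(query: Query,
                   patients: Dict[str, Set[str]],
                   code_sets: Dict[str, Set[str]]) -> int:
    """Apply a generic query to one data source and return a patient count.

    `patients` maps a patient identifier to the set of local codes recorded
    for that patient in the source under scrutiny.
    """
    eligible = 0
    for codes in patients.values():
        matches = True
        for criterion in query.criteria:
            present = bool(codes & code_sets[criterion.concept])
            # Inclusion concepts must be present, exclusion concepts absent.
            if present != criterion.include:
                matches = False
                break
        if matches:
            eligible += 1
    return eligible


query = Query([Criterion("diabetes", include=True),
               Criterion("pregnancy", include=False)])
source = {
    "p1": {"CODE_A1"},
    "p2": {"CODE_A2", "CODE_B1"},
    "p3": {"CODE_X9"},
}
eligible_count = count_eligible(query, source, CODE_SETS)
print(eligible_count)
```

Returning a count rather than patient records mirrors the way this deliverable analyses patient counts only.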
Table of Contents
List of Figures
List of Tables
Executive Summary
1. Introduction
1.1 Research background
1.2 Research motivation
1.2.1 Potential opportunity 1: Increasing the patient pool
1.2.2 Potential opportunity 2: Reducing the administrative burden
1.2.3 Potential opportunity 3: Forecasting the patient pull-through
1.3 Research goals
1.4 Research contributions
1.4.1 Leveraging the power of greater patient recruitment numbers
1.4.2 Reducing the administrative burden on GP investigative sites
1.4.3 Optimized patient recruitment forecasts greater patient pull-through
2. Methodology
2.1 Data
2.2 Identification and selection of the clinical trial to be used as the baseline benchmark
2.3 Formulation of demonstration 1 evaluation protocol
2.4 TRANSFoRm query input and execution
2.5 Analysis of patient data extracted from national registries
2.5.1 Direct patient data extraction from primary care registries
2.5.2 Patient data extracted from the national registries using the TRANSFoRm Query Workbench
2.5.3 Comparison of direct data extraction vs TRANSFoRm Query Workbench extraction from national registries
2.6 Usability assessment of TRANSFoRm query workbench
2.7 Limitations of the methodology
3. Results
3.1 Benchmark Clinical Trial
3.2 Demonstration 1 evaluation protocol
3.2.1 Formulating the draft evaluation protocol
3.2.2 Scope
3.3 TRANSFoRm query formulation
3.3.1 Final eligibility criteria query (post- November 2014)
3.3.2 Previous eligibility query formulations (pre- September 2014)
3.4 NIVEL extraction
3.4.1 Scenario 1: NIVEL manual extraction results including the geographic exclusion filter
3.4.2 Scenario 2: NIVEL manual extraction results excluding the geographic inclusion filter
3.5 TRANSFoRm - NIVEL extraction
3.5.1 Final results from the TRANSFoRm NIVEL extraction
3.6 Direct CPRD extraction
3.6.1 Final results from direct CPRD patient data extraction
3.7 Demographic analysis
4. Discussion
4.1 Manual-NIVEL extraction
4.2 Manual-CPRD extraction
4.3 Manual-NIVEL vs Manual-CPRD extraction
4.4 TRANSFoRm-NIVEL extraction
4.5 Technical hurdles
4.5.1 Data quality issues
4.5.2 Usability issues
4.6 TRANSFoRm-CPRD extraction
5. Conclusion
References
Abbreviations
List of Figures
Figure 1. The TRANSFoRm concept.
Figure 2. Schematic to show the proposed impact of TRANSFoRm on the clinical trial process.
Figure 3. Theoretical sequence of events for current patient recruitment at a potential GP investigative site upon receiving a request from a trial sponsor.
Figure 4. Conceptual infographic of a patient forecast model that could be used to leverage the value of TRANSFoRm.
Figure 5. Operational metrics from the CT used as a baseline for demonstration 1.
Figure 6. Conceptual diagram displaying the estimated potential pool of additional patients relevant for demonstration 1 trial in each country.
Figure 7. Conceptual comparison of the size of the additional patient pool with the size of the original trial dataset for each country.
Figure 8. Screen capture of the TRANSFoRm Query Workbench entry.
Figure 9. A bar chart showing the stepwise decrease in patients from a total of 123,746 as each inclusion exclusion filter is applied. The total number of patients remaining based upon a geographic analysis is 3,224.
Figure 10. A bar chart showing the stepwise decrease in patients from a total of 123,746 as each inclusion exclusion filter is applied. The total number of patients remaining based upon the exclusion of the geographic filter was 12,606.
Figure 11. Screen capture of the final TRANSFoRm Query Workbench entry.
Figure 12. Infographic showing the individual counts for each segment and the combined eligibility count for the inclusion criteria.
Figure 13. A bar chart showing the stepwise decrease in patients from a total of 296,875 as each inclusion/exclusion criteria filter is applied.
List of Tables
Table 1. A cost comparison between the two models showing the expected value of the TRANSFoRm platform. Larger and smaller are arbitrary references for the assigned cost and not between costs.
Table 2. Breakdown of the sites by country, patient recruitment periods and rate of recruitment extracted from the original clinical trial (NCT00660907 - clinicaltrials.gov ID).
Table 3. Displays the number of patients assigned to each segment as the inclusion exclusion filter was applied based upon an initial geographic constraint.
Table 4. Displays the number of patients assigned to each segment as the inclusion exclusion filter was applied. The geographic filter was excluded.
Table 5. Table showing a comparison of the total numbers of patients assigned to category depending upon the application of the geographic filter.
Table 6. Displays the number of patients assigned to each segment as the inclusion exclusion filter was applied. The geographic filter was excluded for this analysis.
Table 7. Demographic analysis of the patient data comparing initial patient populations against qualifying patient populations.
Executive Summary
Research motivation: This work explores the potential opportunities in
commercializing TRANSFoRm to pharmaceutical companies. Any value
proposition of TRANSFoRm is founded upon the validation of the platform itself. The
value proposition would be based upon a more economical approach to patient
recruitment for clinical trials by executing queries in national patient registries. Patient
recruitment is generally very expensive, and primary care is a vastly underused source
of patients for clinical trials, with the additional advantage of being population
based rather than hospital based.
Objectives:
This deliverable aims to demonstrate that TRANSFoRm’s technical tools can provide
value-added benefit to pharmaceutical companies by achieving the objectives below.
The steps involved in getting permission to actually query the database and to invite
patients to participate in a study are not part of this deliverable.
1. Identify and select a suitable clinical trial that could be used as a benchmark.
2. Formulate evaluation protocol based upon the inclusion criteria and exclusion
criteria of the identified clinical trial that will be used for submissions to the query
workbench (QWB) and electronic health record (eHR) databases in the UK and
The Netherlands (NL). (The QWB is a generic tool for building a study
description – specifically for generating a query as a set of criteria with which to
identify potentially eligible patients from a generic data source).
3. Input protocol into TRANSFoRm QWB and execute extraction.
4. Compare and contrast the results of the TRANSFoRm QWB with that of the
patient registry data extracted in isolation from the national data providers.
5. Assess the usability of the TRANSFoRm platform.
Results and Conclusions:
Data from electronic health record (eHR) databases (i.e. NIVEL and CPRD)
have the potential to help identify patients for trial recruitment more effectively
and efficiently than common practice today, which is reactive rather than
proactive.
In the manual extraction and analysis of the anonymised NIVEL eHR database,
12,606 patients (excluding geographic filtering) out of 123,746 were identified
as qualifying for the clinical trial based upon the original trial eligibility
criteria.
The manual extraction of the CPRD data package identified 70,669 (excluding
geographic filtering) qualifying patients from an initial 296,875 patient pool.
The NIVEL data indicated that 10% of patients could be eligible, whereas in the UK
35% of patients were eligible. The data were triple-checked in an effort to find
the cause of the discrepancy. Further research is required to identify the
cause; however, it is likely due to differences between the
UK and Netherlands health care systems affecting the completeness of certain
data elements.
The manual data received from NIVEL was much more targeted, whereas the
analysis of the CPRD data was much more resource intensive. For future
efforts, it is critical to understand and cope with the requirements of
databases (such as ethical approval, steering committee approval, manual
checks, etc.) in order to streamline the process and reduce the data burden of
analysis.
The data node connector (DNC) between TRANSFoRm and NIVEL has been
successfully implemented, making it theoretically possible to acquire data.
(The DNC is a utility to enable application of a generic query to a particular
data source.)
The extraction of data from NIVEL Primary Care database via the TRANSFoRm
QWB confirmed functionality of the DNC between TRANSFoRm and NIVEL as
well as providing a usability assessment of the QWB itself.
The extraction of NIVEL eHR data via the QWB provided several findings.
45,264 patients met the inclusion criteria.
There were significant discrepancies between data sets resulting from NIVEL
manual extraction vs QWB extraction.
It was observed that not all of the internal mechanisms of the QWB were visible
to the user. An example is the absence of a detailed description of how the data
was sourced, or information relating to quality control /version control in
reference to the data pool from which the query is extracted. Therefore the
internal mechanisms by which the QWB executes the query could potentially
lead to confusion for the user and invalid research results.
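As a quick arithmetic check on the manual-extraction counts quoted above, the sketch below computes the qualifying share directly from the patient counts (excluding geographic filtering). It uses only the four counts given in this summary; the function name is an assumption made for the example, and the percentages stated in the text may rest on different denominators, so the output is illustrative rather than a restatement of the reported values.

```python
def qualifying_share(qualifying: int, initial: int) -> float:
    """Return the qualifying patients as a percentage of the initial pool."""
    if initial <= 0:
        raise ValueError("initial pool must be positive")
    return 100.0 * qualifying / initial


# Counts quoted in this summary (geographic filter excluded).
nivel_share = qualifying_share(12_606, 123_746)
cprd_share = qualifying_share(70_669, 296_875)

print(f"NIVEL manual extraction: {nivel_share:.1f}% of the initial pool")
print(f"CPRD manual extraction:  {cprd_share:.1f}% of the initial pool")
```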
For this deliverable, individual patient records were not sent outside the
database environments. Only patient counts were analysed. There are two
critical success factors that need to be addressed in order to position TRANSFoRm
closer to the pharmaceutical industry as a valuable entity:
1. Greater user transparency, which will inspire trust in the data and which
could be accomplished by creating a procedure for initiating a support service.
2. Optimization of the requested dataset (qualifying patient records
identified by the inclusion and exclusion criteria), balancing the
need for enough data to clearly identify trends against a request so
exhaustive that the platform cannot handle the volume of
information to be extracted.
This leads to a larger discussion around automated data extraction from eHR
databases in general: querying data without contextual information is a significant
challenge.
1. Introduction
1.1 Research background
This section outlines the background of the TRANSFoRm platform in Work Package
8. The basis for this research is exploring the potential value added opportunities of
the TRANSFoRm technology to pharmaceutical companies.
The strategy for conducting this research was therefore to demonstrate,
retrospectively, the impact of the platform on a clinical trial used as a case
study, in order to highlight the potential commercialization opportunities.
Figure 2 shows the intended impact of TRANSFoRm on the clinical trial process.
Figure 2. Schematic to show the proposed impact of TRANSFoRm on the clinical trial process. D1 refers to demonstration 1 (current work) and D2 refers to demonstration 2 (proposed future work by EU 7FP).
This is a highly simplified figure. Not included in this figure are the steps involved in
acquiring permission to query the database and the practical and legal steps involved
in getting permission to invite patients to enrol in the study. Similarly, manual screening
by GPs to check whether a patient can be invited is also not included.
1.2 Research motivation
The motivation behind conducting this research is based upon exploiting the potential
opportunities in the commercialization of TRANSFoRm to pharmaceutical companies.
Any value-proposition of TRANSFoRm is founded upon the validation of the platform
itself. This would be based on a more economic approach to patient recruitment for
clinical trials (CTs) by executing queries in national primary care registries. The
proposed potential opportunities for TRANSFoRm are discussed below.
[Figure 2 stages, as labelled in the schematic: Identify Potential Opportunities; Identify Number Of Potential Trial Subjects (D1); Patient Enrolment; Study Execution (D2); Data Submission]
1.2.1 Potential opportunity 1: Increasing the patient pool
The immediate potential opportunity for the TRANSFoRm platform is an increase in
the number of eligible patients that meet the inclusion criteria and could therefore be
recruited for a clinical trial, if such a trial were approved by the appropriate bodies
governing the respective databases (compared to the number of patients that would have
qualified via the traditional feasibility search method described in 1.2.2).
The queries used to identify the appropriate patient pool would be executed using the
TRANSFoRm platform prior to the sponsor contacting general practitioner (GP)
investigative sites.
This is based upon the assumption that the TRANSFoRm platform has the capability to
segment patients into the following categories:
1. Patients that are unsuitable for recruitment based upon the clinical trial exclusion
criteria.
2. Patients that qualify for enrolment based upon full data sets (all fields relevant
to the inclusion and exclusion criteria are completed with the correct information
and error free).
3. Patients in a grey area, resolved in greater detail, that have incomplete
data and for whom an additional screening step is required to assign them to
either of the above patient pools.
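The three-way segmentation assumed above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the platform's actual logic: the `Segment` enum, the `segment_patient` function and the field names are hypothetical, each criterion is reduced to a yes/no field, and a missing value is represented as `None`. The sketch also assumes that a known disqualifying value takes precedence over missing data.

```python
from enum import Enum
from typing import Mapping, Optional, Sequence


class Segment(Enum):
    UNSUITABLE = "excluded by the trial criteria"
    QUALIFIES = "qualifies on a complete data set"
    NEEDS_SCREENING = "incomplete data, additional screening required"


def segment_patient(record: Mapping[str, Optional[bool]],
                    inclusion: Sequence[str],
                    exclusion: Sequence[str]) -> Segment:
    """Assign one patient record to a segment.

    `record` maps a criterion field to True, False, or None (missing).
    Fields absent from the record are treated as missing.
    """
    # A known failed inclusion criterion or a known met exclusion criterion
    # rules the patient out, whatever else is missing.
    fails_inclusion = any(record.get(f) is False for f in inclusion)
    meets_exclusion = any(record.get(f) is True for f in exclusion)
    if fails_inclusion or meets_exclusion:
        return Segment.UNSUITABLE

    # Otherwise any missing field places the patient in the grey area.
    all_fields = list(inclusion) + list(exclusion)
    if any(record.get(f) is None for f in all_fields):
        return Segment.NEEDS_SCREENING

    return Segment.QUALIFIES


inclusion_fields = ["adult", "diagnosis_recorded"]   # hypothetical fields
exclusion_fields = ["pregnant"]                      # hypothetical field

complete = {"adult": True, "diagnosis_recorded": True, "pregnant": False}
partial = {"adult": True, "diagnosis_recorded": None, "pregnant": False}
excluded = {"adult": True, "diagnosis_recorded": None, "pregnant": True}

for rec in (complete, partial, excluded):
    print(segment_patient(rec, inclusion_fields, exclusion_fields).name)
```

Checking for known disqualifiers first keeps the grey-area pool as small as possible, which is the point of the additional screening step.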
1.2.2 Potential opportunity 2: Reducing the administrative burden
The current process by which potential GP investigative sites identify patients upon
receiving a request from a sponsor is described in Figure 3 below.
The majority of the administrative burden is created in the execution of the feasibility
searches. This is compounded if the same GP investigative site receives requests for
several such searches per month and then subsequently is not selected to proceed for
the study.
Figure 3. Theoretical sequence of events for current patient recruitment at a potential GP investigative site upon receiving a request from a trial sponsor.
Using the TRANSFoRm platform would mean that GPs no longer have to conduct the
searches. However, the administrative burden is only reduced, not eliminated, as GPs
would still have to cross-check the medical records.
1. Notice of clinical trial given to potential GP site via sponsor
2. Physician and staff execute a feasibility search
3. Cross check inclusion exclusion criteria vs own medical records
4. Sponsor makes decision to use GP as an investigator site / forward patient details to investigator or to withdraw
5. If successful GP to contact patient for consent
1.2.3 Potential opportunity 3: Forecasting the patient pull-through
Figure 4. Conceptual infographic of a patient forecast model that could be used to leverage the value of TRANSFoRm. The two swimlanes show the differences in the data journey which is reflected in the potential patient pool size. This is then indexed against the increase in volume, coupled to a decrease in cost (due to assumed reduction in administrative burden) resulting in a net increase in value.
The conceptual infographic for forecasting patient pull-through is shown in Figure 4.
Each swimlane is described in detail below.
1.2.3.1 Swimlane 1: The current model
Sponsors require the identification of potential patients for a clinical trial. Available data
sources are dependent upon patients that have proactively registered their interest in
studies, national registries or trials. This may have been in relation to a patient’s
membership of specific patient associations for specific indications. However there is
a disconnect between the various data sources as these sources currently exist in
isolation. This disconnect means that there is greater difficulty for the pharmaceutical
industry in terms of both time and effort, in aggregating the various data sources to
ultimately identify the potential target patient population.
Sponsors approach GP investigative sites with feasibility search requests. As
described previously, this is a labour intensive process whose administrative burden
increases with multiple requests. Conceptually, the hurdle at this interface is the
question:
“what is the incentive for the physician and support staff to execute the feasibility
search?”
It follows: “what are the costs associated with this process, in addition to the cost of
incentives for physicians to comply with the sponsor’s requests?”
Having identified the patients that qualify for recruitment via the inclusion exclusion
criteria, three key metrics arise: the total number of patients enrolled onto the CT, the
time period over which recruitment occurs and the number of sites at which the CT is
conducted.
Finally the study execution phase forecasts the total patient pull through by taking into
account the total number of patients that complete the CT, the number of drop outs
and the time taken to complete the trial.
1.2.3.2 Swimlane 2: The TRANSFoRm model
The TRANSFoRm platform moves the heavy data rich analytics earlier on in the
process. TRANSFoRm pulls data from across a linked data eco-system. This allows
the inclusion exclusion filter to be applied prior to contact with potential GP investigative
sites.
Assuming the accuracy of the patient records is of a high standard (data fields filled in
correctly), it is feasible that the pool of patients that qualify for the CT under the inclusion
exclusion criteria is greater than under the current model of physician feasibility searches. It
is possible that a greater number of patients can be screened.
Based upon an increased pool of qualifying patients that could be recruited for the
clinical trial it can be assumed that compared to the current model:
1. A greater number of patients can be enrolled.
2. The recruitment period would be accelerated.
3. Fewer GP investigative sites could provide the same number of patients as
recruited in the case study CT.
Leverage of the TRANSFoRm platform is built upon the concept of an optimized
recruitment and study execution compared to the original trial. Optimization in this
context refers to fewer drop outs during the trial. It is not feasible to assess whether CT
cycle time would be affected during this retrospective case study; therefore this will
be excluded as a metric of analysis in this protocol.
Table 1 below shows a theoretical cost comparison between the swimlane activities
in Figure 4.
| Cost type | Current model | TRANSFoRm | Justification |
| --- | --- | --- | --- |
| Cost of incentives for GPs and Site Investigators (x%) | Larger | Smaller | TRANSFoRm identifies qualifying patients prior to contact with GP investigative sites; therefore costs associated with the administrative burden of feasibility searches are decreased. |
| Cost of patient recruitment (y%) | Larger | Smaller | Reduction in the overall number of trial sites due to GP investigative sites being co-located to postcode regions of largest qualifying patient pools. Costs associated with patient recruitment and site management are decreased. |
| Cost of patient drop outs during CT (z%) | Larger | Smaller | TRANSFoRm identifies optimal qualifying candidates based upon inclusion exclusion criteria. Therefore the probability of unsuitable patients being enrolled into CT and subsequently dropping out is reduced. |
Table 1. A cost comparison between the two models showing the expected value of the TRANSFoRm platform. Larger and smaller are arbitrary references for the assigned cost and not between costs.
1.3 Research goals
The aim of this research is to ‘demonstrate value-add benefit of the TRANSFoRm
platform to pharmaceutical companies.’
The objectives are:
1. Identify and select a suitable clinical trial that could be used as a benchmark.
2. Formulate evaluation protocol based upon the inclusion criteria and exclusion
criteria of the identified clinical trial that will be used for submissions to the query
workbench (QWB) and electronic health record (eHR) databases in the UK and
The Netherlands (NL).
3. Input protocol into TRANSFoRm query workbench and execute extraction.
4. Compare and contrast the results of the TRANSFoRm query workbench with
that of the patient registry data extracted in isolation from the national data
providers.
5. Assess the usability of the TRANSFoRm platform.
1.4 Research contributions
There are three proposed areas in which TRANSFoRm could be leveraged to
demonstrate its added value to pharmaceutical companies.
1.4.1 Leveraging the power of greater patient recruitment numbers
The added value of TRANSFoRm would be as follows:
1. An increase in the pool of qualifying patients that could be recruited for the
clinical trial, based upon the inclusion and exclusion criteria.
2. Validation of the TRANSFoRm platform itself as to whether it will allow a clearer
segmentation of patients into categories (patients that do not comply with
recruitment criteria, patients that require additional screening to qualify further
and high value patients that qualify for trial enrolment) based upon the inclusion
and exclusion criteria.
3. Identification of potential investigative site locations that are optimally
placed relative to where the largest qualifying patient pools are situated.
4. More efficient recruitment such that fewer GP investigative sites are required to
recruit the same number of patients. Further research would be required to
identify whether the same number of sites used in the benchmarking CT would provide
more patients or whether fewer GP investigative sites would provide the same number of
patients as recruited in the CT.
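Points 3 and 4 above can be illustrated with a small sketch that ranks postcode regions by the size of their qualifying patient pool and estimates how many of the largest regions would be needed to reach a recruitment target. Everything here is hypothetical: the region labels, the counts, the function names and in particular the `consent_rate` parameter are assumptions introduced for the example and are not taken from the demonstration data.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_regions(postcode_regions: Iterable[str],
                 top_n: int) -> List[Tuple[str, int]]:
    """Rank postcode regions by number of qualifying patients.

    `postcode_regions` holds one region label per qualifying patient.
    """
    return Counter(postcode_regions).most_common(top_n)


def regions_needed(pools: List[Tuple[str, int]],
                   target: int,
                   consent_rate: float) -> int:
    """Return how many of the ranked regions are needed to expect `target`
    enrolled patients, assuming a flat (hypothetical) consent rate."""
    expected = 0.0
    for used, (_, pool_size) in enumerate(pools, start=1):
        expected += pool_size * consent_rate
        if expected >= target:
            return used
    raise ValueError("qualifying pools are too small for the target")


# Hypothetical qualifying patients, one region label per patient.
qualifying = ["AB"] * 50 + ["CD"] * 30 + ["EF"] * 20

ranked = rank_regions(qualifying, top_n=3)
needed = regions_needed(ranked, target=20, consent_rate=0.25)
print(ranked)
print(needed)
```

Comparing `needed` with the number of sites used in the benchmark trial is one way the question raised in point 4 could be framed; whether a flat consent rate is realistic would itself need further research.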
1.4.2 Reducing the administrative burden on GP investigative sites
Due to patient privacy a trusted third party would most likely act as an intermediary
between the GP and the sponsor [see TRANSFoRm privacy and confidentiality
framework]. The GP would receive the list of suitable candidate patients. The
implications are as follows:
1. The administrative burden of the feasibility study is reduced to
contacting patients for consent for use of their data and/or interviews.
2. In terms of a business model there is an implied cost-saving. The costs that
would be reduced are those associated with time taken for physicians to search
through patient records in order to exclude patients that do not meet the
inclusion exclusion criteria.
1.4.3 Optimized patient recruitment forecasts greater patient pull-through
It is expected that TRANSFoRm will forecast greater patient pull-through based upon
the following:
1. Increase the patient share due to the increased volume of patients that
successfully complete the CT.
2. Decrease the costs in three key interface points as outlined in Table 1.
3. Data obtained from the increased share of patients that successfully complete
the CT, coupled with the decrease in costs, results in an increase in the value
proposition of the TRANSFoRm platform as a viable business model.
2. Methodology
This section contains the respective methodologies used for demonstration 1 of Work
Package 8 as well as a summary of the assumptions and limitations.
2.1 Data
NIVEL Primary Care Database
(http://www.nivel.nl/en/dossier/nivel-primary-care-database) is a longitudinal
primary care database. Anonymized routinely registered
primary care data is collected in this database to monitor health and health care
utilization in a representative sample of the Dutch population. Participating health care
providers are general practitioners, primary care psychologists, physical therapists,
exercise therapists, dietitians and GP out-of-hours services. These data are
combined and supplemented with information about pharmaceutical care and
secondary level care collected by other organizations. Participating health care
providers receive periodic feedback that enables comparison with their peers.
For this study we used data from general practitioners. Currently, more than 500
general practices with approximately 1.6 million listed patients participate in the NIVEL
Primary Care Database. The data comprise information on consultations, morbidity and
prescriptions. Participating general practices are representative of all Dutch general
practices with respect to type of practice, geographical distribution and degree of
urbanization. In addition, patients listed at participating practices are representative of
the Dutch population regarding age and gender.
Dutch law allows the use of anonymised or pseudonymised extracts of electronic
health records for research purposes under certain conditions. According to Dutch
legislation, neither informed consent nor approval by a medical ethics committee is obligatory
for this kind of observational study if certain conditions are fulfilled (Dutch Civil Law,
Article 7:458; http://www.dutchcivillaw.com/civilcodebook077.htm).
2.2 Identification and selection of the clinical trial to be used as
the baseline benchmark
Having established a relationship with Bristol-Myers Squibb (BMS) to work in an
advisory capacity with the TRANSFoRm consortium, the next step was to identify a
clinical trial that could be used as a benchmark for comparative purposes.
The clinical trial was selected on the basis of the following criteria:
1. Locations encompassed the UK and The Netherlands.
2. The trial took place in a primary care setting.
3. Well-defined trial inclusion and exclusion criteria.
The UK and NL constraint arose because these countries have primary care
databases that are accessible for research purposes, so patient registry
data were available for them. This aligned naturally with the scope of
TRANSFoRm, which also specified the primary care setting. The final constraint of
having simple and clear inclusion/exclusion criteria was important because of limitations in
the scope of what could be entered into the TRANSFoRm Query Workbench (QWB); the
authors acknowledge that the QWB can only deal with specific queries, as
it was built upon the use cases.
2.3 Formulation of demonstration 1 evaluation protocol
The methodology for constructing the evaluation protocol was based upon the
inclusion/exclusion criteria from the identified benchmark clinical trial
[NCT00660907-clinicaltrials.gov].
Once a draft protocol had been formulated, it was circulated amongst the Work Package
8 team as well as BMS. After a redraft, the protocol was submitted to NIVEL,
TRANSFoRm and CPRD.
2.4 TRANSFoRm query input and execution
The methodology used to enter the evaluation protocol into the query workbench is
outlined in the TRANSFoRm User Manual version 1.1 [2]. The query was made following
the instructions in section 5.0 ‘Using the system’.
2.5 Analysis of patient data extracted from national registries
2.5.1 Direct patient data extraction from primary care registries
The analysis starts from an initial total patient population count. The selected
inclusion/exclusion criteria are then applied to this population in a stepwise manner,
and patients are excluded at each step, in order to segment the patient
population into three groups. Any miscalculation or potential bias stems from the
primary authors' interpretation of whether patients should have been excluded or retained.
In the case of incomplete data sets it is assumed that the patient does not meet
the inclusion criteria, and the patient is excluded. This process was identical for the data
extracted directly from NIVEL and CPRD. For the data received from NIVEL, Excel
was sufficient to conduct the analysis. For the data received from CPRD, SAS
programming was required.
A patient demographic analysis was conducted on the data provided by CPRD and
NIVEL to identify characteristics of the initial patient population and the final qualifying
patient population. Percentage ratio of males to females, mean age, median age,
standard deviation and oldest and youngest patients were calculated. Analysis was
conducted in Excel (NIVEL data) and SAS (CPRD data).
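The demographic summary described above can be sketched in a few lines of Python. This is a minimal illustration only, not the Excel or SAS workflow that was actually used: the function name `summarise`, the `(sex, age)` record layout and the example records are hypothetical, and the use of the sample standard deviation is an assumption, since the report does not state which variant was calculated.

```python
from statistics import mean, median, stdev

def summarise(patients):
    """Summarise a list of (sex, age) tuples; sex is 'M' or 'F', age in years."""
    ages = [age for _, age in patients]
    total = len(patients)
    males = sum(1 for sex, _ in patients if sex == "M")
    return {
        "count": total,
        "male_pct": round(100 * males / total, 1),
        "female_pct": round(100 * (total - males) / total, 1),
        "mean_age": round(mean(ages), 1),
        "median_age": median(ages),
        "std_dev": round(stdev(ages), 1),  # sample standard deviation (assumption)
        "min_age": min(ages),
        "max_age": max(ages),
    }

# Hypothetical records for illustration only (not NIVEL or CPRD data).
example = [("M", 60), ("F", 70), ("M", 65), ("F", 75), ("M", 80)]
summary = summarise(example)
print(summary)
```

The same function would be applied once to the initial patient population and once to the final qualifying population, so that the two summaries can be compared side by side.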
2.5.2 Patient data extracted from the national registries using the TRANSFoRm
Query Workbench
The same stepwise approach using the inclusion and exclusion criteria from 2.5.1 was
applied. No statistical program was used for the analysis of the data extracted via the
QWB.
2.5.3 Comparison of direct data extraction vs TRANSFoRm Query Workbench
extraction from national registries
The patient counts returned from the QWB for each of the inclusion
and exclusion criteria were compared against the patient counts of the direct extraction. The
comparison involved analysis of the total patient population identified, the numbers of
patients in each segment (as identified by the inclusion and exclusion criteria) and the
available final patient population.
2.6 Usability assessment of TRANSFoRm query workbench
The usability assessment will be based upon the length of time it takes novice users to
enter the query into the TRANSFoRm workbench. A list of findings will be generated
based upon observations and interactions with the query workbench.
The list will be written up and communicated to relevant TRANSFoRm partners to
inform further use cases and the Good Clinical Practice validation process.
2.7 Limitations of the methodology
The representativeness of demonstration 1 is limited by a number of factors, which need
to be taken into account during the analysis of the data generated.
– Demonstration 1 is limited to one clinical trial in one indication and to two
European countries. Running such a demonstration across several
trials in the same indication, other indications and other countries could therefore yield
very different results.
– Patient recruitment criteria in the diabetes trial were relatively few and simple, in
the sense that they did not require specialized staff, specific expertise or
infrastructure at the site in order to perform the patient screening. Whether trials
with more complex requirements would benefit from the TRANSFoRm approach
would need to be evaluated separately.
3. Results
This section contains the results of demonstration 1. Below is a guide to the relevant
results sections.
– 3.1 presents the results of the original clinical trial, which acted as the
baseline in terms of patient recruitment, query formulation and qualifying patient
population.
– 3.2 presents the results of the inclusion and exclusion query formulated to aid
in the request for patient data from the eHR databases NIVEL and CPRD.
– 3.3 presents the final query that was entered into the TRANSFoRm QWB.
– 3.4 presents the results of the direct NIVEL patient data extraction. The results
include the application of an additional geographic filter that could not be applied
to the CPRD data. Results excluding the geographic filter are therefore also
included.
– 3.5 presents the results of the NIVEL patient data extracted via TRANSFoRm
using the QWB.
– 3.6 presents the results of the direct CPRD patient data extraction.
– 3.7 presents the results comparing patient demographics between the direct
patient data extractions from NIVEL and CPRD.
3.1 Benchmark Clinical Trial
Based upon advice from BMS, the diabetes clinical trial NCT00660907 was selected
as the study to be reproduced. This clinical trial was executed to assess the effect of
dapagliflozin plus metformin compared to glipizide plus metformin on the absolute
change from baseline in HbA1c level after 52 weeks of double-blind treatment in patients
with type 2 diabetes who have inadequate glycaemic control on 1500 mg/day or higher
doses of metformin therapy alone. Further information on the trial can be found on
clinicaltrials.gov under the identifier NCT00660907.
Figure 5. Operational metrics from the CT used as a baseline for demonstration 1.

| Location | Sites | Recruitment period (months) | Recruitment rate (patients per month) | Recruitment rate (patients per site) |
|---|---|---|---|---|
| The Netherlands | Den Haag, Deurne, Gorinchem, Groningen, Lichtenvoorde, Losser, Poortvliet, Rotterdam, Wildervank, Zutphen | 6 | 5 | 3 |
| United Kingdom | Reading (Berkshire), Aylesbury, Cookstown (N. Ireland), Ecclesfield (Sheffield), Addlestone (Surrey), Trowbridge (Wiltshire), Bath, Bolton, Bury St. Edmunds, Coventry, Edinburgh | 8 | 15 | 11 |

Table 2. Breakdown of the sites by country, patient recruitment periods and rate of recruitment extracted from the original clinical trial (NCT00660907-clinicaltrials.gov ID).

[Bar chart: number of patients enrolled and randomised in the UK and NL. Data labels: 168, 43, 118, 30.]
3.2 Demonstration 1 evaluation protocol
Based upon the results of the clinical trial in section 3.1, the approach described below is
performed separately for the UK and NL.
3.2.1 Formulating the draft evaluation protocol
The approach takes into account the geographic proximity between the centres utilised in
the trial and the GP practices featured in the TRANSFoRm dataset (that is, the
data extracted from the eHR databases based upon the final inclusion and exclusion
criteria). Geographic proximity between trial sites and TRANSFoRm sites is limited to
15 kilometres, reflecting patients’ willingness to travel as judged from industry experience.
The TRANSFoRm dataset is broken down into three patient segments:
1. Number of patients that do not comply with the recruitment criteria and therefore
would be excluded from the CT.
2. Number of patients with missing recruitment criteria information who would
require additional screening to qualify them for CT recruitment.
3. Number of patients who do qualify for CT enrolment.
Figure 6. Conceptual diagram displaying the estimated potential pool of additional patients relevant for demonstration 1 trial in each country.
Once patients have been categorized into the three groups, the size of groups 2 and 3
will be compared with the number of patients that were enrolled in the trial.
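The three-segment breakdown can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the procedure used on the NIVEL or CPRD data: the function `assign_segment`, the record layout (criterion name mapped to `True`, `False` or `None` for missing information) and the example records are all hypothetical, and the rule that a failed criterion takes precedence over missing information is an assumption.

```python
from collections import Counter

def assign_segment(record):
    """Classify one patient record as 'red', 'yellow' or 'green'.

    record maps a criterion name to True (satisfied), False (violated)
    or None (information missing).
    """
    values = list(record.values())
    if any(v is False for v in values):
        return "red"      # segment 1: does not comply with the recruitment criteria
    if any(v is None for v in values):
        return "yellow"   # segment 2: additional screening required
    return "green"        # segment 3: qualifies for CT enrolment

# Hypothetical records for illustration only.
patients = [
    {"type_2_diabetes": True, "metformin": True, "hba1c_in_range": True},
    {"type_2_diabetes": True, "metformin": None, "hba1c_in_range": True},
    {"type_2_diabetes": True, "metformin": None, "hba1c_in_range": False},
    {"type_2_diabetes": False, "metformin": True, "hba1c_in_range": True},
]
segment_counts = Counter(assign_segment(p) for p in patients)
print(segment_counts)
```

The sizes of the yellow and green segments are the quantities that would then be compared with the number of patients enrolled in the trial.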
Figure 7. Conceptual comparison of the size of the additional patient pool with the size of the original trial dataset for each country.
The ratio of patients enrolled in the diabetes trial to those in the TRANSFoRm dataset
provides evidence of the extent to which TRANSFoRm adds value in identifying and
selecting patients for a clinical trial. The results will also provide evidence of the
extent to which sites carry a larger number of potential patients than the average diabetes
trial site. The chosen approach is a retrospective evaluation of
whether the TRANSFoRm dataset from the UK and NL would have provided additional
patients to be recruited into the diabetes trial described in the scope section.
Demonstration 1 matches the recruitment criteria with the information contained in the
TRANSFoRm dataset. Therefore, the following information pertaining to the recruitment criteria, and
additional information in the NIVEL and CPRD datasets, is required:
– diagnosis data
– age
– prescription information for metformin
– when HbA1c levels were measured
– when insulin treatment was initiated
– when kidney function was last tested
– other urinary tract diseases
– any concomitant medications
3.2.2 Scope
Based upon the clinical trial identified in section 3.1, the scope of demonstration 1
focuses on patients from the United Kingdom (UK) and The Netherlands (NL) only, and
excludes patients from other countries who participated in this trial.
This phase III trial had the following recruitment criteria that are of importance for the
purpose of demonstration 1:
3.2.2.1 Inclusion criteria
– Type 2 Diabetes
– ICPC T90 (ICPC version 1)
– Treated with oral anti-diabetic drug therapy including Metformin for at least 8
weeks prior to enrolment
– Prescription information regarding Metformin and the last prescription
date in the 8-month period from January 31, 2008 to September 30,
2008.
– HbA1c >6.5% and <=10%
3.2.2.2 Exclusion criteria
– Type 1 Diabetes
– ICPC T90.01
– Insulin therapy within one year of enrolment
– Prescription information regarding insulin and the last prescription date
in the 18-month period from March 31, 2007 to September 30, 2008.
– Renal (kidney) failure or dysfunction
– ICPC N17, N18 or N19 including all subcategories if available
• N17 = Acute kidney failure
• N18 = Chronic kidney failure
• N18.1 = Chronic kidney failure stage 1 to N18.5 = stage 5
• N18.9 = Chronic kidney failure, unspecified stage
• N19 = Unspecified kidney failure
See Appendix for CPRD ISAC application protocol
3.3 TRANSFoRm query formulation
Using the evaluation protocol created in section 3.2 and the method outlined in 2.4, the
TRANSFoRm query was created in the TRANSFoRm QWB as shown in Figure 8. The
inclusion criteria are highlighted in light blue and the exclusion criteria are highlighted
in red.
3.3.1 Final eligibility criteria query (post- November 2014)
Figure 8. Screen capture of the TRANSFoRm Query Workbench entry.
The final query was formulated in November 2014. The query contains three separate
human insulin prescriptions, each of which was a variant of human insulin that the QWB
dictionary identified as a separate line item.
The difference between the query in Figure 8 and previous attempts (see appendix)
was that all of the inclusion and exclusion criteria were entered into one single query.
In previous versions (pre-September 2014) separate queries were created, with single
criteria added in a cumulative manner in an attempt to filter the patient numbers. This
approach was intended to capture the decrease in the patient pool in a stepwise manner,
matching the process by which the NIVEL and CPRD data had
been manipulated. For example, the first query contained only a single inclusion criterion.
The second query contained the first and the second criteria. The third contained the
first, second and third, and so on. This resulted in seven separate queries.
3.3.2 Previous eligibility query formulations (pre- September 2014)
A total of seven separate queries were set up in the QWB, which took approximately
90 minutes. The first query contained only a single criterion based on individual
inclusion and exclusion criteria listed in section 3.2.2. Subsequent queries added an
additional criterion cumulatively in order to facilitate analysis. Query 7 contained all
criteria relevant for this study.
The use of the search functionality and parameter settings worked well, and wait times
for results to be returned were 1-2 weeks depending on the availability of personnel at
NIVEL. Input of the initial query into the QWB was also overseen by NIVEL via an
arranged WebEx session.
See appendix for an example of the previous criteria entry throughout the development
of demonstration 1.
3.4 NIVEL extraction
The final analysis of the data extracted from NIVEL deviated from the methodology originally
proposed in 2.3. This was because registry data were available, but only at much lower
granularity (detail), for the time period (January 31, 2008 to September 30, 2008) in
which the original benchmark clinical trial was conducted. Geographic data are not
made available to researchers but, within the boundaries of the database, it was
possible in the Dutch dataset to estimate the distance between practice and patient
(based on the 4 digits of a 6-digit postcode) and to exclude patients living 15 km or
further away from their GP. In the UK this was not possible because of a lack of
geographic identifiers in the database.
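The report does not state how the practice-to-patient distance was computed inside the NIVEL database. As a rough sketch only, one common way to apply a 15 km cut-off is the great-circle (haversine) distance between two representative points, for example the centroids of the 4-digit postcode areas. The functions `haversine_km` and `within_radius`, the centroid assumption and the coordinates below are all hypothetical and are not taken from the study.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def within_radius(patient_point, practice_point, limit_km=15.0):
    """True if the patient lives closer than limit_km to the practice."""
    return haversine_km(*patient_point, *practice_point) < limit_km

# Hypothetical postcode-area centroids (latitude, longitude), illustration only.
practice = (52.00, 5.00)
nearby_patient = (52.10, 5.00)        # roughly 11 km north of the practice
distant_patient = (52.3676, 4.9041)   # several tens of kilometres away

print(within_radius(nearby_patient, practice))
print(within_radius(distant_patient, practice))
```

Because a 4-digit postcode area is itself several kilometres across in places, any such estimate is approximate; patients close to the 15 km boundary could be classified either way.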
3.4.1 Scenario 1: NIVEL manual extraction results including the geographic
exclusion filter
| Exclusion criteria | Potential patient pool size | Eliminated patients | Percentage patients excluded | Assigned segment | Column in Figure 9 |
|---|---|---|---|---|---|
| All patient records | 123,746 | 0 | - | | 1 |
| Distance to centre >15 km | 24,184 | 99,562 | 80 | Red | 2 |
| HbA1c level between 6.5% and 10% (between 47 and 86) | 7,768 | 16,417 | 13 | Red | 3 |
| Metformin medication information | 6,097 | 1,671 | 1.4 | Yellow | 4 |
| Uncertain type of diabetes | 3,961 | 1,205 | 0.97 | Yellow | 5 |
| Insulin medication information | 5,166 | 931 | 0.75 | Red | 6 |
| From practices with ICPC coding compliance <70% | 3,716 | 245 | 0.19 | Yellow | 7 |
| Type 1 diabetes | 3,482 | 234 | 0.18 | Red | 8 |
| Kidney dysfunction or failure | 3,356 | 126 | 0.10 | Red | 9 |
| Age >85 | 3,256 | 100 | 0.081 | Red | 10 |
| Other urinary tract disease | 3,235 | 21 | 0.017 | Yellow | 11 |
| Concomitant medications | 3,224 | 11 | 0.0089 | Yellow | 12 |
| Qualifying patients | 3,224 | 0 | | Green | 13 |

Table 3. Number of patients assigned to each segment as the inclusion/exclusion filters were applied, starting from an initial geographic constraint.
For each of the Figures 9 and 10 the following applies:
Red = excluded patients
Yellow = incomplete patient information
Green = included patients
Figure 9. A bar chart showing the stepwise decrease in patients from a total of 123,746 as each inclusion/exclusion filter is applied. The total number of patients remaining based upon a geographic analysis is 3,224. The x-axis values correspond to the column numbers shown in Table 3.
3.4.2 Scenario 2: NIVEL manual extraction results excluding the geographic
inclusion filter
| Exclusion criteria | Potential patient pool size | Eliminated patients | Percentage patients excluded | Assigned segment | Column in Figure 10 |
|---|---|---|---|---|---|
| All patient records | 123,746 | 0 | - | | 1 |
| HbA1c level between 6.5% and 10% (between 47 and 86) | 35,901 | 87,845 | 71 | Red | 2 |
| Metformin medication information | 25,906 | 9,995 | 8.1 | Yellow | 3 |
| Uncertain type of diabetes | 20,859 | 5,047 | 4.1 | Yellow | 4 |
| Insulin medication information | 15,921 | 4,938 | 4.0 | Red | 5 |
| From practices with coding compliance <70% | 13,957 | 1,964 | 1.6 | Yellow | 6 |
| Kidney dysfunction or failure | 13,370 | 587 | 0.47 | Red | 7 |
| Age >85 | 12,894 | 476 | 0.38 | Red | 8 |
| Type 1 diabetes | 12,721 | 173 | 0.14 | Red | 9 |
| Other urinary tract disease | 12,662 | 59 | 0.048 | Yellow | 10 |
| Concomitant medications | 12,606 | 56 | 0.045 | Yellow | 11 |
| Qualifying patients | 12,606 | 0 | | Green | 12 |

Table 4. Number of patients assigned to each segment as the inclusion/exclusion filters were applied. The geographic filter was excluded.
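The arithmetic behind Table 4 can be reproduced with a short Python sketch. The eliminated-patient counts and segment colours are copied from Table 4; the names `STEPS`, `waterfall` and `segment_totals` are hypothetical and introduced only for this illustration, which was not part of the original Excel analysis.

```python
# Counts copied from Table 4 (NIVEL extraction, geographic filter excluded).
INITIAL_POOL = 123_746
STEPS = [
    # (filter, eliminated patients, assigned segment)
    ("HbA1c level between 6.5% and 10%", 87_845, "red"),
    ("Metformin medication information", 9_995, "yellow"),
    ("Uncertain type of diabetes", 5_047, "yellow"),
    ("Insulin medication information", 4_938, "red"),
    ("Coding compliance <70%", 1_964, "yellow"),
    ("Kidney dysfunction or failure", 587, "red"),
    ("Age >85", 476, "red"),
    ("Type 1 diabetes", 173, "red"),
    ("Other urinary tract disease", 59, "yellow"),
    ("Concomitant medications", 56, "yellow"),
]

def waterfall(initial, steps):
    """Return (filter, remaining pool, % of the initial pool eliminated) per step."""
    remaining = initial
    rows = []
    for name, eliminated, _segment in steps:
        remaining -= eliminated
        rows.append((name, remaining, 100 * eliminated / initial))
    return rows

def segment_totals(steps):
    """Sum the eliminated patients per segment colour."""
    totals = {}
    for _name, eliminated, segment in steps:
        totals[segment] = totals.get(segment, 0) + eliminated
    return totals

rows = waterfall(INITIAL_POOL, STEPS)
totals = segment_totals(STEPS)
qualifying = rows[-1][1]

for name, remaining, pct in rows:
    print(f"{name:<36}{remaining:>9,}{pct:>8.2f}%")
print("qualifying (green):", qualifying)
print("red:", totals["red"], "yellow:", totals["yellow"])
```

Run on these inputs, the remaining pool after each step matches the "Potential patient pool size" column, and the red and yellow totals match the "Excluding geographic filter" column of Table 5.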
Figure 10. A bar chart showing the stepwise decrease in patients from a total of 123,746 as each inclusion/exclusion filter is applied. The total number of patients remaining when the geographic filter was excluded was 12,606. The x-axis values correspond to the column numbers shown in Table 4.
| Segment | Including geographic filter (patients) | Including geographic filter (%) | Excluding geographic filter (patients) | Excluding geographic filter (%) |
|---|---|---|---|---|
| Patients that do not comply with recruitment criteria (red) | 117,518 | 94.8 % | 94,019 | 76.0 % |
| Additional screening required to qualify further (yellow) | 3,153 | 2.5 % | 17,121 | 13.8 % |
| High value patients that qualify for clinical trial enrolment (green) | 3,224 | 2.6 % | 12,606 | 10.2 % |

Table 5. Comparison of the total numbers of patients assigned to each category depending upon the application of the geographic filter.
3.5 TRANSFoRm - NIVEL extraction
The results presented in this section are the final patient numbers received in
November 2014.
3.5.1 Final results from the TRANSFoRm NIVEL extraction
Figure 11. Screen capture of the Final TRANSFoRm Query Workbench entry.
Interpretation of the patient counts returned from the QWB:
– 45,264 = the total number of patients meeting the combined inclusion criteria (Metformin prescriptions AND HbA1c patients AND diabetes patients diagnosed before 31 Dec 2012)
  o 101,031 = the total number of metformin patients irrespective of the other criteria
  o 80,071 = the total number of HbA1c measurements
  o 96,793 = the total number of diabetes patients diagnosed before 31 Dec 2012
  o 58,538 = Metformin patients AND HbA1c measurements
– 15,409 = the total number of patients excluded based upon an insulin prescription
– 884 = the total number of patients with kidney failure that have been excluded
– 765,139 = the total number of patients over the age of 85 that have been excluded
Figure 12. Infographic showing the individual counts for each segment and the combined eligibility count for the inclusion criteria.
[Infographic. Inclusion criteria patient counts: Diabetes type 2 = 96,793; HbA1c = 80,071; Metformin = 101,031; combined inclusion = 45,264. Number of patients excluded for each of the exclusion criteria: Age >85 = 765,139; Kidney failure = 884; Insulin prescription = 15,409.]
3.6 Direct CPRD extraction
The CPRD data were provided without any geographic identifiers, in accordance with UK
data privacy laws. Therefore no geographic filter could be applied as was done
in the NIVEL analysis.
3.6.1 Final results from direct CPRD patient data extraction
| Description | Remaining patients | Excluded | Percentage patients excluded | Column in Figure 13 |
|---|---|---|---|---|
| Initial patient population | 296,875 | | | 1 |
| Inclusion criteria 1: Medcode = “type II” or “type 2” | 201,899 | 94,976 | 32 | 2 |
| Inclusion criteria 2: Prodcode = “metformin” AND a prescription at least 8 weeks prior to the enrolment date (1JAN2013) | 148,693 | 53,206 | 18 | 3 |
| Inclusion criteria 3: HbA1c levels >6.5% and <=10% | 99,883 | 48,810 | 16 | 4 |
| Exclusion criteria 1: medcode = “type I” or “type 1” | 98,710 | 1,173 | 0.40 | 5 |
| Exclusion criteria 2: prodcode = “insulin” AND there is a prescription between 01Jan2013 and 01Jan2014 | 77,105 | 21,605 | 7.2 | 6 |
| Exclusion criteria 3: medcode = “renal failure” or “kidney failure” | 75,842 | 1,263 | 0.42 | 7 |
| Exclusion criteria 4: age >=85 | 70,669 | 5,173 | 1.7 | 8 |
| Qualifying patients | 70,669 | | | 8 |

Table 6. Number of patients assigned to each segment as the inclusion/exclusion filters were applied. The geographic filter was excluded for this analysis.
As shown in Table 6 above, the order in which the inclusion and exclusion criteria were
applied to arrive at the final qualifying patient population was different from that of the direct
extraction from NIVEL (Table 3 and Table 4) and the extraction of the NIVEL database
by the TRANSFoRm QWB (Figure 11). This was because the structure
and the sequence of the analysis had to be adapted to the layout and organization of the data
packages presented by the eHR databases.
Figure 13. A bar chart (x-axis: criteria; y-axis: number of patients) showing the stepwise decrease in patients from a total of 296,875 as each inclusion/exclusion filter is applied. The total number of patients remaining that met the inclusion/exclusion criteria was 70,669. Criteria description: 1 = initial diabetes population, 2 = type 2 diabetes (inclusion), 3 = metformin prescription (inclusion), 4 = HbA1c levels >6.5% and <=10% and equivalent in mmol/mol, 5 = type 1 diabetes (exclusion), 6 = insulin therapy within one year of enrolment (exclusion), 7 = renal (kidney) failure or dysfunction (exclusion), 8 = patients >=85 years old and 9 = final qualifying patient population.
3.7 Demographic analysis
| Metric | UK: Type II diabetes only | UK: Qualifying patient population | The Netherlands: Type II diabetes only | The Netherlands: Without geographic constraints | The Netherlands: With geographic constraints |
|---|---|---|---|---|---|
| Patient count (actual) | 201,899 | 70,669 | 123,742 | 12,606 | 3,226 |
| Patient count (percentage) | | 35% | | 10% | 2.6% |
| Sex, male | 112,656 | 41,529 | 60,855 | 6,643 | 1,652 |
| Sex, female | 89,243 | 29,140 | 62,787 | 5,963 | 1,572 |
| Male : female (%) | 55.8% : 44.2% | 58.8% : 41.2% | 49% : 51% | 53% : 47% | 51% : 49% |
| Age, mean | 67.2 | 65.1 | 63.7 | 64.4 | |
| Age, median | 68 | 66 | 65.0 | 65.5 | 64.0 |
| Age, std deviation | 13.2 | 11.5 | 14.5 | 11.2 | 11.2 |
| Age, min | 5 | 13 | 0 | 17 | 26 |
| Age, max | 106 | 84 | 111 | 85 | 85 |

Table 7. Demographic analysis of the patient data comparing initial patient populations against qualifying patient populations.
Table 7 presents the demographic analysis of the patient data for the initial
patient populations and the final qualifying patient populations for both countries.
4. Discussion
4.1 Manual-NIVEL extraction
Based upon Figures 9 and 10, the total initial patient population was determined as
123,746. Aside from the geographic filter in Figure 9, the most significant filter in reducing
the patient pool was the condition that patients' HbA1c levels fell within the
pre-determined range. The final patient
population on the basis of the inclusion/exclusion criteria was 12,606 excluding the
geography filter and 3,224 including it.
Anonymized eHR data from NIVEL show that a potentially significantly larger patient
population could be reached than the 43 patients who enrolled in the original clinical
trial. However, it is not known how many patients were initially identified in the original
clinical trial or how many refused to participate.
The data were provided in Excel and amounted to approximately 50 megabytes. From a user
perspective, the data were easier to manipulate than the CPRD data package
and required little formal training.
After conducting the analysis, a follow-up call was scheduled with NIVEL to talk through
the analysis in a stepwise manner and to discuss the findings. This was an opportunity to
review the analysis and discuss any misinterpretations directly with NIVEL.
4.2 Manual-CPRD extraction
Figure 13 shows that the final potential patient population that met the original clinical
trial inclusion/exclusion criteria was 70,669. The CPRD data have all geographic
reference descriptors removed in line with UK patient data protection guidelines, so this could
not be reduced further. The final available patient pool is significantly greater than the
168 patients who enrolled in the UK. However, it is unknown how many patients were
asked to participate in the original clinical trial.
The analysis of the data from CPRD was significantly more complex, as a very wide cut
of the raw data was provided (40 gigabytes). In order to manipulate and navigate the
data effectively, SAS training was required.
It remains to be seen how the TRANSFoRm QWB will handle the difference in the
scope of the data available for patient counts.
4.3 Manual-NIVEL vs Manual-CPRD extraction
Because the data packages from the two eHR databases were very different, the structure of the
final analysis also differed. A direct head-to-head comparison of the patient numbers
excluded by each inclusion/exclusion criterion would therefore be inaccurate. Despite this complication,
it is the final qualifying patient population that is the metric of interest. Table 7 shows
that, as a percentage of the initial type 2 diabetes patient population, 70,669 patients (35%)
qualified from the UK and 12,606 (10%) qualified from The Netherlands (see footnote 2). There is
a significant discrepancy of 25 percentage points in the qualifying patients between the two countries.
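The percentages quoted above follow directly from the patient counts in Table 7. The short Python check below is an illustration only; the constant and function names are hypothetical, and the Dutch figures are those without the geographic filter.

```python
# Patient counts copied from Table 7.
UK_TYPE2, UK_QUALIFYING = 201_899, 70_669
NL_TYPE2, NL_QUALIFYING = 123_742, 12_606  # NL without the geographic filter

def share_pct(qualifying, population):
    """Qualifying patients as a percentage of the type 2 diabetes population."""
    return 100 * qualifying / population

uk_pct = share_pct(UK_QUALIFYING, UK_TYPE2)
nl_pct = share_pct(NL_QUALIFYING, NL_TYPE2)
gap_points = uk_pct - nl_pct

print(f"UK {uk_pct:.1f}%  NL {nl_pct:.1f}%  gap {gap_points:.1f} percentage points")
```

The gap is a difference between two percentages, so it is expressed in percentage points rather than as a relative change.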
When comparing the metrics of the demographic analysis there is no obvious
cause for this discrepancy. Both qualifying patient populations show similar trends
in male to female ratio (59%:41% UK; 53%:47% NL), mean age (65 UK; 64 NL),
median age (66 UK and NL) and standard deviation (11.5 UK; 11.2 NL). The patient
demographic analysis established how similar the qualifying patient groups were across
countries.
Looking at the sequential filtering of patients, it can be seen that the criterion that had
the largest impact on excluding patients was the HbA1c range. Of the total type 2
diabetes populations in both countries, 38% (76,639) of patients in the UK were excluded
compared to 71% (87,845) in The Netherlands. Literature searches reveal
no medical explanation for the HbA1c discrepancy between the UK and The
Netherlands that can account for the 33 percentage-point difference. A hypothesis is that this is due
to differences between the health care systems that cause differences in data quality
(defined as a function of the completeness of data, consistency of coding between GP
practices, timeliness of data, and differences in legal regulations and policies; CPRD
does not collect geographic indicators, so patients cannot be localized in a
well-defined local area) between the NIVEL and CPRD eHR databases. In particular, the
absence of a QoF-like (Quality and Outcomes Framework) reimbursement system might
be relevant here. These differences in the data make them harder to compare.
However, further research would be required to validate this hypothesis.
2 Only the NIVEL analysis excluding the geographic criteria is discussed as the CPRD data had no
geographic identifiers.
The technical challenges of analysing data packages that vary between eHR databases
represent both an opportunity and a challenge for TRANSFoRm. If TRANSFoRm is
able to resolve the differences in data packages, this strengthens the platform's
potential business proposition. A platform that has the functionality to make patient
selections across several eHR systems would provide added value to the pharmaceutical
industry.
However, overcoming this hurdle presents a significant challenge. The TRANSFoRm
QWB would have to be continually maintained to ensure that the database is coded to
handle the latest changes and updates to the medical dictionaries and product lexicons
used across the pharmaceutical industry by all the key stakeholder groups and the various
eHR databases.
4.4 TRANSFoRm-NIVEL extraction
Figure 11 confirms, firstly, that the TRANSFoRm platform can be used to create
queries and, secondly, that it can communicate directly with the NIVEL database via the DNC to
extract patient counts. Finally, the QWB is able to extract patient counts for individual
inclusion and exclusion criteria as well as for cumulative combinations, highlighting the
functionality of the logic capability.
This confirms that, from a technical perspective, the hurdles that have delayed
the submission of the demonstration 1 report from the original 2014 deadline have
been overcome and the QWB is functional. From a usability perspective, queries can
be successfully created, which demonstrates an overall positive user experience.
The current issue remains the formulation of a query with numerous inclusion
and exclusion criteria, as was the purpose of this report. Whilst the TRANSFoRm
platform was able to extract patient counts from the NIVEL database, the actual patient
counts do not fully identify the available patient pool. The QWB returned
a count of 45,264 patients that met all three of the inclusion criteria,
independent of the exclusion criteria. Figure 12 summarizes the returned patient
counts, indicating how many potential patients meet each of the inclusion criteria
independently and combined.
However, the patient counts returned for the exclusion criteria are independent of
the inclusion criteria and of each other. For example, there are 765,139 patients
over the age of 85, but it is unknown how many of these patients are among the
45,264 and need to be excluded. Without being able to apply the exclusion criteria, it
remains unknown how much further the 45,264 patients could be reduced and how
close the final qualifying patient population would be to the 12,606 identified by the
direct NIVEL extraction.
In summary, the final qualifying patient population is incomplete because the
exclusion criteria cannot be combined with the patient pool identified by the combined
inclusion criteria. This demonstrates how difficult it is for the QWB interface to
explain the logic functionality to its users.
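The size of this gap can be illustrated with simple arithmetic. The sketch below is
illustrative only: the function name is hypothetical and it is not part of the QWB. It
shows that, when only independent counts are available, the final count can merely
be bounded, using the figures quoted above.

```python
def final_count_bounds(included: int, exclusion_counts: list[int]) -> tuple[int, int]:
    """Bound the final eligible count when exclusion counts are known only
    independently of the inclusion cohort (hypothetical helper)."""
    # Worst case: every excluded patient also sits in the inclusion cohort
    # and the exclusion groups do not overlap each other.
    lower = max(0, included - sum(exclusion_counts))
    # Best case: no excluded patient sits in the inclusion cohort.
    upper = included
    return lower, upper


# Figures from this section: 45,264 included, 765,139 patients aged over 85.
print(final_count_bounds(45_264, [765_139]))  # prints (0, 45264)
```

With these inputs the bounds span the whole range from 0 to 45,264, which is why
the independent exclusion counts cannot narrow the estimate towards the 12,606
found by the direct extraction.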
One recommendation would be to formulate a query in which all of the inclusion and
exclusion criteria are placed within ‘sub-buckets’ of the preceding criterion. This may
resolve the incomplete query problem described above. An example of such a query is
attached in Appendix A.
The strategy for analysing the TRANSFoRm-NIVEL direct extraction was based upon
the large-to-small sequential filtering of the data used in the manual extraction. The
QWB was not designed to filter from large to small consecutively, removing patients
step by step to arrive at the final patient population. Current attempts to work
around this by creating separate queries, each accumulating further inclusion and
exclusion criteria, are not a viable workaround for the QWB. Creating a greater
number of queries takes more time and, more importantly, places a greater burden
on the platform, which must satisfy more data requests via multiple query
submissions rather than a single query. From a usability perspective, the QWB needs
to offer sequential filtering in order to arrive at the final patient count.
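Sequential filtering of this kind can be sketched in a few lines. The example below is
a minimal illustration under stated assumptions, not the QWB or DNC
implementation: the record fields, criteria and toy patients are all hypothetical. It
applies each criterion to the survivors of the previous step and records the count
after every step, as in the manual extraction.

```python
def funnel(patients, include, exclude):
    """Apply criteria one after another and report the count left after each step."""
    remaining = list(patients)
    steps = [("start", len(remaining))]
    for name, test in include:
        remaining = [p for p in remaining if test(p)]
        steps.append((f"include: {name}", len(remaining)))
    for name, test in exclude:
        # Exclusions are applied to the already-filtered cohort, not to the whole database.
        remaining = [p for p in remaining if not test(p)]
        steps.append((f"exclude: {name}", len(remaining)))
    return steps


# Toy records with hypothetical field names.
patients = [
    {"id": 1, "age": 60, "t2dm": True, "hba1c": 7.2},
    {"id": 2, "age": 90, "t2dm": True, "hba1c": 8.0},
    {"id": 3, "age": 55, "t2dm": False, "hba1c": 7.0},
    {"id": 4, "age": 70, "t2dm": True, "hba1c": 6.0},
]
include = [
    ("type 2 diabetes", lambda p: p["t2dm"]),
    ("HbA1c > 6.5%", lambda p: p["hba1c"] > 6.5),
]
exclude = [("age over 85", lambda p: p["age"] > 85)]

for label, count in funnel(patients, include, exclude):
    print(f"{label}: {count}")
```

A single query of this shape returns every intermediate count in one submission,
which avoids the burden of multiple cumulative queries described above.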
It is also unknown whether each query is extracted from the same data pool. For
example, if a change in national guidelines affects the units of measurement, the
QWB needs to make users aware of the change and the date on which it comes into
effect. The TRANSFoRm user needs to be informed; otherwise there is a risk of
excluding patients by accident, affecting the size of the patient pool. These
inconsistencies would lead to inaccurate comparisons of data.
Mechanisms need to be in place to alert the user that a requested dataset may
contain different units for the same measurement (owing to changes in official unit
guidelines, e.g. percentages to mmol/mol). Ideally, the interface would offer the user
the option of including one measurement system or both. A more evolved platform
would identify, from the selection criteria (the requested geographic scope of the data
and the time period covered by the query), which option to recommend, and would
prompt the user to include ‘option a’ (percentages only), ‘option b’ (mmol/mol only)
or ‘option a AND b’.
These differences need to be explained to the user to avoid confusion, together with
information on when the changes were implemented and whether retrospective
changes have been made to patients’ measurements.
In addition, TRANSFoRm support staff would need to be kept abreast of any
changes in mapping or in the use of database-specific custom codes (e.g. variations in
ICD-10 / ICD-9 dictionaries) and of other relevant changes that would substantially
influence patient counts.
These measures all need to feed back to the technical support staff of the
TRANSFoRm platform. Any support person needs to know what has been mapped,
which custom codes are in use and which changes the user should be kept informed
of.
Currently the QWB user is not able to observe the level of version control on the data,
so if updates are made to the original eHR records (cleaning / revisions on the
database side) the user is not informed of the impact. It remains to be seen whether
this is something the user needs to know or something the eHR database staff should
handle by adapting the mappings in the DNC. If the TRANSFoRm platform is kept up
to date with changes in the standards and the eHR databases are kept up to date with
mappings to recent standards, then the QWB user will not need to observe the level of
version control.
Whilst further research has previously been suggested to refine the query that
identifies the available qualifying patient population, it is of course recognized that
the eHR databases are dynamic and that populations change with time.
If research organizations are to utilize TRANSFoRm, they need to have access
to resources that are familiar with the data and/or the QWB. This may include real-time
support in order to walk the users through the patient counts. The current situation can
be improved by providing ‘dashboard-like’ readouts that break down how the query
data was formulated (source etc.) and simultaneously flag any potential limitations or
missing data. Without this supporting information, pharmaceutical sponsors would
most likely deem the effort required to interpret the patient counts excessive.
Users should also be aware of the ‘run times’ for queries to be executed. These are
related to the volume of data that the QWB has to extract from the eHR databases.
An automated estimate of when a query will complete would therefore be a useful
tool.
Demonstration 1 hinges on the availability of the information required by the trial.
This availability could be constrained by the completeness of the data, the technical
ability to retrieve the information and governance decisions to restrict access.
Finally, automation requires ongoing cleaning of the data sets. Patient counts may be
skewed if large numbers of records have contaminated fields. It is therefore
important to understand the current limitations of the raw datasets themselves. There
are two possible options. The first is to invest in rigorous retrospective cleaning of
the data sets, retaining the largest possible patient population; the second is to
remove the contaminated data entirely, thereby reducing the patient population.
Further work should be commissioned to identify the point of execution, the optimum
point for return on investment between the two options and how the cost should be
distributed. However, this is likely to be a much larger, industry-wide conversation
rather than one limited to the eHR databases themselves, as the benefits would
accrue to the pharmaceutical industry as a whole.
4.5 Technical hurdles
4.5.1 Data quality issues
Several data quality issues arose during the implementation that resulted in changes
to some TRANSFoRm tools.
Custom coding schemas
NIVEL data is encoded using ICPC-1 codes (which are still standard in the Netherlands),
but with several custom codes that are not present in the default ICPC terminologies
used by TRANSFoRm.
The Query Workbench (QWB) uses a library of standardised coding schemas, e.g.
ICD, ICPC etc. However, since NIVEL uses customised versions of standard codes,
the current Data Node Connector (DNC) query logic treated criteria based on these
codes as if they had not been present, so that the overall query could complete. The
problem did not terminate the query prematurely, but no value was returned for the
affected selection criteria. This prevented the identification and selection of patients
with or without these characteristics.
TRANSFoRm captures this data with the provenance tool; however, the level of detail
present does not cover custom extensions to existing coding schemas. Data providers
with such extensions would need, before using TRANSFoRm, to document the codes
and either map them back to standard codes or enter the extended coding schema
into the TRANSFoRm Semantic Mediator.
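Mapping custom codes back to standard codes could look like the sketch below. It is
a hypothetical illustration, not the Semantic Mediator or DNC logic: the code values
are placeholders and the mapping table stands in for whatever documentation a data
provider supplies. Its main point is that unmapped codes are reported rather than
silently ignored.

```python
# Placeholder code sets: these values are illustrative, not NIVEL's actual codes.
STANDARD_CODES = {"T90", "T89", "K86"}
# Provider-documented custom extensions mapped back to a standard parent code.
CUSTOM_TO_STANDARD = {"T90.02": "T90"}


def resolve_codes(criterion_codes):
    """Return (resolved standard codes, codes that could not be mapped)."""
    resolved, unmapped = set(), []
    for code in criterion_codes:
        if code in STANDARD_CODES:
            resolved.add(code)
        elif code in CUSTOM_TO_STANDARD:
            resolved.add(CUSTOM_TO_STANDARD[code])
        else:
            # Surface the gap instead of treating the criterion as absent.
            unmapped.append(code)
    return resolved, unmapped


resolved, unmapped = resolve_codes(["T90", "T90.02", "X99.9"])
print(sorted(resolved), unmapped)
```

Returning the unmapped codes alongside the result would let the platform warn the
user that a criterion could not be evaluated, rather than omitting its count.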
Formatting of Data Values
The NIVEL SQL database stores all fields as textual strings with an associated type
field detailing the content: numeric, date, string or list. When the generated DNC query
runs, it converts these strings to their correct type before running any filters specified
in the QWB. However, this type conversion fails if the content of the string variable
does not conform to the required type, which results in premature termination of the
query. This issue arose with the HbA1c values used in the eligibility criteria, some of
which contained invalid content such as the “>” symbol.
Further investigation revealed that this was a known technical issue on the NIVEL side
that does not cause any problems in their normal business practice, because they use
a semi-automated cleaning approach after extraction and before sending the data to
the customer. After some consideration, it was agreed to use an
Extract-Transform-Load (ETL) approach to create a cleaned-up mirror of the central
NIVEL database and to run queries against that version. NIVEL had been considering
this step for some time, and it is likely to become routine in the future.
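A cleaning step of this kind might be sketched as follows. This is an assumption-laden
illustration, not NIVEL's actual ETL: the function name is hypothetical, and the
choices to keep a leading comparison symbol as a separate qualifier and to accept a
decimal comma are design assumptions made for the example.

```python
import re

# Optional comparison qualifier, then a number with "." or "," as decimal separator.
_NUMERIC = re.compile(r"^\s*([<>]=?)?\s*(\d+(?:[.,]\d+)?)\s*$")


def clean_numeric(raw):
    """Parse a text field into (value, qualifier); return (None, None) if unparseable."""
    match = _NUMERIC.match(raw or "")
    if match is None:
        return None, None
    qualifier, digits = match.groups()
    return float(digits.replace(",", ".")), qualifier


print(clean_numeric(">7.5"))  # prints (7.5, '>')
print(clean_numeric("n/a"))   # prints (None, None)
```

Because unparseable values are returned as missing rather than raising an error, a
query run against the cleaned mirror would not terminate prematurely on such fields.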
Differences in Unit Measures
Most data on results collected within the NIVEL database (laboratory, vital signs, etc.)
is provided without unit information. This information used to be stored in a supporting
table, which is no longer maintained.
Taking HbA1c as an example, units changed approximately 3 years ago from %
(Diabetes Control and Complications Trial, DCCT) to mmol/mol (International
Federation of Clinical Chemistry, IFCC). Changes in units of measurement would
most likely need to be maintained by the data providers. Such changes are typically
based on new scientific evidence and its adoption by local healthcare systems, so
data providers will need to adapt their systems anyway. The challenge is then to
ensure consistency between different units of measurement so that the data can be
analysed.
The TRANSFoRm solution is to include the unit conversion in the ETL process,
before the queries are executed. The alternative would be for the Semantic
Mediator component to implement logic for detecting the units from the numeric
values, but this is not always possible and is best handled by the data sources
themselves.
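For HbA1c, the conversion itself is a fixed linear relationship between the DCCT
percentage and the IFCC value in mmol/mol. The sketch below uses the commonly
published master equation, IFCC = 10.929 × (DCCT − 2.15); the function names and
the idea of normalising everything to mmol/mol are assumptions made for
illustration, not a description of the TRANSFoRm ETL.

```python
def hba1c_percent_to_mmol_mol(percent: float) -> float:
    """Convert a DCCT-aligned HbA1c percentage to IFCC mmol/mol."""
    return 10.929 * (percent - 2.15)


def hba1c_mmol_mol_to_percent(mmol_mol: float) -> float:
    """Convert an IFCC HbA1c value in mmol/mol to a DCCT-aligned percentage."""
    return mmol_mol / 10.929 + 2.15


def normalise_hba1c(value: float, unit: str) -> float:
    """Return the value in mmol/mol; the unit must be stated explicitly."""
    if unit == "mmol/mol":
        return value
    if unit == "%":
        return hba1c_percent_to_mmol_mol(value)
    # Guessing the unit from the magnitude is deliberately not attempted here.
    raise ValueError(f"unknown HbA1c unit: {unit!r}")


print(round(normalise_hba1c(6.5, "%"), 1))
print(round(normalise_hba1c(10.0, "%"), 1))
```

Under this equation, 6.5% corresponds to roughly 47.5 mmol/mol and 10% to roughly
85.8 mmol/mol. The function refuses values without a stated unit, reflecting the
point above that detecting units from numeric values is not always possible.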
4.5.2 Usability issues
During query prototyping, runtimes of up to one hour were encountered. These were
due to query caching being switched off at the time, which prevented subsequent
queries from making use of cached results. With caching, individual SQL queries
against the NIVEL database required between 5 and 10 minutes to run. The tables of
interest to this demonstration involved roughly 500 million records and an estimated
250 gigabytes of space, with the overall database possibly close to 1 terabyte. Other
activity on the database, which is used continuously by NIVEL staff, also affected
runtimes, creating variability in response.
In this demonstration, the NIVEL database administrator inspected queries before
allowing their execution and also inspected the responses before transmission back to
the platform. For security reasons, NIVEL did not allow any query to be run
automatically without a manual check by this database administrator. NIVEL had
requested this and regarded it as essential for controlling the process. In addition, it
was often necessary to communicate with the database administrator to get the
TRANSFoRm tools running. It was not possible to run queries unless NIVEL staff
were available to perform checks and to help get the queries running.
Over time and with increasing experience on both sides, process control procedures
could be minimised to become less dependent on availability of key staff.
It is very likely that the issues described above are not unique to NIVEL. Similar
problems would probably occur if the TRANSFoRm tools were tried on any other
database.
4.6 TRANSFoRm-CPRD extraction
At the time of finalizing the report, the DNC between TRANSFoRm and CPRD had not
been established; this extraction will therefore require evaluation in the future.
5. Conclusion
The goal of demonstration 1 was to demonstrate the value-add benefit of the
TRANSFoRm platform to pharmaceutical companies. The following objectives were
successfully achieved:
– A clinical trial (NCT00660907) was identified in partnership with Bristol-Myers
Squibb
– An evaluation protocol was constructed based upon the identified clinical trial and
refined in partnership with eHR data providers NIVEL (The Netherlands) and
CPRD (UK)
– Direct patient data requests from Quintiles were executed by both NIVEL and
CPRD
– The data-node connector was successfully installed, allowing the TRANSFoRm
platform to communicate with the NIVEL database
– Queries were successfully constructed using the TRANSFoRm query
workbench (QWB)
– The results of the extraction from the TRANSFoRm query workbench were
successfully compared with the direct patient data extraction from NIVEL
The TRANSFoRm platform has demonstrated that it can identify and select patients
for clinical trials, increasing the overall size of the patient pool. The adopted approach
was able to compare and partially match data provided directly from primary care
databases with patient data extracted via TRANSFoRm.
Primary care eHR databases have been shown to be an underutilized resource for
clinical trial recruitment and offer significant potential for the identification and
selection of patients for clinical trials. Based on the NIVEL reference dataset, several
thousand potential patients were found to be eligible.
The QWB was demonstrated to be a user-friendly interface, and the accompanying user
guide was sufficient to guide naïve users of the platform.
Greater guidance might need to be sought on executing more complex queries in the
QWB. Users would need to be able to see the consequences of logic choices in ‘real-
time’ (how the logic choices and sequence in which these choices are entered impact
upon the inclusion and exclusion of patients, both intentionally and unintentionally)
prior to executing the patient count query.
A key organisational issue that is likely to remain is the potentially lengthy approval
timeline that limits the potential of technology to accelerate patient identification and
selection significantly. Firstly, potential pharmaceutical industry users of the
TRANSFoRm platform need to be aware of any ethical approval procedures that need
to be in place prior to any data submission request. Furthermore, the lead times for
executing the query once the data requests have been approved should also
be factored into timelines. However, if requests are submitted 4 to 8 weeks prior to the
request for patient counts, this is likely to suffice from the eHR database (NIVEL)
perspective. With increased usage and greater familiarity with the approach, it is
expected that this approval process would become more efficient and faster,
concluding in the formulation of formal pharmaceutical industry guidance
documentation.
The TRANSFoRm tools have so far been successfully employed only on data from the
NIVEL Primary Care Database. This is the first time that the TRANSFoRm approach
has been tested on any real database. Results look promising, but there are still issues
to be resolved to keep the TRANSFoRm tools running and to improve their functioning.
As the NIVEL Primary Care Database is one of the biggest and best-organised primary
care databases in Europe, we do not believe these issues to be unique to this database.
References
[1] Efficacy and Safety of Dapagliflozin in Combination With Metformin in Type 2
Diabetes Patients:
http://www.clinicaltrials.gov/ct2/show/record/NCT00660907?term=d1690c00004&rank=1
[last checked 17th April 2014].
[2] User Manual, Eligibility Criteria, Work Package 5.2 – Query Formulation
Workbench Technical frameworks, policies and integration v1.1 (March 2014),
C. Golby
Abbreviations
CPRD Clinical Practice Research Datalink
CT Clinical Trial
eHR Electronic Health Record
GP General Practitioner
NIVEL Netherlands Institute for Health Services Research
Appendix A: Proposed TRANSFoRm Query Workbench entry
Screen capture of a proposed TRANSFoRm Query Workbench entry. Each additional
criterion after the first is included within the ‘sub-container’ of the previous criterion.
Appendix B: Previous TRANSFoRm Query Workbench entry
A total of seven separate queries were set up in the QWB, which took approximately
90 minutes. The first query contained only a single criterion. Subsequent queries
added an additional criterion cumulatively in order to facilitate easier analysis. Query 7
contained all criteria relevant for this study.
Screen capture of the previous TRANSFoRm Query Workbench entry.
Appendix C: CPRD ISAC application protocol
ISAC APPLICATION FORM PROTOCOLS FOR RESEARCH USING THE CLINICAL PRACTICE RESEARCH DATALINK (CPRD)
ISAC use only:
Protocol Number
Date submitted
.............................
.............................
IMPORTANT
If you have any queries, please contact ISAC Secretariat:
1. Study Title
TRANSFoRm (Translational Medicine and Patient Safety in Europe) WP 8 - DEMO 1 - EVALUATION PROTOCOL
2. Principal Investigator (full name, job title, organisation & e-mail address for correspondence regarding this protocol)
Peter Wagner, Life Sciences Consulting Director, Quintiles, [email protected]
3. Affiliation (full address)
Quintiles, Hugenottenallee 167, Neu-Isenburg, 63263
4. Protocol’s Author (if different from the principal investigator)
5. List of all investigators/collaborators (please list the names, affiliations and e-mail addresses* of all collaborators, other than the principal investigator)
Prof Brendan Delaney, Professor of Primary Care Research, King’s College London, [email protected]
Vasa Curcin, Research Fellow, Imperial College London, [email protected]
*Please note that your ISAC application form and protocol must be copied to all e-mail addresses listed above at the time
of submission of your application to the ISAC mailbox. Failure to do so will result in delays in the processing of your
application.
6. Type of Institution (please tick one box below)
Academia Research Service Provider Pharmaceutical Industry
NHS Government Departments Others
7. Financial Sponsor of study
Pharmaceutical Industry (please specify) Academia(please specify)
Government / NHS (please specify) None
Other (please specify) European Union
ISAC v1.2-June 2013
8. Data source (please tick one box below)
Sponsor has on-line access Purchase of ad hoc dataset
Commissioned study
Other (please specify) TRANSFoRm
9. Has this protocol been peer reviewed by another Committee?
Yes* No
* Please state in your protocol the name of the reviewing Committee(s) and provide an outline of the review
process and outcome.
10. Type of Study (please tick all the relevant boxes which apply)
Adverse Drug Reaction/Drug Safety Drug Use Disease Epidemiology
Drug Effectiveness Pharmacoeconomic Other
11. This study is intended for:
Publication in peer reviewed journals Presentation at scientific conference
Presentation at company/institutional meetings Other
European Union
12. Does this protocol also seek access to data held under the CPRD Data Linkage Scheme?
Yes No
13. If you are seeking access to data held under the CPRD Data Linkage Scheme*, please select the source(s) of linked data being requested.
Hospital Episode Statistics Cancer Registry Data**
MINAP ONS Mortality Data
Index of Multiple Deprivation/ Townsend Score
Mother Baby Link Other: (please specify)
* As part of the ISAC review of linkages, the protocol may be shared - in confidence - with a representative
of the requested linked data set(s) and summary details may be shared - in confidence - with the Confidentiality
Advisory Group of the Health Research Authority.
**Please note that applicants seeking access to cancer registry data must provide consent for publication of
their study title and study institution on the UK Cancer Registry website. Please contact the CPRD Research
Team on +44 (20) 3080 6383 or email [email protected] to discuss this requirement further.
14. If you are seeking access to data held under the CPRD Data Linkage Scheme, have you already discussed your request with a member of the Research team?
Yes No*
*Please contact the CPRD Research Team on +44 (20) 3080 6383 or email [email protected] to discuss your
requirements before submitting your application.
Please list below the name of the person/s at the CPRD with whom you have discussed your request.
Tim Williams
15. If you are seeking access to data held under the CPRD Data Linkage Scheme, please provide the following information:
The number of linked datasets requested:
A synopsis of the purpose(s) for which the linkages are required:
Is linkage to a local dataset with <1 million patients being requested?
Yes* No
* If yes, please provide further details:
16. If you have requested linked data sets, please indicate whether the Principal Investigator or any of the collaborators listed in response to question 5 above, have access to any of the linked datasets in a patient identifiable form, or associated with a patient index.
Yes* No
* If yes, please provide further details:
17. Does this protocol involve requesting any additional information from GPs?
Yes* No
* Please indicate what will be required:
Completion of questionnaires by the GP Yes No
Provision of anonymised records (e.g. hospital discharge summaries) Yes No
Other (please describe)
Any questionnaire for completion by GPs or other health care professional must be approved by ISAC before
circulation for completion.
18. Does this protocol describe a purely observational study using CPRD data (this may include the review of anonymised free text)?
Yes* No**
* Yes: If you will be using data obtained from the CPRD Group, this study does not require separate ethics
approval from an NHS Research Ethics Committee.
** No: You may need to seek separate ethics approval from an NHS Research Ethics Committee for this study.
The ISAC will provide advice on whether this may be needed.
19. Does this study involve linking to patient identifiable data from other sources?
Yes No
20. Does this study require contact with patients in order for them to complete a questionnaire?
Yes No
N.B. Any questionnaire for completion by patients must be approved by ISAC before circulation for completion.
21. Does this study require contact with patients in order to collect a sample?
Yes* No
* Please state what will be collected
22. Experience/expertise available
Please complete the following questions to indicate the experience/expertise available within the team of
researchers actively involved in the proposed research, including analysis of data and interpretation of results
Previous GPRD/CPRD Studies Publications using GPRD/CPRD data
None
1-3
> 3
Yes
No
Is statistical expertise available within the research team?
If yes, please outline level of experience Ability to pull internal
resources from Biostatisticians group if needed (approximately 40 team members)
Is experience of handling large data sets (>1 million records)
available within the research team?
If yes, please outline level of experience Variations in experience of up
to 10 years
Is UK primary care experience available within the research team?
If yes, please outline level of experience Prof Delaney leads Primary
Care Research
23. References relating to your study
Please list up to 3 references (most relevant) relating to your proposed study.
Diabetes trial number NCT00660907
PROTOCOL CONTENT CHECKLIST
In order to help ensure that protocols submitted for review contain adequate information for protocol
evaluation, ISAC have produced instructions on the content of protocols for research using CPRD data.
These instructions are available on the CPRD website (www.cprd.com/ISAC). All protocols using CPRD data
which are submitted for review by ISAC must contain information on the areas detailed in the instructions.
IF you do not feel that a specific area required by ISAC is relevant for your protocol, you will need to justify
this decision to ISAC.
Applicants must complete the checklist below to confirm that the protocol being submitted includes all the
areas required by ISAC, or to provide justification where a required area is not considered to be relevant for
a specific protocol. Protocols will not be circulated to ISAC for review until the checklist has been completed
by the applicant.
Please note, your protocol will be returned to you if you do not complete this checklist, or if
you answer ‘no’ and fail to include justification for the omission of any required area.
Included in protocol?
Required area Yes No If no, reason for omission
Lay Summary (max.200 words)
Background
Objective, specific aims and rationale
Study Type
Descriptive
Hypothesis Generating
Hypothesis Testing
Study Design
Sample size/power calculation
(Please provide justification of
sample size in the protocol)
Study population
(including estimate of expected number of
relevant patients in the CPRD)
Selection of comparison group(s) or controls
Exposures, outcomes and covariates
Exposures are clearly described
Outcomes are clearly described
Use of linked data
(if applicable)
Data/ Statistical Analysis Plan
There is plan for addressing confounding
There is a plan for addressing missing data
Patient/ user group involvement †
Limitations of the study design, data sources
and analytic methods
Plans for disseminating and communicating study results
† It is expected that many studies will benefit from the involvement of patient or user groups
in their planning and refinement, and/or in the interpretation of the results and plans for
further work. This is particularly, but not exclusively true of studies with interests in the impact
on quality of life. Please indicate whether or not you intend to engage patients in any of the
ways mentioned above.
Voluntary registration of ISAC approved studies: Epidemiological studies are increasingly being included in registries of research around the world, including those primarily set up for clinical trials. To increase awareness amongst researchers of ongoing research, ISAC encourages voluntary registration of epidemiological research conducted using MHRA databases. This will not replace information on ISAC approved protocols that may be published in its summary minutes or annual report. It is for the applicant to determine the most appropriate registry for their study. Please inform the ISAC secretariat that you have registered a protocol and provide the location.
CPRD Protocol
Summary
The EU has a longstanding strategy to promote greater safety and productivity
for EU healthcare via advanced Information Communication Technologies
(ICT). The European Commission is partly funding the TRANSFoRm project as
part of its efforts to advance information and computer science to address these
market challenges in a European context. TRANSFoRm will develop a digital
infrastructure that facilitates the reuse of Primary Care Real World electronic
Health Records (eHR) data to improve both patient safety and the conduct and
volume of Clinical Research in Europe.
Once complete, TRANSFoRm will act as a common software platform that will
allow investigators to recruit patients and collect data in primary care for
prospective studies. In addition, retrospective study analyses can be executed,
in which data can be sourced and combined with data extracted from existing
databases.
Purpose: This research explores the potential value-add opportunities that
CPRD technology, accessed through the TRANSFoRm platform, offers to
pharmaceutical companies and healthcare representative user groups seeking
to qualify patients for potential clinical trials.
Method: The strategy for conducting this research is built around
demonstrating, retrospectively, the impact of the platform on a clinical trial
used as a case study, in order to highlight the potential opportunities.
Significance: The motivation behind conducting this research is based upon
exploiting the potential opportunities in the commercialization of TRANSFoRm
to pharmaceutical companies. Any value-proposition of TRANSFoRm is
founded upon the validation of the platform itself. This would be reflected in a
more economic approach to patient identification, selection and recruitment for
clinical trials by executing queries in national patient registries.
1. Objectives, Specific Aims and Rationale
The objective of this study is to explore the potential to use an electronic
healthcare record (EHR) database (CPRD) as a resource for optimised patient
recruitment for interventional studies such as clinical trials. We intend to use
cohort identification algorithms against the CPRD as an example data source
for the UK. The resulting cohort and the identification procedure will be
compared to an actual clinical trial where recruitment has been undertaken
according to a set of inclusion and exclusion criteria. The same inclusion and
exclusion criteria will be modelled to identify the potential cohort of recruitable
patients in CPRD.
Specifically we intend to assess the following criteria, all of which could be
factors in an optimised method for identification and recruitment of patients for
clinical trials:
– A potential increased number of overall patients available for
recruitment identified using the CPRD compared to the actual
experience of the clinical trial we are using as a comparison
– The ability to identify (as far as possible within the governance rules of
CPRD) geographic clusters of patients in which to target recruitment
– Whether it would be possible to use a reduced number of clinical trial
sites to recruit the same number of patients as were recruited in the
comparator clinical trial
2. Background
The TRANSFoRm project has received funding from the European Union’s
Seventh Framework Programme for research, technological development and
demonstration3.
Figure 1. The TRANSFoRm concept4
Figure 1 displays the TRANSFoRm platform. The cyclical nature of the
concept reflects the learning process that feeds into the technology platform at
its centre.
The TRANSFoRm platform involves the following components:
3 http://www.transformproject.eu/
4 http://www.transformproject.eu/
– Query workbench: a generic tool for building a study description,
specifically in this situation for generating a query as a set of criteria
with which to identify potentially eligible patients from a generic data
source
– Semantic mediator: an ontological code set mapping / building tool,
which allows for the generation of code sets (in the case of CPRD
using Read codes as the resulting nomenclature) to enable the
identification of events within the data source under scrutiny
– Data Node Connector: a utility to enable application of a generic
query to a particular data source.
This study is designed to illustrate the value of the TRANSFoRm platform and
its ability to link with CPRD to extract patient records in order to aid patient
recruitment for clinical trials. Patient identification and potential recruitment will
be modelled using the TRANSFoRm platform linked to CPRD (by modelling a
logical query modified to be compatible with the target database) to replicate
the feasibility and recruitment activities of an existing clinical trial. The trial in
question is a Diabetes trial (EudraCT Number: 2007-005220-33 and
ClinicalTrials.gov identifier: NCT00660907) to assess the effect of dapagliflozin
plus metformin compared to glipizide plus metformin on the absolute change
from baseline in HbA1c level after 52 weeks double-blind treatment in patients
with type 2 diabetes who have inadequate glycaemic control on 1500 mg/day
or higher doses of metformin therapy alone. Further information on the trial can
be found on ClinicalTrials.gov [1] or the EU Clinical Trials Register [2].
3. Study Type
This study is a comparative descriptive analysis of a cohort identification
process. It is also a validation study.
It is an attempt to validate a technological platform and ultimately highlight the
value of EHR data as a means of conducting feasibility and potential
recruitment for clinical trials.
4. Study Design
A stepwise application of inclusion and exclusion criteria to the CPRD database
study population (see ‘Study Population’ section) will be undertaken based
upon the criteria for the comparator clinical trial. In order to validate the
TRANSFoRm platform, a generic query for the study will be developed using
the Study workbench, which will then be mapped to a set of CPRD-specific
criteria to apply to the database.
In order to make the comparison with the original trial an ‘enrolment’ date of
January 1st 2013 will be used. Relative to this reference date, only live,
permanently registered patients, currently registered at the practice, from
actively contributing practices with a last collection date after this point will be
considered. All patients will need to have had a prior registration period of 8
weeks or more.
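The registration rules above can be sketched as a single date check. This is a minimal illustration only: the protocol states that the criteria are applied in SAS, and the function name, parameter names and the treatment of missing end dates below are assumptions rather than CPRD field definitions.

```python
from datetime import date, timedelta
from typing import Optional

ENROLMENT_DATE = date(2013, 1, 1)            # reference date from the protocol
MIN_PRIOR_REGISTRATION = timedelta(weeks=8)  # required prior registration period


def registered_for_study(
    registration_start: date,
    registration_end: Optional[date],   # None = still registered (assumption)
    death_date: Optional[date],         # None = no recorded death (assumption)
    practice_last_collection: date,
    permanently_registered: bool,
    reference: date = ENROLMENT_DATE,
) -> bool:
    """Return True if the patient meets the registration rules at the reference date."""
    alive = death_date is None or death_date > reference
    currently_registered = registration_end is None or registration_end > reference
    practice_active = practice_last_collection > reference
    long_enough = registration_start <= reference - MIN_PRIOR_REGISTRATION
    return (permanently_registered and alive and currently_registered
            and practice_active and long_enough)
```

With a reference date of 1st January 2013, the 8-week rule means a patient must have been registered on or before 6th November 2012.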
Inclusion criteria:
– Type 2 Diabetes (all Read codes under the hierarchy C10*)
– Metformin or insulin prescription for at least 8 weeks prior to enrolment
– Most recent HbA1c >6.5% and <=10% (or equivalent in mmol/mol)
Exclusion Criteria:
– Type 1 Diabetes (all Read codes within the hierarchy C108*)
– Insulin therapy within one year of enrolment
– Renal (kidney) failure or dysfunction
• Acute kidney failure
• Chronic kidney failure
• Chronic kidney failure stage 1 to stage 5
• Chronic kidney failure, unspecified stage
• Unspecified kidney failure
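The HbA1c inclusion bounds are given in percent with an equivalent in mmol/mol. As a hedged illustration of that equivalence, the sketch below uses the standard IFCC-DCCT master equation, mmol/mol = (percent − 2.15) × 10.929. The function names are illustrative and are not taken from the protocol.

```python
def hba1c_percent_to_mmol_mol(percent: float) -> float:
    """Convert HbA1c from DCCT units (%) to IFCC units (mmol/mol)."""
    return (percent - 2.15) * 10.929


def hba1c_in_range(value_mmol_mol: float,
                   lower_percent: float = 6.5,
                   upper_percent: float = 10.0) -> bool:
    """Check the inclusion bound: > lower and <= upper, compared in mmol/mol."""
    lower = hba1c_percent_to_mmol_mol(lower_percent)
    upper = hba1c_percent_to_mmol_mol(upper_percent)
    return lower < value_mmol_mol <= upper
```

Under this conversion the bounds of 6.5% and 10% correspond to roughly 47.5 and 85.8 mmol/mol.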
From within the study population, patients identified using this approach will be
assigned to one of 3 categories:
1. Number of patients that do not comply with the recruitment criteria and
therefore would be excluded from the trial (any patients with the
presence of any exclusion criteria, or no evidence of metformin
therapy)
2. Number of patients with missing recruitment criteria information who
would require additional screening to qualify them for trial recruitment
(any patients with no usable HbA1c data or with non-specific diabetes
diagnosis codes)
3. Number of patients who do qualify for trial enrolment
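The three categories can be expressed as a small decision function. This is a sketch under stated assumptions, not the study's SAS implementation: the flag names are hypothetical, each flag is assumed to have been derived from the coded record beforehand, and placing a patient with an out-of-range HbA1c in the excluded category is an assumption, since the protocol does not say where such patients fall.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    EXCLUDED = 1         # does not comply with the recruitment criteria
    NEEDS_SCREENING = 2  # missing information, requires additional screening
    QUALIFIES = 3        # qualifies for trial enrolment


@dataclass
class PatientFlags:
    type1_code: bool                       # any Type 1 diabetes code
    type2_code: bool                       # any Type 2 specific diabetes code
    metformin_8_weeks: bool                # metformin for at least 8 weeks
    insulin_within_year: bool              # insulin within one year of enrolment
    renal_failure: bool                    # any renal failure / dysfunction code
    latest_hba1c_percent: Optional[float]  # None if no usable HbA1c value


def classify(p: PatientFlags) -> Category:
    # Category 1: any exclusion criterion, or no evidence of metformin therapy.
    if p.type1_code or p.insulin_within_year or p.renal_failure:
        return Category.EXCLUDED
    if not p.metformin_8_weeks:
        return Category.EXCLUDED
    # Category 2: no usable HbA1c, or only non-specific diabetes codes.
    if p.latest_hba1c_percent is None or not p.type2_code:
        return Category.NEEDS_SCREENING
    # Assumption: an HbA1c outside (6.5%, 10%] counts as non-compliance.
    if not (6.5 < p.latest_hba1c_percent <= 10.0):
        return Category.EXCLUDED
    return Category.QUALIFIES
```

Exclusions are checked first so that a patient with both an exclusion criterion and missing data is counted once, in category 1.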
To account for geographic proximity between the trial study sites and the GP
practices featured in the CPRD dataset, CPRD will assess the distance between
trial sites and CPRD practices on the basis of postcode (under appropriate
governance restrictions, undertaken by non-research staff within CPRD).
Patient numbers will be provided in two bands: < 15 km from the study site and
> 15 km from the study site. These results will be aggregated by study site and
provided where 3 or more practices fall within a category. For any category
with fewer than 3 practices, results will be aggregated to < 15 km from a study
site and > 15 km from a study site only. No data will be provided where cell
size is fewer than 5 patients.
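One possible reading of these aggregation and disclosure rules is sketched below. The input layout (one tuple per practice, with a distance already derived from postcodes), the function names, and the handling of a distance of exactly 15 km are assumptions. The sketch also reads the fallback rule as pooling across sites any site-level cell that has fewer than 3 practices.

```python
from collections import defaultdict

MIN_PRACTICES = 3   # a site-level cell needs at least this many practices
MIN_CELL_SIZE = 5   # cells with fewer patients are suppressed
BAND_KM = 15.0


def band(distance_km: float) -> str:
    # Assumption: exactly 15 km is placed in the far band.
    return "<15 km" if distance_km < BAND_KM else ">15 km"


def summarise(practices):
    """practices: iterable of (site, distance_km, eligible_patients), one per practice.

    Returns (by_site, pooled); suppressed cells are reported as None.
    """
    cells = defaultdict(lambda: [0, 0])  # (site, band) -> [n_practices, n_patients]
    for site, distance_km, patients in practices:
        cell = cells[(site, band(distance_km))]
        cell[0] += 1
        cell[1] += patients

    by_site = {}
    pooled = defaultdict(int)  # band -> patients pooled across sites
    for (site, b), (n_practices, n_patients) in cells.items():
        if n_practices >= MIN_PRACTICES:
            by_site[(site, b)] = n_patients
        else:
            pooled[b] += n_patients

    def suppress(n):
        return n if n >= MIN_CELL_SIZE else None

    return ({k: suppress(v) for k, v in by_site.items()},
            {k: suppress(v) for k, v in pooled.items()})
```

The sketch does not re-check the practice count of a pooled cell; whether that is needed would depend on CPRD's governance rules.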
The original trial featured study sites in the following locations.
Reading (Berkshire)
Aylesbury
Cookstown (N. Ireland)
Ecclesfield (Sheffield)
Addlestone (Surrey)
Trowbridge (Wiltshire)
Bath
Bolton
Bury St Edmunds
Coventry
Edinburgh
In order to undertake the analysis, original study site postcodes will be made
available to non-research staff within CPRD.
5. Sample Size
No sample size stipulation is required as we are comparing numbers of patients
eligible for trial enrolment using CPRD in comparison with an existing clinical
trial.
The study population will be in excess of 200,000 patients.
6. Data Linkage
No data linkage is required.
7. Study Population
The study population to be provided for analysis is defined as any live,
permanently registered patient, currently registered at the practice, from
actively contributing practices with a last collection date after 1st January 2014
with a previous diagnosis of diabetes mellitus (type specific or non-specific).
8. Selection of comparison group(s) or controls
The patient recruitment data for clinical trial NCT00660907 will be used as a
comparator.
9. Exposures, Outcomes, and Covariates
The key variables or events of interest are:
Patient ID
General Practice ID
Age
Gender
Diabetes 1 [defined by: ICPC-code T90 or CTG code 13029 or
C108-2/C108-3]
Diabetes 2 [defined by: ICPC-code T90 or C109-2/C109-3]
Last prescription date [for Metformin] in the period from 1st May 2012 to
31st Dec 2012
Last prescription date [for Insulin] in the period from 1st July 2011 to
31st Dec 2012
Last date on which diabetes type 1 [T90.01] was registered
Last date on which diabetes type 2 [T90.02] was registered
Last date on which diabetes [T90; type unknown] was registered
Last date on which kidney failure / dysfunction was registered [U99.01]
Last date on which other kidney disease / kidney failure was registered
[U99 not registered / unknown]
Patient listed at the practice in the period from 1st July 2011 to 31st Dec 2012
End date of the patient's listing at the practice, where this falls in the period
from 1st July 2011 to 31st Dec 2012
Date of last registered HbA1c value in 2011-2012
Last registered HbA1c value in 2011-2012 (mmol/mol)
10. Data / Statistical Analysis
Application of the criteria will be undertaken using SAS and stepwise reporting
of patient numbers will be generated.
The analysis is based upon starting with an initial total patient population count
and tracking the reduction in eligible patients at each application of any
inclusion / exclusion criteria. The resultant patients will be categorised into 3
groups (see ‘Study Design’ section) in order to segment the patient population
for comparison with the results of the original trial.
Comparison with the results of the recruitment for the original trial will be
presented. No statistical comparisons will be undertaken.
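The stepwise reporting described in this section amounts to an attrition count. The protocol specifies SAS for the actual analysis; the Python sketch below only illustrates the logic, and the record layout and step labels are hypothetical.

```python
def attrition(patients, steps):
    """Apply each (label, keep) step in order and record how many patients remain.

    patients: iterable of patient records (any type the predicates accept)
    steps:    list of (label, predicate); a patient is kept when predicate is True
    Returns a list of (label, remaining_count), starting with the initial count.
    """
    remaining = list(patients)
    report = [("Initial population", len(remaining))]
    for label, keep in steps:
        remaining = [p for p in remaining if keep(p)]
        report.append((label, len(remaining)))
    return report


# Illustrative records only; these are not study data.
example_patients = [
    {"t2dm": True, "metformin": True, "insulin": False},
    {"t2dm": True, "metformin": False, "insulin": False},
    {"t2dm": False, "metformin": True, "insulin": False},
    {"t2dm": True, "metformin": True, "insulin": True},
]
example_steps = [
    ("Type 2 diabetes code", lambda p: p["t2dm"]),
    ("Metformin therapy", lambda p: p["metformin"]),
    ("No insulin within one year", lambda p: not p["insulin"]),
]
example_report = attrition(example_patients, example_steps)
```

Each entry in the report gives the number of patients still eligible after that criterion, so the drop between consecutive entries is the number removed at that step.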
11. Patient or user group involvement
Currently no patient groups are involved as this is a methodological
investigation not pertaining specifically to any clinical area. Diabetes is used as
an exemplar. The findings of this report will be sent to the EU Commission to
show the potential value of TRANSFoRm in conjunction with CPRD.
12. Limitations of the study design, data sources and analytical methods
The following limitations are noted:
The comparison is being made between potentially recruitable patients
from an eHR database and patients actually recruited to a real clinical
trial.
The patient identification in CPRD relies only upon coded data from
within the eHR database. In reality, free text data and non-coded data
would influence an individual patient’s eligibility to be recruited.
This is one retrospective clinical trial in one indication limited to the UK.
Expanding the scope across several trials in the same indication, other
indications and other countries could yield very different results.
Patient recruitment criteria in the diabetes trial are simple in the sense
that they did not require specialized staff, specific expertise or
infrastructure at the site in order to perform the patient screening.
Whether trials with more complex requirements would benefit from the
TRANSFoRm / CPRD dataset would need to be evaluated.
The information available in the TRANSFoRm / CPRD dataset would be
constrained by the completeness of the data, the technical ability to
retrieve the information and governance decisions to restrict access.
13. Plans for disseminating and communicating study results
The findings of this analysis will be delivered to the European Union at:
http://www.transformproject.eu/
Quintiles has been working in collaboration with King’s College London &
Imperial College London and there is a commitment to produce a peer-reviewed
publication based upon this work.
On the completion of the TRANSFoRm project the results of the study and
associated work will be made available in a transparent manner.
References
[1] http://www.clinicaltrials.gov/ct2/show/record/NCT00660907?term=d1690c00004&rank=1
[2] https://www.clinicaltrialsregister.eu/ctr-search/search?query=D1690C00004