PharmaSUG 2019 - Paper DS-87
Strategy to Evaluate the Quality of Clinical Data from CROs
Charley Wu, Rob Tarney, Atara Biotherapeutics
1. ABSTRACT:
More and more pharmaceutical and biotech companies (Sponsors) are using CROs for data management.
High‐quality data is key to statistical analysis, FDA/global regulatory submissions, and product
approval. As most clinical data management work is now done by CROs, a key obligation of the Sponsor
is to evaluate the quality of the clinical data that CROs produce. This oversight is a multi‐faceted
challenge that requires substantial planning to ensure a positive outcome.
This paper introduces a comprehensive approach that includes both automatic and manual data review
efforts. Automated review (Auto‐review) consists of 1) a data structure check, 2) a new data check, 3) an
edit check, 4) a SAE reconciliation, 5) a PK reconciliation, 6) a lab data normalization and reports, 7) a
critical variable check, 8) the usage of ad‐hoc reports.
Auto‐review is primarily achieved through targeted SAS® programming. Manual review is done by cross‐
functional partners such as sponsor‐based Medical Monitors, Pharmacovigilance, Clinical Operations, and
Data Management. Manual review usually identifies approximately 5‐10% of data issues,
while auto‐review can identify upwards of 90%. Many manual review findings lead to more
auto‐review programming in a feedback loop that adds more and/or smarter checks with each round
of findings and thus improves review proficiency.
A manual review of a typical database usually takes a few weeks, while a smart auto‐review system
takes mere hours. This comprehensive approach greatly improves data
quality and enables sponsors to lock the database with confidence.
2. INTRODUCTION:
Using CRO services for data management has become the industry standard for pharmaceutical and biotech
companies. This approach enables sponsors to start up clinical trials quickly and run studies with limited in‐
house resources. Most clinical trials produce millions of data points, so there will inevitably be many data
issues to investigate. Experienced CROs can usually produce high‐quality clinical trial data, but no one can
guarantee that the data is completely clean. CROs usually transfer data to sponsors regularly and expect sponsors
to review the data and provide feedback quickly, so that CROs have enough time to issue queries and get
data issues resolved in time.
The big challenge for sponsors now is how to evaluate the quality of the data from the CROs. Many
sponsors check data only manually or sporadically due to limited resources. This approach can
identify some issues but cannot provide a big picture of the health of the data. Important issues
are often missed. For example, some lab units are off by a factor of 1000 (10^6/uL instead of
10^9/uL), and some AE end dates fall after the date of death. To be fair to sponsors, it is very hard to
identify these issues manually among millions of data points.
This paper introduces a comprehensive strategy that includes both automatic and manual data reviews.
This strategy can usually complete a whole‐database review within a single business day.
2. Auto Data Review:
Auto‐review consists of 1) Data structure check 2) New data check 3) Edit check 4) SAE reconciliation 5)
PK reconciliation 6) Lab data normalization and reports 7) Critical variable check 8) ad‐hoc reports.
Auto‐review is mainly achieved by SAS® programming. SAS® programs are created based on pre‐defined
specifications. These specifications may vary by study. However, there are many common checks across
studies such as checks for AE, CM, MH, DM, Lab etc. The more standard CRFs are used, the more checks
can be reused across studies.
It may take 1‐2 months to develop auto checks depending on the complexity of studies. Once they are
ready, they can be run repeatedly and greatly improve the data review efficiency. This is especially true if
a study lasts a few years.
An automated checking process usually takes about 4–6 hours to review the whole database.
In principle, sponsors could provide feedback to CROs the next day. CRO partners usually appreciate this
kind of quick turn‐around.
Below are some examples of automated checks we have put into practice across our portfolio:
2.1 Data Structure Check:
When data is transferred from CRO to sponsor, the first thing to do is to check whether the data
structure follows what is defined in the EDC database specification. This step will check the following:
a. Variable Name
b. Variable Type (date, numeric, character)
c. Variable label. SDTM only allows 40 chars. It is better to follow SDTM standard
d. Variable Format. Some use codelists
e. Codelist: If a variable uses a codelist, is the value in code‐list?
f. Any new or updated or deleted variable
g. Any new or updated or deleted dataset
Sample Output: Differences between Sponsor DB Spec and Vendor Database
| Message | Dataset name | Var Name | Var Label | Var Type | Var Length | Vendor Var Label | Vendor Var Type | Vendor Var Length |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Var Label different | AE | HLGT | Primary HLGT | Char | 200 | MedDRA, primary HLGT | Char | 200 |
| Var Length different | AE | HLT | Primary HLT | Char | 200 | Primary HLT | Char | 100 |
| New Variable in Vendor DB | SU_ALC | ALCFREQ | | | | Frequency | Char | 60 |
| Variable Removed from Vendor DB | SU_ALC | FREQ_ALC | Frequency | Char | 60 | | | |
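The comparison behind such a report can be sketched in a few lines. The paper's checks are written in SAS; the Python below is only an illustration, and the function name `compare_structure`, the metadata layout, and the attribute keys (`label`, `type`, `length`) are assumptions made for this sketch. The sample metadata mirrors the rows of the sample output above.

```python
from typing import Dict, List, Tuple

# Metadata keyed by (dataset, variable). The layout is an assumption for
# illustration, not the authors' actual specification format.
Meta = Dict[Tuple[str, str], Dict[str, object]]

ATTRIBUTES = (("label", "Var Label"), ("type", "Var Type"), ("length", "Var Length"))


def compare_structure(spec: Meta, vendor: Meta) -> List[Tuple[str, str, str]]:
    """Return (message, dataset, variable) findings, ordered by dataset and variable."""
    findings = []
    for key in sorted(set(spec) | set(vendor)):
        dataset, var = key
        if key not in spec:
            findings.append(("New Variable in Vendor DB", dataset, var))
        elif key not in vendor:
            findings.append(("Variable Removed from Vendor DB", dataset, var))
        else:
            # Variable exists on both sides: compare each attribute in turn.
            for attr, display in ATTRIBUTES:
                if spec[key].get(attr) != vendor[key].get(attr):
                    findings.append((f"{display} different", dataset, var))
    return findings


# Sample metadata taken from the sample output above.
spec = {
    ("AE", "HLGT"): {"label": "Primary HLGT", "type": "Char", "length": 200},
    ("AE", "HLT"): {"label": "Primary HLT", "type": "Char", "length": 200},
    ("SU_ALC", "FREQ_ALC"): {"label": "Frequency", "type": "Char", "length": 60},
}
vendor = {
    ("AE", "HLGT"): {"label": "MedDRA, primary HLGT", "type": "Char", "length": 200},
    ("AE", "HLT"): {"label": "Primary HLT", "type": "Char", "length": 100},
    ("SU_ALC", "ALCFREQ"): {"label": "Frequency", "type": "Char", "length": 60},
}

findings = compare_structure(spec, vendor)
for message, dataset, var in findings:
    print(f"{message}: {dataset}.{var}")
```

A renamed variable shows up as one "new" and one "removed" finding, as in the sample output; deciding that the two belong together is left to the reviewer.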
2.2 New/Updated Data Listings:
Data transfers from CROs are usually cumulative. Partner functions such as Medical Monitors, Clinical
Operations, and Data Managers often ask for tools that let them avoid reviewing the same data
repeatedly. They prefer to review only the data that is new or
updated since the last review. Therefore, in addition to creating regular data listings, we also
create listings of new and updated data only, which makes more efficient use of reviewers' time.
Sample Output: new and updated data only by dataset.
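One way to derive such listings is to compare the current cumulative transfer with the previous one, record by record. The Python sketch below is only an illustration of that idea and is not the authors' SAS implementation; the function name `diff_transfers`, the key fields, and the sample records are all made up for this example.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def diff_transfers(
    previous: List[Record], current: List[Record], key_fields: Tuple[str, ...]
) -> Dict[str, List[Record]]:
    """Split the current transfer into records that are new or updated
    relative to the previous transfer. Unchanged records are dropped."""
    previous_by_key = {tuple(r[k] for k in key_fields): r for r in previous}
    result: Dict[str, List[Record]] = {"new": [], "updated": []}
    for record in current:
        key = tuple(record[k] for k in key_fields)
        if key not in previous_by_key:
            result["new"].append(record)
        elif record != previous_by_key[key]:
            result["updated"].append(record)
    return result


# Made-up AE records; (SUBJECT, AESEQ) is assumed to identify a record.
previous_ae = [
    {"SUBJECT": "0001", "AESEQ": "1", "AETERM": "Headache", "AEENDT": ""},
    {"SUBJECT": "0001", "AESEQ": "2", "AETERM": "Nausea", "AEENDT": "2017-02-01"},
]
current_ae = [
    {"SUBJECT": "0001", "AESEQ": "1", "AETERM": "Headache", "AEENDT": "2017-02-10"},
    {"SUBJECT": "0001", "AESEQ": "2", "AETERM": "Nausea", "AEENDT": "2017-02-01"},
    {"SUBJECT": "0002", "AESEQ": "1", "AETERM": "Fatigue", "AEENDT": ""},
]

changes = diff_transfers(previous_ae, current_ae, ("SUBJECT", "AESEQ"))
print(len(changes["new"]), "new,", len(changes["updated"]), "updated")
```

The sketch assumes each dataset has a stable record key across transfers; without one, updated records cannot be told apart from new ones.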
2.3 Data Edit Checks
EDC databases usually include many simple checks at the data field level. Most EDC edit
checks are univariate checks that do not require comparison with other fields. Some checks
run across multiple CRF pages, but checks with complicated logic are very hard to implement within
the EDC system. Therefore, it is very common to implement these checks on SAS® datasets outside
the EDC system. Examples include identifying duplicate adverse events, AE vs CM reconciliation,
tracking tumor responses to drugs over time, and finding prohibited medications.
Please note that these edit checks are intended for complicated logic and do not replace regular EDC
checks.
Sample Output: Discrepancies between End of Treatment and AE
| Message | Subject | Date of Last Dose | End of Treatment Reason | End of Study Date |
| --- | --- | --- | --- | --- |
| End of Treatment is Death. But there is no AE with Fatal Outcome. | 999999 | 05-Jun-2017 | DEATH | 16-Jun-2017 |
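A cross‐dataset check like the one in this sample output could look roughly as follows. This is a Python sketch for illustration only (the paper's checks are SAS programs); the function name, the field names (`EOT_REASON`, `OUTCOME`, and so on), and the second subject are assumptions. Only the flagged row reuses values from the sample output.

```python
from typing import Dict, List

MESSAGE = "End of Treatment is Death. But there is no AE with Fatal Outcome."


def death_without_fatal_ae(
    disposition: List[Dict[str, str]], adverse_events: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Flag subjects whose End of Treatment reason is death but who have
    no adverse event with a fatal outcome."""
    fatal_subjects = {
        ae["SUBJECT"] for ae in adverse_events if ae["OUTCOME"].upper() == "FATAL"
    }
    return [
        {"MESSAGE": MESSAGE, **row}
        for row in disposition
        if row["EOT_REASON"].upper() == "DEATH" and row["SUBJECT"] not in fatal_subjects
    ]


# Subject 999999 comes from the sample output; subject 999998 and all AE
# records are made up to show a case that passes the check.
disposition = [
    {"SUBJECT": "999999", "LAST_DOSE": "05-Jun-2017", "EOT_REASON": "DEATH", "EOS_DATE": "16-Jun-2017"},
    {"SUBJECT": "999998", "LAST_DOSE": "01-May-2017", "EOT_REASON": "DEATH", "EOS_DATE": "09-May-2017"},
]
adverse_events = [
    {"SUBJECT": "999999", "AETERM": "Pneumonia", "OUTCOME": "Recovered"},
    {"SUBJECT": "999998", "AETERM": "Sepsis", "OUTCOME": "Fatal"},
]

discrepancies = death_without_fatal_ae(disposition, adverse_events)
for row in discrepancies:
    print(row["SUBJECT"], row["MESSAGE"])
```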
2.4 SAE reconciliation:
Drug safety is an important part of clinical trials. All Serious Adverse Events (SAEs) from clinical trials
need to be reported to the FDA in a timely manner, so data availability plays a key role in this process. Typically,
a safety database (e.g. Argus) is used to capture all SAEs and to create MedWatch reports.
Pharmacovigilance physicians usually build SAE cases that include concomitant medications, vital
signs, medications, lab test results, death records, etc.
Clinical databases not only capture SAEs but also capture non‐serious Adverse Events (AEs). These
data are used for safety analysis and other regulatory needs such as IND annual reporting. As SAEs are
captured independently in both Safety database and EDC database, reconciliation is required to
ensure the data are consistent. SAEs must be reconciled before database lock or reporting out of the
data.
SAE reconciliation usually includes 6 key variables: AE verbatim, MedDRA preferred term, Start date,
Stop date, Severity, and Outcome. This is usually done by CROs manually. It is a very tedious and
challenging process as new SAEs keep coming and existing SAEs keep changing due to queries.
To overcome this issue, we developed a program to do it automatically. This program aligns SAEs
from the Safety database and the EDC database automatically. Any prior queries and comments can be
preserved. The automated process greatly improves reconciliation efficiency and accuracy. While it may take
a week to reconcile 50 SAEs manually, it takes only about 1 hour to reconcile them automatically.
Sample Output: SAE Reconciliation
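The core of such a reconciliation, comparing the six key variables for each SAE in both sources, can be sketched as below. This Python example is an illustration under stated assumptions and not the authors' program: it assumes both databases share a hypothetical case identifier (`CASE_ID`) for pairing records, which real data may not provide, and it does not cover the preservation of prior queries and comments. All records are made up.

```python
from typing import Dict, List, Tuple

# The six key variables named in the text; the field names are assumptions.
KEY_VARS = ("VERBATIM", "PT", "START", "STOP", "SEVERITY", "OUTCOME")

Finding = Tuple[Tuple[str, str], str, str]


def reconcile_saes(safety: List[Dict[str, str]], edc: List[Dict[str, str]]) -> List[Finding]:
    """Pair SAEs by (SUBJECT, CASE_ID) and report missing records and
    key-variable mismatches. Comparison ignores case and outer whitespace."""
    safety_by_id = {(r["SUBJECT"], r["CASE_ID"]): r for r in safety}
    edc_by_id = {(r["SUBJECT"], r["CASE_ID"]): r for r in edc}
    findings: List[Finding] = []
    for key in sorted(set(safety_by_id) | set(edc_by_id)):
        s, e = safety_by_id.get(key), edc_by_id.get(key)
        if s is None:
            findings.append((key, "Missing in Safety DB", ""))
        elif e is None:
            findings.append((key, "Missing in EDC", ""))
        else:
            for var in KEY_VARS:
                if s[var].strip().upper() != e[var].strip().upper():
                    findings.append((key, f"{var} mismatch", f"Safety={s[var]!r}; EDC={e[var]!r}"))
    return findings


# Made-up records for illustration.
safety_db = [
    {"SUBJECT": "0001", "CASE_ID": "C-01", "VERBATIM": "Pneumonia", "PT": "Pneumonia",
     "START": "2017-03-01", "STOP": "2017-03-10", "SEVERITY": "Grade 3", "OUTCOME": "Recovered"},
    {"SUBJECT": "0002", "CASE_ID": "C-02", "VERBATIM": "Sepsis", "PT": "Sepsis",
     "START": "2017-04-02", "STOP": "2017-04-20", "SEVERITY": "Grade 4", "OUTCOME": "Recovered"},
]
edc_db = [
    {"SUBJECT": "0001", "CASE_ID": "C-01", "VERBATIM": "pneumonia", "PT": "Pneumonia",
     "START": "2017-03-01", "STOP": "2017-03-12", "SEVERITY": "Grade 3", "OUTCOME": "Recovered"},
]

sae_findings = reconcile_saes(safety_db, edc_db)
for key, message, detail in sae_findings:
    print(key, message, detail)
```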
2.5 PK reconciliation
Most clinical trials have Pharmacokinetic (PK) data. Blood samples are taken at pre‐defined timepoints
(e.g. 30min before injection, 30 min post injection, 1hr, 3hr, …) and then sent to PK labs for analysis.
These timepoints are also logged into CRFs. Later, PK lab test results will be sent back to CDM. At that
time, sample timepoints from CRFs and from PK labs need to be reconciled to make sure 1) no samples
are missing, 2) sample timepoints are identical between CRFs and PK lab.
This is usually done manually by CROs. As there might be thousands of PK samples in a clinical trial,
keeping track of them is a big headache for the study team, and manual reconciliation is even more
challenging. Sample timepoints are very often found to be missing, mismatched, or
wrong. Many sponsors just take the data as is with minimal reconciliation, which creates a lot
of issues for PK scientists when they analyze and interpret the data later.
To evaluate whether PK data has been reconciled properly by CRO, we developed a program to do it
automatically. The program can reconcile thousands of timepoints in one hour. It is fast and accurate.
Many of our CRO partners end up using our auto reconciliation report for query and sample tracking.
Sample Output: Reconciliation between EDC and PK lab data
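The two conditions stated above (no samples missing, and identical timepoints in the CRF and PK lab data) translate into a simple keyed comparison. The Python sketch below is illustrative only and not the authors' SAS program; the function name, the field names, and the sample records are assumptions, and it assumes each subject has at most one sample per nominal timepoint.

```python
from typing import Dict, List, Tuple


def reconcile_pk(crf: List[Dict[str, str]], lab: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Compare PK samples logged in the CRF with those reported by the PK lab.
    Samples are paired by (SUBJECT, TIMEPOINT); COLLECTED is the sample time."""
    crf_times = {(r["SUBJECT"], r["TIMEPOINT"]): r["COLLECTED"] for r in crf}
    lab_times = {(r["SUBJECT"], r["TIMEPOINT"]): r["COLLECTED"] for r in lab}
    findings = []
    for key in sorted(set(crf_times) | set(lab_times)):
        subject, timepoint = key
        if key not in lab_times:
            findings.append((subject, timepoint, "Sample in CRF but missing from PK lab data"))
        elif key not in crf_times:
            findings.append((subject, timepoint, "Sample in PK lab data but missing from CRF"))
        elif crf_times[key] != lab_times[key]:
            findings.append(
                (subject, timepoint, f"Collection time differs: CRF={crf_times[key]}, Lab={lab_times[key]}")
            )
    return findings


# Made-up samples for one subject.
crf_samples = [
    {"SUBJECT": "1001", "TIMEPOINT": "PRE-DOSE", "COLLECTED": "2017-01-05T08:30"},
    {"SUBJECT": "1001", "TIMEPOINT": "30 MIN POST", "COLLECTED": "2017-01-05T09:30"},
    {"SUBJECT": "1001", "TIMEPOINT": "1 HR", "COLLECTED": "2017-01-05T10:00"},
    {"SUBJECT": "1001", "TIMEPOINT": "3 HR", "COLLECTED": "2017-01-05T12:00"},
]
lab_samples = [
    {"SUBJECT": "1001", "TIMEPOINT": "PRE-DOSE", "COLLECTED": "2017-01-05T08:30"},
    {"SUBJECT": "1001", "TIMEPOINT": "30 MIN POST", "COLLECTED": "2017-01-05T09:30"},
    {"SUBJECT": "1001", "TIMEPOINT": "1 HR", "COLLECTED": "2017-01-05T10:15"},
]

pk_findings = reconcile_pk(crf_samples, lab_samples)
for finding in pk_findings:
    print(*finding, sep=" | ")
```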
2.6 Lab data normalization and reports:
Almost all clinical trials have lab data. Lab data is usually the largest dataset with millions of records
in a database. Very often, efficacy endpoints and safety signals are in the lab data. Therefore, lab
data must be cleaned as much as possible.
Lab data may come from different local labs and central labs, which may use different test names,
units, and ranges. They must be standardized to be useful for statistical analysis. Standardization is
also important for data management to ensure fit‐for‐purpose data. Once a lab test is normalized to
a standard unit, it is easy to compare data across labs and to identify data issues.
Manual data review of millions of records is almost impossible. Therefore, we developed a lab
normalization system to address this issue. Details of the system were published in a PharmaSUG
2017 paper titled “Laboratory Data Standardization with SAS®”, which won the 2017 Best Paper Award
in the Application Section.
Many lab reports can be created from normalized lab data. One example is finding duplicates.
In the report below, the two original lab results look different, but after normalization to the
same unit they turn out to be duplicates. The lab later confirmed that it had sent
the same result in two records with different units.
Sample Output: Lab test results duplicates after normalization
| Subject | Lab Test | Collection Date | Original Result | Original Unit | Standard Numeric Result | Standard Unit |
| --- | --- | --- | --- | --- | --- | --- |
| 9999 | HBV DNA | 12/3/2013 | 2940 | copies/mL | 3.47 | log copies/mL |
| 9999 | HBV DNA | 12/3/2013 | 3.47 | log copies/mL | 3.47 | log copies/mL |
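The duplicate in this report becomes visible because log10(2940) is approximately 3.47. The Python sketch below reproduces that single conversion and the duplicate grouping. It is an illustration only, not the normalization system described in the cited paper: the function names are hypothetical, only the two units from the sample output are handled, and rounding to two decimals before comparing is an assumption.

```python
import math
from collections import defaultdict
from typing import Dict, List


def to_log_copies(result: float, unit: str) -> float:
    """Convert an HBV DNA result to log10 copies/mL, rounded to 2 decimals."""
    if unit == "copies/mL":
        return round(math.log10(result), 2)
    if unit == "log copies/mL":
        return round(result, 2)
    raise ValueError(f"Unsupported unit: {unit}")


def find_duplicates(records: List[Dict[str, object]]) -> List[List[Dict[str, object]]]:
    """Group records by subject, test, date, and standardized result, and
    return the groups that contain more than one record."""
    groups = defaultdict(list)
    for record in records:
        standard = to_log_copies(float(record["RESULT"]), str(record["UNIT"]))
        groups[(record["SUBJECT"], record["TEST"], record["DATE"], standard)].append(record)
    return [group for group in groups.values() if len(group) > 1]


# The two records from the sample output above.
lab_records = [
    {"SUBJECT": "9999", "TEST": "HBV DNA", "DATE": "12/3/2013", "RESULT": 2940, "UNIT": "copies/mL"},
    {"SUBJECT": "9999", "TEST": "HBV DNA", "DATE": "12/3/2013", "RESULT": 3.47, "UNIT": "log copies/mL"},
]

duplicates = find_duplicates(lab_records)
print(to_log_copies(2940, "copies/mL"))  # 3.47
print(len(duplicates))  # 1
```

Comparing rounded values keeps the sketch short; a production check would more likely compare within a tolerance appropriate to each test.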
2.7 Critical Variables Check
In clinical trials, millions of data points are collected, but not all of them carry the same weight.
Data points related to efficacy and safety are usually the most important. Therefore, CDM must
collaborate with functional partners such as Biostats and Medical Sciences to identify critical variables
during the studies. These data must be collected, source document verified (as per the study
monitoring plan), and properly cleaned before database lock.
These critical variables are given priority for data review and cleaning. To review them
efficiently, special data listings and edit checks are created. CDM usually reviews these data
comprehensively.
Many people believe that all clinical trial data should be cleaned and free of errors. That is
an aspirational goal, achievable only with unlimited resources and time at both the sponsor and the
study sites. Unfortunately, in real life, sponsors, study sites, CROs, and vendors usually have limited
resources and tight timelines. Therefore, setting data cleaning priorities is key to locking the
database successfully and on time. This is also the reason why Risk‐Based data monitoring is getting more and
more popular. The FDA also endorses this more focused approach to monitoring
the data.
In general, the AE, EX, DS, and DM datasets should be given priority in review and should be 100% cleaned.
2.8 AD‐HOC reports
As every study is different, ad‐hoc reports are part of the data review process. Some studies tend to have
a lot of ad‐hoc reports, which take a lot of programming resources, so it is better to use standard
CRFs and to follow CDASH/SDTM as much as possible. The goal is to use more standard reports across
studies. If an ad‐hoc report is frequently used, it is better to promote it to a standard report.
Sample Output: Patient Treatment vs Responses
2.9 Manual Review:
Though most data in the database can be checked by programming, there are always exceptions. That is
especially true for data that require medical expertise, for example, whether an AE resulting in death
is valid (e.g. can a common cold result in death?), whether a medication is valid for a symptom (e.g. Tylenol
for hypertension), or whether an abnormal finding in a Physical Exam should be considered an AE.
Many protocol deviations are also identified by manual data review.
Manual data review is important, but it is tedious and time consuming. It can easily take medical
monitors 1‐3 weeks to finish one round of data review. To facilitate this process, CDM can provide
Medical Monitors with patient profiles, new‐data‐only listings, critical variable listings, ad‐hoc
reports, CSRs, etc. However, the more reports medical monitors get, the more time the review takes.
Based on our experience, many manual data review tasks can be achieved by SAS® programming,
which reduces the burden on medical monitors and improves review efficiency. It is therefore
important to communicate with the clinical team regularly and to transfer as much manual review as
possible to auto‐review.
Sample Output: Queries generated by Manual Review
3. Discussions
CROs usually try very hard to meet sponsors’ expectations. Given the complexity of many studies,
close collaboration between CROs and Sponsors is critical for study success. Sponsors usually have an
oversight plan that governs study setup, implementation, and communications. In many cases, vendors
need clear and concise instructions and clarifications from sponsors. In addition to the CROs’ diligent data review,
sponsors usually need to review the data as well, because sponsors are ultimately responsible for the quality of
the data. The combined auto and manual data review approach described in this paper, if used efficiently,
should enable sponsors to lock the database with confidence.
4. Contact Information
Your comments and questions are valued and encouraged. Contact the authors at:
Author Name: Charley Z. Wu
E‐mail: [email protected], [email protected]
Author Name: Rob Taney
E‐mail: [email protected]
SAS® and all other SAS® Institute Inc. product or service names are registered trademarks or
trademarks of SAS® Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.