
PharmaSUG 2019 - Paper DS-87

Strategy to Evaluate the Quality of Clinical Data from CROs

Charley Wu, Rob Tarney, Atara Biotherapeutics

1. ABSTRACT:

More and more pharmaceutical and biotech companies (Sponsors) are using CROs for data management. High-quality data is key to statistical analysis, FDA and global regulatory submissions, and product approval. As most Clinical Data Management work is now done by CROs, a key obligation of the Sponsor is to evaluate the quality of the clinical data that CROs produce. This oversight is a multi-faceted challenge that requires substantial planning to ensure a positive outcome.

This paper introduces a comprehensive approach that includes both automatic and manual data review efforts. Automated review (Auto-review) consists of 1) a data structure check, 2) a new data check, 3) an edit check, 4) an SAE reconciliation, 5) a PK reconciliation, 6) lab data normalization and reports, 7) a critical variable check, and 8) ad-hoc reports.

Auto-review is primarily achieved by targeted SAS® programming. Manual review is done by cross-functional partners such as sponsor-based Medical Monitors, Pharmacovigilance, Clinical Operations and Data Management. Manual review can usually identify approximately 5-10% of data issues, while auto-review can identify upwards of 90% of issues. Many manual review findings lead to more auto-review programming in a feedback loop that adds more and smarter checks with each round of findings, improving review proficiency.

Reviewing a typical database usually takes a few weeks manually, but only hours with a smart auto-review system. This comprehensive approach greatly improves data quality and enables sponsors to lock the database with confidence.

2. INTRODUCTION:

Using CRO services for data management has become the industry standard for pharmaceutical and biotech companies. This approach enables sponsors to start up clinical trials quickly and run studies with limited in-house resources. Most clinical trials produce millions of data points, so there will certainly be many data issues to investigate. Experienced CROs can usually produce high-quality clinical trial data, but no one can guarantee that the data is completely clean. CROs usually transfer data to sponsors regularly and expect sponsors to review the data and provide feedback quickly, so that CROs have enough time to issue queries and get data issues resolved.

The big challenge for sponsors now is how to evaluate the quality of the data from the CROs. Because of limited resources, many sponsors check data only manually or sporadically. This approach can identify some issues but cannot provide a big picture of the health of the data, and important issues are often missed. For example, some lab units are off by a factor of 1000 (10^6/ul instead of 10^9/ul), and some AE end dates fall after the date of death. To be fair to sponsors, it is very hard to identify these issues manually among millions of datapoints.

This paper introduces a comprehensive strategy that includes both automatic and manual data reviews. This strategy can usually finish a whole-database review within a single business day.

2. Auto Data Review:

Auto-review consists of 1) Data structure check, 2) New data check, 3) Edit check, 4) SAE reconciliation, 5) PK reconciliation, 6) Lab data normalization and reports, 7) Critical variable check, and 8) Ad-hoc reports.

Auto‐review is mainly achieved by SAS® programming. SAS® programs are created based on pre‐defined

specifications. These specifications may vary by study. However, there are many common checks across

studies such as checks for AE, CM, MH, DM, Lab etc. The more standard CRFs are used, the more checks

can be reused across studies.

It may take 1‐2 months to develop auto checks depending on the complexity of studies. Once they are

ready, they can be run repeatedly and greatly improve the data review efficiency. This is especially true if

a study lasts a few years.

An automated checking process usually takes about 4–6 hours to review the whole database. In theory, sponsors could provide feedback to CROs the next day. CRO partners usually appreciate this kind of quick turn-around.

Below are some examples of automated checks we have put into practice across our portfolio:

2.1 Data Structure Check:

When data is transferred from CRO to sponsor, the first thing to do is to check whether the data

structure follows what is defined in the EDC database specification. This step will check the following:

a. Variable Name

b. Variable Type (date, numeric, character)

c. Variable label. SDTM allows at most 40 characters, so it is better to follow the SDTM standard

d. Variable format. Some variables use codelists

e. Codelist: if a variable uses a codelist, is each value in the codelist?

f. Any new or updated or deleted variable

g. Any new or updated or deleted dataset

Sample Output: Differences between Sponsor DB Spec and Vendor Database

| Message | Dataset name | Var Name | Var Label | Var Type | Var Length | Vendor Var label | Vendor Var Type | Vendor Var Length |
|---|---|---|---|---|---|---|---|---|
| Var Label different | AE | HLGT | Primary HLGT | Char | 200 | MedDRA, primary HLGT | Char | 200 |
| Var Length different | AE | HLT | Primary HLT | Char | 200 | Primary HLT | Char | 100 |
| New Variable in Vendor DB | SU_ALC | ALCFREQ | | | | Frequency | Char | 60 |
| Variable Removed from Vendor DB | SU_ALC | FREQ_ALC | Frequency | Char | 60 | | | |
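The structure check can be sketched as a comparison of two metadata tables. The authors implement their checks in SAS®; the Python below is only an illustrative sketch, and the dictionary layout, the names `SPEC`, `VENDOR` and `compare_structure`, and the choice of compared attributes are assumptions rather than the paper's implementation. The values mirror the sample output above.

```python
# Hypothetical metadata: (dataset, variable) -> (label, type, length).
# The layout is an assumption; the values mirror the sample output.
SPEC = {
    ("AE", "HLGT"): ("Primary HLGT", "Char", 200),
    ("AE", "HLT"): ("Primary HLT", "Char", 200),
    ("SU_ALC", "FREQ_ALC"): ("Frequency", "Char", 60),
}
VENDOR = {
    ("AE", "HLGT"): ("MedDRA, primary HLGT", "Char", 200),
    ("AE", "HLT"): ("Primary HLT", "Char", 100),
    ("SU_ALC", "ALCFREQ"): ("Frequency", "Char", 60),
}

ATTRIBUTES = ("Label", "Type", "Length")


def compare_structure(spec, vendor):
    """Return (message, dataset, variable) for every structural difference."""
    findings = []
    for key in sorted(spec.keys() | vendor.keys()):
        dataset, variable = key
        if key not in spec:
            findings.append(("New Variable in Vendor DB", dataset, variable))
        elif key not in vendor:
            findings.append(("Variable Removed from Vendor DB", dataset, variable))
        else:
            # Variable exists on both sides: compare each attribute in turn.
            for name, expected, actual in zip(ATTRIBUTES, spec[key], vendor[key]):
                if expected != actual:
                    findings.append((f"Var {name} different", dataset, variable))
    return findings


structure_findings = compare_structure(SPEC, VENDOR)
for finding in structure_findings:
    print(finding)
```

A codelist check (item e) could be added in the same style by comparing each collected value against the set of allowed values for its variable.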

2.2 New/Updated Data Listings:

Data transfers from CROs are usually cumulative. Partner functions such as Medical Monitors, Clinical Operations, and Data Managers often ask for tools that let them avoid reviewing the same data repeatedly; they prefer to review only the data that is new or updated since the last review. Therefore, in addition to regular data listings, we also create listings of new and updated data only, which makes more efficient use of reviewers' time.

Sample Output: new and updated data only by dataset.
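One way to derive such a listing is to compare the current cumulative transfer with the previous one on a record key. The sketch below is in Python rather than the SAS® the authors use, and the record key (`SUBJECT`, `AESEQ`), the field names and the sample records are all hypothetical.

```python
def new_and_updated(previous, current, key_fields):
    """Classify records in the current cumulative transfer against the prior one."""
    def key_of(record):
        return tuple(record[field] for field in key_fields)

    prior_by_key = {key_of(rec): rec for rec in previous}
    flagged = []
    for rec in current:
        old = prior_by_key.get(key_of(rec))
        if old is None:
            flagged.append(("NEW", rec))       # key not seen in the prior transfer
        elif old != rec:
            flagged.append(("UPDATED", rec))   # same key, at least one value changed
    return flagged


# Hypothetical AE records from two consecutive cumulative transfers.
previous_ae = [
    {"SUBJECT": "1001", "AESEQ": 1, "AETERM": "Headache", "AEOUT": "ONGOING"},
    {"SUBJECT": "1001", "AESEQ": 2, "AETERM": "Nausea", "AEOUT": "RESOLVED"},
]
current_ae = [
    {"SUBJECT": "1001", "AESEQ": 1, "AETERM": "Headache", "AEOUT": "RESOLVED"},
    {"SUBJECT": "1001", "AESEQ": 2, "AETERM": "Nausea", "AEOUT": "RESOLVED"},
    {"SUBJECT": "1002", "AESEQ": 1, "AETERM": "Fatigue", "AEOUT": "ONGOING"},
]

changes = new_and_updated(previous_ae, current_ae, key_fields=("SUBJECT", "AESEQ"))
for status, record in changes:
    print(status, record["SUBJECT"], record["AESEQ"], record["AETERM"])
```

This sketch does not report records that were deleted between transfers; that would need a second pass over the previous transfer's keys.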

2.3 Data Edit Checks

An EDC database usually includes many simple checks at the data field level. Most EDC edit checks are univariate checks that do not require comparison to other fields. Some checks run across multiple CRF pages, but checks with complicated logic are very hard to implement within the EDC system. Therefore, it is very common to implement these checks on SAS® datasets outside EDC. Examples include identifying duplicate adverse events, AE vs CM reconciliation, tumor responses to drugs over time, and finding prohibited medications.

Please note that these edit checks are intended for complicated logic and do not replace regular EDC checks.

Sample Output: Discrepancies between End of Treatment and AE

| Message | Subject | Date of Last Dose | End of Treatment Reason | End of Study Date |
|---|---|---|---|---|
| End of Treatment is Death. But there is no AE with Fatal Outcome. | 999999 | 05-Jun-2017 | DEATH | 16-Jun-2017 |
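A cross-dataset check of this kind can be sketched as follows. This is an illustration in Python, not the authors' SAS® code; the field names (`EOT_REASON`, `AEOUT`), the coded values and all records other than subject 999999 from the sample output are assumptions.

```python
def death_without_fatal_ae(disposition, adverse_events):
    """Flag subjects whose End of Treatment reason is DEATH but who have no fatal AE."""
    subjects_with_fatal_ae = {
        ae["SUBJECT"] for ae in adverse_events if ae["AEOUT"] == "FATAL"
    }
    return [
        ds["SUBJECT"]
        for ds in disposition
        if ds["EOT_REASON"] == "DEATH"
        and ds["SUBJECT"] not in subjects_with_fatal_ae
    ]


# Hypothetical disposition and AE records.
disposition = [
    {"SUBJECT": "999999", "EOT_REASON": "DEATH"},
    {"SUBJECT": "999998", "EOT_REASON": "DEATH"},
    {"SUBJECT": "999997", "EOT_REASON": "COMPLETED"},
]
adverse_events = [
    {"SUBJECT": "999999", "AETERM": "Pneumonia", "AEOUT": "RESOLVED"},
    {"SUBJECT": "999998", "AETERM": "Sepsis", "AEOUT": "FATAL"},
]

flagged_subjects = death_without_fatal_ae(disposition, adverse_events)
print(flagged_subjects)
```

Because the check joins two CRF domains, it is the kind of logic that is easier to run on extracted datasets than inside the EDC system.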

2.4 SAE reconciliation:

Drug safety is an important part of clinical trials. All Serious Adverse Events (SAEs) from clinical trials must be reported to the FDA in a timely manner, so data availability plays a key role in this process. Typically, a safety database (e.g. Argus) is used to capture all SAEs and to create MedWatch reports. Pharmacovigilance physicians usually build SAE cases that include concomitant medications, vital signs, medications, lab test results, death records, etc.

Clinical databases capture not only SAEs but also non-serious Adverse Events (AEs). These data are used for safety analysis and other regulatory needs such as IND annual reporting. As SAEs are captured independently in both the safety database and the EDC database, reconciliation is required to ensure the data are consistent. SAEs must be reconciled before database lock or any reporting of the data.

SAE reconciliation usually includes 6 key variables: AE verbatim, MedDRA preferred term, Start date,

Stop date, Severity, and Outcome. This is usually done by CROs manually. It is a very tedious and

challenging process as new SAEs keep coming and existing SAEs keep changing due to queries.

To overcome this issue, we developed a program that aligns SAEs from the safety database and the EDC database automatically. Any prior queries and comments can be preserved. The automated process greatly improves reconciliation efficiency and accuracy: reconciling 50 SAEs manually may take a week, while the program takes only about 1 hour.

Sample Output: SAE Reconciliation
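The comparison of the six key variables can be sketched as below. The paper does not describe how its SAS® program aligns cases, so this Python sketch assumes both sources share a (subject, SAE number) key; that key, the variable names and the sample records are all hypothetical, and carrying forward prior queries and comments is not covered.

```python
KEY_VARIABLES = (
    "VERBATIM", "PREFERRED_TERM", "START_DATE", "STOP_DATE", "SEVERITY", "OUTCOME",
)


def sae(*values):
    """Build one SAE record from values given in KEY_VARIABLES order."""
    return dict(zip(KEY_VARIABLES, values))


def reconcile_saes(safety_db, edc_db):
    """Compare SAEs keyed by (subject, SAE number) across the two sources."""
    findings = []
    for case_id in sorted(safety_db.keys() | edc_db.keys()):
        safety, edc = safety_db.get(case_id), edc_db.get(case_id)
        if edc is None:
            findings.append((case_id, "SAE missing from EDC database"))
        elif safety is None:
            findings.append((case_id, "SAE missing from safety database"))
        else:
            for variable in KEY_VARIABLES:
                if safety[variable] != edc[variable]:
                    detail = f"{variable}: safety={safety[variable]!r}, EDC={edc[variable]!r}"
                    findings.append((case_id, detail))
    return findings


# Hypothetical cases; the assumed key is (subject, SAE number).
safety_db = {
    ("1001", 1): sae("Pneumonia", "Pneumonia", "2017-05-01", "2017-05-10", "SEVERE", "RECOVERED"),
    ("1002", 1): sae("Heart attack", "Myocardial infarction", "2017-06-03", "2017-06-03", "SEVERE", "FATAL"),
}
edc_db = {
    ("1001", 1): sae("Pneumonia", "Pneumonia", "2017-05-01", "2017-05-12", "SEVERE", "RECOVERED"),
    ("1003", 1): sae("Sepsis", "Sepsis", "2017-07-20", "2017-07-29", "SEVERE", "RECOVERED"),
}

sae_findings = reconcile_saes(safety_db, edc_db)
for case_id, message in sae_findings:
    print(case_id, message)
```

In practice the two databases rarely share a clean key, so real alignment would likely need fuzzier matching on subject, term and dates.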

2.5 PK reconciliation

Most clinical trials have Pharmacokinetic (PK) data. Blood samples are taken at pre-defined timepoints (e.g. 30 min before injection, 30 min post injection, 1 hr, 3 hr, …) and then sent to PK labs for analysis. These timepoints are also logged in CRFs. Later, the PK lab test results are sent back to CDM. At that time, sample timepoints from the CRFs and from the PK labs need to be reconciled to make sure that 1) no samples are missing and 2) sample timepoints are identical between the CRFs and the PK lab.

This is usually done by CROs manually. As there might be thousands of PK samples in a clinical trial, keeping track of them is a big headache for the study team, and manual reconciliation is even more challenging. Sample timepoints are often found to be missing, mismatched, or wrong. Many sponsors just take the data as is with minimal reconciliation, which creates a lot of issues for PK scientists when they analyze and interpret the data later.

To evaluate whether PK data has been reconciled properly by CRO, we developed a program to do it

automatically. The program can reconcile thousands of timepoints in one hour. It is fast and accurate.

Many of our CRO partners end up using our auto reconciliation report for query and sample tracking.

Sample Output: Reconciliation between EDC and PK lab data
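The two reconciliation goals (no missing samples, identical timepoints) can be sketched as a keyed comparison. This Python sketch is not the authors' SAS® program; the (subject, nominal timepoint) key, the timestamp format and the sample data are assumptions.

```python
def reconcile_pk(crf_samples, lab_samples):
    """Compare PK samples keyed by (subject, nominal timepoint) between CRF and lab."""
    findings = []
    for key in sorted(crf_samples.keys() | lab_samples.keys()):
        crf_time, lab_time = crf_samples.get(key), lab_samples.get(key)
        if lab_time is None:
            findings.append((key, "Sample in CRF but not in PK lab data"))
        elif crf_time is None:
            findings.append((key, "Sample in PK lab data but not in CRF"))
        elif crf_time != lab_time:
            findings.append((key, f"Collection time differs: CRF {crf_time}, lab {lab_time}"))
    return findings


# Hypothetical collection times for one subject.
crf_samples = {
    ("1001", "PRE-DOSE"): "2017-06-01T08:30",
    ("1001", "30 MIN POST"): "2017-06-01T09:30",
    ("1001", "1 HR POST"): "2017-06-01T10:00",
}
lab_samples = {
    ("1001", "PRE-DOSE"): "2017-06-01T08:30",
    ("1001", "30 MIN POST"): "2017-06-01T09:45",
    ("1001", "3 HR POST"): "2017-06-01T12:00",
}

pk_findings = reconcile_pk(crf_samples, lab_samples)
for key, message in pk_findings:
    print(key, message)
```

The resulting list doubles as a sample-tracking report, since every unmatched key points at a sample that one side has not accounted for.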

2.6 Lab data normalization and reports:

Almost all clinical trials have lab data. Lab data is usually the largest dataset with millions of records

in a database. Very often, efficacy endpoints and safety signals are in the lab data. Therefore, lab

data must be cleaned as much as possible.

Lab data may come from different local labs and central labs, which may use different test names, units, and ranges. They must be standardized to be useful for statistical analysis. Standardization is also important for data management to ensure fit-for-purpose data. Once a lab test is normalized to a standard unit, it is easy to compare data across labs and to identify data issues.

Manual data review of millions of records is almost impossible. Therefore, we developed a lab normalization system to address this issue. Details of the system were published at PharmaSUG 2017 in a paper titled “Laboratory Data Standardization with SAS®”. This paper won the 2017 Best Paper Award in the Application Section.

Many lab reports can be created based on normalized lab data. One example is finding duplicates. The report below shows two lab results whose original values differ. After normalization to the same unit, however, they turn out to be duplicates. The lab later confirmed that it had sent the same lab data in two records with different units.

Sample Output: Lab test results duplicates after normalization

| Subject | Lab Test | Collection Date | Original Result | Original Unit | Standard Numeric Result | Standard Unit |
|---|---|---|---|---|---|---|
| 9999 | HBV DNA | 12/3/2013 | 2940 | copies/mL | 3.47 | log copies/mL |
| 9999 | HBV DNA | 12/3/2013 | 3.47 | log copies/mL | 3.47 | log copies/mL |
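The duplicate in the sample output follows from log10(2940) ≈ 3.47. A minimal Python sketch of normalizing and then grouping is shown below; it is not the authors' SAS® normalization system, it handles only the two units in the example, and the two-decimal rounding and field names are assumptions.

```python
import math
from collections import defaultdict


def to_log_copies_per_ml(result, unit):
    """Convert a viral load result to log10 copies/mL, rounded to 2 decimals."""
    if unit == "log copies/mL":
        return round(result, 2)
    if unit == "copies/mL":
        return round(math.log10(result), 2)
    raise ValueError(f"No conversion defined for unit {unit!r}")


def find_duplicates(records):
    """Group records by subject, test, date and normalized result; keep groups > 1."""
    groups = defaultdict(list)
    for rec in records:
        standard = to_log_copies_per_ml(rec["RESULT"], rec["UNIT"])
        groups[(rec["SUBJECT"], rec["TEST"], rec["DATE"], standard)].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# The two records from the sample output.
lab_records = [
    {"SUBJECT": "9999", "TEST": "HBV DNA", "DATE": "12/3/2013", "RESULT": 2940, "UNIT": "copies/mL"},
    {"SUBJECT": "9999", "TEST": "HBV DNA", "DATE": "12/3/2013", "RESULT": 3.47, "UNIT": "log copies/mL"},
]

duplicates = find_duplicates(lab_records)
for key, recs in duplicates.items():
    print(key, len(recs))
```

Comparing on the original result and unit would miss this pair, which is why the duplicate report is built on the normalized values.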

2.7 Critical Variables Check

In clinical trials, millions of datapoints are collected, but not all of them bear the same weight. Datapoints related to efficacy and safety are usually the most important. Therefore, CDM must collaborate with functional partners such as Biostats and Medical Sciences to identify critical variables during the studies. These data must be collected, source document verified (as per the study monitoring plan), and properly cleaned before database lock.

These critical variables are given priority for data review and cleaning. To review them efficiently, special data listings and edit checks are created. CDM usually reviews these data comprehensively.

Many people believe that all clinical trial data should be cleaned and free of errors. That is an aspirational goal that assumes unlimited resources and time at both the sponsor and the study sites. Unfortunately, in real life, sponsors, study sites, CROs, and vendors usually have limited resources and tight timelines. Therefore, setting data cleaning priorities is key to locking the database successfully and on time. This is also why risk-based data monitoring is becoming more and more popular. FDA also endorses this approach as a way to focus monitoring of the data.

In general, the AE, EX, DS, and DM datasets should be given priority review and should be 100% cleaned.
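One simple form a critical variable check could take is a completeness listing. The Python sketch below is only an illustration of that idea, not the authors' SAS® listings; the selection of critical variables, the variable names and the sample records are hypothetical.

```python
# Hypothetical choice of critical variables per dataset.
CRITICAL_VARIABLES = {
    "AE": ("AETERM", "AESTDTC"),
    "DM": ("SEX", "BRTHDTC"),
}


def missing_critical_values(datasets, critical_variables):
    """Return (dataset, row number, variable) for each missing critical value."""
    findings = []
    for dataset_name, variables in critical_variables.items():
        rows = datasets.get(dataset_name, [])
        for row_number, record in enumerate(rows, start=1):
            for variable in variables:
                if record.get(variable) in (None, ""):
                    findings.append((dataset_name, row_number, variable))
    return findings


# Hypothetical study data with two gaps.
study_data = {
    "AE": [
        {"SUBJECT": "1001", "AETERM": "Headache", "AESTDTC": "2017-05-01"},
        {"SUBJECT": "1002", "AETERM": "Nausea", "AESTDTC": ""},
    ],
    "DM": [
        {"SUBJECT": "1001", "SEX": "F", "BRTHDTC": "1970-02-14"},
        {"SUBJECT": "1002", "SEX": None, "BRTHDTC": "1965-11-30"},
    ],
}

critical_findings = missing_critical_values(study_data, CRITICAL_VARIABLES)
for finding in critical_findings:
    print(finding)
```

Keeping the list of critical variables in one table makes it easy for Biostats and Medical Sciences to review and extend it during the study.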

2.8 AD‐HOC reports

As every study is different, ad-hoc reports are part of the data review process. Some studies tend to have a lot of ad-hoc reports, which take a lot of programming resources, so it is better to use standard CRFs and to follow CDASH/SDTM as much as possible. The goal is to use more standard reports across studies. If an ad-hoc report is frequently used, it is better to promote it to a standard report.

Sample Output: Patient Treatment vs Responses

2.9 Manual Review:

Though most data in a database can be checked by programming, there are always exceptions. That is especially true for data that need medical expertise. Examples include whether an AE resulting in death is valid (e.g. can a common cold result in death?), whether a medication is valid for a symptom (e.g. Tylenol for hypertension), and whether an abnormal finding in a Physical Exam should be considered an AE. Many protocol deviations are also identified by manual data review.

Manual data review is important, but it is tedious and time consuming. It can easily take medical monitors 1-3 weeks to finish one round of data review. To facilitate this process, CDM can provide Medical Monitors with patient profiles, new-data-only listings, critical variable listings, ad-hoc reports, CSRs, etc. However, the more reports medical monitors get, the more time the review will take.

Based on our experience, many manual data review tasks can be achieved by SAS® programming, which reduces the burden on medical monitors and improves review efficiency. It is therefore important to communicate with the clinical team regularly and to transfer as much manual review as possible to auto-review.

Sample Output: Queries generated by Manual Review

3. Discussions

CROs usually try very hard to meet sponsors’ expectations. Because many studies are complex, close collaboration between CROs and Sponsors is critical for study success. Sponsors usually have an oversight plan that governs study setup, implementation, and communication. In many cases, vendors need clear and concise instructions and clarifications from sponsors. In addition to the CROs’ diligent data review, sponsors usually need to review the data as well, because sponsors are ultimately responsible for the quality of the data. The combined auto and manual data review approach described in this paper, if used efficiently, should enable sponsors to lock the database with confidence.

4. Contact Information

Your comments and questions are valued and encouraged. Contact the authors at:

Author Name: Charley Z. Wu
E-mail: [email protected], [email protected]

Author Name: Rob Taney
E-mail: [email protected]

SAS® and all other SAS® Institute Inc. product or service names are registered trademarks or

trademarks of SAS® Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.
