Implementation of CDISC at BI – Overview

© 2009 IBM Corporation


CDISC German User Group Meeting Sep 2009

Dr. Jens Wientges
IBM Global Business Services, Life Sciences / Pharma Consulting


1. Motivation, Objectives and expected Benefits

2. System Landscape, Data Flow and Processes

3. Approach

4. Real life examples of issues and sponsor defined elements







Implementing CDISC at BI (ICBI) - Motivation

– Requests for analyses on substance/project databases (SDB/PDB) are increasing
• need to effectively use and exploit clinical data beyond single trials
• need to build efficient substance databases

– A harmonized data model based on CDISC allows for
• a wider range of standard reporting tools
• re-use of standard programs
• facilitated familiarization with new trials/projects
• higher flexibility in assignments to projects
• quicker response to regulatory requests (same view on data)

- BI has taken the decision to implement the CDISC data standards to effectively manage, exploit and report clinical data


ICBI - Objectives

Corporate-wide, harmonized clinical data structure

1. Effective for:
- single clinical trials
- pooled databases (PDB)

2. Operational data structure, allowing:
- data quality checks
- ADS/ADaM generation
- ad hoc statistical analysis

3. Based on the principles of the CDISC data standards


ICBI - Business Benefits

Shown in three categories:

1. Submission / Regulatory Compliance

2. Knowledge Generation

3. Effort & Time Saving


ICBI - Business Benefits 1: Submission / Regulatory Compliance

– Working with a data structure close to the one requested for submission
• allows traceability from analysis data (ADaM) back to raw data (BI-CDISC and plain SDTM)
• allows for semi-automated generation of plain SDTM and define.xml, which
  - is a one-time effort per submission
  - is less time consuming
  - creates no external costs

– Having the same view on data as authorities
• increases transparency
• leads to higher efficiency / shorter turn-around time in answering questions

A standardized data structure will
- further enhance compliance with regulatory requirements
- allow more efficient creation of the submission package


ICBI - Business Benefits 2: Knowledge Generation

Working with one data structure across trials:
• Allows easier creation of PDB and pooling of trial data
• Leads to effective meta-analyses on project and/or substance level
• Increases re-use of standard programs, program templates and views
• Supports exchange between OPUs and functions (e.g. PK/PD, PGx, partners, …)
• Allows (semi-)automated load, transformation and incorporation of external data from vendors, suppliers, pharmaceutical and collaboration partners
• Leads to higher flexibility in assignments to trial & project tasks
• Reduces time to answer internal requests (various customers, e.g. medical affairs)
• Reduces time to answer external (regulatory) questions

A standardized data structure will further enhance effective pooling of data and pooled analyses


ICBI - Business Benefits 3: Effort & Time Saving

Working with BI-CDISC facilitates downstream processes:
• Semi-automated generation of define.xml for SDTM and ADS/ADaM
  - no review cycles for define.xml generated externally
• Same view on data as authorities
  - increases transparency
  - results in higher efficiency in answering questions
• A higher degree of automation, making use of metadata (CDR)
  - enables more efficient programming
  - reduces validation efforts
• Reduces effort for creation of standard ADS/ADaM

A standardized data structure will
- establish a higher level of standardization
- further enhance analysis with reduced timelines







Chosen Approach for BI-CDISC

In line with the recommendations of the SDTM and Analysis Datasets Implementation Expert Team for a CDISC data standards implementation, we defined the following cornerstones for our data model:

1. Define a sponsor specific in-house data-structure (BI-CDISC) and create SDTM and ADaM/ADS in parallel from there

2. Definition of transformation rules from BI-CDISC to SDTM and from BI-CDISC to ADaM/ADS (but not creating ADS from SDTM)

3. The data model contains both collected and derived data

4. The data model will omit RELREC and SUPPQUAL (will only be created upon generation of plain SDTM for submission)

5. BI-CDISC will make use of the SDTM vocabulary

• SDTM-vocabulary defined as variable metadata and controlled terminology, not the SDTM structure

6. BI-CDISC is defined by metadata and (long-term vision) metadata shall drive the transformations from BI-CDISC to SDTM and ADaM/ADS. Traceability between SDTM and ADaM is sufficiently ensured by including the SEQ variable in the CDR and inheriting it into SDTM/ADaM, and/or by metadata defining the various transformation steps
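The cornerstones above (one in-house structure, SDTM and ADaM/ADS derived from it in parallel, transformations driven by metadata, SEQ inherited for traceability) can be illustrated with a minimal Python sketch. The dataset, the variable names ASTDT and AESTDT and the rule tables are hypothetical and are not taken from the BI data model; the actual implementation runs in the CDR (LSH).

```python
# Hypothetical BI-CDISC adverse event records (illustrative values only).
bi_cdisc_ae = [
    {"STUDYID": "T001", "USUBJID": "T001-0001", "AESEQ": 1,
     "AETERM": "Headache", "AESTDT": 17800, "AESTDTC": "2008-09-25"},
    {"STUDYID": "T001", "USUBJID": "T001-0002", "AESEQ": 1,
     "AETERM": "Nausea", "AESTDT": 17805, "AESTDTC": "2008-09-30"},
]

# Assumed metadata: target variable -> source variable in BI-CDISC.
SDTM_RULES = {"STUDYID": "STUDYID", "USUBJID": "USUBJID", "AESEQ": "AESEQ",
              "AETERM": "AETERM", "AESTDTC": "AESTDTC"}
ADAM_RULES = {"STUDYID": "STUDYID", "USUBJID": "USUBJID", "AESEQ": "AESEQ",
              "AETERM": "AETERM", "ASTDT": "AESTDT"}


def transform(records, rules):
    """Apply a metadata-defined variable mapping to every record."""
    return [{target: rec[source] for target, source in rules.items()}
            for rec in records]


def trace_key(rec):
    """Key that links an SDTM or ADaM record back to its BI-CDISC source."""
    return (rec["STUDYID"], rec["USUBJID"], rec["AESEQ"])


# Both targets are derived from BI-CDISC, not ADaM from SDTM.
sdtm_ae = transform(bi_cdisc_ae, SDTM_RULES)
adam_ae = transform(bi_cdisc_ae, ADAM_RULES)
```

Because both outputs inherit AESEQ from the same source, a record in one can be matched to its counterpart in the other through `trace_key`.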


ICBI Data Flow through System Landscape

[Figure: data flow through the system landscape. Recoverable labels: Study Setup; O*C Trial Database; O*C Export; Data Load; CDR (LSH) Trial Database / Substance DB; Load from O*C and Transform in CDR (LSH); Transform CDR 1; Transform CDR 2; Trial 1 SDTM+; Trial 2 SDTM+; Pool as is; Pooled Database; ADS Dev.; Displays Dev.; SDTM, ADaM, Tables, Listings, Profiles + Metadata, define.xml; Final Report; Submission to FDA; Meta info (trial specifics manually, Master Mapping Table partially manually). Individual steps are marked "as is", "no change" or "Transform".]


Cornerstones of ICBI

There will be no impact on early processes like study setup, data entry, and user friendliness of RDC. Data cleaning and discrepancy management remain in O*C

ICBI requires a certain upfront effort (once for each trial) for the trial-specific transformation to SDTM+ and its QC/validation

Once data are available in the O*C database, they are loaded into LSH. Loading is triggered by a completed Batch Validation session in O*C

After loading the data into LSH, they can be automatically transformed into the SDTM+ structure (Load and transformation steps can be combined in one LSH workflow)

ADS/ADaM will be created from SDTM+ and form the basis for reporting

The submission data sets in plain SDTM are created by sub-setting and restructuring out of SDTM+ (can be automated)
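The sub-setting step from SDTM+ to plain SDTM can be sketched as follows. This is a minimal illustration in Python, assuming variable-level metadata that flags each "plus" variable; the flag table and the sample values are hypothetical, and the restructuring part (for example creating SUPPQUAL and RELREC) is not shown.

```python
# Assumed variable metadata for an SDTM+ lab dataset:
# True marks a "plus" variable that is not part of plain SDTM.
IS_PLUS_VARIABLE = {
    "STUDYID": False,
    "USUBJID": False,
    "PTNO": True,       # numeric patient number kept as a plus
    "LBSEQ": False,
    "LBTESTCD": False,
    "LBORRES": False,
    "LBORRESN": True,   # numeric result kept as a plus
}


def to_plain_sdtm(records, is_plus):
    """Keep only the variables that are not flagged as plus variables."""
    keep = [name for name, plus in is_plus.items() if not plus]
    return [{name: rec[name] for name in keep} for rec in records]


sdtm_plus_lb = [
    {"STUDYID": "T001", "USUBJID": "T001-0001", "PTNO": 1, "LBSEQ": 1,
     "LBTESTCD": "GLUC", "LBORRES": "5.4", "LBORRESN": 5.4},
]

plain_lb = to_plain_sdtm(sdtm_plus_lb, IS_PLUS_VARIABLE)
```

Driving the subset from metadata rather than from hard-coded variable lists is what makes the step a candidate for automation.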


The define.xml can be created semi-automatically from the metadata available in LSH, thus improving quality (fewer inconsistencies) and timely delivery of the final submission data sets

To gather all meta information needed for SDTM, ADS and define.xml, a process needs to be implemented to capture the meta information throughout the process (see Module “Meta Data Collection and Master Mapping Table”)

To enable DQRM reporting to be based on SDTM+, the data need to be available in SDTM+ structure early/close to First Patient In

Training would be required for all functions working with the data in LSH. The O*C part of the process would not be affected (overview training recommended only)
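The idea of creating define.xml semi-automatically from stored metadata can be sketched in Python. The metadata rows below are hypothetical. The element and attribute names (MetaDataVersion, ItemDef, OID, Name, DataType, Length) are borrowed from the CDISC ODM vocabulary that define.xml builds on, but the output is a simplified fragment and not a conformant define.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical variable metadata as it might be held in a metadata repository.
VARIABLE_METADATA = [
    {"dataset": "DM", "name": "USUBJID", "label": "Unique Subject Identifier",
     "type": "text", "length": 20},
    {"dataset": "DM", "name": "AGE", "label": "Age",
     "type": "integer", "length": 3},
]


def build_item_defs(metadata):
    """Create one ItemDef element per variable described in the metadata."""
    root = ET.Element("MetaDataVersion")
    for var in metadata:
        item = ET.SubElement(root, "ItemDef", {
            "OID": f"IT.{var['dataset']}.{var['name']}",
            "Name": var["name"],
            "DataType": var["type"],
            "Length": str(var["length"]),
        })
        description = ET.SubElement(item, "Description")
        ET.SubElement(description, "TranslatedText").text = var["label"]
    return root


xml_text = ET.tostring(build_item_defs(VARIABLE_METADATA), encoding="unicode")
```

Since the XML is generated from the same metadata that describes the datasets, variable names, types and labels cannot drift apart between data and documentation.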








Overall Approach

[Figure: overall mapping approach. Recoverable labels: Sources; O*C (OC Views); Mapping Table; BI-DM; Plain SDTM.]

Source documents named in the figure:
•SDTM Implementation Guide
•CDISC Controlled Terminology
•BI-DM User Requirements
•BI PDB Requirements
•BI GLIB CT (formats)
•ADaM IG
•BI ADS Guideline
•Data Quality Requirements
•T/PSAP
•ADS Plan
•Protocol
•aCRF


Overall Approach – Trials

Design Data Model based on two trials of indication A

Expand Data Model with two trials of indication B

Prove Data Model (PoC)
– Create Pooled Database (PDB) of all four trials
– Re-create trial ADS from PDB
– Create submission SDTM from PDB


Overall Approach – Teams

Treat/Exposure

Efficacy

Safety

Lab/Ext. Data

Keys & Relations
• One Rep from each Team

CT & Formats
• One Rep from each Team


Overall Approach – Scope for Teams

•Lab - External Data
•Safety
•Efficacy
•Treat. - Exposure - TD

Study A, Study B

O*C Views available for the studies used for mapping
•are the starting point for the mapping
•are divided up among the groups according to topics
  - topics are based on logical grouping of SDTM domains







Using --SEQ…

--SEQ should not be used for any SAS/SQL evaluation

--SEQ is dynamically assigned and might change until a database is locked

• If BI-CDISC datasets are created multiple times prior to lock, then --SEQ will be assigned differently whenever rows/observations of data have been added or removed

In different snapshots of the same trial the value of --SEQ will not be consistently applied to common observations

The Keys and Relations team does not consider the above points to be issues (maintaining consistency in --SEQ would be very difficult or impossible to achieve, with little or no gain)
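Why --SEQ is unstable across snapshots can be shown with a small Python sketch. It assumes one plausible assignment scheme (numbering each subject's records in sorted order at creation time); the actual BI-CDISC scheme is not described here, and the records are invented.

```python
def assign_seq(records, seq_var="AESEQ"):
    """Number each subject's records 1..n in sorted order (assumed scheme)."""
    ordered = sorted(records,
                     key=lambda r: (r["USUBJID"], r["AESTDTC"], r["AETERM"]))
    counters = {}
    numbered = []
    for rec in ordered:
        counters[rec["USUBJID"]] = counters.get(rec["USUBJID"], 0) + 1
        numbered.append({**rec, seq_var: counters[rec["USUBJID"]]})
    return numbered


# Snapshot 1: one adverse event is in the database.
snapshot_1 = [
    {"USUBJID": "T001-0001", "AETERM": "Nausea", "AESTDTC": "2008-09-30"},
]
# Snapshot 2: an earlier event was entered later, before database lock.
snapshot_2 = snapshot_1 + [
    {"USUBJID": "T001-0001", "AETERM": "Headache", "AESTDTC": "2008-09-25"},
]

seq_1 = {r["AETERM"]: r["AESEQ"] for r in assign_seq(snapshot_1)}
seq_2 = {r["AETERM"]: r["AESEQ"] for r in assign_seq(snapshot_2)}
# The same "Nausea" observation has AESEQ 1 in snapshot 1 and 2 in snapshot 2.
```

Any program that selects or merges on a literal --SEQ value would therefore pick a different observation after the second load, which is why --SEQ is unsuitable for evaluations.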


I. Pooling Identifiers / Keys

Proposed Variables are:

1. SUBSTANCE

2. PROJECT

3. STUDYID

4. USUBJID/PTNO

5. VISITNUM

6. TPTNUM

7. VISDT

8. --DT

9. --ONDT

10. --ENDT

11. --CAT

12. --SCAT

13. --TESTCD

14. --METHOD

15. --SPEC
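How such identifiers act as a pooling key can be sketched in Python. The example uses a subset of the proposed variables and invented records; it assumes that the patient number (PTNO) is only unique within a trial, so that the trial-level identifiers are needed once data are pooled.

```python
# Subset of the proposed pooling identifiers, used here as a composite key.
POOLING_KEYS = ("SUBSTANCE", "PROJECT", "STUDYID", "PTNO", "VISITNUM", "LBTESTCD")


def pool(*trials):
    """Pool trial datasets, failing loudly if the composite key is not unique."""
    pooled = {}
    for trial in trials:
        for rec in trial:
            key = tuple(rec[name] for name in POOLING_KEYS)
            if key in pooled:
                raise ValueError(f"duplicate key in pool: {key}")
            pooled[key] = rec
    return pooled


def lab_record(studyid, ptno, visitnum, value):
    """Build one hypothetical glucose record for substance S1 / project P1."""
    return {"SUBSTANCE": "S1", "PROJECT": "P1", "STUDYID": studyid,
            "PTNO": ptno, "VISITNUM": visitnum, "LBTESTCD": "GLUC",
            "LBORRESN": value}


trial_1 = [lab_record("T001", 1, 1, 5.4), lab_record("T001", 1, 2, 5.1)]
trial_2 = [lab_record("T002", 1, 1, 6.0), lab_record("T002", 1, 2, 5.8)]

pooled_db = pool(trial_1, trial_2)
# Patient number 1 occurs in both trials; STUDYID keeps the records apart.
```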


ICBI – Interdomain Dependencies

Mappings are often not trivial

– BI-CDISC variables should be derived only once and from one single source

– Domains have to be created/populated in a defined order
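A defined creation order can be derived from the dependencies between domains. The following Python sketch uses a topological sort from the standard library (`graphlib`, Python 3.9 or later); the dependency table is purely illustrative and does not reflect the actual ICBI dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: domain -> domains that must be populated first.
DOMAIN_DEPENDENCIES = {
    "DM": set(),
    "EX": {"DM"},
    "AE": {"DM", "EX"},
    "LB": {"DM"},
}

# static_order() yields every domain after all of its prerequisites.
creation_order = list(TopologicalSorter(DOMAIN_DEPENDENCIES).static_order())
```

A circular dependency makes `static_order()` raise `graphlib.CycleError`, which would flag a variable that is not derived from one single source.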


CT Consolidation – LABNM Format

For LABNM (>1000 code/decodes) it was decided to split them out to three variables (LBTESTCD, LBSPEC and LBMETHOD)

In special cases additional variables are required (position, fasting status, time, …)
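The split of a combined lab name code into three variables can be sketched as a lookup. The codes and decodes in this Python example are invented for illustration; the real LABNM format contains more than 1000 entries.

```python
# Hypothetical excerpt of a mapping from LABNM codes to
# (LBTESTCD, LBSPEC, LBMETHOD); the values are illustrative only.
LABNM_MAP = {
    101: ("GLUC", "SERUM", "ENZYMATIC"),
    102: ("GLUC", "URINE", "DIPSTICK"),
    201: ("HGB", "BLOOD", "PHOTOMETRY"),
}


def split_labnm(code):
    """Translate one LABNM code into the three consolidated variables."""
    testcd, spec, method = LABNM_MAP[code]
    return {"LBTESTCD": testcd, "LBSPEC": spec, "LBMETHOD": method}


serum_glucose = split_labnm(101)
urine_glucose = split_labnm(102)
```

With the split, the same test code (GLUC) is shared by both records, and specimen and method distinguish them instead of two unrelated LABNM codes.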


Identified SDTM+

Columns: Topic, SDTM, SDTM(+), Workload plain, Workload plus, R/B*

Topic: Numeric dates/times
SDTM: All dates are CHAR (ISO8601)
SDTM(+): Keep O*C dates (NUM) and ISO8601 dates in parallel
Workload plain: Medium, because all dates have to be transformed to ISO8601 and NUM for analysis
Workload plus: Low, because NUM dates are kept and used for analysis. No back-transformation necessary
R/B*: B

Topic: Missing SDTM definitions
SDTM: No definition available for some variables in SDTM V3.1.2
SDTM(+): Have to be kept as plus variables: variables required in the current XAE or XTRTGEN macro (N.B. closely evaluate future need of variable as input to new X-Macros)
Workload plain: Not possible to create ADS from plain SDTM, because a variable required for the XAE and/or XGENTRT macro will not be available with plain SDTM
Workload plus: Very low effort expected, because the variable needed in the macros can be extracted as is from the available PLUS variable without complex referencing, transformations, derivations or imputations
R/B*: R

Topic: Key concept
SDTM: STUDYID, USUBJID, DOMAIN, --SEQ, --GRPID, --REFID, --SPID
SDTM(+): STUDYID, USUBJID, DOMAIN, meaningful keys to be defined (based on content)
Workload plain: Very high. Values of ID variables are not unique across subjects. Only designed for merging parent domains to SUPPQUAL, CO, RELREC. Does not support merging by content across domains (e.g. XR to XD)
Workload plus: Medium. Needs to be defined when creating SDTM+; beneficial for analysis & reporting (no additional work)
R/B*: R

* R – required, B - beneficial
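Keeping a numeric date and its ISO 8601 counterpart in parallel can be sketched in Python. The sketch assumes that the numeric dates follow the SAS convention (days since 1 January 1960); whether O*C exports use exactly this representation is an assumption, and the variable names are illustrative.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0 (assumed convention)


def add_iso_date(record, num_var, iso_var):
    """Derive the ISO 8601 date once and keep the numeric date alongside it."""
    out = dict(record)
    out[iso_var] = (SAS_EPOCH + timedelta(days=record[num_var])).isoformat()
    return out


ae = add_iso_date({"AETERM": "Headache", "AESTDT": 17800}, "AESTDT", "AESTDTC")
# ae now carries AESTDT (NUM, for analysis) and AESTDTC (CHAR, for SDTM).
```

Analysis programs can use the numeric variable directly, so no back-transformation from the character date is needed.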



Topic: NUM - CHAR
SDTM: Variables are of type CHAR in general. Example: USUBJID, --ORRES
SDTM(+): Keep both CHAR and NUM-type variables. Example: USUBJID / PTNO, --ORRES / --ORRESN
Workload plain: Medium. Numeric O*C values are converted to CHAR, then need to be converted back to NUM for analysis & reporting
Workload plus: Low. Convert once to CHAR for SDTM. Keep numeric values from O*C as a plus for analysis & reporting (no re-conversion)
R/B*: B

Topic: Code - Decode
SDTM: Only Decode (CHAR). Example: XRCAT, EPOCH
SDTM(+): Have Code (NUM), associated SAS format and Decode (CHAR)
Workload plain: Medium. Without formats it is not possible to reproduce all the options offered in the CRF
Workload plus: Very low
R/B*: R

Topic: No SUPPQUAL
SDTM: SUPPQUAL Domain
SDTM(+): No SUPPQUAL Domain, variables included in parent domain. Additional meta data required to identify qualifier information destined to SUPPQUAL. Additional variable that contains the qualifier information that is destined to SUPPQUAL
Workload plain: High. Merging needed because information that clinically belongs together is scattered (search and merge)
Workload plus: Medium. Information that clinically belongs together is located in one Domain. One time effort to create plain SDTM (selecting and splitting)
R/B*: B
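The "selecting and splitting" step that moves qualifier variables from the parent domain into SUPPQUAL records can be sketched in Python. The output uses a reduced set of the standard SUPPQUAL variables (RDOMAIN, IDVAR, IDVARVAL, QNAM, QVAL); the parent record and the plus variable AEHOSP are hypothetical, and in practice the list of variables to split off would come from metadata.

```python
def split_suppqual(records, domain, seq_var, supp_vars):
    """Split qualifier variables out of the parent domain into SUPPQUAL rows."""
    parent, supp = [], []
    for rec in records:
        parent.append({k: v for k, v in rec.items() if k not in supp_vars})
        for var in supp_vars:
            if rec.get(var) not in (None, ""):
                supp.append({
                    "STUDYID": rec["STUDYID"],
                    "RDOMAIN": domain,
                    "USUBJID": rec["USUBJID"],
                    "IDVAR": seq_var,                # link back via --SEQ
                    "IDVARVAL": str(rec[seq_var]),
                    "QNAM": var,
                    "QVAL": str(rec[var]),
                })
    return parent, supp


# Hypothetical SDTM+ record with one qualifier kept in the parent domain.
ae_plus = [
    {"STUDYID": "T001", "USUBJID": "T001-0001", "AESEQ": 1,
     "AETERM": "Headache", "AEHOSP": "N"},
]

ae_plain, suppae = split_suppqual(ae_plus, "AE", "AESEQ", ["AEHOSP"])
```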



Topic: Date/time imputation
SDTM: Reported date/time (ISO8601)
SDTM(+): Have reported date/time, imputed date/time and imputation rule in parallel
Workload plain: High. In case of incomplete dates, imputation needs to be done by hand (error prone process)
Workload plus: Low, if the imputation rule is implemented in O*C views. Otherwise needs to be defined once for creation of SDTM+
R/B*: B

Topic: Relationship to CRF/DCM
SDTM: Not included
SDTM(+): Keep the DCM name where the variable originated from
Workload plain: Medium. Connection between SDTM data and CRF is not readily available
Workload plus: Low. Primarily to ease programming and help with debugging
R/B*: B

Topic: Tracking of same patient in multiple trials (e.g. extension trial information)
SDTM: Previous Trial Number and Previous Patient Number could possibly be stored in the Subject Characteristic domain (SC). This needs to be investigated.
SDTM(+): Previous Trial Number, Previous Patient Number
Workload plain: Low. Previous Trial Number and Previous Patient Number should be scattered into the Subject Characteristic domain (SC).
Workload plus: Very low. The collected variables need to be copied from O*C into SDTM+ (DM domain?). These two variables are collected at the site and need to be available in SDTM+ for CTR reporting and to facilitate reporting from the P/SDB.
R/B*: R
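Holding the reported date, the imputed date and the imputation rule side by side can be sketched in Python. The rule used here (missing day or month set to 01) is only an example of a start-date convention and is not the rule implemented in the O*C views; the returned field names are hypothetical.

```python
def impute_start_date(reported):
    """Return reported date, imputed date and the rule applied (example rule)."""
    parts = reported.split("-")
    if len(parts) == 3:
        imputed, rule = reported, None                    # complete date
    elif len(parts) == 2:
        imputed, rule = f"{reported}-01", "DAY=01"        # day missing
    elif len(parts) == 1 and parts[0]:
        imputed, rule = f"{reported}-01-01", "MONTH=01, DAY=01"
    else:
        imputed, rule = None, "NOT IMPUTED"               # nothing reported
    return {"reported": reported, "imputed": imputed, "rule": rule}


complete = impute_start_date("2008-09-25")
partial = impute_start_date("2008-09")
year_only = impute_start_date("2008")
```

Storing the rule next to the imputed value keeps the derivation traceable and avoids repeating a manual, error-prone imputation in every analysis program.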


IBM Global Business Services.

Contacts

Dr. Jens Wientges
Mailto: [email protected]: + 49 160 5826897

Peter Leister
Mailto: [email protected]: +49 160 3671761
