Bringing OpenClinica Data into SAS [email protected] 780-248-1170


DESCRIPTION

OpenClinica Global Forum 2010. A Java tool to create 'SAS friendly' XML from OpenClinica.


Page 1: Bringing OpenClinica Data into SAS

Bringing OpenClinica Data into SAS

[email protected]

Page 2: Bringing OpenClinica Data into SAS

CRIC supports a wide variety of studies
◦ ‘Regulatory’ clinical trials
◦ Many different types of academic study
◦ Variable size and complexity

Investigators design their own CRFs
◦ CRIC has limited control over design strategies and CRF consistency.

Analysis requirements and data formats vary
◦ SPSS, Stata, SAS, Excel.

CRIC’s preferred data handling tool is SAS

CRIC and OpenClinica

Page 3: Bringing OpenClinica Data into SAS

OpenClinica exports seem difficult for our users to work with.

Data structures vary depending on the data content.
◦ CRF versions (repeat as extra columns)
◦ Group contents (number of repeats)

Multi-select objects are difficult to handle.
◦ Must be ‘broken’ into separate variables for analysis.

Null values are represented as text in otherwise numeric variables.

OpenClinica Export

Page 4: Bringing OpenClinica Data into SAS

The Challenge

We wanted to:

Produce consistently usable data for minimal up front effort.

Get data that could easily be transferred into different formats.

Produce tall, thin, de-normalized data sets suitable for data management purposes.

Leverage CRF metadata to add value:
◦ Dataset labels
◦ Variable labels
◦ SAS formats and informats
◦ SAS special missing values

Page 5: Bringing OpenClinica Data into SAS

Create ‘SAS friendly’ XML to be read by the XML Libname engine.

Create a SAS XML Map file to assign labels, data types, informats and formats.

Generate a CNTLIN data set in the XML suitable for use by PROC FORMAT.

Note: The XML file can also be imported directly into MS Access.

The Solution

Page 6: Bringing OpenClinica Data into SAS

SAS macros or external utility?
◦ High complexity:
  Ensure OpenClinica metadata is translated into legal SAS names.
  Map the OC hierarchy to SAS data sets: CRFs, sections, groups and data items to tables, rows and columns.
  De-duplicate object names.
◦ No resource to develop complex macros

Development Approach

Page 7: Bringing OpenClinica Data into SAS

Command Line Java Utility
◦ Programmer available (I would have to write SAS code myself!)
◦ Capable development environment
◦ Portable (Windows / Linux)
◦ Callable from within SAS

The Choice

Page 8: Bringing OpenClinica Data into SAS

Enter connection parameters and study identifier (interactively or command line)

Connect to Postgres via JDBC

Read study metadata

Manipulate the metadata

Write map file

Read study data

Write data file

Data Processing

Page 9: Bringing OpenClinica Data into SAS

Legalize Names
◦ SAS names <= 32 characters
◦ Must start with a letter or underscore
◦ Format names cannot end in a number

De-duplicate names
◦ Multiple CRFs may contain the same section and response option names.
◦ Duplicate names have numbers and underscores appended.

Metadata Manipulations
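The naming rules above can be sketched in Java, the language of the utility. This is a minimal illustration, not the tool's actual code: the class and method names are hypothetical, the replacement character and the `_n` suffix scheme are assumptions, and the 32-character limit from the slide is applied to every name type. The `_n` suffix suits data set and variable names only, since a format name must not end in a digit.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class SasNames {
    static final int MAX_LEN = 32; // limit quoted on the slide

    /** Replaces illegal characters, fixes the first character, truncates. */
    static String legalize(String raw) {
        String name = raw.trim().replaceAll("[^A-Za-z0-9_]", "_");
        if (name.isEmpty() || !name.substring(0, 1).matches("[A-Za-z_]")) {
            name = "_" + name; // names must start with a letter or underscore
        }
        return name.length() > MAX_LEN ? name.substring(0, MAX_LEN) : name;
    }

    /** Format names get a trailing underscore if they would end in a digit. */
    static String legalizeFormat(String raw) {
        String name = legalize(raw);
        if (Character.isDigit(name.charAt(name.length() - 1))) {
            if (name.length() == MAX_LEN) {
                name = name.substring(0, MAX_LEN - 1); // make room for "_"
            }
            name = name + "_";
        }
        return name;
    }

    /** Appends _1, _2, ... until the name is unique (SAS names ignore case). */
    static String deduplicate(String name, Set<String> used) {
        String candidate = name;
        int n = 1;
        while (!used.add(candidate.toUpperCase(Locale.ROOT))) {
            String suffix = "_" + n++;
            int keep = Math.min(name.length(), MAX_LEN - suffix.length());
            candidate = name.substring(0, keep) + suffix;
        }
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> used = new HashSet<>();
        System.out.println(legalize("1st dose"));
        System.out.println(legalizeFormat("YESNO2"));
        System.out.println(deduplicate("DEMOG", used));
        System.out.println(deduplicate("DEMOG", used));
    }
}
```

Keeping the set of used names in upper case reflects the fact that SAS treats `demog` and `DEMOG` as the same name.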

Page 10: Bringing OpenClinica Data into SAS

CRFs
◦ No ‘top level’ mapping between CRFs and data sets.

CRF Section -> SAS data set
◦ CRF sections contain logically grouped data; CRFs may not!
◦ CRFs containing multiple sections result in multiple output data sets.
◦ Every data item contained within a section is output to the same data set.
◦ Section label -> dataset name
◦ Section title -> dataset label

Metadata Manipulations

Page 11: Bringing OpenClinica Data into SAS

Groups -> Rows
◦ Ungrouped section data is repeated in each row.
◦ Each repeat becomes a separate row in the data set.
◦ Rows are numbered to provide a unique key based on their order within the group.
◦ Multiple groups contained within the same section are merged based on order within the groups.
◦ Where groups contain unequal numbers of rows, missing values result.

Metadata Manipulations
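The group-to-row rules can be illustrated with a short Java sketch. It is an assumption-laden illustration rather than the utility's real code: the class name, the `ROW_NUMBER` key column, and the use of string maps for rows are all hypothetical. Groups are merged side by side by repeat order, ungrouped values are copied into every row, and a shorter group contributes nulls (missing values) to the later rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupMerge {
    /**
     * ungrouped: item name -> value for the section's ungrouped items.
     * groups: one list per group; each element is one repeat (item -> value).
     */
    static List<Map<String, String>> merge(Map<String, String> ungrouped,
                                           List<List<Map<String, String>>> groups) {
        int rowCount = 1; // assumption: a section without repeats yields one row
        for (List<Map<String, String>> group : groups) {
            rowCount = Math.max(rowCount, group.size());
        }
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            Map<String, String> row = new LinkedHashMap<>();
            row.put("ROW_NUMBER", String.valueOf(i + 1)); // key from repeat order
            row.putAll(ungrouped);                        // repeated in each row
            for (List<Map<String, String>> group : groups) {
                if (i < group.size()) {
                    row.putAll(group.get(i));
                } else if (!group.isEmpty()) {
                    // Shorter group: keep its columns but leave them missing.
                    for (String item : group.get(0).keySet()) {
                        row.put(item, null);
                    }
                }
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = merge(
            Map.of("VISIT", "V1"),
            List.of(
                List.of(Map.of("MED", "A"), Map.of("MED", "B")),
                List.of(Map.of("AE", "X"))));
        rows.forEach(System.out::println);
    }
}
```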

Page 12: Bringing OpenClinica Data into SAS

CRF items -> dataset variables
◦ Item_name -> variable name
◦ Description_label -> variable label

Calculate length of character variables
◦ SAS has no support for VARCHARs. Explicitly specifying variable length saves considerable space on disk.

Metadata Manipulations
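Calculating the length of a character variable amounts to finding the longest value stored for that item. A minimal sketch follows, assuming the values are already in memory and that one character occupies one byte; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class CharLength {
    /** Longest value length, with a floor of 1 (the minimum SAS length). */
    static int lengthFor(List<String> values) {
        int max = 1;
        for (String value : values) {
            if (value != null) {
                max = Math.max(max, value.length());
            }
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(lengthFor(Arrays.asList("Mild", "Moderate", null)));
    }
}
```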

Page 13: Bringing OpenClinica Data into SAS

A new column is created for each response value
◦ Column names based on item_name
◦ Columns labeled based on item_label and response option value.
◦ Columns contain 1 or 0 to indicate selected or unselected.

Multi-select and Checkbox items
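A short Java sketch of this split is shown below. It assumes the stored value is a comma-separated list of selected option values and that each new column is named `item_name` plus an underscore and the option value; both are assumptions for illustration, as are the class and method names.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MultiSelect {
    /** Returns one 0/1 indicator column per response option. */
    static Map<String, Integer> split(String itemName, String stored,
                                      List<String> optionValues) {
        Set<String> selected = new HashSet<>();
        if (stored != null) {
            for (String part : stored.split(",")) {
                if (!part.trim().isEmpty()) {
                    selected.add(part.trim());
                }
            }
        }
        Map<String, Integer> columns = new LinkedHashMap<>();
        for (String option : optionValues) {
            columns.put(itemName + "_" + option, selected.contains(option) ? 1 : 0);
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(split("SYMPT", "1,3", List.of("1", "2", "3")));
    }
}
```

In a real export the generated column names would still need to pass through the same legalization and de-duplication steps as any other variable name.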

Page 14: Bringing OpenClinica Data into SAS

Response option lists become SAS formats and informats.

Format names created from CRF item’s response_label.

Format names are legalized and de-duplicated.

If separate CRFs contain identical response option lists, only one format results.

Formats and informats are written to the XML as a new data table.

This is used as a CNTLIN data set for PROC FORMAT.

Response Options
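One way to collapse identical response option lists into a single format is to key a registry on the list contents. The Java sketch below is an illustration under stated assumptions, not the utility's implementation: the class is hypothetical, name legalization is omitted, two lists count as identical when they hold the same value and label pairs, and the output rows carry only the FMTNAME, TYPE, START and LABEL columns that a CNTLIN data set for a numeric format needs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FormatRegistry {
    // Option list (value -> label) mapped to the format name that owns it.
    private final Map<Map<String, String>, String> nameByOptions = new HashMap<>();
    private final Map<String, Map<String, String>> optionsByName = new LinkedHashMap<>();

    /** Returns the format name to use, reusing one with identical options. */
    String register(String formatName, Map<String, String> options) {
        Map<String, String> key = new LinkedHashMap<>(options); // defensive copy
        String existing = nameByOptions.get(key);
        if (existing != null) {
            return existing;
        }
        nameByOptions.put(key, formatName);
        optionsByName.put(formatName, key);
        return formatName;
    }

    /** One "FMTNAME,TYPE,START,LABEL" line per option; N = numeric format. */
    List<String> cntlinRows() {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> format : optionsByName.entrySet()) {
            for (Map.Entry<String, String> option : format.getValue().entrySet()) {
                rows.add(format.getKey() + ",N," + option.getKey() + "," + option.getValue());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, String> yesNo = new LinkedHashMap<>();
        yesNo.put("0", "No");
        yesNo.put("1", "Yes");
        FormatRegistry registry = new FormatRegistry();
        System.out.println(registry.register("YESNO", yesNo));
        System.out.println(registry.register("YN", new LinkedHashMap<>(yesNo)));
        registry.cntlinRows().forEach(System.out::println);
    }
}
```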

Page 15: Bringing OpenClinica Data into SAS

Informats are created to read numeric data and handle OpenClinica null values.

CRF Dates

proc format;
  invalue crfdate
    'ASKU' = .k
    'NA'   = .a
    'NASK' = .d
    'NI'   = .i
    'NP'   = .p
    'OTH'  = .o
    'UNK'  = .u
    other  = [mmddyy10.];
run;

Missing Values

Page 16: Bringing OpenClinica Data into SAS

Numeric Response Options

proc format;
  invalue bestnull
    'ASKU' = .k
    'NA'   = .a
    'NASK' = .d
    'NI'   = .i
    'NP'   = .p
    'OTH'  = .o
    'UNK'  = .u
    other  = [best10.];
run;

Missing Values

Page 17: Bringing OpenClinica Data into SAS

Formats are created for CRF data.

Response options

proc format;
  value yesno
    0  = 'No'
    1  = 'Yes'
    .k = 'ASKU'
    .a = 'NA'
    .d = 'NASK'
    .i = 'NI'
    .p = 'NP'
    .o = 'OTH'
    .u = 'UNK';
run;

Missing Values

Page 18: Bringing OpenClinica Data into SAS

Dates

proc format;
  value crfdate
    .k = 'ASKU'
    .a = 'NA'
    .d = 'NASK'
    .i = 'NI'
    .p = 'NP'
    .o = 'OTH'
    .u = 'UNK'
    other = [date9.];
run;

Missing Values

Page 19: Bringing OpenClinica Data into SAS

Numeric Data

proc format;
  value bestnull
    .k = 'ASKU'
    .a = 'NA'
    .d = 'NASK'
    .i = 'NI'
    .p = 'NP'
    .o = 'OTH'
    .u = 'UNK'
    other = [best10.];
run;

Missing Values

Page 20: Bringing OpenClinica Data into SAS

CRF Data
◦ One data set per CRF section

Each row contains:
◦ Study ID
◦ Site ID
◦ Subject ID
◦ Study event name
◦ Event start and end date
◦ CRF Name
◦ CRF Version

Data Set Output

Page 21: Bringing OpenClinica Data into SAS

Subject Data
◦ List of subjects including site, secondary ID, group, etc.

Event Data
◦ List of subjects' study events including start date, end date and status.

CRF Status
◦ List of subject CRFs including event details, CRF version, creation date, completion date and status.

Discrepancies

Output Data Sets

Page 22: Bringing OpenClinica Data into SAS

Data for removed subjects is not exported.

PHI data remains encrypted.

Output Data Sets

Page 23: Bringing OpenClinica Data into SAS

C:> java -jar export.jar
----------------------------------------
 Export Output:
----------------------------------------
 MAP FILE: export.map.xml
 EXPORT FILE: export.xml
----------------------------------------
Postgresql driver loaded
Enter Database url (default: localhost):
Database port (default: 5432):
Database name (default: openclinica):
username (default: clinica):
password:
Enter Export file name (default: derived from study):
Enter Map file name (default: derived from study):

Interactive Execution
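The prompts fall back to a default when the user just presses Enter. A minimal Java sketch of that behaviour and of assembling the PostgreSQL JDBC URL is given below; the class and method names are hypothetical, and only the default values and the `jdbc:postgresql://host:port/database` URL form are taken as given.

```java
class ConnectionParams {
    /** Uses the fallback when nothing (or only whitespace) was entered. */
    static String orDefault(String entered, String fallback) {
        return (entered == null || entered.trim().isEmpty()) ? fallback : entered.trim();
    }

    /** Builds the JDBC URL, applying the defaults shown at the prompts. */
    static String jdbcUrl(String host, String port, String database) {
        return "jdbc:postgresql://" + orDefault(host, "localhost")
            + ":" + orDefault(port, "5432")
            + "/" + orDefault(database, "openclinica");
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl("", "", ""));
        System.out.println(jdbcUrl("10.11.12.13", "5432", "openclinica"));
    }
}
```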

Page 24: Bringing OpenClinica Data into SAS

Successful connection to database openclinica on jdbc:postgresql://localhost:5432/

Please choose a study:
----------------------
 1) Study1
 2) Study2
 3) Study3
 4) Study4
==> 1
Retrieving study metadata
Creating subject table
Writing formats to .xml file
Writing subjects to .xml file
Retrieving study item data
Writing study item data to file
Complete
Files generated: study1.map.xml Study1.xml

Interactive Execution

Page 25: Bringing OpenClinica Data into SAS

Command line options may be used rather than prompts. Options include:

◦ Host, database, ID and password
◦ Study OID
◦ File names
◦ Suppression of map file
◦ Creation of ‘SPSS friendly’ SAS data sets

Minimal formatting allows data sets to be exported to SPSS using PROC EXPORT.

Command line options allow the utility to be executed from within SAS.

Command Line Options

Page 26: Bringing OpenClinica Data into SAS

Define libraries

libname ocdata92 xml92 "data_file.xml" xmlmap="map_file.map" access=readonly;

libname library "c:\project\fmt";

libname studylib "c:\project\data";

SAS Code

Page 27: Bringing OpenClinica Data into SAS

Execute the Import

%let scommand  = java -Xmx256m -jar c:\export\export.jar;
%let shost     = -h 10.11.12.13;
%let sport     = -p 5432;
%let sstudy    = -soid S_STDY1234;
%let sdatabase = -D openclinica;
%let suser     = -U dbuserid;
%let spswd     = -P password;
%let smapFile  = ;  /* map file option; empty here so the reference resolves */
%let sdataFile = ;  /* data file option; empty here so the reference resolves */
%let spss      = ;

x "&scommand &shost &sport &sstudy &sdatabase &suser &spswd &smapFile &sdataFile &spss";

SAS Code

Page 28: Bringing OpenClinica Data into SAS

Create the Format Catalog from the XML

proc sort data=ocdata92.fmtlib out=work.fmtlib;

by fmtname type start;

run;

proc format cntlin=work.fmtlib library=library fmtlib;

run;

SAS Code

Page 29: Bringing OpenClinica Data into SAS

Copy the Data Sets

proc datasets library=ocdata92;

copy out=studylib;

exclude fmtlib;

quit;

SAS Code

Page 30: Bringing OpenClinica Data into SAS

Import into SAS

If we have time:
◦ XML Structures
◦ Import into Access
◦ Import into Excel

Do It!


Page 31: Bringing OpenClinica Data into SAS

Rick Watts

[email protected]

780-248-1170

Contact