The Study Database

Robert R. Kelley, Ph.D.

Clinical and Translational Research Support Center

Outline

• Overview
• Designing the Database
• Building the Database

Overview

• The Study Database:
  – Tool(s) used to store data electronically for queries and data analysis

Components of the Study Database

• Case Report Form
• Study Database
• Analysis Software
• Query and Monitoring Software

Designing the Study Database

The Case Report Form

Good database design starts with a good Case Report Form

Should be driven by the study protocol

Should support the study statistical plan

Should be divided into logical sections or multiple forms

Carefully consider variable names

Example fields: Case ID, Procedure Type, Physician, Comments, Temperature, Diastolic BP, Systolic BP

• Use only lower case: case id, procedure type, physician
• Use underscore to separate words: case_id, procedure_type
• Abbreviate when possible: dbp, temp
• Consider prefixing: exam_temp
• Use the shortest variable name you can!
• Be consistent!
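The naming rules on this slide can be applied mechanically. The following is a minimal Python sketch, not a prescribed tool: the function name `to_variable_name` and the `ABBREVIATIONS` table are hypothetical, and the `sbp` abbreviation is an assumption added by analogy with `dbp`.

```python
import re

# Hypothetical abbreviation table. "temp" and "dbp" come from the slide;
# "sbp" is an assumption made by analogy with "dbp".
ABBREVIATIONS = {
    "temperature": "temp",
    "diastolic bp": "dbp",
    "systolic bp": "sbp",
}


def to_variable_name(label: str, prefix: str = "") -> str:
    """Turn a form label into a short, lower-case, underscore-separated name."""
    name = label.strip().lower()                         # use only lower case
    name = ABBREVIATIONS.get(name, name)                 # abbreviate when possible
    name = re.sub(r"[^a-z0-9]+", "_", name).strip("_")   # underscores between words
    return f"{prefix}_{name}" if prefix else name        # optional prefix


names = [to_variable_name(label) for label in ("Case ID", "Procedure Type", "Diastolic BP")]
```

Running every label through one function is one way to stay consistent: the same label always produces the same variable name.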

Coding levels

Example fields: Case ID, Sex, Congestive Heart Failure, Comments, Diabetes, Antibiotic, Diagnosis

• For binary data, always code with 1/0:
  – 1 = Male, 0 = Female
• For Yes/No, always code with 1/0:
  – 1 = Yes, 0 = No
• Don’t forget to code for NA/UNKNOWN:
  – 1 = Yes, 0 = No, -1 = UK (unknown)
• For multiple choice data, code 1. . .N:
  – 1, Beta-lactam monotherapy only
  – 2, Beta-lactam + macrolide combination only
  – 3, Beta-lactam + quinolone combination only
  – 4, Quinolone monotherapy only
• Use alphanumeric codes when they make sense:
  – blmono, Beta-lactam monotherapy only
  – blmac, Beta-lactam + macrolide combination only
  – blq, Beta-lactam + quinolone combination only
  – quin, Quinolone monotherapy only
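The coding levels above can be kept in one place as lookup tables, so that data entry and analysis use the same codes. This is a small Python sketch under that assumption: the codes and labels are taken from the slide, while the names `YES_NO`, `SEX`, `ANTIBIOTIC`, `encode`, and `decode` are hypothetical.

```python
# Coding levels copied from the slide. "Unknown" spells out the slide's "-1 UK".
YES_NO = {1: "Yes", 0: "No", -1: "Unknown"}
SEX = {1: "Male", 0: "Female"}
ANTIBIOTIC = {
    1: "Beta-lactam monotherapy only",
    2: "Beta-lactam + macrolide combination only",
    3: "Beta-lactam + quinolone combination only",
    4: "Quinolone monotherapy only",
}


def encode(label: str, levels: dict) -> int:
    """Return the numeric code for a label, ignoring case and outer spaces."""
    wanted = label.strip().lower()
    for code, text in levels.items():
        if text.lower() == wanted:
            return code
    raise ValueError(f"{label!r} is not a defined coding level")


def decode(code: int, levels: dict) -> str:
    """Return the label for a code; raises KeyError if the code is undefined."""
    return levels[code]
```

Rejecting undefined labels at entry time, rather than storing free text, is what keeps the codes consistent across cases.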

The Data Dictionary

Can be a simple spreadsheet with . . .

• Variable name
• Data type
• Coding levels
• Coding restrictions
• Description
• Units
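As an illustration of how simple such a spreadsheet can be, the sketch below writes a data dictionary with these six columns as CSV using only the Python standard library. The class name `DictionaryEntry`, the function `write_dictionary`, and the column spellings are assumptions; the two sample rows reuse variables from the example slide in this deck.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class DictionaryEntry:
    """One row of the data dictionary; field order follows the list above."""
    variable_name: str
    data_type: str
    coding_levels: str
    coding_restrictions: str
    description: str
    units: str


def write_dictionary(entries, stream) -> None:
    """Write the entries as CSV with a header row."""
    columns = [f.name for f in fields(DictionaryEntry)]
    writer = csv.DictWriter(stream, fieldnames=columns)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))


entries = [
    DictionaryEntry("age", "Integer", "Continuous", ">18", "Age", "Years"),
    DictionaryEntry("icu", "Integer", "1,Yes 0,No", "NA", "Admitted to ICU", "NA"),
]
buffer = io.StringIO()  # stands in for a file opened with newline=""
write_dictionary(entries, buffer)
```

The resulting CSV opens directly in any spreadsheet application, and the `csv` module quotes values such as `1,Yes 0,No` that contain commas.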

An example. . .

Variable Name  Description               Data Type  Coding Levels    Restrictions           Units
case_id        Case Identifier           Integer                     >0                     NA
age            Age                       Integer    Continuous       >18                    Years
sex            Sex                       Integer    1,Male 0,Female  NA                     NA
doa            Date of Arrival           Date       Short Date       >Study Start           NA
dod            Date of Discharge         Date       Short Date       >Study Start and >doa  NA
icu            Admitted to ICU           Integer    1,Yes 0,No       NA                     NA
rul            Right Upper Lobe          Integer    1,Yes 0,No       NA                     NA
rml            Right Middle Lobe         Integer    1,Yes 0,No       NA                     NA
rll            Right Lower Lobe          Integer    1,Yes 0,No       NA                     NA
lul            Left Upper Lobe           Integer    1,Yes 0,No       NA                     NA
lll            Left Lower Lobe           Integer    1,Yes 0,No       NA                     NA
pe             Pleural Effusion          Integer    1,Yes 0,No       NA                     NA
resp_rate      Respiratory Rate          Integer    Continuous       >60                    Breaths/Minute
temp           Temperature               Integer    Continuous       >35 and <40.5          Celsius
psi            Pneumonia Severity Index  Integer    Continuous       0-350                  NA

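Notes on each column:

• Variable Name: defined according to rules
• Description: can be as detailed as necessary
• Data Type: this will often be database dependent
• Coding Levels: very important for consistency
• Restrictions: derived from protocol and study definitions
• Units: very important, especially for lab values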
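The restrictions in a data dictionary can also be checked automatically when data are entered. The Python sketch below encodes a few of the restrictions from the example as rules and reports which variables violate them. The names `RULES` and `violations` are hypothetical, and `STUDY_START` is an assumed placeholder date, since the source does not give one.

```python
from datetime import date

# Assumed placeholder: the real study start date comes from the protocol.
STUDY_START = date(2020, 1, 1)

# One rule per variable, mirroring the Restrictions column of the example.
RULES = {
    "case_id": lambda r: r["case_id"] > 0,
    "age": lambda r: r["age"] > 18,
    "doa": lambda r: r["doa"] > STUDY_START,
    "dod": lambda r: r["dod"] > STUDY_START and r["dod"] > r["doa"],
    "icu": lambda r: r["icu"] in (0, 1),
    "temp": lambda r: 35 < r["temp"] < 40.5,
    "psi": lambda r: 0 <= r["psi"] <= 350,
}


def violations(record: dict) -> list:
    """Return the names of variables whose restriction is not met."""
    return [name for name, is_valid in RULES.items() if not is_valid(record)]


record = {
    "case_id": 12, "age": 17, "doa": date(2020, 3, 5), "dod": date(2020, 3, 1),
    "icu": 1, "temp": 38.2, "psi": 90,
}
problems = violations(record)  # age and dod break their restrictions here
```

Keeping the rules in one table next to the data dictionary makes it easier to see when the two drift apart.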

Building the Study Database

Types of Studies from a Data Perspective

Large Studies
• > 100 cases
• > 20 variables

Small Studies
• < 100 cases
• < 20 variables

Tools for Smaller Studies

Spreadsheet Applications
• MS Excel
• OpenOffice-Calc
• Apple - Numbers

Personal Database Applications
• MS Access
• OpenOffice-Base
• Filemaker Pro

Excel

Strengths
• Many people in science and medicine are already familiar with Excel
• Supports descriptive statistics and calculations
• Provides rudimentary data visualization
• Easy to back up and have multiple copies for safekeeping

Limitations
• No capability for metadata
• Need to encrypt manually
• Does not provide authentication, authorization, or an audit trail
• Inadvertent data corruption
• Depending on version, may be limited to 256 columns (variables)

Access

Strengths
• Provides a front-end for easy form creation
• Supports descriptive statistics and calculations
• Provides rudimentary data visualization
• Easy to back up and have multiple copies for safekeeping

Limitations
• Must understand the basics of database design to be able to build effective databases
• Need to encrypt manually
• Does not provide authentication, authorization, or an audit trail
• Programming help may be required for complicated features

Tools for Larger Studies

Database / Applications
• OpenClinica
• LabKey
• REDCap
• Medidata Rave
• Custom Software

Unlike Excel and Access, these tools require infrastructure and specialized skills to maintain.

Tools for Larger Studies

Infrastructure
• Licenses for software (much more expensive than Excel/Access)
• Technical skills to install and maintain the system
• Place to “host” your database

Recommendation: Hire a dedicated person to administer this system or contract it out.

OpenClinica

• Designed for capturing data for clinical trials
• Has paid and free versions
• Can be hosted locally or by Akaza Research LLC
• Supports exporting data to:
  – HTML
  – Tab-Delimited
  – SPSS Syntax and Data
  – CDISC ODM XML 1.3 and 1.2
  – CDISC OpenClinica Extension 1.3 and 1.2

Specifications
• Windows Server 2008 or Linux Variant
• Java
• Java Application Server (Apache)
• Database: PostgreSQL 8.4 or Oracle 10.2g

https://www.openclinica.com

LabKey

• Designed for storing lab data such as:
  – Specimen data
  – Gene Sequences
  – Flow Cytometry Data
  – Proteomics Data
  – Any Assay Data
• It also supports storing other clinical data
• Has paid and free versions
• Can be hosted locally or by Akaza Research LLC
• Supports exporting data to:
  – Excel
  – Text
  – Javascript
  – R
  – SAS

Specifications
• Windows Server 2008 or Linux Variant
• Java
• Java Application Server (Apache)
• Database: PostgreSQL 8.4 or Microsoft SQL Server

https://www.labkey.org/

Medidata Rave

• A comprehensive commercial clinical data management system
• Very flexible
• Expensive and requires significant technical support
• Supports exporting data to:
  – SAS
  – CSV
  – XML

Specifications
• Windows Server 2008 or Linux Variant
• Java
• Java Application Server (Apache)
• Database: PostgreSQL 8.4 or Microsoft SQL Server

http://www.mdsol.com/products/rave_overview.htm

REDCap

• A comprehensive open source clinical data management system
• Supports creating:
  – Surveys
  – Data Entry Forms (eCRFs)
  – Longitudinal Studies
• Free, but requires significant technical support
• Supports exporting data to:
  – Excel
  – SPSS
  – SAS
  – R
  – STATA

Specifications
• Windows Server 2008 or Linux Variant
• PHP
• Database: MySQL Server
• SMTP email server

http://project-redcap.org/

Custom Software

If your requirements are extremely particular, you may want to build your own system

Can be time consuming and resource intensive

NOT RECOMMENDED

Questions?
