Using De-Identified Data: 10 practical steps

Jennifer Creasman CTSI Consultation Services

RDB De-identified Flat Files, reduced EHR data but…

§ Number of Tables: 16

§ Total File Size: 209 GB

…and growing

You can’t read these into MS Excel or MS Access. You need other tools.

Still A Tidal Wave of Data

5/9/17 Clinical Data Colloquium 54 data.ucsf.edu

Ten Practical Steps to Staying on Top of a Tidal Wave of EHR Data

Assumptions:

§ You have a clearly defined research question

§ You have clinical expertise in the area of research

§ If you decide to work with a data scientist or programmer, all steps still apply

Step 1: Find the Right Tool for You!

§ UCSF Licenses: https://it.ucsf.edu/services/licensed-software

§ MyResearch Platform

§ Single-user license option

Language-oriented Software Programs

Step 2: Develop Your Programming Skills

§ 10,000 hours rule: Just do it!

§ Data Science Courses: https://www.coursera.org/data-science

§ Join a group: https://www.meetup.com/topics/computer-programming/

§ Become friends with other programmers. Eat lunch with them.

How Do I Become a Better Programmer?

Step 3: Understand Your Data

§ Review the RDB documentation

§ Study the relationships between the tables

§ Open the files

§ Look at the data

Working with RDB De-identified Flat Files

Multiple diagnoses per encounter
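One way to "open the files" and "look at the data" without loading a multi-gigabyte table is to read only the first few rows. The following is a minimal sketch in Python with pandas, assuming the flat files are delimited text. The column names (`Patient_ID`, `Encounter_ID`, `ICD10_Code`) and the sample rows are hypothetical and are not taken from the RDB documentation.

```python
import io
import pandas as pd


def peek(source, n_rows=5, sep=","):
    """Read only the first n_rows of a delimited file and return them as a DataFrame."""
    # dtype=str keeps identifiers and codes exactly as stored (no dropped leading zeros).
    return pd.read_csv(source, sep=sep, nrows=n_rows, dtype=str)


# Tiny in-memory stand-in for a diagnosis flat file (hypothetical columns and values).
sample = io.StringIO(
    "Patient_ID,Encounter_ID,ICD10_Code\n"
    "P1,E1,I73.9\n"
    "P1,E1,F32.9\n"
    "P2,E2,E11.9\n"
)

head = peek(sample, n_rows=2)
print(head.columns.tolist())  # variable names available in the file
print(head)                   # first rows, to see what the values look like
```

With a real file, `source` would be the path to the flat file, and `sep` would be set to whatever delimiter the file actually uses.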

De-identified: Patient File

[Figure: picture of patient file]

Observations: One record per patient

De-identified: Encounter File

[Figure: picture of encounter file]

Observations: Multiple encounters per patient

De-identified: Flowsheet Data

[Figure: picture of flowsheet data]

Observations: Multiple values per encounter
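The three observations (one record per patient, multiple encounters per patient, multiple values per encounter) can be checked directly by counting rows per key. This is a sketch only: the miniature tables and the column names `Patient_ID`, `Encounter_ID`, `Measure`, and `Value` are hypothetical stand-ins for the real files.

```python
import pandas as pd


def max_rows_per_key(df, key):
    """Return the largest number of rows that share one value of key."""
    return int(df.groupby(key).size().max())


# Hypothetical miniature versions of the three files.
patients = pd.DataFrame({"Patient_ID": ["P1", "P2"]})
encounters = pd.DataFrame({
    "Patient_ID": ["P1", "P1", "P2"],
    "Encounter_ID": ["E1", "E2", "E3"],
})
flowsheet = pd.DataFrame({
    "Encounter_ID": ["E1", "E1", "E1", "E3"],
    "Measure": ["pulse", "pulse", "temp", "pulse"],
    "Value": [72, 80, 37.1, 65],
})

print(max_rows_per_key(patients, "Patient_ID"))     # 1 means one record per patient
print(max_rows_per_key(encounters, "Patient_ID"))   # above 1 means repeated encounters
print(max_rows_per_key(flowsheet, "Encounter_ID"))  # above 1 means repeated values
```

Knowing these counts before merging helps avoid accidentally multiplying rows when a one-record-per-patient file is joined to a many-records file.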

Step 4: Don’t Drown in Your Data!

§ SAS program examples are available in the RDB folder

•  Easy to use if you identify patients by ICD-9/10 codes and/or patient demographic data

•  Otherwise, complex programming might be required

§ Start with a smaller set of variables

Reduce the files to only include your study cohort
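As an illustration of reducing the files to a study cohort, the sketch below reads a large file in chunks, collects the patients who have a qualifying diagnosis code, and then keeps only those patients and a small set of variables from another file. It is written in Python with pandas rather than SAS and is not one of the SAS examples in the RDB folder. The column names, the `I73` code prefix, and the sample rows are assumptions for illustration.

```python
import io
import pandas as pd


def cohort_ids(diagnosis_source, code_prefixes, chunksize=100_000):
    """Collect IDs of patients with any diagnosis code starting with one of the prefixes."""
    prefixes = tuple(code_prefixes)
    ids = set()
    # Reading in chunks keeps memory use bounded no matter how large the file is.
    for chunk in pd.read_csv(diagnosis_source, dtype=str, chunksize=chunksize):
        hit = chunk["ICD10_Code"].fillna("").apply(lambda code: code.startswith(prefixes))
        ids.update(chunk.loc[hit, "Patient_ID"])
    return ids


def reduce_to_cohort(source, ids, usecols, chunksize=100_000):
    """Keep only cohort rows and only the listed columns (usecols must include Patient_ID)."""
    kept = [
        chunk[chunk["Patient_ID"].isin(ids)]
        for chunk in pd.read_csv(source, dtype=str, usecols=usecols, chunksize=chunksize)
    ]
    return pd.concat(kept, ignore_index=True)


# Hypothetical miniature files standing in for the real flat files.
diagnoses = (
    "Patient_ID,Encounter_ID,ICD10_Code\n"
    "P1,E1,I73.9\nP2,E2,E11.9\nP3,E3,I73.9\nP3,E3,F32.9\n"
)
encounters = (
    "Patient_ID,Encounter_ID,Admit_Date,Department\n"
    "P1,E1,2016-01-05,Vascular\nP2,E2,2016-02-01,Endocrine\nP3,E3,2016-03-10,Vascular\n"
)

ids = cohort_ids(io.StringIO(diagnoses), ["I73"], chunksize=2)
small = reduce_to_cohort(
    io.StringIO(encounters), ids,
    usecols=["Patient_ID", "Encounter_ID", "Admit_Date"], chunksize=2,
)
print(sorted(ids))
print(small)
```

Passing `usecols` is the code equivalent of starting with a smaller set of variables: columns that are not listed are never loaded.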

Step 5: Get Dirty!

§ Review the literature: What variables are expected to be there?

§ Develop explicit instructions on how outcomes and covariates should be derived from the available data

§ Provide an explicit and exact recipe for new calculated variables

§ Understand limitations of the data

Own and drive the research

Step 6: Develop Project Specific Documentation

§ Create a mock Table 1

§ Start with word descriptions

§ Specify variable names; don’t leave anything open to interpretation

Plan Ahead

Calculated Variable | Description
reinsertion | Whether a Foley catheter was reinserted during the hospitalization, for any reason and within any time period
new foley during hospitalization | Placement time is after hospital admission time, for patients admitted without a Foley catheter
qmonth change | New placement time minus previous placement time > 25 days

Table 1. Characteristics of inpatient PAD populations with and without depression

Variable | D | No D | P
Age [Patient_Age] | | |
Gender [Patient_Sex] | | |
BMI [Vitals_BMI] | | |
Race [Patient_Ethnicity] | | |
Smoking [Patient_smoking] | | |
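Once the analytic dataset exists, a mock Table 1 like this one can be filled in with a few lines of code. The sketch below is one possible approach, not the method used in the original project. The variable names `Patient_Age`, `Patient_Sex`, and `Vitals_BMI` follow the mock table, while the `depression` grouping column and all values are made up for illustration. The P column is left out here because the choice of statistical test depends on the study design.

```python
import pandas as pd

# Hypothetical analytic dataset: one row per patient, values invented for the example.
cohort = pd.DataFrame({
    "depression": ["D", "D", "No D", "No D", "No D"],
    "Patient_Age": [70, 64, 58, 75, 66],
    "Patient_Sex": ["F", "M", "M", "F", "M"],
    "Vitals_BMI": [31.0, 27.0, 24.0, 29.0, 26.0],
})


def table_one(df, group, continuous, categorical):
    """Summarize by group: mean for continuous variables, percent per level for categorical ones."""
    rows = {}
    for col in continuous:
        rows[f"{col}, mean"] = df.groupby(group)[col].mean()
    for col in categorical:
        # Column-wise percentages: share of each level within each group.
        pct = pd.crosstab(df[col], df[group], normalize="columns") * 100
        for level in pct.index:
            rows[f"{col} = {level}, %"] = pct.loc[level]
    # One row per summary line, one column per group.
    return pd.DataFrame(rows).T.round(1)


t1 = table_one(cohort, "depression", ["Patient_Age", "Vitals_BMI"], ["Patient_Sex"])
print(t1)
```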

Step 6: Develop Project Specific Documentation

Each variable might require a separate calculation

Calculated Variable | Description
reinsertion | Whether a Foley catheter was reinserted during the hospitalization, for any reason and within any time period
new foley during hospitalization | Placement time is after hospital admission time, for patients admitted without a Foley catheter
qmonth change | New placement time minus previous placement time > 25 days
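Two of these written recipes translate almost line for line into code, which is the point of making them explicit. The sketch below derives "new foley during hospitalization" and "qmonth change" with pandas. The 25-day threshold comes from the recipe itself. The table layout, the column names (`admit_time`, `placement_time`, `foley_on_admission`), and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical catheter placement records: one row per placement.
placements = pd.DataFrame({
    "Patient_ID": ["P1", "P1", "P1", "P2"],
    "admit_time": pd.to_datetime(["2016-01-01 08:00"] * 3 + ["2016-02-01 09:00"]),
    "foley_on_admission": [False, False, False, True],
    "placement_time": pd.to_datetime([
        "2016-01-02 10:00", "2016-01-10 10:00", "2016-02-09 10:00", "2016-02-01 07:00",
    ]),
})

# Order placements in time within each patient so "previous placement" is well defined.
placements = placements.sort_values(["Patient_ID", "placement_time"])

# new foley during hospitalization:
# placement time is after admission time, for patients admitted without a catheter.
placements["new_foley"] = (
    ~placements["foley_on_admission"]
    & (placements["placement_time"] > placements["admit_time"])
)

# qmonth change:
# new placement time minus previous placement time > 25 days.
gap = placements.groupby("Patient_ID")["placement_time"].diff()
placements["qmonth_change"] = gap > pd.Timedelta(days=25)

print(placements[["Patient_ID", "placement_time", "new_foley", "qmonth_change"]])
```

A patient's first placement has no previous placement, so its gap is missing and `qmonth_change` is False for that row.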

Step 7: Create Reproducible Code

§ Read raw de-identified files directly into programming software

§ Use only the program to manipulate the data

§ Add text describing the purpose of the program at a high level, and embed comments in the code to describe each specific task

§ Have an organized file structure: /Data, /Documents, /Programs

Tips for Easy Reproducibility
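A program that follows these tips might look like the sketch below: a header states the purpose, each step carries a comment, the raw file is read directly, and every change to the data happens in code. The study, file names, `Encounter_Type` column, and inclusion rule are all hypothetical. The demonstration builds a temporary project folder so that the example can run anywhere.

```python
"""
Purpose: build the cohort encounter file for a (hypothetical) inpatient study.
Input:   Data/raw/encounters.csv  (raw de-identified extract, never edited by hand)
Output:  Data/derived/cohort_encounters.csv
"""
from pathlib import Path
import tempfile

import pandas as pd


def build_cohort(project_dir):
    """Read the raw file, apply every data step in code, and write the derived file."""
    project_dir = Path(project_dir)
    raw_file = project_dir / "Data" / "raw" / "encounters.csv"
    out_file = project_dir / "Data" / "derived" / "cohort_encounters.csv"

    # Step 1: read the raw file directly; no manual edits in a spreadsheet.
    encounters = pd.read_csv(raw_file, dtype=str)

    # Step 2: keep inpatient encounters only (hypothetical inclusion rule).
    cohort = encounters[encounters["Encounter_Type"] == "Inpatient"]

    # Step 3: save the result so the path from raw to derived data can be rerun.
    out_file.parent.mkdir(parents=True, exist_ok=True)
    cohort.to_csv(out_file, index=False)
    return out_file


# Demonstration in a temporary folder laid out as /Data, /Documents, /Programs.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for folder in ("Data/raw", "Documents", "Programs"):
        (root / folder).mkdir(parents=True)
    (root / "Data" / "raw" / "encounters.csv").write_text(
        "Patient_ID,Encounter_ID,Encounter_Type\nP1,E1,Inpatient\nP2,E2,Outpatient\n"
    )
    result = pd.read_csv(build_cohort(root))
    print(result)
```

Because the raw file is never modified, rerunning the program from scratch always reproduces the same derived file.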

Step 8: Ask for Help

§  Cores at UCSF

§  UCSF Library https://www.library.ucsf.edu/

•  Workshops & Special Events

•  Data Science Training Opportunities

§  CTSI’s Consultation Services

•  Study design experts to help define covariates and outcomes from EHR

•  Programming experts to help translate your recipe into code

§  And more …

UCSF Resources

Image credit: springcreekanimalhospital.com

Step 9: Prepare an Analytical Dataset and Codebook (see Step 6)

§ Each row represents a different observation, with potentially repeated observations for individual patients

§ Each column represents a different variable in your dataset

Merge study variables into one nice data frame
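Merging the study variables into one data frame, together with a starter codebook, could be sketched as follows. The tables, keys, and variables are hypothetical. The `validate` argument of pandas `merge` is used so that an unexpected duplicate key raises an error instead of silently multiplying rows.

```python
import pandas as pd

# Hypothetical inputs: patient-level, encounter-level, and derived variables.
patients = pd.DataFrame({"Patient_ID": ["P1", "P2"], "Patient_Age": [70, 58]})
encounters = pd.DataFrame({
    "Patient_ID": ["P1", "P1", "P2"],
    "Encounter_ID": ["E1", "E2", "E3"],
})
derived = pd.DataFrame({
    "Encounter_ID": ["E1", "E2", "E3"],
    "new_foley": [True, False, True],
})

# One row per encounter (the observation), one column per study variable.
analytic = (
    encounters
    .merge(patients, on="Patient_ID", how="left", validate="many_to_one")
    .merge(derived, on="Encounter_ID", how="left", validate="one_to_one")
)

# Starter codebook: name, storage type, and missing count for every variable.
codebook = pd.DataFrame({
    "variable": analytic.columns,
    "dtype": [str(t) for t in analytic.dtypes],
    "n_missing": analytic.isna().sum().values,
})

print(analytic)
print(codebook)
```

The written descriptions from Step 6 would then be added to the codebook alongside each variable name.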

Step 10: Analyze and Publish Your Results

§ Project started 1/2015, submitted 4/2016 and published 4/2017

§ Project started 6/2016 and was published 3/2017

Easier said than done!

Real-time Feedback

On your phone, tablet, laptop – Go to: slido.com

Enter event code: clinicaldata