The Clinical and Translational Science Awards (CTSA) is a registered trademark of DHHS.
If it is in the EHR, it must be true
Using EHR data for research
Keith Marsolo, PhD
Jareen Meinzen-Derr, PhD
Bin Huang, PhD
Outline of Discussion
• Jareen Meinzen-Derr – Epidemiologist
– Introduction to using EHR in research, advantages and methodologic limitations/challenges
• Keith Marsolo – Informaticist
– Overview of data abstraction and challenges, introduction to a large network EHR-based registry (PCORnet)
• Bin Huang – Biostatistician
– More in-depth look at the challenges and implications from the analysis perspective, along with potential solutions and considerations
Large-scale electronic health record-based research is more challenging than traditional retrospective studies
A Primer
• My target population of interest is all children with autism spectrum disorder (ASD) seen at Cincinnati Children’s
– How do I define ASD?
• ICD-9? ICD-10?
• Age? ASD diagnostic assessments?
– Where is my population?
• Specific divisions?
• Any clinic? Inpatient vs. outpatient?
• With or without follow-up?
A Primer
MRN   ICD-9   Clinic     Date         Note
0001  299.0   Dev Peds   01/01/2015
0001  299.0   Dev Peds   10/01/2015
0002  299.0   Ophtho     02/01/2013   Only record in chart
0003  315.31  Dev Peds   03/01/2012   Expressive language disorder; do you include this previous visit?
0003  299.8   Dev Peds   03/01/2013
0004  348.39  Psych      01/01/2009   Static encephalopathy; notes state ASD and assessments indicate ASD
0004  348.39  Dev Peds   01/01/2010
If I include 348.39 in the search, I will retrieve thousands of records for patients who do not have ASD.
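The table shows why a case definition has to be written down explicitly before any extraction. A minimal sketch, using the toy records from the table, applies three illustrative rules; the rule names, the "code on at least two distinct dates" requirement and the code prefixes are assumptions for illustration, not a validated ASD phenotype.

```python
from datetime import date

# Toy records from the table: (MRN, ICD-9 code, clinic, visit date)
RECORDS = [
    ("0001", "299.0", "Dev Peds", date(2015, 1, 1)),
    ("0001", "299.0", "Dev Peds", date(2015, 10, 1)),
    ("0002", "299.0", "Ophtho", date(2013, 2, 1)),
    ("0003", "315.31", "Dev Peds", date(2012, 3, 1)),
    ("0003", "299.8", "Dev Peds", date(2013, 3, 1)),
    ("0004", "348.39", "Psych", date(2009, 1, 1)),
    ("0004", "348.39", "Dev Peds", date(2010, 1, 1)),
]

def any_code(records, prefixes):
    """Hypothetical rule A: any record whose code starts with one of the prefixes."""
    return {mrn for mrn, code, _, _ in records if code.startswith(prefixes)}

def code_on_two_dates(records, prefixes):
    """Hypothetical rule B: a qualifying code recorded on at least two distinct dates."""
    dates = {}
    for mrn, code, _, visit in records:
        if code.startswith(prefixes):
            dates.setdefault(mrn, set()).add(visit)
    return {mrn for mrn, visits in dates.items() if len(visits) >= 2}

ASD = ("299.",)                       # ICD-9 299.x, pervasive developmental disorders
ASD_PLUS_ENCEPH = ("299.", "348.39")  # broadened search that also matches 348.39

print(sorted(any_code(RECORDS, ASD)))              # ['0001', '0002', '0003']
print(sorted(code_on_two_dates(RECORDS, ASD)))     # ['0001']
print(sorted(any_code(RECORDS, ASD_PLUS_ENCEPH)))  # ['0001', '0002', '0003', '0004']
```

Rule B drops 0002 (a single coded visit in an unrelated clinic) and 0003 (only one ASD-coded date), while broadening the code list pulls in 0004 and, in a real EHR, many patients who do not have ASD.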
Electronic Health Records
• A longitudinal collection of electronic health information for and about persons
• Immediate electronic access to person- and population-level information by authorized, and only authorized, users
• Provision of knowledge and decision support that enhance the quality, safety, and efficacy of patient care
• Support of efficient processes for healthcare delivery
IOM
EHR use in research
• Surge in the use of EHRs (from 12.2% in 2009 to 75.5% in 2014)
– EHR-based outcomes research studies have increased >6-fold
• Accommodate collection of structured, coded, electronically available data
– Can be used to build longitudinal histories
• Allow access to health records from multiple locations
– Electronic transmission of records
• More efficient/less expensive alternative to clinical trials
• Can be used to populate databases for both clinical and research purposes
Great Opportunities
• Quality improvement purposes
– Facilitate data sharing, decision-making, efficient administrative operations
• Recruiting for prospective studies/clinical trials
• Public health initiatives
– Facilitate surveillance of infectious diseases, disease outbreaks, chronic illnesses
• Replicating results of randomized controlled trials
• Conduct “Big Data” research
– Rich data to study disease progression, health disparities, clinical outcomes, treatment effectiveness, efficacy of public health interventions
How do I Begin?
JUST LIKE YOU WOULD ANY OTHER OBSERVATIONAL RESEARCH STUDY THAT USES SECONDARY DATA
Shifts in primary responsibility
Manual abstraction workflow: Study design protocol → Create data collection tool → Manual data abstraction & entry → Manually verify missing & erroneous data → Data management → Data analysis
EHR-based workflow: Study design protocol → Electronic data abstraction → Data management → Verify data (manual verification) → Data analysis
Roles: clinical researcher, methodologist (the methodologist should be engaged throughout)
How do I Begin?
• What is your research question?
– Is it descriptive vs. analytic?
– Does it have a clear testable hypothesis?
• What are the appropriate study designs?
• Is the information needed to answer the question present, accessible, & reliable in the EHR?
• How will you extract and analyze the information?
– What are additional data management and methods needs?
Before you begin
• Crucial to develop criteria for identifying patients who have the condition to be studied
– Data may need to be searched from problem lists, billing codes, medication lists, physical exam results across any/all possible clinic sources
– Must identify how long a patient has had a problem
– Develop processes for solving issues such as identifying the first diagnosis
• Study subjects are patients, not participants
– Part of an “open cohort”: patients can enter or leave at any time
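Identifying the first diagnosis usually means computing an index date per patient, and in an open cohort the end of observed follow-up matters just as much. A minimal sketch, assuming a hypothetical extract of (patient ID, code, encounter date) rows and a caller-supplied set of qualifying codes:

```python
from datetime import date

def index_and_last_seen(rows, qualifying_codes):
    """Return {patient_id: (first qualifying diagnosis date, last encounter date)}.

    rows: iterable of (patient_id, code, encounter_date) tuples (hypothetical layout).
    Patients with no qualifying code are omitted.
    """
    first_dx = {}
    last_seen = {}
    for pid, code, when in rows:
        # Most recent encounter of any kind = end of observed follow-up
        if pid not in last_seen or when > last_seen[pid]:
            last_seen[pid] = when
        # Earliest encounter carrying a qualifying code = index date
        if code in qualifying_codes and (pid not in first_dx or when < first_dx[pid]):
            first_dx[pid] = when
    return {pid: (first_dx[pid], last_seen[pid]) for pid in first_dx}

rows = [
    ("A", "299.0", date(2015, 10, 1)),
    ("A", "299.0", date(2015, 1, 1)),    # rows need not be sorted
    ("A", "V20.2", date(2016, 6, 1)),    # later visit with an unrelated code
    ("B", "315.31", date(2012, 3, 1)),   # never receives a qualifying code
]
print(index_and_last_seen(rows, {"299.0", "299.8"}))
# {'A': (datetime.date(2015, 1, 1), datetime.date(2016, 6, 1))}
```

Keeping the last-seen date alongside the index date makes it explicit how much follow-up each patient actually contributes.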
Have an Awareness
• Known limitations of EHR data must be considered
– In the study design
– In the data collection/abstraction
– In the data analysis
– In the interpretation
• Consequences can include:
– Flawed conclusions
– Policy decisions or clinical practice changed on the basis of flawed evidence
EHRs are designed for clinical care, not research
• Not structured in a way that facilitates research
– Providers decide where to put information
– Information may be entered as free text (not structured or from a finite list)
– Providers use different terms for the same information
– Information not always stored in a way that is readily searchable
– Data not important in clinical care may be missing
Awareness: Poor Data Quality
• Quality varies due to differences in measurement, recording, information systems, and clinical focus
• Serious threat to validity and generalizability of clinical research findings
• Context dependent
– The same elements may be deemed high quality for one use and poor quality for a different use
• Extreme values may be irrelevant when determining a rough median estimate of the number of eligible patients for a study
• The same extreme values may have significant undue influence on the results of algorithms or analytic methods sensitive to extreme values
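The context dependence of extreme values can be made concrete with a few numbers. A minimal sketch with made-up weights in kilograms, where one value was (hypothetically) entered in grams: the median does not move, while the mean is badly distorted.

```python
from statistics import mean, median

weights_kg = [18.2, 20.5, 19.8, 22.1, 21.0]
# Same data with one hypothetical entry error: 21.0 kg recorded as 21000 (grams)
with_error = [18.2, 20.5, 19.8, 22.1, 21000.0]

print(median(weights_kg), median(with_error))  # 20.5 20.5
print(mean(weights_kg), mean(with_error))      # mean jumps from about 20 to over 4000
```

A rough median-based feasibility count would be unaffected by this error, while any mean-based or regression analysis would not be.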
Incomplete Data
• Due to fragmentation of healthcare systems
– Patients moving between systems for special referrals or emergency care
• Due to “poor”/inaccurate documentation (on the part of patients and healthcare providers)
– Lack essential information such as treatment outcomes
• Sick patients often have more data
– Non-random missingness
• Complete information about the patient vs. complete information about the patient’s encounter
Examples in the literature
• 30-40% of patients have clinical visits across multiple institutions
• 55% of clinical research studies supplemented with non-EHR sources of data
– 40% supplemented with patient-reported data
• 49% of patients with an ICD-9 code for pancreatic cancer did not have corresponding pathology documentation (incomplete or incorrect)
Bourgeois 2010; Finnell 2011; Thiru 2003; Dean 2009; Botsis 2010
“Sicker” Have More Data
Figure 5. Average number of days with data per patient by ASA (American Society of Anesthesiologists physical status) class. For both medication orders and laboratory results, all ASA classes are significantly different except for Classes 1 and 2.
Rusanov 2014
Sicker have more complete data
Figure 4. Complete records by ASA class, where complete records are those having at least seven values in each of the two categories (medication orders and laboratory results).
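The completeness definition in the caption translates directly into a per-patient flag. A minimal sketch, assuming a hypothetical dict of per-patient value counts; the threshold of seven comes from the caption, and the key names are illustrative.

```python
MIN_VALUES = 7  # threshold from the Figure 4 caption

def is_complete(counts, min_values=MIN_VALUES):
    """True if the patient has at least min_values in BOTH categories.

    counts: dict with hypothetical keys "med_orders" and "lab_results";
    a missing key is treated as zero values.
    """
    return (counts.get("med_orders", 0) >= min_values
            and counts.get("lab_results", 0) >= min_values)

patients = {
    "p1": {"med_orders": 12, "lab_results": 30},
    "p2": {"med_orders": 7, "lab_results": 3},
    "p3": {"lab_results": 9},
}
complete = sorted(pid for pid, counts in patients.items() if is_complete(counts))
print(complete)  # ['p1']
```

Any study that filters to `complete` patients inherits the bias shown in the figure: the flag preferentially keeps sicker patients.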
Data Quality
• Data entry errors
– Reported as high as 26.9% (Goldberg 2008)
– Medication discrepancies common
• Data coding, standardization, extraction
– Free text narrative
– Inconsistent terms, phrases, abbreviations
– Codes assigned for billing purposes
– Diagnostic codes may be recorded for detection or “rule out” purposes
Meredith L et al. 2008
Study Design Still Matters
• Errors can occur in selecting a cohort and characterizing that cohort
• Errors in a small number of cases can have a relatively large effect on outcomes
• Manual review of cases or a sample of cases is invaluable in improving the sample
• May be difficult to find “healthy” patients with sufficient data (comparison cohorts)
• Requires special methodologic approaches to selecting complete patient records from EHR databases while avoiding bias
Hripcsak 2011; Weiskopf 2013
Impact of Data Errors
Hripcsak 2011
Bias Challenges
Selection bias: Subset of individuals studied is not representative of the population of interest
– Selection is not random
• Can distort assessments of measures (e.g., disease prevalence or exposure risk)
• Estimates not as generalizable
– Ex: Including only patients with complete data
– Ex: Generalizing findings from a hospital-based study to all who may have a condition
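The first example (including only patients with complete data) can be illustrated with a small simulation. All probabilities are made-up assumptions: 20% of patients are "sick", and sick patients are four times as likely as well patients to have a complete record.

```python
import random

def simulate(n=100_000, p_sick=0.20, p_complete_sick=0.80, p_complete_well=0.20, seed=1):
    """Return (true prevalence, prevalence among complete records only).

    All probabilities are hypothetical inputs, not estimates from any study.
    """
    rng = random.Random(seed)
    sick_total = complete_total = sick_complete = 0
    for _ in range(n):
        sick = rng.random() < p_sick
        # Completeness depends on illness: non-random missingness
        complete = rng.random() < (p_complete_sick if sick else p_complete_well)
        sick_total += sick
        complete_total += complete
        sick_complete += sick and complete
    return sick_total / n, sick_complete / complete_total

true_prev, complete_case_prev = simulate()
print(f"all patients: {true_prev:.3f}, complete records only: {complete_case_prev:.3f}")
```

With these inputs the complete-record estimate is expected to land near 0.16 / (0.16 + 0.16) = 0.50, far from the true 0.20, even though no individual record is wrong.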
Bias Challenges
Measurement bias: Errors in measurement and/or data collection
• Instrument calibration
• Data collection variability – depending on the field, clinician judgement plays a role
• Patient’s ability to complete assessment/provide history (recall)
• Use of certain codes/data to measure exposure
• Clinician decides how long to “follow” patient
– Affects calculation of prevalence, incidence, risk ratios
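Because the clinician, not a protocol, decides how long a patient is followed, the person-time in an EHR cohort is whatever the record happens to cover. A minimal sketch of an incidence rate per 1,000 person-years, using hypothetical follow-up intervals, shows how shorter recorded follow-up for the same events changes the estimate.

```python
from datetime import date

def incidence_per_1000_py(intervals):
    """Incidence rate per 1,000 person-years.

    intervals: list of (start_date, end_date, had_event) tuples (hypothetical layout);
    end_date is the event date or the last date the patient was observed.
    """
    events = sum(1 for _, _, had_event in intervals if had_event)
    person_years = sum((end - start).days for start, end, _ in intervals) / 365.25
    return 1000 * events / person_years

full_follow_up = [
    (date(2010, 1, 1), date(2014, 1, 1), True),
    (date(2010, 1, 1), date(2015, 1, 1), False),
    (date(2011, 1, 1), date(2015, 1, 1), False),
]
# Same patients and the same event, but two patients stop being followed earlier
short_follow_up = [
    (date(2010, 1, 1), date(2014, 1, 1), True),
    (date(2010, 1, 1), date(2011, 1, 1), False),
    (date(2011, 1, 1), date(2012, 1, 1), False),
]
print(round(incidence_per_1000_py(full_follow_up), 1))
print(round(incidence_per_1000_py(short_follow_up), 1))
```

With these made-up intervals the rate goes from about 77 to about 167 per 1,000 person-years: same event, smaller denominator.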
Confounding
• Distortion of the estimated effect of an exposure on an outcome, caused by the presence of an extraneous factor associated with both exposure and outcome
– SES factors, lifestyle choices, age
• If confounding is not considered, an apparent treatment effect may actually be caused by some other factor
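Stratifying on the confounder is the simplest way to see the distortion. A minimal sketch with entirely hypothetical counts in which age (one of the confounders listed above) drives both treatment and death: the crude risk ratio suggests harm, while the Mantel-Haenszel risk ratio pooled across age strata shows no effect.

```python
# Hypothetical counts per stratum:
# (exposed cases, exposed total, unexposed cases, unexposed total)
STRATA = {
    "younger": (5, 100, 20, 400),
    "older": (80, 400, 20, 100),
}

def crude_rr(strata):
    """Risk ratio after collapsing all strata into one 2x2 table."""
    a = sum(s[0] for s in strata.values())
    n1 = sum(s[1] for s in strata.values())
    c = sum(s[2] for s in strata.values())
    n0 = sum(s[3] for s in strata.values())
    return (a / n1) / (c / n0)

def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio across strata."""
    num = den = 0.0
    for a, n1, c, n0 in strata.values():
        total = n1 + n0
        num += a * n0 / total
        den += c * n1 / total
    return num / den

print(round(crude_rr(STRATA), 3))            # 2.125
print(round(mantel_haenszel_rr(STRATA), 3))  # 1.0
```

Within each age stratum the risk is identical for treated and untreated patients; the crude excess comes entirely from older patients being treated more often.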
Confounding Can Hurt
• In an EHR study of hospitalized patients >65 years, NSAID use was associated with a 32% mortality risk reduction
• However, after including additional specific confounders and analytical techniques, NSAID use was associated with a 6% increase in mortality risk
– Addressed unmeasured confounding
Sturmer 2005
Confounding
• Confounding by indication
– Treatment choices influenced by severity or duration of the patient’s disease
– Severity also influences the outcome of treatment
– Sicker patients receive different treatments
– Sicker patients have different (worse) outcomes
• Often cannot be adequately adjusted for in conventional regression analyses, because severity is rarely measured completely
From EHR to Clinical Evidence
EHR recorded at the point of care
Data Extraction
Data Wrangling
Data Curation
Data Analyses
Causal Inference
Decision Theory
Evidence-Based Decision
EHR data – entry to extract
Sources of variability – data entry
*partial list
Sources of variability – ETL
*partial list
Sources of variability – User request
*partial list
Sources of variability – analyst
*partial list
Sources of variability – self-service tools
*partial list
Why is this so complicated?
• The conceptual model of a clinical process does not translate directly to how data are captured in the EHR
• Many different ways to document the same piece of information
– Workflow used to collect data often dictates where those elements are stored in the reporting database
– Most researchers lack understanding of these workflows
• Quality of results then depends on how the question is asked and on the skill of the analyst
Example – encounters (CCHMC FY14)
• Annual Report
– Total patient encounters: ~1.2 million
– ED visits: ~100K
– Admissions (including short stay): ~31K
– Outpatient: ~1 million
• EHR
– Total patient encounters: ~3 million
– ED admissions to inpatient: ~145K
– Inpatient: ~28K
– Ambulatory: ~2.8 million
• Encounter != encounter
Just pull data from “ambulatory” encounters…
EEG
EXERCISE
CARDIOLOGY TESTING
PUMP/CGM INITIATION ORDERS
MED TAPER SCHEDULE
GENETIC COUNSELOR
NEONATOLOGY TESTING
CARE CONFERENCE - PATIENT/FAMILY PRESENT
HOME VISIT - PALLIATIVE CARE
ABUSE REPORTING
CARE COORDINATOR
SPECIAL NEEDS SUMMARY
EARLY INTERVENTION
HI NEURODEVELOPMENTAL CLINIC TRACKING
INFUSION ORDERS
ENT CLINIC VISITS
FEES/VOICE
HEPATOBLASTOMA LIVER TRANSPLANT FOLLOW UP
PRE-ADOPTION ENCOUNTER
EB PLANNING
FEES CLINIC
VPI - ENT/SPEECH
INTAKE
HVMC PLANNING
PRE-OP PHYSICAL
PLAN OF CARE
ENT INPATIENT VISIT
HOSPITAL TO HOSPITAL TRANSFER
DEVELOPMENTAL TESTING
BIOETHICS CONSULT
ENDO STIM TESTING
HIM INTERFACE CREATED
SURGICAL SITE INFECTION
DERM PATCH TESTING
INTAKE CONSULT
ADEC INTAKE
CPST-PSY ENCOUNTER
ECONSULT TELEMEDICINE
ROADMAP
HOSPITAL ENCOUNTER
UPDATE
PCP/CLINIC CHANGE
WAIT LIST
CLERICAL ORDERS
MOTHER BABY LINK
LACTATION ENCOUNTER
CANCELED
APPOINTMENT
SURGERY
ANESTHESIA
ANESTHESIA EVENT
UNMERGE
HEALTH MAINTENANCE LETTER
PATIENT EMAIL
E-VISIT
MOBILE ORDER ONLY
QUESTIONNAIRE SERIES SUBMISSION
PATIENT OUTREACH
CONTACT MOVED
NURSE TRIAGE
E-CONSULT
E-CONSULT COMMUNITY ORDER
TELEMEDICINE
EXTERNAL CONTACT
OPHTH EXAM
HOSPICE ADMISSION
HOME HEALTH ADMISSION
HOME CARE VISIT
HOME CARE UPDATE
PATIENT WEB UPDATE
COMMUNITY ORDERS
COMMITTEE REVIEW
POST MORTEM DOCUMENTATION
BILLING ENCOUNTER
HOSPITAL
CONFIDENTIAL
OPH TESTING
EDUCATOR
VOICE CLINIC
TELEPHONE
REGISTRATION
EMPTY
LAB REQUISITION
INITIAL CONSULT
ANTI-COAG VISIT
PROCEDURE VISIT
OFFICE VISIT
CONSENT FORM
SCREENING FORM
EXTERNAL HOSPITAL ADMISSION
LETTER (OUT)
REFILL
IMMUNIZATION
HISTORY
RESEARCH ENCOUNTER
REFERRAL
ORDERS ONLY
RX REFILL AUTHORIZE
MEDS ONLY (WEB)
MEDS VOID (WEB)
RESOLUTE PROFESSIONAL BILLING HOSPITAL PROF FEE
EPISODE CHANGES
ANCILLARY ORDERS
PHARMACY VISIT
BPA
ROUTINE PRENATAL
INITIAL PRENATAL
OPHTH OFFICE VISIT
ABSTRACT
WALK-IN
TREATMENT PLAN
ALLIED HEALTH
NURSE ONLY
SOCIAL WORK
NUTRITION
PHYSICAL THERAPY
OCCUPATIONAL THERAPY
SPEECH THERAPY
RESPIRATORY THERAPY
CASE MANAGEMENT
EDUCATION
SURGICAL H&P
CLINICAL SUPPORT
MEDS ONLY / E - PRESCRIBE
PFT ONLY
TRANSPLANT PRE-EVALUATION
TRANSPLANT EVALUATION
TRANSPLANT FOLLOW-UP
TRANSPLANT RESULTS ENTRY
IMMUNOTHERAPY
ALLERGY TESTING
SPECIMEN COLLECTION
AUTO RELEASE ORDERS
URODYNAMIC TESTING
PRE-NATAL
CONSULT CHECKLIST
BOWEL MANAGEMENT
CARE CONFERENCE
INTAKE/TRIAGE
VNS REPROGRAM/SHUTOFF
CLINICAL NOTE
GENETICS
PASTORAL
THERAPY VISIT
INTAKE - NEW PATIENT
HIM SCANS
PRE-VISIT PLANNING
TRANSCRIBED ORDERS
SCHOOL TEACHER/INTERVENTION
CHILD LIFE
THERAPY PROGRESS SUMMARY
BRONCHOSCOPY REQUEST
HEMONC SOCIAL WORK
AUD CONSULT
OPH CONSULT
ALG CONSULT
UROLOGY COMPLEX INTAKE
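One practical consequence of this list: "ambulatory" has to be defined by an explicit set of encounter types, not by the category label. A minimal sketch, assuming a hypothetical allow-list drawn from a few of the types above; which types count as an in-person visit is a study-specific decision to review with the operational data stewards, and the set here is illustrative only.

```python
from collections import Counter

# Hypothetical allow-list of encounter types treated as in-person visits for one study
IN_PERSON_TYPES = {"OFFICE VISIT", "INITIAL CONSULT", "PROCEDURE VISIT", "WALK-IN"}

def split_encounters(encounter_types, keep=IN_PERSON_TYPES):
    """Return (number kept, Counter of excluded types) so exclusions can be reviewed."""
    kept = 0
    excluded = Counter()
    for enc_type in encounter_types:
        normalized = enc_type.strip().upper()  # tolerate case/whitespace differences
        if normalized in keep:
            kept += 1
        else:
            excluded[normalized] += 1
    return kept, excluded

sample = ["OFFICE VISIT", "TELEPHONE", "REFILL", "Office Visit", "CANCELED", "TELEPHONE"]
kept, excluded = split_encounters(sample)
print(kept)                     # 2
print(excluded.most_common(1))  # [('TELEPHONE', 2)]
```

Returning the excluded types, rather than silently dropping them, gives the study team something concrete to review with the data stewards.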
Give me all data for element X…
EHRs are constantly evolving
• New functionality is released & workflows change over time
– Clinician-entered
– Patient entry via welcome kiosk
– Patient entry via web-based questionnaire
• These workflows are typically additive, not substitutive
– Need to remember this history
– Otherwise, the study population will have gaps
Example – Has a HEALTH RELATED QUALITY OF LIFE (QOL) ASSESSMENT been documented?
• Flowsheet RHE PEDS QL #129, Measure RHE PARENT #3757
• Flowsheet RHE PEDS QL #129, Measure RHE PATIENT #1799
• Flowsheet RHE PEDS QL #129, Measure GEN PATIENT #3758
• Flowsheet RHE PEDS QL #129, Measure GEN PARENT #3759
• Questionnaire RHE PEDSQL 13-18 TEEN REPORT #20702, Question: RHE PEDSQL 13-18 CHILD TOTAL SCORE #400411
• Questionnaire RHE PEDSQL 13-18 PARENT REPORT FOR TEENS #20703, Question: RHE PEDSQL 13-18 PARENT TOTAL SCORE #20544
• Questionnaire RHE PEDSQL 2-4 PARENT REPORT FOR TODDLERS #20699, Question: RHE PEDSQL 2-4 PARENT TOTAL SCORE #400415
• Questionnaire RHE PEDSQL 5-7 PARENT REPORT FOR YOUNG CHILDREN #20700, Question: RHE PEDSQL 5-7 PARENT TOTAL SCORE #400421
• Questionnaire RHE PEDSQL 5-7 YOUNG CHILD REPORT #20701, Question: RHE PEDSQL 5-7 CHILD TOTAL SCORE #400427
• Questionnaire RHE PEDSQL 8-12 PARENT REPORT FOR CHILDREN #20706, Question: RHE PEDSQL 8-12 PARENT TOTAL SCORE #400439
• Questionnaire RHE PEDSQL 8-12 CHILD REPORT #20705, Question: RHE PEDSQL 8-12 CHILD TOTAL SCORE #400433
• Questionnaire PEDSQL GENERIC 1-12MOS PARENT REPORT FOR INFANTS #20758, Question: PEDSQL 1-12MOS TOTAL SCORE #400280
• Questionnaire PEDSQL GENERIC 13-18 TEEN REPORT #20745, Question: PEDSQL 13-18C TOTAL SCORE #400163
• Questionnaire PEDSQL GENERIC 13-18 PARENT REPORT FOR TEENS #20686, Question: PEDSQL 13-18P TOTAL SCORE #400158
• Questionnaire PEDSQL GENERIC 13-24MOS PARENT REPORT FOR INFANTS #20759, Question: PEDSQL 13-24MOS TOTAL SCORE #100857
• Questionnaire PEDSQL GENERIC 18-25 YOUNG ADULT REPORT #20684, Question: PEDSQL 18-25C TOTAL SCORE #400183
• Questionnaire PEDSQL GENERIC 2-4 PARENT REPORT FOR TODDLERS #20688, Question: PEDSQL 2-4P TOTAL SCORE #400188
• Questionnaire PEDSQL GENERIC 5-7 PARENT REPORT FOR YOUNG CHILDREN #20689, Question: PEDSQL 5-7P TOTAL SCORE #400153
• Questionnaire PEDSQL GENERIC 5-7 YOUNG CHILD REPORT #20683, Question: PEDSQL 5-7C TOTAL SCORE #400178
• Questionnaire PEDSQL GENERIC 8-12 PARENT REPORT FOR CHILDREN #20687, Question: PEDSQL 8-12P TOTAL SCORE #400173
• Questionnaire PEDSQL GENERIC 8-12 CHILD REPORT #20685, Question: PEDSQL 8-12C TOTAL SCORE #400168
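Answering the question above means treating all of these source items as a single concept. A minimal sketch, assuming a hypothetical extract in which each documented value is a (source type, item ID) pair; only a few of the IDs listed above are included, and a real concept set would need every one of them, revisited whenever a new questionnaire or workflow is added.

```python
# Partial concept set built from a few of the flowsheet measure and questionnaire
# question IDs listed above; a real study would enumerate all of them.
PEDSQL_TOTAL_SCORE = {
    ("flowsheet_measure", 3757),
    ("flowsheet_measure", 1799),
    ("flowsheet_measure", 3758),
    ("flowsheet_measure", 3759),
    ("questionnaire_question", 400411),
    ("questionnaire_question", 400163),
    ("questionnaire_question", 400168),
}

def has_qol_assessment(documented_items, concept_set=PEDSQL_TOTAL_SCORE):
    """True if any documented (source type, item ID) pair belongs to the concept set."""
    return any(item in concept_set for item in documented_items)

print(has_qol_assessment([("flowsheet_measure", 3759)]))         # True
print(has_qol_assessment([("questionnaire_question", 999999)]))  # False: ID not in the set
```

Any ID left out of the set becomes a silent gap in the population, which is the risk described on the previous slide.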
Are there any solutions?
• Engagement with operational reporting groups / data stewards
– Often serve as source of truth for a given area
– Deal with much higher request volume
– However, different priorities and funding models can make it difficult to keep activities aligned
• Quality checks / Data characterization
– Should help identify if there is a problem
– But not necessarily where to look for the solution
– Difficult to communicate/disseminate findings
Enabling Research at a National Scale
How do you ask a research question at hundreds of institutions and get back results you can trust?
Option 1: Write a description and have everyone create a local implementation to run on their data
Option 2: Create an algorithm that can run against a single, common data model
PCORnet Data Strategy
Standardize data into a common data model
Focus on data quality: data curation
Operate a secure distributed query infrastructure
Develop re-usable tools to query the data
Send questions to the data and only return required information
Learn by doing and repeat
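The principle of sending questions to the data and returning only the required information can be sketched in a few lines. Everything here is a hypothetical stand-in (the in-memory site data, the query function, the small-cell threshold of 11), not PCORnet's actual query infrastructure.

```python
SMALL_CELL = 11  # hypothetical threshold: counts below this are suppressed at the site

def count_query(patients, code):
    """Runs locally at a site: count patients with the code and return only the count."""
    n = sum(1 for p in patients if code in p["codes"])
    return n if n >= SMALL_CELL else None  # small counts never leave the site

def run_network_query(sites, code):
    """Coordinating-center side: collect per-site aggregates, never row-level data."""
    results = {name: count_query(patients, code) for name, patients in sites.items()}
    reported_total = sum(n for n in results.values() if n is not None)
    return results, reported_total

# Hypothetical sites; F84.0 is the ICD-10-CM code for autistic disorder
sites = {
    "site_a": [{"codes": {"F84.0"}}] * 30 + [{"codes": set()}] * 70,
    "site_b": [{"codes": {"F84.0"}}] * 4 + [{"codes": set()}] * 96,
}
results, total = run_network_query(sites, "F84.0")
print(results)  # {'site_a': 30, 'site_b': None}
print(total)    # 30
```

The same query code runs everywhere only because every site has loaded the same common data model; that is what the next slides address.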
Loading the Common Data Model (easy)
Same data are represented differently at different institutions (e.g., Race)
Common Data Model Value Set
01 = American Indian or Alaska Native
02 = Asian
03 = Black or African American
04 = Native Hawaiian or Other Pacific Islander
05 = White
06 = Multiple Race
07 = Refuse to Answer
NI = No Information
UN = Unknown
OT = Other
In order to be able to trust results of an analysis, we need to have consistent representations
SITE 1
Caucasian
African American
Asian
Multiple Race
Blank
SITE 2
101
201
300
401
500
600
SITE 3
African American
American Indian
Asian American
White
Other
Unknown
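Each site therefore needs its own mapping into the CDM value set. A minimal sketch for Sites 1 and 3, under stated assumptions: the mappings are one reading of the local labels (for example, treating "Blank" as No Information), the code for Unknown is taken to be UN as in the PCORnet CDM specification, and Site 2's numeric codes are left unmapped because their meanings are not given. Unmapped values are surfaced for review rather than silently defaulted.

```python
# Hypothetical site-to-CDM RACE mappings (CDM codes from the value set above)
SITE_MAPS = {
    "site_1": {
        "Caucasian": "05",
        "African American": "03",
        "Asian": "02",
        "Multiple Race": "06",
        "Blank": "NI",            # assumption: blank means nothing was recorded
    },
    "site_3": {
        "African American": "03",
        "American Indian": "01",  # assumption: label covers Alaska Native too
        "Asian American": "02",
        "White": "05",
        "Other": "OT",
        "Unknown": "UN",          # PCORnet CDM code for Unknown
    },
    # site_2 uses numeric codes (101, 201, ...) whose meanings are not documented here
}

def map_race(site, values):
    """Return (CDM codes, sorted unmapped local values); None marks a value needing review."""
    mapping = SITE_MAPS.get(site, {})
    codes = [mapping.get(v) for v in values]
    unmapped = sorted({v for v, c in zip(values, codes) if c is None})
    return codes, unmapped

print(map_race("site_1", ["Caucasian", "Blank"]))  # (['05', 'NI'], [])
print(map_race("site_2", ["101", "500"]))          # ([None, None], ['101', '500'])
```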
Loading the Common Data Model (less easy)
Same data are represented differently at different institutions (e.g., Type of Encounter)
In order to be able to trust results of an analysis, we need to have consistent representations
Common Data Model
Ambulatory Visit (AV)
Emergency Department (ED)
ED Admit to Inpatient (EI)
Inpatient Hospital (IP)
Non-Acute Inst. Stay (IS)
Other Ambulatory (OA)
Other (OT)
Unknown (UN)
No Information (NI)
SITE 1
Social Work Visit
Allied Health
Office Visit
Nurse Visit
Procedure Visit
Employee Health
Vascular Lab
Sleep Study Visit
Social Work Visit
SITE 2
Office Visit
Specimen
Postpartum Visit
Clinical Support
Initial Prenatal
SITE 3
Home Care Visit
Office Visit
Therapy Visit
Orders Only
Cardiology Testing
Hospital Encounter
Factors that increase complexity
People interpret the CDM specification differently, resulting in variability in how CDM is populated
Different health systems, with different EHRs, implemented at different times
Clinical workflows differ across institutions & impact availability of data
Understanding of EHR / claims data sources differs across institutions – may impact what gets loaded from source systems
All of these issues are present when doing research with EHR data, even within a single center
We have tools/processes to address this!
Data Curation assesses and improves global data quality
Characterize the contents of the PCORnet CDM
Evaluate global data quality and fitness-for-use across a broad research portfolio
For a given study, still need to consider data characterization specific to the aims
Assess data on the intended cohort
Ensure that outcomes / variables of interest are available & complete
Determine whether partners actually have enough data / patients to participate
Requires upfront investment, but can save significant time overall
Data curation
Step 1: Network partner plans DataMart refresh
Step 2: Network partner responds to the data characterization query package
Step 3: Coordinating Center approves the DataMart
Step 4: Coordinating Center analyzes results and solicits more information as needed
Step 5: Coordinating Center holds Data Characterization and Implementation Forums and updates Implementation Guidance
Cycle 2 Required Data Checks
Category: Data Model Conformance
DC 1.01 Required tables are not present
DC 1.02 Expected tables are not populated
DC 1.03 Required fields are not present
DC 1.04 Fields do not conform to CDM specifications for data type, length, or name
DC 1.05 Tables have primary key definition errors
DC 1.06 Fields contain values outside of CDM specifications
DC 1.07 Fields have non-permissible missing values
DC 1.08 Tables contain orphan PATIDs (PATIDs not in DEMOGRAPHIC)
DC 1.10 Replication errors between the ENCOUNTER, PROCEDURES and DIAGNOSIS tables
Category: Data Completeness
DC 3.04 Less than 50% of patients with encounters have DIAGNOSIS records
DC 3.05 Less than 50% of patients with encounters have PROCEDURES records
Cycle 2 Investigative Data Checks
Category: Data Model Conformance
DC 1.09 Tables have orphan ENCOUNTERIDs for more than 5% of records
Category: Data Plausibility
DC 2.01 More than 5% of records have future dates
DC 2.02 More than 10% of records fall into the lowest or highest categories of age, height, weight, diastolic blood pressure, systolic blood pressure, prescribed days supply, or dispensed days supply
DC 2.03 More than 5% of records have illogical date relationships
DC 2.04 The average number of encounters per visit is > 2.0 for inpatient (IP), emergency department (ED), or ED to inpatient (EI) encounters
Category: Data Completeness
DC 3.01 The average number of diagnosis records with known diagnosis types per encounter is below threshold [1.0 for ambulatory (AV), inpatient (IP), emergency department (ED), or ED to inpatient (EI) encounters]
DC 3.02 The average number of procedure records with known procedure types per encounter is below threshold [0.75 for ambulatory (AV) encounters, 0.75 for emergency department (ED) encounters, 1.00 for ED to inpatient (EI) encounters, and 1.00 for inpatient (IP) encounters]
DC 3.03 More than 10% of records have missing or unknown values for the following fields: BIRTH_DATE, SEX, DISCHARGE_DISPOSITION (IP/EI encounters only), DISCHARGE_DATE (IP/EI encounters only), PX_DATE, LOINC, RX_NORM_CUI, RX_ORDER_DATE, RX_DAYS_SUPPLY, or DISPENSE_SUP
DC 3.06 More than 10% of inpatient (IP) or ED to inpatient (EI) encounters with a diagnosis don't have a principal diagnosis
Data partners are asked to investigate and comment on any exceptions in their Annotated Data Dictionary, and to classify these exceptions as follows: feature/limitation of source data; could be improved in the near future; may be improved in the future; or warrants further investigation.
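Several of these checks are simple enough to sketch directly. The functions below mimic the logic of DC 1.08 (orphan PATIDs), DC 2.01 (future dates) and DC 3.03 (missing or unknown values) on hypothetical in-memory tables; they illustrate the idea only and are not the Coordinating Center's query package. Field names follow the CDM, and the thresholds come from the check descriptions.

```python
from datetime import date

def orphan_patids(table_rows, demographic_patids):
    """DC 1.08-style check: PATIDs in a table that are absent from DEMOGRAPHIC."""
    return sorted({row["PATID"] for row in table_rows} - set(demographic_patids))

def pct_future_dates(rows, field, today):
    """DC 2.01-style check: percent of non-missing dates in `field` after `today`."""
    dates = [row[field] for row in rows if row.get(field) is not None]
    if not dates:
        return 0.0
    return 100 * sum(d > today for d in dates) / len(dates)

def pct_missing_or_unknown(rows, field, unknown=("NI", "UN")):
    """DC 3.03-style check: percent of records whose `field` is missing or unknown."""
    if not rows:
        return 0.0
    bad = sum(1 for row in rows if row.get(field) in (None, "") or row.get(field) in unknown)
    return 100 * bad / len(rows)

demographic = [{"PATID": "p1", "SEX": "F"}, {"PATID": "p2", "SEX": "NI"}]
encounter = [
    {"PATID": "p1", "ADMIT_DATE": date(2016, 3, 1)},
    {"PATID": "p3", "ADMIT_DATE": date(2031, 1, 1)},  # orphan PATID with a future date
    {"PATID": "p2", "ADMIT_DATE": None},
]
today = date(2017, 1, 1)
print(orphan_patids(encounter, [d["PATID"] for d in demographic]))  # ['p3']
print(pct_future_dates(encounter, "ADMIT_DATE", today) > 5)         # True (50.0%)
print(pct_missing_or_unknown(demographic, "SEX") > 10)              # True (50.0%)
```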
Antibiotics study overview
Study Aims: To evaluate the comparative effects of different types, timing, and amount of antibiotics prescribed during the first 2 years of life on:
Body mass index and risk of obesity at 5 and 10 years
Growth trajectories from infancy onwards
And how these effects differ according to:
Child sex, race/ethnicity, geography
Use of other medications
Maternal BMI, antibiotics during pregnancy, C-section (analysis at 7 sites)
Conducted study-specific data characterization to assess site eligibility:
Findings for prescriptions
RxNorm considerations
Study-specific data characterization findings
Lower number of children aged ≤ 2 years with an antibiotic prescription
End date minus start date (duration)
• Low percent missing (~5%)
• Note: This is very different from the global measures (highly missing)
• May be useful: 50th percentile = 10 days
• Huge range (5th percentile = 0 days; 95th percentile = 108 days)
Quantity
• Varying interpretations of quantity (pill, mg, ml, etc.)
• Large range (5th percentile = 11.00; 95th percentile = 225.50)
• Missing in 52% of ABX prescriptions
Refills – not consistently populated (60% missing)
Days supply – only populated in 4% of ABX prescribing records
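The duration figures come from simple per-prescription arithmetic. A minimal sketch, assuming a hypothetical list of (start date, end date) pairs with None for missing values; it uses a nearest-rank percentile, which is an assumption, since the slide does not state which percentile method was used.

```python
import math
from datetime import date

def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]

def duration_summary(prescriptions):
    """Percent missing, plus 5th/50th/95th percentiles of (end - start) in days."""
    durations = sorted(
        (end - start).days
        for start, end in prescriptions
        if start is not None and end is not None
    )
    pct_missing = 100 * (len(prescriptions) - len(durations)) / len(prescriptions)
    if not durations:
        return pct_missing, None
    return pct_missing, {p: nearest_rank(durations, p) for p in (5, 50, 95)}

rx = [
    (date(2014, 1, 1), date(2014, 1, 11)),
    (date(2014, 2, 1), date(2014, 2, 1)),   # same-day start and end: 0 days
    (date(2014, 3, 1), None),               # missing end date
    (date(2014, 4, 1), date(2014, 4, 15)),
]
print(duration_summary(rx))  # (25.0, {5: 0, 50: 10, 95: 14})
```

Running the same summary per site, on the study cohort only, is what distinguishes study-specific characterization from the global data checks.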
Study-specific data characterization findings
Initial query only included RxNorm Dose Form and Clinical Drug or Pack
Specific codes that allow identification of all aspects of the prescription (>2000 codes)
Did not include less specific codes: RxNorm Ingredient, Precise Ingredient, or Drug Component
Learned that several network partners had not mapped to the specific codes
Had to ask network partners to map to the specific codes
Assess whether to include ingredient-level records in the analysis
What RXCUI term types are used?
Categorization of term types
Category 1
1. Semantic Clinical Drug (SCD)
2. Semantic Branded Drug (SBD)
3. Generic Pack (GPCK)
4. Branded Pack (BPCK)
Category 2
1. Semantic Clinical Drug Form (SCDF)
2. Semantic Branded Drug Form (SBDF)
3. Semantic Clinical Dose Form Group (SCDG)
4. Multiple Ingredients (MIN)
5. Precise Ingredient (PIN)
6. Ingredient (IN)
7. Semantic Branded Drug Component (SBDC)
8. Semantic Clinical Drug Component (SCDC)
Category 3
1. Brand Name (BN)
2. Semantic Branded Dose Form Group (SBDG)
3. Dose Form Group (DFG)
4. Dose Form (DF)
Category 1: Ingredient + Strength + Dose Form
Category 2:
1. Ingredient
2. Ingredient + Strength
3. Ingredient + Dose Form
Category 3:
1. Brand Name
2. Dose Form
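Tabulating which term types a site's prescribing records use is one way to spot partners that did not map to the specific (Category 1) codes. A minimal sketch, assuming the RxNorm term type (TTY) of each record is already available (for example, looked up from RxNorm's RXNCONSO file, an assumption about the data source); the category assignments are taken from the list above.

```python
# Term-type (TTY) groupings from the categorization above
TTY_CATEGORY = {
    **dict.fromkeys(["SCD", "SBD", "GPCK", "BPCK"], 1),
    **dict.fromkeys(["SCDF", "SBDF", "SCDG", "MIN", "PIN", "IN", "SBDC", "SCDC"], 2),
    **dict.fromkeys(["BN", "SBDG", "DFG", "DF"], 3),
}

def categorize(ttys):
    """Count records per category; TTYs outside the list are counted under None."""
    counts = {}
    for tty in ttys:
        category = TTY_CATEGORY.get(tty)
        counts[category] = counts.get(category, 0) + 1
    return counts

print(categorize(["SCD", "SCD", "IN", "BN", "XYZ"]))  # {1: 2, 2: 1, 3: 1, None: 1}
```

A site whose records fall mostly into Category 2 or 3 has likely mapped at the ingredient or brand level, which the initial Category 1 query would have missed.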