Population Health Research Data

David Rehkopf, ScD
Isabella Chu, MPH
September 20, 2021

STANFORD CENTER FOR POPULATION HEALTH SCIENCES

PHS Data Team

David Rehkopf – Faculty Director

Ayin Vala
Rebecca Miller
Emma Hallgren
Isabella Chu

PHS Overview

• Dedicated to advancing health equity
• Enable researchers to leverage data to address the social and environmental determinants of health
• Collaborate w/ partners to inform policies & programs
• Combine data science & community development approaches

Framework for Data Utility and Value: Large, Longitudinal and Linkable

[Figure: examples of individual-level data, including surveys (COVID-19 beliefs), medications, Zip5 PM2.5, income (paychecks), EMR (AFC), food purchases, gene expression, CPS encounters, phone usage, browsing history, social media, moves (housing insecure) and supportive services, with example uses "Take proactive action" and "Automate a tedious task".]

Slide credit: Nigam Shah

Stanford Data Ecosystem

Data Security: Technical, Procedural and Physical Protection of Data

PHS Dataset Overview

Dataset | Dataset Type | Population | Smallest Geo Unit | Sample Size | Date Range | Strengths
Aarhus Danish Registers | National cohort, surveys, administrative data, biologic samples | Denmark | Census block | 5 million | 1968 - 2020 | Rich, longitudinal, individual linkages
American Family Cohort (AFC) | EMR - primary care | United States | Census block | 6.6 million | 2010 - 2020 | Linkable by individual
Born in Bradford | Birth cohort, surveys, administrative data, biologic samples | Bradford, England | Census block | 13 thousand | 2007 - 2020 | Rich, longitudinal, individual linkages
CA ADE | Linked Medicaid and social services data | California | Census block | ~18 million | 2016 - 2021 | Rich, longitudinal, individual linkages
Historic Census Data | Historic census data | United States | Address | Varies | 1870 - 1940 | Linkable by individual
MarketScan | Claims - commercially insured | United States | 3-digit ZIP | 140 million | 2006 - 2017 | Prices, variability in insurance type
Medicare 20% Sample | Claims - Medicare | United States | 5-digit ZIP | 11 million | 2006 - 2018 | Representative of Americans over 65, rich, longitudinal
Optum | Claims - commercially insured | United States | 5-digit ZIP | 80 million | 2003 - 2020 | Rich, longitudinal

Hospitalization Datasets: HCUP NIS, KID and CA and AHUD

Domain | HCUP NIS | HCUP KID | HCUP State
Dataset type | Claims | Claims | Claims
Years covered | 2003 - 2018 | 2003 - 2012 | Varies (2003 - 2018)
Lives covered | 7 million hospital stays/yr | 2 million hospital stays/yr (sampled every 3 years) | Varies according to state (CA, NYC, FL)
Sampling frame | 20% random sample of discharges from national hospital cohort | 20% random sample of discharges from national hospital cohort | 20% random sample of discharges from national hospital cohort
Strengths | The largest publicly available all-payer inpatient healthcare database | Normal newborns sampled at a rate of 10%; complicated newborns and other pediatric discharges (age 20 or less at admission) sampled at a rate of 80% | Very good for state-specific estimates

• HCUP: Ideal for developing national and regional estimates
• Highly de-identified

Slide credit: SY Chen, Chamberlain Lab
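The sampling rates above imply inverse-probability design weights: a record sampled with probability p stands for 1/p records, so a 20% sample carries a weight of 5, a normal newborn in the KID (10%) a weight of 10, and other pediatric discharges (80%) a weight of 1.25. The Python sketch below shows only that arithmetic, assuming each stratum is a simple random sample at a fixed, known rate; the stratum labels and counts are hypothetical. HCUP distributes its own discharge weights with these files, and those should be used for real national estimates.

```python
from typing import Dict

# Sampling rates quoted on the slide for the HCUP KID. The stratum labels are
# hypothetical names chosen for this sketch, not HCUP variable names.
KID_SAMPLING_RATES: Dict[str, float] = {
    "normal_newborn": 0.10,
    "other_pediatric": 0.80,
}


def design_weight(sampling_rate: float) -> float:
    """Inverse-probability weight: each sampled record represents 1 / rate records."""
    if not 0.0 < sampling_rate <= 1.0:
        raise ValueError("sampling rate must be in (0, 1]")
    return 1.0 / sampling_rate


def estimate_total(sample_counts: Dict[str, int], rates: Dict[str, float]) -> float:
    """Estimate the population count by summing weighted sampled records per stratum."""
    return sum(
        count * design_weight(rates[stratum])
        for stratum, count in sample_counts.items()
    )


if __name__ == "__main__":
    # Hypothetical sampled counts, not real HCUP figures.
    sample = {"normal_newborn": 1_000, "other_pediatric": 4_000}
    print(design_weight(0.20))  # weight for a 20% random sample
    print(estimate_total(sample, KID_SAMPLING_RATES))
```

With these made-up counts, 1,000 sampled normal newborns represent about 10,000 discharges and 4,000 other pediatric discharges represent about 5,000.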

MarketScan and Optum

Domain | MarketScan | Optum
Years covered | 2007-2017 | 2003-2021
Lives covered | 149 million, though cohort declined by 40% in 2016 | 88 million
Total years of data | 10 | 19
Smallest geographic unit | 3-digit ZIP | 5-digit ZIP
Reuse fee | $30K; $60K for commercial | $40K or 15% of direct costs
EMR/Claims linked | Available (affordable) | Available (prohibitively expensive)
Strengths | Costs, dental data, EMRs, family variable, diverse payer mix | Zip5, labs, SES, death, service
Service | Good | Very good

• Good for clinical epidemiology, rare diseases, and population health for exposures w/ a wide geographic distribution
• Not ideal for studying vulnerable populations (though MarketScan has Medicaid)
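Because MarketScan reports geography only at the 3-digit ZIP level while Optum goes down to the 5-digit ZIP, an exposure measured by ZIP5 (such as PM2.5) has to be rolled up before it can be linked to MarketScan records. The Python sketch below assumes the exposure arrives as a mapping from ZIP5 strings to values; it takes an unweighted mean within each ZIP3, which is a simplifying assumption (a population-weighted mean would usually be preferable), and the PM2.5 values are made up.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def zip5_to_zip3(zip5: str) -> str:
    """Return the 3-digit prefix of a 5-digit ZIP kept as a string (preserves leading zeros)."""
    if len(zip5) != 5 or not zip5.isdigit():
        raise ValueError(f"expected a 5-digit ZIP code, got {zip5!r}")
    return zip5[:3]


def aggregate_to_zip3(exposure_by_zip5: Dict[str, float]) -> Dict[str, float]:
    """Unweighted mean of ZIP5-level exposure values within each ZIP3 (simplifying assumption)."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for zip5, value in exposure_by_zip5.items():
        groups[zip5_to_zip3(zip5)].append(value)
    return {zip3: mean(values) for zip3, values in groups.items()}


if __name__ == "__main__":
    # Hypothetical annual PM2.5 means (ug/m3), not real measurements.
    pm25_by_zip5 = {"94305": 8.0, "94301": 10.0, "02139": 6.5}
    print(aggregate_to_zip3(pm25_by_zip5))
```

Keeping ZIP codes as strings rather than integers avoids silently dropping the leading zero of codes such as 02139.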

Medicare

Domain | Medicare 20% | Medicare Surgery
Years covered | 2006 - 2018 | 2011 - 2017
Lives covered | 11 million | ~1 million
Total years of data | 13 | 7
Smallest geographic unit | 5-digit ZIP | 5-digit ZIP
Reuse fee | $2,000 | $2,000
Strengths | Longitudinal, rich and representative of Americans >65 | Longitudinal, rich and representative of Americans >65

• Good for projects which require very granular and longitudinal data.
• Significant administrative burden for access, though with a 100% success rate.
• Only for older patients.

California Agency Data Exchange

Emily Putnam-Hornstein and Regan Foust
• Decade-long investment
• Sabbatical in Sacramento
• Agency-centered approach

California Agency Data Exchange

dcw.metadatacenter.org

American Family Cohort

Exposure Data - Public Data on the PHS Data Portal

• The Opportunity Atlas: derived from linked Census and IRS data that have been aggregated for public use. These data provide contextual measures by county and Census tract, representing average socioeconomic indicators (e.g., earnings) for the places where people grew up.
• The California Department of Education (CDE): DataQuest provides access to a wide variety of reports on California schools, including school performance, test results, school staffing, graduation and dropout, and more.
• Environmental Protection Agency (EPA) data: ambient (outdoor) concentrations of pollutants are measured at more than 4,000 monitoring stations, owned and operated mainly by state environmental agencies.
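Linking a tract-level measure such as those in the Opportunity Atlas to a U.S. dataset whose smallest geographic unit is the census block (for example AFC or CA ADE in the dataset overview) can be done by truncating identifiers: a 15-digit census block GEOID is the 2-digit state FIPS code, 3-digit county code, 6-digit tract code and 4-digit block number, so its first 11 digits are the tract GEOID. The Python sketch below assumes both sources carry GEOIDs as zero-padded strings; the person IDs, GEOIDs and measure values are hypothetical, and no particular Opportunity Atlas file layout is assumed. Whether a person's current block is the right tract for a "where people grew up" measure is a substantive design choice the code does not address.

```python
from typing import Dict, Optional

TRACT_GEOID_LENGTH = 11  # state (2) + county (3) + tract (6)


def block_to_tract(block_geoid: str) -> str:
    """Truncate a 15-digit census block GEOID to its 11-digit tract GEOID."""
    if len(block_geoid) != 15 or not block_geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {block_geoid!r}")
    return block_geoid[:TRACT_GEOID_LENGTH]


def link_tract_measure(
    block_by_person: Dict[str, str], measure_by_tract: Dict[str, float]
) -> Dict[str, Optional[float]]:
    """Attach a tract-level measure to each person; None when their tract has no value."""
    return {
        person: measure_by_tract.get(block_to_tract(block))
        for person, block in block_by_person.items()
    }


if __name__ == "__main__":
    # Hypothetical identifiers and values for illustration only.
    people = {
        "p1": "060855130001000",
        "p2": "060855130002005",
        "p3": "360610001001000",
    }
    mean_earnings_by_tract = {"06085513000": 41_000.0}
    print(link_tract_measure(people, mean_earnings_by_tract))
```

Returning None for unmatched tracts, rather than dropping those people, keeps missing linkage visible for later handling.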

Exposure Data – Data Commons

dcw.metadatacenter.org

Data Commons is an open knowledge repository that combines data from public datasets using mapped common entities. It includes tools to easily explore and analyze data across different datasets without data cleaning or joining.

Coming Soon: CO APCD

Domain CO APCD

Years covered 2012-2021

Lives covered 4.5 million (~2/3 of insured Colorado residents)

Total years of data 10

Smallest geographic unit Census Block

Reuse fee $30K

EMR/Claims linked Available in 2022

Strengths Detailed, linkable, representative

• Good for projects which require very granular and longitudinal data.
• Significant administrative burden for access, but good support.
• Only covers a single state (Colorado).

Coming Soon: CO APCD

COVID-19 Aims

• COVID-19 incidence and prevalence
• Disparities in COVID-19 treatment and outcomes
• COVID-19 variants
• Vaccination patterns and outcomes, including breakthrough COVID-19
• Long COVID-19 (PASC)
• Excess all-cause mortality (~20%)

Medicaid Working Group

• Please email Isabella Chu at [email protected] if you’d like to join the Medicaid working group. We will be thinking about which data to acquire (files, years and states) and which questions to take on first.

For further information

Data: phsdata.stanford.edu
Data Access Instructions: phsdocs.stanford.edu
Office Hours: phsofficehours.stanford.edu
Slack Channel: PHS Data Users Slack channel
Data Core Contact: [email protected]