Population Health Research Data
David Rehkopf, ScD
Isabella Chu, MPH
September 20, 2021
STANFORD CENTER FOR POPULATION HEALTH SCIENCES
PHS Data Team
David Rehkopf – Faculty Director
Ayin Vala
Rebecca Miller
Emma Hallgren
Isabella Chu
PHS Overview
• Dedicated to advancing health equity
• Enable researchers to leverage data to address the social and environmental determinants of health
• Collaborate w/ partners to inform policies & programs
• Combine data science & community development approaches
Individual
Survey, COVID-19 beliefs
Medications
Zip5 PM2.5
Income (paychecks)
EMR (AFC)
Food purchases
Gene Expression
CPS encounters
Phone usage
Browsing history
Social media
Moves (housing insecure)
Supportive Services
Take proactive action
Automate a tedious task
Framework for Data Utility and Value: Large, Longitudinal and Linkable
Slide credit: Nigam Shah
Stanford Data Ecosystem
Data Security: Technical, Procedural and Physical Protection of Data
PHS Dataset Overview
Dataset | Dataset Type | Population | Smallest Geo Unit | Sample Size | Date Range | Strengths
Aarhus Danish Registers | National Cohort, Surveys, Administrative data, Biologic Samples | Denmark | Census Block | 5 million | 1968 - 2020 | Rich, longitudinal, individual linkages
American Family Cohort (AFC) | EMR - Primary Care | United States | Census Block | 6.6 million | 2010 - 2020 | Linkable by individual
Born in Bradford | Birth Cohort, Surveys, Administrative data, Biologic Samples | Bradford, England | Census Block | 13 thousand | 2007 - 2020 | Rich, longitudinal, individual linkages
Ca ADE | Linked Medicaid and social services data | California | Census Block | ~18 million | 2016 - 2021 | Rich, longitudinal, individual linkages
Historic Census Data | Historic Census Data | United States | Address | Varies | 1870 - 1940 | Linkable by individual
MarketScan | Claims - Commercially Insured | United States | 3 Digit Zip | 140 million | 2006 - 2017 | Prices, variability in insurance type
Medicare 20% Sample | Claims - Medicare | United States | 5 Digit Zip | 11 million | 2006 - 2018 | Representative of Americans over 65, rich, longitudinal
Optum | Claims - Commercially Insured | United States | 5 Digit Zip | 80 million | 2003 - 2020 | Rich, longitudinal
Hospitalization Datasets: HCUP NIS, KID and CA and AHUD
Domain | HCUP NIS | HCUP KID | HCUP State
Dataset Type | Claims | Claims | Claims
Years covered | 2003 - 2018 | 2003 - 2012 | Varies (2003 - 2018)
Lives covered | 7 million hospital stays/yr | 2 million hospital stays/yr (sampled every 3 years) | Varies according to state (CA, NYC, FL)
Sampling frame | 20% random sample of discharges from national hospital cohort | 20% random sample of discharges from national hospital cohort | 20% random sample of discharges from national hospital cohort
Strengths | The largest publicly available all-payer inpatient healthcare database | Normal newborns sampled at a rate of 10%; complicated newborns and other pediatric discharges (age 20 or less at admission) sampled at a rate of 80% | Very good for state-specific estimates
• HCUP: Ideal for developing national and regional estimates
• Very de-identified
Slide credit: SY Chen, Chamberlain Lab
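The KID sampling rates listed above (10% for normal newborns, 80% for other pediatric discharges) imply that raw sample counts have to be scaled up before they can be read as population estimates. The sketch below illustrates that idea with simple inverse-probability weights. The stratum names and the sample counts are hypothetical, and this is not the official HCUP weighting procedure; in practice analysts would typically use the discharge weights supplied with the data.

```python
# Sampling rates as stated on the slide for HCUP KID.
# The stratum names are hypothetical labels chosen for this sketch.
SAMPLING_RATES = {
    "normal_newborn": 0.10,
    "other_pediatric": 0.80,
}


def estimate_total(sample_counts, rates=SAMPLING_RATES):
    """Scale sampled discharge counts to an estimated population total.

    Each sampled discharge stands in for 1 / rate discharges in its stratum.
    """
    total = 0.0
    for stratum, count in sample_counts.items():
        total += count / rates[stratum]
    return total


# Hypothetical sample counts, for illustration only (not real HCUP figures).
sample = {"normal_newborn": 1_000, "other_pediatric": 4_000}
estimated = estimate_total(sample)
print(round(estimated))
```

Because the two strata are sampled at very different rates, an unweighted count would heavily under-represent normal newborns relative to other pediatric discharges.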
MarketScan and Optum
Domain | MarketScan | Optum
Years covered | 2007-2017 | 2003-2021
Lives covered | 149 million, though cohort declined by 40% in 2016 | 88 million
Total years of data | 10 | 19
Smallest geographic unit | 3 digit zip | 5 digit zip
Reuse fee | $30K; $60K for commercial | $40K or 15% of directs
EMR/Claims linked | Available (affordable) | Available (prohibitively expensive)
Strengths | Costs, dental data, EMRs, family variable, diverse payer mix | Zip5, Labs, SES, death, service
Service | Good | Very good
• Good for clinical epi, rare diseases, population health for exposures w/ a wide geographic distribution
• Not ideal for studying vulnerable populations (though MarketScan has Medicaid)
Medicare
Domain | Medicare 20% | Medicare Surgery
Years covered | 2006 - 2018 | 2011 - 2017
Lives covered | 11 million | ~1 million
Total years of data | 13 | 7
Smallest geographic unit | Five Digit Zip | Five Digit Zip
Reuse fee | $2,000 | $2,000
Strengths | Longitudinal, rich and representative of Americans >65 | Longitudinal, rich and representative of Americans >65
• Good for projects which require very granular and longitudinal data.
• Significant administrative burden for access (100% success rate).
• Only for older patients.
Emily Putnam-Hornstein and Regan Foust
• Decade long investment
• Sabbatical in Sacramento
• Agency centered approach
California Agency Data Exchange
dcw.metadatacenter.org
American Family Cohort
Exposure Data - Public Data on the PHS Data Portal
• The Opportunity Atlas - The Opportunity Atlas is derived from linked Census and IRS data which has been aggregated for public use. These data include contextual data by county and Census tract. The data represents average socioeconomic indicators (e.g., earnings) of where people grew up.
• The California Department of Education (CDE) - DataQuest provides access to a wide variety of reports, including school performance, test results, school staffing, graduation and dropout, and more in California.
• Environmental Protection Agency Data - Ambient (outdoor) concentrations of pollutants are measured at more than 4000 monitoring stations owned and operated mainly by state environmental agencies.
Exposure Data – Data Commons
dcw.metadatacenter.org
Data Commons is an open knowledge repository that combines data from public datasets using mapped common entities. It includes tools to easily explore and analyze data across different datasets without data cleaning or joining.
Coming Soon: CO APCD
Domain | CO APCD
Years covered | 2012-2021
Lives covered | 4.5 million (~2/3 of insured Colorado residents)
Total years of data | 10
Smallest geographic unit | Census Block
Reuse fee | $30K
EMR/Claims linked | Available in 2022
Strengths | Detailed, linkable, representative
• Good for projects which require very granular and longitudinal data.
• Significant administrative burden for access but good support.
• Only for specific states (Colorado)
COVID-19 Aims
• COVID-19 incidence and prevalence
• Disparities in COVID-19 treatment and outcomes
• COVID-19 variants
• Vaccination patterns and outcomes including breakthrough COVID-19
• Long COVID-19 (PASC)
• Excess all cause mortality (~20%)
Medicaid Working Group
• Please email Isabella Chu at [email protected] if you’d like to join the Medicaid working group. We will be thinking about which data to acquire (files, years and states) and which questions to take on first.
For further information
Data: phsdata.stanford.edu
Data Access Instructions: phsdocs.stanford.edu
Office Hours: phsofficehours.stanford.edu
Slack Channel: PHS Data Users Slack channel
Data Core Contact: [email protected]