Social and Decision Analytics Laboratory SALLIE KELLER, DIRECTOR SOCIAL AND DECISION ANALYTICS LABORATORY VIRGINIA BIOINFORMATICS INSTITUTE AT VIRGINIA TECH Health & Social Development Analytics and Big Data A Joint AIR and Virginia Tech Workshop

Sdal air health and social development (jan. 27, 2014) final

DESCRIPTION

The American Institutes for Research (AIR) and Virginia Tech are collaborating to explore and develop new approaches to combining, manipulating, and understanding big data. The two are also looking at how big data analytics can help answer questions critical to solving issues in education, workforce, health, and human and social development. They held two workshops on January 7 and 27, 2014: the first on Education and Workforce Analytics and the second on Health and Social Development Analytics.

Citation preview

Page 1: Sdal air health and social development (jan. 27, 2014) final

Social and Decision Analytics Laboratory

SALLIE KELLER, DIRECTOR SOCIAL AND DECISION ANALYTICS LABORATORY

VIRGINIA BIOINFORMATICS INSTITUTE AT VIRGINIA TECH

Health & Social Development Analytics and Big Data –

A Joint AIR and Virginia Tech Workshop

Page 2: Sdal air health and social development (jan. 27, 2014) final

“In attempting to arrive at the truth, I have applied everywhere for information, but scarcely an instance have I been able to obtain hospital records fit for any purpose of comparison. If they could be obtained, they would enable us to decide many other questions besides the ones alluded to. They would show subscribers how their money was spent, what amount of good was really being done with it, or whether their money was not doing mischief rather than good.”

Florence Nightingale (1864)

Starting the Journey

Page 3: Sdal air health and social development (jan. 27, 2014) final

Outline

•  Pressures & Opportunities of Today
•  Big data
   –  Why important?
   –  What about privacy?
•  Health & Social Development analytics
   –  What makes it big data?
   –  How does big data change current approaches?
•  Selected examples
•  Methodology challenges

Page 4: Sdal air health and social development (jan. 27, 2014) final

Health and Social Development Pressures

•  Health as a percent of GDP
   –  5% in 1960 to 18% in 2012
•  Changing demographics
   –  Increasing minority populations
   –  Rapidly aging populations
   –  Rural vs. urban living
   –  Increasing inequality
•  Focus on the patient
   –  Health outcomes

Source: Congressional Budget Office.

Page 5: Sdal air health and social development (jan. 27, 2014) final

Health Care Analytics Opportunities

•  Drivers behind health care costs
   –  Technology, infectious and chronic diseases
•  Workforce demand
   –  Care givers, biomedical researchers, IT specialists
•  Prevention and personalization
   –  Changing demographics and lifestyles

Page 6: Sdal air health and social development (jan. 27, 2014) final

Social Development Analytics Opportunities

•  Understanding and anticipating
   –  Changes in population growth, aging and diversity
   –  Adapting to increasing urbanization
   –  Building individual and community resiliency
•  Tailoring programs and policies by defined subpopulations

Page 7: Sdal air health and social development (jan. 27, 2014) final

Big Data - Doesn’t matter what it’s called, only matters what you do with it

•  Big data
   –  Structured & unstructured
   –  Collections
      •  Designed
      •  Observational/convenience
•  Statistics / analytics
   –  Replication, reproducibility, representativeness
   –  Description, association, causation
      •  prediction ≠ correlation
•  Cost drivers
   –  Analytics and informatics, NOT data collection

Page 8: Sdal air health and social development (jan. 27, 2014) final

Now Big Data is Changing Social Sciences

•  Social science research
   –  Traditionally informed by surveys and statistically designed experiments
   –  Clean, well-controlled, limited in scale (~10^3)
•  Bringing “Big data” to bear for social policy
   –  Data informed computational social science models
   –  Quantitative social science methods & practice at scale

Page 9: Sdal air health and social development (jan. 27, 2014) final

Methodological Issues

New methods and tools are needed to ensure
   –  Data access
   –  Data quality
   –  Representativeness
   –  Replication
   –  Reproducibility
   –  Characterization of noisy data
•  Managing biases
   –  Selection bias
   –  Measurement bias

National Research Council 2013
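The selection-bias concern listed on this slide can be made concrete with a small simulation. The Python sketch below is illustrative only: the population, the age range, and the inverse-age inclusion weights are hypothetical assumptions, not data from the workshop. It compares a designed (simple random) sample with a convenience sample in which younger people are more likely to appear.

```python
import random
import statistics


def simulate_selection_bias(pop_size=100_000, sample_size=2_000, seed=1):
    """Return (population mean, designed-sample mean, convenience-sample mean) of age."""
    rng = random.Random(seed)

    # Hypothetical population: ages drawn uniformly between 18 and 90.
    ages = [rng.uniform(18, 90) for _ in range(pop_size)]

    # Designed collection: every person has the same chance of selection.
    designed = rng.sample(ages, sample_size)

    # Convenience collection (assumption): the chance of appearing falls
    # with age, as it might for an opt-in online source.
    weights = [1.0 / age for age in ages]
    convenience = rng.choices(ages, weights=weights, k=sample_size)

    return (
        statistics.fmean(ages),
        statistics.fmean(designed),
        statistics.fmean(convenience),
    )


pop_mean, designed_mean, convenience_mean = simulate_selection_bias()
print(f"population mean age:  {pop_mean:.1f}")
print(f"designed sample:      {designed_mean:.1f}")
print(f"convenience sample:   {convenience_mean:.1f}")
```

Under these assumptions the designed sample tracks the population mean closely, while the convenience sample skews young no matter how large it is, which is why representativeness has to be assessed separately from volume.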

Page 10: Sdal air health and social development (jan. 27, 2014) final

1993 2013

Changing Privacy Landscape

Page 11: Sdal air health and social development (jan. 27, 2014) final

Personal Data - New Asset Class

•  European Council 1995/1996:
   –  “… any information relating to an identified or identifiable natural person; an identifiable person is one who can be identified (data subject), directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity.”
•  World Economic Forum 2011:
   –  “… digital data created by and about people.”

Page 12: Sdal air health and social development (jan. 27, 2014) final

World Economic Forum 2013

Yesterday
•  Definition of personal data is predetermined and binary
•  Individual provides legal consent but not truly engaged
•  Policy framework focuses on minimizing risk to individual

Today
•  Definition of personal data is contextual and dependent on social norms
•  Individual engaged and understands how data is used and value created
•  Policy needs to focus on balancing protection with innovation and economic growth

Page 13: Sdal air health and social development (jan. 27, 2014) final

Further Privacy Thoughts

•  Will people voluntarily give up their data if they can see a personal or societal benefit?
•  Are norms/expectations changing with generations?
•  What are technical fixes for multi-level privacy/classification?
•  What is the optimal level of privacy for studies of interest?

Page 14: Sdal air health and social development (jan. 27, 2014) final

Can we table privacy for the duration of the workshop?

•  Deserves serious, devoted conversation
•  We should be leaders in this conversation
•  Will need to specifically address as projects develop

Page 15: Sdal air health and social development (jan. 27, 2014) final

Changing Landscape of Health Data

•  Electronic Health Records
•  Interoperability challenges
•  Public choices
   –  23andMe
   –  Google Health
   –  HealthVault

P. Breugel, Tower of Babel (1563)

Page 16: Sdal air health and social development (jan. 27, 2014) final

Personal Health Data

•  Today
   –  medical history
   –  lab results
   –  imaging results (X-ray, MRI)
   –  medication records
   –  allergies
   –  vaccination records
   –  demographic data
   –  billing information
•  Tomorrow
   –  genome sequence
   –  epigenome
   –  transcriptome
   –  proteome
   –  metabolome
   –  immunome
   –  microbiome
   –  survey data
   –  health monitor data

Page 17: Sdal air health and social development (jan. 27, 2014) final

Omics

"Omics" datasets are large, require sophisticated interpretation, and will have to be reinterpreted over time as knowledge and standard of care change

•  Tomorrow
   –  Genome sequence
   –  Epigenome
   –  Transcriptome
   –  Proteome
   –  Metabolome
   –  Immunome
   –  Microbiome
   –  Survey data
   –  Health monitor data

Page 18: Sdal air health and social development (jan. 27, 2014) final

Self Reported Data

These self-reported data will vary widely in quality and utility for research, but will be an important source of phenotype information

•  Tomorrow
   –  genome sequence
   –  epigenome
   –  transcriptome
   –  proteome
   –  metabolome
   –  immunome
   –  microbiome
   –  survey data
   –  health monitor data

Page 19: Sdal air health and social development (jan. 27, 2014) final

Tomorrow is Today

•  Infrastructure is being created to enable large longitudinal studies that combine:
   –  Comprehensive electronic health records
   –  Behavioral and environmental factors (survey information)
   –  Genetic information (partial or complete genome sequence)

NIH - Electronic Medical Records and Genomics Network
Wellcome Trust - UK Biobank
Vanderbilt University - BioVU
Kaiser Permanente – Research Genes, Enviro., & Health
Veterans Administration - Million Veteran Program

Page 20: Sdal air health and social development (jan. 27, 2014) final

Tomorrow is Today

•  Began collecting DNA in 2007; now has 167,250 samples
•  Opt-out program; relatively few patients opt out
•  Samples are matched with deidentified EHRs
•  Use is restricted to Vanderbilt researchers

NIH - Electronic Medical Records and Genomics Network
Wellcome Trust - UK Biobank
Vanderbilt University - BioVU
Kaiser Permanente – Research Genes, Enviro., & Health
Veterans Administration - Million Veteran Program

Page 21: Sdal air health and social development (jan. 27, 2014) final

Additional Characteristics that Make the Data Big

•  Multi-sourced
•  Observational
•  Noisy
•  Multi-purposed

Page 22: Sdal air health and social development (jan. 27, 2014) final

Multi-Sourced Data

Health and social development occurs within context
•  Individual and family history and experiences
•  Environment
•  Access to care, programs, and facilities
•  Local, state, and national health and welfare systems
•  Political and economic factors

Information communication technology opens opportunity to capture metadata and provenance of the information

Challenge: integration and interpretation of data captured under such varied circumstances

Page 23: Sdal air health and social development (jan. 27, 2014) final

Observational Data

•  Can come from every stakeholder, source, or technology that interacts with the patient, care giver, or facility
•  Little discrimination on what is captured
   –  Internet medical surveys, on-line disease tracking, prevention activities, attitudes on blogs, etc.
•  On-demand data from multiple systems
   –  Social networks, education records, work history, medical records, extramural activities, etc.

Presents opportunity to study the health and development processes as they naturally occur

Challenge: manage biases, data quality, and data linkage

Page 24: Sdal air health and social development (jan. 27, 2014) final

Meanwhile, if the quantity of information is increasing by 2.5 quintillion bytes per day, the amount of useful information almost certainly isn’t. Most of it is just noise, and the noise is increasing faster than the signal. Nate Silver, 2013

Challenge: uncertainty quantification

Noisy data
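One common way to approach the uncertainty-quantification challenge named on this slide is the bootstrap, which resamples the observed data to estimate how much a summary statistic could vary. The sketch below is a minimal illustration under stated assumptions: the readings are simulated (a hypothetical true value of 10 plus Gaussian noise), and the function name and defaults are invented for this example rather than taken from the workshop.

```python
import random
import statistics


def bootstrap_mean_interval(sample, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `sample`."""
    rng = random.Random(seed)
    n = len(sample)

    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )

    # Read the interval endpoints off the sorted resample means.
    tail = (1.0 - level) / 2.0
    low = means[round(tail * n_resamples)]
    high = means[round((1.0 - tail) * n_resamples) - 1]
    return low, high


# Hypothetical noisy measurements: signal of 10.0 plus Gaussian noise.
noise_rng = random.Random(42)
readings = [10.0 + noise_rng.gauss(0.0, 3.0) for _ in range(200)]

low, high = bootstrap_mean_interval(readings)
print(f"sample mean = {statistics.fmean(readings):.2f}")
print(f"95% bootstrap interval = ({low:.2f}, {high:.2f})")
```

The interval describes sampling noise only. It says nothing about selection or measurement bias, which need to be handled separately.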

Page 25: Sdal air health and social development (jan. 27, 2014) final

Multi-Purposed Data

•  Individual health and well being versus the population
•  Data reuse for multiple purposes
   –  Macro-level: regional, state, national, and international
   –  Meso-level: institution-wide
   –  Micro-level: individuals, cohorts, and groups

An opportunity to more fully use data

Challenge: What is optimal for an individual may not be optimal for the population and vice versa

Source: Buckingham Shum, S. (2012)

Page 26: Sdal air health and social development (jan. 27, 2014) final

Case Studies from VT Colleagues and Collaborators

•  Bureau of Economic Analysis Health Accounts
•  Out of Hospital Cardiac Arrest
•  EMBERS
•  Mild Cognitive Impairment
•  Synthetic Information

Page 27: Sdal air health and social development (jan. 27, 2014) final

Household Consumption Expenditures for Medical Care: An Alternate Presentation 

Ana Aizcorbe, Eli B. Liebman, David M. Cutler, and Allison B. Rosen

•  Health care predicted to reach 20% of GDP by 2020
•  Health care expenditures increased ~29% (2002-2006)
•  Developing a satellite account on medical care spending
•  Data include public and private sources

Survey of Current Business June 2012:34-47

http://www.bea.gov/scb/pdf/2012/06%20June/0612_healthcare.pdf

Page 28: Sdal air health and social development (jan. 27, 2014) final

Growth in spending varies by disease

Growth in Medical Care Spending, 2002–2006

Condition                         Percent
Endocrine                            70.2
Blood                                68.9
Complications of pregnancy           68.9
Residual codes and unclassified      42.5
Musculoskeletal system               38.6
Injury and poisoning                 34.2
Genitourinary system                 30.5
Digestive system                     28.2
Circulatory system                   25.6
Nervous system                       25.3
Neoplasms                            24.0
Mental illness                       16.7
Respiratory system                   14.8
Skin                                  5.8
Symptoms and ill-defined              2.4
Congenital anomalies                 -8.3
Infectious and parasitic             -8.7
Certain perinatal conditions        -28.1

Page 29: Sdal air health and social development (jan. 27, 2014) final

Copyright © American Heart Association

A Case-Crossover Analysis of Out-of-Hospital Cardiac Arrest and Air Pollution

Katherine B. Ensor, Loren H. Raun, and David Persse

•  Houston 2004-2011
•  Integration of hourly ambient air pollution data with EMS locations

Circulation Volume 127(11):1192-1199

March 19, 2013

Page 30: Sdal air health and social development (jan. 27, 2014) final

Copyright © American Heart Association

Locations of OHCA events between 2004 and 2011 in Houston, Texas

Page 31: Sdal air health and social development (jan. 27, 2014) final

Forest plot of the relative risk of OHCA associated with an interquartile range increase in the average of 1- to 3-hour lagged ozone and 1- to 2-day lagged PM2.5, by age, ethnicity, sex, and season.

Copyright © American Heart Association
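A relative risk "per interquartile range increase" is typically obtained by rescaling a per-unit effect estimate to the width of the exposure's interquartile range (IQR). The sketch below shows that conversion, assuming a log-linear model in which the coefficient is the log relative risk per unit of exposure, so RR = exp(beta × IQR). The ozone readings and the coefficient are hypothetical values chosen for illustration; they are not results from the study, and this is not presented as the authors' code.

```python
import math
import statistics


def interquartile_range(values):
    """Width of the interquartile range (Q3 - Q1) of the exposure values."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def relative_risk_per_iqr(beta_per_unit, iqr_width):
    """Relative risk for an IQR-sized exposure increase under a log-linear model."""
    return math.exp(beta_per_unit * iqr_width)


# Hypothetical hourly ozone readings (ppb) and a hypothetical coefficient.
ozone_ppb = [18, 22, 25, 27, 30, 33, 36, 41, 45]
beta = 0.002  # assumed log relative risk per 1 ppb; not taken from the paper

width = interquartile_range(ozone_ppb)
rr = relative_risk_per_iqr(beta, width)
print(f"IQR = {width:.1f} ppb, relative risk per IQR increase = {rr:.3f}")
```

Reporting effects per IQR puts pollutants measured on different scales, such as ozone and PM2.5, on a comparable footing in a single forest plot.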

Page 32: Sdal air health and social development (jan. 27, 2014) final

Open Source Indicators for Forecasting ILI Case Counts and Rare Disease Outbreaks

Naren Ramakrishnan (PI) – involves large multi-institutional team

•  EMBERS: Early Model-based Event Recognition using Surrogates
•  Fully automated processing of data and delivery of warnings

Source: https://www.cs.vt.edu/node/6565

Page 33: Sdal air health and social development (jan. 27, 2014) final

EMBERS Prediction Pipeline

[Figure: prediction pipeline drawing on Google Flu Trends, Google Search Trends, Healthmap, Weather, Twitter, OpenTable, and Parking Lot Imagery]

Page 34: Sdal air health and social development (jan. 27, 2014) final

EMBERS Dashboard: Fusing Data and Models

Page 35: Sdal air health and social development (jan. 27, 2014) final

Family Triad Perceptions of Mild Cognitive Impairment (MCI)

Karen A. Roberto, Rosemary Blieszner and Tina Savla

•  Age-related decline in memory and executive functioning
•  10-20% of individuals aged 65+ have MCI
•  Data Sources
   –  Memory clinics, churches, senior housing
   –  Family-level data: Elder with MCI age 60+, Primary care partner, Secondary care partner

Journal of Gerontology: Social Sciences 2011(6): 756-768

Page 36: Sdal air health and social development (jan. 27, 2014) final

Brain Functioning

[Figure: brain diagram highlighting executive function. Region labels: reasoning, planning, speech, movement, emotions, problem-solving; vision; perception of touch, pressure, temperature, pain; perception and recognition of auditory stimuli, memory]

Page 37: Sdal air health and social development (jan. 27, 2014) final

Benefits of Multiple Informants

Families

Complete Acknowledgement

Partial Acknowledgement

No Acknowledgement

Passive Acknowledgement

Page 38: Sdal air health and social development (jan. 27, 2014) final

Synthetic Information – Disease (Pandemic) Evolution

Stephen Eubank, Bryan Lewis, and many others

•  Age-related decline in memory and executive functioning
•  10-20% of individuals aged 65+ have MCI
•  Data Sources
   –  Memory clinics, churches, senior housing
   –  Family-level data: Elder with MCI age 60+, Primary care partner, Secondary care partner

Source: Roberto, Blieszner, McCann, & McPherson 2011

FIX

http://supercomputing.vbi.vt.edu/

Page 39: Sdal air health and social development (jan. 27, 2014) final

Structured and Unstructured Data Sources

and transforms them…

Overview

Page 40: Sdal air health and social development (jan. 27, 2014) final

Structured and Unstructured Data Sources …into

Synthetic Information

Page 41: Sdal air health and social development (jan. 27, 2014) final

creates and enables

Synthetic Platform

Page 42: Sdal air health and social development (jan. 27, 2014) final

Interactive visualization - Virginia

Page 43: Sdal air health and social development (jan. 27, 2014) final

Goals for the Workshop

•  Imagine a different world – case studies are examples
•  Look for synergistic capabilities to build partnerships
•  Assess opportunities to integrate multiple sources of data and approaches to comprehensively understand health and social development issues
•  Propose prototype projects to work on together to set the stage for future projects