The analyses upon which this publication is based were performed under Contract Number HHSM-500-2009-00046C sponsored by the Center for Medicare and Medicaid Services, Department of Health and Human Services.

Current issues and challenges in sharing biomedical human subjects data

OASIS 2014

Lucila Ohno-Machado, MD, PhD
Division of Biomedical Informatics
University of California San Diego

Personalized Healthcare

What is the influence of genetics, environment?

Which therapies work best for individual patients?

Person-Centered Outcomes Research

• Genome
  – Sequencing data
• Phenotype
  – Personal monitoring
    • Blood pressure, glucose
  – Personal health records
  – Behavior monitoring
    • Adherence to medication, exercise
• Environment
  – Air sensors, food quality
  – Location

Source: DOE

Where does knowledge come from?

• Controlled studies with strict eligibility criteria
• Does this apply to me?

Hopefully, but we need a lot of data to answer this question:

• We need to build infrastructure to access large data repositories
  – Lower the barriers to sharing data
• We need to share tools to analyze the data
  – Algorithms and computational facilities

Big Data, Medium Data, and Small Data

• Data integration across biological scales
• Data annotation and harmonization
• Data ‘anonymization’ and privacy preservation

Data for Personalized Medicine

Prevention, Diagnosis and Therapy
  – Genetic predisposition
  – Biomarkers
  – Pharmacogenomics
  – Health records
  – Sensors

Handling Protected Health Information
  – Secure Electronic Environment
    • Electronic Health Records
    • Genetic Data

Sharing Data

• Sharing data today
  – Data sharing plans required
    • Little incentive to actually share
  – One model: users download data
  – Yes/No decision on sharing
• Data use agreements across institutions
  – Pairwise, limited and complicated
  – Specific to a particular study
  – Resources for sharing are limited
  – Security/privacy constraints are hard for small institutions to follow

National Centers for Biomedical Computing and iDASH

Mission

“A national center for biomedical computing that develops new algorithms, open-source tools, computational infrastructure, and services that will enable biomedical and behavioral researchers nationwide to integrate Data for Analysis, ‘anonymization,’ and Sharing”

Vision

• Share access to data and computation
  – Allow healthcare providers to focus on care, biomedical researchers to focus on research
  – Provide software, platform, and infrastructure
  – Protect privacy
  – Share
    • Data
    • Workflows
    • Computation
    • Security
    • Policies

Models for Data Sharing

• Cloud Storage: data exported for computation elsewhere
  – Users download data from the cloud
• Cloud Compute and Virtualization: computation goes to the data
  – Users analyze data in the cloud
  – Users download virtual machines

Three Different Models for Data Sharing

1. Users download data
2. Users compute in a central facility
3. Users install software that operates on their data and transmits results of operations (e.g., queries, analyses)

Model 1: Users download data

• “De-identification” may be necessary
• Encrypted transmission
• Data Use Agreement Central
  Lawyers from the University of California helped write:
  – Data Contributor Agreement
    • Who can have access for what purpose
  – Data User Agreement
    • Terms of use
• iDASH serves as ‘agent’ for the data
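The “de-identification” step before download can be illustrated with a minimal sketch. The slides do not say how iDASH performs it; the field names, the list of direct identifiers, and the keyed-hash pseudonym below are all assumptions made for illustration, and dropping a handful of fields is not by itself sufficient to meet any regulatory de-identification standard.

```python
import hashlib
import hmac

# Hypothetical field names; a real schema and identifier list would differ.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email"}


def pseudonymize_id(record_id: str, secret_key: bytes) -> str:
    """Replace a record identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, record_id.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_identifiers(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and swap the patient ID for a pseudonym."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    cleaned["pseudo_id"] = pseudonymize_id(str(record["patient_id"]), secret_key)
    return cleaned


# Made-up example record, not real data.
record = {
    "patient_id": "12345",
    "name": "Jane Example",
    "phone": "555-0100",
    "systolic_bp": 128,
    "glucose_mg_dl": 104,
}
key = b"example-key-held-by-the-data-contributor"  # assumption: key never leaves the contributor
shared = strip_identifiers(record, key)
```

Because the pseudonym is a keyed hash, the same patient maps to the same `pseudo_id` across exports, while someone without the key cannot recompute it from a known patient ID.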

Model 2: Users compute in central facility

• Securing the privacy of human subjects data including biometrics such as genomes

• There are known security issues with commercial clouds (business associate liability agreement mitigates some risks)

• A protected cloud compute environment is capable of operating on genomes and clinical data

• We have built this cloud environment in iDASH

Infrastructure Security for Human Subjects Data

• HIPAA (Health Insurance Portability and Accountability Act) compliant computing environment

• Segmentation (Zones) of projects & functionality
• Physical and environmental protection of compute hardware
• Access control with Two Factor Authentication
• Secure (encrypted tunnel) system access and upload capability
• Centralized logging, intrusion detection
• Proxies and filters
• Hardened (secured) system configurations

Model 3: Computation goes to the data

• Some health systems cannot host data outside their facilities (e.g., VA)

• Software can be sent to those facilities in order to build an overall model (e.g., regression)
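One way to picture “computation goes to the data” is a simple linear regression fitted from per-site summaries. This is a sketch under stated assumptions, not the method used by iDASH: the function names `local_summary` and `combine` are hypothetical, the data are made up, and a single predictor is used so the arithmetic stays visible. Each site runs `local_summary` on its own records and sends back only five aggregate numbers; the coordinator never sees patient-level rows.

```python
def local_summary(xs, ys):
    """Runs inside one site: returns aggregate sums only, never patient rows."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def combine(summaries):
    """Runs at the coordinator: pools the sums and solves least squares."""
    n = sum(s["n"] for s in summaries)
    sx = sum(s["sx"] for s in summaries)
    sy = sum(s["sy"] for s in summaries)
    sxx = sum(s["sxx"] for s in summaries)
    sxy = sum(s["sxy"] for s in summaries)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Made-up data held at two sites; both follow y = 2x + 1.
site_a = ([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
site_b = ([4.0, 5.0], [9.0, 11.0])

slope, intercept = combine([local_summary(*site_a), local_summary(*site_b)])
```

For ordinary least squares the pooled sums give exactly the same fit as running the regression on the combined rows. Models such as logistic regression need several rounds of exchanging intermediate results instead of a single pass.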

University of California Research eXchange UC-ReX

1. UC Davis
2. UC Irvine
3. UC Los Angeles
4. UC San Diego
5. UC San Francisco

Funded by the UC Office of the President to the NIH-funded CTSAs

• Integration of Clinical Data Warehouses from 5 University of California Medical Centers and affiliated institutions (>10 million patients)
  – Aggregate and individual-level patient data will be accessible according to data use agreements and IRB approval
  – Distributed models to adjust for confounders
• Objectives
  – Monitor patient safety
  – Improve outcomes
  – Promote research

Acknowledgements

• Slides contributed by the iDASH team

• Division of Biomedical Informatics

• Funding by NIH, AHRQ, PCORI, UCOP, UCSD