Personal Health Trains – From Vision to Implementation
Lukas Zimmermann, Jörg Peter, Mete Akgün, Oliver Kohlbacher
University of Tübingen & University Hospital Tübingen
KohlbacherLab.org - @okohlbacher

PHT IN 2019-02-12 - go-fair.org



Medical Informatics Initiative (MII)

• Medical Informatics Initiative of the German Federal Ministry of Education and Research (BMBF) funds four consortia with ~30 M€ each

• DIFUTURE (coordinator: Klaus Kuhn – TU München, LMU München, Tübingen, Augsburg)

• HiGHmed (coordinator: Roland Eils – Heidelberg, Göttingen, Hannover)

• MIRACUM (coordinator: Hans-Ulrich Prokosch – Erlangen, Freiburg, Heidelberg, Mainz, Marburg, Frankfurt, Gießen)

• SMITH (coordinator: Markus Löffler – Leipzig, Aachen, Jena)

Medical Informatics Initiative Germany

[medizininformatik-initiative.de 2018]

Right information to the right person at the right time

Nation-wide sharing of healthcare and research data

Data Pooling vs. Distributed Computing

Data Pooling (pooling, then analytics)
• Requires transfer of all data to a centralized data pool
• Very little infrastructure required

Distributed Computing (analysis, then integration)
• Analysis executed locally – data remains local
• Only aggregated results returned and integrated
• Requires complex infrastructure & modified analytics

Bring the Algorithms to the Data
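The distributed variant can be illustrated with a minimal sketch, assuming each site reduces its records to a count and a sum before anything leaves the site. The function names `local_summary` and `integrate` and the example values are hypothetical and are not part of the PHT prototype.

```python
from typing import Iterable, List, Tuple


def local_summary(values: Iterable[float]) -> Tuple[int, float]:
    """Runs at a site: reduce local records to (count, sum)."""
    vals = list(values)
    return len(vals), float(sum(vals))


def integrate(summaries: List[Tuple[int, float]]) -> float:
    """Runs centrally: combine per-site aggregates into a global mean."""
    total_n = sum(n for n, _ in summaries)
    total_sum = sum(s for _, s in summaries)
    if total_n == 0:
        raise ValueError("no records at any site")
    return total_sum / total_n


# Hypothetical per-site data; the raw values never leave the site.
site_a = [70.0, 80.0]
site_b = [65.0, 75.0, 85.0]
summaries = [local_summary(site_a), local_summary(site_b)]
global_mean = integrate(summaries)
```

Only the two `(count, sum)` pairs cross site boundaries, which is the "aggregated results returned and integrated" step in its simplest form.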

PHT – Concepts

User: Are there any patients aged 65+ who use beta-blockers in combination with verapamil?

1: Pick train

2: Put train on rails

[Figure labels: Train Yard; Participating Stations; Private Data Repositories]

From Data Lake to Data Warehouse

Data Lake
● Collect all data from various sources
● Schema-on-read

[Figure labels: Data Lake; Warehouse creation on-demand; Data Warehouses; Data Access; Data Consumers]
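Schema-on-read means that records are stored in the lake as delivered and a schema is applied only when a warehouse is created from them. The following is a minimal sketch under that assumption; the record layout, the field names, and `read_with_schema` are hypothetical.

```python
import json
from typing import Any, Dict, List

# Hypothetical raw records, stored exactly as delivered by the source systems.
raw_lake = [
    '{"patient": "p1", "age": "67", "drug": "verapamil"}',
    '{"patient": "p2", "age": 54}',
]


def read_with_schema(lines: List[str], schema: Dict[str, type]) -> List[Dict[str, Any]]:
    """Apply a schema while reading: cast known fields, fill missing ones with None."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field_name, cast in schema.items():
            value = record.get(field_name)
            row[field_name] = cast(value) if value is not None else None
        rows.append(row)
    return rows


warehouse_rows = read_with_schema(raw_lake, {"patient": str, "age": int, "drug": str})
```

The lake accepts both records unchanged; typing and missing-value handling happen only in the read step that builds the warehouse.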

Data Integration Centers (DICs)

[Figure: Data Integration Center. Labels: Clinical & Research Sources; Raw Data; Images; Metadata; Incremental ETL; Data Lake; openBIS (Omics); Variant Store; Export Data Slices; Project Data Warehouse (pDWH), i2b2-tranSMART; cBioPortal-Loader]

Personal Health Train - Key Ideas

• Train = Containerized Algorithms (Algorithm, Query, Model)
• Station: Maintains private data repositories
• Metadata: Describe Trains and Stations for interoperability

PHT Prototype – Implementation Overview

At the Platform

Resources:
    I2B2 = http://resource-host:8080/transmart

    docker run -e I2B2=http://resource-host:8080/transmart \
        --network station-network \
        train_demonstrator run_algorithm

[Figure labels: Platform Network; Docker Network; Running Container for train; Data service; Algorithm execution initiated]
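A station could assemble this invocation programmatically. The sketch below only builds the argument list from the slide's command; the helper name `build_train_command` is hypothetical, and nothing here is taken from the prototype's actual code.

```python
import shlex
from typing import List


def build_train_command(image: str, network: str, resource_url: str,
                        entrypoint_arg: str = "run_algorithm") -> List[str]:
    """Assemble the `docker run` argument list for one train execution."""
    return [
        "docker", "run",
        "-e", f"I2B2={resource_url}",  # data resource passed via environment
        "--network", network,          # attach the train to the station network
        image, entrypoint_arg,
    ]


cmd = build_train_command(
    image="train_demonstrator",
    network="station-network",
    resource_url="http://resource-host:8080/transmart",
)
printable = shlex.join(cmd)
# A station could then execute it with subprocess.run(cmd, check=True, timeout=...).
```

Passing the command as a list rather than a shell string avoids quoting problems when resource URLs or image names come from project metadata.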

Implementing a Train

1: Retrieve Data
    import requests
    # Query the data resource announced by the station
    response = requests.get('http://resource-host:8080/transmart/')
    data = response.text

2: Update Model
    import sklearn
    # load_from_filesystem / save_to_filesystem are pseudocode helpers
    model = load_from_filesystem('/path/to/model')
    model.partial_fit(data)
    model.save_to_filesystem('/path/to/model')

3: Declare Result of Algorithm Execution
    return RunAlgorithmResponse(
        success=True,
        message='Execution Details',
        export_files=['/path/to/model'])

[Figure labels: File export; Inspection by platform; Train Container is deleted and error reported]
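The three steps can be put together in a runnable sketch. This is an illustration only: `RunAlgorithmResponse` is re-declared here as a hypothetical dataclass, the data fetch is injected instead of calling the tranSMART URL, and the "model" is a simple running count and sum stored as JSON rather than a scikit-learn estimator.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunAlgorithmResponse:
    """Hypothetical stand-in for the response object shown on the slide."""
    success: bool
    message: str
    export_files: List[str] = field(default_factory=list)


def run_algorithm(fetch: Callable[[], List[float]], model_path: str) -> RunAlgorithmResponse:
    # 1: Retrieve data from the station's data resource (injected for testability).
    data = fetch()
    # 2: Update the model carried by the train: here a running count and sum.
    model = {"n": 0, "total": 0.0}
    if os.path.exists(model_path):
        with open(model_path) as fh:
            model = json.load(fh)
    model["n"] += len(data)
    model["total"] += sum(data)
    with open(model_path, "w") as fh:
        json.dump(model, fh)
    # 3: Declare the result and the files the platform may export.
    return RunAlgorithmResponse(True, f"processed {len(data)} records", [model_path])


# Simulate the same train visiting two stations in sequence.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "model.json")
first = run_algorithm(lambda: [1.0, 2.0], path)
second = run_algorithm(lambda: [3.0], path)
with open(path) as fh:
    final_model = json.load(fh)
```

The model file is the only state that travels with the train; each station adds its contribution and declares the file for export.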

Demonstrator Study with the MII

Based on the National Core Dataset of the Medical Informatics Initiative

Data available at each site, but cannot be pooled
→ Currently: set up DWH, e-mail analysis scripts, execute, e-mail aggregated results back

Clinical Use Cases

Multimorbidity
• Charlson Index

Rare Diseases
• Groups defined by ICD10
• Geo-visualization

Source: Experiences from the National Demonstrator Study within the German Medical Informatics Initiative; Ganslandt et al., submitted to AMIA Clinical Informatics Conference 2019

Project Workflow

Project Preparation
• Get IRB approval (if required)
• DWH creation
• Data Harmonization

Project Creation
• Create train image
• Create project w/ description
• Select suitable stations

Project Review
• Review incoming trains/projects
• Review train contents (optional)
• Approve/Reject

Project Execution
• Select train stations
• Start train
• Monitor train progress

Result Retrieval
• Retrieve output of the train
• Check intermediate results (optional)
• Extract model (optional)

Train Station UI – Project Creation


Train Station UI – Project Review


Local Train Execution

[Figure: Station with an Isolated Docker Network containing the Train Container, Data Resource, and Sentinel; numbered steps 1 to 5; Manifest]

Execution
1. Start sentinel service & data resource
2. Execute train in isolated network
3. Manifest: log train's network traffic
4. Station inspects manifest

Train Evaluation
• Delete container if
  1. Timeout or nonzero exit code
  2. Manifest suspicious
• Otherwise, push back to repository
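The evaluation rules can be written down as a small decision function. This is a sketch: the `TrainRun` structure and the definition of a "suspicious" manifest (contact with any host outside an allow-list) are assumptions made for illustration, since the slides do not specify the manifest format.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class TrainRun:
    exit_code: int
    timed_out: bool
    contacted_hosts: List[str]  # hosts seen in the manifest (hypothetical format)


def evaluate(run: TrainRun, allowed_hosts: Set[str]) -> str:
    """Return 'delete' or 'push' following the slide's evaluation rules."""
    if run.timed_out or run.exit_code != 0:
        return "delete"
    # Assumption: a manifest is 'suspicious' if the train contacted any
    # host other than the station's declared data resources.
    if any(host not in allowed_hosts for host in run.contacted_hosts):
        return "delete"
    return "push"


allowed = {"resource-host"}
ok = evaluate(TrainRun(0, False, ["resource-host"]), allowed)
crashed = evaluate(TrainRun(1, False, ["resource-host"]), allowed)
leaky = evaluate(TrainRun(0, False, ["resource-host", "example.org"]), allowed)
```

Only a run that exits cleanly, finishes in time, and stays within the allow-list is pushed back to the repository.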

Train Yard

• Train yard implemented as private Docker registry
• Trains are stateful
• Registry push/pull operations support versioning of trains
• Routing: visit all stations sequentially
  • Pull current state into station
  • Execute train and validate
  • Push back new state
  • Tags describe station
• Result stored in last valid train
• All intermediary results available as well

Implementation of Train Routing

Station A

Station B

Station C

• train_image.0

• train_image.1.A

• train_image.2.B

• train_image.3.C

Result/Model
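The tag scheme above can be reproduced with a short routing loop. This is a sketch under assumptions: `route_train` and the `execute` callback are hypothetical, registry pull and push are represented only by list operations, and skipping a failed station while the next one pulls the last valid state is one possible reading of "result stored in last valid train".

```python
from typing import Callable, List


def route_train(stations: List[str],
                execute: Callable[[str, str], bool],
                base: str = "train_image") -> List[str]:
    """Visit stations in order; return the tags of all valid train states."""
    tags = [f"{base}.0"]  # initial state pushed by the project creator
    for step, station in enumerate(stations, start=1):
        current = tags[-1]  # pull the latest valid state
        if execute(station, current):
            tags.append(f"{base}.{step}.{station}")  # push back new state
        # On failure nothing is pushed; the next station pulls the last valid state.
    return tags


# Hypothetical executor: every station succeeds.
tags = route_train(["A", "B", "C"], lambda station, tag: True)
result_tag = tags[-1]

# Hypothetical executor: station B fails validation.
tags_with_failure = route_train(["A", "B", "C"], lambda station, tag: station != "B")
```

Because every valid state keeps its own tag, intermediate results remain retrievable from the registry alongside the final one.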


Machine Learning – Prediction Accuracy

PHT Network
• 18 separate train stations spawned in the cloud
• Multinomial Naïve Bayes on sequence data (HIV-related)

Training Data
• Each station provides one positive and one negative example

Validation Data
• 30 data points (balanced)
• Assess accuracy of the final model

[Figure: prediction accuracy. Labels: Accuracy on pooled data; 93%]

Results
● Same prediction quality as on pooled data
● Fully automated analysis
● Startup & execution: ~ 2 s/station
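Multinomial Naive Bayes is fitted from class and feature counts, and counts are additive, so station-by-station updates give the same model as one fit on pooled data. The sketch below illustrates that property only; it is not the study's implementation, and the k-mer features, the `CountingNB` class, and the toy sequences are assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


class CountingNB:
    """Multinomial Naive Bayes over k-mers with additive count updates."""

    def __init__(self, k: int = 2, alpha: float = 1.0) -> None:
        self.k, self.alpha = k, alpha
        self.class_counts: Counter = Counter()
        self.feature_counts: Dict[int, Counter] = defaultdict(Counter)

    def _kmers(self, seq: str) -> List[str]:
        return [seq[i:i + self.k] for i in range(len(seq) - self.k + 1)]

    def partial_fit(self, examples: Iterable[Tuple[str, int]]) -> None:
        # Each station only adds its counts to the state carried by the train.
        for seq, label in examples:
            self.class_counts[label] += 1
            self.feature_counts[label].update(self._kmers(seq))

    def predict(self, seq: str) -> int:
        vocab = set().union(*self.feature_counts.values())
        total = sum(self.class_counts.values())
        best_label, best_score = -1, -math.inf
        for label, n in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(vocab)
            score = math.log(n / total)  # log prior
            for kmer in self._kmers(seq):
                score += math.log((counts[kmer] + self.alpha) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: one positive and one negative example per station.
station_data = [
    [("AAAA", 1), ("CCCC", 0)],
    [("AAAT", 1), ("CCCG", 0)],
    [("TAAA", 1), ("GCCC", 0)],
]

train_model = CountingNB()
for batch in station_data:  # the train visits the stations sequentially
    train_model.partial_fit(batch)

pooled_model = CountingNB()
pooled_model.partial_fit([ex for batch in station_data for ex in batch])

same_counts = (train_model.feature_counts == pooled_model.feature_counts
               and train_model.class_counts == pooled_model.class_counts)
prediction = train_model.predict("AAAC")
```

With identical sufficient statistics, the sequentially trained model and the pooled model make identical predictions, which is consistent with the reported result.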

Current Status

Initial Prototype
• Not (yet) intended for production use, fully open source
• Proof-of-concept for
  • Container technologies
  • Train orchestration
  • API development
  • Scalability and performance test
  • Testbed for ML algorithm development
  • GUI specification development

Limitations
• Strong assumptions on data model
• No automated data discovery (yet)
• Security mechanisms missing
• Reviewing trains?
• Governance structure?

Outlook

Data Discovery, Integration & Analysis
• Standards, mechanisms, and APIs for data discovery and mapping
• Which machine learning techniques are suitable?
• How to orchestrate data discovery process?
• How to make analyses reproducible?

Consent
• How do we reconcile train purpose and patient consent?
• Dynamic Consent?
• Consent encoding and inference? – Smart contracts

Privacy
• How to assess (and minimize) the re-identification risk for patients?
• Privacy-preserving machine learning methods

Security
• How to authenticate the train images?
• How could an audit/review process be implemented?

Acknowledgements

Fabian Rabe, Bernhard Bauer
Fabian Prasser, Klaus A. Kuhn
Daniel Schalk, Heidi Seibold, Ulrich Mansmann
Oya Beyan, Md. Rezaul Karim, Lars Gleim, Nils Lukas, Stefan Decker
Matthias Löbe, Frank Meineke, Toralf Kirsten
Johan van Soest
Kees Burger, Luiz Olavo Bonino, Mark Thompson
Rob Hooft
René Hietkamp, Marc Nieuwland, Mark D. Wilkinson
Thomas Ganslandt, Martin Boeker
Lukas Zimmermann, Jörg Peter, Mete Akgün, Aydin Can Polatkan, Armin Roth, Holger Stenzhorn, Nico Pfeifer

DIFUTURE.de · KohlbacherLab.org · @okohlbacher

Blockchain Based Distributed Trust

• Consortium (federated) blockchain

• No central authority

• All stations (or pre-selected stations) validate the docker image for the consensus.

• All stations have their own private docker registry (off-chain storage).

• The ledger is a list of Docker image hashes that are securely linked together: shared and trusted, open for anyone to inspect, but controlled by no single entity.
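The "securely linked" list of image hashes can be sketched as a hash chain. This shows only the linking and tamper detection, not consensus among stations; `make_block`, `verify`, the genesis value, and the stand-in digests are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def make_block(image_digest: str, prev_hash: str) -> Dict[str, str]:
    """Create a ledger entry whose hash covers the digest and the previous hash."""
    payload = json.dumps({"image": image_digest, "prev": prev_hash}, sort_keys=True)
    return {"image": image_digest, "prev": prev_hash,
            "hash": hashlib.sha256(payload.encode()).hexdigest()}


def verify(chain: List[Dict[str, str]]) -> bool:
    """Recompute every hash and check that each block points at its predecessor."""
    prev = "0" * 64  # hypothetical genesis value
    for block in chain:
        if block["prev"] != prev or make_block(block["image"], prev)["hash"] != block["hash"]:
            return False
        prev = block["hash"]
    return True


chain: List[Dict[str, str]] = []
prev = "0" * 64
for content in (b"train_image.0", b"train_image.1.A"):
    digest = hashlib.sha256(content).hexdigest()  # stands in for a Docker image digest
    block = make_block(digest, prev)
    chain.append(block)
    prev = block["hash"]

valid_before = verify(chain)
chain[0]["image"] = hashlib.sha256(b"tampered").hexdigest()  # simulate tampering
valid_after = verify(chain)
```

Changing any recorded image digest invalidates that block's hash, so every station holding a copy of the ledger can detect the modification independently.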

Publishing a New Docker Image

Train Routing