Personal Health Trains: From Vision to Implementation
Lukas Zimmermann, Jörg Peter, Mete Akgün, Oliver Kohlbacher
University of Tübingen & University Hospital Tübingen
KohlbacherLab.org - @okohlbacher
Medical Informatics Initiative (MII)
• The Medical Informatics Initiative of the German Federal Ministry of Education and Research (BMBF) funds four consortia with ~30 M€ each
• DIFUTURE (coordinator: Klaus Kuhn – TU München, LMU München, Tübingen, Augsburg)
• HiGHmed (coordinator: Roland Eils – Heidelberg, Göttingen, Hannover)
• MIRACUM (coordinator: Hans-Ulrich Prokosch – Erlangen, Freiburg, Heidelberg, Mainz, Marburg, Frankfurt, Gießen)
• SMITH (coordinator: Markus Löffler – Leipzig, Aachen, Jena)
Medical Informatics Initiative Germany
[medizininformatik-initiative.de 2018]
Right information to the right person at the right time
Nation-wide sharing of healthcare and research data
Data Pooling vs. Distributed Computing
Data pooling (pooling, then analytics):
• Requires transfer of all data to a centralized data pool
• Requires very little infrastructure
Distributed computing (local analysis, then integration):
• Analysis executed locally – data remains local
• Only aggregated results are returned and integrated
• Requires complex infrastructure & modified analytics
Bring the Algorithms to the Data
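To illustrate why returning only aggregates can be enough, here is a minimal Python sketch. It assumes each station holds a list of numeric values (hypothetical patient ages); each station returns only its (sum, count) pair, and integrating those pairs reproduces the mean computed on pooled data. This only works for decomposable statistics such as sums, counts, and means, which is why other analyses have to be modified.

```python
from typing import Dict, List, Tuple

def local_aggregate(values: List[float]) -> Tuple[float, int]:
    """Runs at a station: only the sum and the count leave the site."""
    return sum(values), len(values)

def integrate(aggregates: List[Tuple[float, int]]) -> float:
    """Runs centrally: combines per-station aggregates into a global mean."""
    total = sum(s for s, _ in aggregates)
    count = sum(n for _, n in aggregates)
    return total / count

# Hypothetical per-station values (e.g. patient ages); in the distributed
# setting these lists never leave their station.
stations: Dict[str, List[float]] = {
    "A": [71.0, 66.0, 80.0],
    "B": [58.0, 69.0],
    "C": [74.0, 62.0, 77.0, 68.0],
}

distributed_mean = integrate([local_aggregate(v) for v in stations.values()])

# For comparison only: the same statistic computed on pooled data
pooled = [x for values in stations.values() for x in values]
pooled_mean = sum(pooled) / len(pooled)
```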
PHT – Concepts
User: "Are there any patients aged 65+ who use beta-blockers in combination with verapamil?"
1: Pick train
2: Put train on rails
Components: Train Yard, Participating Stations, Private Data Repositories
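The example question can be expressed as a train that each station runs against its own records, returning only a count. This is a minimal Python sketch. The record layout (an age and a set of ATC medication codes) is a hypothetical assumption. It uses ATC group C07 (beta blocking agents) and C08DA01 (verapamil).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Patient:
    # Hypothetical record layout; a real station maps from its own schema
    age: int
    atc_codes: Set[str] = field(default_factory=set)

def uses_beta_blocker(codes: Set[str]) -> bool:
    # ATC group C07 = beta blocking agents
    return any(code.startswith("C07") for code in codes)

def query_train(records: List[Patient]) -> int:
    """Executed at each station; only the count leaves the site."""
    return sum(
        1
        for p in records
        if p.age >= 65 and uses_beta_blocker(p.atc_codes) and "C08DA01" in p.atc_codes
    )

def run_on_stations(stations: Dict[str, List[Patient]]) -> Dict[str, int]:
    # The train visits every participating station and collects per-station counts
    return {name: query_train(records) for name, records in stations.items()}

# Hypothetical station contents
stations = {
    "A": [Patient(70, {"C07AB02", "C08DA01"}), Patient(50, {"C07AB07", "C08DA01"})],
    "B": [Patient(81, {"C08DA01"}), Patient(66, {"C07AB07", "C08DA01", "A10BA02"})],
}
counts = run_on_stations(stations)
```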
From Data Lake to Data Warehouse
Data Lake → on-demand warehouse creation → Data Warehouses → data access by Data Consumers
● Collect all data from various sources
● Schema-on-read
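With schema-on-read, raw records are stored as they arrive, and a schema is applied only when a warehouse is built for a specific purpose. This is a minimal Python sketch. The two sources and their field names are hypothetical. The mapping is chosen per project, at read time.

```python
import json
from typing import Dict, List

# Raw records land in the lake unchanged (hypothetical sources and field names)
data_lake: List[str] = [
    json.dumps({"source": "his", "pid": "p1", "birth_year": 1950}),
    json.dumps({"source": "lab", "patient": "p1", "test": "HbA1c", "value": 6.9}),
    json.dumps({"source": "his", "pid": "p2", "birth_year": 1982}),
]

def read_with_schema(raw: List[str], schema: Dict[str, Dict[str, str]]) -> List[Dict]:
    """Apply a project-specific schema at read time; unmapped sources are skipped."""
    rows = []
    for line in raw:
        record = json.loads(line)
        mapping = schema.get(record["source"])
        if mapping is None:
            continue
        rows.append({target: record.get(src) for target, src in mapping.items()})
    return rows

# On-demand "warehouse" for a demographics project: only HIS records, renamed fields
demographics = read_with_schema(
    data_lake, {"his": {"patient_id": "pid", "birth_year": "birth_year"}}
)
```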
Data Integration Centers (DICs)
Figure: Data Integration Center with Clinical & Research Sources, Images, Omics (openBIS), and a Variant Store, fed via Incremental ETL into a Data Lake (raw data + metadata), from which data slices are exported into a Project Data Warehouse (pDWH: i2b2-tranSMART, cBioPortal loader).
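Incremental ETL loads only what has changed since the last run, instead of re-extracting everything. This is a minimal Python sketch using a timestamp watermark. The source rows, the `updated` field, and the in-memory lake are hypothetical stand-ins. A real pipeline would also need to handle deletions and late-arriving rows.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

def incremental_etl(source: List[Row], lake: Dict[str, Row], watermark: int) -> Tuple[int, int]:
    """Load rows changed after `watermark` into the lake; return (rows loaded, new watermark)."""
    changed = [row for row in source if row["updated"] > watermark]
    for row in changed:
        lake[row["id"]] = dict(row)  # upsert by primary key
    new_watermark = max((row["updated"] for row in changed), default=watermark)
    return len(changed), new_watermark

# First run: everything is new
source: List[Row] = [
    {"id": "r1", "updated": 1, "value": 10},
    {"id": "r2", "updated": 2, "value": 20},
]
lake: Dict[str, Row] = {}
loaded, watermark = incremental_etl(source, lake, watermark=0)

# Later: one row is corrected and one is added; only those two are reloaded
source[0] = {"id": "r1", "updated": 4, "value": 11}
source.append({"id": "r3", "updated": 3, "value": 30})
reloaded, watermark = incremental_etl(source, lake, watermark)
```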
Personal Health Train - Key Ideas
• Train: algorithm, query, model
• Station: maintains private data repositories
• Metadata: describes trains and stations for interoperability
Train = Containerized Algorithms
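Metadata lets the platform check whether a station can serve a train before dispatching it. This is a minimal Python sketch. The metadata fields (a data model name and sets of data elements) and the label "mii-core" are hypothetical and are not taken from a PHT metadata specification.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class TrainMetadata:
    name: str
    data_model: str
    required_elements: FrozenSet[str]

@dataclass(frozen=True)
class StationMetadata:
    name: str
    data_model: str
    available_elements: FrozenSet[str]

def compatible(train: TrainMetadata, station: StationMetadata) -> bool:
    """A station qualifies if it uses the same data model and offers every required element."""
    return (
        train.data_model == station.data_model
        and train.required_elements <= station.available_elements
    )

def select_stations(train: TrainMetadata, stations: List[StationMetadata]) -> List[str]:
    return [s.name for s in stations if compatible(train, s)]

# Hypothetical metadata
train = TrainMetadata("bb-verapamil", "mii-core", frozenset({"age", "medication"}))
stations = [
    StationMetadata("A", "mii-core", frozenset({"age", "medication", "diagnosis"})),
    StationMetadata("B", "mii-core", frozenset({"age"})),
    StationMetadata("C", "omop", frozenset({"age", "medication"})),
]
selected = select_stations(train, stations)
```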
At the Platform
Resources:
I2B2 = http://resource-host:8080/transmart
Algorithm execution initiated:
    docker run -e I2B2=http://resource-host:8080/transmart \
        --network station-network \
        train_demonstrator run_algorithm
Diagram: the platform's Docker network connects the running train container with the data service.
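Inside the container, the train learns where its data lives from the `I2B2` environment variable set with `-e` in the `docker run` command. This is a minimal Python sketch of such an entry point. The `run_algorithm` argument mirrors the command line, but the function names and the error handling are hypothetical. A container entry point would call `sys.exit(main(sys.argv))`.

```python
import os
import sys
from typing import List, Mapping, Optional

def resolve_data_service(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the data service URL injected via `docker run -e I2B2=...`."""
    source = os.environ if env is None else env
    url = source.get("I2B2", "").strip()
    if not url:
        # An uncaught exception gives a nonzero exit code, so the station discards the train
        raise RuntimeError("I2B2 environment variable is not set")
    return url

def main(argv: List[str]) -> int:
    """Hypothetical entry point for `train_demonstrator run_algorithm`."""
    if argv[1:] != ["run_algorithm"]:
        print("usage: train run_algorithm", file=sys.stderr)
        return 2
    url = resolve_data_service()
    print(f"Running algorithm against data service at {url}")
    return 0
```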
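Implementing a Train
1: Retrieve Data
    import requests

    response = requests.get('http://resource-host:8080/transmart/')
    response.raise_for_status()  # abort on HTTP errors
    data = response.text
2: Update Model
    import joblib  # persistence approach recommended for scikit-learn models

    model = joblib.load('/path/to/model')
    model.partial_fit(X, y)  # X, y: features/labels parsed from `data` (parsing omitted)
    joblib.dump(model, '/path/to/model')
3: Declare Result of Algorithm Execution
    # inside the train's run_algorithm handler; RunAlgorithmResponse is the train API's response type
    return RunAlgorithmResponse(
        success=True,
        message='Execution Details',
        export_files=['/path/to/model'])
Exported files are inspected by the platform; on failure, the train container is deleted and an error is reported.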
Clinical Use Cases
• Based on the National Core Dataset of the Medical Informatics Initiative
• Data available at each site, but cannot be pooled → currently: set up DWH, e-mail analysis scripts, execute, e-mail aggregated results back
Demonstrator Study with the MII
Multimorbidity
• Charlson Index
Rare Diseases
• Groups defined by ICD-10
• Geo-visualization
Source: Experiences from the National Demonstrator Study within the German Medical Informatics Initiative; Ganslandt et al., submitted to the AMIA Clinical Informatics Conference 2019
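The Charlson Index scores multimorbidity as a weighted sum over comorbidity categories. This is a minimal Python sketch using a subset of the original Charlson weights. The category keys are illustrative, and the mapping from ICD-10 codes to categories is omitted. It applies the common convention that a milder form is not counted when the severe form of the same condition is present. Age adjustment is not included.

```python
from typing import Set

# Subset of the original Charlson weights; keys are illustrative names
WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "mild_liver_disease": 1,
    "diabetes": 1,
    "hemiplegia": 2,
    "renal_disease": 2,  # moderate or severe
    "diabetes_end_organ_damage": 2,
    "any_tumor": 2,
    "moderate_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
    "aids": 6,
}

# Milder form -> severe form that supersedes it (common convention)
SUPERSEDED_BY = {
    "mild_liver_disease": "moderate_severe_liver_disease",
    "diabetes": "diabetes_end_organ_damage",
    "any_tumor": "metastatic_solid_tumor",
}

def charlson_index(conditions: Set[str]) -> int:
    """Weighted sum over present conditions, skipping superseded milder forms."""
    unknown = conditions - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    score = 0
    for condition in conditions:
        if SUPERSEDED_BY.get(condition) in conditions:
            continue
        score += WEIGHTS[condition]
    return score
```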
Project Workflow
Project Preparation
• Get IRB approval (if required)
• DWH creation
• Data harmonization
Project Creation
• Create train image
• Create project w/ description
• Select suitable stations
Project Review
• Review incoming trains/projects
• Review train contents (optional)
• Approve/Reject
Project Execution
• Select train stations
• Start train
• Monitor train progress
Result Retrieval
• Retrieve output of the train
• Check intermediate results (optional)
• Extract model (optional)
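The workflow can be read as a linear sequence of stages with a single decision point at review. This is a minimal Python sketch. The stage names follow the slide. The transition rule that a rejected project returns to creation is an assumption.

```python
from enum import Enum
from typing import Optional

class Stage(Enum):
    PREPARATION = 1       # IRB approval, DWH creation, data harmonization
    CREATION = 2          # train image, project description, station selection
    REVIEW = 3            # stations review and approve/reject
    EXECUTION = 4         # start and monitor the train
    RESULT_RETRIEVAL = 5  # retrieve output, intermediate results, model
    DONE = 6

def next_stage(stage: Stage, approved: Optional[bool] = None) -> Stage:
    """Advance the workflow; review is the only branching step."""
    if stage is Stage.DONE:
        raise ValueError("workflow already finished")
    if stage is Stage.REVIEW:
        if approved is None:
            raise ValueError("review requires an approve/reject decision")
        # Assumption: a rejected project goes back to creation for revision
        return Stage.EXECUTION if approved else Stage.CREATION
    return Stage(stage.value + 1)
```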
Local Train Execution
Diagram: station running the train container, the data resource, and a sentinel inside an isolated Docker network; the sentinel produces a manifest.
Execution
1. Start sentinel service & data resource
2. Execute train in isolated network
3. Manifest: log the train's network traffic
4. Station inspects manifest
Train Evaluation
• Delete container if
  1. Timeout or nonzero exit code
  2. Manifest suspicious
• Otherwise, push back to repository
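The station's decision rule from the evaluation step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The manifest is modeled as the list of hosts the sentinel saw the train contact. "Suspicious" is taken to mean any traffic to a host outside an allow-list. The timeout value and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ExecutionReport:
    exit_code: int
    runtime_s: float
    contacted_hosts: List[str]  # manifest recorded by the sentinel (hypothetical format)

def manifest_suspicious(report: ExecutionReport, allowed_hosts: List[str]) -> bool:
    # Hypothetical policy: any host outside the allow-list is suspicious
    return any(host not in allowed_hosts for host in report.contacted_hosts)

def evaluate_train(report: ExecutionReport, allowed_hosts: List[str], timeout_s: float = 600.0) -> str:
    """Return 'delete' or 'push' following the evaluation rules."""
    if report.runtime_s > timeout_s or report.exit_code != 0:
        return "delete"
    if manifest_suspicious(report, allowed_hosts):
        return "delete"
    return "push"
```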
Train Yard
• Train yard implemented as a private Docker registry
• Trains are stateful
• Registry push/pull operations support versioning of trains
• Routing: visit all stations sequentially
  • Pull current state into station
  • Execute train and validate
  • Push back new state
  • Tags describe station
• Result stored in last valid train
• All intermediary results available as well
Implementation of Train Routing
Stations A → B → C
• train_image.0 (initial train)
• train_image.1.A (after Station A)
• train_image.2.B (after Station B)
• train_image.3.C (after Station C): Result/Model
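The tagging scheme can be reproduced with a small loop. At each station, the current state is pulled, executed, and pushed back under a tag that combines a step counter with the station name. This is a minimal Python sketch. The registry is simulated by a dict, and `execute` is a hypothetical stand-in for running and validating the container. Assumption: a station that fails pushes nothing, so the result stays in the last valid train.

```python
from typing import Callable, Dict, List

def route_train(
    stations: List[str],
    initial_state: str,
    execute: Callable[[str, str], str],
    image: str = "train_image",
) -> Dict[str, str]:
    """Visit stations sequentially; return the simulated registry (tag -> state)."""
    last_tag = f"{image}.0"
    registry: Dict[str, str] = {last_tag: initial_state}
    step = 0
    for station in stations:
        state = registry[last_tag]  # pull current state into the station
        try:
            new_state = execute(station, state)  # execute train and validate
        except RuntimeError:
            continue  # rejected at this station: no new version is pushed
        step += 1
        last_tag = f"{image}.{step}.{station}"
        registry[last_tag] = new_state  # push back new state; tag describes station
    return registry

def append_name(station: str, state: str) -> str:
    # Toy "algorithm": records which stations it has visited
    return state + station

registry = route_train(["A", "B", "C"], "", append_name)
```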
Machine Learning – Prediction Accuracy
PHT Network
• 18 separate train stations spawned in the cloud
• Multinomial Naïve Bayes on sequence data (HIV-related)
Training Data
• Each station provides one positive and one negative example
Validation Data
• 30 data points (balanced)
• Assess accuracy of the final model
Figure: prediction accuracy vs. accuracy on pooled data (93%)
Results
● Same prediction quality as on pooled data
● Fully automated analysis
● Startup & execution: ~2 s/station
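A sequential train can match pooled training for Multinomial Naïve Bayes because the model is fully determined by per-class counts, and these counts simply add up across stations. scikit-learn's `MultinomialNB.partial_fit` accumulates the same kind of counts. This is a minimal pure-Python sketch with Laplace smoothing. It assumes sequences are featurized as k-mer counts. The sequences are made-up toy strings, not the HIV data from the study.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

K = 2  # k-mer length (assumption)

def kmers(seq: str, k: int = K) -> Counter:
    return Counter(seq[i : i + k] for i in range(len(seq) - k + 1))

class CountNB:
    """Multinomial Naive Bayes stored as raw counts, so updates are additive."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts: Counter = Counter()
        self.feature_counts: Dict[str, Counter] = {}

    def partial_fit(self, data: List[Tuple[str, str]]) -> "CountNB":
        for seq, label in data:
            self.class_counts[label] += 1
            self.feature_counts.setdefault(label, Counter()).update(kmers(seq))
        return self

    def predict(self, seq: str) -> Optional[str]:
        vocab = set().union(*self.feature_counts.values())
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.feature_counts.items():
            denom = sum(counts.values()) + self.alpha * len(vocab)
            score = math.log(self.class_counts[label] / total)
            for feat, n in kmers(seq).items():
                if feat in vocab:  # features never seen in training are ignored
                    score += n * math.log((counts[feat] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy data: each station holds one positive and one negative example
stations = [
    [("ACGTAC", "pos"), ("TTTTGG", "neg")],
    [("ACGTTT", "pos"), ("GGTTTT", "neg")],
    [("CGTACG", "pos"), ("TTGGTT", "neg")],
]

sequential = CountNB()
for batch in stations:  # the train visits the stations one by one
    sequential.partial_fit(batch)

pooled = CountNB().partial_fit([ex for batch in stations for ex in batch])
```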
Current Status
Initial Prototype
• Not (yet) intended for production use, fully open source
• Proof-of-concept for
  • Container technologies
  • Train orchestration
  • API development
  • Scalability and performance tests
  • Testbed for ML algorithm development
  • GUI specification development
Limitations
• Strong assumptions on data model
• No automated data discovery (yet)
• Security mechanisms missing
• Reviewing trains?
• Governance structure?
Outlook
Data Discovery, Integration & Analysis
• Standards, mechanisms, and APIs for data discovery and mapping
• Which machine learning techniques are suitable?
• How to orchestrate the data discovery process?
• How to make analyses reproducible?
Consent
• How do we reconcile train purpose and patient consent?
• Dynamic consent?
• Consent encoding and inference? – Smart contracts
Privacy
• How to assess (and minimize) the re-identification risk for patients?
• Privacy-preserving machine learning methods
Outlook
Security
• How to authenticate the train images?
• How could an audit/review process be implemented?
Acknowledgements
Fabian Rabe, Bernhard Bauer
Fabian Prasser, Klaus A. Kuhn
Daniel Schalk, Heidi Seibold, Ulrich Mansmann
Oya Beyan, Md. Rezaul Karim, Lars Gleim, Nils Lukas, Stefan Decker
Matthias Löbe, Frank Meineke, Toralf Kirsten
Johan van Soest
Kees Burger, Luiz Olavo Bonino, Mark Thompson
Rob Hooft
René Hietkamp, Marc Nieuwland, Mark D. Wilkinson
Thomas Ganslandt, Martin Boeker
Lukas Zimmermann, Jörg Peter, Mete Akgün, Aydin Can Polatkan, Armin Roth, Holger Stenzhorn, Nico Pfeifer
DIFUTURE.de · KohlbacherLab.org · @okohlbacher
Blockchain Based Distributed Trust
• Consortium (federated) blockchain
• No central authority
• All stations (or pre-selected stations) validate the Docker image to reach consensus.
• All stations have their own private Docker registry (off-chain storage).
• A list of Docker image hashes, securely linked together in a shared, trusted ledger that anyone can inspect but that no single entity controls.
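The idea of a list of image hashes that are securely linked together can be illustrated with a hash chain. Each entry stores an image digest plus the hash of the previous entry, so altering any earlier entry breaks every later link. This is a minimal Python sketch using only `hashlib`. It omits consensus among stations entirely. The digests are placeholders derived from tag names, not real Docker image digests.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64

def entry_hash(prev_hash: str, image_digest: str, station: str) -> str:
    payload = json.dumps(
        {"prev": prev_hash, "image": image_digest, "station": station}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(ledger: List[Dict[str, str]], image_digest: str, station: str) -> None:
    """Link a new image digest to the previous ledger entry."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "prev": prev,
        "image": image_digest,
        "station": station,
        "hash": entry_hash(prev, image_digest, station),
    })

def verify(ledger: List[Dict[str, str]]) -> bool:
    """Recompute every link; any tampering breaks the chain."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["image"], entry["station"]):
            return False
        prev = entry["hash"]
    return True

ledger: List[Dict[str, str]] = []
for tag, station in [("train_image.1.A", "A"), ("train_image.2.B", "B"), ("train_image.3.C", "C")]:
    # Placeholder digests derived from the tag names, not real image digests
    append(ledger, "sha256:" + hashlib.sha256(tag.encode()).hexdigest(), station)

is_valid = verify(ledger)
```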