
ICT in the IT Future of Medicine Project

Babette Regierer, Daniel Jameson

ITFoM Consortium

170 Partners from 34 countries:

21 EC Member States

Associated countries: Switzerland, Iceland, Israel, Croatia, Turkey, Norway

Other countries: Australia, New Zealand, Canada, USA, Lebanon, Korea, Japan

[Map: number of partners per country]

ITFoM - Project vision

• Assimilation of data about individuals ('omics data, health records).

• Incorporate these data into mathematical models of each individual’s “health”.

• Use these models to make predictions about the health of individuals and, if necessary, courses of treatment best suited to them.

The Virtual Patient: Integration of various models

[Diagram: models to be integrated span molecules, tissues and anatomy, together with statistical models]

Structure of ITFoM

Medical Platform (Kurt Zatloukal)

Analytical Technologies (Hans Lehrach)

Infrastructure: Hardware and Software (Nora Benhabiles/Oskar Mencer)

Data Pipelines (Ewan Birney)

Computational Methodologies (Mark Girolami)

ICT Integration (Hans Westerhoff)

Coordination and Management (Hans Lehrach/Markus Pasterk)

Challenges for ICT

Four phases: Acquisition, Integration, Processing, Utilisation

Cross-cutting concerns: Automation, Scalability, Security

Scale

• 12 million new cancer cases worldwide per year.
– To address them all, you would need to sequence and analyse roughly one cancer every 2–3 seconds, and each case requires at least two complete sequences: one for the tumour and one for matched normal (germline) tissue. A back-of-the-envelope calculation follows below.
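A quick calculation makes the point concrete. The per-genome data volume below is an illustrative assumption, not an ITFoM figure:

```python
# Back-of-the-envelope throughput needed to sequence every new cancer case.
# Figures from the slide: 12 million new cases per year, at least two whole
# genomes per case (tumour + matched normal).

CASES_PER_YEAR = 12_000_000
GENOMES_PER_CASE = 2                      # tumour + matched normal
SECONDS_PER_YEAR = 365 * 24 * 3600        # ~31.5 million seconds

seconds_per_case = SECONDS_PER_YEAR / CASES_PER_YEAR
seconds_per_genome = seconds_per_case / GENOMES_PER_CASE

print(f"one case every {seconds_per_case:.1f} s")      # ~2.6 s
print(f"one genome every {seconds_per_genome:.1f} s")  # ~1.3 s

# Assumed raw-data volume per whole genome (illustrative): ~100 GB
GB_PER_GENOME = 100
pb_per_year = CASES_PER_YEAR * GENOMES_PER_CASE * GB_PER_GENOME / 1e6
print(f"~{pb_per_year:.0f} PB of raw data per year")   # ~2400 PB
```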

Scalability

• All technology must be developed with an eye on scalability:
– What is appropriate now is guaranteed not to be appropriate in 10 years.
– All data formats, standards and paradigms must be flexible and extensible (one common pattern is sketched below).
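One widely used pattern for extensible formats (not an ITFoM specification) is to version every record and tolerate unknown fields, so old software can still read records written by newer producers. A minimal sketch with a hypothetical record type:

```python
import json
from dataclasses import dataclass, field
from typing import Any

# Hypothetical patient-data record illustrating two extensibility patterns:
# an explicit schema version, and preservation of unrecognised fields.

@dataclass
class OmicsRecord:
    schema_version: int
    patient_id: str
    assay: str
    payload_uri: str                       # raw data stored elsewhere
    extensions: dict[str, Any] = field(default_factory=dict)

def parse_record(raw: str) -> OmicsRecord:
    doc = json.loads(raw)
    known = {"schema_version", "patient_id", "assay", "payload_uri"}
    return OmicsRecord(
        schema_version=doc["schema_version"],
        patient_id=doc["patient_id"],
        assay=doc["assay"],
        payload_uri=doc["payload_uri"],
        # Preserve, rather than reject, fields this version does not know about.
        extensions={k: v for k, v in doc.items() if k not in known},
    )

rec = parse_record('{"schema_version": 2, "patient_id": "p42", '
                   '"assay": "WGS", "payload_uri": "s3://bucket/p42.bam", '
                   '"tumour_fraction": 0.31}')
print(rec.extensions)  # {'tumour_fraction': 0.31}
```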

Security

• ITFoM is aware that a huge amount of the data involved in the proposal is sensitive:
– Proposal to develop a robust, federated security framework and policies.
– Mindful of the location of data objects: certain objects must remain within the EU (see the sketch below).
– Identity management to build on the experience of a variety of partners (EUDAT, UCL, EBI, IBM).
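The data-location constraint can be made mechanical. A minimal sketch, assuming each stored object carries a sensitivity label and each storage site declares its jurisdiction; all names and labels are illustrative, not part of the ITFoM proposal:

```python
from dataclasses import dataclass

EU_JURISDICTIONS = {"DE", "FR", "NL", "AT", "SE"}   # illustrative subset

@dataclass(frozen=True)
class StorageSite:
    name: str
    jurisdiction: str   # ISO country code

@dataclass(frozen=True)
class DataObject:
    object_id: str
    sensitivity: str    # "open" | "anonymised" | "personal"

def placement_allowed(obj: DataObject, site: StorageSite) -> bool:
    """Personal data must stay within the EU; less sensitive tiers may move."""
    if obj.sensitivity == "personal":
        return site.jurisdiction in EU_JURISDICTIONS
    return True

assert placement_allowed(DataObject("o1", "personal"), StorageSite("sara", "NL"))
assert not placement_allowed(DataObject("o1", "personal"), StorageSite("aws-us", "US"))
```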

Acquisition: Data gathered

• For data generation we need to consider:
– heterogeneous data produced (molecules, physiology, patient, society…?)
– various technologies for data generation
– different user groups (skilled vs. naïve)
– different data management systems
– different professional levels

Acquisition: ICT to facilitate

• Easy, user-oriented process from machine to knowledge:
– data analysis pipelines must be easy to handle and fast (e.g. dataflow computing)
– fast data transfer systems
– “online” data generation in the future?
– development of automated processes
– standards for data formats and processes
– suitable data management systems and data storage (local or distributed; security issues)
– new database structures needed to speed up data storage, transfer and use? (e.g. the HANA system)
– responsibility for data curation: where, when, how, who? (see the pipeline sketch below)
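A minimal sketch of a composable analysis pipeline in which provenance is recorded automatically, so the curation questions ("where, when, how, who?") can be answered later. The stage names and results are placeholders, not real tools:

```python
from datetime import datetime, timezone
from typing import Callable

Stage = Callable[[dict], dict]

def run_pipeline(sample: dict, stages: list[Stage]) -> dict:
    # Run each stage in order, appending a provenance entry per stage.
    sample.setdefault("provenance", [])
    for stage in stages:
        sample = stage(sample)
        sample["provenance"].append(
            {"stage": stage.__name__,
             "at": datetime.now(timezone.utc).isoformat()}
        )
    return sample

def align_reads(s: dict) -> dict:
    s["aligned"] = True                      # placeholder for a real aligner
    return s

def call_variants(s: dict) -> dict:
    s["variants"] = ["chr7:140453136A>T"]    # placeholder result
    return s

result = run_pipeline({"sample_id": "p42"}, [align_reads, call_variants])
print([p["stage"] for p in result["provenance"]])  # ['align_reads', 'call_variants']
```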

Integration: Pipelines to models

• Complete genomes provide the framework to pull all biological data together such that each piece says something about biology as a whole

• Biology is too complex for any organisation to have a monopoly of ideas or data

• The more organisations provide data or analysis separately, the harder it becomes for anyone to make use of the results

Integration: Pipelines to models

• The data being gathered must be marshalled into something useful

• Processing, storage, retrieval:
– It must be stored
– It must be annotated
– It must be auditable

Integration: ICT to facilitate

• Federated data warehouse with standardised interfaces
– Includes auditing services
– Must integrate with the security layer
• Processing pipelines feed into the warehouse
• Compute tasks handled on an HPC platform using already established middleware (EBI)
• Pipelines: several, drawing on existing databases for automation of annotation where possible
• Data-specific compression algorithms (a toy example follows below)
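Data-specific compression exploits structure that generic algorithms miss. For genomes, one well-known idea is to store only the differences against a reference sequence. A toy illustration of the principle, not any particular production codec (real tools such as CRAM are far more sophisticated):

```python
# Toy reference-based compression: a personal genome is stored as the list
# of positions where it differs from the reference, not as a full string.

def compress(reference: str, genome: str) -> list[tuple[int, str]]:
    assert len(reference) == len(genome)   # toy: no insertions/deletions
    return [(i, b) for i, (a, b) in enumerate(zip(reference, genome)) if a != b]

def decompress(reference: str, diffs: list[tuple[int, str]]) -> str:
    seq = list(reference)
    for pos, base in diffs:
        seq[pos] = base
    return "".join(seq)

ref    = "ACGTACGTACGT"
genome = "ACGTACCTACGA"
diffs = compress(ref, genome)
print(diffs)                               # [(6, 'C'), (11, 'A')]
assert decompress(ref, diffs) == genome
```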

Processing: Simulating models

• Variety of model types


Processing: ICT to facilitate

• New algorithms and techniques.
• HPC platforms.
• Protocols.
• New hardware.
• One size will not fit all, but all model types must communicate with each other (a minimal common interface is sketched below).
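A minimal sketch of a common interface that lets heterogeneous model types (ODE systems, statistical models, rule-based models, …) be composed, assuming each can advance its state and expose named outputs. The model and its parameters are illustrative only:

```python
from typing import Protocol

class Model(Protocol):
    def step(self, dt: float) -> None: ...
    def outputs(self) -> dict[str, float]: ...

class GlucoseODE:
    """Toy ODE model: exponential clearance of blood glucose."""
    def __init__(self, glucose: float, rate: float = 0.1) -> None:
        self.glucose, self.rate = glucose, rate
    def step(self, dt: float) -> None:
        self.glucose -= self.rate * self.glucose * dt   # forward Euler
    def outputs(self) -> dict[str, float]:
        return {"glucose": self.glucose}

def co_simulate(models: list[Model], dt: float, steps: int) -> dict[str, float]:
    # Advance all models in lockstep, then merge their named outputs.
    for _ in range(steps):
        for m in models:
            m.step(dt)
    merged: dict[str, float] = {}
    for m in models:
        merged.update(m.outputs())
    return merged

print(co_simulate([GlucoseODE(5.0)], dt=0.1, steps=100))  # glucose ≈ 5·e⁻¹
```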

Utilising: Making use of models

• Closing the loop


• We need to consider:
– different target groups
– easy access to the data/information needed
– making the models work in the field / at the bedside
– technology must be available at a low price (e.g. computing power must be cost-effective, i.e. green technology)

Utilising: ICT to facilitate

• The aim is an approach that is easy to handle, cost-efficient and runs on all systems:
– automated data analysis/modelling system
– an elaborated human-computer interface (visualisation)
– automated updating of the information (e.g. by text mining of publications; a sketch follows below)
– must be easy to plug in new systems
– legal issues
– instant results
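Automated literature updating can start very simply. A minimal sketch, assuming network access and using NCBI's public E-utilities esearch endpoint; the query term is illustrative:

```python
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def recent_pubmed_ids(term: str, days: int = 30, n: int = 20) -> list[str]:
    # Ask PubMed for articles matching `term` published in the last `days` days.
    params = urllib.parse.urlencode({
        "db": "pubmed", "term": term, "reldate": days,
        "datetype": "pdat", "retmax": n, "retmode": "json",
    })
    with urllib.request.urlopen(f"{EUTILS}?{params}") as resp:
        doc = json.load(resp)
    return doc["esearchresult"]["idlist"]

# New papers on a variant of interest could trigger re-annotation of models.
print(recent_pubmed_ids("BRAF V600E melanoma"))
```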

ICT Components for Genomic Medicine

[Diagram: six interacting ICT components serving the healthcare professional]

• Component 1: Human sequence data repositories
– EBI: repositories (petabytes of genome sequence data); Sanger: sequencing (1000 Genomes, UK10K)
– Reference genome sequence: ~3 gigabytes
• Component 2: Genotype and phenotype relationship capture
• Component 3: Additional clinical annotation
• Component 4: Individual query analysis
– Biomedical Informatics Institute (BII), SMEs etc.: cloud-based, secure services; a per-patient variant file feeds a decision-support system
• Component 5: Electronic Health Record
– eHR system (e.g. EMIS): ~10 MB variant file as attachment per record
– Adding genomics: up to 60 million variant files = 600 terabytes (see the calculation below)
• Component 6: Research on clinical data
– SHIP, GPRD, LSDBs, Research Capability Programme (RCP)

Data tiers: personal data, anonymised data, summary data (open data)
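The slide's storage arithmetic, reproduced as a check; the comparison against a full genome is an illustrative aside:

```python
# ~10 MB variant file attached to each of up to 60 million health records.

VARIANT_FILE_MB = 10
RECORDS = 60_000_000

total_tb = VARIANT_FILE_MB * RECORDS / 1_000_000   # MB -> TB (decimal units)
print(f"{total_tb:.0f} TB")                        # 600 TB

# For comparison: one reference genome is ~3 GB (3000 MB), so storing
# per-patient *variants* rather than whole genomes is far cheaper.
print(f"~{3_000 / VARIANT_FILE_MB:.0f}x smaller than a full genome per patient")
```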

Importance of Automation

• Mentioned frequently in ITFoM.
• Pipelining and utilising data on this scale is impossible if all steps are conducted manually.
• This includes processing, annotation, hypothesis generation and testing.
• Text mining, machine learning (an illustrative sketch follows below).
• No one has actually cracked this yet.
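One automatable step as an illustration: triaging variants for manual review with a simple classifier. The features and data here are synthetic and the library (scikit-learn) is an assumption; the point is the shape of the automation, not a real model:

```python
from sklearn.ensemble import RandomForestClassifier  # assumes scikit-learn

# Synthetic training data: [allele_frequency, conservation_score, in_coding_region]
X = [[0.40, 0.1, 0], [0.30, 0.2, 0], [0.001, 0.9, 1], [0.002, 0.8, 1]]
y = [0, 0, 1, 1]     # 0 = likely benign, 1 = flag for expert review

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([[0.0005, 0.95, 1]]))   # -> [1]: rare, conserved, coding
```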

Conclusions

• A virtual, or digital, patient has the potential to revolutionise healthcare, but it will rely completely on the creation of a broad, probably federated, IT infrastructure.

• An infrastructure such as this is non-trivial.
• Any project as ambitious as a virtual patient requires vastly more expertise than any one individual can hold, but all elements of the project must interact.

• Rigorous definition of data standards, interfaces and pipelines must be coupled with a broad view of the topology within which they play a part.