1

Philips HealthCare Informatics

A Perspective on Big Data, Analytics and AI

John Huffman, CTO, Philips Healthcare Informatics

September 2016, Utrecht, NL



2

A Little Bit About My Background
35 years or so of AI, reasoning and knowledge integration

• Started at Thinking Machines when it started in the early 80s
– Worked with Danny Hillis and Brewster Kahle on the Connection Machine
• MCC (US Fifth Generation Project)
– Worked with Doug Lenat on AI and CYC (comprehensive common-sense knowledge and reasoning project)
– Liaison to NLP and CHI groups
• Progressively worked on systems of integrated information, knowledge representation, workflow and integrated decision support through start-ups (usually my own) and finally larger companies
– Aware, SGI, Stentor, Poiesis Informatics, Philips

3

Lots of Hype Around Big Data…
Many companies getting into the fray…

4

Many Opinions on Where We Are
Has anyone actually leveraged this in healthcare?

5

Advanced Analytics Process*
Multi-Stage Process

*CRISP-DM – Cross-Industry Standard Process for Data Mining

6

Too much focus on one component…
Multi-Stage Process

*CRISP-DM – Cross-Industry Standard Process for Data Mining

7

Steps

8

Analytics Lifecycle Overview

[Diagram labels: Data Ingestion; Landing Zone; Data Processing; ETLed Processed Zone; Data Cleaning; Cleaned Data; Anonymized data repository; Big Data Platform; Data Science; Data Scientist; Model training; Model Evaluation; Model Repository; Production]

9

Analytics Lifecycle (more detail)

[Diagram labels: Hosted solution; Data Science Hosted Cluster (Create Model); Model Staging Hosted Cluster (Evaluation); Production Cluster; Big Data platform; Data Science Platform (Analytics and ML); Data Preparation Phase; Original raw data; ETLs; ETLed data; Anonymized data; Data; Access; Processing; Feature Eng; ML Framework; ML Algos; IPs; ML R lib; ML Py lib; REST ML APIs; Models; Scripts and Model Rep.; Create model; Predictive model creation; Evaluate Model; Predictive Model Evaluator; Model Evaluator Service; Operationalize Model; ML Scoring Service; Predictive Analytical Apps; Domain Services; Proposition Owner]

10

Challenges in Data Collection and Processing
Before any analytics can start…

• Data identification, collection and preparation
– Domain knowledge is important to discriminate relevant data
• ETL – extracting relevant data from raw data
• Massaging – pre-processing the data
– [Automatic] annotation of data (e.g. masking of bones in a chest X-ray)
• Normalization of the data
– Especially complex when data is received from multiple sources
• Aggregation of data
– For the purpose of statistical analysis
• Note – all of the above steps must be done on the same set of technologies that will be present during deployment of the resulting model
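The extract, normalize and aggregate steps above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout, the field names (`weight_kg`, `weight_lb`), the two sites and the function names are hypothetical and do not come from the platform described in this deck.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records from two sources that report weight differently.
RAW_RECORDS = [
    {"source": "site_a", "patient": "p1", "weight_kg": 70.0},
    {"source": "site_a", "patient": "p2", "weight_kg": None},  # missing value
    {"source": "site_b", "patient": "p3", "weight_lb": 154.0},
    {"source": "site_b", "patient": "p4", "weight_lb": 176.0},
]

LB_PER_KG = 2.20462


def extract(record):
    """ETL step: keep only the fields relevant to the analysis."""
    wanted = ("source", "weight_kg", "weight_lb")
    return {k: v for k, v in record.items() if k in wanted}


def normalize(record):
    """Normalization step: express every weight in kilograms."""
    if record.get("weight_kg") is not None:
        kg = record["weight_kg"]
    elif record.get("weight_lb") is not None:
        kg = record["weight_lb"] / LB_PER_KG
    else:
        return None  # nothing usable; the caller drops this record
    return {"source": record["source"], "weight_kg": round(kg, 1)}


def aggregate(records):
    """Aggregation step: mean weight per source for statistical analysis."""
    by_source = defaultdict(list)
    for r in records:
        by_source[r["source"]].append(r["weight_kg"])
    return {source: mean(values) for source, values in by_source.items()}


def prepare(raw):
    """Run extract -> normalize -> aggregate over all raw records."""
    normalized = (normalize(extract(r)) for r in raw)
    return aggregate(r for r in normalized if r is not None)


summary = prepare(RAW_RECORDS)
print(summary)
```

In line with the note on the slide, one way to keep training and deployment consistent is to call the same `prepare` function in both places rather than re-implementing the steps.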

11

Training and Validating the Model
Which method is appropriate?

• Effective model creation requires an understanding of the nuances and strengths of different methods
– Selection of the right method depending on the task: classification, regression, clustering, dimensionality reduction…
• Identification and computation of the metric(s) to evaluate the model
– Requires training and test data
• Ensure there is no overfitting
• Validate the model
– On extended data sets, cohort variation
• Fine-tune the parameters of the model
• Note – all of the above steps are to be done on the same set of technologies that will be present during deployment
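As a rough illustration of the metric, hold-out and overfitting points above, the sketch below trains a k-nearest-neighbour classifier on synthetic data and reports accuracy on both the training and the held-out set for several values of k. The data generator, the 15% label noise, the 300/100 split and the choice of k-NN are all assumptions picked to keep the example small; they are not the methods used in the talk.

```python
import random


def make_data(n, rng):
    """Synthetic 1-D data: label is 1 when x > 0.5, with 15% of labels flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < 0.15:
            label = 1 - label  # label noise
        data.append((x, label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (k is odd)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)


def accuracy(train, data, k):
    """Metric: fraction of points in `data` classified correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)


rng = random.Random(0)
data = make_data(400, rng)
train, test = data[:300], data[300:]  # hold-out split

results = {}
for k in (1, 5, 25):
    results[k] = (accuracy(train, train, k), accuracy(train, test, k))
    print(f"k={k:2d}  train={results[k][0]:.2f}  test={results[k][1]:.2f}")
```

With k=1 every training point is its own nearest neighbour, so the training accuracy is perfect by construction and says nothing about generalization; the held-out score is the one to compare across k. When k is tuned this way, a third, untouched data set is normally kept aside for the final validation.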

12

Challenges in Deployment and Operations

• Installation (On-Premise, Cloud, Hybrid)
• Configuration
• Health Monitoring
• Auto-Scaling
• Multi-Tenancy
• Disaster Recovery
• Licensing
• Performance Monitoring
• Metering and Billing
• Upgrades
• Snapshots
• Certificate Management
• Resource Utilization and Trending
• Privacy and Security

13

These Methods Are Not New
Decades- to centuries-old technologies

• Neural Networks
– (1943) Warren McCulloch and Walter Pitts, originally called threshold logic
• Deep Learning
– (1965) Ivakhnenko and Lapa; papers in 1971 already described deep networks with 8 layers trained by the group method of data handling algorithm
• Random Decision Forest
– (1995) Ho
• Big Data (MapReduce)
– 2000-2004 various papers; the underlying methods were well known in the mid-90s
– Apache Hadoop (open source) has been available since 2006, with version 1.0 released in 2011
• Bayesian methods
– Bayes lived in the 1700s; Naïve Bayes methods have been in use since the 50s

14

Some Lessons from AI History
Well-known that data is much more important than method…

• Just Google
– “More data and simple algorithms beat complex analytics methods”
• This is well known from expert system and AI experience
– “Brittleness”: application of models on data outside the training domain frequently fails in unusual, unexpected ways
– Marvin Minsky, “Society of Mind”: complex and intelligent behavior comes from the orchestration of simple agents
• Without a broad, semantically interoperable, clean data repository, complex analytics, decision support algorithms, and workflow optimizations cannot be derived
• Data is the intellectual property in this domain
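The brittleness point can be made concrete with a toy example. The sketch below is an illustration only and is not taken from the talk: it fits a straight line to sin(x) on the interval [0, 1], where the curve is nearly linear, and then applies the same model at x = 3, well outside the training domain. The target function, the interval and the test point are arbitrary choices.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return a, b


# Training domain: x in [0, 1], where sin(x) is close to a straight line.
train_x = [i / 20 for i in range(21)]
train_y = [math.sin(x) for x in train_x]
a, b = fit_line(train_x, train_y)


def predict(x):
    """Apply the fitted line at x."""
    return a + b * x


in_domain_error = max(abs(predict(x) - math.sin(x)) for x in train_x)
out_of_domain_error = abs(predict(3.0) - math.sin(3.0))

print(f"worst error inside training domain: {in_domain_error:.3f}")
print(f"error at x=3.0 (outside domain):    {out_of_domain_error:.3f}")
```

The model looks accurate wherever it has seen data and is badly wrong where it has not, and nothing in the model itself signals the difference. Broader data coverage addresses this; a more elaborate fitting method does not.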

15

Analytics Stack
Analytics is a set of tools – not a solution

[Diagram labels: Analytical Apps; Clinical Image Analytics; Clinical Text Analytics; 3rd Party Apps; R SDK; Py SDK; JDBC/ODBC; REST Machine Learning APIs; General ML Algorithms; IPs; Deep Learning libraries; NLP building blocks; Distributed Processing framework; Model Rep.; Scripts Rep.; Data Repositories (S3, HDFS, Hive…)]

• Provide easy-to-use SDKs (R and Python)
• Prebaked thin-client development environments
– RStudio and Jupyter
• All ML capabilities are exposed via RESTful APIs
• Provide higher-level abstraction APIs for clinical text and clinical images
• Provide building blocks for NLP and DL frameworks
• Host research IP assets
• Persist the models and scripts in repositories (shared across development and deployment clusters)
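To give a feel for what "exposed via RESTful APIs" could mean for a client application, the sketch below builds a scoring request using only the Python standard library. The host, the URL path, the model name, the payload layout and the bearer-token header are all hypothetical; the deck does not describe the actual API, so this shows the general pattern only.

```python
import json
import urllib.request


def build_scoring_request(base_url, model_id, features, token=None):
    """Build (but do not send) a POST request for a hypothetical scoring endpoint."""
    url = f"{base_url.rstrip('/')}/models/{model_id}/score"  # hypothetical path
    body = json.dumps({"features": features}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_scoring_request(
    "https://analytics.example.org/api/v1",  # placeholder host
    "readmission-risk",                      # hypothetical model name
    {"age": 67, "prior_admissions": 2},
)

print(request.get_method(), request.full_url)

# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request, timeout=10) as response:
#       score = json.load(response)
```

Keeping request construction separate from sending makes the client easy to test without a network, and the same builder can target development and deployment clusters by changing only `base_url`.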

16

Philips Approach - HSDP
Analytics and Big Data are an integrated component of the platform

A tailored set of capabilities and tools, optimized for rapid prototyping and development of healthcare and health-related applications

• Connect – Manages, updates, monitors and remotely controls smart devices
• Store – Acquires, accesses and manages personal data from devices and applications through a cloud-hosted repository
• Authorize – Securely identifies users, authorizes consent, ensures data privacy and tracks user activity
• Share – Standardizes interfaces between HealthSuite-enabled applications and devices with third-party systems
• Orchestrate – Provides functionality to help complete routine tasks and coordinate communications among users
• Host – Provides managed infrastructure to monitor the health of systems and performance of applications
• Analyze – Offers the foundational infrastructure to build decision support algorithms and machine learning applications

17