Sven Junkergård - CTO Solving the Disconnected Data Problem in Healthcare Using MongoDB A MongoSF talk December 3 rd 2014

Solving the Disconnected Data Problem in Healthcare Using MongoDB

DESCRIPTION

The data diversity in healthcare and life sciences is exploding, and the market is fundamentally changing as a result of healthcare reform. The result is more and more data, but it is compartmentalized and disconnected. At Zephyr Health, we have developed a data platform that provides connectivity between thousands of healthcare data assets using an ontology-driven approach, storing data in MongoDB. This session will show how we break down this very challenging problem and how some of MongoDB's more recent features have been used to do so.

Page 1: Solving the Disconnected Data Problem in Healthcare Using MongoDB

Sven Junkergård - CTO

Solving the Disconnected Data Problem in Healthcare Using MongoDB

A MongoSF talk – December 3rd 2014

Page 2: Solving the Disconnected Data Problem in Healthcare Using MongoDB

ME

• MSc Computer Science and Engineering – Chalmers University of Technology in Gothenburg
• AMS, Capgemini
• Cake Financial – aggregating retail investor portfolios and generating investment insights from the best of the best
• Billfloat – novel financial credit product with highly differentiated underwriting method
• Zephyr Health – built out technology and engineering team to deliver on a big vision – integrate disconnected data in healthcare and solve real problems. Now CTO.

I am a reformed consultant who used to do architecture consulting…

Page 3: Solving the Disconnected Data Problem in Healthcare Using MongoDB

WHO I WORK FOR – ZEPHYR HEALTH

OUR FOCUS
• Organize disconnected data in healthcare and life science
• Visualize the combination of heterogeneous data sources in analytical problems
• Solve important and challenging problems for our customers

ORGANIZATIONAL EXPERTISE
• Life Sciences
• Brand Management
• Big Data
• Applied Mathematics
• Algorithms
• IaaS | SaaS | PaaS
• Machine Learning
• Artificial Intelligence
• Statistics & Modeling
• Data Science
• Visualization
• App Development

OFFICE LOCATIONS
• San Francisco
• London
• India

CURRENT CLIENTS
Include members of:
• Global top 5 biopharm
• Global top 5 pharm
• Global top 5 medical devices

Page 4: Solving the Disconnected Data Problem in Healthcare Using MongoDB

SOLVING THE VARIETY PROBLEM

Data trends and healthcare examples:

• Volume: Genomic sequencing
• Velocity: Streaming device data
• Variety: Understanding healthcare landscape and treatment effectiveness (Internal, Vendor, Public)
• Visualization: Providing relevant and powerful visualizations that provide real insights

Image sources: illumina and iRhythm

Page 5: Solving the Disconnected Data Problem in Healthcare Using MongoDB

WHY HEALTHCARE DATA IS A DIFFERENT WORLD ENTIRELY

Loan application decision

• Applicant demographics
• Bank account
• Credit report
• Identity check
• Income verification

Key: SSN (shared by every source)

Clinical trial investigator decision

• Research
• Published trials
• Current sponsored trials
• Prescriptions
• Claims
• Funding
• Network leadership
• Site profile
• Site certification
• Site statistics

Entities: Investigator, Site, Patients

Keys: inconsistent or missing

Page 6: Solving the Disconnected Data Problem in Healthcare Using MongoDB

THE TYPES OF PROBLEMS THAT CAN BE SOLVED WITH INTEGRATED DISPARATE DATA

Problem | What is it?
Site selection | Finding the right locations to house clinical trials
Trial outcomes | Visualizing data from different sources within clinical trials
Medical expertise communication | Identifying the healthcare professionals with the right expertise
Scoring and ranking | Finding the top ranking healthcare professionals or institutions for a particular purpose
Network leadership analysis | Understanding who is connected to whom and how information is disseminated
Care delivery effectiveness | Identifying areas of great or poor performance and the underlying reason
Patient outcomes | Relating patient outcomes to specific market activities
Health economics | Understanding the financial effectiveness of an intervention or of introducing a new standard of care

Page 7: Solving the Disconnected Data Problem in Healthcare Using MongoDB

DATA CATEGORIES AND EXAMPLES

Category | Examples | Keys | Formats
Internal | Sales, Speakers, Partners, CRM, Payments, Trials | Controlled | Spreadsheets (structured)
Vendors | Rx, Claims, Primary research, Consulting, Referral patterns | Vendor specific | Flat files
Public | Providers, Grants, Public trials, Research | Anything and nothing | Anything

Creating a complete picture requires combining disconnected data from an enormous variety of sources

Managing data variety is the key to solving the problem

Page 8: Solving the Disconnected Data Problem in Healthcare Using MongoDB

A DIFFERENT PROBLEM REQUIRES A DIFFERENT SOLUTION

ETL, DW, DM, OLAP Cube, BI, Insight

Instead…

• A different data model based on descriptive meta data
• A non-traditional data store
• Something other than Informatica
• Automated intelligent algorithms
• A few special tricks
• An API
• Some really great applications...

Page 9: Solving the Disconnected Data Problem in Healthcare Using MongoDB

ENTITY CENTRIC DATA MODEL

[Figure: Traditional, relational model versus entity centric model. Labels: entity table; data sources 1 through n; entities with attributes; meta data.]

Page 10: Solving the Disconnected Data Problem in Healthcare Using MongoDB

ONTOLOGY-BASED DEVELOPMENT

Requirements
• Flexible
• Extensible and adaptive
• Easy to maintain

Solution
• Ontology: used to formally represent knowledge within a domain
• Vocabulary: collection of entities, attributes, relationships that provides context within the domain
• Taxonomy (Classification): a hierarchical collection of controlled terms from vocabulary

Page 11: Solving the Disconnected Data Problem in Healthcare Using MongoDB

VOCABULARY

• Entities: Real world things or events. E.g. institution, patient, sales, potential, etc.
• Organic Attributes: Data points coming from datasets. E.g. first_name, age, revenue, date, etc.
• Derived Attributes: Processed key-value pairs from existing organic and/or derived attributes
• Entity Relationships: Relationships between different entities

Page 12: Solving the Disconnected Data Problem in Healthcare Using MongoDB

WHY MONGODB?

Our requirements
• Extremely flexible data storage
• Low cost of evolving schema
• Highly performant for complex joins, recursive queries, etc.
• Scalable to large volumes of connected information

MongoDB:
• Document store is a great fit for storing arbitrary information
• Key-value pairs in JSON format (allowed for both adding data traceability and cheap data evolution)
• Secondary indexes and strict consistency
• Map-reduce functionality

Challenges:
• Queries are powerful but not easy to write
• We needed complex joins across arbitrary information (how do you create an index on something when you don't know ahead of time what it is?)

Page 13: Solving the Disconnected Data Problem in Healthcare Using MongoDB

DATA ORGANIZATION

• Full Profile
• Main Profile
• Entity Relationships
• Attribute References
• Identity Section
• Attributes (Organic + Derived)
• dataset
• dataset_records
• File Info
• Raw Data
• Geo locations

Page 14: Solving the Disconnected Data Problem in Healthcare Using MongoDB

DATA INTEGRATION

14

{

first_name: Charles

last_name: Morris

street: 200 First St.

city: Rochester

state: MN

zip: 55905

phone: 802-555-1234

email: [email protected]

headshot: <AF6713…>

thought_leader_score: 8

pub_count: 203

}

DISPARATE SOURCES OF INFORMATION

STRUCTURED PROFILE

APPLICATION REPRESENTATION

All enabled through a series of data integration algorithms
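The step from disparate sources to one structured profile can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source names, the `SOURCE_PRIORITY` ranking, and the `build_profile` helper are hypothetical and are not taken from the talk.

```python
# Hypothetical ranking of sources: a lower number means more authoritative.
SOURCE_PRIORITY = {"internal_crm": 0, "vendor_feed": 1, "public_registry": 2}


def build_profile(records):
    """Merge (source, fields) pairs into one profile.

    For each attribute, keep the value from the most authoritative
    source that supplies a non-empty value, and remember where it came from.
    """
    profile = {}
    provenance = {}
    for source, fields in records:
        rank = SOURCE_PRIORITY[source]
        for key, value in fields.items():
            if value in (None, ""):
                continue  # an empty value never overrides anything
            if key not in profile or rank < SOURCE_PRIORITY[provenance[key]]:
                profile[key] = value
                provenance[key] = source
    return profile, provenance


records = [
    ("public_registry", {"first_name": "C", "last_name": "Morris", "city": "Rochester"}),
    ("internal_crm", {"first_name": "Charles", "last_name": "Morris", "phone": "802-555-1234"}),
    ("vendor_feed", {"pub_count": 203, "city": ""}),
]
profile, provenance = build_profile(records)
```

Keeping a provenance map next to the merged values is one simple way to preserve data traceability while still presenting a single flat profile to applications.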

Page 15: Solving the Disconnected Data Problem in Healthcare Using MongoDB

ALGORITHM EXAMPLES

• Disambiguation
• Dataset identification
• Clustering
• Record linkage

C Morris
Heart and Vascular Center
123 Main St
Rochester, MN 55903
802-555-9988

??

Charles “Chuck” Morris
Cardiologist
200 First St.
Rochester, MN 55905
802-555-1234
[email protected]

• Automatically choosing the most authoritative version of an attribute
• Maximizing re-use of meta data describing imported data sets
• Pre-calculating clusters in weakly attributed data
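As a rough illustration of record linkage on the two Morris records, the sketch below scores how likely two records describe the same person. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure; the weights, the field choice, and the helper names are assumptions for illustration, not the algorithm used on the platform.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def first_names_compatible(a, b):
    """True if the names are equal, or one is an initial of the other."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return False
    if a == b:
        return True
    return a[0] == b[0] and (len(a) == 1 or len(b) == 1)


def link_score(r1, r2):
    """Weighted score in [0, 1]; the weights are illustrative assumptions."""
    score = 0.0
    if first_names_compatible(r1["first_name"], r2["first_name"]):
        score += 0.3
    score += 0.4 * similarity(r1["last_name"], r2["last_name"])
    score += 0.3 * similarity(r1["city"] + r1["state"], r2["city"] + r2["state"])
    return score


a = {"first_name": "C", "last_name": "Morris", "city": "Rochester", "state": "MN"}
b = {"first_name": "Charles", "last_name": "Morris", "city": "Rochester", "state": "MN"}
c = {"first_name": "Tom", "last_name": "Smith", "city": "San Francisco", "state": "CA"}

same_person_score = link_score(a, b)
different_person_score = link_score(a, c)
```

In practice a score like this would be compared against a tuned threshold, and fields such as street address or phone number, which differ between the two Morris records, would need their own handling.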

Page 16: Solving the Disconnected Data Problem in Healthcare Using MongoDB

ILLUSTRATIVE MONGODB PROFILE

{
  "_id": "53bcf9cae4b03f352d4b47c7",
  "identity": {
    "npi": "1",
    "specialty": ["Cardiologist"],
    "first_name": "Tom",
    "last_name": "Smith"
  },
  "attributes": {
    "npi": {1},
    "first_name": {"Tom"},
    "last_name": {"Smith"},
    "specialty": {"Cardiologist"}
  }
}

NPI FirstName LastName Specialty

1 Tom Smith Cardiologist
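A minimal sketch of how one tabular row could become a profile document with an identity section and an attributes section. The `row_to_profile` helper is hypothetical, and representing each attribute as a list of values is an assumption made so that the result is valid JSON; the slide's `{…}` notation is shorthand whose exact structure is not shown.

```python
import json


def row_to_profile(row, identity_keys):
    """Build a profile dict from one source row.

    identity: the fields used to identify the entity.
    attributes: every field, stored as a list so more values can be added later.
    """
    identity = {key: row[key] for key in identity_keys}
    attributes = {key: [value] for key, value in row.items()}
    return {"identity": identity, "attributes": attributes}


row = {"npi": "1", "first_name": "Tom", "last_name": "Smith", "specialty": "Cardiologist"}
profile = row_to_profile(row, identity_keys=["npi", "first_name", "last_name", "specialty"])

# Serialized form, ready to be stored as a document.
document = json.dumps(profile, indent=2)
```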

Page 17: Solving the Disconnected Data Problem in Healthcare Using MongoDB

ADDING ADDITIONAL ATTRIBUTES

{
  "_id": "53bcf9cae4b03f352d4b47c7",
  "identity": {
    "npi": "1",
    "specialty": ["Cardiologist"],
    "first_name": "Tom",
    "last_name": "Smith"
  },
  "attributes": {
    "npi": {1},
    "first_name": {"Tom"},
    "last_name": {"Smith"},
    "specialty": {"Cardiologist"},
    "institution": {"UCSF Medical Center"},
    "clinical_trial": {"Heart Valve Clinical Trial"},
    "start_date": {"01/01/2011"},
    "end_date": {"03/25/2013"}
  }
}

NPI  FirstName  LastName  Specialty
1    Tom        Smith     Cardiologist

NPI  Institution          ClinicalTrial Name          Start Date  End Date
1    UCSF Medical Center  Heart Valve Clinical Trial  01/01/2011  03/25/2013
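The point of this slide is that a second dataset sharing the NPI key adds attributes to the existing profile without any schema change. A hedged sketch, assuming attributes are stored as lists of values and using a hypothetical `add_attributes` helper:

```python
def add_attributes(profile, row, key="npi"):
    """Fold a row from another dataset into an existing profile.

    The row must carry the same key value as the profile's identity section.
    New attribute names are simply added; no schema migration is needed.
    """
    if str(row[key]) != str(profile["identity"][key]):
        raise ValueError("record does not belong to this profile")
    for name, value in row.items():
        if name == key:
            continue
        values = profile["attributes"].setdefault(name, [])
        if value not in values:  # keeps repeated imports idempotent
            values.append(value)
    return profile


profile = {
    "identity": {"npi": "1", "first_name": "Tom", "last_name": "Smith"},
    "attributes": {"npi": ["1"], "first_name": ["Tom"], "last_name": ["Smith"]},
}
trial_row = {
    "npi": "1",
    "institution": "UCSF Medical Center",
    "clinical_trial": "Heart Valve Clinical Trial",
    "start_date": "01/01/2011",
    "end_date": "03/25/2013",
}
add_attributes(profile, trial_row)
add_attributes(profile, trial_row)  # importing the same row again changes nothing
```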

Page 18: Solving the Disconnected Data Problem in Healthcare Using MongoDB

TRICKS TO TAME THE WILD DATA

• Ontology – how we keep track of all ingested information

• Vocabulary – bringing structure to large variety of information

• Derived attributes – encapsulate complexity

• GIS transformations – practical integration of geo data

• Indexing – fast access to complex information in MongoDB

18

Page 19: Solving the Disconnected Data Problem in Healthcare Using MongoDB

DERIVED ATTRIBUTES

What’s the problem?
• Data is rarely clean and business rules are complex

What are we doing about it?
• Use existing (organic) attributes and apply rules to generate new (derived) attributes
• Derived attributes generated through queries or map-reduce jobs

Why it matters
• Too complex and expensive to consider all business rules at run-time with every query
• Hides the complexity and introduces uniformity
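To make the idea of derived attributes concrete, here is a small sketch that precomputes a new attribute from organic ones, so the rule does not have to run with every query. The rule (`trial_duration_days`), the list-of-values storage, and the helper names are assumptions for illustration; the talk generates derived attributes through queries or map-reduce jobs rather than application code like this.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # matches the dates shown on the profile slides


def trial_duration_days(attributes):
    """Derived attribute: number of days between start_date and end_date."""
    start = datetime.strptime(attributes["start_date"][0], DATE_FORMAT)
    end = datetime.strptime(attributes["end_date"][0], DATE_FORMAT)
    return (end - start).days


def apply_derived(profile, rules):
    """Evaluate each rule once and store the result alongside organic attributes."""
    for name, rule in rules.items():
        profile["attributes"][name] = [rule(profile["attributes"])]
    return profile


profile = {
    "identity": {"npi": "1"},
    "attributes": {"start_date": ["01/01/2011"], "end_date": ["03/25/2013"]},
}
apply_derived(profile, {"trial_duration_days": trial_duration_days})
```

After this step, a query can filter on `trial_duration_days` directly instead of re-applying the date logic at run time.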

Page 20: Solving the Disconnected Data Problem in Healthcare Using MongoDB

GEOSPATIAL MAPPING APPROACH FOR AWKWARD GEO DATA

Using traditional method
Reporting unit → Postal codes (70173) → Mapping + calculations → District (Stuttgart) → State (Baden-Württemberg)
• Additional challenges with mismatches between reporting unit postal codes and mapping postal codes
• Have to compensate for missing postal codes
• Split patients or metrics across multiple regions when reporting unit spans multiple regions

Using geospatial method
Geocoded reporting unit → Mapping + calculation → District (Stuttgart) → State (Baden-Württemberg)
• Requires determining a single central point for each reporting unit
• Uses no mapping documents
• No compensatory calculations required
• Overall accuracy increases
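The geospatial method reduces each reporting unit to a single central point and asks which region contains it. A minimal sketch of that lookup using the standard ray-casting point-in-polygon test; the region names and square polygons are made-up placeholders, not real administrative boundaries, and a production system would more likely rely on a geospatial index or library.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical regions on an abstract grid (not real borders).
REGIONS = {
    "region_a": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "region_b": [(10, 0), (20, 0), (20, 10), (10, 10)],
}


def assign_region(point, regions):
    """Return the name of the first region containing the point, else None."""
    for name, polygon in regions.items():
        if point_in_polygon(point[0], point[1], polygon):
            return name
    return None


unit_region = assign_region((3, 4), REGIONS)
```

Because every reporting unit maps to exactly one point, no metrics have to be split across regions and no postal code mapping table is needed.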

Page 21: Solving the Disconnected Data Problem in Healthcare Using MongoDB

INDEXING

Why MongoDB alone does not get it done
• Cross collection queries required for large number of scenarios
• Indexing challenges when dealing with unknown information

What we did
• Graph based index
• Entities and attributes are nodes
• Entity-attribute ownership and entity-to-entity relationships are edges

How we use it
• zQueries allow us to do complex queries from web front ends
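The graph based index can be pictured with a small in-memory sketch: entities and attribute values are nodes, while ownership and entity-to-entity relationships are labeled edges. The `GraphIndex` class, its method names, and the sample data are hypothetical; the actual index and the zQuery language are proprietary and not shown in the talk.

```python
from collections import defaultdict


class GraphIndex:
    """Tiny adjacency-set graph: node -> set of (edge label, neighbor node)."""

    def __init__(self):
        self.edges = defaultdict(set)

    def _add_edge(self, src, label, dst):
        self.edges[src].add((label, dst))
        self.edges[dst].add((label, src))

    def add_attribute(self, entity, name, value):
        # Entity-attribute ownership edge; works for any attribute name.
        self._add_edge(("entity", entity), "has_attribute", ("attribute", name, value))

    def relate(self, a, relation, b):
        # Entity-to-entity relationship edge.
        self._add_edge(("entity", a), relation, ("entity", b))

    def entities_with(self, name, value):
        node = ("attribute", name, value)
        return {n[1] for label, n in self.edges.get(node, ()) if label == "has_attribute"}

    def related(self, entity, relation):
        node = ("entity", entity)
        return {n[1] for label, n in self.edges.get(node, ()) if label == relation}


index = GraphIndex()
index.add_attribute("tom_smith", "specialty", "Cardiologist")
index.add_attribute("charles_morris", "specialty", "Cardiologist")
index.relate("tom_smith", "works_at", "ucsf_medical_center")

# "Cardiologists at UCSF Medical Center": intersect two graph lookups.
cardiologists_at_ucsf = (
    index.entities_with("specialty", "Cardiologist")
    & index.related("ucsf_medical_center", "works_at")
)
```

Because attribute names are just part of a node key, nothing has to be known about an attribute ahead of time for it to be indexed, which is the difficulty the slide raises with conventional indexes.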

Page 22: Solving the Disconnected Data Problem in Healthcare Using MongoDB

Disconnected Data Apps for Life Sciences

Algorithm Driven

Data Ingestion

Synchronization

Proprietary REST API

zQuery

Internal Vendor Public

Data Organized in

Connected Profile

Documents

Graph Based

Materialized

Query Index

Ontology Driven Data Tier

100,000,000+ data points ingested and indexed each year

THE ZEPHYR PLATFORM


Page 23: Solving the Disconnected Data Problem in Healthcare Using MongoDB

CONSUMING INTEGRATED DISPARATE DATA

Analytical applications use the zAPI and the ontology to produce applications that adapt to changing data

Zephyr Platform
• Ontology Driven Data Store

API
• REST API: Exposes both data and the ontology
• zQueries: JSON-based query language for queries against dynamic and connected data

Analytical Apps
• Functional Focus: Solving specific business problems with focused apps
• Design: Single page apps with targeted data visualizations

Page 24: Solving the Disconnected Data Problem in Healthcare Using MongoDB

TARGETED ANALYTICAL APPLICATIONS

Apps for real business problems leveraged by everyday business users

• Illuminate
• Voyager
• Kaleidoscope
• Lighthouse

Page 25: Solving the Disconnected Data Problem in Healthcare Using MongoDB

A BRIEF DEMO

25

Page 26: Solving the Disconnected Data Problem in Healthcare Using MongoDB

LEARNINGS

• There was no one technology or one database that provided a complete solution: embrace diversity
• Create a generic platform, pour effort into specialized algorithms to populate data intelligently
• Ontology-driven development can be very powerful, but data organization is still a challenge
• Indexing on a priori unknown attributes is challenging
• Data modeling is always important; large profiles had to be broken down

Page 27: Solving the Disconnected Data Problem in Healthcare Using MongoDB

SUMMARY

Wrapping it all up in five points

1. Healthcare is different and has lots of critical data that is disconnected
2. Generic, MongoDB-based data storage model using meta-data
3. Data integration powered by algorithms
4. Document profiles for facts, graph for querying
5. Diverse set of end user analytical applications powered by the generic data platform

Why this matters

• Standards are really important, but slow to develop
• Huge amount of change occurring in our healthcare system
• We need to make decisions today based on available data sets despite existing challenges

Page 28: Solving the Disconnected Data Problem in Healthcare Using MongoDB

THANK YOU!

Brian Roy – Strategy and architecture

Mahesh Chaudhari – Database architecture

Cesar Arevalo – Data integration implementation

The guys that made all of it come together!

28

Page 29: Solving the Disconnected Data Problem in Healthcare Using MongoDB

CONTACT INFORMATION

Sven Junkergård
CTO
+1.415.503.7412
[email protected]

Zephyr Health
450 Mission St. Suite 201
San Francisco, California 94105
+1.415.529.7649
zephyrhealth.com

Page 30: Solving the Disconnected Data Problem in Healthcare Using MongoDB

BACKUP SCREEN SHOTS

30

Page 31: Solving the Disconnected Data Problem in Healthcare Using MongoDB

ILLUMINATE – LANDING PAGE

Page 32: Solving the Disconnected Data Problem in Healthcare Using MongoDB

ILLUMINATE – ALL CASES VIEW

Page 33: Solving the Disconnected Data Problem in Healthcare Using MongoDB

ILLUMINATE – GRID VIEW

Page 34: Solving the Disconnected Data Problem in Healthcare Using MongoDB

ILLUMINATE – GRAPH VIEW

Page 35: Solving the Disconnected Data Problem in Healthcare Using MongoDB

ILLUMINATE – PROFILE VIEW