
/ 2

01 Health IT Ecosystem

02 Interoperability and Challenges

03 Requirements and Platform

04 Case Study and Evaluation

/ 3 Sources of Big Data in Health-care

https://tcf.org/content/report/strengthening-protection-patient-medical-data/?agreed=1

/ 4 Health-care Reality

Volume of patient data increasing exponentially

Quality of patient data declining

Fragmented, duplicate and conflicting patient information within and across databases and touch points

Regulatory and safety issues drive new requirements

Lorraine Fernandes, Bill Klaver (Year), ‘Why Initiate: The foundation for healthcare interoperability’, Initiate Project,

https://slideplayer.com/slide/695106/

/ 5 Patient Identification for Ubiquitous Profiling

• Improve patient care and reduce medical risks

• Improve efficiency by reducing redundant care activities

• Support consumer directed health information management

• Comply with regulations

• Enhance operational productivity and efficiency

Interoperable Health Care System

Lorraine Fernandes, Bill Klaver (Year), ‘Why Initiate: The foundation for healthcare interoperability’, Initiate Project, https://slideplayer.com/slide/695106/

/ 6 Health-care Ecosystem

Lorraine Fernandes, Bill Klaver (Year), ‘Why Initiate: The foundation for healthcare interoperability’, Initiate Project, https://slideplayer.com/slide/695106/

Heterogeneity exists

/ 7 Data Heterogeneity

The same patient as recorded at two hospitals:

• Hospital A: John Doe, PID 1234

• Hospital B: John, PId 5678

Medical record impedance mismatch:

1. Different field names

2. Different normalization

3. Missing data

/ 8 Data Heterogeneity Examples

OpenEMR Patient Record

openemr_Demographics: id, Name, ExternalID, DOB, Sex, SS, License, MaritalStatus, UserDefined, BillingNote, Address, City, State, PostalCode, Country, MotherName, EmergencyContact, EmergencyPhone, HomePhone, WorkPhone, MobilePhone, ContactEmail, TrustedEmail, Provider, Referring_Provider, Pharmacy, HIPPANoticeReceived, AllowVoiceMessage, LeaveMessageWith, AllowMailMessage, AllowSMS, AllowEmail, AllowImmunizationRegistryUse, AllowImmunizationInfoSharing, AllowHeartInformationExchange, AllowPatientPortal, CareTeam, CMSPortalLogin, ImmunizationRegistryStatus, ImmunizationRegistryStatusEffectiveDate, PublicityCode, PublicityCodeEffectiveDate, ProtectionIndicator, ProtectionIndicatorEffectiveDate, Language, Race, Ethnicity, FamilySize, FinancialReviewDate, Homeless, MonthlyIncome, Interpreter, MigrantSeasonal, VFC, Religion, DateDecreased, ReasonDecreased

openemr_MedicalProblems: Title, Coding, BeginDate, EndDate, Occurrence, ReferredBy, Outcome, Destination

openemr_Prescriptions: PatientName, Add, EndDate, Occurrence, ReferredBy, Outcome, Destination

IMP Silo Patient Record

Krsiloemr_tblPatient: PatientID, PatientMRNNo, PatientName, DateOfBirth, Age, Gender, SymptomsAndSigns, ClinicalHistory, PhysicalExam, ECG, NTproBNP, BNP, LVEF, LAVI, LVMI, Ee, eSeptal, LongitudinalStrain, TRV, EncounterDate

Mismatch between the two records:

1. Different field names

2. Different normalization

3. Missing data

Solution: Interoperability
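The three kinds of mismatch listed on this slide can be illustrated with a minimal Python sketch. The field pairings (Name/PatientName, DOB/DateOfBirth, Sex/Gender), the one-letter gender normalization, and the sample record are assumptions made for illustration; the slides do not state which fields correspond or how values are normalized.

```python
# Hypothetical pairing of OpenEMR demographic fields with IMP silo fields.
# These pairings are assumptions for illustration, not taken from the slides.
OPENEMR_TO_SILO = {
    "Name": "PatientName",
    "DOB": "DateOfBirth",
    "Sex": "Gender",
}

# Hypothetical value normalizers (mismatch 2: different normalization).
NORMALIZERS = {
    "Sex": lambda value: value.strip().upper()[:1],
}


def to_silo_view(openemr_record, silo_fields):
    """Rename and normalize OpenEMR fields, and report unfilled silo fields."""
    view = {}
    for source_field, target_field in OPENEMR_TO_SILO.items():
        if source_field in openemr_record:
            normalize = NORMALIZERS.get(source_field, lambda value: value)
            # Mismatch 1: the same concept is stored under a different name.
            view[target_field] = normalize(openemr_record[source_field])
    # Mismatch 3: silo fields for which the OpenEMR record has no data.
    missing = sorted(field for field in silo_fields if field not in view)
    return view, missing


# Made-up sample record and a small subset of the silo fields.
record = {"Name": "John Doe", "DOB": "1980-01-01", "Sex": " male "}
silo_fields = ["PatientName", "DateOfBirth", "Gender", "LVEF"]
view, missing = to_silo_view(record, silo_fields)
```

Here `missing` lists the silo fields (such as LVEF) that the OpenEMR demographics record cannot supply, which is the "missing data" case.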

/ 9 What is Health-care Interoperability? (Technical Definition)

IEEE :: interoperability

The ability of two or more systems or components to exchange information and to use the information that has been exchanged.

IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries (New York, NY: 1990)

HL7 :: interoperability

• Functional: reliably exchange information without error

• Semantic: interpret, and effectively use, the exchanged information

HIMSS :: interoperability

Interoperability means the ability of health information systems to work together within and across organizational boundaries in order to advance the effective delivery of healthcare for individuals and communities. There are three levels of health information technology interoperability: 1) Foundational; 2) Structural; and 3) Semantic.

/ 10 State-of-the-art in Healthcare Interoperability

Structured storage with semantic reconciliation on write

• HL7¹ and openEHR²

Standardize the standards

• Clinical Information Modeling Initiative (CIMI)³

• LOINC + SNOMED-CT Integration⁴

Use crowd sourcing for generating mappings

• Yosemite Project⁵

1. http://www.hl7.org 2. http://www.openehr.org/ 3. https://www.opencimi.org/ 4. https://loinc.org/collaboration/snomed-international/ 5. http://yosemiteproject.org/interoperability-roadmap/


o Semantic Matching

• Pattern-based Mediation Systems

  Differentor-based similarity matrix creation[1]

  Tree Structure Based Ontology Integration (TSBOI)[2]

  PatOMat[3]: an Ontology Preprocessing Language and OWL-based generic framework for automatic pattern detection and ontology transformation

• Healthcare-standards-based Mediation Systems

  LinkEHR[4,5]: a tool for transforming between HL7v2, openEHR and CEN/ISO 13606

  Poseacle Converter[6]: transforms between CEN/ISO 13606 and openEHR

  ResearchEHR[7]: transformation using the Poseacle converter, and structured data curation tools

o Semantic Integration

• Ontology-Based Data Access (OBDA)[8]: a framework working on well-defined domain ontologies (e.g. Ontology for Cancer Research Variables, OCRV)

• Health Service Bus[9]: provides XSLT transformations from HL7v3 to HL7v2 and openEHR

• Event Driven Health Service Bus[10]: uses JBossESB to convert structured medical data to RDF form and then creates a semantic linked graph using the Health and Lifelogging Data (HLD) Ontology

Interoperability in Big Data?

/ 13 Big “Health-care” Data and Interoperability

Big Data: Volume, Velocity, Veracity, Variety, Value

• Primary Data Sources
  • HMIS, Clinical Decision Support Systems (CDSS), and IoT devices

• Secondary Data Sources
  • General living habits, Medical Knowledge Management Systems, Biobanks, Genome data stores and others

• Streaming Data
  • Medical IoT, Continuous Glucose Monitor, Smart Watch
  • Requires low latency

• Non-Streaming Data
  • HMIS
  • Prefers high reliability

• Data Format
  • Formal Standards (Kiah 2014)
    • HL7, LOINC, SNOMED-CT
  • Non-formal Standards (Geissbuhler 2011)

• Purpose (Dale Compton 2005)
  • Patients
  • Medical Experts
  • Organizations
  • Environment

• Low quality of Data
  • Lack of a Golden Ontology that can standardize all EHRs
  • High volume does not mean high quality (especially for qualitative data) (Boyd 2011)

• LinkedEHR (Denaxas 2012; Hemingway 2017)

How can we identify new insights resulting from the integration of medical data?

• UK Biobank with 500,000 participants (Sudlow 2015)

• Mendelian disorder risk study with 100 million participants (Blair 2013)

• EHR4CR project with 45 partners in the EU (De Moor 2015)

/ 14 Interoperability Perspectives

Data Interoperability: Data Integration, Data Exchange, Data Usage

Knowledge Interoperability

Process Interoperability

/ 15 Data Interoperability Requirements

Data Integration

• Voluminous data

• Non-standard-compliant implementations

• Different data representation standards

• Different terminologies (e.g. LOINC vs SNOMED-CT)

• Mapping generation

• Mapping conflict resolution

• Mapping change management

Data Exchange

• Different messaging standards

• Different communication methods (web services, p2p, etc.)

• Semantics of information

• Globalization (language differences)

Data Usage

• Mapping generation

• Mapping conflict resolution

• Mapping change management

• Selection of a feasible (e.g. high execution speed, high accuracy) semantic transformation tool/algorithm

• Privacy

A platform is required to handle these requirements: Ubiquitous Health Platform

[Figure: Data Interoperability. Labels: Healthcare Data; Healthcare Information; Healthcare Knowledge; Physiological Sensors; Clinical Notes; Patient; Medical Expert; Organization]

/ 16 Ubiquitous Health Platform (UHP) Use Cases

Use cases: Medical Data Persistence, Medical Profile Build, Semantic Transformation

[Figure: UHP. Labels: Physiological Sensors; EMR; EHR; Clinical Notes; Patient; Doctor; Organization]

/ 17 Ubiquitous Health Platform (UHP) Abstract Idea

[Figure: UHP architecture. Labels: Data Source 1 … Data Source n; Big Data Store; Big Data Curation; Data Integration; Semantic Maps; Semantic Reconciliation 1 … Semantic Reconciliation n; Mediation based semantic reconciliation-on-read; Expert driven Semantic Verification; Semantic Blocks; Medical Experts; Semantic Query Controller; Query Interface 1 … Query Interface 4]
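Reconciliation-on-read, as named on this slide, means that records are persisted in their source form and semantic maps are applied only when a query is answered. The following is a minimal in-memory sketch of that idea. The class name, the source names, and the unified field names are hypothetical; the real platform stores data in HDFS and builds its maps with medical experts.

```python
# Hypothetical semantic maps: per-source field name -> unified field name.
SEMANTIC_MAPS = {
    "source_1": {"Name": "name", "DOB": "birth_date"},
    "source_2": {"PatientName": "name", "DateOfBirth": "birth_date"},
}


class RawStore:
    """Toy stand-in for the Big Data Store; keeps records exactly as received."""

    def __init__(self):
        self._rows = []

    def write(self, source, record):
        # No transformation on write: the raw record is archived as-is.
        self._rows.append((source, dict(record)))

    def read(self, semantic_maps):
        # Reconciliation happens here, at query time, using the current maps.
        unified = []
        for source, record in self._rows:
            mapping = semantic_maps.get(source, {})
            unified.append(
                {mapping[key]: value for key, value in record.items() if key in mapping}
            )
        return unified


store = RawStore()
store.write("source_1", {"Name": "John Doe", "DOB": "1980-01-01"})
store.write("source_2", {"PatientName": "John Doe", "DateOfBirth": "1980-01-01"})
unified = store.read(SEMANTIC_MAPS)
```

Because the raw records are never rewritten, a corrected or newly verified map takes effect on the next read without re-ingesting the data, which is one plausible reason for choosing reconciliation on read over reconciliation on write.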

/ 18 Use Case: Semantic Transformation

[Figure: semantic transformation. Labels: EHR A; EHR B; EHR X; L-Store Medical Data Archive; Semantic Transformation; Ontology Store; UHP Maps; UHP; Patient; Organization; Medical Expert]

/ 19 Use Case: Ubiquitous Health Profile

[Figure: Ubiquitous Health Profile. Labels: EHR A; EHR B; Medical documents; Map (k1 v1, k2 v2, k3 v3); Patient Medical Profile; UHPr; L-Store Medical Data Archive; Ontology Store; UHP Maps; UHP; Patient; Medical Expert]

Stored per document: Raw Data, Identifier (i_m), Type (τ), Version (v_m)
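The four attributes shown for each stored document (raw data, identifier, type, version) can be sketched as a small record type. The Python field types, the key-value form of the raw data, and the rule that a higher version supersedes a lower one are assumptions; the slides only name the attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MedicalFragment:
    identifier: str  # i_m on the slide
    type: str        # τ on the slide, e.g. a report kind
    version: int     # v_m on the slide; integer versions are an assumption
    raw_data: dict   # key-value pairs (k1 v1, k2 v2, ...) kept in source form


def latest_versions(fragments):
    """Keep the highest version per (identifier, type); an assumed policy."""
    latest = {}
    for fragment in fragments:
        key = (fragment.identifier, fragment.type)
        if key not in latest or fragment.version > latest[key].version:
            latest[key] = fragment
    return list(latest.values())


# Made-up fragments for illustration.
fragments = [
    MedicalFragment("f-1", "Demographics", 1, {"Name": "John Doe"}),
    MedicalFragment("f-1", "Demographics", 2, {"Name": "John A. Doe"}),
    MedicalFragment("f-2", "Prescription", 1, {"PatientName": "John Doe"}),
]
current = latest_versions(fragments)
```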

/ 20 Deployment

[Figure: deployment. Labels: Patient; Hospital A; Hospital B; Hospital C; EHR A; EHR B; EHR X; OpenEMR; OpenEMR Data (12 pts.); CardioSilo EMR; CardioSilo Data (40 pts.); L-Store; UHP; HDFS; Semantic Query Interface; Demographic Report; Cardio pt. Report; Med. Problem Report; Prescription Report]

UHPr Storage Form, per Doc: Raw Data, Identifier (i_m), Type (τ), Version (v_m)

290,101 patient records

8,202,040 Medical Fragments

The UHP Hadoop deployment consists of 1 master and 2 slave nodes, with 1.8 TB of HDFS storage, a 20 MB block size, a block replication factor of 3, 64 GB of RAM on the master, and 32 GB on the slaves.
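To make the block size and replication settings concrete, the sketch below computes how many HDFS blocks and block replicas a file of a given size occupies under a 20 MB block size and a replication factor of 3. The 50 MB file size is a made-up example; the slides do not report individual file sizes.

```python
import math

BLOCK_SIZE_MB = 20  # block size stated for the deployment
REPLICATION = 3     # block replication stated for the deployment


def hdfs_footprint(file_size_mb):
    """Return (blocks, block replicas, raw MB stored) for one file."""
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    replicas = blocks * REPLICATION
    # HDFS stores only the actual bytes of a partially filled last block,
    # so raw usage is the file size times the replication factor.
    raw_mb = file_size_mb * REPLICATION
    return blocks, replicas, raw_mb


blocks, replicas, raw_mb = hdfs_footprint(50)  # hypothetical 50 MB file
```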

/ 21 Case Study: Patient Integrated Record

Patient: Name: Harry Potter, Date of Birth: 1988-07-08

Sources:

• OpenEMR (Hospital A), reports: 1. Demographics, 2. Medical Problems, 3. Prescription

• IMP Cardiovascular Medical Silo (Hospital B)

UHPr storage and processing: medical fragments from OpenEMR and the IMP Cardiovascular Medical Silo

/ 22 Experimental Setup: Ubiquitous Health Profile

Dataset

• OpenEMR

• IMP CardioVascular Medical Silo

Evaluation Metrics: Timeliness, Scalability, Accuracy

Timeliness Evaluation Criteria

Id   DESCRIPTION
C1   Time taken to insert the UHPr medical fragment file into HDFS
C2   Time taken to insert medical fragment bridging information, linking gid (i_UHPr) with fragment id (i_f), into HDFS
C3   Time taken to insert the UHPr patient index part of L-Store into HDFS
C4   Time taken to create the UHPr table schema in Hive
C5   Time taken to create the medical fragment bridging table schema in Hive
C6   Time taken to create the UHPr patient index table schema in Hive
C7   Time taken to retrieve all fragment ids for 1 user
C8   Time taken to retrieve all medical fragments for 1 user

Scalability Evaluation Criteria: Vertical Scaling, Horizontal Scaling

Iterations

Iteration   NEW MEDICAL FRAGMENTS
1           2,000
2           200,000
3           800,000
4           2,400,000
5           2,400,000
6           40

/ 23 Evaluation: Ubiquitous Health Profile

Iteration   New Patients (P)   New Medical Records (MR)
0           80,000             2,400,000
1           100                2,000
2           10,000             200,000
3           40,000             800,000
4           80,000             2,400,000
5           80,000             2,400,000
6           1                  40

Timeliness: medical fragments are archived and retrieved quickly, and archival and retrieval times grow more slowly than the data volume.

Retrieval time in seconds for 10 attempts per iteration (mean in parentheses)

Iteration 1
Attempt          1       2       3       4       5       6       7       8       9       10
C7 (28.8528s)    27.782  27.991  28.849  29.663  28.719  28.983  29.233  29.022  29.272  29.014
C8 (119.1014s)   121.43  119.56  117.02  117.93  118.11  117.48  118.31  119.2   122.38  119.6

Iteration 2
Attempt          1       2       3       4       5       6       7       8       9       10
C7 (28.4869s)    27.429  29.051  28.631  29.497  29.172  28.921  27.614  27.869  28.622  28.063
C8 (121.4805s)   121.46  119.61  121.52  122.94  120.78  122.59  120.34  121.21  121.49  122.87

Iteration 3
Attempt          1       2       3       4       5       6       7       8       9       10
C7 (30.9533s)    30.604  30.703  30.829  30.488  30.942  30.579  31.44   30.556  31.451  31.941
C8 (128.011s)    127.44  127.88  127.43  128.53  130.18  126.62  128.00  128.18  126.37  129.42

Iteration 4
Attempt          1       2       3       4       5       6       7       8       9       10
C7 (33.0076s)    34.826  32.043  31.756  33.186  32.481  34.64   32.837  33.638  31.621  33.048
C8 (139.1931s)   138.69  136.78  136.98  140.24  140.2   142.1   139.3   138.98  141.56  137.11

Iteration 5
Attempt          1       2       3       4       5       6       7       8       9       10
C7 (33.7804s)    32.49   33.833  33.3    33.459  34.75   33.572  33.559  33.456  33.889  35.496
C8 (148.0349s)   150.6   147.31  147.91  146.57  147.3   147.79  151.72  150.18  144.95  146.02

Iteration 6a
Attempt          1       2       3       4       5       6       7       8       9       10
C7 (124.8474s)   125.88  120.34  125.38  124.86  122.9   121.53  131.01  130.17  122.92  123.5
C8 (194.5284s)   190.73  193.31  194.34  192.22  196.59  194.55  195.93  196.11  196.96  194.54

Iteration 6b
Attempt          1       2       3       4       5       6       7       8       9       10
C7 (57.4094s)    57.537  57.552  55.975  57.001  57.738  56.93   58.232  58.526  57.845  56.758
C8 (104.7012s)   102.86  103.67  103.19  103.24  106.85  109.89  103.98  101.98  103.52  107.83
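The value in parentheses next to each criterion appears to be the arithmetic mean of the ten attempts. A short Python check using the Iteration 1 values for C7 reproduces the reported 28.8528 s; the variable names are illustrative only.

```python
from statistics import fmean

# C7 retrieval times (seconds) for the ten attempts of Iteration 1.
c7_iteration_1 = [
    27.782, 27.991, 28.849, 29.663, 28.719,
    28.983, 29.233, 29.022, 29.272, 29.014,
]

mean_c7 = fmean(c7_iteration_1)                      # arithmetic mean
spread_c7 = max(c7_iteration_1) - min(c7_iteration_1)  # slowest minus fastest

print(f"mean = {mean_c7:.4f} s, spread = {spread_c7:.3f} s")
```

The C8 rows are printed with only two decimals, so recomputing their means from the listed attempts can differ from the reported value in the third decimal place.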




Timeliness of recording medical fragments (time in seconds)

Medical fragments per iteration   2,000   200,000   800,000   2,400,000   2,400,000   40
C1                                1.863   3.553     8.396     21.237      21.378      1.899
C2                                1.915   1.954     2.169     2.304       2.317       1.992
C3                                1.96    7.792     25.595    69.648      69.559      1.891
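One way to read the recording times is to compare how much the insertion time grows against how much the number of inserted fragments grows between the smallest and largest batches. The sketch below does this for iterations 1 to 4 using the reported C1, C2 and C3 values; the growth-factor comparison is an illustrative reading, not a metric defined in the slides.

```python
# New medical fragments and insertion times (seconds) for iterations 1 to 4.
fragments = [2_000, 200_000, 800_000, 2_400_000]
timings = {
    "C1": [1.863, 3.553, 8.396, 21.237],
    "C2": [1.915, 1.954, 2.169, 2.304],
    "C3": [1.96, 7.792, 25.595, 69.648],
}


def growth_factor(values):
    """Ratio of the last value to the first value."""
    return values[-1] / values[0]


data_growth = growth_factor(fragments)
time_growth = {name: growth_factor(values) for name, values in timings.items()}

for name, factor in time_growth.items():
    print(f"{name}: time grew {factor:.1f}x while data grew {data_growth:.0f}x")
```

Each criterion's time grows by a much smaller factor than the batch size, which is consistent with the timeliness claim. Part of this is fixed overhead, since even the smallest batch takes roughly two seconds.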




Timeliness of record retrieval from HDFS using Hive (mean time in seconds)

Iteration   C7         C8
1           28.8528    119.1014
2           28.4869    121.4805
3           30.9533    128.011
4           33.0076    139.1931
5           33.7804    148.0349
6a          124.8474   194.5284
6b          57.4094    104.7012




• Scalability: medical fragments are archived and retrieved quickly, and archival and retrieval times grow more slowly than the data volume.

• Accuracy: each medical fragment is retrieved accurately.

Scalability in UHPr

Iteration                 1         2         3         4         5         6
New Medical Fragments     2000      200000    800000    2400000   2400000   40
Total Medical Fragments   2402000   2602000   3402000   5802000   8202000   8202040

[Figure: new and total medical fragments per iteration. Axes: Log10 scale of medical fragments; Number of medical fragments]

Accuracy: 100%
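The totals per iteration are running sums of the new fragments on top of the 2,400,000 medical records loaded in Iteration 0. A minimal Python sketch of that bookkeeping, with illustrative variable names:

```python
from itertools import accumulate

baseline = 2_400_000  # medical records loaded in Iteration 0
new_fragments = [2_000, 200_000, 800_000, 2_400_000, 2_400_000, 40]

# Running total after each of iterations 1 to 6; the leading element returned
# by accumulate is the baseline itself, so it is dropped.
totals = list(accumulate(new_fragments, initial=baseline))[1:]

for iteration, total in enumerate(totals, start=1):
    print(f"Iteration {iteration}: {total:,} total medical fragments")
```

The final total matches the 8,202,040 medical fragments reported for the deployment.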


/ 27 Conclusion

The essence of an interoperable system lies in the availability of medical data

Big data supports data interoperability by integrating data from multiple sources

UHP uses state-of-the-art technologies to build medical profiles for patients

Real-time data will bring real-time challenges for building an effective interoperable system

/ 28 Future Direction

http://blog.timicoin.io/blockchain-and-tokenization-make-ehr-interoperability-irrelevant-and-more-importantly-create-a-marketplace-for-healthcare-innovation/

[Figure: UHP architecture. Labels: Query Interface 1; Query Interface 2; Semantic Query Controller; Semantic Maps; Semantic Reconciliation 1 … Semantic Reconciliation n; Mediation based semantic reconciliation-on-read; Expert driven Semantic Verification; Semantic Blocks; Medical Expert; Big Data Store]

/ 29 References

1. P. Xu, Y. Wang, and B. Liu, “A differentor-based adaptive ontology-matching approach,” J. Inf. Sci., vol. 38, no. 5, pp. 459–475, 2012.

2. J. Xie, F. Liu, and S. U. Guan, “Tree-structure Based Ontology Integration,” J. Inf. Sci., vol. 37, no. 6, pp. 594–613, 2011.

3. O. Zamazal and V. Svátek, “PatOMat-Versatile Framework for Pattern-Based Ontology Transformation,” Comput. Informatics, vol. 34, no. 2, pp. 305–336, 2015.

4. J. A. Maldonado, D. Moner, D. Boscá, J. T. Fernández-Breis, C. Angulo, and M. Robles, “LinkEHR-Ed: A multi-reference model archetype editor based on formal semantics,” Int. J. Med. Inform., vol. 78, no. 8, pp. 559–570, 2009.

5. C. Martínez Costa, M. Menárguez-Tortosa, and J. T. Fernández-Breis, “Clinical data interoperability based on archetype transformation,” J. Biomed. Inform., vol. 44, no. 5, pp. 869–880, 2011.

6. M. Marcos, J. A. Maldonado, B. Martínez-Salvador, D. Boscá, and M. Robles, “Interoperability of clinical decision-support systems and electronic health records using archetypes: A case study in clinical trial eligibility,” J. Biomed. Inform., vol. 46, no. 4, pp. 676–689, 2013.

7. J. A. Maldonado et al., “Using the ResearchEHR platform to facilitate the practical application of the EHR standards,” J. Biomed. Inform., vol. 45, no. 4, pp. 746–762, 2012.

8. H. Zhang et al., “An ontology-guided semantic data integration framework to support integrative data analysis of cancer survival,” BMC Med. Inform. Decis. Mak., vol. 18, no. Suppl 2, 2018.

9. A. Ryan and P. Eklund, “The health service bus: An architecture and case study in achieving interoperability in healthcare,” Stud. Health Technol. Inform., vol. 160, no. PART 1, pp. 922–926, 2010.

10. D. T. Meridou, C. Z. Patrikakis, A. P. Kapsalis, I. S. Venieris, P. Kasnesis, and D.-T. I. Kaklamani, “An event-driven health service bus,” MOBIHEALTH 2015 - 5th EAI Int. Conf. Wirel. Mob. Commun. Healthc. - Transform. Healthc. through Innov. Mob. Wirel. Technol., 2015.

11. S. P. Gardner, “Ontologies and semantic data integration,” Drug Discov. Today, vol. 10, no. 14, pp. 1001–1007, 2005.

12. H. Hemingway et al., “Big data from electronic health records for early and late translational cardiovascular research: Challenges and potential,” Eur. Heart J., vol. 39, no. 16, pp. 1481–1495, 2018.

13. Y. Katsis et al., “Big Data Techniques for Public Health: A Case Study,” 2017 IEEE/ACM Int. Conf. Connect. Heal. Appl. Syst. Eng. Technol., pp. 222–231, 2017.
