28
2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of the technical staff [email protected]

2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

Embed Size (px)

Citation preview

Page 1: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved.

Integrating XML with legacy relational data

for publishing on handheld devicesDavid A. LeeSenior member of the technical [email protected]

Page 2: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 2

Importance of correct Clinical Information

Page 3: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 3

Introduction and Agenda

• Introduction

• Background “The Problem Space”

• False Starts

• “Common Object Model”

• Final Design

• Lessons Learned

• Conclusion

Page 4: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 4

Introduction

• Who is Epocrates ?

• Common Terminology

• Core Application

Page 5: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 5

IntroductionWho is Epocrates ?

• Epocrates is the industry leader in providing clinical references on handheld devices.

• 475,000 active subscribers

• Subscription based clinical publishing

Page 6: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 6

IntroductionCommon Terminology

• PDA - "Personal Digital Assistant".

• Monograph - information describing a single drug, disease, lab test, preparation or other clinical entity.

• Syncing - The process of synchronizing a server's database with a PDA

• PDB - "Palm Database". A very simple variable length record format with a single 16 bit key index.

Page 7: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 7

IntroductionCore Application

“Essentials”

Handheld Clinical Reference

Page 8: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 8

Background “The Problem Space”

• XML vs “Legacy” data

• Characteristics of the application

• Characteristics of "legacy" data

• Characteristics of "new" XML data

• Publishing Workflow

• Workflow Requirements

Page 9: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 9

BackgroundXML vs “Legacy” data

XML “Legacy” data

Page 10: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 10

BackgroundCharacteristics of the Application

• Runs on handheld devices

• Limited memory capability (8MB typical)

• Simple database

• Small display

• Synchronization speed critical

• Linking of related content

Page 11: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 11

BackgroundCharacteristics of “legacy” data

• Stored in Oracle SQL database

• Highly structured and referential

• Hard to change schema

• Data constantly changing (manual editing)

• Specifically designed schemas and content for presentation on PDAs.

• Difficult to change workflow, representation or tools

• Part of a complex workflow for publishing data to large subscriber base

Page 12: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 12

BackgroundCharacteristics of “new” XML data

• One large XML Document containing many “monographs”. 15-150MB common.

• Periodic updates with unknown amount of change.

• Schema likely to change unexpectedly

• No control over content

• Referential data within and across monographs.

• Both structural and “markup” type elements

Page 13: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 13

BackgroundPublishing workflow

Wireless

OracleSQL Database

Clinical SpecialistExternalData

WeeklyPublishingProcess

Desktop

Data Center“Sync Servers”

Desktop

PDA

PDA

PDA

Clinical Specialist

Incremental Data“Delta”

Record Images

Complete Data

Record Images

Page 14: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 14

BackgroundWorkflow Requirements

• Integrates into current workflow with minimum changes to existing process and data structures.

• Reliable change detection

• Support for deferred detection of dependancies

• Accurately manage changes when data is updated

• Resilient to XML schema changes

• Extensible design

Page 15: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 15

False Starts

• One Big BLOB

• Full Normalization

• One BLOB per monograph

• XML Database

Page 16: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 16

False StartsOne Big BLOB

Store entire XML as a single “BLOB”

Pros

• Very simple and easy

Cons

• Deferred almost all processing to sync server

• Impossible to detect changes

• Solves no significant problem over using the filesystem

• No relational representation

• Difficult to search with SQL

• No structure or indexing at SQL level

Page 17: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 17

False StartsFull Normalization

Fully normalize XML schema into separate DB tables for every element.

Pros• “Ideal” relational representation, referential integrity• Fine granularity of modification detection

Cons• Very large number of tables (> 150)• Difficult to implement• Bad performance

Page 18: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 18

False StartsOne BLOB per Monograph

Split each monograph into an XML document fragment and store as a BLOB.

Pros• Fairly simple to implement• Granularity maps well to device DB structure

Cons• Difficult to search via SQL• Referential data not exposed at SQL level• Significant processing deferred to sync server

Page 19: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 19

False StartsXML Database

Use a native XML Database

Pros

• Efficient and architecturally clean XML storage

Cons

• NO in-house experience

• Difficult to integrate with existing tools

• “Locked In” to DB provider

• XML largely processed in-memory

Page 20: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 20

Common Object Model

Abstract design pattern for modeling content with mappings to concrete representations.

• Document Structure

• XML Mapping

• Database Mapping

• Application Mapping

Page 21: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 21

Common Object ModelDocument Structure

Indexing

ClassIndex

by type

Sub Classes Sub Classes

TopicsIndex

by type

Topics

External Indexing

ICD9 Codes

Drug ID/Name

????

MonographTitle

Sub Title

Section

Sub Section

Sub Section

Section

Sub Section

Sub Section

Page 22: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 22

Common Object ModelXML Mapping

MonographTitle

Sub Title

Section

Sub Section

Sub Section

Section

Sub Section

Sub Section

<monograph>

<section>

<sub_section>

Monograph Text

</sub_section>

<sub_section> ... </sub_section>

</section>

<section> ... </section>

<indexes>

Indexing Data

</indexes>

</monograph>

Page 23: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 23

Common Object ModelDatabase Mapping

MONOGRAPH

TOPIC_INDEX

CLASSES

MONO_PDB

CLASS_TOPICS

Page 24: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 24

Common Object ModelApplication Mapping

MonographTitle

Sub Title

Section

Sub Section

Sub Section

Section

Sub Section

Sub Section

Page 25: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 25

Final Design

Final Design comprises a model based on the Common Object model as a Design Pattern.

• One BLOB per Monograph– XML Document Fragment

• Normalized referential data– Key fields as table fields

• Separate “compiled” BLOB per monograph

Page 26: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 26

Final SolutionModified Workflow Processes

Wireless

OracleSQL Database

Clinical SpecialistExternalData

WeeklyPublishingProcess

Desktop

Data Center“Sync Servers”

Desktop

PDA

PDA

PDA

Clinical Specialist

Incremental Data“Delta”

Record Images

Complete Data

Record Images

Data Loader

XML Data

“PDB”Generation

Page 27: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 27

Lessons Learned

• Split up large XML files

• Don’t assume “All or Nothing”

• Process XML with a programming language

• Look for the distinction between “Structure” and “Markup”

• The ‘Real World’ is a compromise.

Page 28: 2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of

2005 Epocrates, Inc. All rights reserved. Slide | 28

Conclusion