42
Moving Towards A Data Repository That Facilitates Data Analysis CHOP November 18, 2009 1

Moving Towards A Data Repository That Facilitates Data Analysis CHOP November 18, 2009 1

Embed Size (px)

Citation preview

Moving Towards A Data Repository That Facilitates Data Analysis

CHOPNovember 18, 2009

1

Relational Database Design

2

Normalization

• Normalization - process of efficiently organizing data in a database to reduce redundancies of data

• Goal - consistency of data – Store data once and one time only!– security – disk space– speed of queries– efficiency of database updates– data integrity

In normalized database no aggregation and no calculated fields

3

Data Anomolies

4

Unnormalized data set

Patient ID

Name Address DOB Doc ApptDate

Location DX

111111 Cindy Marselis

2320 Edge Hill Road

1/11/64 Armstrong 9/1/09 11:00 AM

Alter 2011 Herniated DiscFlu

111111 Cindy Marselis

9331 Rising Sun Avenue

1/11/64 Morningstar 9/1/09 11:00 AM

Alter 2011 Herniated Disc

111111 Cindy Marselis

2320 Edge Hill Road

1/11/64 Allen 11/1/09 10:00 AM

Alter 2012 Psoriasis

222222 Kathryn Marselis

2320 Edge Hill Road

11/3/04 Dershaw 8/1/09 11:00 AM

Speakman 105

Well baby check

111111 Cindy Schwartz

9331 Rising Sun Avenue

1/11/64 Armstrong 8/11/09 3:00 PM

Alter 105 PsoriasisHerniated Disc

5

Normalized db - before

6

Normalized db - after

7

Example of Appointment Entity Relationship Diagram

8

Structured, free text, unstructured text

9

Free text

• Issues with string searches– Must match exactly in

case, punctuation, spelling, etc.

• Use of lookup tables where possible

10

Unstructured Text

• Gartner: white-collar workers spend from 30 to 40% of time managing documents

• Merrill Lynch: > 85 % of business information exists as unstructured data

– e-mails, memos, notes from call centers and support operations, news, user groups, chats, reports, letters, surveys, white papers, marketing material, research, presentations and Web pages.

• In relational db, data that can't be stored in rows and columns.

– stored in a BLOB (binary large object)

– e-mail files, word-processing text documents, PowerPoint presentations, JPEG and GIF image files, and MPEG video file

• Metadata (data about data can be stored)http://www.information-management.com/issues/20030201/6287-1.html 11

Approaches to structured and unstructured data

1. Unique database: consolidates all structured and unstructured data together

– expensive to buy and maintain

– large volume of data can clog the database making it slow and inefficient

2. Use two databases: one structured data, and one for unstructured data.

– Avoids performance issues with structured data

– significant performance limitations for unstructured data

12

Approaches to structured and unstructured data

3. Unstructured data left on file servers with database to record and links to unstructured data files.

– Avoids issue with volumes of data

– Fragile as links are broken when files and folders moved around.

– Must create links every time new document created

4. Complex and expensive connectors used to tap in all databases and file servers providing unified view of data.

– Expensive and complex requiring purchase and maintenance of multiple databases and file servers with the added cost of all required connectors.

5. Patents currently under development.

13

Certification Commission for Health Information Technology (CCHIT)

EHR Construct

EMAR: Electronic Medication Administration RecordCPOE: Computerized Physician Order Entry PFS: Physician Fee ScheduleOC/RR: Physician Order Communication/Results RetrievalCPOE: Computerized Physician Order Entry PFS: Physician Fee ScheduleR-ADT: Registration Admission Discharge Transfer

14

Data Warehousing

15

16

• External and internal forces require tactical and strategic decisions

• Search for competitive advantage

• Business environments are dynamic

• Decision-making cycle time is reduced

Pressures Driving Need for Business Intelligence and Data Warehousing

17

• Operational data – Relational, normalized

database – Optimized to support

transactions – Real time updates

Operational vs. Decision Support Data

• DSS – Snapshot of operational

data– Summarized – Large amounts of data

• Data analyst viewpoint– Timespan– Granularity– Dimensionality

18

Creating a Data Warehouse

19

• Separated from operational environment• Integrated Data• Historical data over long time horizon• Snapshot data captured at given time• Subject-oriented data• Mainly read-only data with periodic batch updates from

operational source, no online updates

Codd’s Key Data Warehouse Rules

20

• Contains different levels of data detail – Current and old detail– Lightly and highly summarized

• Metadata (data about the data) critical components – Identify and define data elements– Provide the source, transformation,

integration, storage, usage, relationships, and history of data elements

Codd’s Key Data Warehouse Rules contd.

Decision Support Systems DSS

21

22

DSS Components

Decomposition of DSS –Operational Data

o Tumor registryo A/D/To Radiology narrativeo Pathology narrativeo Lab resultso Patient Accountingo Charge Master

23

Decomposition of DSS –External Data

o Research spidero Treatment guidelineso Reimbursement scheduleso NCI/NIH protocols

24

Decomposition of DSS –ETL

• Rationalize normal lab values• Transform gender codes and free

text• Narrative dumps• Doctor cleansing

o Similar nameso Which practice gets credit?

25

26

ETL – Extraction, Transformation, Load

Transform: cleanse data for consistency and output exceptionso Apply business ruleso Selecting certain columns to load

(not null records)o Translating coded values (1, M, male

= 0 )o Derive new calculated value

(sale_amount = qty * unit_price)o Join data from multiple sources

(lookup, merge)o Aggregate (rollup/summarize data –

average LOS for each doctor by DRG)

o Transpose/pivot (turning columns into rows)

o Data validation.

Extractdata from

source systems

Load:data into

repository

ETL Best Practice

27

• ETL: 60-80% of development effort• Create multi-departmental team charged with

consensus on Transformation!• Review exceptions carefully

o Indicator of issues with operational db designo Indicator of changes needed in transformation

Decomposition of DSS –Business Data

• Business data – central repository

• Includes metadata: source, format, timing of feeds

Characteristic Factors

Integrated • Centralized• Holds data retrieved from

entire organization

Subject-Oriented

• Optimized to give answers to diverse questions

• Used by all functional areas

Time Variant • Flow of data through time• Projected data

Non-Volatile • Data never removed• Always growing

28

Decomposition of DSS –Business Model Data

• Comprehensive Cancer Center definition of a patient

o Must have seen a physician for suspected or confirmed benign or malignant condition

o What about patients seen for screening mammography?

29

Decomposition of DSS –End user query tool

• Web-based or client-server?• OLAP – Online Analytic Processing

o Microsofto Business Objects (bought by

SAP)o MicroStrategyo Cognos (bought by IBM)o Oracle (includes Hyperion)

30

Decomposition of DSS –End –user tool

o Drill down functionalityo Roll upo Charts – not data levelo Export features

http://demos.telerik.com/aspnet-ajax/chart/examples/functionality/drilldown/defaultcs.aspx

31

Design – Star Schema

32

Star Schema

• Center fact tableo usually contains numeric information for summary reports.

• Dimension table radiate from fact table• Dimension table is hierarchial

• ‘rollup’ allows to compare types of hospitals, disease categories, or even patient age bands.

• Creates logical data cube dimensions identifying a set of numeric measurements within the cube.

33

34

• Data-modeling technique • Maps multidimensional decision support into relational

database• Yield model for multidimensional data analysis while

preserving relational structure of operational DB• Four Components:

– Facts– Dimensions– Attributes– Attribute hierarchies

Star Schema

35

Simple Star Schema

Star Schema

36

Entity Relationship Diagram

37

Analysis

38

39

• Advanced data analysis environment• Supports decision making, business modeling, and

operations research activities • Characteristics of OLAP

– Use multidimensional data analysis techniques– Provide advanced database support– Provide easy-to-use end-user interfaces– Support client/server architecture

Online Analytical Processing (OLAP)

Healthcare Cube – slice and dice view

Diagnosis

Tim

e

Physician

40

Dashboard

41

Scorecard including Key Performace Indicators (KPI)

• Risk-adjusted mortality index

• Risk-adjusted complications index

• Risk-adjusted patient safety index

• Severity-adjusted average length of stay

• Expense per adjusted discharge, case mix- and wage-adjusted

42