I2b2-1 CSE 5095 Topics in Biomedical Informatics Antonio Cusano Computer Science & Engineering Department The University of Connecticut 371 Fairfield Road,

i2b2-1

CSE5095

Topics in Biomedical Informatics

Antonio Cusano
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255
Storrs, CT 06269-2155

[email protected] 2011

Informatics for Integrating Biology and the Bedside (i2b2)
http://www.i2b2.org

i2b2-2

Overview

Introduction to i2b2
Modeling the i2b2 Data Model
Overview of the i2b2 Software Tools
Using the i2b2 Software
Overview of the i2b2 Hive Cells
Example Use Case Scenario
Notable Projects & Usage in BMI
Evaluating i2b2
Summary

i2b2-3

Background & Motivation

The rise of Electronic Medical Record Systems (EMRS) holds great promise for clinical research
  Increasingly important for integration between medical record data and clinical research data
But many challenges exist:
  EMRS are typically built with the “single patient” in mind
    It would be difficult to observe trends in data across combinations of many patients
  How do we “clean” EMR data at a global, enterprise level without compromising the data?
    Removal of some data by person X could be a devastating loss to person Y
  How do we maintain patient privacy?

i2b2-4

Background & Motivation

What do we need?
  A system that supports queries that cut across multiple patients
    More dependent on standard descriptors
  A system that can process and understand complex queries and specifications
  A system that can integrate medical record data and clinical research data
    Provide a robust data model
  A system that protects the privacy of the patients
Solution?

i2b2-5

Introducing i2b2

Informatics for Integrating Biology and the Bedside
  One of seven NIH Roadmap National Centers for Biomedical Computing (http://www.ncbcs.org)
    Funded under the NIH Common Fund
    Part of the networked national effort to build the infrastructure for biomedical computing in the nation
  Established in 2004
  Based at Partners HealthCare in Boston, Massachusetts
    Non-profit, integrated health system founded by Brigham and Women’s Hospital and Massachusetts General Hospital
  Principal Investigator:
    Isaac Kohane, M.D., Ph.D., Professor of Pediatrics at Harvard Medical School

i2b2-6

Mission Statement

Overcome two major obstacles:
  The computational challenges of discovery across large, heterogeneous data sets routinely obtained in clinical care
  The lack of knowledge of genomic-level physiology and how to study it
Therefore, the goals of i2b2 are:
  To provide clinical researchers with the software tools necessary to collect and integrate medical record and clinical research data in the genomics age
    By creating a software suite that constructs and manages the modern clinical research chart

i2b2-7

i2b2 Software Tools

The i2b2 Hive, ver. 1.5.2
The Clinical Research Chart

i2b2-8

i2b2 Software – Design Objectives

Design focused around several goals:
  Provide a secure presentation of patient information for research purposes
  Provide a software framework that can be easily extended
  Provide secure communication capabilities for said software framework
  Provide a flexible data model tuned to the needs of patient-specific information
    Requiring timely and scalable query performance
    Adaptable to new and unanticipated representations of health care information

i2b2-9

Identifying the Data Model Requirements

Developers identified these key requirements for constructing a data model for i2b2
  Integration of data from distributed and differently structured databases
    In order to perform comprehensive and integrative analyses
  Separation of data used for research from daily operational or transactional data
    Eliminate any performance implications and maintain integrity
  Standardization of a model across systems
    Ensure all i2b2 systems possess the same data model to enable data sharing
  Ease of use by end-users

i2b2-10

Dimensional Modeling

Model the database using two concepts:
  Facts
    The quantitative or factual data being queried
  Dimensions
    Descriptions of the various facts

i2b2-11

Star Schema

Possesses a central “fact” table where each row represents a single fact
  A fact is an observation of a patient
    Diagnoses, Procedures, Genetic Data, Lab Data, Health History, Demographics Data, etc.
  An observation is not the same thing as an event
    Observations are recorded by a specific observer within a specific time range regarding a specific concept
Fact table is surrounded by numerous dimension tables
  Four dimension tables
    Concept, Provider, Visit, Patient
  Contains descriptors that characterize the facts
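A minimal sketch of the star schema described on this slide, using Python's built-in sqlite3 module. The table name observation_fact appears later in these slides; every other table name, column name, and data value below is an assumption chosen for illustration and is not taken from the actual i2b2 schema.

```python
import sqlite3

# In-memory database; all names except observation_fact are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient_dim  (patient_id  INTEGER PRIMARY KEY, sex TEXT);
CREATE TABLE visit_dim    (visit_id    INTEGER PRIMARY KEY, patient_id INTEGER, start_date TEXT);
CREATE TABLE provider_dim (provider_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE concept_dim  (concept_cd  TEXT PRIMARY KEY, name TEXT);

-- Central fact table: one row per observation of a patient.
CREATE TABLE observation_fact (
    patient_id  INTEGER REFERENCES patient_dim(patient_id),
    visit_id    INTEGER REFERENCES visit_dim(visit_id),
    provider_id TEXT    REFERENCES provider_dim(provider_id),
    concept_cd  TEXT    REFERENCES concept_dim(concept_cd),
    start_date  TEXT,
    value       TEXT
);
""")

# Two patients, one concept, and one observation per patient.
conn.execute("INSERT INTO patient_dim VALUES (1, 'F')")
conn.execute("INSERT INTO patient_dim VALUES (2, 'M')")
conn.execute("INSERT INTO concept_dim VALUES ('DX:ASTHMA', 'Asthma')")
conn.execute("INSERT INTO observation_fact VALUES (1, 10, 'P1', 'DX:ASTHMA', '2011-01-05', NULL)")
conn.execute("INSERT INTO observation_fact VALUES (2, 11, 'P1', 'DX:ASTHMA', '2011-02-09', NULL)")

# A query that cuts across patients: distinct patients per concept.
rows = conn.execute("""
    SELECT c.name, COUNT(DISTINCT f.patient_id)
    FROM observation_fact AS f
    JOIN concept_dim AS c ON c.concept_cd = f.concept_cd
    GROUP BY c.name
""").fetchall()
print(rows)
```

The fact table carries only keys and the observed value; the descriptive text lives in the dimension tables and is joined in at query time.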

i2b2-12

Star Schema

i2b2-13

Star Schema Performance

Enterprise repositories and project-specific, local repositories can contain very large amounts of data
  The size of the central fact table can grow to be very large as a result, impacting performance
It is critical to have indexes on that table to maintain stable performance
  Use system-specific enhancements when possible
    SQL Server databases can add a clustered index to any table so that its rows are stored in sorted order
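The indexing advice can be tried out in a small sketch. SQLite (via Python's sqlite3 module) stands in here for the production database, so this shows an ordinary secondary index rather than a SQL Server clustered index. The index name, column list, and data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation_fact (patient_id INTEGER, concept_cd TEXT, start_date TEXT)"
)
# Hypothetical index: supports finding all facts recorded for one concept.
conn.execute(
    "CREATE INDEX idx_fact_concept ON observation_fact (concept_cd, patient_id)"
)
conn.executemany(
    "INSERT INTO observation_fact VALUES (?, ?, ?)",
    [(i, "DX:ASTHMA" if i % 2 else "DX:COPD", "2011-01-01") for i in range(1000)],
)

# List the indexes SQLite now maintains on the fact table.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('observation_fact')")]
print(index_names)

# Ask the query planner how it would run a lookup by concept code.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT patient_id FROM observation_fact WHERE concept_cd = ?",
    ("DX:ASTHMA",),
).fetchall()
print(plan)

asthma_count = conn.execute(
    "SELECT COUNT(*) FROM observation_fact WHERE concept_cd = ?", ("DX:ASTHMA",)
).fetchone()[0]
```

Comparing the printed plan with and without the CREATE INDEX statement shows whether the planner scans the whole fact table or searches the index.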

i2b2-14

i2b2 Software – Purpose

Serves two primary use cases:
  Enable enterprise-wide repurposing and distribution of medical record data for research
    Enable high performance collection of medical record data for querying and distribution
    Enable discovery within data on a wide scale
  Enable usage of medical record data in clinical studies
How do we achieve these use cases?
  Use the i2b2 Software Tools!
    The i2b2 Hive
    The Clinical Research Chart
      – A core component of the i2b2 Hive

i2b2-15

What is the i2b2 Hive?

A collection of interoperable services provided by i2b2 cells
  Each cell behaves as a functional service
Cells are loosely coupled (independence)
Cells do not know their relative locality (proximity)
Cells are connected and communicate with each other using web services
  Can be invoked manually by the user
  Can be invoked automatically by the system workflow
What do we notice?
  Highly modular architecture
  Highly scalable

i2b2-16

What are i2b2 Cells?

The i2b2 cell is the basic building block of the i2b2 environment
  An application “wrapped” into a functional unit
Encapsulates business logic as well as access to data objects behind standard web service interfaces
  Supported services include REST, SOAP
  Communication using XML messages

Diagram: Business Logic, Data Access, and Data Objects sit behind the i2b2 web service interfaces (REST, SOAP; XML over HTTP)

i2b2-17

Structure of the XML Message

XML schema that defines:
  A header for communication management
  A header for the message request/response
  A message body that contains the data
    For example, can contain patient sets with their:
      – Phenotypic (Clinical) and Genotypic Data
      – References to other data objects (images, attachments)
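The three-part message layout can be sketched with Python's standard xml.etree.ElementTree module. The element name message_body appears elsewhere in these slides; all other element names, attribute names, and values below are assumptions made for illustration and do not reproduce the real i2b2 message schema.

```python
import xml.etree.ElementTree as ET


def build_message(sender: str, request_type: str, patient_ids: list[str]) -> ET.Element:
    """Assemble a message with two headers and a body (element names are illustrative)."""
    root = ET.Element("message")

    # Header for communication management (who is sending).
    header = ET.SubElement(root, "message_header")
    ET.SubElement(header, "sending_application").text = sender

    # Header describing the request itself.
    request = ET.SubElement(root, "request_header")
    ET.SubElement(request, "request_type").text = request_type

    # Body that carries the data, here a small patient set.
    body = ET.SubElement(root, "message_body")
    patient_set = ET.SubElement(body, "patient_set")
    for pid in patient_ids:
        ET.SubElement(patient_set, "patient", {"id": pid})
    return root


msg = build_message("workbench", "get_patient_set", ["1001", "1002"])
xml_text = ET.tostring(msg, encoding="unicode")
print(xml_text)
```

Keeping the headers separate from the body lets a receiving cell route or reject a message without having to parse the data it carries.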

i2b2-18

Example XML Message Header

i2b2-19

Example XML Message Body

i2b2-20

Advantages of Web Services

Because all communication is in XML…
  Not limited to any single operating system
  Not limited to any single programming language
Cells can be developed in Microsoft .NET, Perl, Python, Java, etc.
  Any language that supports REST or SOAP capability can be used
Cells can exist on Windows, Linux, and Mac OS and communicate with each other
  e.g. cells residing on a Windows platform can talk with those on a UNIX platform
No restriction on how simple or complex a cell can be
XML tags the data
REST/SOAP transfers the data

i2b2-21

But Where’s the User Interface?

Web services do not provide a visual user interface
The developer is required to build a client component
  Must include a Graphical User Interface (GUI) and Control Mechanism for user interaction
Some considerations:
  Should utilize the web service interfaces for communication, rather than a home-brew approach
  Must ensure cell-to-cell communication is maintained
    Reuse the functionality of existing cells

i2b2-22

How are Cells Classified?

The i2b2 Hive is composed of a number of cells with varying importance and functionality
  Core cells are essential for operation of the Hive
    Provide basic services
    Written in Java using J2EE specifications
    Front-end clients written using the Standard Widget Toolkit (SWT)
      – Provides native OS look-and-feel for the user interfaces
  Optional and Plug-in type cells add functionality to the Hive but are not essential
Special Hive Cells:
  The Clinical Research Chart
  The i2b2 Web Client
  The i2b2 Workbench Application

i2b2-23

The Clinical Research Chart

The Clinical Research Chart (CRC) is the implementation of the Star Schema in i2b2
  Functions as the integrated data repository for the i2b2 Hive
  Core cell of the i2b2 Hive (Data Repository Cell)
    Requires all core cells to gain complete functionality
      – In fact, the main purpose of the other Core cells is to support the activities of the CRC
  Fundamentally built to store medical data
    Which can be accessed by any cell in the i2b2 Hive
    Similarly, any cell can contribute to placing data into the CRC

i2b2-24

The Clinical Research Chart

Useful for:
  Repurposing patient data and integrating it with genomic data and clinical trial data for clinical research
Important to note:
  Not a mechanism for searching through hospital clinical systems
  Not a transaction system to manage clinical trials

i2b2-25

The i2b2 Web Client

Designed for enterprise related activities
  e.g. selecting patients from an enterprise repository
Written entirely in JavaScript, HTML, and CSS
  Uses AJAX to eliminate page refreshing
Cross platform and compatible with most browsers
  Known compatibility issues with IE5 and lower
Easy to deploy and update
Important to note:
  Can create patient sets and retrieve patient counts
  Only anonymous patient data is shown
    Data is obfuscated by adding a small random number to, or subtracting it from, the available aggregate totals
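A minimal sketch of the count obfuscation idea. The slides do not state the size or distribution of the random offset, so the max_offset parameter, the uniform draw, and the clamp at zero are all assumptions made for illustration.

```python
import random
from typing import Optional


def obfuscate_count(true_count: int, max_offset: int = 3,
                    rng: Optional[random.Random] = None) -> int:
    """Return the aggregate count shifted by a small random offset.

    max_offset is a hypothetical bound; the real range is not given in the slides.
    """
    if rng is None:
        rng = random.Random()
    offset = rng.randint(-max_offset, max_offset)  # inclusive on both ends
    return max(0, true_count + offset)             # never report a negative count


print(obfuscate_count(120))
```

Because the offset is small relative to typical aggregate totals, the reported number stays useful for judging cohort size while hiding the exact count.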

i2b2-26

The i2b2 Workbench Application

Designed for project-based use
  e.g. data manipulation, visual analytics
Written in Java using the Eclipse Framework
  The client applications are Eclipse plug-ins which compose the workbench application
  Can be extended with other Java/Eclipse plug-ins
More resource intensive than its web companion
  Helpful for heavy client-side processing

i2b2-27

How to use the i2b2 Software

First, use the web or desktop client to select/query patients from the enterprise data repository (EDR)

i2b2-28

Creating the Query

Patient attributes are dragged from the “Terms” panels into the “Query Tool” panels
  Terms in the same panel are logically OR’d
  Terms in different panels are logically AND’d
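The panel logic can be sketched with Python sets: a union within each panel (OR) and an intersection across panels (AND). The term names, patient numbers, and the run_query function are hypothetical and serve only to illustrate the rule.

```python
def run_query(panels, patients_by_term):
    """Return the patient ids matching every panel.

    panels: list of panels, each a list of term names.
    patients_by_term: mapping from term name to a set of patient ids.
    """
    result = None
    for panel in panels:
        panel_matches = set()
        for term in panel:
            # Terms in the same panel are OR'd: union of their patient sets.
            panel_matches |= patients_by_term.get(term, set())
        # Different panels are AND'd: intersect with the running result.
        result = panel_matches if result is None else result & panel_matches
    return result if result is not None else set()


# Hypothetical term-to-patient index.
patients_by_term = {
    "asthma": {1, 2, 3},
    "copd": {4},
    "female": {2, 3, 4, 5},
}

# Panel 1: asthma OR copd. Panel 2: female.
matches = run_query([["asthma", "copd"], ["female"]], patients_by_term)
print(sorted(matches))
```

Moving a term from one panel to another therefore changes the query from "either condition" to "both conditions".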

i2b2-29

How to use this Data?

Querying from an EDR returns limited data
  A patient count from the results of the query
  Aggregate counts of the demographics of these patients
Not very useful for research purposes in current form
  In order to effectively use this data, patient sets must be saved into a new, project-specific database
    Will be saved in your local i2b2 installation
This process is known as creating a “data mart”
  Requires IRB approval

i2b2-30

Creation of a Data Mart

A data mart ensures patient privacy by only storing information allowed under HIPAA regulations
  Protected Health Information (PHI) is not included in the data mart
Data is saved in the CRC (Star Schema DB Model)

i2b2-31

Working with the Data

Use the i2b2 Workbench Application to view & manipulate the data from your data mart

i2b2-32

User & Hive Interaction

When using the web or desktop client, you’re not just accessing the Clinical Research Chart directly
  In fact, most interaction incorporates the functionalities of many i2b2 Cells
  At the minimum, all core cells are used in some way
What do these other cells do?

Diagram of Hive cells: Ontology Management, Identity Management, File Repository, Workflow Management, Data Repository (CRC), Project Management

i2b2-33

Workflow Framework Cell

This cell is used to process information in steps through various parts of the Hive
  Most processed information will come to reside in the CRC or be displayed to the user
Specifically:
  Facilitates communication between cells
  Manages project-specific XML data objects for users of a given project
    These objects typically originate in other cells
    These objects are organized in hierarchical structures that represent relationships between elements
  Allows users to organize, label, and annotate data objects

i2b2-34

Workflow Framework Cell

Use Case Diagram

i2b2-35

Workflow Framework Cell

Operations and Descriptions

i2b2-36

Workflow Framework Cell

We can see the Workflow Management Cell at work in the i2b2 Web and Desktop Clients
  For example, providing hierarchical structure for concepts and patient sets

i2b2-37

Project Management Cell

This cell is used to provide user authentication and manage group and role information
  User access is determined by a user’s role
    Defines what actions they may perform in the Hive
    Default role is User
    Other roles include Manager, Administrator
    Users can have one or more roles
It also keeps track of what cells are part of the Hive and their location
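A minimal sketch of role-based access as outlined on this slide. The role names User, Manager, and Administrator come from the slide; the action names and the mapping from roles to actions are assumptions made purely for illustration.

```python
# Hypothetical mapping from role to permitted actions.
ROLE_ACTIONS = {
    "User": {"run_query"},
    "Manager": {"run_query", "create_data_mart"},
    "Administrator": {"run_query", "create_data_mart", "manage_users"},
}
DEFAULT_ROLE = "User"


def allowed_actions(roles=None):
    """Union of the actions granted by each of a user's roles."""
    roles = roles or [DEFAULT_ROLE]  # users without explicit roles get the default
    actions = set()
    for role in roles:
        actions |= ROLE_ACTIONS.get(role, set())
    return actions


def may_perform(action, roles=None):
    """True if any of the user's roles grants the action."""
    return action in allowed_actions(roles)


print(may_perform("create_data_mart", ["User", "Manager"]))
```

Since a user can hold more than one role, the check takes the union of permissions rather than looking at a single role.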

i2b2-38

Project Management Cell

Can be accessed by either an i2b2 client or by another i2b2 cell
  Client: user trying to log in to the client
  Cell: check which roles exist for the user for that cell
    Authentication and Authorization
Use Case Diagram:

i2b2-39

File Repository Cell

Fundamentally, this cell holds large files of data
  Radiological images, genetic sequences
  These files are generally referenced from the Clinical Research Chart
Manages the sending and receiving of these files between cells
  Other cells will use REST or SOAP service calls to access files in this cell under most conditions
Users can use this cell to upload files
XML Request format:

<message_body>
  <recvfile_request>
    <filename>/oasis/ABT001b/brain_324.jpg</filename>
  </recvfile_request>
</message_body>
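A short sketch of how a receiving cell might read the file request shown on this slide, using Python's standard xml.etree.ElementTree module. The XML is repeated here with matching open and close tags so that it parses; how a real File Repository cell handles the request is not described in the slides.

```python
import xml.etree.ElementTree as ET

# The request body from the slide, with the closing tag matched to <recvfile_request>.
request_xml = """
<message_body>
  <recvfile_request>
    <filename>/oasis/ABT001b/brain_324.jpg</filename>
  </recvfile_request>
</message_body>
""".strip()

body = ET.fromstring(request_xml)

# Walk from the body element down to the requested file name.
filename = body.findtext("recvfile_request/filename")
print(filename)
```

A mismatched closing tag would make ET.fromstring raise a ParseError, which is why well-formedness matters even in a message this small.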

i2b2-40

Ontology Management Cell

Manages the terminology and knowledge information typically used in the Hive, especially in the CRC
  Provides descriptive terms and other information for data stored in the observation_fact table
    This metadata is stored in one or more separate tables outside of the Star Schema
  These vocabulary terms are organized in hierarchical structures (Workflow Framework)
This information is either requested by or distributed to cells during most of the Hive’s transactions
Use Case Diagram:

i2b2-41

Ontology Management Cell

Typical Ontology Table (column descriptions, in order)
  Hierarchical level
  Full path that leads to the term
  Descriptive text value
  Is field a synonym for another term?
  Display icon used in the user interface
  Field not used in i2b2
  Describes ontological concept
  Extra information about the concept in XML
  Column name in fact table that holds concept code
  Name of look-up table that holds concept code
  Name of field that holds concept path
  T for text or N for numeric
  SQL operator used in WHERE clause for queries
  Dimension table path that maps to the concept
  Store miscellaneous comments
  Tooltip that appears in the user interface
  Date the data was updated
  Date the data was downloaded
  Date the data was imported
  Coded value for the originating source system
  Coded value indicating term type: DOC or LAB
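Several of these ontology columns (fact-table column, look-up table, path field, SQL operator, dimension path) read like the ingredients of a WHERE clause. The sketch below shows one plausible way such metadata could be turned into a query against the fact table. The dictionary keys, table names, paths, and codes are all hypothetical, and the slides do not say that i2b2 assembles its queries this way.

```python
import sqlite3

# Hypothetical ontology row; keys paraphrase the column descriptions.
term = {
    "fact_column": "concept_cd",          # column in fact table holding the concept code
    "lookup_table": "concept_dimension",  # look-up table holding the concept code
    "path_column": "concept_path",        # field holding the concept path
    "operator": "LIKE",                   # SQL operator for the WHERE clause
    "dim_path": "/Diagnoses/Asthma/",     # dimension path that maps to the concept
}


def build_fact_query(term):
    """Build SQL text plus bound parameters from one ontology row."""
    # Identifiers come from trusted metadata; the path value is bound as a parameter.
    pattern = term["dim_path"] + "%" if term["operator"] == "LIKE" else term["dim_path"]
    sql = (
        f"SELECT DISTINCT patient_id FROM observation_fact "
        f"WHERE {term['fact_column']} IN ("
        f"SELECT {term['fact_column']} FROM {term['lookup_table']} "
        f"WHERE {term['path_column']} {term['operator']} ?)"
    )
    return sql, (pattern,)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept_dimension (concept_path TEXT, concept_cd TEXT)")
conn.execute("CREATE TABLE observation_fact (patient_id INTEGER, concept_cd TEXT)")
conn.executemany("INSERT INTO concept_dimension VALUES (?, ?)", [
    ("/Diagnoses/Asthma/", "DX:A"),
    ("/Diagnoses/Asthma/Severe/", "DX:AS"),
    ("/Diagnoses/COPD/", "DX:C"),
])
conn.executemany("INSERT INTO observation_fact VALUES (?, ?)", [
    (1, "DX:A"), (2, "DX:AS"), (3, "DX:C"),
])

sql, params = build_fact_query(term)
patient_ids = sorted(row[0] for row in conn.execute(sql, params))
print(patient_ids)
```

Matching on a path prefix means that selecting a parent term also picks up every child term beneath it in the hierarchy.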

i2b2-42

Identity Management Cell

Manages a patient's protected health information in a manner consistent with HIPAA privacy rules
  Patient data is available only as a HIPAA defined “Limited Data Set”
    Removal of patient identifiers
Uses a “code book” that maps the real patient identifiers to arbitrary patient numbers in the CRC
Design and Architecture documents are not publicly available for this cell
  It’s a secret?
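Since the design of this cell is not public, the following is only a sketch of the general "code book" idea: a private mapping from real identifiers to arbitrary numbers. The class name, the number range, and the use of the secrets module are assumptions and do not describe the actual Identity Management cell.

```python
import secrets


class CodeBook:
    """Maps real patient identifiers to arbitrary, stable patient numbers."""

    def __init__(self):
        self._by_identifier = {}  # real identifier -> arbitrary number (kept private)
        self._used = set()        # numbers already handed out

    def patient_number(self, real_identifier: str) -> int:
        """Return the arbitrary number for an identifier, creating one if needed."""
        if real_identifier not in self._by_identifier:
            number = secrets.randbelow(10**9)
            while number in self._used:  # redraw on the rare collision
                number = secrets.randbelow(10**9)
            self._used.add(number)
            self._by_identifier[real_identifier] = number
        return self._by_identifier[real_identifier]


book = CodeBook()
print(book.patient_number("MRN-0001") == book.patient_number("MRN-0001"))
```

Only the arbitrary numbers would travel on to the research repository; the mapping itself stays with whoever holds the code book.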

i2b2-43

Optional i2b2 Cells

Natural Language Processing Cell
  Manipulates text reports to extract specific terms and knowledge from them
    Extract concepts such as diagnoses, smoking status
  These concepts are then used to achieve various representations of the data
  Concepts returned divided into three categories:
    UMLS concepts
      – Mapping parts of the document to concepts in the Unified Medical Language System (UMLS) database
    Regular Expression concepts
      – Matching document text to a set of regular expression rules
    Smoking Status concepts
      – Classification model trained on human-annotated smoking-related sentences
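The "Regular Expression concepts" category can be illustrated with Python's re module. The rule names, patterns, and sample note below are invented for illustration; the actual rules used by the NLP cell are not given in the slides.

```python
import re

# Hypothetical rule set: concept name -> compiled pattern.
RULES = {
    "asthma_dx": re.compile(r"\basthma\b", re.IGNORECASE),
    "albuterol_rx": re.compile(r"\balbuterol\b", re.IGNORECASE),
}


def extract_concepts(text):
    """Return the sorted names of all rules whose pattern occurs in the text."""
    return sorted(name for name, pattern in RULES.items() if pattern.search(text))


note = "Patient with Asthma, prescribed albuterol inhaler."
concepts = extract_concepts(note)
print(concepts)
```

Rule-based matching like this is simple to audit, which complements the trained classifier used for the smoking status category.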

i2b2-44

Natural Language Processing

i2b2-45

Optional i2b2 Cells

Pulmonary Function Test (PFT) Processing Cell
  Parses a pulmonary function report and extracts embedded test values
    Report must be in a specific format
  Returned values may be stored in the CRC and used in queries or other types of analyses
  Report format not specified in any official i2b2 documentation, but examples have been published
    Provides some idea about the required format

i2b2-46

Pulmonary Function Report Format

i2b2-47

Example Use Case Scenario

Clinical Asthma Investigation
  Available data includes:
    Text notes from asthma clinic
    Reports from pulmonary function tests
Questions…
  How and when is the data extracted?
  How and when is the data encrypted?
  How and when is the data collated into something meaningful and useful?
Answer!
  Use the functionality provided by the i2b2 Hive
    Core cells and Optional cells
  Once data is gathered and processed, add this data to the Clinical Research Chart

i2b2-48

Workflow Requirements

The Workflow Framework (WF) cell controls communication between the other cells
Identify cells that will be needed for this workflow
  Identity Management, Data Repository, Natural Language Processing, and PFT Processing

i2b2-49

Workflow Continued…

The available data is uploaded through the Identity Management (IM) cell
  Names, medical record numbers, and other sensitive information are resolved and retained in the IM cell
  Data is encrypted (based on the block cipher Advanced Encryption Standard)
Data is added to the Clinical Research Chart (CRC)
The CRC now contains a HIPAA compliant, limited data set

Diagram labels: Text Notes, PFT Reports; Encrypt

i2b2-50

Workflow Continued…

With our newly defined data set, we want to extract concepts from the text notes
  e.g. hospital discharge summaries, EMR data
WF cell retrieves notes from the CRC and sends them to the Natural Language Processing (NLP) cell
The NLP cell manipulates the notes and extracts specific information from them to form concepts
These concepts are then pushed back to the CRC

i2b2-51

Workflow Continued…

Similarly, we want to extract concepts from the PFT reports
WF cell retrieves the PFT reports from the CRC and sends them to the PFT Processing cell
The PFT cell parses the records one by one and generates concepts from them
The values associated with each test record are placed back into the CRC

i2b2-52

Workflow Complete

Data has now been fully processed and saved in the CRC and is available for viewing and manipulation
  Using the i2b2 Workbench Application
    Allows the investigator to query, analyze, and display the data
What did we get from this process?
  Medication and diagnosis concepts related to asthma from the NLP-processed notes
  Physical findings and physiological test results extracted from the PFTs
Resulting in a wealth of valuable data for the clinical investigator to aid in clinical discovery

i2b2-53

Crimson Project

Developed by Dr. Lynn Bry of Partners HealthCare
Project Objectives:
  Provide enhanced sample management within i2b2
  Support prospective and retrospective sample collection
    Prospective: requests typically routed to an external information system
    Retrospective: requests typically directed towards an existing repository or registry
Three i2b2 cells
  Regulatory cell
  Sample Cohort Management cell
  Sample Registry cell
https://community.i2b2.org/wiki/display/crimson/Crimson+Home

i2b2-54

Crimson Project – The Cells

Regulatory Cell
  Manages the regulatory aspects associated with sample request and sample data management within i2b2
    De-identification of data
    Connection management with external systems
    Storing PHI encryption keys
Sample Cohort Management Cell
  Focused on translating, broadcasting, and tracking i2b2 sample requests
Sample Registry Cell
  Manages the import process of sample data from external sources

i2b2-55

Crimson Project – Architecture

i2b2-56

SMArt Project for i2b2

Developed by Nich Wattanasin
Project Objective:
  Develop a common API for SMArt applications to interact with the i2b2 platform
Project in the very early stages of development
  First release: September 14, 2010
  Only 20 revisions since (as of April 2011)
Current Capabilities:
  A handful of functions that return targeted information from a single patient record
    Accomplished via REST calls
    Results returned in RDF/XML format
  Plug-in for the i2b2 Web Client
https://community.i2b2.org/wiki/display/SMArt/SMART+Home

i2b2-57

SMArt Project – Current Functions

Get Medications
  Returns a list of medications for a specific patient record
Get Demographics
  Returns the demographic information for a specific patient record
Get Problems
  Returns a list of problems for a specific patient record
Get Allergies
  Returns a list of allergies for a specific patient record

GET http://i2b2_server/records/{record id}/{medications | demographics | problems | allergies}/
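The URL pattern on this slide can be assembled with a small helper. The function name and the validation step are assumptions; i2b2_server is the placeholder host used on the slide, and no network request is made here.

```python
from urllib.parse import quote

# The four resources listed on the slide.
RESOURCES = ("medications", "demographics", "problems", "allergies")


def record_url(server: str, record_id: str, resource: str) -> str:
    """Build the GET URL for one resource of one patient record."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource!r}")
    # Percent-encode the record id so it is safe as a single path segment.
    return f"http://{server}/records/{quote(str(record_id), safe='')}/{resource}/"


url = record_url("i2b2_server", "12345", "medications")
print(url)
```

A client would issue a GET request to this URL and, per the slides, receive the result in RDF/XML format.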

i2b2-58

SMArt Dashboard Web Client Plug-in

Ability to embed SMArt Apps directly into the i2b2 Web Client
Ability to access i2b2 patient data via the SMArt connect model/project common API

i2b2-59

i2b2 Research Data Warehouse

A custom i2b2 implementation at Cincinnati Children’s Hospital Medical Center (CCHMC) (https://i2b2.cchmc.org)
Developed by the CCHMC i2b2 team
Project adds several new capabilities to the i2b2 platform:
  Ability to view clinical data in a web-based form (similar to a chart review)
  Ability to enter data directly into i2b2 using forms
    i.e. data that is not collected from an EMR
  Ability to run reports and perform custom visualizations on the data
CCHMC uses i2b2 to create a “research data warehouse”
  But what is a research data warehouse?

i2b2-60

What is a Research Data Warehouse?

According to CCHMC…
  A research data warehouse is a repository that integrates information on patients from multiple sources
    Electronic health records
    Lab results
    Genetic and research data
    Birth registry data
    Government data (Medicaid)
What it is used for:
  Cohort identification, hypothesis generation
What it is NOT used for:
  Decision support, clinical trials, real-time alerts

i2b2-61

i2b2 Research Data Warehouse

i2b2-62

Evaluating i2b2

Performance
  Statistics provided by Partners HealthCare
  Query Performance (on their primary i2b2 system)
    4.6 million patient records
    1.2 billion observations (facts) on these patients (observation_fact table)
      – Queries requesting patient counts on this repository typically complete within 10 seconds, many within several milliseconds
  Data Mart Initialization Performance
    2.6 million patient records
    550 million observations (facts) on these patients
    8x3 GHz processor machine with 32GB RAM
      – Completed building in approximately 1 hour and 15 minutes
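To put these figures in perspective, a quick back-of-the-envelope calculation derives average facts per patient and the approximate data mart load rate. The inputs are the numbers quoted on this slide; the derived values are simple averages, not measurements reported by Partners HealthCare.

```python
# Query repository: 1.2 billion facts over 4.6 million patients.
facts_per_patient = 1.2e9 / 4.6e6

# Data mart: 550 million facts over 2.6 million patients.
mart_facts_per_patient = 550e6 / 2.6e6

# Build time of about 1 hour 15 minutes, expressed in seconds.
build_seconds = (1 * 60 + 15) * 60

# Average load rate during the data mart build.
facts_per_second = 550e6 / build_seconds

print(round(facts_per_patient), round(mart_facts_per_patient), round(facts_per_second))
```

On these assumptions the repository holds a few hundred observations per patient on average, and the data mart build sustained on the order of a hundred thousand facts per second.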

i2b2-63

Evaluating i2b2

Scalability
  Enabled by the modular nature of the i2b2 cell and ease of integration into the Hive
    Encourages development outside of the i2b2 core team
    Fosters rapid software development
Usability
  Simple installation processes to get started
  Intuitive user interfaces
  Wealth of documentation publicly available online
    Reduced learning curve
Interoperability
  Works on a variety of operating systems, web browsers, and server technologies
    Not limited to commercial technologies

i2b2-64

Limitations

Naturally, users can create project-level repositories (data marts) from an enterprise-level repository
  Can we update our project databases with fresh, updated enterprise data?
  Can we upload our project data, regardless of origin, into the enterprise repository?
Such capabilities are not currently supported in i2b2
  Difficult to implement the numerous policies required for these functions

i2b2-65

Limitations

i2b2 cells communicate through web services, which are not always flexible
  Perhaps we want to execute our own SQL queries?
    Not possible: queries are limited to pre-specified queries and result sets, dictated by the cells
How do we overcome this?
  Developers are planning to introduce a second SQL access layer to the CRC
    Will allow for greater flexibility with queries
      – But will need to comply with security rules and strict ontology

i2b2-66

Summary

Presented i2b2 as a software tool and a data model aiding in clinical research and discovery
  Addresses the inherent challenges of integrating medical record and clinical research data
Relatively young project, but on the fast track for growth and development
  Roadmap for future releases with a new version currently in release candidate (RC) status
Adoption and usage in BMI looks promising
  Approximately 17 sites outside of Partners HealthCare are engaged in i2b2 projects

i2b2-67

Thank You!
