CSE5095
Topics in Biomedical Informatics
Antonio Cusano
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255
Storrs, CT 06269-2155
[email protected]
2011

Informatics for Integrating Biology and the Bedside (i2b2)
http://www.i2b2.org

Overview
  Introduction to i2b2
  Modeling the i2b2 Data Model
  Overview of the i2b2 Software Tools
  Using the i2b2 Software
  Overview of the i2b2 Hive Cells
  Example Use Case Scenario
  Notable Projects & Usage in BMI
  Evaluating i2b2
  Summary

Background & Motivation
  The rise of Electronic Medical Record Systems (EMRS) holds great promise for clinical research
    Integration between medical record data and clinical research data is increasingly important
  But many challenges exist:
    EMRS are typically built with the “single patient” in mind
      It is difficult to observe trends in data across combinations of many patients
    How do we “clean” EMR data at a global, enterprise level without compromising the data?
      Removal of some data by person X could be a devastating loss to person Y
    How do we maintain patient privacy?

Background & Motivation
  What do we need?
    A system that supports queries that cut across multiple patients
      More dependent on standard descriptors
    A system that can process and understand complex queries and specifications
    A system that can integrate medical record data and clinical research data
      Provide a robust data model
    A system that protects the privacy of the patients
  Solution?

Introducing i2b2
  Informatics for Integrating Biology and the Bedside
    One of seven NIH Roadmap National Centers for Biomedical Computing (http://www.ncbcs.org)
      Funded under the NIH Common Fund
      Part of the networked national effort to build the infrastructure for biomedical computing in the nation
    Established in 2004
    Based at Partners HealthCare in Boston, Massachusetts
      Non-profit, integrated health system founded by Brigham and Women’s Hospital and Massachusetts General Hospital
    Principal Investigator:
      Isaac Kohane, M.D., Ph.D., Professor of Pediatrics at Harvard Medical School

Mission Statement
  Overcome two major obstacles:
    The computational challenges of discovery across large, heterogeneous data sets routinely obtained in clinical care
    The lack of knowledge of genomic-level physiology and how to study it
  Therefore, the goals of i2b2 are:
    To provide clinical researchers with the software tools necessary to collect and integrate medical record and clinical research data in the genomics age
      By creating a software suite that constructs and manages the modern clinical research chart

i2b2 Software Tools
  The i2b2 Hive (ver. 1.5.2)
  The Clinical Research Chart

i2b2 Software – Design Objectives
  Design focused around several goals:
    Provide a secure presentation of patient information for research purposes
    Provide a software framework that can be easily extended
    Provide secure communication capabilities for that framework
    Provide a flexible data model tuned to the needs of patient-specific information
      Requiring timely and scalable query performance
      Adaptable to new and unanticipated representations of health care information

Identifying the Data Model Requirements
  Developers identified these key requirements for constructing a data model for i2b2
    Integration of data from distributed and differently structured databases
      In order to perform comprehensive and integrative analyses
    Separation of data used for research from daily operational or transactional data
      Avoid any performance impact and maintain integrity
    Standardization of a model across systems
      Ensure all i2b2 systems possess the same data model to enable data sharing
    Ease of use by end-users

Dimensional Modeling
  Model the database using two concepts:
    Facts
      The quantitative or factual data being queried
    Dimensions
      Descriptions of the various facts

Star Schema
  Possesses a central “fact” table where each row represents a single fact
    A fact is an observation of a patient
      Diagnoses, Procedures, Genetic Data, Lab Data, Health History, Demographics Data, etc.
    An observation is not the same thing as an event
      Observations are recorded by a specific observer within a specific time range regarding a specific concept
  Fact table is surrounded by numerous dimension tables
    Four dimension tables
      Concept, Provider, Visit, Patient
    Contains descriptors that characterize the facts
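
The star layout described on this slide can be sketched with an in-memory SQLite database. This is a minimal illustration only: apart from observation_fact, the table names, column names, and sample rows are assumptions made for the example and are not the official i2b2 schema.

```python
import sqlite3

# Hypothetical, simplified star schema: one fact table plus four dimension tables.
# Column names and sample codes are illustrative assumptions, not the i2b2 DDL.
SCHEMA = """
CREATE TABLE patient_dimension  (patient_num INTEGER PRIMARY KEY, sex TEXT);
CREATE TABLE visit_dimension    (encounter_num INTEGER PRIMARY KEY, patient_num INTEGER);
CREATE TABLE provider_dimension (provider_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE concept_dimension  (concept_cd TEXT PRIMARY KEY, name TEXT);
CREATE TABLE observation_fact (
    encounter_num INTEGER, patient_num INTEGER, concept_cd TEXT,
    provider_id TEXT, start_date TEXT, end_date TEXT
);
"""


def build_demo_db():
    """Create the schema and load a few made-up observations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patient_dimension VALUES (?, ?)",
                     [(1, "F"), (2, "M")])
    conn.executemany("INSERT INTO concept_dimension VALUES (?, ?)",
                     [("DX:ASTHMA", "Asthma"), ("LAB:FEV1", "FEV1")])
    # Each fact row: who (patient), when (dates), what (concept), by whom (provider).
    conn.executemany("INSERT INTO observation_fact VALUES (?, ?, ?, ?, ?, ?)", [
        (10, 1, "DX:ASTHMA", "P1", "2011-01-05", "2011-01-05"),
        (11, 2, "DX:ASTHMA", "P1", "2011-02-01", "2011-02-01"),
        (11, 2, "LAB:FEV1", "P2", "2011-02-01", "2011-02-01"),
    ])
    return conn


def count_patients_with(conn, concept_name):
    """Count distinct patients having at least one fact for the named concept."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT f.patient_num) FROM observation_fact f "
        "JOIN concept_dimension c ON c.concept_cd = f.concept_cd "
        "WHERE c.name = ?", (concept_name,)).fetchone()
    return row[0]


conn = build_demo_db()
asthma_patients = count_patients_with(conn, "Asthma")
```

The query joins the fact table to one dimension table only, which is the usual shape of a star-schema query: facts are filtered by descriptors held in the dimensions.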

Star Schema Performance
  Enterprise repositories and project-specific, local repositories can contain very large amounts of data
    The central fact table can grow very large as a result, impacting performance
  It is critical to have indexes on that table to maintain stable performance
    Use system-specific enhancements when possible
      SQL Server databases can add a clustered index to any table to keep its rows in sorted order
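
As a rough illustration of why the index matters, the sketch below asks SQLite for its query plan before and after adding an index on the fact table. SQLite is used only because it ships with Python; it has no SQL Server style clustered indexes, and the index name and two-column table are assumptions for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the textual plan steps SQLite reports for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
# Cut-down fact table; the real table has many more columns.
conn.execute("CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT)")

SQL = "SELECT patient_num FROM observation_fact WHERE concept_cd = ?"

# Without an index the only option is a full scan of the fact table.
plan_before = query_plan(conn, SQL, ("DX:ASTHMA",))

# Hypothetical index name; it covers the filter column and the selected column.
conn.execute(
    "CREATE INDEX idx_fact_concept ON observation_fact (concept_cd, patient_num)")
plan_after = query_plan(conn, SQL, ("DX:ASTHMA",))
```

Printing plan_before and plan_after shows the planner switching from a table scan to a search that uses idx_fact_concept.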

i2b2 Software – Purpose
  Serves two primary use cases:
    Enable enterprise-wide repurposing and distribution of medical record data for research
      Enable high-performance collection of medical record data for querying and distribution
      Enable discovery within data on a wide scale
    Enable usage of medical record data in clinical studies
  How do we achieve these use cases?
    Use the i2b2 Software Tools!
      The i2b2 Hive
      The Clinical Research Chart
        – A core component of the i2b2 Hive

What is the i2b2 Hive?
  A collection of interoperable services provided by i2b2 cells
    Each cell behaves as a functional service
  Cells are loosely coupled (independence)
  Cells do not know their relative locality (proximity)
  Cells are connected and communicate with each other using web services
    Can be invoked manually by the user
    Can be invoked automatically by the system workflow
  What do we notice?
    Highly modular architecture
    Highly scalable

What are i2b2 Cells?
  The i2b2 cell is the basic building block of the i2b2 environment
    An application “wrapped” into a functional unit
  Encapsulates business logic as well as access to data objects behind standard web service interfaces
    Supported services include REST, SOAP
    Communication using XML messages
  [Figure: an i2b2 cell. Business logic, data access, and data objects sit behind the i2b2 web service interfaces (REST, SOAP), which exchange XML over HTTP.]

Structure of the XML Message
  XML schema that defines:
    A header for communication management
    A header for the message request/response
    A message body that contains the data
      For example, can contain patient sets with their:
        – Phenotypic (Clinical) and Genotypic Data
        – References to other data objects (images, attachments)
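
The three-part message layout can be sketched with Python's standard XML library. Apart from message_body, every element name and value below is an illustrative assumption, not a copy of the official i2b2 message schema.

```python
import xml.etree.ElementTree as ET


def build_message(sender, payload):
    """Wrap a payload element in a two-header, one-body envelope."""
    root = ET.Element("request")

    # Header 1: communication management (who is sending, routing details).
    message_header = ET.SubElement(root, "message_header")
    ET.SubElement(message_header, "sending_application").text = sender

    # Header 2: how the request/response itself should be handled.
    request_header = ET.SubElement(root, "request_header")
    ET.SubElement(request_header, "result_waittime_ms").text = "180000"

    # Body: the data being exchanged, for example a patient set.
    body = ET.SubElement(root, "message_body")
    body.append(payload)
    return root


# Hypothetical payload: a patient set holding a single patient number.
patient_set = ET.Element("patient_set")
ET.SubElement(patient_set, "patient_num").text = "1000"

message = build_message("demo-client", patient_set)
xml_text = ET.tostring(message, encoding="unicode")
```

Keeping both headers separate from the body lets a receiving cell inspect routing and timing information without parsing the data it carries.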

Advantages of Web Services
  Because all communication is in XML…
    Not limited to any single operating system
    Not limited to any single programming language
  Cells can be developed in Microsoft .NET, Perl, Python, Java, etc.
    Any language that supports REST or SOAP capability can be used
  Cells can exist on Windows, Linux, and Mac OS and communicate with each other
    e.g. cells residing on a Windows platform can talk with those on a UNIX platform
  No restriction on how simple or complex a cell can be
    XML tags the data
    REST/SOAP transfers the data

But Where’s the User Interface?
  Web services do not provide a visual user interface
  The developer is required to build a client component
    Must include a Graphical User Interface (GUI) and Control Mechanism for user interaction
  Some considerations:
    Should utilize the web service interfaces for communication, rather than a home-brew approach
      Must ensure cell-to-cell communication is maintained
    Reuse the functionality of existing cells

How are Cells Classified?
  The i2b2 Hive is composed of a number of cells with varying importance and functionality
    Core cells are essential for operation of the Hive
      Provide basic services
      Written in Java following the J2EE specifications
      Front-end clients written using the Standard Widget Toolkit (SWT)
        – Provides native OS look-and-feel for the user interfaces
    Optional and Plug-in type cells add functionality to the Hive but are not essential
  Special Hive Cells:
    The Clinical Research Chart
    The i2b2 Web Client
    The i2b2 Workbench Application

The Clinical Research Chart
  The Clinical Research Chart (CRC) is the implementation of the Star Schema in i2b2
    Functions as the integrated data repository for the i2b2 Hive
      Core cell of the i2b2 Hive (Data Repository Cell)
      Requires all core cells to gain complete functionality
        – In fact, the main purpose of the other Core cells is to support the activities of the CRC
    Fundamentally built to store medical data
      Which can be accessed by any cell in the i2b2 Hive
      Similarly, any cell can contribute data to the CRC

The Clinical Research Chart
  Useful for:
    Repurposing patient data and integrating it with genomic data and clinical trial data for clinical research
  Important to note:
    Not a mechanism for searching through hospital clinical systems
    Not a transaction system to manage clinical trials

The i2b2 Web Client
  Designed for enterprise-related activities
    e.g. selecting patients from an enterprise repository
  Written entirely in JavaScript, HTML, and CSS
    Uses AJAX to eliminate page refreshing
  Cross-platform and compatible with most browsers
    Known compatibility issues with IE5 and lower
  Easy to deploy and update
  Important to note:
    Can create patient sets and retrieve patient counts
    Only anonymous patient data is shown
      Data is obfuscated by adding a small random number to, or subtracting it from, the available aggregate totals
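
The count obfuscation mentioned on this slide can be sketched as follows. The noise range of plus or minus 3 and the clamp at zero are assumptions for illustration; the slide does not say how large the random number is or how edge cases are handled.

```python
import random


def obfuscate_count(true_count, max_noise=3, rng=random):
    """Shift an aggregate count by a small random integer, never below zero.

    max_noise is a hypothetical setting; the actual magnitude used by the
    web client is not given in the slides.
    """
    noise = rng.randint(-max_noise, max_noise)
    return max(0, true_count + noise)


# A seeded generator makes the example repeatable.
demo_rng = random.Random(0)
shown_counts = [obfuscate_count(100, rng=demo_rng) for _ in range(5)]
```

Each displayed value stays within a few patients of the true count, which is good enough for cohort sizing while making it harder to single out individuals in small groups.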

The i2b2 Workbench Application
  Designed for project-based use
    e.g. data manipulation, visual analytics
  Written in Java using the Eclipse Framework
    The client applications are Eclipse plug-ins which compose the workbench application
    Can be extended with other Java/Eclipse plug-ins
  More resource intensive than its web companion
    Helpful for heavy client-side processing

How to use the i2b2 Software
  First, use the web or desktop client to select/query patients from the enterprise data repository (EDR)

Creating the Query
  Patient attributes are dragged from the “Terms” panels into the “Query Tool” panels
    Terms in the same panel are logically OR’d
    Terms in different panels are logically AND’d
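
The panel logic on this slide maps directly onto set operations: a union within a panel and an intersection across panels. The sketch below assumes a hypothetical lookup from term name to the set of matching patient numbers; the terms and numbers are made up.

```python
def run_query(panels, patients_by_term):
    """Evaluate panels of terms: OR inside a panel, AND between panels.

    panels: list of lists of term names.
    patients_by_term: dict mapping a term name to a set of patient numbers.
    """
    result = None
    for panel in panels:
        panel_matches = set()
        for term in panel:
            # Terms in the same panel are OR'd: take the union.
            panel_matches |= patients_by_term.get(term, set())
        # Different panels are AND'd: take the intersection.
        result = panel_matches if result is None else result & panel_matches
    return result if result is not None else set()


# Made-up example data.
patients_by_term = {
    "Asthma": {1, 2, 3},
    "COPD": {4},
    "Female": {2, 3, 4},
}

# Panel 1: Asthma OR COPD. Panel 2: Female.
matches = run_query([["Asthma", "COPD"], ["Female"]], patients_by_term)
patient_count = len(matches)
```

Here the first panel matches patients 1 to 4 and the second restricts the result to patients 2, 3, and 4.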

How to use this Data?
  Querying from an EDR returns limited data
    A patient count from the results of the query
    Aggregate counts of the demographics of these patients
  Not very useful for research purposes in current form
    In order to effectively use this data, patient sets must be saved into a new, project-specific database
      Will be saved in your local i2b2 installation
  This process is known as creating a “data mart”
    Requires IRB approval

Creation of a Data Mart
  A data mart ensures patient privacy by only storing information allowed under HIPAA regulations
    Protected Health Information (PHI) is not included in the data mart
  Data is saved in the CRC (Star Schema DB Model)

Working with the Data
  Use the i2b2 Workbench Application to view & manipulate the data from your data mart

User & Hive Interaction
  When using the web or desktop client, you’re not just accessing the Clinical Research Chart directly
    In fact, most interaction incorporates the functionalities of many i2b2 Cells
    At the minimum, all core cells are used in some way
  What do these other cells do?
  [Figure: the core cells: Ontology Management, Identity Management, File Repository, Workflow Management, Data Repository (CRC), Project Management]

Workflow Framework Cell
  This cell is used to process information in steps through various parts of the Hive
    Most processed information will come to reside in the CRC or be displayed to the user
  Specifically:
    Facilitates communication between cells
    Manages project-specific XML data objects for users of a given project
      These objects typically originate in other cells
      These objects are organized in hierarchical structures that represent relationships between elements
    Allows users to organize, label, and annotate data objects

Workflow Framework Cell
  Operations and Descriptions

Workflow Framework Cell
  We can see the Workflow Management Cell at work in the i2b2 Web and Desktop Clients
    For example, providing hierarchical structure for concepts and patient sets

Project Management Cell
  This cell is used to provide user authentication and manage group and role information
    User access is determined by a user’s role
      Defines what actions they may perform in the Hive
      Default role is User
      Other roles include Manager, Administrator
      Users can have one or more roles
  It also keeps track of what cells are part of the Hive and their location

Project Management Cell
  Can be accessed by either an i2b2 client or by another i2b2 cell
    Client: user trying to log in to the client
    Cell: check which roles exist for the user for that cell
    Authentication and Authorization
  Use Case Diagram:

File Repository Cell
  Fundamentally, this cell holds large files of data
    Radiological images, genetic sequences
    These files are generally referenced from the Clinical Research Chart
  Manages the sending and receiving of these files between cells
    Other cells will use REST or SOAP service calls to access files in this cell under most conditions
  Users can use this cell to upload files
  XML Request format:
    <message_body>
      <recvfile_request>
        <filename>/oasis/ABT001b/brain_324.jpg</filename>
      </recvfile_request>
    </message_body>
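
A receiving cell has to pull the file name out of such a request before it can serve the file. The sketch below shows one way to do that with Python's standard XML parser; it assumes the opening and closing recvfile_request tags match, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The request body from the slide, with matching open and close tags.
REQUEST = """<message_body>
  <recvfile_request>
    <filename>/oasis/ABT001b/brain_324.jpg</filename>
  </recvfile_request>
</message_body>"""


def requested_filename(xml_text):
    """Return the file name in a recvfile request, or None if it is absent."""
    body = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    return body.findtext("recvfile_request/filename")


filename = requested_filename(REQUEST)
```

Because the parser rejects mismatched tags, a malformed request fails early instead of being half-processed.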

Ontology Management Cell
  Manages the terminology and knowledge information typically used in the Hive, especially in the CRC
    Provides descriptive terms and other information for data stored in the observation_fact table
      This metadata is stored in one or more separate tables outside of the Star Schema
    These vocabulary terms are organized in hierarchical structures (Workflow Framework)
  This information is either requested by or distributed to cells during most of the Hive’s transactions
  Use Case Diagram:

Ontology Management Cell
  Typical Ontology Table (column descriptions, in table order)
    Hierarchical level
    Full path that leads to the term
    Descriptive text value
    Is field a synonym for another term?
    Display icon used in the user interface
    Field not used in i2b2
    Describes ontological concept
    Extra information about the concept in XML
    Column name in fact table that holds concept code
    Name of look-up table that holds concept code
    Name of field that holds concept path
    T for text or N for numeric
    SQL operator used in WHERE clause for queries
    Dimension table path that maps to the concept
    Store miscellaneous comments
    Tooltip that appears in the user interface
    Date the data was updated
    Date the data was downloaded
    Date the data was imported
    Coded value for the originating source system
    Coded value indicating term type: DOC or LAB
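
Several of the columns described above (fact table column, look-up table, path field, data type, SQL operator, dimension path) together carry enough information to turn an ontology term into a filter on the fact table. The sketch below shows how such a filter might be assembled. The dictionary keys, the path format, and the exact SQL shape are assumptions for illustration, not the query i2b2 actually generates.

```python
def build_where_clause(term):
    """Build a fact-table filter from one ontology row.

    term is a dict with hypothetical keys standing in for the ontology
    columns: fact_column, lookup_table, path_column, datatype, operator,
    dimcode.
    """
    value = term["dimcode"]
    if term["operator"].upper() == "LIKE":
        value += "%"  # match the term and everything beneath it in the hierarchy
    if term["datatype"] == "T":  # T = text, N = numeric
        value = "'" + value.replace("'", "''") + "'"
    return (
        f"{term['fact_column']} IN ("
        f"SELECT {term['fact_column']} FROM {term['lookup_table']} "
        f"WHERE {term['path_column']} {term['operator']} {value})"
    )


# Made-up ontology row for an asthma diagnosis folder.
asthma_term = {
    "fact_column": "concept_cd",
    "lookup_table": "concept_dimension",
    "path_column": "concept_path",
    "datatype": "T",
    "operator": "LIKE",
    "dimcode": "/Diagnoses/Asthma/",
}

clause = build_where_clause(asthma_term)
```

Production code would bind the value as a query parameter instead of building SQL text; string assembly is used here only to make the role of each column visible.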

Identity Management Cell
  Manages a patient's protected health information in a manner consistent with HIPAA privacy rules
    Patient data is available only as a HIPAA-defined “Limited Data Set”
      Removal of patient identifiers
  Uses a “code book” that maps the real patient identifiers to arbitrary patient numbers in the CRC
  Design and Architecture documents are not publicly available for this cell
    It’s a secret?
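
Since the design documents for this cell are not public, the following is only a guess at what a code book could look like: a private mapping from real identifiers to patient numbers, with identifying fields stripped from each record. The class, field names, and sequential numbering are all assumptions; a real system would likely assign less predictable numbers and protect the mapping itself.

```python
import itertools


class CodeBook:
    """Maps real identifiers (e.g. medical record numbers) to patient numbers."""

    def __init__(self, start=1000):
        self._next_num = itertools.count(start)
        self._by_identifier = {}

    def patient_num(self, identifier):
        """Return the stable patient number for an identifier, assigning one if new."""
        if identifier not in self._by_identifier:
            self._by_identifier[identifier] = next(self._next_num)
        return self._by_identifier[identifier]


def de_identify(record, book, id_field="mrn", drop=("name",)):
    """Return a copy of the record with identifiers replaced by a patient number."""
    cleaned = {k: v for k, v in record.items() if k != id_field and k not in drop}
    cleaned["patient_num"] = book.patient_num(record[id_field])
    return cleaned


book = CodeBook()
# Made-up record with hypothetical field names.
raw = {"mrn": "A-001", "name": "Jane Doe", "diagnosis": "asthma"}
safe = de_identify(raw, book)
```

Only the code book can link a patient number back to a person, so it stays in the Identity Management cell while the cleaned record moves on to the CRC.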

Optional i2b2 Cells
  Natural Language Processing Cell
    Manipulates text reports to extract specific terms and knowledge from them
      Extracts concepts such as diagnoses, smoking status
    These concepts are then used to achieve various representations of the data
    Returned concepts are divided into three categories:
      UMLS concepts
        – Mapping parts of the document to concepts in the Unified Medical Language System (UMLS) database
      Regular Expression concepts
        – Matching document text to a set of regular expression rules
      Smoking Status concepts
        – Classification model trained on human-annotated smoking-related sentences
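
Of the three categories, the regular expression concepts are the simplest to illustrate. The sketch below matches note text against a small rule set; the rules and concept codes are made up for the example and are not taken from the NLP cell.

```python
import re

# Hypothetical rule set: concept code -> compiled pattern.
RULES = {
    "DX:ASTHMA": re.compile(r"\basthma\b", re.IGNORECASE),
    "MED:ALBUTEROL": re.compile(r"\balbuterol\b", re.IGNORECASE),
}


def extract_concepts(note):
    """Return the sorted concept codes whose pattern occurs in the note."""
    return sorted(code for code, pattern in RULES.items() if pattern.search(note))


concepts = extract_concepts("Pt with Asthma, started albuterol inhaler.")
```

Rule-based matching like this cannot tell "asthma" from "no history of asthma"; the slide's trained classifier for smoking status addresses that kind of problem.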

Optional i2b2 Cells
  Pulmonary Function Test (PFT) Processing Cell
    Parses a pulmonary function report and extracts embedded test values
      Report must be in a specific format
    Returned values may be stored in the CRC and used in queries or other types of analyses
    Report format is not specified in any official i2b2 documentation, but examples have been published
      Provides some idea about the required format

Example Use Case Scenario
  Clinical Asthma Investigation
    Available data includes:
      Text notes from asthma clinic
      Reports from pulmonary function tests
  Questions…
    How and when is the data extracted?
    How and when is the data encrypted?
    How and when is the data collated into something meaningful and useful?
  Answer!
    Use the functionality provided by the i2b2 Hive
      Core cells and Optional cells
    Once data is gathered and processed, add this data to the Clinical Research Chart

Workflow Requirements
  The Workflow Framework (WF) cell controls communication between the other cells
  Identify cells that will be needed for this workflow
    Identity Management, Data Repository, Natural Language Processing, and PFT Processing

Workflow Continued…
  The available data is uploaded through the Identity Management (IM) cell
    Names, medical record numbers, and other sensitive information are resolved and retained in the IM cell
    Data is encrypted (using the Advanced Encryption Standard block cipher)
  Data is added to the Clinical Research Chart (CRC)
  The CRC now contains a HIPAA-compliant limited data set
  [Figure: text notes and PFT reports pass through an Encrypt step]

Workflow Continued…
  With our newly defined data set, we want to extract concepts from the text notes
    e.g. hospital discharge summaries, EMR data
  WF cell retrieves notes from the CRC and sends them to the Natural Language Processing (NLP) cell
  The NLP cell manipulates the notes and extracts specific information from them to form concepts
  These concepts are then pushed back to the CRC

Workflow Continued…
  Similarly, we want to extract concepts from the PFT reports
  WF cell retrieves the PFT reports from the CRC and sends them to the PFT Processing cell
  The PFT cell parses the records one by one and generates concepts from them
  The values associated with each test record are placed back into the CRC

Workflow Complete
  Data has now been fully processed and saved in the CRC and is available for viewing and manipulation
    Using the i2b2 Workbench Application
      Allows the investigator to query, analyze, and display the data
  What did we get from this process?
    Medication and diagnosis concepts related to asthma, extracted from the text notes by NLP
    Physical findings and physiological test results extracted from the PFTs
  Resulting in a wealth of valuable data for the clinical investigator to aid in clinical discovery

Crimson Project
  Developed by Dr. Lynn Bry of Partners HealthCare
  Project Objectives:
    Provide enhanced sample management within i2b2
    Support prospective and retrospective sample collection
      Prospective: requests typically routed to an external information system
      Retrospective: requests typically directed towards an existing repository or registry
  Three i2b2 cells
    Regulatory cell
    Sample Cohort Management cell
    Sample Registry cell
  https://community.i2b2.org/wiki/display/crimson/Crimson+Home

Crimson Project – The Cells
  Regulatory Cell
    Manages the regulatory aspects associated with sample request and sample data management within i2b2
      De-identification of data
      Connection management with external systems
      Storing PHI encryption keys
  Sample Cohort Management Cell
    Focused on translating, broadcasting, and tracking i2b2 sample requests
  Sample Registry Cell
    Manages the import process of sample data from external sources

SMArt Project for i2b2
  Developed by Nich Wattanasin
  Project Objective:
    Develop a common API for SMArt applications to interact with the i2b2 platform
  Project in the very early stages of development
    First release: September 14, 2010
    Only 20 revisions since (as of April 2011)
  Current Capabilities:
    A handful of functions that return targeted information from a single patient record
      Accomplished via REST calls
      Results returned in RDF/XML format
    Plug-in for the i2b2 Web Client
  https://community.i2b2.org/wiki/display/SMArt/SMART+Home

SMArt Project – Current Functions
  Get Medications
    Returns a list of medications for a specific patient record
  Get Demographics
    Returns the demographic information for a specific patient record
  Get Problems
    Returns a list of problems for a specific patient record
  Get Allergies
    Returns a list of allergies for a specific patient record
  GET http://i2b2_server/records/{record id}/{medications | demographics | problems | allergies}/
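
The URL pattern above can be filled in programmatically. The sketch below only builds the request URL and does not contact a server; the function name and the choice to reject unknown resource names are assumptions for the example.

```python
from urllib.parse import quote

# The four resources listed on the slide.
RESOURCES = ("medications", "demographics", "problems", "allergies")


def record_url(base_url, record_id, resource):
    """Build the GET URL for one resource of one patient record."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource}")
    # Percent-encode the record id so it cannot alter the path structure.
    safe_id = quote(str(record_id), safe="")
    return f"{base_url.rstrip('/')}/records/{safe_id}/{resource}/"


url = record_url("http://i2b2_server", 42, "medications")
```

According to the slide, a GET on such a URL returns the result in RDF/XML, so a client would hand the response body to an RDF parser.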

SMArt Dashboard Web Client Plug-in
  Ability to embed SMArt Apps directly into the i2b2 Web Client
  Ability to access i2b2 patient data via the SMArt connect model/project common API

i2b2 Research Data Warehouse
  A custom i2b2 implementation at Cincinnati Children’s Hospital Medical Center (https://i2b2.cchmc.org)
  Developed by the CCHMC i2b2 team
  Project adds several new capabilities to the i2b2 platform:
    Ability to view clinical data in a web-based form (similar to a chart review)
    Ability to enter data directly into i2b2 using forms
      i.e. data that is not collected from an EMR
    Ability to run reports and perform custom visualizations on the data
  CCHMC uses i2b2 to create a “research data warehouse”
    But what is a research data warehouse?

What is a Research Data Warehouse?
  According to CCHMC…
    A research data warehouse is a repository that integrates information on patients from multiple sources
      Electronic health records
      Lab results
      Genetic and research data
      Birth registry data
      Government data (Medicaid)
  What it is used for:
    Cohort identification, hypothesis generation
  What it is NOT used for:
    Decision support, clinical trials, real-time alerts

Evaluating i2b2
  Performance
    Statistics provided by Partners HealthCare
    Query Performance (on their primary i2b2 system)
      4.6 million patient records
      1.2 billion observations (facts) on these patients (observation_fact table)
        – Queries requesting patient counts on this repository typically complete within 10 seconds, many within several milliseconds
    Data Mart Initialization Performance
      2.6 million patient records
      550 million observations (facts) on these patients
      8 x 3 GHz processor machine with 32 GB RAM
        – Completed building in approximately 1 hour and 15 minutes
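
A little arithmetic puts these figures in perspective. The inputs below are the numbers quoted on the slide; the derived values are back-of-the-envelope averages (roughly 261 and 212 facts per patient, and on the order of 122,000 facts loaded per second), not measurements reported by Partners HealthCare.

```python
# Figures quoted on the slide.
ENTERPRISE_PATIENTS = 4_600_000
ENTERPRISE_FACTS = 1_200_000_000
MART_PATIENTS = 2_600_000
MART_FACTS = 550_000_000
MART_BUILD_SECONDS = (60 + 15) * 60  # approximately 1 hour and 15 minutes


def per_unit(total, units):
    """Average amount of `total` per single unit."""
    return total / units


facts_per_patient_enterprise = per_unit(ENTERPRISE_FACTS, ENTERPRISE_PATIENTS)
facts_per_patient_mart = per_unit(MART_FACTS, MART_PATIENTS)
facts_loaded_per_second = per_unit(MART_FACTS, MART_BUILD_SECONDS)
```

These averages hide a lot of variation between patients, but they show the scale a fact-table index has to cope with.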

Evaluating i2b2
  Scalability
    Enabled by the modular nature of the i2b2 cell and ease of integration into the Hive
      Encourages development outside of the i2b2 core team
      Fosters rapid software development
  Usability
    Simple installation processes to get started
    Intuitive user interfaces
    Wealth of documentation publicly available online
      Reduced learning curve
  Interoperability
    Works on a variety of operating systems, web browsers, and server technologies
      Not limited to commercial technologies

Limitations
  Naturally, users can create project-level repositories (data marts) from an enterprise-level repository
    Can we update our project databases with fresh, updated enterprise data?
    Can we upload our project data, regardless of origin, into the enterprise repository?
  Such capabilities are not currently supported in i2b2
    Difficult to implement the numerous policies required for these functions

Limitations
  i2b2 cells communicate through web services, which are not always flexible
    Perhaps we want to execute our own SQL queries?
      Not possible; queries are limited to pre-specified queries and result sets, dictated by the cells
  How do we overcome this?
    Developers are planning to introduce a second SQL access layer to the CRC
      Will allow for greater flexibility with queries
        – But will need to comply with security rules and strict ontology

Summary
  Presented i2b2 as a software tool and a data model aiding in clinical research and discovery
    Addresses the inherent challenges of integrating medical record and clinical research data
  Relatively young project, but on the fast track for growth and development
    Roadmap for future releases with a new version currently in release candidate (RC) status
  Adoption and usage in BMI looks promising
    Approximately 17 sites outside of Partners HealthCare are engaged in i2b2 projects