

Jianwei Leng, MS¹,², Brett R. South, MS¹,²,³, Brad Adams, MS¹,², Tyler B. Forbush, Shuying Shen, MStat¹,²,³, Scott L. DuVall, PhD, Wendy Chapman, PhD⁵

¹VA Salt Lake City Health Care System, IDEAS Center, University of Utah, ²Department of Internal Medicine, ³Biomedical Informatics, and ⁴Radiology, University of Utah, Salt Lake City, UT; ⁵University of California, San Diego, Division of Biomedical Informatics, La Jolla, California

•  Client application that runs on most operating systems that support Java, including Microsoft Windows x86/x64 platforms, Apple Mac OS X, Sun Solaris, and Linux.

•  Supports standardized formats, including a file-folder system and structured XML inputs and outputs, allowing integration with other open-source tools for annotation and knowledge management, such as Knowtator³ and Protégé⁴.
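The poster does not document eHOST's XML format. As a hedged illustration of why structured XML input and output make it easier to exchange annotations with other tools, the Python sketch below writes span annotations to XML and reads them back. The `Annotation` type, the `to_xml`/`from_xml` functions, and the element and attribute names (`annotations`, `annotation`, `start`, `end`, `class`, `annotator`) are all hypothetical and do not reproduce the actual eHOST or Knowtator schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One span annotation: character offsets, covered text, class, annotator."""
    start: int
    end: int
    text: str
    cls: str
    annotator: str


def to_xml(doc_id: str, annotations: list[Annotation]) -> str:
    """Serialize one document's annotations to an XML string.

    The element and attribute names are hypothetical, not eHOST's schema.
    """
    root = ET.Element("annotations", {"document": doc_id})
    for a in annotations:
        el = ET.SubElement(root, "annotation", {
            "start": str(a.start),
            "end": str(a.end),
            "class": a.cls,
            "annotator": a.annotator,
        })
        el.text = a.text  # keep the covered text so the file is human-readable
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text: str) -> tuple[str, list[Annotation]]:
    """Parse XML written by to_xml back into (document id, annotations)."""
    root = ET.fromstring(xml_text)
    annotations = [
        Annotation(
            start=int(el.get("start")),
            end=int(el.get("end")),
            text=el.text or "",
            cls=el.get("class"),
            annotator=el.get("annotator"),
        )
        for el in root.findall("annotation")
    ]
    return root.get("document"), annotations


if __name__ == "__main__":
    note = "Patient denies chest pain."
    anns = [Annotation(15, 25, note[15:25], "Symptom", "A1")]
    xml_text = to_xml("note_001.txt", anns)
    print(xml_text)
    # Round trip: parsing the output recovers the same annotations.
    assert from_xml(xml_text) == ("note_001.txt", anns)
```

Storing character offsets alongside the covered text lets a consuming tool verify that the offsets still line up with the source document before importing.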

Objectives

Systems Architecture

Server Integration and Future Work

References: 1. South BR, Shen S, Leng J, Forbush T, DuVall SL, Chapman WW. A Prototype Tool Set to Support Machine-Assisted Annotation. In: BioNLP 2012; 2012; Montreal, Canada.

Contact information: [email protected]; VA Consortium for Healthcare Informatics Research, 500 Foothill Drive, Salt Lake City, UT 84148; (801) 499-1175

Acknowledgements: VA Consortium for Healthcare Informatics Research (CHIR), VA HSR HIR 08-374; the VA Informatics and Computing Infrastructure (VINCI), VA HIR 08-204; NIH Grant U54 HL 108460 for integrating Data for Analysis, Anonymization and Sharing (iDASH); and NIGMS 7R01GM090187.

eHOST System Features

Availability and Documentation

Abstract

•  Introduce an open-source annotation tool, the Extensible Human Oracle Suite of Tools (eHOST), and a server-side administration component, the Chart Review Administration Server for Patient Review (CASPR).

•  Describe basic and advanced system functionality, including an annotation interface, error analysis and reporting, integration of machine-assisted approaches, and semi-automated curation of information.

Manually annotating documents is costly, time-consuming, and labor-intensive. There is a clear opportunity to develop new tools, and to assess functionality, that make generating reference standards for a variety of development tasks more efficient. In the biomedical domain, infrastructure is needed to support large-scale, secure annotation of sensitive clinical data as well as distributed annotation.

Figure 1. eHOST (Extensible Human Oracle Suite of Tools)

•  Oracle mode: find identical strings of text and annotate them with the same annotation class (Figure 5).

•  Semi-automated curation: reduce the number of candidate entries in pre-annotation dictionaries and improve the processing speed of machine-assisted pre-annotation.

•  Integrated regular expression builder: build and apply custom regular expression libraries to identify specific terms or other information that commonly occurs in clinical reports.

•  Integrated UMLS search function: supports the data normalization tasks often associated with annotating clinical text (Figure 4).
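Oracle mode and the regular expression builder both come down to locating spans of text by pattern. A minimal Python sketch, assuming that oracle mode proposes a seed annotation's class for every other whole-word, case-insensitive occurrence of the same string, might look like the following. The `Span` type, `propagate_annotation`, `apply_regex_library`, the matching rules, and the sample blood-pressure pattern are illustrative assumptions, not eHOST's implementation.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    cls: str


def propagate_annotation(text: str, seed: Span) -> list[Span]:
    """Oracle-style sketch: propose the seed's class for other occurrences.

    Assumes whole-word, case-insensitive matching of the seed's exact string.
    """
    target = text[seed.start:seed.end]
    # re.escape makes the annotated text literal; \b requires word boundaries.
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    return [
        Span(m.start(), m.end(), seed.cls)
        for m in pattern.finditer(text)
        if (m.start(), m.end()) != (seed.start, seed.end)  # skip the seed itself
    ]


def apply_regex_library(text: str, library: dict[str, str]) -> list[Span]:
    """Run a hypothetical {class name: regex} library over one document."""
    spans = [
        Span(m.start(), m.end(), cls)
        for cls, pattern in library.items()
        for m in re.finditer(pattern, text, re.IGNORECASE)
    ]
    return sorted(spans)  # order by start offset, then end, then class


if __name__ == "__main__":
    note = "Chest pain at rest. Denies chest pain on exertion."
    print(propagate_annotation(note, Span(0, 10, "Symptom")))
    library = {"BloodPressure": r"\b\d{2,3}/\d{2,3}\b"}  # hypothetical entry
    print(apply_regex_library("BP 120/80 today.", library))
```

For example, seeding `Span(0, 10, "Symptom")` on "Chest pain at rest. Denies chest pain on exertion." proposes `Span(27, 37, "Symptom")` for the second mention. The word boundaries keep a seed such as "pain" from matching inside "painful", but they also prevent matches for strings that begin or end with punctuation, so a real tool would need a more careful rule.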

Basic System Features

•  Schema builder: using eHOST and/or CASPR, users can design annotation schemas that represent information classes, assign attributes, and build relations between classes (Figures 1 and 2).

•  Corpus management: eHOST provides workspace and active project editors. CASPR supports a MySQL database backend (Figures 3 and 5).

•  Annotation mode: identify and mark candidate spans of text using the annotation schema (Figure 1). eHOST also supports difference matching, error checking, and calculation of standard reporting metrics.
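The poster does not specify which reporting metrics eHOST computes or how spans are matched. A minimal Python sketch, assuming exact matching on start offset, end offset, and class, and treating one annotator as the reference, shows how difference matching yields lists of disagreements plus the usual precision, recall, and F1. The `Span` and `Comparison` types and the `compare` function are hypothetical.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    cls: str


class Comparison(NamedTuple):
    matches: set[Span]         # spans both annotators marked identically
    only_reference: set[Span]  # missed by the candidate (false negatives)
    only_candidate: set[Span]  # extra in the candidate (false positives)
    precision: float
    recall: float
    f1: float


def compare(reference: set[Span], candidate: set[Span]) -> Comparison:
    """Exact-match difference report (the matching criterion is an assumption)."""
    matches = reference & candidate
    tp = len(matches)
    precision = tp / len(candidate) if candidate else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return Comparison(matches, reference - candidate, candidate - reference,
                      precision, recall, f1)


if __name__ == "__main__":
    ref = {Span(0, 10, "Symptom"), Span(27, 37, "Symptom"), Span(45, 52, "Drug")}
    cand = {Span(0, 10, "Symptom"), Span(27, 37, "Finding")}
    result = compare(ref, cand)
    print(sorted(result.only_reference), sorted(result.only_candidate))
    print(f"P={result.precision:.2f} R={result.recall:.2f} F1={result.f1:.2f}")
```

Swapping reference and candidate only swaps precision and recall, so F1 is symmetric and can also serve as a pairwise agreement score when neither annotator is treated as the gold standard. A partial-overlap criterion would replace the set intersection with an overlap test on offsets.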

•  Coupling eHOST with CASPR enables distributed annotation, allowing a study coordinator to quickly set up new annotation projects, plan and re-plan annotation assignments, and manage submitted data.

•  Data are written to and stored in a queryable database.

•  CASPR tracks which annotations belong to which projects, datasets, tasks, batches, and annotator assignments, so that the appropriate annotations are presented for any assigned task in a project workflow.

•  Future directions include a more formal usability assessment of distributed annotation using the eHOST/CASPR interfaces.

•  API documentation, a demo project, and source code for eHOST are available at http://code.google.com/p/ehost/.
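CASPR's database schema is not described on the poster. The sketch below substitutes Python's built-in `sqlite3` module for the MySQL backend so that it runs without a server, and shows one plausible way to relate projects, datasets, tasks, batches, and annotator assignments so that each annotator's eHOST client receives only the documents and annotations for its assigned task. Every table, column, function name, and demo record here is hypothetical.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for CASPR's MySQL backend.
SCHEMA = """
CREATE TABLE project  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dataset  (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       project_id INTEGER NOT NULL REFERENCES project(id));
CREATE TABLE document (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       dataset_id INTEGER NOT NULL REFERENCES dataset(id));
CREATE TABLE task     (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       project_id INTEGER NOT NULL REFERENCES project(id));
CREATE TABLE batch    (id INTEGER PRIMARY KEY,
                       task_id INTEGER NOT NULL REFERENCES task(id));
CREATE TABLE batch_document (
    batch_id    INTEGER NOT NULL REFERENCES batch(id),
    document_id INTEGER NOT NULL REFERENCES document(id),
    PRIMARY KEY (batch_id, document_id));
CREATE TABLE assignment (id INTEGER PRIMARY KEY, annotator TEXT NOT NULL,
                         batch_id INTEGER NOT NULL REFERENCES batch(id));
CREATE TABLE annotation (
    id            INTEGER PRIMARY KEY,
    assignment_id INTEGER NOT NULL REFERENCES assignment(id),
    document_id   INTEGER NOT NULL REFERENCES document(id),
    span_start    INTEGER NOT NULL,
    span_end      INTEGER NOT NULL,
    cls           TEXT NOT NULL);
"""

DEMO_DATA = """
INSERT INTO project VALUES (1, 'Pilot');
INSERT INTO dataset VALUES (1, 'Discharge summaries', 1);
INSERT INTO document VALUES (1, 'note_001.txt', 1), (2, 'note_002.txt', 1);
INSERT INTO task VALUES (1, 'Symptoms', 1);
INSERT INTO batch VALUES (1, 1);
INSERT INTO batch_document VALUES (1, 1), (1, 2);
INSERT INTO assignment VALUES (1, 'A1', 1), (2, 'A2', 1);
INSERT INTO annotation VALUES
    (1, 1, 1, 15, 25, 'Symptom'),
    (2, 2, 1, 15, 25, 'Symptom'),
    (3, 1, 2, 0, 5, 'Symptom');
"""


def demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA + DEMO_DATA)
    return conn


def documents_for(conn: sqlite3.Connection, annotator: str, task: str) -> list[str]:
    """Documents to sync to one annotator's client for one task."""
    rows = conn.execute(
        """SELECT DISTINCT d.name FROM assignment s
           JOIN batch b ON b.id = s.batch_id
           JOIN task t ON t.id = b.task_id
           JOIN batch_document bd ON bd.batch_id = b.id
           JOIN document d ON d.id = bd.document_id
           WHERE s.annotator = ? AND t.name = ?
           ORDER BY d.name""",
        (annotator, task))
    return [name for (name,) in rows]


def annotations_for(conn: sqlite3.Connection, annotator: str, task: str) -> list[tuple]:
    """(document, start, end, class) rows recorded under one annotator's assignment."""
    return conn.execute(
        """SELECT d.name, a.span_start, a.span_end, a.cls FROM annotation a
           JOIN assignment s ON s.id = a.assignment_id
           JOIN batch b ON b.id = s.batch_id
           JOIN task t ON t.id = b.task_id
           JOIN document d ON d.id = a.document_id
           WHERE s.annotator = ? AND t.name = ?
           ORDER BY d.name, a.span_start""",
        (annotator, task)).fetchall()


if __name__ == "__main__":
    conn = demo_db()
    print(documents_for(conn, "A1", "Symptoms"))
    print(annotations_for(conn, "A1", "Symptoms"))
```

With the demo data, `documents_for(conn, "A1", "Symptoms")` returns both notes in the batch, while `annotations_for` returns only the spans recorded under A1's assignment, keeping each annotator's work separate until adjudication.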

Figure 2. Chart Review Administration Server for Patient Review (CASPR)

[Diagram labels: Study Coordinator; CASPR Annotation Admin Server; Corpus & Schemas; five eHOST clients; Internet]

Figure 4. Embedded UMLS Search Function in eHOST

Figure 3. Complete Solution for Document-Level Review Using eHOST and CASPR

[Diagram labels: MySQL Database; Annotations; Coordinator Web Interface; Generate Schema; Load Files; Assign Tasks & Define Workflow; Annotators A1, A2, A3; Adjudicators Adj1, Adj2; Sync to eHOST for Annotator A1; Task for A1; eHOST A1; Human Annotation Using eHOST; Adjudication; eHOST; CASPR]

Figure 5. eHOST/CASPR Workflow

Realizing Efficient Annotation with eHOST: extensible Human Oracle Suite of Tools