Enabling semantic search in a bio-specimen repository, July 9th, 2013, ICBO 2013. Shahim Essaid, Carlo Torniai, and Melissa Haendel



DESCRIPTION

Paper presented at the International Conference on Biomedical Ontology (ICBO) 2013.


Page 1: Enabling semantic search in a bio-specimen repository - ICBO 2013

Enabling semantic search in a bio-specimen repository

July 9th, 2013, ICBO 2013

Shahim Essaid, Carlo Torniai, and Melissa Haendel

Page 2: Enabling semantic search in a bio-specimen repository - ICBO 2013

OHSU’s Biolibrary Search Engine

Data aggregated from four repositories with plans for additional repositories

A web-based search engine over de-identified data

Our goal was to develop a controlled application ontology to support search capabilities

Page 3: Enabling semantic search in a bio-specimen repository - ICBO 2013

OHSU Biolibrary system

Page 4: Enabling semantic search in a bio-specimen repository - ICBO 2013

Search application

Two search interfaces (with no data integration)

Limited free text search

Page 5: Enabling semantic search in a bio-specimen repository - ICBO 2013

Search application

Search through anatomy and histology lists

Multiple wizard-like forms

Page 6: Enabling semantic search in a bio-specimen repository - ICBO 2013

Example coded data vs. pathology report

(Available structured data from one case)

However, pathology report also includes:
• Low grade pancreatic intraepithelial neoplasia
• Extensive perineural invasion
• Acute and chronic cholecystitis
• Bile duct tissue with chronic inflammation
• Chronic pancreatitis
• Acute gastric serositis

Page 7: Enabling semantic search in a bio-specimen repository - ICBO 2013

Entity recognition with MetaMap

Page 8: Enabling semantic search in a bio-specimen repository - ICBO 2013

Selected mapping examples (the same report from earlier)

Final Pathologic Diagnosis:

A: Gallbladder, cholecystectomy:

- Acute and chronic cholecystitis

- Negative for malignancy

B: Bile ductular tissue, biopsy:

- Bile duct tissue with chronic inflammation

- Negative for malignancy

Page 9: Enabling semantic search in a bio-specimen repository - ICBO 2013

Selected mapping examples (the same report from earlier)

C: Superior mesenteric vein margin, biopsy:

- Vascular tissue with no diagnostic abnormality

- Negative for malignancy

D: Portal vein margin, biopsy:

- Fibroconnective tissue with no diagnostic abnormality

- Negative for malignancy

Page 10: Enabling semantic search in a bio-specimen repository - ICBO 2013

Selected mapping examples (the same report from earlier)

E: Pancreas, stomach, duodenum, pancreaticogastroduodenectomy:

- Pancreatic ductal adenocarcinoma, grade 2/3, invading peripancreatic fat

- Size: 3 cm in greatest dimension

- Pancreatic neck margin positive for invasive carcinoma (please see comment)

- Superior mesenteric artery margin negative at 0.2 cm from invasive tumor, deep pancreatic margin negative at 0.6 cm from invasive tumor

- Extensive perineural invasion present

- No angiolymphatic invasion identified

- Metastatic pancreatic ductal adenocarcinoma present in two of ten peripancreatic lymph nodes (2/10)

Page 11: Enabling semantic search in a bio-specimen repository - ICBO 2013

Deriving an OWL ontology for DL queries

Page 12: Enabling semantic search in a bio-specimen repository - ICBO 2013

Adding relationships (developing an application ontology to support search)

“Subclass of” axioms were generated from the UMLS hierarchy table

Mapped entities were augmented with the transitive closure of their parents

“Part of” axioms were generated by aggregating many mereological relationships from the UMLS relationship table

Anatomy, pathology, and disease entities were related through SNOMED-CT disorder/disease definitions
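The transitive-closure step on this slide can be illustrated with a minimal Python sketch. It assumes the hierarchy table has already been reduced to (child, parent) pairs; the concept labels and function names below are hypothetical placeholders, not real UMLS identifiers or the authors' code.

```python
from collections import defaultdict


def build_parent_index(hierarchy_rows):
    """Index (child, parent) pairs as child -> set of direct parents."""
    parents = defaultdict(set)
    for child, parent in hierarchy_rows:
        parents[child].add(parent)
    return parents


def ancestor_closure(concept, parents):
    """Return every ancestor reachable from `concept` via direct-parent links."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # the seen-set also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def subclass_axioms(mapped_concepts, parents):
    """Direct 'subclass of' pairs linking mapped concepts to all their ancestors."""
    needed = set(mapped_concepts)
    for concept in mapped_concepts:
        needed |= ancestor_closure(concept, parents)
    return sorted(
        (child, parent) for child in needed for parent in parents.get(child, ())
    )


# Hypothetical rows standing in for an extract of the hierarchy table.
ROWS = [
    ("ACUTE_CHOLECYSTITIS", "CHOLECYSTITIS"),
    ("CHOLECYSTITIS", "GALLBLADDER_DISORDER"),
    ("GALLBLADDER_DISORDER", "BILIARY_TRACT_DISORDER"),
    ("PANCREATITIS", "PANCREAS_DISORDER"),
]
PARENTS = build_parent_index(ROWS)
AXIOMS = subclass_axioms(["ACUTE_CHOLECYSTITIS"], PARENTS)
```

Only concepts that were actually mapped, plus their ancestors, contribute axioms, which keeps the derived ontology much smaller than the full source hierarchy.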

Page 13: Enabling semantic search in a bio-specimen repository - ICBO 2013

Adding relationships (developing an application ontology to support search)

Problematic multiple and cyclic inheritance was resolved manually

The result was an OWL ontology that supports useful DL queries along the “subclass of” and “part of/has part” axes. Examples:
• Retrieve all pathologies (limited to a type if needed) that affect an anatomical site (± all parts)
• Retrieve all anatomical sites with a specific type of pathology
• List all pathologies/sites for a disease
• Etc.

The MetaMap mappings were saved in a database table. After relevant concepts are identified with a DL query, a database query can find actual reports.
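The two-step lookup described here (concepts first, reports second) might look roughly like the sketch below. It is not a DL reasoner: plain graph traversal over “part of” edges stands in for the DL query, and the `mapping` table, its columns, and all concept labels are hypothetical.

```python
import sqlite3


def descendants(root, edges):
    """All nodes that reach `root` by following child -> parents edges."""
    children = {}
    for child, targets in edges.items():
        for target in targets:
            children.setdefault(target, set()).add(child)
    found, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def site_with_parts(site, part_of):
    """Step 1 (stand-in for the DL query): the site plus everything that is part of it."""
    return {site} | descendants(site, part_of)


def reports_mentioning(conn, concepts):
    """Step 2: look up report ids in the mapping table for the selected concepts."""
    if not concepts:
        return []
    placeholders = ",".join("?" for _ in concepts)
    rows = conn.execute(
        "SELECT DISTINCT report_id FROM mapping "
        f"WHERE concept_id IN ({placeholders}) ORDER BY report_id",
        sorted(concepts),
    )
    return [row[0] for row in rows]


# Hypothetical 'part of' edges and mapping rows, for illustration only.
PART_OF = {"GALLBLADDER": {"BILIARY_SYSTEM"}, "BILE_DUCT": {"BILIARY_SYSTEM"}}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mapping (report_id TEXT, concept_id TEXT)")
conn.executemany(
    "INSERT INTO mapping VALUES (?, ?)",
    [("report-1", "GALLBLADDER"), ("report-2", "BILE_DUCT"), ("report-3", "PANCREAS")],
)

HITS = reports_mentioning(conn, site_with_parts("BILIARY_SYSTEM", PART_OF))
```

Keeping concept selection separate from report retrieval means the ontology can change without touching the stored mappings.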

Page 14: Enabling semantic search in a bio-specimen repository - ICBO 2013

SNOMED-CT examples of disorder definitions (used to relate anatomy to pathology in the application ontology)

Page 15: Enabling semantic search in a bio-specimen repository - ICBO 2013

Application integration

Integration with the existing application was limited to appending the annotations to the text of pathology reports:

| C1521733 C0332144 0:26 | C0016976 32:44 | C0205178 63:70 | …

Annotations (CUIs and locations) are then indexed in Solr and can be searched with the existing free-text search form, after a DL query on the OWL file has identified the relevant CUIs.
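As an illustration of the appended annotation format, the following Python sketch parses the pipe-delimited segments shown on this slide. The reading of each `N:M` pair as start and end character offsets is an assumption, and the `Annotation` type and `parse_annotations` function are hypothetical names.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    cuis: tuple   # one or more UMLS CUIs mapped to the same text span
    start: int    # assumed: start character offset in the report
    end: int      # assumed: end character offset in the report


# One segment: one or more CUIs ("C" plus seven digits), then "start:end".
SEGMENT = re.compile(r"^((?:C\d{7}\s+)+)(\d+):(\d+)$")


def parse_annotations(suffix):
    """Parse '| CUI [CUI ...] start:end | ...' into Annotation records."""
    annotations = []
    for segment in suffix.split("|"):
        match = SEGMENT.match(segment.strip())
        if match is None:
            continue  # skip empty or unrecognized segments
        cuis = tuple(match.group(1).split())
        annotations.append(
            Annotation(cuis, int(match.group(2)), int(match.group(3)))
        )
    return annotations


# The three segments visible on the slide.
PARSED = parse_annotations("| C1521733 C0332144 0:26 | C0016976 32:44 | C0205178 63:70 |")
```

Because the annotations are plain tokens in the report text, an unmodified full-text index can match a CUI the same way it matches any other word.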

Page 16: Enabling semantic search in a bio-specimen repository - ICBO 2013

A simple DL query for anatomy

(linked to actual report in the mapping table)

Page 17: Enabling semantic search in a bio-specimen repository - ICBO 2013

Difficulties and limitations

“Structured” text in pathology reports is not natural language, so MetaMap performs less well on it

Named entity recognition helps with document retrieval but extraction of structured data is more valuable

Negation detection is poor but very important

Significant multiple inheritance and subsumption cycles (inappropriate equivalences) when several UMLS vocabularies are used to derive an OWL representation

Short project, no access to full reports, limited computational resources
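The subsumption cycles and multiple inheritance mentioned on this slide can at least be flagged automatically before manual review. The sketch below is one possible approach using depth-first search, not the authors' procedure; the concept labels and function names are hypothetical.

```python
def find_subsumption_cycle(parents):
    """Return one cycle as a list of concepts (first == last), or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    state = {}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for parent in sorted(parents.get(node, ())):
            colour = state.get(parent, WHITE)
            if colour == GREY:  # parent is already on the current path: a cycle
                return path[path.index(parent):] + [parent]
            if colour == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in sorted(parents):
        if state.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def multi_parent_concepts(parents):
    """Concepts with more than one direct parent, as candidates for manual review."""
    return sorted(c for c, direct in parents.items() if len(direct) > 1)


# Hypothetical child -> parents maps, as might arise when vocabularies are merged.
CYCLIC = {"TUMOR": {"NEOPLASM"}, "NEOPLASM": {"MASS"}, "MASS": {"TUMOR"}}
ACYCLIC = {"ADENOCARCINOMA": {"CARCINOMA", "GLANDULAR_NEOPLASM"}, "CARCINOMA": {"NEOPLASM"}}
```

The recursive visit is fine for small extracts; a hierarchy with very deep chains would need an iterative version to stay within Python's recursion limit.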

Page 18: Enabling semantic search in a bio-specimen repository - ICBO 2013

Conclusions

OHSU Biolibrary is adding many other specimen collections, so the need for better search will increase

Can use NER to enhance the data with SNOMED-CT

Interest in identifying references in pathology reports to specimen blocks and slides to annotate these resources as well

Still limited resources for supporting sophisticated terminology and semantic efforts…

Page 19: Enabling semantic search in a bio-specimen repository - ICBO 2013

Thanks

Dr. Chris Corless

Rob Schuff

Medical Research Foundation of Oregon