Open Health Natural Language Processing Consortium
• www.ohnlp.org (part of caBIG Vocabulary Knowledge Center web presence)
• Goal
  • Foster an open-source collaborative community around clinical NLP that can deliver best-of-breed annotators, leverage the dynamic features of UIMA flow-control, and establish the infrastructure for clinical NLP.
• Two open source releases as part of OHNLP
  • Mayo’s pipeline for processing clinical notes (cTAKES)
  • IBM’s pipeline for processing medical notes (MedKAT) and pathology reports (MedKAT/P)
cTAKES Technical Details
• Open source release March 15, 2009
  • www.ohnlp.org
  • Downloads: Documentation and Downloads
  • Technical details: Publications
• Framework
  • IBM’s Unstructured Information Management Architecture (UIMA) open source framework
• Methods
  • Natural Language Processing methods (NLP)
• Application
  • High-throughput phenotype extraction system (80M+ notes; 80B+ tokens)
cTAKES Components
• Core components
  • Sentence boundary detection (OpenNLP)
  • Tokenization (rule-based)
  • Morphologic normalization (NLM’s “norm”)
  • POS tagging (OpenNLP)
  • Shallow parsing (OpenNLP)
  • Named Entity Recognition
    • Diseases/disorders, signs/symptoms, procedures, anatomical sites, medications
    • Dictionary mapping (lookup algorithm)
    • Machine learning (MAWUI)
  • Negation and status identification (NegEx)
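To make the component list concrete, here is a minimal Python sketch of three of its stages chained together: rule-based tokenization, dictionary lookup for named entities, and a NegEx-style negation check. It is not cTAKES code and does not use UIMA or OpenNLP. The dictionary entries, the trigger words, the five-token window, and all function names are assumptions made for illustration only.

```python
import re
from dataclasses import dataclass

# Hypothetical toy dictionary: token sequence -> semantic group.
# A real system maps to a terminology; these entries are made up.
DICTIONARY = {
    ("chest", "pain"): "sign/symptom",
    ("pneumonia",): "disease/disorder",
    ("aspirin",): "medication",
}

# A few negation triggers in the spirit of NegEx (not its full trigger list).
NEGATION_TRIGGERS = {"no", "denies", "without"}
NEGATION_WINDOW = 5  # assumed number of preceding tokens to scan


@dataclass
class Mention:
    text: str
    group: str
    negated: bool


def split_sentences(text):
    """Naive sentence splitter on end punctuation (a stand-in for a trained model)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Rule-based tokenizer: lowercased alphanumeric runs, punctuation dropped."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def find_mentions(tokens):
    """Longest-match dictionary lookup, then a backward scan for negation triggers."""
    mentions = []
    max_len = max(len(term) for term in DICTIONARY)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so "chest pain" beats "chest".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in DICTIONARY:
                window = tokens[max(0, i - NEGATION_WINDOW):i]
                negated = any(t in NEGATION_TRIGGERS for t in window)
                mentions.append(Mention(" ".join(span), DICTIONARY[span], negated))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return mentions


def run_pipeline(text):
    """Run the stages in order, sentence by sentence."""
    results = []
    for sentence in split_sentences(text):
        results.extend(find_mentions(tokenize(sentence)))
    return results


if __name__ == "__main__":
    for m in run_pipeline("Patient denies chest pain. Started aspirin for pneumonia."):
        print(m)
```

Processing sentence by sentence keeps a negation trigger from leaking into the next sentence, which is one reason sentence boundary detection comes first in the component order.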
cTAKES Type System
cTAKES example
Current Efforts - I
• Anaphoric relations and coreference (as part of the Ontology Development and Information Extraction project, University of Pittsburgh) (2008 - 2011)
  • In collaboration with Chapman and Crowley
• Semantic processing of the clinical text (in collaboration with Palmer, Martin and Ward, University of Colorado) (2009 - 2011)
  • Treebanking (deep parses)
  • Predicate-argument structure and semantic labeling (PropBanking)
  • UMLS relations (except temporal relations)
Current Efforts - II
• Temporal relation discovery (2010-2014)
  • In collaboration with Palmer, Martin and Ward, University of Colorado
• Lexical resources for the clinical domain (2010-2015)
  • In collaboration with Chapman, University of Colorado and Elhadad, Columbia University
  • A la Treebank and clinical named entities with attributes and modifiers