Mining and Processing of Unstructured Medical Data
Cindy Perscheid
Festival of Genomics
London, Jan 19, 2016
Unstructured Medical Data: Information Hidden in Text
■ Doctors' letters and discharge letters
■ Clinical trial descriptions
■ Scientific publications
Perscheid, Schapranow
Processing of Unstructured Medical Data
Chart 2
Unstructured Medical Data: Challenges and Limitations
■ Huge amount of data: PubMed holds references to more than 25 million articles
■ Restricted querying: Only keyword search is available
■ Multilingual sources
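To illustrate why plain keyword search is a restriction, here is a minimal sketch in Python. The two documents and the function names are invented for illustration and are not taken from the talk. An exact keyword match finds the document that uses the query terms literally, but misses one that describes the same condition with a synonym.

```python
import re

# Hypothetical mini-corpus, invented for illustration only.
DOCS = {
    "doc1": "Patients 65 years or older with breast cancer were enrolled.",
    "doc2": "Elderly women with mammary carcinoma received the same therapy.",
}


def tokens(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query, docs):
    """Return ids of documents that contain every query token literally."""
    wanted = tokens(query)
    return [doc_id for doc_id, text in docs.items() if wanted <= tokens(text)]


if __name__ == "__main__":
    # Only doc1 is returned; doc2 uses a synonym and is missed.
    print(keyword_search("breast cancer", DOCS))
```

Closing this gap needs linguistic processing beyond literal token matching, for example synonym dictionaries or entity recognition.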
Chart 3
Natural Language Processing: Selected Methods
■ Named Entity Recognition: Identify and classify key terms, e.g. persons or diseases
■ Part-of-Speech (POS) Tagging: Identify the grammatical function of each word
■ Parsing: Identify sentence structure and components
□ Chunking: Combine words and their POS tags into chunks
□ Relation Extraction: Identify relations between sentence parts
■ Semantic Role Labeling: Identify specific roles in a sentence
■ …
Example (chunking): [Patients 65 years]NP or [older]ADJP [with]PP [breast cancer]NP ...
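The chunking step can be sketched in a few lines of Python. This is a toy illustration, not the tooling used by the authors. The POS tags are assigned by hand in Penn Treebank style (an assumption; a real pipeline would take them from a POS tagger), and the three grouping rules are deliberately simplistic.

```python
from itertools import groupby

# Hand-assigned POS tags (assumption): a real system would use a tagger.
TAGGED = [
    ("Patients", "NNS"), ("65", "CD"), ("years", "NNS"),
    ("or", "CC"),
    ("older", "JJR"),
    ("with", "IN"),
    ("breast", "NN"), ("cancer", "NN"),
]


def chunk_label(pos):
    """Map a POS tag to a chunk type, or None if the token stays outside."""
    if pos.startswith("NN") or pos == "CD":
        return "NP"    # nouns and numbers form noun phrases
    if pos.startswith("JJ"):
        return "ADJP"  # adjectives form adjective phrases
    if pos == "IN":
        return "PP"    # prepositions open prepositional phrases
    return None


def chunk(tagged):
    """Merge consecutive tokens that map to the same chunk type."""
    chunks = []
    for label, group in groupby(tagged, key=lambda pair: chunk_label(pair[1])):
        chunks.append((label, " ".join(word for word, _ in group)))
    return chunks


def render(chunks):
    """Print chunks in bracket notation, leaving unchunked words bare."""
    return " ".join(f"[{words}]{label}" if label else words
                    for label, words in chunks)


if __name__ == "__main__":
    print(render(chunk(TAGGED)))
```

The sketch reproduces the bracketed example for this one sentence. Real chunkers use richer grammars, for instance to keep an adjective inside the noun phrase it modifies.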
[Figure: example sentence annotated with part-of-speech tags (noun, adjective, preposition) and entity types (Person, Disease)]
Chart 4
Outlook: IMDB Textual Analysis Features
■ The in-memory database (IMDB) provides text analysis features, e.g.
□ Full-text indexing
□ Entity recognition
□ Tokenization/chunking
□ Fuzzy search
■ Mechanisms can be made domain-specific by specifying
□ Dictionaries
□ CGUL rules containing regular expressions with linguistic attributes
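The combination of tokenization, a domain dictionary and fuzzy search can be sketched with the Python standard library. This is not the IMDB's API and not CGUL. The dictionary entries, the function names and the similarity cutoff are assumptions chosen for illustration.

```python
import difflib
import re

# Hypothetical domain dictionary (assumption): term -> entity type.
DICTIONARY = {
    "mirtazapine": "Drug",
    "depression": "Disease",
    "cancer": "Disease",
}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def recognize(text, cutoff=0.85):
    """Return (token, dictionary term, entity type) for exact or near matches.

    difflib.get_close_matches scores string similarity, so slightly
    misspelled tokens still map to their dictionary entry.
    """
    hits = []
    for token in tokenize(text):
        match = difflib.get_close_matches(token, list(DICTIONARY),
                                          n=1, cutoff=cutoff)
        if match:
            hits.append((token, match[0], DICTIONARY[match[0]]))
    return hits


if __name__ == "__main__":
    # Both misspelled terms are still resolved via fuzzy matching.
    for hit in recognize("Mirtazapin for major depresion"):
        print(hit)
```

A lower cutoff tolerates more spelling variation but also raises the risk of false matches, so the value would need tuning on real clinical text.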
[Figure labels: Text Retrieval and Extraction; Multi-Core and Parallelization; Reduction of Layers]
Chart 5
Natural Language Processing: Applications
■ Text Summarization
■ Question Answering Systems, e.g. "What disease is mirtazapine predominantly used for?" → "major depression"
■ Machine Translation, e.g. "Hello" → "Bonjour"
■ Information Retrieval and Extraction
■ Doctor's Letter Explanation
Chart 6
Application Example: Question Answering (Still a Lot to Improve…)
■ In short: Slow tools, wrong results
□ Too hard: Natural language is complex
□ Too much data: More than 25 million papers in PubMed…
Credit: Dr. Mariana Neves, Hasso Plattner Institute
Chart 7
Thanks!
Hasso Plattner Institute Enterprise Platform & Integration Concepts
August-Bebel-Str. 88 14482 Potsdam, Germany
Dr. Matthieu-P. Schapranow [email protected]
http://we.analyzegenomes.com/
Cindy Perscheid, M. Sc. [email protected]
Chart 8