Mapping biomedical literature into UMLS concepts: MetaMap
Presented By: Osama Jomaa, Miami University

Unified Medical Language System & MetaMap



Unified Medical Language System (UMLS)

Motivation

“... to facilitate the development of computer systems that behave as if they "understand" the meaning of the language of biomedicine and health.”

National Library of Medicine

UMLS Components

1. Metathesaurus

Over 1 million biomedical concepts from more than 100 source vocabularies.

2. Semantic Network

133 categories (semantic types) & 54 relationships.

3. SPECIALIST Lexicon & Lexical Tools

Software programs to aid in NLP.

Metathesaurus

Source vocabularies include:
● Patient care controlled terms
● Biomedical vocabularies from different languages
● Clinical/health services research
● Health services billing
● Biomedical literature catalogs
● Public health statistics
● ...

5,000,000 biomedical terms, 1,000,000 concepts and 100+ source vocabularies, stored in relational DB tables.

Metathesaurus
● Concepts are classified into categories:
– Diagnosis
– Procedures & Supplies
– Diseases
– ….
● Concepts have a unique identifier.
● Concepts have preferred terms.
● Concepts can be grouped into subsets by applying filters.

Source Vocabularies Categories

One Concept Many Terms

One concept can have many terms in multiple vocabularies.

Example: Atrial Fibrillation

Preferred Terms

Concept: Hodgkin's Disease

Unique Identifiers

● Concept Unique Identifier (CUI)

All the names in all the source vocabularies that mean the same thing are linked to one concept, which is assigned a unique identifier, the CUI.

● Lexical Unique Identifier (LUI)

Groups the lexical variants of a term, as detected by the Lexical Variant Generator (LVG) program.

● String Unique Identifier (SUI)

Identifies each distinct string; any variation in character set, upper/lower case, or punctuation is a separate string.

● Atom Unique Identifier (AUI)

Every occurrence of a string in each source vocabulary is assigned a unique identifier, the AUI.
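The four identifier levels nest: one CUI covers several LUIs, each LUI covers several SUIs, and each SUI covers one AUI per occurrence in a source vocabulary. The sketch below illustrates that nesting with a handful of hand-made atoms. The class name, the identifier values ("C-1", "L-1", ...) and the choice of sources are placeholders invented for this illustration, not real UMLS data.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class IdentifierLevels {
    // One atom: a single occurrence of a string in one source vocabulary.
    static final class Atom {
        final String cui, lui, sui, aui, source, text;

        Atom(String cui, String lui, String sui, String aui, String source, String text) {
            this.cui = cui;
            this.lui = lui;
            this.sui = sui;
            this.aui = aui;
            this.source = source;
            this.text = text;
        }
    }

    // Placeholder identifiers, NOT real UMLS values.
    static List<Atom> sampleAtoms() {
        return List.of(
            // Same string in two sources: same SUI, different AUIs.
            new Atom("C-1", "L-1", "S-1", "A-1", "MSH", "Atrial Fibrillation"),
            new Atom("C-1", "L-1", "S-1", "A-2", "PSY", "Atrial Fibrillation"),
            // Plural form: a lexical variant, so same LUI but a new SUI.
            new Atom("C-1", "L-1", "S-2", "A-3", "MSH", "Atrial Fibrillations"),
            // Synonym: same concept (CUI) but a different LUI.
            new Atom("C-1", "L-2", "S-3", "A-4", "MSH", "Auricular Fibrillation"));
    }

    // Count how many distinct identifiers exist at one level.
    static int distinct(List<Atom> atoms, Function<Atom, String> level) {
        return atoms.stream().map(level).collect(Collectors.toSet()).size();
    }

    public static void main(String[] args) {
        List<Atom> atoms = sampleAtoms();
        System.out.println("CUIs: " + distinct(atoms, a -> a.cui));
        System.out.println("LUIs: " + distinct(atoms, a -> a.lui));
        System.out.println("SUIs: " + distinct(atoms, a -> a.sui));
        System.out.println("AUIs: " + distinct(atoms, a -> a.aui));
    }
}
```

With these four atoms the counts grow from one CUI to two LUIs, three SUIs and four AUIs, which is the "one concept, many terms" picture in miniature.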

Semantic Network

● Semantic Types

133 types; every Metathesaurus concept is assigned at least one semantic type.

● Semantic Relationships

54 relationships. Is-a is the most important.

Semantic Network

Semantic Types Examples:
✔ Organisms
✔ Anatomical structures
✔ Biologic function
✔ Chemicals
✔ Physical objects

Top-level types: Entity, Event

Semantic Relationships Examples:
✔ Physically related to
✔ Spatially related to
✔ Temporally related to
✔ Functionally related to
✔ Conceptually related to
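Because is-a links arrange the semantic types in a tree, a question such as "is this type a kind of Event?" reduces to walking up parent links. The following is a minimal sketch over a small hand-copied fragment of the hierarchy. The class and method names are hypothetical, and the fragment should be checked against the Semantic Network files of the UMLS release in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SemanticTypeHierarchy {
    // child -> parent (is-a) links for one branch of the Semantic Network.
    // Hand-copied fragment for illustration only.
    static final Map<String, String> PARENT = Map.of(
        "Neoplastic Process", "Disease or Syndrome",
        "Disease or Syndrome", "Pathologic Function",
        "Pathologic Function", "Biologic Function",
        "Biologic Function", "Natural Phenomenon or Process",
        "Natural Phenomenon or Process", "Phenomenon or Process",
        "Phenomenon or Process", "Event");

    // All ancestors of a type, nearest first.
    static List<String> ancestors(String type) {
        List<String> chain = new ArrayList<>();
        String current = PARENT.get(type);
        while (current != null) {
            chain.add(current);
            current = PARENT.get(current);
        }
        return chain;
    }

    // True if 'type' equals 'ancestor' or reaches it through is-a links.
    static boolean isA(String type, String ancestor) {
        return type.equals(ancestor) || ancestors(type).contains(ancestor);
    }

    public static void main(String[] args) {
        System.out.println(ancestors("Neoplastic Process"));
        System.out.println(isA("Neoplastic Process", "Event"));
        System.out.println(isA("Neoplastic Process", "Entity"));
    }
}
```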

Lexical Tools

● The SPECIALIST Lexicon

An English lexicon (dictionary) of over 200,000 biomedical terms, drawn from a variety of sources, to aid in NLP.

● Lexical Variant Generator (LVG)

● Norm: normalizer

● Wordind: tokenizer
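To give a feel for what a normalizer does, here is a deliberately simplified sketch: it lowercases a term, drops possessive endings, turns punctuation into spaces and sorts the words, so that differently ordered or punctuated variants collapse to one key. This is not the Norm program itself, which is assumed to do more (for example handling stop words and inflection). The class name and the exact steps are assumptions made for illustration, and the character class only covers ASCII letters and digits.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

class SimpleNorm {
    // Simplified normalization: NOT the real UMLS Norm algorithm.
    static String normalize(String term) {
        String s = term.toLowerCase(Locale.ROOT)
                .replaceAll("'s\\b", "")        // drop possessive endings
                .replaceAll("[^a-z0-9]+", " ")  // punctuation -> single space
                .trim();
        if (s.isEmpty()) {
            return "";
        }
        // Sort the words so that word order no longer matters.
        return Arrays.stream(s.split(" "))
                .sorted()
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(normalize("Hodgkin's Disease"));
        System.out.println(normalize("Disease, Hodgkin's"));
    }
}
```

Both calls in `main` produce the same key, which is how a lookup table keyed on normalized strings can treat the two spellings as the same term.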

MetaMap

Why Concept Identification?

● Information extraction/Data mining

● Classification/Categorization

● Text summarization

● Question answering

● Literature-based Knowledge Discovery

Example

Phrase: “lung cancer.”

Meta Candidates (8):

1000 Lung Cancer {MDR,DXP} (Malignant neoplasm of lung) [Neoplastic Process]

1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]

861 Cancer (Malignant Neoplasms) [Neoplastic Process]

861 Lung [Body Part, Organ, or Organ Component]

861 Cancer (Cancer Genus) [Invertebrate]

861 Lung (Entire lung) [Body Part, Organ, or Organ Component]

861 Cancer (Specialty Type - cancer) [Biomedical Occupation or Discipline]

768 Pneumonia [Disease or Syndrome]

Meta Mapping (1000):

1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]

Meta Mapping (1000):

1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]

The Algorithm

MetaMap Options

● Word Sense Disambiguation (-y)

Uses the surrounding context to decide which concept is the best choice.

● Negation (--negex)

Identifies negated entities.

Examples

● WSD Examples

– “Fifteen (6.4%) of 234 colds treated with placebo ..”
  ● Cold (cold temperature) [npop]
  ● Cold (Common cold) [dsyn]
  ● Cold (Cold Sensation) [phsf]

– “.. the drugs were compared in two four-point, double-blind bioassays.”
  ● Double (Diplopia) [dsyn] vs. Double (Duplicate) [ftcn]
  ● Blind (Blind Vision) [dsyn] vs. BLIND (Blinded) [resa] vs. Blind (Visually impaired persons) [podg]
  ● Bioassays (Biological Assay) [lbpr]

Examples

● Negation Example

– “There is no focal infiltrate or pleural effusion.”

– --negex output (in addition to the normal output):

NEGATIONS:
Negation Type: nega
Negation Trigger: no
Negation PosInfo: 9/2
Negated Concept: C0332448:Infiltrate
Concept PosInfo: 18/10

Negation Type: nega
Negation Trigger: no
Negation PosInfo: 9/2
Negated Concept: C2073625:pleural effusion, C0032227:Pleural Effusion
Concept PosInfo: 32/16
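The PosInfo pairs in this output can be checked against the input sentence. The sketch below reads each pair as a zero-based start offset followed by a length, which is what the numbers in this example imply. The class and method names are hypothetical.

```java
class PosInfo {
    // Interpret "start/length" (zero-based start) and cut that span from the text.
    static String slice(String text, String posInfo) {
        String[] parts = posInfo.split("/");
        int start = Integer.parseInt(parts[0].trim());
        int length = Integer.parseInt(parts[1].trim());
        return text.substring(start, start + length);
    }

    public static void main(String[] args) {
        String sentence = "There is no focal infiltrate or pleural effusion.";
        System.out.println(slice(sentence, "9/2"));    // negation trigger
        System.out.println(slice(sentence, "18/10"));  // first negated concept
        System.out.println(slice(sentence, "32/16"));  // second negated concept
    }
}
```

Under that reading, the three pairs select "no", "infiltrate" and "pleural effusion", matching the trigger and the negated concepts reported above.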

Other Options

● -@ --WSD <hostname> : which WSD server to use.

● -8 --dynamic_variant_generation : dynamic variant generation

● -D --all_derivational_variants : all derivational variants

● -J --restrict_to_sts <semtypelist> : restrict to semantic types

● -K --ignore_stop_phrases : ignore stop phrases.

● -R --restrict_to_sources <sourcelist> : restrict to sources

● -V --mm_data_version <name> : version of MetaMap data to use.

● -X --truncate_candidates_mappings : truncate candidates mapping

● -Y --prefer_multiple_concepts : prefer multiple concepts

● -Z --mm_data_year <name> : year of MetaMap data to use.

● -a --all_acros_abbrs : allow Acronym/Abbreviation variants

● -b --compute_all_mappings : compute/display all mappings

● -d --no_derivational_variants : no derivational variants

● -e --exclude_sources <sourcelist> : exclude sources

● -g --allow_concept_gaps : allow concept gaps

● -i --ignore_word_order : ignore word order

● -k --exclude_sts <semtypelist> : exclude semantic types

● -o --allow_overmatches : allow overmatches

● -r --threshold <integer> : Threshold for displaying candidates.

● -y --word_sense_disambiguation : use WSD

MetaMap Output Formats

● Human-readable output

● MetaMap Machine Output (MMO)

● XML output

● Colorized MetaMap output (MetaMap 3D)

● Fielded (MMI) Outputs

Human Readable

Phrase: "heart attack"

Meta Candidates (8):

1000 Heart attack (Myocardial Infarction) [Disease or Syndrome]

861 Heart [Body Part, Organ, or Organ Component]

861 Attack, NOS (Onset of illness) [Finding]

861 Attack (Attack device) [Medical Device]

861 attack (Attack behavior) [Social Behavior]

861 Heart (Entire heart) [Body Part, Organ, or Organ Component]

861 Attack (Observation of attack) [Finding]

827 Attacked (Assault) [Injury or Poisoning]

Meta Mapping (1000):

1000 Heart attack (Myocardial Infarction) [Disease or Syndrome]
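Each candidate line in the human-readable output follows one visible pattern: a score, the matched string, an optional preferred name in parentheses, and the semantic type in square brackets. The sketch below parses such a line with a regular expression. It assumes that a missing parenthetical means the preferred name equals the matched string, and it skips an optional source list in braces such as {MDR,DXP}. The class name, method name and regular expression are illustrative and not part of MetaMap.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CandidateLine {
    // score, matched string, optional {sources}, optional (preferred name), [semantic type]
    private static final Pattern LINE = Pattern.compile(
        "^\\s*(\\d+)\\s+(.+?)(?:\\s+\\{[^}]*\\})?(?:\\s+\\((.+)\\))?\\s+\\[(.+)\\]\\s*$");

    // Returns {score, matched, preferred, semanticType}.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a candidate line: " + line);
        }
        String matched = m.group(2);
        // Assumption: no parenthetical means preferred name == matched string.
        String preferred = m.group(3) != null ? m.group(3) : matched;
        return new String[] { m.group(1), matched, preferred, m.group(4) };
    }

    public static void main(String[] args) {
        String[] c = parse("1000 Heart attack (Myocardial Infarction) [Disease or Syndrome]");
        System.out.println(String.join(" | ", c));
        c = parse("861 Heart [Body Part, Organ, or Organ Component]");
        System.out.println(String.join(" | ", c));
    }
}
```

For anything beyond quick inspection, the machine or XML output is the safer input, since those formats carry the CUI and explicit field boundaries.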

Machine Output

candidates([
ev(-1000, 'C0027051', 'Heart attack', 'Myocardial Infarction', [heart,attack], [dsyn], [[[1,2],[1,2],0]], yes, no, ['MEDLINEPLUS'], [0/12]),
ev(-861, 'C0018787', 'Heart', 'Heart', [heart], [bpoc], [[[1,1],[1,1],0]], yes, no, ['AIR'], [0/5]),
ev(-861, 'C0277793', 'Attack, NOS', 'Onset of illness', [attack], [fndg], [[[2,2],[1,1],0]], yes, no, ['MTH'], [6/6]),
ev(-861, 'C0699795', 'Attack', 'Attack device', [attack], [medd], [[[2,2],[1,1],0]], yes, no, ['MTH','MMSL'], [6/6]),
ev(-861, 'C1261512', attack, 'Attack behavior', [attack], [socb], [[[2,2],[1,1],0]], yes, no, ['MTH','PSY','AOD'], [6/6]),
ev(-861, 'C1281570', 'Heart', 'Entire heart', [heart], [bpoc], [[[1,1],[1,1],0]], yes, no, ['MTH','SNOMEDCT'], [0/5]),
ev(-861, 'C1304680', 'Attack', 'Observation of attack', [attack], [fndg], [[[2,2],[1,1],0]], yes, no, ['MTH','SNOMEDCT'], [6/6]),
ev(-827, 'C0004063', 'Attacked', 'Assault', [attacked], [inpo], [[[2,2],[1,1],1]], yes, no, ['ICD10AM'], [6/6])]).

Unformatted XML

<Candidate><CandidateScore>-1000</CandidateScore><CandidateCUI>C0027051</CandidateCUI><CandidateMatched>Heart attack</CandidateMatched><CandidatePreferred>Myocardial Infarction</CandidatePreferred><MatchedWords Count="2"><MatchedWord>heart</MatchedWord><MatchedWord>attack</MatchedWord></MatchedWords><SemTypes Count="1"><SemType>dsyn</SemType></SemTypes><MatchMaps Count="1"><MatchMap><TextMatchStart>1</TextMatchStart><TextMatchEnd>2</TextMatchEnd><ConcMatchStart>1</ConcMatchStart><ConcMatchEnd>2</ConcMatchEnd><LexVariation>0</LexVariation></MatchMap></MatchMaps><IsHead>yes</IsHead><IsOverMatch>no</IsOverMatch><Sources Count="24"><Source>MEDLINEPLUS</Source></Sources><ConceptPIs Count="1"><ConceptPI><StartPos>0</StartPos><Length>12</Length></ConceptPI></ConceptPIs></Candidate>

Formatted XML

<Candidate>
  <CandidateScore>-1000</CandidateScore>
  <CandidateCUI>C0027051</CandidateCUI>
  <CandidateMatched>Heart attack</CandidateMatched>
  <CandidatePreferred>Myocardial Infarction</CandidatePreferred>
  <MatchedWords Count="2">
    <MatchedWord>heart</MatchedWord>
    <MatchedWord>attack</MatchedWord>
  </MatchedWords>
  <SemTypes Count="1">
    <SemType>dsyn</SemType>
  </SemTypes>
  <MatchMaps Count="1">
    <MatchMap>
      <TextMatchStart>1</TextMatchStart>
      <TextMatchEnd>2</TextMatchEnd>
      <ConcMatchStart>1</ConcMatchStart>
      <ConcMatchEnd>2</ConcMatchEnd>
      <LexVariation>0</LexVariation>
    </MatchMap>
  </MatchMaps>
  <IsHead>yes</IsHead>
  <IsOverMatch>no</IsOverMatch>
  <Sources Count="24"><Source>MEDLINEPLUS</Source></Sources>
  <ConceptPIs Count="1"><ConceptPI><StartPos>0</StartPos><Length>12</Length></ConceptPI></ConceptPIs>
</Candidate>
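XML output is convenient because any standard parser can read it. As a rough sketch, the code below pulls a few fields out of a trimmed copy of the candidate element using the DOM parser that ships with the JDK. The sample string is abridged from the slide with attribute values quoted so that it is well-formed; the class and helper names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CandidateXml {
    // Abridged copy of the candidate element shown on the slide.
    static final String SAMPLE =
        "<Candidate>"
        + "<CandidateScore>-1000</CandidateScore>"
        + "<CandidateCUI>C0027051</CandidateCUI>"
        + "<CandidateMatched>Heart attack</CandidateMatched>"
        + "<CandidatePreferred>Myocardial Infarction</CandidatePreferred>"
        + "<SemTypes Count=\"1\"><SemType>dsyn</SemType></SemTypes>"
        + "</Candidate>";

    // Text of the first element with the given tag name.
    static String field(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read <" + tag + ">", e);
        }
    }

    public static void main(String[] args) {
        // The slide shows scores with a leading minus sign; flip it for display.
        int score = -Integer.parseInt(field(SAMPLE, "CandidateScore"));
        System.out.println(field(SAMPLE, "CandidateCUI") + " "
                + field(SAMPLE, "CandidatePreferred") + " score=" + score);
    }
}
```

Re-parsing the document for every field keeps the helper short; real code would parse once and walk the candidate elements.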

MetaMap 3D

MetaMap: Technical Aspect

● Download
– MetaMap API Underlying Architecture.
– MetaMap Java API.

● Extract and Install
– $ bzip2 -dc public_mm_linux_javaapi_{four-digit-year}.tar.bz2 | tar xvf -
– $ ./bin/install.sh

● Starting MetaMap Server

$ ./bin/skrmedpostctl start        # Start SKR server
$ ./bin/wsdserverctl start         # Start WSD server (optional)
$ ./bin/mmserver{two-digit-year}   # Start MetaMap server

MetaMap Java API

Two jar files contain the API:

✔ /src/javaapi/dist/MetaMapApi.jar

✔ /src/javaapi/dist/prologbeans.jar

Code Time :)

import java.util.List;

import gov.nih.nlm.nls.metamap.Ev;
import gov.nih.nlm.nls.metamap.Mapping;
import gov.nih.nlm.nls.metamap.MetaMapApi;
import gov.nih.nlm.nls.metamap.MetaMapApiImpl;
import gov.nih.nlm.nls.metamap.PCM;
import gov.nih.nlm.nls.metamap.Result;
import gov.nih.nlm.nls.metamap.Utterance;

public class MetaMapDemo {
    public static void main(String[] args) throws Exception {
        // Connect to the MetaMap server running on this machine.
        MetaMapApi api = new MetaMapApiImpl("localhost");
        // Process the citations in the file and take the first result.
        List<Result> resultList = api.processCitationsFromFile("Abstract.txt");
        Result result = resultList.get(0);

        // A result is split into utterances (roughly, sentences).
        for (Utterance utterance : result.getUtteranceList()) {
            System.out.println("Utterance:");
            System.out.println(" Id: " + utterance.getId());
            System.out.println(" Utterance text: " + utterance.getString());
            System.out.println(" Position: " + utterance.getPosition());

            // Each utterance holds phrases with their candidates and mappings (PCM).
            for (PCM pcm : utterance.getPCMList()) {
                System.out.println("Phrase:");
                System.out.println(" text: " + pcm.getPhrase().getPhraseText());

                // Candidates: every concept that matched part of the phrase.
                System.out.println("Candidates:");
                for (Ev ev : pcm.getCandidateList()) {
                    System.out.println(" Candidate:");
                    System.out.println(" Score: " + ev.getScore());
                    System.out.println(" Concept Id: " + ev.getConceptId());
                    System.out.println(" Concept Name: " + ev.getConceptName());
                    System.out.println(" Preferred Name: " + ev.getPreferredName());
                    System.out.println(" Matched Words: " + ev.getMatchedWords());
                    System.out.println(" Semantic Types: " + ev.getSemanticTypes());
                    System.out.println(" MatchMap: " + ev.getMatchMap());
                    System.out.println(" MatchMap alt. repr.: " + ev.getMatchMapList());
                    System.out.println(" is Head?: " + ev.isHead());
                    System.out.println(" is Overmatch?: " + ev.isOvermatch());
                    System.out.println(" Sources: " + ev.getSources());
                    System.out.println(" Positional Info: " + ev.getPositionalInfo());
                }

                // Mappings: the candidate combinations chosen as the final result.
                System.out.println("Mappings:");
                for (Mapping map : pcm.getMappingList()) {
                    System.out.println(" Map Score: " + map.getScore());
                    for (Ev mapEv : map.getEvList()) {
                        System.out.println(" Score: " + mapEv.getScore());
                        System.out.println(" Concept Id: " + mapEv.getConceptId());
                        System.out.println(" Concept Name: " + mapEv.getConceptName());
                        System.out.println(" Preferred Name: " + mapEv.getPreferredName());
                        System.out.println(" Matched Words: " + mapEv.getMatchedWords());
                        System.out.println(" Semantic Types: " + mapEv.getSemanticTypes());
                        System.out.println(" MatchMap: " + mapEv.getMatchMap());
                        System.out.println(" MatchMap alt. repr.: " + mapEv.getMatchMapList());
                        System.out.println(" is Head?: " + mapEv.isHead());
                        System.out.println(" is Overmatch?: " + mapEv.isOvermatch());
                        System.out.println(" Sources: " + mapEv.getSources());
                        System.out.println(" Positional Info: " + mapEv.getPositionalInfo());
                    }
                }
            }
        }
    }
}