osama-jomaa
Motivation
“... to facilitate the development of computer systems that behave as if they "understand" the meaning of the language of biomedicine and health.”
National Library of Medicine
UMLS Components
1. Metathesaurus
Over 1 million biomedical concepts from over 100 vocabularies.
2. Semantic Network
133 categories and 54 relationships.
3. Specialist Lexicon & Lexical Tools
Software programs to aid in NLP.
Metathesaurus
Source vocabularies that feed the Metathesaurus:
● Patient care controlled terms
● Biomedical vocabularies from different languages
● Clinical/health services research
● Health services billing
● Biomedical literature catalogs
● Public health statistics
Result: 5,000,000 biomedical terms and 1,000,000 concepts from 100+ source vocabularies, delivered as relational DB tables.
Metathesaurus
● Concepts are classified into categories:
– Diagnosis
– Procedures & Supplies
– Diseases
– …
● Each concept has a unique identifier.
● Each concept has a preferred term.
● Concepts can be grouped into subsets by applying filters.
One Concept Many Terms
One concept can have many terms in multiple vocabularies.
Example: Atrial Fibrillation
Unique Identifiers
● Concept Unique Identifier (CUI)
All names in all source vocabularies that mean the same thing are linked to one concept, which is assigned a unique identifier, the CUI.
● Lexical Unique Identifier (LUI)
Links strings that are lexical variants of one another, as detected by the Lexical Variant Generator (LVG) program.
● String Unique Identifier (SUI)
Identifies each distinct string; any variation in character set, upper/lower case, or punctuation yields a separate SUI.
● Atom Unique Identifier (AUI)
Every occurrence of a string in each source vocabulary is assigned a unique identifier, the AUI.
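The four identifier levels nest: one CUI covers several LUIs, each LUI covers several SUIs, and each SUI covers one AUI per source occurrence. A minimal sketch of that nesting follows. The class names, source names (SRC_A, SRC_B), and every identifier value are illustrative placeholders, not real UMLS data.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class UmlsIdentifierSketch {

    // One atom: a single occurrence of a string in one source vocabulary.
    static final class Atom {
        final String cui, lui, sui, aui, source, text;

        Atom(String cui, String lui, String sui, String aui, String source, String text) {
            this.cui = cui;
            this.lui = lui;
            this.sui = sui;
            this.aui = aui;
            this.source = source;
            this.text = text;
        }
    }

    // Group atoms as CUI -> LUI -> set of SUIs, mirroring the hierarchy.
    static Map<String, Map<String, Set<String>>> group(List<Atom> atoms) {
        Map<String, Map<String, Set<String>>> byConcept = new TreeMap<>();
        for (Atom a : atoms) {
            byConcept.computeIfAbsent(a.cui, k -> new TreeMap<>())
                     .computeIfAbsent(a.lui, k -> new TreeSet<>())
                     .add(a.sui);
        }
        return byConcept;
    }

    // Placeholder data: one concept, two lexical groups, three strings, four atoms.
    static List<Atom> sample() {
        return List.of(
            new Atom("C0000001", "L0000001", "S0000001", "A0000001", "SRC_A", "Atrial Fibrillation"),
            new Atom("C0000001", "L0000001", "S0000001", "A0000002", "SRC_B", "Atrial Fibrillation"),
            new Atom("C0000001", "L0000001", "S0000002", "A0000003", "SRC_A", "Atrial Fibrillations"),
            new Atom("C0000001", "L0000002", "S0000003", "A0000004", "SRC_B", "Auricular Fibrillation"));
    }

    public static void main(String[] args) {
        System.out.println(group(sample()));
    }
}
```

The same string from two sources shares one SUI but gets two AUIs, which is why the sample has four atoms and only three strings.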
Semantic Network
● Semantic Types
133 types; each Metathesaurus concept is assigned at least one semantic type.
● Semantic Relationships
54 relationships; Is-A is the most important.
Semantic Network
Semantic Types Examples:
✔ Organisms
✔ Anatomical structures
✔ Biologic function
✔ Chemicals
✔ Physical objects
Top-level semantic types: Entity and Event
Semantic Relationships Examples:
✔ Physically related to
✔ Spatially related to
✔ Temporally related to
✔ Functionally related to
✔ Conceptually related to
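Because Is-A links form a hierarchy under Entity and Event, a common operation is checking whether one semantic type descends from another. The sketch below walks child-to-parent links. The table is a small hand-copied excerpt that should be checked against the current Semantic Network release, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class SemanticNetworkSketch {

    // child type -> parent type (Is-A links); simplified excerpt, not the full network
    static final Map<String, String> IS_A = new HashMap<>();
    static {
        IS_A.put("Neoplastic Process", "Disease or Syndrome");
        IS_A.put("Disease or Syndrome", "Pathologic Function");
        IS_A.put("Pathologic Function", "Biologic Function");
        IS_A.put("Biologic Function", "Natural Phenomenon or Process");
        IS_A.put("Natural Phenomenon or Process", "Phenomenon or Process");
        IS_A.put("Phenomenon or Process", "Event");
        IS_A.put("Body Part, Organ, or Organ Component", "Fully Formed Anatomical Structure");
        IS_A.put("Fully Formed Anatomical Structure", "Anatomical Structure");
        IS_A.put("Anatomical Structure", "Physical Object");
        IS_A.put("Physical Object", "Entity");
    }

    // True if `type` equals `ancestor` or reaches it by following Is-A links upward.
    static boolean isA(String type, String ancestor) {
        String current = type;
        int steps = 0;
        while (current != null && steps++ <= IS_A.size()) { // step bound guards against cycles
            if (current.equals(ancestor)) {
                return true;
            }
            current = IS_A.get(current);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isA("Neoplastic Process", "Event"));
        System.out.println(isA("Neoplastic Process", "Entity"));
    }
}
```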
Lexical Tools
● The Specialist Lexicon
An English lexicon (dictionary) that includes over 200,000 biomedical terms from a variety of sources to aid in NLP.
● Lexical Variant Generator (LVG)
● Norm
Normalizer
● Wordind
Tokenizer
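To give a feel for what a normalizer does, here is a deliberately simplified sketch: lowercase the term, turn punctuation into spaces, and sort the words so that word order no longer matters. This is not the Norm program itself, which applies more steps (uninflection, for example); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

class NormSketch {

    // Simplified normalization: lowercase, strip punctuation, sort words.
    // Only ASCII letters and digits are kept in this sketch.
    static String normalize(String term) {
        String cleaned = term.toLowerCase(Locale.ROOT)
                             .replaceAll("[^a-z0-9]+", " ")
                             .trim();
        if (cleaned.isEmpty()) {
            return "";
        }
        String[] words = cleaned.split(" ");
        Arrays.sort(words);
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(normalize("Cancer, Lung"));
        System.out.println(normalize("Lung Cancer"));
    }
}
```

Both calls in main produce the same normalized string, which is how two surface forms can be matched to one entry.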
Why Concept Identification?
● Information extraction/Data mining
● Classification/Categorization
● Text summarization
● Question answering
● Literature-based Knowledge Discovery
Example
Phrase: “lung cancer.”
Meta Candidates (8):
1000 Lung Cancer {MDR,DXP} (Malignant neoplasm of lung) [Neoplastic Process]
1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]
861 Cancer (Malignant Neoplasms) [Neoplastic Process]
861 Lung [Body Part, Organ, or Organ Component]
861 Cancer (Cancer Genus) [Invertebrate]
861 Lung (Entire lung) [Body Part, Organ, or Organ Component]
861 Cancer (Specialty Type - cancer) [Biomedical Occupation or Discipline]
768 Pneumonia [Disease or Syndrome]
Meta Mapping (1000):
1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]
Meta Mapping (1000):
1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]
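Each candidate line above follows the shape: score, matched name, optional preferred name in parentheses, semantic type in brackets. A rough sketch of pulling those fields apart with a regular expression follows. The pattern is inferred only from the lines on this slide, so it is an assumption rather than a format specification; it ignores the optional `{...}` block and does not handle nested parentheses. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CandidateLineSketch {

    // score, matched name, optional {...} block (skipped), optional (preferred name), [semantic type]
    private static final Pattern LINE = Pattern.compile(
        "^\\s*(\\d+)\\s+(.+?)(?:\\s+\\{[^}]*\\})?(?:\\s+\\(([^)]*)\\))?\\s+\\[([^\\]]*)\\]\\s*$");

    // Returns {score, matched name, preferred name, semantic type}.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized candidate line: " + line);
        }
        String matched = m.group(2);
        // When no parenthesized name is printed, reuse the matched name.
        String preferred = m.group(3) != null ? m.group(3) : matched;
        return new String[] { m.group(1), matched, preferred, m.group(4) };
    }

    public static void main(String[] args) {
        String[] f = parse("1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]");
        System.out.println(String.join(" | ", f));
    }
}
```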
MetaMap Options
● Word Sense Disambiguation (-y)
Uses the surrounding context to determine which concept is the best choice.
● Negation (--negex)
Identifies negated entities.
Examples
● WSD Examples
– “Fifteen (6.4%) of 234 colds treated with placebo ..”
● Cold (cold temperature) [npop]
● Cold (Common cold) [dsyn]
● Cold (Cold Sensation) [phsf]
– “.. the drugs were compared in two four-point, double-blind bioassays.”
● Double (Diplopia) [dsyn] vs. Double (Duplicate) [ftcn]
● Blind (Blind Vision) [dsyn] vs. BLIND (Blinded) [resa] vs. Blind (Visually impaired persons) [podg]
● Bioassays (Biological Assay) [lbpr]
Examples
● Negation Example
– “There is no focal infiltrate or pleural effusion.”
– --negex output (in addition to normal output):
NEGATIONS:
Negation Type:nega
Negation Trigger: no
Negation PosInfo: 9/2
Negated Concept: C0332448:Infiltrate
Concept PosInfo: 18/10
Negation Type:nega
Negation Trigger: no
Negation PosInfo: 9/2
Negated Concept: C2073625:pleural effusion, C0032227:Pleural Effusion
Concept PosInfo: 32/16
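The PosInfo values in this output read as start/length pairs into the input sentence. A small sketch, assuming zero-based character offsets (which is consistent with the three values shown here), recovers the trigger and the negated spans. The class and method names are hypothetical.

```java
class PosInfoSketch {

    // Interprets "start/length" as a zero-based character offset plus a length.
    static String span(String text, String posInfo) {
        String[] parts = posInfo.split("/");
        int start = Integer.parseInt(parts[0].trim());
        int length = Integer.parseInt(parts[1].trim());
        return text.substring(start, start + length);
    }

    public static void main(String[] args) {
        String sentence = "There is no focal infiltrate or pleural effusion.";
        System.out.println(span(sentence, "9/2"));   // negation trigger
        System.out.println(span(sentence, "18/10")); // first negated concept
        System.out.println(span(sentence, "32/16")); // second negated concept
    }
}
```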
Other Options
● -@ --WSD <hostname> : which WSD server to use
● -8 --dynamic_variant_generation : dynamic variant generation
● -D --all_derivational_variants : all derivational variants
● -J --restrict_to_sts <semtypelist> : restrict to semantic types
● -K --ignore_stop_phrases : ignore stop phrases.
● -R --restrict_to_sources <sourcelist> : restrict to sources
● -V --mm_data_version <name> : version of MetaMap data to use.
● -X --truncate_candidates_mappings : truncate candidates mapping
● -Y --prefer_multiple_concepts : prefer multiple concepts
● -Z --mm_data_year <name> : year of MetaMap data to use.
● -a --all_acros_abbrs : allow Acronym/Abbreviation variants
● -b --compute_all_mappings : compute/display all mappings
● -d --no_derivational_variants : no derivational variants
● -e --exclude_sources <sourcelist> : exclude sources
● -g --allow_concept_gaps : allow concept gaps
● -i --ignore_word_order : ignore word order
● -k --exclude_sts <semtypelist> : exclude semantic types
● -o --allow_overmatches : allow overmatches
● -r --threshold <integer> : Threshold for displaying candidates.
● -y --word_sense_disambiguation : use WSD
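To make the effect of a few of these options concrete, the sketch below imitates on the client side what a score threshold (-r), a semantic type restriction (-J), and a semantic type exclusion (-k) do to a candidate list. It only illustrates the options' meaning and is not how MetaMap implements them. The class names are hypothetical, and the semantic type abbreviations in the sample data are assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class CandidateFilterSketch {

    static final class Candidate {
        final int score;
        final String name;
        final String semType;

        Candidate(int score, String name, String semType) {
            this.score = score;
            this.name = name;
            this.semType = semType;
        }
    }

    // Keeps candidates that score at least `threshold` (like -r), whose type is in
    // `restrictTo` when that set is non-empty (like -J), and is not in `exclude` (like -k).
    static List<Candidate> filter(List<Candidate> candidates, int threshold,
                                  Set<String> restrictTo, Set<String> exclude) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            boolean allowed = restrictTo.isEmpty() || restrictTo.contains(c.semType);
            if (c.score >= threshold && allowed && !exclude.contains(c.semType)) {
                kept.add(c);
            }
        }
        return kept;
    }

    // The eight "lung cancer" candidates, with assumed semantic type abbreviations.
    static List<Candidate> sample() {
        return List.of(
            new Candidate(1000, "Malignant neoplasm of lung", "neop"),
            new Candidate(1000, "Carcinoma of lung", "neop"),
            new Candidate(861, "Malignant Neoplasms", "neop"),
            new Candidate(861, "Lung", "bpoc"),
            new Candidate(861, "Cancer Genus", "invt"),
            new Candidate(861, "Entire lung", "bpoc"),
            new Candidate(861, "Specialty Type - cancer", "bmod"),
            new Candidate(768, "Pneumonia", "dsyn"));
    }

    public static void main(String[] args) {
        for (Candidate c : filter(sample(), 800, Set.of("neop"), Set.of())) {
            System.out.println(c.score + " " + c.name + " [" + c.semType + "]");
        }
    }
}
```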
MetaMap Output Formats
● Human-readable output
● MetaMap Machine Output (MMO)
● XML output
● Colorized MetaMap output (MetaMap 3D)
● Fielded (MMI) Outputs
Human Readable
Phrase: "heart attack"
Meta Candidates (8):
1000 Heart attack (Myocardial Infarction) [Disease or Syndrome]
861 Heart [Body Part, Organ, or Organ Component]
861 Attack, NOS (Onset of illness) [Finding]
861 Attack (Attack device) [Medical Device]
861 attack (Attack behavior) [Social Behavior]
861 Heart (Entire heart) [Body Part, Organ, or Organ Component]
861 Attack (Observation of attack) [Finding]
827 Attacked (Assault) [Injury or Poisoning]
Meta Mapping (1000):
1000 Heart attack (Myocardial Infarction) [Disease or Syndrome]
Machine Output
candidates([
ev(-1000, 'C0027051', 'Heart attack', 'Myocardial Infarction', [heart,attack], [dsyn], [[[1,2],[1,2],0]], yes, no, ['MEDLINEPLUS'], [0/12]),
ev(-861, 'C0018787', 'Heart', 'Heart', [heart], [bpoc], [[[1,1],[1,1],0]], yes, no, ['AIR'], [0/5]),
ev(-861, 'C0277793', 'Attack, NOS', 'Onset of illness', [attack], [fndg], [[[2,2],[1,1],0]], yes, no, ['MTH'], [6/6]),
ev(-861, 'C0699795', 'Attack', 'Attack device', [attack], [medd], [[[2,2],[1,1],0]], yes, no, ['MTH','MMSL'], [6/6]),
ev(-861, 'C1261512', attack, 'Attack behavior', [attack], [socb], [[[2,2],[1,1],0]], yes, no, ['MTH','PSY','AOD'], [6/6]),
ev(-861, 'C1281570', 'Heart', 'Entire heart', [heart], [bpoc], [[[1,1],[1,1],0]], yes, no, ['MTH','SNOMEDCT'], [0/5]),
ev(-861, 'C1304680', 'Attack', 'Observation of attack', [attack], [fndg], [[[2,2],[1,1],0]], yes, no, ['MTH','SNOMEDCT'], [6/6]),
ev(-827, 'C0004063', 'Attacked', 'Assault', [attacked], [inpo], [[[2,2],[1,1],1]], yes, no, ['ICD10AM'], [6/6])]).
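In the machine output each candidate is an ev(...) term whose first two arguments are the negated score and the CUI. A quick way to pull those two fields out, without a full Prolog term parser, is a regular expression. This is a sketch based only on the terms shown on this slide; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MmoEvSketch {

    // Matches the start of an ev term: ev(<negated score>, '<CUI>'
    private static final Pattern EV =
        Pattern.compile("ev\\(\\s*(-?\\d+)\\s*,\\s*'(C\\d+)'");

    static final String SAMPLE =
        "candidates([\n"
        + "ev(-1000, 'C0027051', 'Heart attack', 'Myocardial Infarction', [heart,attack], [dsyn]),\n"
        + "ev(-861, 'C0018787', 'Heart', 'Heart', [heart], [bpoc])]).";

    // Returns CUI -> positive score, in order of appearance.
    static Map<String, Integer> scoresByCui(String mmo) {
        Map<String, Integer> scores = new LinkedHashMap<>();
        Matcher m = EV.matcher(mmo);
        while (m.find()) {
            // Machine output stores the score negated; flip the sign back.
            scores.put(m.group(2), -Integer.parseInt(m.group(1)));
        }
        return scores;
    }

    public static void main(String[] args) {
        System.out.println(scoresByCui(SAMPLE));
    }
}
```

The sample string is shortened to the first six arguments of each term; the pattern only looks at the first two, so the full terms match the same way.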
Unformatted XML
<Candidate><CandidateScore>-1000</CandidateScore><CandidateCUI>C0027051</CandidateCUI><CandidateMatched>Heart attack</CandidateMatched><CandidatePreferred>Myocardial Infarction</CandidatePreferred><MatchedWords Count="2"><MatchedWord>heart</MatchedWord><MatchedWord>attack</MatchedWord></MatchedWords><SemTypes Count="1"><SemType>dsyn</SemType></SemTypes><MatchMaps Count="1"><MatchMap><TextMatchStart>1</TextMatchStart><TextMatchEnd>2</TextMatchEnd><ConcMatchStart>1</ConcMatchStart><ConcMatchEnd>2</ConcMatchEnd><LexVariation>0</LexVariation></MatchMap></MatchMaps><IsHead>yes</IsHead><IsOverMatch>no</IsOverMatch><Sources Count="24"><Source>MEDLINEPLUS</Source></Sources><ConceptPIs Count="1"><ConceptPI><StartPos>0</StartPos><Length>12</Length></ConceptPI></ConceptPIs></Candidate>
Formatted XML
<Candidate>
  <CandidateScore>-1000</CandidateScore>
  <CandidateCUI>C0027051</CandidateCUI>
  <CandidateMatched>Heart attack</CandidateMatched>
  <CandidatePreferred>Myocardial Infarction</CandidatePreferred>
  <MatchedWords Count="2">
    <MatchedWord>heart</MatchedWord>
    <MatchedWord>attack</MatchedWord>
  </MatchedWords>
  <SemTypes Count="1">
    <SemType>dsyn</SemType>
  </SemTypes>
  <MatchMaps Count="1">
    <MatchMap>
      <TextMatchStart>1</TextMatchStart>
      <TextMatchEnd>2</TextMatchEnd>
      <ConcMatchStart>1</ConcMatchStart>
      <ConcMatchEnd>2</ConcMatchEnd>
      <LexVariation>0</LexVariation>
    </MatchMap>
  </MatchMaps>
  <IsHead>yes</IsHead>
  <IsOverMatch>no</IsOverMatch>
  <Sources Count="24">
    <Source>MEDLINEPLUS</Source>
  </Sources>
  <ConceptPIs Count="1">
    <ConceptPI>
      <StartPos>0</StartPos>
      <Length>12</Length>
    </ConceptPI>
  </ConceptPIs>
</Candidate>
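The XML form is the easiest one to consume from Java, since the JDK's built-in DOM parser can read it directly. The sketch below parses a shortened Candidate element and extracts a few fields. It assumes attribute values are quoted, as XML requires, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class CandidateXmlSketch {

    // Shortened copy of the Candidate element from the slide.
    static final String SAMPLE =
        "<Candidate>"
        + "<CandidateScore>-1000</CandidateScore>"
        + "<CandidateCUI>C0027051</CandidateCUI>"
        + "<CandidateMatched>Heart attack</CandidateMatched>"
        + "<CandidatePreferred>Myocardial Infarction</CandidatePreferred>"
        + "<SemTypes Count=\"1\"><SemType>dsyn</SemType></SemTypes>"
        + "</Candidate>";

    // Text of the first descendant element with the given tag, or "" if absent.
    static String text(Element parent, String tag) {
        Node node = parent.getElementsByTagName(tag).item(0);
        return node == null ? "" : node.getTextContent();
    }

    // Returns {score, CUI, preferred name, first semantic type}.
    static String[] parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element candidate = doc.getDocumentElement();
            return new String[] {
                text(candidate, "CandidateScore"),
                text(candidate, "CandidateCUI"),
                text(candidate, "CandidatePreferred"),
                text(candidate, "SemType")
            };
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse candidate XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", parse(SAMPLE)));
    }
}
```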
MetaMap: Technical Aspect
● Download
– MetaMap API Underlying Architecture.
– MetaMap Java API.
● Extract and Install
– $ bzip2 -dc public_mm_linux_javaapi_{four-digit-year}.tar.bz2 | tar xvf -
– $ ./bin/install.sh
● Starting MetaMap Server
$ ./bin/skrmedpostctl start         # Start SKR Server
$ ./bin/wsdserverctl start          # Start WSD Server (Optional)
$ ./bin/mmserver{two-digit-year}    # Start MetaMap Server
MetaMap Java API
Two jar files contain the API:
✔ /src/javaapi/dist/MetaMapApi.jar
✔ /src/javaapi/dist/prologbeans.jar
Code Time :)
// Requires MetaMapApi.jar and prologbeans.jar on the classpath.
// The API calls below throw checked exceptions, so place this inside a method declared "throws Exception".
import gov.nih.nlm.nls.metamap.*;
import java.util.List;

// Connect to the running MetaMap server and process one file.
MetaMapApi api = new MetaMapApiImpl("localhost");
List<Result> resultList = api.processCitationsFromFile("Abstract.txt");
Result result = resultList.get(0);

// Walk every utterance (sentence) in the result.
for (Utterance utterance: result.getUtteranceList()) {
    System.out.println("Utterance:");
    System.out.println(" Id: " + utterance.getId());
    System.out.println(" Utterance text: " + utterance.getString());
    System.out.println(" Position: " + utterance.getPosition());

    // Each utterance holds phrases with their candidates and mappings (PCM).
    for (PCM pcm: utterance.getPCMList()) {
        System.out.println("Phrase:");
        System.out.println(" text: " + pcm.getPhrase().getPhraseText());

        // All candidate concepts for the phrase.
        System.out.println("Candidates:");
        for (Ev ev: pcm.getCandidateList()) {
            System.out.println(" Candidate:");
            System.out.println(" Score: " + ev.getScore());
            System.out.println(" Concept Id: " + ev.getConceptId());
            System.out.println(" Concept Name: " + ev.getConceptName());
            System.out.println(" Preferred Name: " + ev.getPreferredName());
            System.out.println(" Matched Words: " + ev.getMatchedWords());
            System.out.println(" Semantic Types: " + ev.getSemanticTypes());
            System.out.println(" MatchMap: " + ev.getMatchMap());
            System.out.println(" MatchMap alt. repr.: " + ev.getMatchMapList());
            System.out.println(" is Head?: " + ev.isHead());
            System.out.println(" is Overmatch?: " + ev.isOvermatch());
            System.out.println(" Sources: " + ev.getSources());
            System.out.println(" Positional Info: " + ev.getPositionalInfo());
        }

        // The final mappings chosen from those candidates.
        System.out.println("Mappings:");
        for (Mapping map: pcm.getMappingList()) {
            System.out.println(" Map Score: " + map.getScore());
            for (Ev mapEv: map.getEvList()) {
                System.out.println(" Score: " + mapEv.getScore());
                System.out.println(" Concept Id: " + mapEv.getConceptId());
                System.out.println(" Concept Name: " + mapEv.getConceptName());
                System.out.println(" Preferred Name: " + mapEv.getPreferredName());
                System.out.println(" Matched Words: " + mapEv.getMatchedWords());
                System.out.println(" Semantic Types: " + mapEv.getSemanticTypes());
                System.out.println(" MatchMap: " + mapEv.getMatchMap());
                System.out.println(" MatchMap alt. repr.: " + mapEv.getMatchMapList());
                System.out.println(" is Head?: " + mapEv.isHead());
                System.out.println(" is Overmatch?: " + mapEv.isOvermatch());
                System.out.println(" Sources: " + mapEv.getSources());
                System.out.println(" Positional Info: " + mapEv.getPositionalInfo());
            }
        }
    }
}