Medical WordNet A Proposal Christiane Fellbaum Princeton University and Berlin-Brandenburg Academy of Sciences


Medical WordNet: A Proposal

Christiane Fellbaum

Princeton University

and

Berlin-Brandenburg Academy of Sciences

The Challenge

Bridge communication gap between lay persons and health care providers

Health Care Providers (HCP or “Experts”)

--Physicians

--Nurses

--Therapists

--on-line medical information systems

Non-Experts

• patients

• family members

• benefit administrators

• lawyers

Modes of communication

• Live Interaction with Patients

• Virtual Interaction

--On-line medical information

Experts, lay persons speak different “dialects”

Characteristics of HCP language

• Ignorance/uncertainty as to non-experts’ lexical and conceptual knowledge

• Same word is used with different meanings by the two populations (word-concept mismatch)

• HCP use technical terms

• HCP substitute synonyms from different levels

Characteristics of non-expert language

Idiosyncratic, “unregulated”

--mix of technical and folk terms

--taxonomies are less elaborate, shallower

(fewer intermediate levels of categorial distinctions)

--lay concepts are fuzzy (e.g., flu)

--lay concepts have no (clear) equivalents in medicine

(“Kreislaufprobleme”: “circulatory problems”)

Expert vs. non-expert language in dialogue interaction

• HCP introduce new concepts for which the lay person is unprepared

--go from symptoms to diagnosis, treatments, etc.

• Lay questions are frequently “yes/no”

• Expert replies are usually not “yes/no”

• Often no opportunities for “repair”

Additional problem with on-line information systems

Trivial linguistic features can have potentially significant consequences

Example: MEDLINEplus

• different results depending on query:

• tremor vs. intentional tremor

• tremble vs. trembling

Linguistic (morphological) differences in the query result in semantically different answers

(our) solution

• Make the HCP “bilingual”

• Enable “translation” between consumer health information systems and laymen

Problems on three levels

• Lexical

• Conceptual

• Propositional (facts, beliefs, hypotheses,...)

Some ground rules for the next 45 mins

Nothing hinges on “concept”

Propose synset:

{concept, universal, idea, type...}

“Truth” applies only to propositions, not entities

WordNet has “unicorn”, “Mickey Mouse”, etc.

A Linguist’s view

• Concepts/universals are expressed by lexemes (words)

• Words are embedded in contexts and partially derive their meanings from contexts

• Truth of propositions depends partially on their lexical make-up

Goals

• Document medical knowledge that can be understood by the average adult health care consumer in the U.S.

• Make existing tools accessible for non-experts

Plan of Attack

• Create lexical database of medical terminology modeled on WordNet, with WN’s potential for NLP

• Lexical (word) information is complemented with definitional sentences, one for experts, one for laymen

• Sentences provide meaningful contexts for terms

• 2 Sentential subcorpora: Facts and Beliefs

Some background: WordNet

• Large lexical database for English

• Semantic network? yes

• Thesaurus? yes BUT unlike in Roget’s, WN’s relations are labeled

• Ontology? who knows?

WordNet

• Constructed entirely by hand

• Semantic network of 115,000 synonym sets (“synsets”)

• Example synset:

{chest, thorax, torso,# body_part,@ (the part of the body below the neck and above the belly; “the victim had a knife stuck in his chest”)}

WordNet synsets

• One or more “cognitively synonymous” lexemes

• Definition (“gloss”)

• Example sentence

• Meronymy, hyponymy relate noun synsets

• result: semantic network
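A synset as described on this slide can be pictured as a small record. The following is a minimal Python sketch, not WordNet's actual storage format: the class name `Synset`, its field names, and the relation labels `hypernym` and `part_of` are assumptions made for illustration. Reading torso as the whole that the chest is part of is also an assumption about the pointer notation used in the chest example.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One node of the network: synonymous lexemes, a gloss, examples, labeled links."""
    lemmas: list[str]                      # "cognitively synonymous" lexemes
    gloss: str                             # the definition
    examples: list[str] = field(default_factory=list)
    # Relation label -> names of the target synsets (labels are hypothetical).
    relations: dict[str, list[str]] = field(default_factory=dict)


chest = Synset(
    lemmas=["chest", "thorax"],
    gloss="the part of the body below the neck and above the belly",
    examples=["the victim had a knife stuck in his chest"],
    relations={"hypernym": ["body_part"], "part_of": ["torso"]},
)

print(chest.lemmas, chest.relations["hypernym"])
```

Storing the links on the synset rather than on the individual word is what turns the collection into a network: every lexeme in the set inherits the same relations.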

WordNet synsets

• Where did the makers of WN get their synonyms, meronyms, etc. from?

• Mid-1980s: no corpora were available

• Association norms

• Some psycholinguistic testing (sorting experiments)

• Assumption: speakers’ use of words reflects conceptual organization

WordNet

WordNet’s value for computational linguistics, Natural Language Processing

• Synonyms, related synsets allow searches for semantically related nodes

--E.g., query expansion

• Information retrieval, Q-A systems, data mining,...

• Inferencing
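Query expansion, mentioned above, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the `SYNSETS` list and the `expand_query` function are hypothetical, only the chest/thorax pair comes from the slides, and a real system would read its synsets from WordNet itself.

```python
# Toy synsets; only {chest, thorax} is taken from the slides.
SYNSETS = [
    {"chest", "thorax"},
    {"tremor", "trembling"},  # hypothetical grouping, for illustration only
]


def expand_query(terms):
    """Return the query terms plus every synonym that shares a synset with one of them."""
    expanded = set(terms)
    for term in terms:
        for synset in SYNSETS:
            if term in synset:
                expanded |= synset
    return sorted(expanded)


print(expand_query(["chest", "pain"]))  # -> ['chest', 'pain', 'thorax']
```

Terms without a synset, such as "pain" here, pass through unchanged, so expansion can only widen a search.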

Two problems: Synonymy and Polysemy

WordNet maps lexemes (words) and concepts (meanings)

Words are labels for concepts that speakers find salient

--Identification of the same concept labelled with different words (synonymy); e.g. chest, thorax

--Disambiguation of polysemous words; e.g. weak patient vs. weak solution

Synonymy and Polysemy

• Synonymy: membership in the same synset

• Polysemy: number of synsets of which a given string is a member
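Both definitions reduce to set membership, which a short Python sketch can make concrete. The synset names and all members other than chest, thorax, and weak are hypothetical, chosen only to give "weak" and "chest" more than one sense.

```python
# Hypothetical miniature inventory: synset name -> member strings.
SYNSETS = {
    "chest.body": {"chest", "thorax"},
    "chest.box": {"chest", "box"},
    "weak.person": {"weak", "frail"},
    "weak.solution": {"weak", "dilute"},
}


def are_synonyms(a, b):
    """Synonymy: both strings are members of at least one common synset."""
    return any(a in members and b in members for members in SYNSETS.values())


def polysemy(word):
    """Polysemy: the number of synsets of which the string is a member."""
    return sum(word in members for members in SYNSETS.values())


print(are_synonyms("chest", "thorax"))  # True
print(are_synonyms("thorax", "box"))    # False
print(polysemy("weak"))                 # 2
```

Note that synonymy defined this way is not transitive across senses: thorax and box are each synonyms of chest, but not of each other.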

WordNet

In addition, related words and concepts can be found via the relations among entire synsets

Hyponymy/hyperonymy (super-/subordination)

HIV is a kind of virus

One kind of virus is HIV

Meronymy/holonymy (part-whole)

occipital bone is part of cranium

cranium has an occipital bone
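Each relation can be read in both directions, as the paired examples show. A minimal Python sketch, assuming the relations are stored as simple child-to-parent mappings: the names `HYPERNYM`, `PART_OF`, `hypernym_chain`, and `inverse` are hypothetical, and the "microorganism" link is added only to show a chain longer than one step.

```python
# Child -> parent links. HIV/virus and occipital bone/cranium come from the slides.
HYPERNYM = {"HIV": "virus", "virus": "microorganism"}  # second link is hypothetical
PART_OF = {"occipital bone": "cranium"}


def hypernym_chain(term):
    """Follow 'is a kind of' links upward until no superordinate is recorded."""
    chain = []
    while term in HYPERNYM:
        term = HYPERNYM[term]
        chain.append(term)
    return chain


def inverse(relation):
    """Turn child -> parent links into parent -> [children] (hyponyms or parts)."""
    inverted = {}
    for child, parent in relation.items():
        inverted.setdefault(parent, []).append(child)
    return inverted


print(hypernym_chain("HIV"))        # ['virus', 'microorganism']
print(inverse(HYPERNYM)["virus"])   # ['HIV']
print(inverse(PART_OF)["cranium"])  # ['occipital bone']
```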

WordNet

Different kinds of hyponymy

Types vs. Instances

Kingdom is a type of country

Monaco is an instance of a kingdom

Lexical semantics in WordNet

The meaning of a word results from its place in the semantic network

WordNet for medical/bioinformatics?

• Synonymy, polysemy are problems here, too

• Is WN’s way of mapping words and meanings useful?

WordNet for medical/bioinformatics?

• WN was compiled by non-experts

• Medical coverage is sparse and arbitrary

WordNet’s medical coverage

• contains both expert and folk terms (indistinguishable)

• contains archaic terms like unction

• no type vs. role (symptom) distinction

• e.g., encodes “tumors are abnormal” but not “some tumors are malignant”

• No links among entities, properties, processes, states

• domain labels (medicine, drugs,..) are assigned incompletely and inconsistently (no good domain ontology)

• Create lexical database of medical terminology modelled on WordNet (MedWN)

• Info in MedWN can be accessed automatically

• Retain WN’s features to make it usable for NLP

Steps to take

• Review, validate, augment WN’s present medical coverage

• Ensure sufficiently high scientific level so that MedWN can work in tandem with existing terminology banks, ontologies,...

Create subcorpora of sentences

• MedicalFactNet

--sentences rated as correct by medical experts

--sentences express “true” beliefs about medical phenomena

--intelligible to non-experts

Subcorpora of sentences

• MedicalBeliefNet

--sentences rated highly for assent by lay persons

--representative fraction of true and false beliefs about medical phenomena

Constraints on subcorpora

• Complete, grammatical English sentences

• No anaphora (it, then, this): context-free generic sentences

• Statements embed terms in typical, informative contexts

Sources for subcorpora

--sentences generated via WordNet’s relations

--WordNet’s definitions of medical terms

--sentences from online medical services

Sentences from on-line information sources

--fact sheets

--NIAID Health Information Publications

--UK NetDoctor’s Diseases Encyclopedia

Example

NetDoctor text:

Hay fever, otherwise known as seasonal allergenic rhinitis, is an allergic reaction to airborne substances such as pollen....

Created sentences:

Hay fever is an allergy.

Hay fever is an allergic reaction.

Hay fever is a reaction to pollen...

Second source of sentences

• Derive propositions from WordNet:

Express labeled arcs as proposition

e.g. if x is a hyponym of y: “x is a type of y”

meronymy: “x is a part of y”
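Turning labeled arcs into propositions amounts to filling one sentence template per relation. A minimal Python sketch using the two templates given here: the names `TEMPLATES`, `ARCS`, and `arc_to_sentence` are hypothetical, and the arcs reuse the HIV and occipital bone examples from the earlier slides.

```python
# One sentence template per relation label, as given on the slide.
TEMPLATES = {
    "hyponym": "{x} is a type of {y}.",
    "meronym": "{x} is a part of {y}.",
}

# (x, relation, y) arcs; a real run would read these from the WordNet graph.
ARCS = [
    ("HIV", "hyponym", "virus"),
    ("occipital bone", "meronym", "cranium"),
]


def arc_to_sentence(x, relation, y):
    """Express one labeled arc as a context-free generic sentence."""
    sentence = TEMPLATES[relation].format(x=x, y=y)
    return sentence[0].upper() + sentence[1:]


for arc in ARCS:
    print(arc_to_sentence(*arc))
# HIV is a type of virus.
# Occipital bone is a part of cranium.
```

The naive templates do not insert articles ("the occipital bone", "the cranium"), which is one reason the derived sentences still need human judgment before they enter a subcorpus.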

Validation

• Derived sentences are judged by humans

• Likert Scale 1-5

• Participants assign a score for U (understanding) to all sentences

• Sentences judged to be understandable are scored further for

B (belief) by lay persons

C (correctness) by experts

Validation

• Statements receiving a B-score of 4 or higher => MedicalBeliefNet

• Statements receiving a C-score of 4 or higher => MedicalFactNet
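The two-stage validation can be sketched as a small routing function in Python. Two points are assumptions, not taken from the slides: that each sentence ends up with a single aggregated score per scale, and that "understandable" means a U-score of at least 4 (the slides give a cutoff only for B and C). The function name `assign_subcorpora` and its parameters are hypothetical.

```python
def assign_subcorpora(u, b=None, c=None, u_min=4.0, cutoff=4.0):
    """Route one sentence to MedicalBeliefNet and/or MedicalFactNet.

    u, b, c are scores on the 1-5 Likert scale (understanding, belief, correctness).
    u_min is an assumed threshold; cutoff=4 is the value stated for B and C.
    """
    nets = []
    if u < u_min:
        # Not understandable: the sentence is not scored further.
        return nets
    if b is not None and b >= cutoff:
        nets.append("MedicalBeliefNet")
    if c is not None and c >= cutoff:
        nets.append("MedicalFactNet")
    return nets


print(assign_subcorpora(u=4.6, b=4.2, c=4.8))  # ['MedicalBeliefNet', 'MedicalFactNet']
print(assign_subcorpora(u=5.0, b=4.5, c=1.5))  # ['MedicalBeliefNet']
print(assign_subcorpora(u=2.0, b=5.0, c=5.0))  # []
```

The second call shows why the two subcorpora are kept apart: a statement can be widely believed by lay persons and still be rated incorrect by experts.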

Side effects (beneficial) of corpus

• Basis for new NLP applications in the medical domain

• Basis for exploring individual and group differences wrt medical knowledge, vocabulary, reasoning, decision-making

• Use in medical training

Future work

• Scale up coverage

• Add relations among events (states, activities) as expressed by verbs

• Current work: explore “function/purpose” relation among verbs (analogous to roles among entities expressed by nouns)

e.g., to run is to exercise (defeasible)

to run is to move (not defeasible)

Future work

• Add relations and modalities (causality, conditionals,..)

--these are more or less explicit in WordNet

• Crosslingual MedWN?

Bootstrap from existing multilingual wordnets?