Page 1: Medical FactNet

Medical FactNet

Barry Smith
University at Buffalo and IFOMIS, Leipzig

Christiane Fellbaum
Princeton University and Berlin Academy

Page 2: Medical FactNet

Online-Inquiry to MEDLINEplus

Query text: response (with links to documents sorted by the following keywords)

tremor: Tremor, Multiple Sclerosis, Parkinson’s Disease, Degenerative Nerve Diseases, Movement Disorders

intentional tremor: Tremor, Multiple Sclerosis, Parkinson’s Disease, Spinal Muscular Atrophy, Degenerative Nerve Diseases

tremble: Anxiety, Parkinson’s Disease, Panic Disorder, Caffeine, Tremor

trembling: Anxiety, Parkinson’s Disease, Panic Disorder, Phobias, Tremor

right hand trembles: Phobias, Anxiety, Infant and Toddler Development, Parkinson’s Disease, Diabetes

right hand trembles when grasping: Infant and Toddler Development, Sports Fitness, Sports Injuries, Diabetes, Rehabilitation

Page 4: Medical FactNet

A consumer health medical information system must be able to map between expert and non-expert medical vocabulary

GOAL: A unified medical language system for non-expert medical vocabulary

UMLS for dummies

Page 5: Medical FactNet

A New Methodology for the Construction and Validation of Information Resources for Consumer Health

Page 6: Medical FactNet

MWN: SPECIFIC AIMS

to extend and validate WordNet 2.0’s medical coverage in light of recent advances in medical terminology research

focusing initially on the English-language single word expressions used and understood by non-experts

provision of a mapping to UMLS, MeSH, and other expert terminologies

use as interlingua for MWNs in other languages

Page 7: Medical FactNet

WordNet (Miller, Fellbaum)

Large lexical database; ubiquitous tool of NLP

coverage comparable to collegiate dictionary, over 130,000 word forms

40 wordnets in different languages

WordNet: rich medical coverage, but poorly validated and poor formal architecture

How to create a validated Medical WordNet (MWN)?

Page 8: Medical FactNet

Building blocks of WordNet = ‘synsets’

= ‘concepts’ in medical terminology

terms in same synset = they are interchangeable in some sentential contexts without altering truth-value: {car, automobile}, {shut, close}

synsets linked via a small number of binary relations: is-a, part-of, verb entailments (walk-limp, forget-know).
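The synset idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not WordNet's actual data format: the `Synset` class, the identifiers such as `car.n`, and the `vehicle` synset are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    name: str          # hypothetical identifier, not a WordNet sense key
    lemmas: frozenset  # word forms grouped in this synset


SYNSETS = [
    Synset("car.n", frozenset({"car", "automobile"})),
    Synset("shut.v", frozenset({"shut", "close"})),
    Synset("vehicle.n", frozenset({"vehicle"})),  # assumed, for the is-a arc
]

# Binary relations between synsets, stored as (relation, source, target).
RELATIONS = {("is-a", "car.n", "vehicle.n")}


def share_synset(word_a, word_b):
    """True if some synset contains both word forms."""
    return any(word_a in s.lemmas and word_b in s.lemmas for s in SYNSETS)


def related(relation, source, target):
    """True if the given binary relation links the two synsets."""
    return (relation, source, target) in RELATIONS
```

Here `share_synset("car", "automobile")` holds while `share_synset("car", "shut")` does not, which mirrors the interchangeability test described above.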

Page 9: Medical FactNet

Strengths of WordNet 2.0

Open source

Very broad coverage

Is-a / part-of architecture

Tool for automatic sense disambiguation

Page 10: Medical FactNet

13 senses for ‘feel’ as a verb

experience – She felt resentful

find – I feel that he doesn't like me

feel – She felt small and insignificant

feel – We felt the effects of inflation

feel – The sheets feel soft

grope – He felt for his wallet

finger – Feel this soft cloth!

explore – He felt his way around the dark room

feel – It feels nice to be home again

feel – He felt the girl in the movie theater

Page 11: Medical FactNet

Medical senses of ‘feel’

palpate – examine a body part by palpation:

The nurse palpated the patient's stomach; The runner felt her pulse.

sense – perceive by a physical sensation, e.g. coming from the skin or muscles: He felt his flesh crawl; She felt the heat when she got out of the car; He feels pain when he puts pressure on his knee.

feel – seem with respect to a given sensation: My cold is gone – I feel fine today; She felt tired after the long hike.

Page 12: Medical FactNet

MWN

many word units are monosemic (clinician, stethoscope)

most common words are polysemic

lexicon of the order of 4,000 word units with some 3,000 distinct word senses.

tested by incorporation in NLP applications used for purposes of information retrieval, machine translation, question-answer systems, text summarization

Page 13: Medical FactNet

How to validate Medical WordNet?

How to fix the scope of ‘non-expert’?

Page 14: Medical FactNet

Answer: Medical FactNet (MFN)

a large corpus of natural-language sentences providing medically validated contexts for MWN terms.

pilot corpus: 40,000 sentences

full MFN (for common diseases): ~250,000 sentences

accredited as intelligible by non-experts

and as true by experts

Page 15: Medical FactNet

Medical BeliefNet (MBN)

= totality of sentences about medical phenomena to which non-experts assent

comes for free, given our methodology for creating MFN

Page 16: Medical FactNet

Sources for MFN

1. WordNet glosses and arcs

2. Online health information services targeted to consumers

NetDoctor, MEDLINEplus

(factsheets on common diseases)

Page 17: Medical FactNet

Constructing MBN and MFN

sources (WordNet, MEDLINEplus …)

filtering for intelligibility by non-experts

pool of natural language sentences

filtering for non-expert assent → Medical BeliefNet

filtering for validation by experts → Medical FactNet?

Page 18: Medical FactNet

MFN: SPECIFIC AIMS

To create a pilot open-source corpus of sentences about medical phenomena in the English language

restricted to natural language

grammatically complete

logically and syntactically simple sentences

rated as understandable by non-expert human subjects in controlled questionnaire-based experiments

Page 19: Medical FactNet

MFN: SPECIFIC AIMS

= sentences must be self-contained

make no reference to any prior context

not contain any proper names, indexical expressions or other linguistic devices that need to be interpreted with respect to other sentences.

Page 20: Medical FactNet

Constructing MFN

Sentences in MFN must receive high marks for correctness on being assessed by medical experts.

MFN designed to constitute a representative fraction of the true beliefs about medical phenomena which are intelligible to non-expert English-speakers.

Page 21: Medical FactNet

Constructing MBN

Sentences in MBN must receive high marks for assent on being assessed by non-experts.

MBN designed to constitute a representative fraction of the beliefs about medical phenomena (both true and false beliefs) distributed through the population of English speakers.

Page 22: Medical FactNet

Compiling MFN and MBN in tandem

will allow systematic assessment of the disparity between lay beliefs and vocabulary as concerns medical phenomena and the exactly corresponding expert medical knowledge.

will allow us to establish automatically, for any given sub-population, the areas in which its beliefs about medical phenomena differ most significantly from validated medical knowledge
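One way to picture this comparison is as set operations over the two corpora. The sketch below is only illustrative: the sentences are taken from elsewhere in these slides, but their membership in MBN or MFN is assumed for the example, and `belief_gap` is a hypothetical helper name.

```python
# Toy data: which net each sentence belongs to is assumed for illustration.
mbn = {
    "People can only snore when they are asleep.",
    "A cold is not an allergy.",
}
mfn = {
    "A cold is not an allergy.",
    "Allergy symptoms may last longer than cold symptoms.",
}


def belief_gap(belief_net, fact_net):
    """Split sentences into lay-only, shared, and expert-only groups."""
    return {
        "lay_only": belief_net - fact_net,     # assented to, not validated
        "shared": belief_net & fact_net,       # assented to and validated
        "expert_only": fact_net - belief_net,  # validated, no lay assent
    }


gap = belief_gap(mbn, mfn)
```

Running the same comparison on the MBN entries of one sub-population at a time would give the per-group disparity the slide describes.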

Page 23: Medical FactNet

USES OF MFN

for quality assurance of MWN

to support the population of MWN by yielding new families of words and word senses

medical education

consumer health information

(in conjunction with MBN) allow new sorts of experiments in the linguistics, psychology and anthropology of consumer health

Page 24: Medical FactNet

Evaluation of MFN

measure the benefits it brings when incorporated into an existing on-line consumer health portal based on term-search technology.

test whether exploiting the resources of MFN can lead to improved results in the retrieval of expert information

Page 25: Medical FactNet

Differences between expert and non-expert medical language

mismatch between expert and non-expert language

taxonomies reflecting popular lexicalizations have small coverage relative to technical vocabularies

and shallow hierarchies:

no popular terms linking infectious disease and mumps

Page 26: Medical FactNet

Differences between expert and non-expert medical language

popular medical terms (flu) often fuzzier than technical terms

extension of non-expert term used also by experts sometimes smaller, sometimes larger

hypothesis: with few exceptions the focal meanings coincide in their extensions

Page 27: Medical FactNet

Mismatches in Doctor-Patient Communication

Practical skills of physician in acquiring and conveying relevant and reliable information by using non-expert language tailored to individual patient

The physician, too, is a human being, thus ex officio a member of the wider community of non-experts

continues to use non-expert language for everyday purposes

Page 28: Medical FactNet

But there are problems

Page 29: Medical FactNet

Question: My seven-year-old son developed a rash today … a friend of mine had her 10-day-old baby at my home last evening before we were aware of the illness. … I have read that chickenpox is contagious up to two days prior to the actual rash. Is there cause for concern at this point?

Answer: Chickenpox is the common name for varicella infection. ... You are correct in that a person with chickenpox can be contagious for 48 hours before the first vesicle is seen. ... Of concern, though, is the fact that newborns are at higher risk of complications of varicella, including pneumonia. ...There is a very effective means to prevent infection after exposure. A form of antibody to varicella called varicella-zoster immune globulin (VZIG) can be given up to 48 hours after exposure and still prevent disease. ...

(from Slaughter)

Page 30: Medical FactNet

Lexical mismatches

rooted in legal concerns?

both primary care physician and online information system must respond primarily with generic, or case- or context-independent, information

most requests relate to specific and episodic phenomena (occurrences of pain, fever, reactions to drugs, etc.).

Hence focus of MFN on generic sentences = context-independent statements about causality, about types of persons or diseases or about typical or possible courses of a disease.

Page 31: Medical FactNet

MFN

designed to map the generic medical information which non-experts are able to understand

Page 32: Medical FactNet

Corpus- and fact-based approaches to information retrieval

meanings of highly polysemous terms cannot be discriminated without consideration of their contexts.

People do this without apparent difficulties

New NLP methodologies to harness computers to manipulate large text corpora

Train automatic systems on large numbers of semantically annotated sentences, exploit standard pattern-recognition and statistical techniques for purposes of disambiguation.

Page 33: Medical FactNet

Use of WordNet in medical informatics

e.g. as tool for simplifying information extraction from the corpus of MEDLINE abstracts:

by replacing verbs with corresponding synsets and so reducing the number of relations that need to be taken account of in the analysis of texts
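A rough sketch of the reduction: if several verbs map to one synset label, triples that differ only in the verb collapse into a single relation. The verb table and the extracted triples below are invented for illustration and are not taken from WordNet or from MEDLINE.

```python
# Hypothetical verb-to-synset table; the grouping is illustrative and is not
# copied from WordNet.
VERB_TO_SYNSET = {
    "alleviates": "alleviate.v",
    "eases": "alleviate.v",
    "relieves": "alleviate.v",
    "causes": "cause.v",
    "induces": "cause.v",
}


def normalize(triples):
    """Replace each verb by its synset label; unknown verbs pass through."""
    return {(subj, VERB_TO_SYNSET.get(verb, verb), obj)
            for subj, verb, obj in triples}


# Invented (subject, verb, object) triples standing in for extracted text.
extracted = [
    ("aspirin", "alleviates", "headache"),
    ("aspirin", "eases", "headache"),
    ("aspirin", "relieves", "headache"),
    ("pollen", "causes", "sneezing"),
    ("pollen", "induces", "sneezing"),
]
relations = normalize(extracted)
```

The five surface triples reduce to two distinct relations, which is the kind of simplification the slide has in mind.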

Page 34: Medical FactNet

Example: FrameNet

500 Frames, each with a plurality of Frame Elements

Medical Frames:

Addiction, Birth, Biological Urge, Body Mark, Cure, Death, Health Response, Medical Conditions, Medical Instruments, Medical Professional, Medical Specialties and Observable Body Parts.

Page 35: Medical FactNet

Frame: Cure

Frame Elements: alleviate.v, alleviation.n, curable.a, curative.a, curative.n, cure.n, cure.v, ease.v, heal.v, healer.n, incurable.a, palliate.v, palliation.n, palliative.a, palliative.n, rehabilitate.v, rehabilitation.n, rehabilitative.a, remedy.n, resuscitate.v, therapeutic.a, therapist.n, therapy.n, treat.v, treatment.n.

Page 36: Medical FactNet

Example: Penn Proposition Bank

designed as a corpus of coherent texts. The intention is to train an automatic system to ‘learn’ the contexts for words and their context-specific meanings.

corpus characterized by a specific logical (function-argument-based) architecture.

Page 37: Medical FactNet

Both FrameNet and Proposition Bank have poor medical coverage

Both focus on word usage in general, rather than on domain-specific contexts.

Neither concerned with the questions of factuality or validation of statements

Page 38: Medical FactNet

Example: CYC knowledge base

collection of hundreds of thousands of statements mostly about the external world:

The earth is round

Mountains are one kind of landform

Albany is the capital of New York

parcelled into micro-theories

Page 39: Medical FactNet

In contrast to CYC,

(i) MFN focuses on one single (albeit very large) domain;

(ii) MFN stores English sentences (CYC is language non-specific);

(iii) MFN discriminates folk beliefs and expert knowledge (designed to be consistent with the body of established science);

(iv) MFN will be publicly available.

Page 40: Medical FactNet

Existing Princeton WordNet 2.0 labels 504 word-forms ‘medicine’:

infection#1 {(the pathological state resulting from the invasion of the body by pathogenic microorganisms)}

infection#3 {(the invasion of the body by pathogenic microorganisms and their multiplication which can lead to tissue damage and disease)}

infection#4 {infection, contagion, transmission – (an incident in which an infectious disease is transmitted)}

Page 41: Medical FactNet

Maturation

maturation#2 {growth, growing, maturation, development, ontogeny, ontogenesis – ((biology) the process of an individual organism growing organically; a purely biological unfolding of events involved in an organism changing gradually from a simple to a more complex level; he proposed an indicator of osseous development in children)}

maturation#3 {festering, suppuration, maturation – (the formation of morbific matter in an abscess or a vesicle and the discharge of pus)}

Page 42: Medical FactNet

But it mixes up expert and non-expert vocabulary, both current and medieval:

suppuration#2 {pus, purulence, suppuration, ichor, sanies, festering – (a fluid product of inflammation)}

Page 43: Medical FactNet

And it contains medically relevant errors:

snore-sleep linked via verb entailment: “if someone snores, then he necessarily also sleeps.”

In medicine: quite possible to snore while awake, since snoring implies the respiratory induced vibration of glottal tissues as associated not only (and most usually) with sleep but also with relaxation or obesity.

Methodology for constructing MFN will provide us with a systematic means to detect such errors.

Page 44: Medical FactNet

snore sleep

Constructing MBN will give us the resources to do justice to the reason why such cases were included in the first place:

“People can only snore when they are asleep” and similar sentences belong precisely to the folk beliefs about medicine which MBN will document

Page 45: Medical FactNet

Extracting sentences from online consumer health information sources

In one experiment sentences were derived by researchers in medical informatics from factsheets on Airborne allergens in NIAID’s Health Information Publications and on Hay fever and perennial allergic rhinitis in the UK NetDoctor’s Diseases Encyclopedia.

Page 46: Medical FactNet

Source (NIAID):

There is no good way to tell the difference between allergy symptoms of runny nose, coughing, and sneezing and cold symptoms. Allergy symptoms, however, may last longer than cold symptoms. (from NIAID HealthInfo)

Output:

Allergies have symptoms.

Colds have symptoms.

A runny nose is a symptom of an allergy.

Coughing is a symptom of an allergy.

Sneezing is a symptom of an allergy.

Cold symptoms are similar to allergy symptoms.

A cold is not an allergy.

Allergy symptoms may last longer than cold symptoms.

Page 47: Medical FactNet

Output sentences

use simple syntax and draw on natural-language terms used in original sources

Sentences containing anaphora, instructions, warnings, … are replaced by complete statements constructed via simple syntactic modifications – or ignored.

Page 48: Medical FactNet

Output Sentences

1644 sentences produced (= 20 person hours of effort)

500 sentences were subjected to a preliminary evaluation by pairs of medical students (on a score of 1-5 …)

58% were rated with a score of 2 x 5 (a 5 from each of the two raters)

but: measures for inter-rater agreement too low for these results to be statistically significant.
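The slides do not say which agreement measure was used. As an assumption, the sketch below computes Cohen's kappa, one common chance-corrected measure for two raters; the function name and any ratings passed to it are illustrative, not data from the experiment.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Proportion of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's score frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical score throughout
    return (observed - expected) / (1 - expected)
```

A value near 1 indicates agreement well above chance, while a value near 0 indicates agreement no better than chance, which is the situation the slide warns about.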

Page 49: Medical FactNet

Validation methods

sources

A: filtering for intelligibility by non-experts

pool

B: filtering for non-expert assent → Medical BeliefNet

C: filtering for validation by experts → Medical FactNet

Page 50: Medical FactNet

Validation methods

sources

filtering for intelligibility by non-experts

pool

filtering for non-expert assent

filtering for validation by experts

Page 51: Medical FactNet

This will provide an empirical delineation of the scope of ‘natural language’ (non-expert language)

Natural language = language (typical) non-experts (think they) can understand

Does ‘depilation’ belong to natural language? ‘suppuration’? ‘auto-immune’? ‘tomograph’? ‘hypertension’? ‘radiologist’?

Page 52: Medical FactNet

Method

400 x 250 statements will be rated for understandability by two participants, making for a total of 200,000 ratings in response to the question: on a scale from 1-5, would you describe this sentence as hard to understand or easy to understand?

Raters will be encouraged not to reflect on successive statements

Only those statements which receive a score of at least 4 from each of 2 subjects will pass on to the pool
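The workload figure and the pass rule can be checked with a short sketch. It assumes that "400 x 250" means 100,000 statements in total, each rated by two participants; `enters_pool` is a hypothetical helper name.

```python
STATEMENTS = 400 * 250            # 100,000 statements (assumed reading)
RATERS_PER_STATEMENT = 2
TOTAL_RATINGS = STATEMENTS * RATERS_PER_STATEMENT


def enters_pool(scores, minimum=4):
    """True if exactly two raters scored the statement and both gave >= minimum."""
    return len(scores) == RATERS_PER_STATEMENT and min(scores) >= minimum
```

Under this reading `TOTAL_RATINGS` comes to 200,000, matching the slide, and a statement scored (4, 5) passes while one scored (5, 3) does not.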

Page 53: Medical FactNet

Validation methods

sources

filtering for intelligibility by non-experts

pool

filtering for non-expert assent

filtering for validation by experts

Page 54: Medical FactNet

Method

Collections of 200 statements from the pool will be rated for assent by each of 250 participants.

on a scale from 1-5, would you describe this sentence with the words do not agree at all … agree completely?

Raters will be encouraged to reflect upon their answers if necessary

Statements receiving a score of at least 4 from each of two raters will be stored as components of Medical BeliefNet (MBN).

Page 55: Medical FactNet

Validation methods

sources

filtering for intelligibility by non-experts

pool

filtering for non-expert assent

filtering for validation by experts

Page 56: Medical FactNet

Method

Raters, selected from medical faculty and advanced medical students, will be subject to a pre-evaluation as follows. A set of 40 sentences in the pool will be validated as true or false by the relevant specialists. Only those candidate participants with very high scores in matching these validations will be selected to serve as raters in the validations of sentences for MFN.

Page 57: Medical FactNet

Method

Rating for MFN will involve no time constraints

raters will be encouraged to use reference works

On a scale from 1-5, how strongly do you believe this statement?

Only sentences receiving scores of 5 from each of two raters will be added to the MFN database.

Thus in relation to those sentences which receive a score of less than 5, raters will be encouraged to propose alternative statements, which will be used as new input to the non-expert phase for assessment.
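Taken together, the assent and expert phases amount to a routing rule for each sentence in the pool. The sketch below is one possible reading, assuming both filters are applied to every pool sentence; the function name `route` and the "revise" label are introduced here for illustration.

```python
def route(lay_assent, expert_belief):
    """Return the destinations for one pool sentence.

    lay_assent and expert_belief are pairs of 1-5 scores from two raters.
    """
    destinations = []
    if min(lay_assent) >= 4:        # both non-experts scored at least 4
        destinations.append("MBN")
    if min(expert_belief) == 5:     # both experts scored 5
        destinations.append("MFN")
    else:
        # Experts are asked to propose an alternative statement, which
        # re-enters the non-expert phase as new input.
        destinations.append("revise")
    return destinations
```

For example, a sentence with lay scores (5, 5) and expert scores (2, 1) is stored in MBN and sent back for revision, whereas one with expert scores (5, 5) also enters MFN.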

Page 58: Medical FactNet

Training of expert raters for MFN

will include e.g. guidance as to the treatment of statements which relate only to what holds for the most part or in most cases.

people with a cold sometimes sneeze

could mean either: not all people with a cold sneeze, contradicting the fact that sneezing is a mandatory symptom for a cold,

or all people with a cold sneeze, but not all the time, which would be rated as correct.

Page 59: Medical FactNet

Evaluation of MWN and MFN

users of a consumer health information portal will be randomly assigned to one of four groups: 1. access to the unsupplemented portal; 2. access also to MWN, 3. access to MFN, 4. access to both MWN and MFN

then apply the Saracevic-Kantor method for evaluating user satisfaction with internet query services
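The random assignment to the four groups might be sketched as follows. The group labels paraphrase the slide; the shuffle-then-deal scheme, the seed parameter, and the function name are assumptions made for the example.

```python
import random

GROUPS = (
    "portal only",
    "portal + MWN",
    "portal + MFN",
    "portal + MWN + MFN",
)


def assign_groups(user_ids, seed=0):
    """Shuffle users, then deal them round-robin so group sizes stay balanced."""
    rng = random.Random(seed)   # fixed seed keeps the assignment reproducible
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    return {user: GROUPS[i % len(GROUPS)] for i, user in enumerate(shuffled)}
```

Dealing after a shuffle keeps the four groups within one user of each other in size, which simplifies later comparison of satisfaction scores.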

Page 60: Medical FactNet

Future work

application of MBN/MFN methodology to evaluate the reliability of the medical knowledge of different non-expert communities

by preserving data pertaining to the sources of entries in MBN it will be possible to keep track of specific kinds of false beliefs as originating in specific kinds of informants. This may prove a valuable source of information in targeting specific groups for specific types of remedial medical education.

Page 61: Medical FactNet

Future work

experiments in the tradition of E. Rosch to investigate how the domain of medical phenomena is conceptualized by non-expert human subjects

Basic level words: tomato, cabbage vs. bean vs. vegetable (too general) / cherry tomato (too specific)

what is the basic level of lexical specification in the domain of medical phenomena?

what are the basic kinds in the ontology of medicine of natural-language-using subjects?

Page 62: Medical FactNet

Different roles of MFN and MBN

MFN associated with constructing practical tools designed to assist users in coming to believe what is true

MBN associated with research regarding what people believe about medical phenomena.

Page 63: Medical FactNet

Towards a comprehensive assay of consumer health knowledge

Ultimate goal: to document in an ontologically coherent fashion the entirety of the medical knowledge that is capable of being understood by average adult consumers of healthcare services in the United States today.

Page 64: Medical FactNet

Just as English WordNet

serves as an interlingual index between wordnets in different languages,

so MWN and MFN can function as an inter-ontology index between different expert factnets prepared for different parts of technical biomedical knowledge

NLM goal of expert medical factnet

Page 65: Medical FactNet

ARistOTLE

Aggregative Realist Ontology of Total Language

Page 66: Medical FactNet

The End

