Medical FactNet


Medical FactNet. Barry Smith (University at Buffalo and IFOMIS, Leipzig) and Christiane Fellbaum (Princeton University and Berlin Academy). PowerPoint presentation.


  • Medical FactNet

    Barry Smith, University at Buffalo and IFOMIS, Leipzig

    Christiane Fellbaum, Princeton University and Berlin Academy

  • Online-Inquiry to MEDLINEplus

    Query text: response (with links to documents sorted by the following keywords)

    tremor: Tremor, Multiple Sclerosis, Parkinson's Disease, Degenerative Nerve Diseases, Movement Disorders
    intentional tremor: Tremor, Multiple Sclerosis, Parkinson's Disease, Spinal Muscular Atrophy, Degenerative Nerve Diseases
    tremble: Anxiety, Parkinson's Disease, Panic Disorder, Caffeine, Tremor
    trembling: Anxiety, Parkinson's Disease, Panic Disorder, Phobias, Tremor
    right hand trembles: Phobias, Anxiety, Infant and Toddler Development, Parkinson's Disease, Diabetes
    right hand trembles when grasping: Infant and Toddler Development, Sports Fitness, Sports Injuries, Diabetes, Rehabilitation

  • A consumer health medical information system must be able to map between expert and non-expert medical vocabulary

    GOAL: A unified medical language system for non-expert medical vocabulary

    UMLS for dummies

  • A New Methodology for the Construction and Validation of Information Resources for Consumer Health

  • MWN (Medical WordNet): SPECIFIC AIMS

    to extend and validate WordNet 2.0's medical coverage in light of recent advances in medical terminology research
    focusing initially on the English-language single-word expressions used and understood by non-experts
    provision of a mapping to UMLS, MeSH, and other expert terminologies
    use as interlingua for MWNs in other languages

  • WordNet (Miller, Fellbaum)

    Large lexical database; ubiquitous tool of NLP
    coverage comparable to a collegiate dictionary, over 130,000 word forms
    40 wordnets in different languages
    WordNet: rich medical coverage, but poorly validated and with a poor formal architecture
    How to create a validated Medical WordNet (MWN)?

  • Building blocks of WordNet = synsets = concepts in medical terminology

    terms in the same synset are interchangeable in some sentential contexts without altering truth-value: {car, automobile}, {shut, close}

    synsets are linked via a small number of binary relations: is-a, part-of, verb entailments (walk-limp, forget-know).
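The synset-and-relation structure described on this slide can be sketched as a small graph. This is a minimal illustration only, not WordNet's actual data model or API: the `Synset` class, the `link` and `ancestors` helpers, and the `motor_vehicle` and `vehicle` nodes are assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of word forms treated as interchangeable in some contexts."""
    name: str
    words: frozenset
    # relation name -> set of target synset names (hypothetical layout)
    relations: dict = field(default_factory=dict)


def link(graph, source, relation, target):
    """Add one binary relation (e.g. is-a, part-of, entails) between synsets."""
    graph[source].relations.setdefault(relation, set()).add(target)


def ancestors(graph, name, relation="is-a"):
    """Follow a single relation transitively and return every synset reached."""
    seen, stack = set(), [name]
    while stack:
        for parent in graph[stack.pop()].relations.get(relation, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Toy graph: {car, automobile} comes from the slide, the other nodes are illustrative.
graph = {
    "car": Synset("car", frozenset({"car", "automobile"})),
    "motor_vehicle": Synset("motor_vehicle", frozenset({"motor vehicle"})),
    "vehicle": Synset("vehicle", frozenset({"vehicle"})),
    "limp": Synset("limp", frozenset({"limp"})),
    "walk": Synset("walk", frozenset({"walk"})),
}
link(graph, "car", "is-a", "motor_vehicle")
link(graph, "motor_vehicle", "is-a", "vehicle")
link(graph, "limp", "entails", "walk")  # the walk-limp entailment pair

print(sorted(ancestors(graph, "car")))
```

Keeping relations as named edges means is-a, part-of and entailment can share one traversal routine.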

  • Strengths of WordNet 2.0

    Open source
    Very broad coverage
    Is-a / part-of architecture
    Tool for automatic sense disambiguation

  • 13 senses for "feel" as a verb

    experience: She felt resentful
    find: I feel that he doesn't like me
    feel: She felt small and insignificant
    feel: We felt the effects of inflation
    feel: The sheets feel soft
    grope: He felt for his wallet
    finger: Feel this soft cloth!
    explore: He felt his way around the dark room
    feel: It feels nice to be home again
    feel: He felt the girl in the movie theater

  • Medical senses of "feel"

    palpate: examine a body part by palpation: "The nurse palpated the patient's stomach"; "The runner felt her pulse."
    sense: perceive by a physical sensation, e.g. coming from the skin or muscles: "He felt his flesh crawl"; "She felt the heat when she got out of the car"; "He feels pain when he puts pressure on his knee."
    feel: seem with respect to a given sensation: "My cold is gone; I feel fine today"; "She felt tired after the long hike."

  • MWN

    many word units are monosemic (clinician, stethoscope)
    most common words are polysemic
    lexicon of the order of 4,000 word units with some 3,000 distinct word senses
    tested by incorporation in NLP applications used for purposes of information retrieval, machine translation, question-answer systems, text summarization

  • How to validate Medical WordNet?

    How to fix the scope of "non-expert"?

  • Answer: Medical FactNet (MFN)

    a large corpus of natural-language sentences providing medically validated contexts for MWN terms
    pilot corpus: 40,000 sentences
    full MFN (for common diseases): ~250,000 sentences
    accredited as intelligible by non-experts and as true by experts

  • Medical BeliefNet (MBN)

    = totality of sentences about medical phenomena to which non-experts assent
    comes for free, given our methodology for creating MFN

  • Sources for MFN

    WordNet glosses and arcs
    Online health information services targeted to consumers: NetDoctor, MEDLINEplus (factsheets on common diseases)

  • Constructing MBN and MFN

    sources (WordNet, MEDLINEplus)
    filtering for intelligibility by non-experts
    pool of natural language sentences
    filtering for non-expert assent
    filtering for validation by experts


  • MFN: SPECIFIC AIMS

    To create a pilot open-source corpus of sentences about medical phenomena in the English language, restricted to sentences that are:
    natural language
    grammatically complete
    logically and syntactically simple
    rated as understandable by non-expert human subjects in controlled questionnaire-based experiments

  • MFN: SPECIFIC AIMS

    sentences must:
    be self-contained
    make no reference to any prior context
    not contain any proper names, indexical expressions or other linguistic devices that need to be interpreted with respect to other sentences.
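A crude pre-screen for the self-containment criteria above might look like the following. This is only a sketch under stated assumptions: the slides describe human rating, not automatic filtering, and the word list, the capitalization heuristic and the function name `flag_context_dependence` are all hypothetical. Flagged sentences would still need human review.

```python
import re

# Hypothetical, deliberately small list of indexical words. Ambiguous forms such as
# "that", "there" and "it" are left out to limit false positives.
INDEXICALS = {
    "i", "you", "he", "she", "we", "they", "this", "these", "those",
    "here", "now", "today", "yesterday", "tomorrow",
    "my", "your", "his", "her", "our", "their",
}


def flag_context_dependence(sentence):
    """Return tokens suggesting the sentence depends on outside context."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    flags = [t for t in tokens if t.lower() in INDEXICALS]
    # Capitalized tokens after the first word are treated as possible proper names.
    # A proper name in sentence-initial position is missed by this heuristic.
    flags += [t for t in tokens[1:] if t[0].isupper()]
    return flags


print(flag_context_dependence("Sneezing is a symptom of an allergy."))
print(flag_context_dependence("He felt for his wallet."))
```

An empty result means only that nothing on the word list was found, not that the sentence is actually self-contained.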

  • Constructing MFN

    Sentences in MFN must receive high marks for correctness on being assessed by medical experts.
    MFN is designed to constitute a representative fraction of the true beliefs about medical phenomena which are intelligible to non-expert English speakers.

  • Constructing MBN

    Sentences in MBN must receive high marks for assent on being assessed by non-experts.
    MBN is designed to constitute a representative fraction of the beliefs about medical phenomena (both true and false beliefs) distributed through the population of English speakers.
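The two membership rules (expert correctness for MFN, non-expert assent for MBN, both after an intelligibility check) can be sketched as below. The 1-5 rating scale for all three dimensions, the cut-off of 4.0 for "high marks", and the names `RatedSentence` and `assign` are assumptions made for illustration. The ratings in the example are made-up toy values, not experimental results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RatedSentence:
    text: str
    intelligibility: list  # non-expert ratings, assumed 1-5
    assent: list           # non-expert ratings, assumed 1-5
    correctness: list      # expert ratings, assumed 1-5


THRESHOLD = 4.0  # hypothetical cut-off for "high marks"


def assign(sentence, threshold=THRESHOLD):
    """Return the corpora ("MFN", "MBN") that a rated sentence qualifies for."""
    corpora = set()
    if mean(sentence.intelligibility) < threshold:
        return corpora  # not intelligible to non-experts: excluded from both
    if mean(sentence.assent) >= threshold:
        corpora.add("MBN")
    if mean(sentence.correctness) >= threshold:
        corpora.add("MFN")
    return corpora


# Toy ratings, invented for the example.
folk = RatedSentence("People can only snore when they are asleep.", [5, 5], [5, 4], [2, 1])
fact = RatedSentence("A runny nose is a symptom of an allergy.", [5, 5], [5, 5], [5, 4])

print(assign(folk))
print(sorted(assign(fact)))
```

A sentence can land in both corpora, in MBN only (a false folk belief), in MFN only (true but not assented to), or in neither.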

  • Compiling MFN and MBN in tandem

    will allow systematic assessment of the disparity between lay beliefs and vocabulary as concerns medical phenomena and the exactly corresponding expert medical knowledge.

    will allow us to establish automatically, for any given sub-population, the areas in which its beliefs about medical phenomena differ most significantly from validated medical knowledge
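One way such a comparison could be computed, once each sentence carries a lay verdict and an expert verdict, is a per-topic mismatch rate. This is a hedged sketch only: the record layout `(topic, lay_assents, expert_valid)`, the topic labels and both function names are hypothetical, and the records are toy data.

```python
from collections import defaultdict


def disparity_by_topic(records):
    """Share of sentences per topic where lay assent and expert verdict disagree.

    records: iterable of (topic, lay_assents, expert_valid) tuples for one
    sub-population (hypothetical layout).
    """
    totals = defaultdict(int)
    mismatches = defaultdict(int)
    for topic, lay_assents, expert_valid in records:
        totals[topic] += 1
        if lay_assents != expert_valid:
            mismatches[topic] += 1
    return {topic: mismatches[topic] / totals[topic] for topic in totals}


def most_divergent(records):
    """Return the topic with the highest mismatch rate."""
    rates = disparity_by_topic(records)
    return max(rates, key=rates.get)


# Toy records, invented for the example.
records = [
    ("sleep", True, False),
    ("sleep", True, True),
    ("allergy", True, True),
    ("allergy", False, False),
]
print(disparity_by_topic(records))
print(most_divergent(records))
```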

  • USES OF MFN

    for quality assurance of MWN
    to support the population of MWN by yielding new families of words and word senses
    medical education
    consumer health information
    (in conjunction with MBN) to allow new sorts of experiments in the linguistics, psychology and anthropology of consumer health

  • Evaluation of MFN

    measure the benefits it brings when incorporated into an existing on-line consumer health portal based on term-search technology
    test whether exploiting the resources of MFN can lead to improved results in the retrieval of expert information

  • Differences between expert and non-expert medical language

    mismatch between expert and non-expert language
    taxonomies reflecting popular lexicalizations have small coverage relative to technical vocabularies and shallow hierarchies: no popular terms linking infectious disease and mumps

  • Differences between expert and non-expert medical language

    popular medical terms (flu) often fuzzier than technical terms
    extension of non-expert term used also by experts sometimes smaller, sometimes larger
    hypothesis: with few exceptions the focal meanings coincide in their extensions

  • Mismatches in Doctor-Patient Communication

    Practical skills of physician in acquiring and conveying relevant and reliable information by using non-expert language tailored to individual patient
    The physician, too, is a human being, thus ex officio a member of the wider community of non-experts
    continues to use non-expert language for everyday purposes

  • But there are problems

  • Question: My seven-year-old son developed a rash today. A friend of mine had her 10-day-old baby at my home last evening before we were aware of the illness. I have read that chickenpox is contagious up to two days prior to the actual rash. Is there cause for concern at this point?

    Answer: Chickenpox is the common name for varicella infection. ... You are correct in that a person with chickenpox can be contagious for 48 hours before the first vesicle is seen. ... Of concern, though, is the fact that newborns are at higher risk of complications of varicella, including pneumonia. ... There is a very effective means to prevent infection after exposure. A form of antibody to varicella called varicella-zoster immune globulin (VZIG) can be given up to 48 hours after exposure and still prevent disease. ... (from Slaughter)

  • Lexical mismatches

    rooted in legal concerns?
    both primary care physician and online information system must respond primarily with generic, or case- or context-independent, information
    most requests relate to specific and episodic phenomena (occurrences of pain, fever, reactions to drugs, etc.).
    Hence focus of MFN on generic sentences = context-independent statements about causality, about types of persons or diseases or about typical or possible courses of a disease.

  • MFN

    designed to map the generic medical information which non-experts are able to understand

  • Corpus- and fact-based approaches to information retrieval

    meanings of highly polysemous terms cannot be discriminated without consideration of their contexts. People do this without apparent difficulties
    New NLP methodologies to harness computers to manipulate large text corpora
    Train automatic systems on large numbers of semantically annotated sentences, exploit standard pattern-recognition and statistical techniques for purposes of disambiguation.

  • Use of WordNet in medical informatics

    e.g. as tool for simplifying information extraction from the corpus of MEDLINE abstracts: by replacing verbs with corresponding synsets and so reducing the number of relations that need to be taken account of in the analysis of texts
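The idea of replacing verbs with their synsets to shrink the number of distinct relations can be sketched on toy subject-verb-object triples. The mapping table, the synset labels and the triples below are assumptions for illustration. The sketch also skips word-sense disambiguation, which a real system would need because a verb such as "feel" belongs to several synsets.

```python
# Hypothetical verb -> synset label table. {shut, close} and the palpate sense
# of "feel" follow the slides; the labels themselves are placeholders.
VERB_TO_SYNSET = {
    "shut": "close",
    "close": "close",
    "feel": "palpate",
    "palpate": "palpate",
}


def normalize(triples, verb_to_synset):
    """Replace each verb by its synset label, leaving unknown verbs unchanged."""
    return [(s, verb_to_synset.get(v, v), o) for s, v, o in triples]


def relation_count(triples):
    """Number of distinct relation names appearing in the triples."""
    return len({v for _, v, _ in triples})


# Toy triples, invented for the example.
triples = [
    ("nurse", "palpate", "stomach"),
    ("runner", "feel", "pulse"),
    ("nurse", "shut", "door"),
    ("nurse", "close", "door"),
]

print(relation_count(triples))
print(relation_count(normalize(triples, VERB_TO_SYNSET)))
```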

  • Example: FrameNet

    500 Frames, each with a plurality of Frame Elements
    Medical Frames: Addiction, Birth, Biological Urge, Body Mark, Cure, Death, Health Response, Medical Conditions, Medical Instruments, Medical Professional, Medical Specialties and Observable Body Parts.

  • Frame: Cure

    Frame Elements: alleviate.v, alleviation.n, curable.a, curative.a, curative.n, cure.n, cure.v, ease.v, heal.v, healer.n, incurable.a, palliate.v, palliation.n, palliative.a, palliative.n, rehabilitate.v, rehabilitation.n, rehabilitative.a, remedy.n, resuscitate.v, therapeutic.a, therapist.n, therapy.n, treat.v, treatment.n.

  • Example: Penn Proposition Bank

    designed as a corpus of coherent texts. The intention is to train an automatic system to learn the contexts for words and their context-specific meanings.
    corpus characterized by a specific logical (function-argument-based) architecture.

  • Both FrameNet and Proposition Bank

    have poor medical coverage
    Both focus on word usage in general, rather than on domain-specific contexts.
    Neither is concerned with the questions of factuality or validation of statements

  • Example: CYC knowledge base

    collection of hundreds of thousands of statements, mostly about the external world:
    The earth is round
    Mountains are one kind of landform
    Albany is the capital of New York
    parcelled into micro-theories

  • In contrast to CYC

    MFN focuses on one single (albeit very large) domain
    MFN stores English sentences (CYC is language non-specific)
    MFN discriminates folk beliefs and expert knowledge (designed to be consistent with the body of established science)
    MFN will be publicly available.

  • Existing Princeton WordNet 2.0

    labels 504 word-forms "medicine":
    infection#1 {(the pathological state resulting from the invasion of the body by pathogenic microorganisms)}
    infection#3 {(the invasion of the body by pathogenic microorganisms and their multiplication which can lead to tissue damage and disease)}
    infection#4 {infection, contagion, transmission (an incident in which an infectious disease is transmitted)}

  • Maturation

    maturation#2 {growth, growing, maturation, development, ontogeny, ontogenesis ((biology) the process of an individual organism growing organically; a purely biological unfolding of events involved in an organism changing gradually from a simple to a more complex level; he proposed an indicator of osseous development in children)}
    maturation#3 {festering, suppuration, maturation (the formation of morbific matter in an abscess or a vesicle and the discharge of pus)}

  • But it mixes up expert and non-expert vocabulary, both current and medieval:

    suppuration#2 {pus, purulence, suppuration, ichor, sanies, festering (a fluid product of inflammation)}

  • And it contains medically relevant errors:

    snore-sleep linked via verb entailment: if someone snores, then he necessarily also sleeps.
    In medicine: quite possible to snore while awake, since snoring implies the respiratory-induced vibration of glottal tissues, as associated not only (and most usually) with sleep but also with relaxation or obesity.
    The methodology for constructing MFN will provide us with a systematic means to detect such errors.

  • snore-sleep

    Constructing MBN will give us the resources to do justice to the reason why such cases were included in the first place: "People can only snore when they are asleep" and similar sentences belong precisely to the folk beliefs about medicine which MBN will document

  • Extracting sentences from online consumer health information sources

    In one experiment, sentences were derived by researchers in medical informatics from factsheets on "Airborne allergens" in NIAID's Health Information Publications and on "Hay fever and perennial allergic rhinitis" in the UK NetDoctor's Diseases Encyclopedia.

  • Source (NIAID) and Output

    Source: "There is no good way to tell the difference between allergy symptoms of runny nose, coughing, and sneezing and cold symptoms. Allergy symptoms, however, may last longer than cold symptoms." (from NIAID HealthInfo)

    Output:
    Allergies have symptoms.
    Colds have symptoms.
    A runny nose is a symptom of an allergy.
    Coughing is a symptom of an allergy.
    Sneezing is a symptom of an allergy.
    Cold symptoms are similar to allergy symptoms.
    A cold is not an allergy.
    Allergy symptoms may last longer than cold symptoms.

  • Output sentences

    use simple syntax and draw on natural-language terms used in original sources
    Sentences containing anaphora, instructions, warnings, are replaced by complete statements constructed via simple syntactic modifications or ignored.

  • Output Sentences

    1644 sentences produced (= 20 person hours of effort)
    500 sentences were subjected to a preliminary evaluation by pairs of medical students (on a score of 1-5)
    58% were rated with a score of 2 x 5
    but: measures for inter-rater agreement too low for these results to be statistically significant.
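The slide does not say which inter-rater agreement measure was used. As an illustration, Cohen's kappa is one common choice for two raters assigning categorical scores, and it can be computed as below. The function name and the score lists are assumptions, and the scores are toy values, not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given identical scores.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency per score.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical score throughout
    return (observed - expected) / (1 - expected)


# Toy 1-5 scores from two hypothetical raters.
scores_a = [5, 5, 4, 3, 5, 2]
scores_b = [5, 4, 4, 3, 5, 1]
print(round(cohens_kappa(scores_a, scores_b), 3))
```

Kappa treats the 1-5 scores as unordered categories. A weighted variant would give partial credit for near-misses such as 4 versus 5.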

  • Validation methods

    sources
    A: filtering for intelligibility by non-experts
    pool
    B: filtering for non-expert assent
    C: filtering for validation by...