An Automatic Retrieval System for Expert and Consumer Users

Rena Peraki, Euripides G.M. Petrakis, Angelos Hliaoutakis

Intelligent Systems Laboratory, www.intelligence.tuc.gr
Technical University of Crete (TUC), Chania, Crete, Greece

BIBE 2012, Larnaca, Cyprus

Problem Definition

• Medical information systems are designed for experts
  – Use complex terms in their searches
  – Domain-specific answers

• Must also serve naive consumers
  – Do simple searches using natural language terms
  – Easy to read and comprehend information

• Investigate methods for the categorization of information by user profile

Current Practices

• MedScape, MedlinePlus, MedHunt rely on the manual translation and categorization of information for consumers
  – Slow, does not scale up to large collections

• In MEDLINE of the U.S. NLM, documents are indexed by experts and for experts only
  – No categorization by user profile
  – 10-12 MeSH terms per document (pathology, disease, treatment, drugs, etc.)
  – Over 15 million documents: slow!
  – Need to automate this process

Objectives

• Investigate methods for automatic document indexing in MEDLINE

• These index terms are subsequently used for filtering documents by user profile

• Main idea: categorize terms as either simple terms comprehensible to consumers or more involved terms suitable for experts

Resources

• Automatic indexing in MEDLINE:
  – MMTx [U.S. NLM]: focuses on UMLS rather than MeSH
  – AMTEx [DKE, 2009]: MeSH terms, faster and more accurate than MMTx

• Dictionaries for biomedical and health-related concepts
  – UMLS Metathesaurus, MeSH

• Dictionaries for general English words
  – WordNet, Specialist

MMTx (MetaMap Transfer)

• Developed by U.S. NLM

• Maps text to UMLS Metathesaurus concepts
  – but MEDLINE indexing is based on MeSH
  – MeSH is a subset of Metathesaurus

• Suffers from term overgeneration
  – Unrelated terms added to the final candidate list
  – The list must be cleaned up to keep only MeSH terms
  – Topic drift

The AMTEx method [DKE 2009]

• Main idea:
  – Initial term extraction based on a hybrid linguistic/statistical approach, the C/NC value
  – Extracts general single and multi-word terms (noun phrases)
  – Mainly multi-word terms: “heart disease”, “coronary artery disease”
  – Extracted terms are validated against MeSH
  – Faster, improved precision with merely a fifth of the term output of MMTx

Example

Input: Full text article

MEDLINE index terms: “Aged”, “Data Collection”, “Humans”, “Knee”, “Middle Aged”, “Osteoarthritis, Knee/complications”, “Osteoarthritis, Knee/diagnosis”, “Pain/classification”, “Pain/etiology”, “Prospective Studies”, “Research Support, Non-U.S. Gov’t”

MMTx terms: “osteoarthritis knee”, “retention”, “peat”, “rheumatology”, “acetylcholine”, “lysine acetate”, “potassium acetate”, “questionnaires”, “target population”, “population”, “selection bias”, “creativeness”, “reproduction”, “cohort studies”, “europe”, “couples”, “naloxone”, “sample size”, “arthritis”, “data collection”, “mail”, “health status”, “respondents”, “ontario”, “universities”, “dna”, “baseline survey”, “medical records”, “informatics”, “general practitioners”, “gender”, “beliefs”, “logistic regression”, “female”, “marital status”, “employment status”, “comprehension”, “surveys”, “age distribution”, “manual”, “occupations”, “manuals”, “persons”, “females”, “minor”, “minority groups”, “incentives”, “business”, “ability”, “comparative study”, “odds ratio”, “biomedical research”, “pubmed”, “copyright”, “coding”, “longitudinal studies”, “immunoelectrophoresis”, “skin diseases”, “government”, “norepinephrine”, “social sciences”, “survey methods”, “tyrosine”, “new zealand”, “azauridine”, “gold”, “nonrespondents”, “cycloheximide”, “rheum”, “jordan”, “cadmium”, “radiopharmaceuticals”, “community”, “disease progression”, “history”

AMTEx terms: “health surveys”, “pain”, “review publication type”, “data collection”, “osteoarthritis knee”, “knee”, “science”, “health services needs and demand”, “population”, “research”, “questionnaires”, “informatics”, “health”

Term & Document Categorization

New Vocabularies

• Vocabulary of General Terms (VGT): 105,675 general (WordNet) terms

• Vocabulary of Consumer Terms (VCT): 7,165 consumer (MeSH) terms

• Vocabulary of Expert Terms (VET): 16,719 expert (MeSH) terms

(WordNet) - (MeSH) = VGT

(MeSH) ∩ (WordNet) = VCT

(MeSH) - (WordNet) = VET
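The three vocabularies can be read as plain set operations. The sketch below is only an illustration under that reading: it assumes VCT holds the MeSH terms that also occur in WordNet, VET the MeSH terms absent from WordNet, and VGT the WordNet terms absent from MeSH. The function name `build_vocabularies` and the toy term lists are hypothetical, and matching is by lowercased exact string, which may be simpler than what the authors used.

```python
def build_vocabularies(mesh_terms, wordnet_terms):
    """Split terms into general (VGT), consumer (VCT) and expert (VET) sets."""
    mesh = {t.lower() for t in mesh_terms}
    wordnet = {t.lower() for t in wordnet_terms}
    return {
        "VGT": wordnet - mesh,  # general English terms that are not MeSH terms
        "VCT": mesh & wordnet,  # MeSH terms that also occur in WordNet
        "VET": mesh - wordnet,  # MeSH terms that WordNet does not contain
    }


# Toy term lists invented for illustration; they are not real dictionary dumps.
toy_mesh = ["Pain", "Headache", "Immunoelectrophoresis"]
toy_wordnet = ["pain", "headache", "business", "history"]

vocab = build_vocabularies(toy_mesh, toy_wordnet)
for name in ("VGT", "VCT", "VET"):
    print(name, sorted(vocab[name]))
```

With real dictionaries the same three expressions would be applied to the full MeSH and WordNet term lists.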

Document Categorization

• Documents are represented by vectors of terms extracted by AMTEx, MMTx or assigned by human experts

• The more VET (VCT) terms a document contains, the higher its probability of being suitable for experts (consumers)
  – E.g., a document with VET% = 0.62 has a 62% probability of being suitable for experts
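A minimal sketch of how such a score could be computed for one document follows. It assumes VET% and VCT% are the fractions of the document's index terms that fall in each vocabulary, normalized over the terms found in either one. The slide does not give the exact denominator, so that normalization, the function name `profile_scores` and the toy vocabularies are assumptions.

```python
def profile_scores(doc_terms, vct, vet):
    """Return (VCT%, VET%) for the index terms of a single document."""
    terms = [t.lower() for t in doc_terms]
    consumer = sum(1 for t in terms if t in vct)
    expert = sum(1 for t in terms if t in vet)
    total = consumer + expert
    if total == 0:
        # No term belongs to either vocabulary: no evidence either way.
        return 0.0, 0.0
    return consumer / total, expert / total


# Toy vocabularies and document, invented for illustration.
toy_vct = {"pain", "knee"}
toy_vet = {"osteoarthritis knee", "immunoelectrophoresis", "azauridine"}
doc = ["pain", "knee", "osteoarthritis knee", "immunoelectrophoresis", "azauridine"]

vct_score, vet_score = profile_scores(doc, toy_vct, toy_vet)
print(vct_score, vet_score)  # two consumer terms and three expert terms
```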

Evaluation

• Precision and Recall measures: a good method has high values of both

• Datasets: OHSUMED: 348,566 MEDLINE abstracts that come with 64 queries and their relevant answers

• Ground truth: the set of MeSH index terms assigned to documents by experts
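As an illustration of the two measures, the sketch below compares a set of automatically extracted terms with the expert-assigned index terms of one document. It assumes lowercased exact string matching and ignores MeSH qualifiers such as “/etiology”; the function name `precision_recall` and the toy term lists are hypothetical.

```python
def precision_recall(extracted, ground_truth):
    """Precision and recall of extracted terms against expert index terms."""
    extracted = {t.lower() for t in extracted}
    ground_truth = {t.lower() for t in ground_truth}
    hits = len(extracted & ground_truth)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Toy example: two of the four extracted terms match the five expert terms.
extracted_terms = ["pain", "knee", "health", "research"]
expert_terms = ["Pain", "Knee", "Humans", "Aged", "Data Collection"]

p, r = precision_recall(extracted_terms, expert_terms)
print(p, r)
```

A method that outputs many terms tends to raise recall at the cost of precision, which is why both values are reported together.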

AMTEx vs MMTx

• AMTEx: faster, improved precision with merely a fifth of the term output of MMTx

Data Set   Method   Number of Terms   Precision   Recall   Time (hours)
OHSUMED    AMTEx    8                 0.125       0.101    7.383
OHSUMED    MMTx     40                0.089       0.336    14.516
PMC        AMTEx    25                0.034       0.062    1.387
PMC        MMTx     72                0.033       0.162    2.727

Categorization by User Profile

• How good is the method at retrieving answers for consumers and experts?

• We run retrievals for consumers & experts
  – 15 out of the 64 queries contain no expert terms and are suitable for consumers
  – The remaining queries are suitable for experts
  – Documents are represented by document vectors of MeSH, MMTx, or AMTEx terms
  – The retrieval method is the Vector Space Model (VSM)
  – The VSM similarity score of a document is multiplied by its respective VET or VCT score
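The re-weighting step described above can be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: it uses raw term counts and cosine similarity as the VSM score (the slides do not state the term weighting), and the names `cosine`, `rank_for_profile` and the toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(query_terms, doc_terms):
    """Cosine similarity between two bags of terms using raw counts."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def rank_for_profile(query_terms, docs, profile_score):
    """Rank documents by VSM similarity multiplied by their profile score.

    docs maps a document id to its term list; profile_score maps a document
    id to its VET score (expert queries) or VCT score (consumer queries).
    """
    scored = [
        (doc_id, cosine(query_terms, terms) * profile_score[doc_id])
        for doc_id, terms in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Two toy documents with identical terms but different consumer scores.
docs = {"d1": ["pain", "knee"], "d2": ["pain", "knee"]}
vct_scores = {"d1": 0.2, "d2": 0.8}

ranking = rank_for_profile(["pain"], docs, vct_scores)
print(ranking)  # d2 is ranked first because of its higher VCT score
```

Because the two documents match the query equally well, the ordering here is decided entirely by the profile score, which is the intended effect of the multiplication.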

Consumers Retrieval Task

Experts Retrieval Task

Results Analysis

• The results indicate
  – A tendency of human experts to assign simple terms to documents, and
  – A selective ability of AMTEx in extracting complex terms suitable for experts

Conclusions & Future Work

• We investigate methods:
  – Automatic document indexing
  – Categorization by user profile

• AMTEx is well suited for both problems

• Future work: more elaborate document categorization methods (machine learning, fuzzy)

• More term and document categories
  – According to UMLS SN (pathology, treatment)
  – User categories (e.g., specialty)

Questions and answers

AMTEx Outline

• INPUT: Document Collection

• C/NC value: Multi-word Term Extraction & Term Ranking

• MeSH Term Validation

• Single-word Term Extraction: non-MeSH multi-word terms are broken down & validated against MeSH

• Variant Generation

• Term Expansion (MeSH)

• OUTPUT: MeSH Term Lists

• Resource: MeSH Thesaurus
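The validation and single-word steps of the outline can be sketched as below. This is a simplified illustration rather than the AMTEx implementation: it assumes the ranked candidate terms are already available (C/NC value extraction, variant generation and term expansion are omitted), matches by lowercased exact string, and uses the hypothetical name `validate_against_mesh` with toy inputs.

```python
def validate_against_mesh(candidates, mesh_terms):
    """Keep candidates found in MeSH; break the others down into single words."""
    mesh = {t.lower() for t in mesh_terms}
    validated = []
    for term in candidates:  # candidates are assumed to arrive ranked, best first
        term = term.lower()
        if term in mesh:
            parts = [term]
        else:
            # Non-MeSH multi-word term: keep only its words that are MeSH terms.
            parts = [word for word in term.split() if word in mesh]
        for part in parts:
            if part not in validated:  # preserve rank order, drop duplicates
                validated.append(part)
    return validated


# Toy inputs invented for illustration.
toy_mesh = ["Osteoarthritis Knee", "Knee", "Pain"]
toy_candidates = ["osteoarthritis knee", "chronic knee pain", "postal survey"]

terms = validate_against_mesh(toy_candidates, toy_mesh)
print(terms)
```

Here the second candidate is not a MeSH term as a whole, so only its words “knee” and “pain” survive, and the third candidate is dropped entirely.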

MeSH: Medical Subject Headings

The NLM medical & biological terms thesaurus:

• Organized in IS-A hierarchies
  – more than 15 taxonomies & more than 22,000 terms
  – a term may appear in multiple taxonomies

• No PART-OF relationships

• Terms organized into synonym sets called entry terms, including stemmed term forms

Fragment of the MeSH IS-A Hierarchy

[Figure: fragment of the MeSH IS-A hierarchy with the nodes Root, Nervous system diseases, Neurologic manifestations, Cranial nerve diseases, pain, headache, neuralgia, Facial neuralgia]