Text-based Discovery in Biomedicine: The Architecture of the DAD-system

Social Pharmacy and Pharmacoepidemiology

Lister Hill National Center for Biomedical Communications

Text-based Discovery in Biomedicine

The Architecture of the DAD-system

Marc Weeber 1,2, Henny Klein 1, Alan R. Aronson 2, Jim G. Mork 2, Lolkje T. W. de Jong-van den Berg 1, Rein Vos 1,3

1 Department of Social Pharmacy and Pharmacoepidemiology, Groningen University Institute for Drug Exploration, The Netherlands

2 Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD

3 Health Ethics and Philosophy, Faculty of Health Sciences, University of Maastricht, The Netherlands

Introduction

• Goal:

Finding new biomedical knowledge through the combination of existing knowledge as represented in the medical literature

• Motivation:

Avoiding reinvention of the wheel; reusing specific knowledge outside its original domain of discovery

Swanson

• A–B: Raynaud’s disease is characterized by high blood viscosity and high platelet aggregation

• B–C: Fish oil is known to reduce blood viscosity and platelet aggregation

A → B → C, so A → C?


Vos and Rikken

• Drugs instead of diet factors

• Intermediate (B) terms are adverse drug reactions

• Drug – Adverse drug reactions – Disease: The DAD-system

• Vos (1991) Drugs looking for diseases

Existing Techniques

• Swanson & Smalheiser:
  • Single words/multi-word terms
  • MEDLINE titles
  • No statistics

• Gordon & Lindsay:
  • Single words/multi-word terms
  • Information Retrieval statistics
  • Replication of Swanson’s discoveries


New Techniques

• Use of UMLS concepts

• PubMed

• MetaMap: mapping free text (MEDLINE titles and abstracts) to concepts

• Interactive web interface

Two-step Approach

• Open discovery, generating a hypothesis

A → ? → ?

• Closed discovery, testing a hypothesis

A → ? → C


Why UMLS Concepts?

• Use of only biomedically relevant information

• Useful transition from single words to multi-word terms

• Semantic information (semantic types) for filtering (e.g. select only Disease or Syndrome)

DAD-system

[Architecture diagram. Components shown: Metathesaurus, SPECIALIST Lexicon, Semantic Network, PubMed, MetaMap, KS]

DAD-system

[Architecture diagram, extended. Components shown: Metathesaurus, SPECIALIST Lexicon, Semantic Network, PubMed, MetaMap, KS, MySQL Database, and the modules Filter, Txt2Con, Query, Show, Select]


Open Discovery (A)

• Query (user input):

raynaud’s disease

Open Discovery (A)

• Mapping text to concept through MetaMap:

Raynaud's Disease [Disease or Syndrome]

Open Discovery (A)

• Synonym lookup:

Raynaud's syndrome, Raynaud's disease/phenomenon

• Variant generation:

e.g. syndrome / syndromes

Open Discovery (A)

• PubMed query:

raynaud OR raynauds

• Processing: query in titles and abstracts

• Result: 1,246 MEDLINE citations
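The query step above can be sketched as follows. This is a minimal illustration, not the DAD-system's own code: the function name `build_or_query` is hypothetical, and the sketch assumes the generated variants are simply lowercased, de-duplicated, and joined with OR.

```python
def build_or_query(terms):
    """Join search terms into one boolean OR query string (hypothetical helper)."""
    # Normalise case and whitespace, then drop duplicates in first-seen order.
    cleaned = (t.strip().lower() for t in terms if t.strip())
    unique = list(dict.fromkeys(cleaned))
    return " OR ".join(unique)


# Variants as listed on the slide; the capitalised duplicate is collapsed.
query = build_or_query(["raynaud", "raynauds", "Raynaud"])
print(query)
```

The same helper would also produce a multi-word query such as the one used later for the B-concepts, since each term is kept intact between the OR operators.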

Open Discovery (A)

• Text-to-concept mapping of all citations

• Only sentences mentioning Raynaud’s disease

• Result: 1,278 UMLS concepts
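Sentence-level co-occurrence, as used in this step, might look like the sketch below. It assumes each sentence has already been mapped to a list of concept names (the job MetaMap does in the real system); the function name and the toy sentences are hypothetical.

```python
from collections import Counter


def cooccurring_concepts(sentences, target):
    """Count concepts that share at least one sentence with `target`."""
    counts = Counter()
    for concepts in sentences:
        present = set(concepts)  # count each concept once per sentence
        if target in present:
            counts.update(present - {target})
    return counts


# Toy input: one list of mapped concepts per sentence (illustrative only).
sentences = [
    ["Raynaud's Disease", "Blood Viscosity"],
    ["Raynaud's Disease", "Platelet Aggregation", "Blood Viscosity"],
    ["Fish Oil", "Blood Viscosity"],
]
counts = cooccurring_concepts(sentences, "Raynaud's Disease")
print(counts.most_common())
```

The per-concept counts correspond to the frequencies reported for the candidate B-concepts.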

Open Discovery (A)

• Select functional/physiological concepts

• Semantic types in filter:

Body Location or Region
Biologic Function
Cell Function
Phenomenon or Process
Physiologic Function
Tissue
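A semantic-type filter of this kind can be sketched as a set-membership test. The type names in `FUNCTIONAL_TYPES` are the ones listed on the slide; the function name and the type assignments in the toy data are illustrative assumptions, not UMLS lookups.

```python
FUNCTIONAL_TYPES = {
    "Body Location or Region",
    "Biologic Function",
    "Cell Function",
    "Phenomenon or Process",
    "Physiologic Function",
    "Tissue",
}


def filter_by_semantic_type(concepts, allowed_types):
    """Keep concepts that carry at least one semantic type in `allowed_types`."""
    return [name for name, types in concepts if allowed_types & set(types)]


# (concept, semantic types) pairs; the assignments are illustrative assumptions.
concepts = [
    ("Blood Viscosity", ["Physiologic Function"]),
    ("Raynaud's Disease", ["Disease or Syndrome"]),
    ("Fish Oil", ["Lipid"]),
]
selected = filter_by_semantic_type(concepts, FUNCTIONAL_TYPES)
print(selected)
```

Swapping in a different set of types, for example the dietary filter used later, needs no change to the function itself.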

Open Discovery (A → B)

• Result: 57 concepts

• Frequency range: 1-18

Open Discovery (A → B)

• Selected B-concepts:

Plasma Viscosity Level
Blood Viscosity
Platelet Adhesiveness
Platelet Aggregation
Effects, Blood Coagulation

Open Discovery (A → B)

• Variants:

plasma, plasmas
viscosity, viscous
aggregation, aggregations, aggregating
coagulation, coagulating

Open Discovery (A → B)

• PubMed query:

blood coagulation OR blood viscosity OR plasma viscosity OR platelet adhesiveness OR platelet aggregation

• Result: 10,611 MEDLINE citations

Open Discovery (A → B)

• Concepts in sentences with B-concepts: 7,702

• Concepts not in Raynaud sentences: 6,747
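The second number follows from removing every concept already seen in a Raynaud sentence: 7,702 − 6,747 = 955 of the concepts found near B-concepts were already known in the Raynaud context. A minimal sketch of that exclusion step, assuming both concept collections are available as plain name lists (function name and toy data are hypothetical):

```python
def novel_candidates(b_sentence_concepts, a_sentence_concepts):
    """Concepts seen near B-concepts but never in a sentence with the A-concept."""
    return set(b_sentence_concepts) - set(a_sentence_concepts)


# Toy data (illustrative only).
near_b = ["Fish Oil", "Blood Viscosity", "Platelet Aggregation", "Cod Liver Oil"]
near_a = ["Blood Viscosity", "Platelet Aggregation"]
candidates = novel_candidates(near_b, near_a)
print(sorted(candidates))
```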

Open Discovery (A → B)

• Filter for dietary related concepts

• Semantic types in filter:

Vitamin
Lipid
Element, Ion, or Isotope

Open Discovery (A → B → C)

• Result: 206 concepts

• Rank order on relations

• Fish oil related concepts:

Eicosapentaenoic Acid
Fish Oil
Fatty Acids, Omega 3
MAXEPA
Omega-3 Polyunsaturated Fatty Acid
Cod Liver Oil
Salmon Oil
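The slide does not say how the "rank order on relations" is computed. One plausible reading, shown here purely as an assumption, is to rank each candidate C-concept by the number of distinct B-concepts it co-occurs with. The function name and the link data are hypothetical.

```python
def rank_by_b_links(links):
    """Rank C-concepts by how many distinct B-concepts each one is linked to.

    `links` maps a C-concept to the B-concepts it co-occurs with.
    """
    scored = [(c, len(set(bs))) for c, bs in links.items()]
    # Highest link count first; ties are broken alphabetically for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy links (illustrative only, not results from the system).
links = {
    "Fish Oil": ["Blood Viscosity", "Platelet Aggregation", "Blood Viscosity"],
    "Cod Liver Oil": ["Platelet Aggregation"],
    "Eicosapentaenoic Acid": ["Blood Viscosity", "Platelet Aggregation"],
}
ranking = rank_by_b_links(links)
print(ranking)
```

Counting distinct B-concepts rather than raw co-occurrences keeps one heavily studied pathway from dominating the ranking; a frequency-weighted score would be an equally defensible choice.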

Closed Discovery (A, C)

• C: Eicosapentaenoic Acid, Fish Oil, Fatty Acids, Omega 3, MAXEPA, Omega-3 Polyunsaturated Fatty Acid, Cod Liver Oil, Salmon Oil

• A: Raynaud’s Disease

Closed Discovery (A, C)

• A (Raynaud’s disease): 1,246 citations, 1,278 concepts

• C (fish oil concepts): 463 citations, 1,795 concepts

• 479 common concepts
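The common concepts are the intersection of the two concept sets, one gathered from the A literature and one from the C literature. A minimal sketch, assuming both are available as lists of concept names; the function name and the placeholder entries are hypothetical.

```python
def common_concepts(a_concepts, c_concepts):
    """Concepts found in both the A literature and the C literature."""
    return sorted(set(a_concepts) & set(c_concepts))


# Toy data; the "only near" entries are placeholders, not real concepts.
a_side = ["Blood Viscosity", "Vasodilatation", "concept only near A"]
c_side = ["Vasodilatation", "Blood Viscosity", "concept only near C"]
shared = common_concepts(a_side, c_side)
print(shared)
```

These shared concepts are the candidate B-concepts that a semantic filter can then narrow down.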

Closed Discovery (A, C)

• Functional/physiological filter: 45 B-concepts

Closed Discovery (A, B, C)

• Known concepts:

Plasma Viscosity Level
Blood Viscosity
Platelet Adhesiveness
Platelet Aggregation
Effects, Blood Coagulation

• New concepts:

Vasodilatation
Veins, Capillaries
Dinoprostone
Fibrinolysis
Deformability
Rheology


Juxtaposition

Success / Failure

+ Simulation of the Raynaud’s disease – fish oil and migraine – magnesium discoveries

+ Discovery of new therapeutic applications for thalidomide

- Ambiguous mapping (Mg = milligram / magnesium)

- Association defined only by co-occurrence

Future

• Better semantic analysis: increase(A,B) and decrease(B,C)

• Better user interface

• More databases, e.g. for finding genetic bases for diseases
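The increase(A,B) and decrease(B,C) idea can be sketched with signed relation triples in place of bare co-occurrence. This is a hypothetical illustration of the proposal, not an existing component: the triples are hand-written, the argument order copies the slide's notation, and extracting such relations from text is not shown.

```python
def opposing_b_terms(relations, a, c):
    """Return B-terms for which increase(a, B) and decrease(B, c) both hold."""
    increased = {y for pred, x, y in relations if pred == "increase" and x == a}
    decreased = {x for pred, x, y in relations if pred == "decrease" and y == c}
    return sorted(increased & decreased)


# Hand-written (predicate, arg1, arg2) triples; deliberately incomplete toy data.
relations = [
    ("increase", "Raynaud's Disease", "Blood Viscosity"),
    ("decrease", "Blood Viscosity", "Fish Oil"),
    ("increase", "Raynaud's Disease", "Platelet Aggregation"),
]
bridge = opposing_b_terms(relations, "Raynaud's Disease", "Fish Oil")
print(bridge)
```

Only B-terms with both signed relations survive, which is stricter than plain co-occurrence and would drop links where A and C push B in the same direction.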
