The UMLS Semantic Network Support for semantic integration and reasoning Workshop UMLS Semantic Network NLM, NIH, Bethesda, 7-8 Apr 2005 Anita Burgun


Page 1: The UMLS Semantic Network Support for semantic integration and reasoning

The UMLS Semantic Network
Support for semantic integration and reasoning

Workshop UMLS Semantic Network

NLM, NIH, Bethesda, 7-8 Apr 2005

Anita Burgun

Page 2: The UMLS Semantic Network Support for semantic integration and reasoning

Overview

• Semantic integration
  – Role of the SN
  – Integration of resources
  – Integration of data
• Reasoning
  – Reasoning with hierarchies
  – Reasoning with associative relations
• Perspectives
• Illustration
  – Genes, gene products, diseases
  – Findings, signs, diseases

Page 3: The UMLS Semantic Network Support for semantic integration and reasoning

Semantic integration

1- Role of ontologies

Page 4: The UMLS Semantic Network Support for semantic integration and reasoning

[Figure: integration architecture. Labels: Integration; DWH (Data Warehouse); Patient files; Micro-array data; Gene instances; Local resources; External resources (SWISSPROT, MEDLINE, GENBANK, GOA, …); Mediation system; Ontologies]

Page 5: The UMLS Semantic Network Support for semantic integration and reasoning

Integrating data in the domain of organ failure and transplantation

[Figure: Local Information Systems. Labels: EfG (transplantation); REIN (end-stage renal failure); dialysis; EfG terminology server]

Page 6: The UMLS Semantic Network Support for semantic integration and reasoning

[Figure: mapping. Labels: terms T1, T2, T3; term-term mapping; onto-term mapping; Semantic Network; Metathesaurus]

Page 7: The UMLS Semantic Network Support for semantic integration and reasoning

Semantic integration

2- Resource Integration

Page 8: The UMLS Semantic Network Support for semantic integration and reasoning

[Figure: integration architecture. Labels: Integration; DWH (Data Warehouse); Patient files; Micro-array data; Gene instances; Local resources; External resources (SWISSPROT, MEDLINE, GENBANK, GOA, …); Mediation system; Ontologies]

Page 9: The UMLS Semantic Network Support for semantic integration and reasoning

Introduction

• Increasing need for physicians and biologists to access information on the Internet

• Biomedical sources

– Scattered

– Multiple heterogeneity

– Rapid evolution and frequent creation

• Integration

<OrgName>
  <OrgName_name>
    <OrgName_name_binomial>
      <BinomialOrgName>
        <BinomialOrgName_genus>Homo</BinomialOrgName_genus>
        <BinomialOrgName_species>sapiens</BinomialOrgName_species>
      </BinomialOrgName>
    </OrgName_name_binomial>
  </OrgName_name>
  <OrgName_lineage>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo</OrgName_lineage>
  <OrgName_gcode>1</OrgName_gcode>
  <OrgName_mgcode>2</OrgName_mgcode>
  <OrgName_div>PRI</OrgName_div>
</OrgName>

<ORGANISM>Homo sapiens</ORGANISM>
<TAXONOMY>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.</TAXONOMY>

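The fragments above carry the same organism and lineage under different element names, which is the kind of heterogeneity the slide points at. As a minimal sketch, the Python below maps both layouts to one record. The function names and output keys are hypothetical, the lineages are abridged, and the flat fragment is wrapped in an invented <record> root because it has no single root element of its own.

```python
import xml.etree.ElementTree as ET

# Source A: nested binomial name (lineage abridged for the example).
SOURCE_A = """
<OrgName>
  <OrgName_name><OrgName_name_binomial><BinomialOrgName>
    <BinomialOrgName_genus>Homo</BinomialOrgName_genus>
    <BinomialOrgName_species>sapiens</BinomialOrgName_species>
  </BinomialOrgName></OrgName_name_binomial></OrgName_name>
  <OrgName_lineage>Eukaryota; Metazoa; Chordata</OrgName_lineage>
</OrgName>
"""

# Source B: flat fields, wrapped in a hypothetical <record> root.
SOURCE_B = """
<record>
  <ORGANISM>Homo sapiens</ORGANISM>
  <TAXONOMY>Eukaryota; Metazoa; Chordata.</TAXONOMY>
</record>
"""


def split_lineage(text):
    """Split a ';'-separated lineage, dropping a trailing period."""
    cleaned = text.strip().rstrip(".")
    return [taxon.strip() for taxon in cleaned.split(";") if taxon.strip()]


def normalize_organism(xml_text):
    """Return one common record whatever the source layout."""
    root = ET.fromstring(xml_text)
    if root.tag == "OrgName":
        genus = root.findtext(".//BinomialOrgName_genus")
        species = root.findtext(".//BinomialOrgName_species")
        name = f"{genus} {species}"
        lineage = root.findtext("OrgName_lineage")
    else:
        name = root.findtext("ORGANISM")
        lineage = root.findtext("TAXONOMY")
    return {"organism": name, "lineage": split_lineage(lineage)}


print(normalize_organism(SOURCE_A) == normalize_organism(SOURCE_B))
```

Hand-written branches like these do not scale to many sources, which is presumably why the deck goes on to look for a shared vocabulary for schema elements.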

Page 10: The UMLS Semantic Network Support for semantic integration and reasoning

Objectives

• Overall: creating a system
  – Global access
  – Homogeneous and up-to-date information
• Specific: acquiring source schemas
  – As automatically as possible
  – Dealing with updates, adding new resources
  – Generating different paths to access information

Page 11: The UMLS Semantic Network Support for semantic integration and reasoning

Sources schema

• Rarely available or hard to exploit
• No existing standard
• Identifying the schema of each source by exploiting its contents
  – Informs on the type of information present in the source
  – Extraction from its Web site

Page 12: The UMLS Semantic Network Support for semantic integration and reasoning
Page 13: The UMLS Semantic Network Support for semantic integration and reasoning

Use of UMLS

• Heterogeneity of schemas

• Need for a common vocabulary: the UMLS

• Example: finding the site of expression of a gene starting from a gene symbol

Page 14: The UMLS Semantic Network Support for semantic integration and reasoning
Page 15: The UMLS Semantic Network Support for semantic integration and reasoning
Page 16: The UMLS Semantic Network Support for semantic integration and reasoning

Results

• 279 distinct terms extracted from 11 sources
  – 232 found in the UMLS, corresponding to 495 MTH concepts
    • 318 were correct
    • 177 were not
  – 47 not found
• Of the 318 MTH concepts, 60 concepts are common to at least 2 distinct extracted terms (158 are specific)
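The counts above can be turned into two rates. The short Python sketch below checks that the sub-counts add up and computes the share of extracted terms found in the UMLS and the share of retrieved concepts judged correct. The variable names `coverage` and `precision` are labels chosen here, not terms used on the slide.

```python
# Counts as reported on the slide.
terms_extracted = 279
terms_found = 232
terms_not_found = 47
concepts_total = 495
concepts_correct = 318
concepts_incorrect = 177

# The sub-counts should add up to the totals.
assert terms_found + terms_not_found == terms_extracted
assert concepts_correct + concepts_incorrect == concepts_total

coverage = terms_found / terms_extracted      # terms with at least one UMLS match
precision = concepts_correct / concepts_total  # retrieved concepts judged correct

print(f"terms found in the UMLS: {coverage:.1%}")
print(f"correct concept mappings: {precision:.1%}")
```

With these figures, about 83% of the extracted terms are found and about 64% of the retrieved concepts are correct.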

Page 17: The UMLS Semantic Network Support for semantic integration and reasoning

Semantic Type | Frequency | Extracted term | MT Concept
Intellectual Product | 33 | Other Database Links | databases
Qualitative Concept | 32 | Approved Gene Symbol | Approved
Functional Concept | 28 | BIOCHEMICAL FEATURES | Biochemical
Spatial Concept | 26 | Chromosomal Location | Location
Quantitative Concept | 20 | Gross insertions & duplications | Gross
Gene or Genome | 17 | Related Genes | Genes
Nucleic Acid, Nucleoside, or Nucleotide | 14 | Additional Gene cDNA sequence | cDNA
Biologically Active Substance | 13 | Nucleotide Protein | Proteins
Idea or Concept | 10 | Previously Approved Symbols | symbol
Genetic Function | 9 | GENE FUNCTION | Gene function, NOS
Organism Attribute | 9 | relative length | Length
Temporal Concept | 9 | Aliases | ALIAS
Amino Acid, Peptide, or Protein | 6 | Name and origin of the protein | Protein
Indicator, Reagent, or Diagnostic Aid | 5 | Molecular reagents | Reagents
Occupation or Discipline | 5 | Nomenclature History | History <1>
Research Activity | 5 | CLONING | Cloning
Disease or Syndrome | 4 | Disorders & Mutations | Disease
Finding | 4 | view | view
Nucleotide Sequence | 4 | SNPs Variants | SNPs
Occupational Activity | 4 | Abstract | Abstracting

Page 18: The UMLS Semantic Network Support for semantic integration and reasoning

Mapping ULT to the UMLS

• General concepts
  – Citation -> Organism Attribute
  – Description -> Research Activity
  – Symbol -> Idea or Concept
  – Name -> Intellectual Product
  – History -> Finding
  – Matches -> Manufactured Object
  – Link -> Chemical Viewed Structurally
  – Association -> Mental Process / Social Behavior

Page 19: The UMLS Semantic Network Support for semantic integration and reasoning

General concepts

[Figure: ontology levels. Labels: General classes, "Metaterms", WordNet??; Upper Level Ontology; General Ontology; Domain Ontology; Idea or Concept, Intellectual Product, Attributes, Functional/Spatial/Temporal Concept]

Page 20: The UMLS Semantic Network Support for semantic integration and reasoning

Semantic integration

3- Data Integration

Page 21: The UMLS Semantic Network Support for semantic integration and reasoning

Functional genomics

• Post genomics
• Gene expression, protein function, biological process, disease
• van de Vijver MJ et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002 Dec 19; 347(25): 1999-2009.
• Objective: provide "medical" annotation of genes (BioMeKe)

• GeneTraces (Cantor, Lussier)

Page 22: The UMLS Semantic Network Support for semantic integration and reasoning

Gene, gene product, disease

• HUGO: manage heterogeneity of data
• Superoxide dismutase 1, soluble / amyotrophic lateral sclerosis 1 (adult)
• C1420306 SOD1 gene (symbol): Gene or Genome
• C0669516 SOD1 gene product (symbol): Amino acid, protein
• C0002736 ALS (previous symbol): Disease or Syndrome

• No relation in MTH

Page 23: The UMLS Semantic Network Support for semantic integration and reasoning

Gene, gene product, disease

• HUGO
• Aconitase 1, soluble
• C1412126 ACO1 gene (symbol): Gene or Genome
• C0378502 ACO1 protein (symbol) / IRP 1 protein (alias): Amino acid, protein
• OR relation between the two concepts in MTH

Page 24: The UMLS Semantic Network Support for semantic integration and reasoning

Gene, gene product, disease

• HUGO synonymous terms

[Figure: terms T1, T2, T3; concepts C1, C2, C3; semantic types ST1, ST2, ST3]

Page 25: The UMLS Semantic Network Support for semantic integration and reasoning

Gene, gene product, disease

[Figure: semantic types Gene or Genome; AA, protein; Disease or Syndrome. Relation labels: produces, location_of, affects, causes]

Page 26: The UMLS Semantic Network Support for semantic integration and reasoning

Reasoning

Page 27: The UMLS Semantic Network Support for semantic integration and reasoning

Reasoning

categorization

SN relations

Page 28: The UMLS Semantic Network Support for semantic integration and reasoning

Reasoning : relations

1- The hierarchy and the economy principle

Page 29: The UMLS Semantic Network Support for semantic integration and reasoning

The economy principle

• R1. Ad hoc precision
  – The intent is to establish a set of semantic types, which will be useful for a variety of tasks without introducing undue complexity. The most specific semantic type in the semantic type hierarchy is assigned to the concept.
• R2. No hybrid types
  – Instead of creating a lattice structure, with hybrid types inheriting from two supertypes, the SN has a single inheritance tree structure. As a consequence, a Metathesaurus concept inheriting from two STs is assigned to both types.
• R3. No category “other”
  – Rather than proliferating the number of semantic types to encompass multiple additional subcategories, concepts that cannot be categorized by any sibling Semantic Type are simply assigned their common supertype.
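The three rules can be read as an assignment procedure over a single-inheritance tree. The Python below is a minimal sketch of that reading, not the procedure used to build the Metathesaurus. The tree fragment covers only a few Manufactured Object types, and `assign_types` is a hypothetical helper that takes the semantic types a curator considers applicable to a concept.

```python
# Toy fragment of the single-inheritance tree (child -> parent), as in R2.
PARENT = {
    "Manufactured Object": "Physical Object",
    "Medical Device": "Manufactured Object",
    "Research Device": "Manufactured Object",
    "Clinical Drug": "Manufactured Object",
}


def ancestors(semantic_type):
    """Return all supertypes of a semantic type, nearest first."""
    chain = []
    while semantic_type in PARENT:
        semantic_type = PARENT[semantic_type]
        chain.append(semantic_type)
    return chain


def assign_types(candidates):
    """Keep only the most specific applicable types.

    R1: a type is dropped when one of its subtypes also applies.
    R2: two applicable types on different branches are both kept.
    R3: when no subtype applies, the supertype itself is kept.
    """
    applicable = set(candidates)
    covered = set()
    for semantic_type in applicable:
        covered.update(ancestors(semantic_type))
    return sorted(applicable - covered)


print(assign_types(["Manufactured Object", "Medical Device"]))
print(assign_types(["Medical Device", "Research Device"]))
print(assign_types(["Manufactured Object"]))
```

The last call is the R3 case: an object that fits none of the subtypes stays at Manufactured Object rather than going into an "other" category.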

Page 30: The UMLS Semantic Network Support for semantic integration and reasoning

The economy principle and the theory

• Intensions and extensions
  – Taxonomies (isa) are systems in which categories (intensions) are related to one another by means of subordination, or, in class parlance (extensions), systems in which classes are related to one another by means of class inclusion.
• Categories and classes
  – When a category K has subcategories K1, K2, …, Kn, its extension, the class CK, is the union of the classes for each of its subcategories, i.e. CK1, CK2, …, CKn.
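Written as a formula, the statement about categories and classes reads as below. The subscript notation is introduced here. The second line is one possible reading of how rule R3 of the economy principle interacts with the theory, not something stated on the slide: if some concepts are assigned directly to K because no subcategory fits, the union of the subclasses no longer exhausts the class of K.

```latex
% Theory: the extension of K is the union of the extensions of its subcategories.
C_K = \bigcup_{i=1}^{n} C_{K_i}

% Possible consequence of R3 (an interpretation): concepts assigned directly
% to K belong to no subclass, so the inclusion becomes strict.
\bigcup_{i=1}^{n} C_{K_i} \subsetneq C_K
```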

Page 31: The UMLS Semantic Network Support for semantic integration and reasoning

Manufactured Object: physical object made by human beings
• Medical Device: manufactured object used primarily in the diagnosis, treatment, or prevention of physiologic or anatomic disorders
• Research Device: manufactured object used primarily in carrying out scientific research or experimentation
• Clinical Drug: pharmaceutical preparation as produced by the manufacturer

[Figure: categories vs classes. Class CMO drawn with classes CMD, CRD, CCD; examples shown: 45 inch calibre bullet, magnetic tape, matches, corridor]

Page 32: The UMLS Semantic Network Support for semantic integration and reasoning

Reasoning : relations

2- Associative relations

Page 33: The UMLS Semantic Network Support for semantic integration and reasoning

Diseases and Findings

[Figure: SN hierarchy. Entity branch: Conceptual entity, Finding, Sign or Symptom. Event branch: Natural phenomenon or process, Pathologic function, Disease or syndrome]

Page 34: The UMLS Semantic Network Support for semantic integration and reasoning

Diseases and Findings: SN

[Figure: Finding, Sign or Symptom, Disease or syndrome. Relation labels: associated_with, evaluation_of, manifestation_of, diagnoses, is_a]

Page 35: The UMLS Semantic Network Support for semantic integration and reasoning

Relations SN

• Disease or Syndrome affects Disease or Syndrome
• Disease or Syndrome associated_with Disease or Syndrome
• Disease or Syndrome co-occurs_with Disease or Syndrome
• Disease or Syndrome complicates Disease or Syndrome
• Disease or Syndrome degree_of Disease or Syndrome
• Disease or Syndrome manifestation_of Disease or Syndrome
• Disease or Syndrome occurs_in Disease or Syndrome
• Disease or Syndrome precedes Disease or Syndrome
• Disease or Syndrome process_of Disease or Syndrome
• Disease or Syndrome result_of Disease or Syndrome

Page 36: The UMLS Semantic Network Support for semantic integration and reasoning

Relations in SNOMED CT vs SN

• Class A_SNCT = SNCT concepts assigned to the Semantic Type A

• Class DISEASES_SNCT = SNCT concepts assigned to ‘Disease or Syndrome’

[Figure: semantic types A, B, C over the MTH restricted to SNCT]

Page 37: The UMLS Semantic Network Support for semantic integration and reasoning

Relations in SNOMED CT

• MTH restricted to SNOMED CT
• Relations whose SAB = SNOMED CT
• 2,220,144 relations
• 1,392,380 associative relations (including inverse relations)
• 113 associative relationships (all have an inverse except associated_with)
• 18 relationships have fewer than 100 instances
  – Has_time_aspect_of: 1
  – Has_property: 77
• The most frequent:
  – Has_onset: 114,173
  – has_finding_site: 99,156
  – has_method: 70,682
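Counts like these can be obtained by tallying relationship labels in the Metathesaurus relations file. The Python below is a sketch under stated assumptions: it assumes the pipe-delimited MRREL.RRF layout with RELA in the eighth field and SAB in the eleventh, and the source abbreviation `SNOMEDCT`. Both should be checked against the documentation of the release in use. The sample rows and their identifiers are invented, and rows without a RELA label are simply skipped, which only approximates the slide's notion of associative relations.

```python
from collections import Counter

# Assumed 0-based field positions in MRREL.RRF; verify for your release.
RELA, SAB = 7, 10

# Invented rows for illustration only (CUIs, AUIs, and RUIs are made up).
SAMPLE_ROWS = [
    "C0000001|A0000001|SCUI|RO|C0000002|A0000002|SCUI|has_finding_site|R0000001||SNOMEDCT|SNOMEDCT|||N||",
    "C0000003|A0000003|SCUI|RO|C0000004|A0000004|SCUI|has_finding_site|R0000002||SNOMEDCT|SNOMEDCT|||N||",
    "C0000005|A0000005|SCUI|RO|C0000006|A0000006|SCUI|due_to|R0000003||SNOMEDCT|SNOMEDCT|||N||",
    "C0000007|A0000007|SDUI|RO|C0000008|A0000008|SDUI|mapped_to|R0000004||MSH|MSH|||N||",
    "C0000009|A0000009|SCUI|CHD|C0000010|A0000010|SCUI||R0000005||SNOMEDCT|SNOMEDCT|||N||",
]


def count_rela(lines, sab="SNOMEDCT"):
    """Count labelled relationships (non-empty RELA) asserted by one source."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) > SAB and fields[SAB] == sab and fields[RELA]:
            counts[fields[RELA]] += 1
    return counts


counts = count_rela(SAMPLE_ROWS)
for rela, n in counts.most_common():
    print(rela, n)
```

On a real file the same function can be fed an open file handle, since it only iterates over lines.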

Page 38: The UMLS Semantic Network Support for semantic integration and reasoning

Relations in SNOMED CT

• Focus on Diseases and Findings

• Class DISEASES_SNCT = SNCT concepts assigned to ‘Disease or Syndrome’

• Class FINDINGS_SNCT = SNCT concepts assigned to {‘Finding’ + ‘Sign or Symptom’}

[Figure: Disease or Syndrome, Sign or Symptom, and Finding over the MTH restricted to SNCT]

Page 39: The UMLS Semantic Network Support for semantic integration and reasoning

Diseases-Diseases relations SNCT

• due_to

• definitional_manifestation_of

• associated_with

• occurs_before

• mapped_to

• has_finding_site

• has_associated_finding

• interprets

• has_associated_morphology

Page 40: The UMLS Semantic Network Support for semantic integration and reasoning

Diseases-Diseases relations SNCT/SN

SNCT:
• due_to
• definitional_manifestation_of
• associated_with
• occurs_before
• mapped_to
• has_finding_site
• has_associated_finding
• interprets
• has_associated_morphology

SN:
• result_of
• manifestation_of
• associated_with
• precedes, occurs_in, complicates?
• co-occurs_with
• degree_of
• process_of
• affects
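A first, purely lexical comparison of the two lists can be done with set operations. The sketch below uses only the relationship names shown on this slide and matches them by exact name. This is the weakest possible alignment: it cannot detect relationships that are named differently but may be close in meaning, which would need a manual or semantic mapping.

```python
# Disease-to-disease relationship names, as listed on the slide.
SNCT_RELATIONS = {
    "due_to", "definitional_manifestation_of", "associated_with",
    "occurs_before", "mapped_to", "has_finding_site",
    "has_associated_finding", "interprets", "has_associated_morphology",
}
SN_RELATIONS = {
    "result_of", "manifestation_of", "associated_with", "precedes",
    "occurs_in", "complicates", "co-occurs_with", "degree_of",
    "process_of", "affects",
}

shared = SNCT_RELATIONS & SN_RELATIONS      # same name in both
only_snct = SNCT_RELATIONS - SN_RELATIONS   # no same-named SN relation
only_sn = SN_RELATIONS - SNCT_RELATIONS     # no same-named SNCT relation

print("shared:", sorted(shared))
print("SNCT only:", sorted(only_snct))
print("SN only:", sorted(only_sn))
```

By name alone, only associated_with appears on both sides.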

Page 41: The UMLS Semantic Network Support for semantic integration and reasoning

Findings-Diseases relations SNCT

• has_associated_finding / associated_finding_of
• has_definitional_manifestation / definitional_manifestation_of
• interprets / is_interpreted_by / has_interpretation
• occurs_after / occurs_before
• mapped_to / mapped_from
• has_associated_morphology / associated_morphology_of
• due_to / cause_of
• focus_of
• has_finding_site
• isa / inverse_isa

Page 42: The UMLS Semantic Network Support for semantic integration and reasoning

Diseases-Findings relations SNCT/SN

SNCT:
• has_associated_finding / associated_finding_of
• has_definitional_manifestation / definitional_manifestation_of
• interprets / is_interpreted_by / has_interpretation
• occurs_after / occurs_before
• mapped_to / mapped_from
• has_associated_morphology / associated_morphology_of
• due_to / cause_of
• focus_of
• has_finding_site
• isa / inverse_isa

SN:
• associated_with
• manifestation_of
• diagnoses
• evaluation_of

Page 43: The UMLS Semantic Network Support for semantic integration and reasoning

Diseases and Findings

[Figure: Finding, Sign or Symptom, Disease or syndrome. Relation labels: associated_with, evaluation_of, manifestation_of, diagnoses, is_a]

is_a: 5,592 instances

Page 44: The UMLS Semantic Network Support for semantic integration and reasoning

Diseases and Findings

[Figure: SN hierarchy. Entity branch: Conceptual entity, Finding, Sign or Symptom. Event branch: Natural phenomenon or process, Pathologic function, Disease or syndrome]

Page 45: The UMLS Semantic Network Support for semantic integration and reasoning

Diseases and Findings

[Figure: Finding, Sign or Symptom, Disease or syndrome]

C0000727 Abdomen, acute is_a C1300028 Disorder characterized by pain
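The is_a link above can be checked against the Semantic Network mechanically. The Python below is a minimal sketch of such a consistency check: an is_a link between two concepts is flagged when the child's semantic type is neither the parent's type nor one of its descendants in the SN tree. The semantic types given to the two concepts are assumptions read from the slide layout, not looked up in a UMLS release, and the function names are hypothetical.

```python
# Fragment of the SN is_a tree (child -> parent).
SN_PARENT = {
    "Sign or Symptom": "Finding",
    "Finding": "Conceptual Entity",
    "Conceptual Entity": "Entity",
    "Disease or Syndrome": "Pathologic Function",
    "Pathologic Function": "Biologic Function",
    "Biologic Function": "Natural Phenomenon or Process",
    "Natural Phenomenon or Process": "Phenomenon or Process",
    "Phenomenon or Process": "Event",
}

# Assumed semantic types for the two concepts on the slide.
SEMANTIC_TYPE = {
    "C0000727": "Sign or Symptom",      # Abdomen, acute
    "C1300028": "Disease or Syndrome",  # Disorder characterized by pain
}


def is_same_or_descendant(semantic_type, ancestor):
    """True if semantic_type equals ancestor or lies below it in the tree."""
    while semantic_type is not None:
        if semantic_type == ancestor:
            return True
        semantic_type = SN_PARENT.get(semantic_type)
    return False


def inconsistent_isa(links):
    """Return (child, parent) links whose semantic types contradict the SN tree."""
    flagged = []
    for child, parent in links:
        if not is_same_or_descendant(SEMANTIC_TYPE[child], SEMANTIC_TYPE[parent]):
            flagged.append((child, parent))
    return flagged


print(inconsistent_isa([("C0000727", "C1300028")]))
```

Under these assumptions the link is flagged, because Sign or Symptom sits in the Entity branch while Disease or Syndrome sits in the Event branch.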

Page 46: The UMLS Semantic Network Support for semantic integration and reasoning

Diseases and Findings

[Figure: Finding, Sign or Symptom, Disease or syndrome]

C0008767 Scar; has_finding_site; C1300028 Endometriosis in scar of skin

Page 47: The UMLS Semantic Network Support for semantic integration and reasoning

Diseases and Findings

[Figure: SN hierarchy. Entity branch: Conceptual entity, Finding, Sign or Symptom. Event branch: Natural phenomenon or process, Pathologic function, Disease or syndrome]

Page 48: The UMLS Semantic Network Support for semantic integration and reasoning

Formal properties

• Guarino, Welty
• Rigidity
  – A property that is essential to all its instances. Person (+R). Physician (not R).
• Identity
  – There is a property that is both necessary and sufficient for identifying an instance. Person (+I).
• Unity
  – Instances are intrinsic wholes. Person (+U).
• Dependence
  – For every instance x, necessarily some instance of Z must exist that is neither a part nor a constituent of x (+D). Food (+D).

Page 49: The UMLS Semantic Network Support for semantic integration and reasoning

Formal properties Rules

• Rules
  – (not U) cannot subsume (+U), e.g., Substance cannot subsume Physical Object
  – […]
• Distinction between roles and sortal types
  – Roles: (Not Rigid) (+Dependent)
  – Sortal types: (+Rigid) (Not Dependent)
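The one rule spelled out on this slide and the role/sortal distinction can be expressed as two small predicates. The Python below is a sketch limited to what the slide states. The elided rules ([…]) are not reconstructed, the function names are hypothetical, and combinations of meta-properties not covered by the slide are returned as "unclassified".

```python
def may_subsume(parent_has_unity, child_has_unity):
    """Apply the single stated rule: (not U) cannot subsume (+U)."""
    return not (not parent_has_unity and child_has_unity)


def kind(rigid, dependent):
    """Classify a property as a role or a sortal type from two meta-properties."""
    if not rigid and dependent:
        return "role"
    if rigid and not dependent:
        return "sortal type"
    return "unclassified"


# Slide example: Substance (not U) cannot subsume Physical Object (+U).
print(may_subsume(parent_has_unity=False, child_has_unity=True))

# Roles are (Not Rigid) (+Dependent); sortal types are (+Rigid) (Not Dependent).
print(kind(rigid=False, dependent=True))
print(kind(rigid=True, dependent=False))
```

Keeping the two checks separate avoids having to guess meta-properties that the slides do not give for a particular type.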

Page 50: The UMLS Semantic Network Support for semantic integration and reasoning

Formal properties: signs

• Signs or Symptoms are Roles
• Metathesaurus concepts that are assigned only to roles, with no sortal Semantic Type, represent a large set of entities
• About 90% of the MTH concepts assigned to Finding or Sign or Symptom are not assigned to any other Semantic Type.

Page 51: The UMLS Semantic Network Support for semantic integration and reasoning

Roles vs relations

• Findings?

• Sign or Symptom associated_with Disease or Syndrome

• Sign or Symptom diagnoses Disease or Syndrome

• Sign or Symptom evaluation_of Disease or Syndrome

• Sign or Symptom manifestation_of Disease or Syndrome

• Finding associated_with Disease or Syndrome

• Finding evaluation_of Disease or Syndrome

• Finding manifestation_of Disease or Syndrome

Page 52: The UMLS Semantic Network Support for semantic integration and reasoning

Diseases: frames

• Has_location

• Has_lesion : necrosis

• Has_process: infection
  – (has_agent)

• Has_discriminating_sign_or_finding
  – hematuria

• Occurs_in

Page 53: The UMLS Semantic Network Support for semantic integration and reasoning

Discussion

Page 54: The UMLS Semantic Network Support for semantic integration and reasoning

Perspectives (1) : coverage

• Extend the SN ???
  – Economy principle vs adding general concepts
• Resource integration ???
  – Needs in BIOmedical
  – Clarify conceptual entities
  – Semantic Types corresponding to general entities

Page 55: The UMLS Semantic Network Support for semantic integration and reasoning

Perspectives (2) : compatibility

• Compatibility with general ontologies
  – Semantic web
• Alignment with existing domain ontologies
  – FMA (Zhang Medinfo 2004)
  – SNOMED CT (Burgun ongoing work on SN relations)
• Rules (classification), consistency SN-MTH
  – E.g. sign or symptom is-a disease

Page 56: The UMLS Semantic Network Support for semantic integration and reasoning

Perspectives (3): formal aspects

• Formal ontology
  – Make relations and concepts more explicit, e.g. roles (ULO), relationships between genes and diseases
  – Consistency, e.g. is-a relations between findings and diseases (studies and processing)
  – Classification of new concepts, e.g. upper MTH concepts (Bodenreider Medinfo 2004)
  – Inference, e.g. use relations between anatomical sites and diseases to suggest new relations between diseases (Burgun submitted AMIA 2005)

Page 57: The UMLS Semantic Network Support for semantic integration and reasoning

References

• Mougin F, Burgun A, Loreal O, Le Beux P. Towards the automatic generation of biomedical sources schema. Medinfo. 2004;2004:783-7.

• Welty C, Guarino N. Supporting ontological analysis of taxonomic relationships. Data & Knowledge Engineering. 2001. http://www.ladseb.pd.cnr.it/infor/Ontology/Papers

• Zhang S, Bodenreider O. Comparing Associative Relationships among Equivalent Concepts Across Ontologies. Medinfo. 2004;2004:459-66.