The UMLS Semantic Network Support for semantic integration and reasoning

The UMLS Semantic Network: Support for semantic integration and reasoning. Workshop UMLS Semantic Network, NLM, NIH, Bethesda, 7-8 Apr 2005. Anita Burgun.


Overview

• Semantic integration
  – Role of the SN
  – Integration of resources
  – Integration of data

• Reasoning
  – Reasoning with hierarchies
  – Reasoning with associative relations

• Perspectives

• Illustration
  – Genes, gene products, diseases
  – Findings, signs, diseases

Semantic integration

1- Role of ontologies

[Figure: integration architecture. Box labels: Integration DWH; Patient files; Micro-array data; Gene instances; Local res.; Data Warehouse; Mediation system; Ontologies; External resources (SWISSPROT, MEDLINE, GENBANK, GOA, …)]

Integrating data in the domain of organ failure and transplantation

[Figure: Local Information Systems: EfG (transplantation), REIN (end-stage renal failure), dialysis. EfG terminology server with terms T1, T2, T3. MAPPING: ONTO-TERM mapping and term-term mapping. Semantic Network and Metathesaurus.]

Semantic integration

2- Resource Integration

[Figure: integration architecture. Box labels: Integration DWH; Patient files; Micro-array data; Gene instances; Local res.; Data Warehouse; Mediation system; Ontologies; External resources (SWISSPROT, MEDLINE, GENBANK, GOA, …)]

Introduction

• Increasing need for physicians and biologists to access information on the Internet

• Biomedical sources

– Scattered

– Multiple heterogeneity

– Rapid evolution and frequent creation

• Integration

<OrgName>
  <OrgName_name>
    <OrgName_name_binomial>
      <BinomialOrgName>
        <BinomialOrgName_genus>Homo</BinomialOrgName_genus>
        <BinomialOrgName_species>sapiens</BinomialOrgName_species>
      </BinomialOrgName>
    </OrgName_name_binomial>
  </OrgName_name>
  <OrgName_lineage>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo</OrgName_lineage>
  <OrgName_gcode>1</OrgName_gcode>
  <OrgName_mgcode>2</OrgName_mgcode>
  <OrgName_div>PRI</OrgName_div>
</OrgName>

<ORGANISM>Homo sapiens</ORGANISM>
<TAXONOMY>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.</TAXONOMY>
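The two XML fragments describe the same organism with different schemas, which is the kind of heterogeneity an integration layer has to absorb. The following is a minimal sketch, not the system described in the talk: it parses both fragments with Python's standard `xml.etree.ElementTree` and reduces them to one common record. The function names, the record layout, and the `<entry>` wrapper element (added only because the second fragment has two top-level elements) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Fragment in the nested schema (element names copied from the slide).
NESTED = """
<OrgName>
  <OrgName_name><OrgName_name_binomial><BinomialOrgName>
    <BinomialOrgName_genus>Homo</BinomialOrgName_genus>
    <BinomialOrgName_species>sapiens</BinomialOrgName_species>
  </BinomialOrgName></OrgName_name_binomial></OrgName_name>
  <OrgName_lineage>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo</OrgName_lineage>
</OrgName>
"""

# Fragment in the flat schema (element names copied from the slide).
FLAT = (
    "<ORGANISM>Homo sapiens</ORGANISM>"
    "<TAXONOMY>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;"
    "Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.</TAXONOMY>"
)


def split_lineage(text):
    """Split a ';'-separated lineage, dropping spaces and a trailing period."""
    return [part.strip().rstrip(".") for part in text.split(";") if part.strip()]


def from_nested(xml_text):
    """Read the nested schema into the common record (hypothetical layout)."""
    root = ET.fromstring(xml_text.strip())
    genus = root.findtext(".//BinomialOrgName_genus")
    species = root.findtext(".//BinomialOrgName_species")
    return {
        "name": f"{genus} {species}",
        "lineage": split_lineage(root.findtext("OrgName_lineage")),
    }


def from_flat(xml_text):
    """Read the flat schema; <entry> is an assumed wrapper, not a source tag."""
    root = ET.fromstring(f"<entry>{xml_text}</entry>")
    return {
        "name": root.findtext("ORGANISM").strip(),
        "lineage": split_lineage(root.findtext("TAXONOMY")),
    }


record_a = from_nested(NESTED)
record_b = from_flat(FLAT)
print(record_a == record_b)
```

Each source needs its own small reader here, which is exactly the per-source schema knowledge that the talk proposes to acquire as automatically as possible.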

Objectives

• Overall: creating a system
  – Global access
  – Homogeneous and up-to-date information

• Specific: acquiring source schemas
  – As automatically as possible
  – Dealing with updates, adding new resources
  – Generating different paths to access information

Sources schema

• Rarely available or hard to exploit
• No existing standard
• Identifying the schema of each source by exploiting its contents
  – Informs on the type of information present in the source
  – Extraction from its Web site

Use of UMLS

• Heterogeneity of schemas

• Need for a common vocabulary: the UMLS

• Example: finding the site of expression of a gene starting from a gene symbol

Results

• 279 distinct terms extracted from 11 sources
  – 232 found in the UMLS, corresponding to 495 MTH concepts
    • 318 were correct
    • 177 were not
  – 47 not found

• Of the 318 MTH concepts, 60 concepts are common to at least 2 distinct extracted terms (158 are specific)
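The counts above can be turned into two simple ratios. This is a small arithmetic sketch using only the figures on the slide; the names "coverage" and "precision" are labels chosen here for convenience, not terms used in the talk.

```python
# Figures copied from the Results slide.
terms_total = 279          # distinct terms extracted from 11 sources
terms_found = 232          # terms found in the UMLS
terms_not_found = 47
concepts_total = 495       # MTH concepts reached from the 232 terms
concepts_correct = 318
concepts_incorrect = 177

# The slide's counts are internally consistent.
assert terms_found + terms_not_found == terms_total
assert concepts_correct + concepts_incorrect == concepts_total

term_coverage = terms_found / terms_total              # share of terms mapped
concept_precision = concepts_correct / concepts_total  # share of correct concepts
concepts_per_term = concepts_total / terms_found       # ambiguity of the mapping

print(f"term coverage:      {term_coverage:.1%}")
print(f"concept precision:  {concept_precision:.1%}")
print(f"concepts per term:  {concepts_per_term:.2f}")
```

Roughly five terms in six are found, but each found term maps to about two concepts on average, and about one mapped concept in three is wrong.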

Semantic Type | Frequency | Extracted term | MTH Concept
Intellectual Product | 33 | Other Database Links | databases
Qualitative Concept | 32 | Approved Gene Symbol | Approved
Functional Concept | 28 | BIOCHEMICAL FEATURES | Biochemical
Spatial Concept | 26 | Chromosomal Location | Location
Quantitative Concept | 20 | Gross insertions & duplications | Gross
Gene or Genome | 17 | Related Genes | Genes
Nucleic Acid, Nucleoside, or Nucleotide | 14 | Additional Gene cDNA sequence | cDNA
Biologically Active Substance | 13 | Nucleotide Protein | Proteins
Idea or Concept | 10 | Previously Approved Symbols | symbol
Genetic Function | 9 | GENE FUNCTION | Gene function, NOS
Organism Attribute | 9 | relative length | Length
Temporal Concept | 9 | Aliases | ALIAS
Amino Acid, Peptide, or Protein | 6 | Name and origin of the protein | Protein
Indicator, Reagent, or Diagnostic Aid | 5 | Molecular reagents | Reagents
Occupation or Discipline | 5 | Nomenclature History | History <1>
Research Activity | 5 | CLONING | Cloning
Disease or Syndrome | 4 | Disorders & Mutations | Disease
Finding | 4 | view | view
Nucleotide Sequence | 4 | SNPs Variants | SNPs
Occupational Activity | 4 | Abstract | Abstracting

Mapping ULT to the UMLS

• General concepts
  – Citation -> Organism attribute
  – Description -> Research activity
  – Symbol -> Idea or Concept
  – Name -> Intellectual product
  – History -> Finding
  – Matches -> Manufactured object
  – Link -> Chemical Viewed Structurally
  – Association -> Mental Process / Social Behavior

General concepts

[Figure: levels of ontology. Labels: General classes; « Metaterms »; WordNet??; Upper Level Ontology; General Ontology; Domain Ontology; Idea or Concept; Intellectual Product; Attributes; Functional/Spatial/Temporal Concept]

Semantic integration

3- Data Integration

Functional genomics

• Post genomics

• Gene expression, protein function, biological process, disease

• van de Vijver MJ et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002 Dec 19; 347(25): 1999-2009.

• Objective: provide « medical » annotation of genes (BioMeKe)

• GeneTraces (Cantor, Lussier)

Gene, gene product, disease

• HUGO: manage heterogeneity of data

• Superoxide dismutase 1, soluble / amyotrophic lateral sclerosis 1 (adult)

• C1420306 SOD1 gene (symbol): Gene or Genome

• C0669516 SOD1 gene product (symbol): Amino acid, protein

• C0002736 ALS (previous symbol): Disease or Syndrome

• No relation in MTH

Gene, gene product, disease

• HUGO

• Aconitase 1, soluble

• C1412126 ACO1 gene (symbol): Gene or Genome

• C0378502 ACO1 protein (symbol) / IRP 1 protein (alias): Amino acid, protein

• OR relation between the two concepts in MTH

Gene, gene product, disease

• HUGO synonymous terms

[Figure: synonymous terms T1, T2, T3; Metathesaurus concepts C1, C2, C3; Semantic Types ST1, ST2, ST3]

Gene, gene product, disease

[Figure: Semantic Types Gene or Genome; AA, protein; Disease or Syndrome. Relation labels: produces, location_of, affects, causes]
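When the Metathesaurus holds no relation between two concepts (as for the SOD1 gene and ALS above), relations defined between their Semantic Types can still suggest candidates. The sketch below illustrates that idea only. The concept identifiers and Semantic Types come from the slides; the direction and arguments of each Semantic Type relation are assumptions, because the figure survives only as labels, and `location_of` is left out since its arguments cannot be recovered. The function name is hypothetical.

```python
# Concepts and Semantic Types as listed on the slides.
CONCEPTS = {
    "C1420306": ("SOD1 gene", "Gene or Genome"),
    "C0669516": ("SOD1 gene product", "Amino Acid, Peptide, or Protein"),
    "C0002736": ("ALS", "Disease or Syndrome"),
}

# ASSUMED reading of the figure: (source type, relation, target type).
# Only the relation labels are certain; the arguments are a guess.
ST_RELATIONS = [
    ("Gene or Genome", "produces", "Amino Acid, Peptide, or Protein"),
    ("Amino Acid, Peptide, or Protein", "affects", "Disease or Syndrome"),
    ("Amino Acid, Peptide, or Protein", "causes", "Disease or Syndrome"),
]


def candidate_relations(cui_from, cui_to):
    """Relations allowed between two concepts by their Semantic Types."""
    type_from = CONCEPTS[cui_from][1]
    type_to = CONCEPTS[cui_to][1]
    return [
        relation
        for source, relation, target in ST_RELATIONS
        if source == type_from and target == type_to
    ]


gene_to_protein = candidate_relations("C1420306", "C0669516")
protein_to_disease = candidate_relations("C0669516", "C0002736")
gene_to_disease = candidate_relations("C1420306", "C0002736")
print(gene_to_protein, protein_to_disease, gene_to_disease)
```

Under these assumed triples the gene reaches the disease only through its product, so a gene-disease link would have to be obtained by chaining two Semantic Type relations.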

Reasoning

categorization

SN relations

Reasoning : relations

1- The hierarchy and the economy principle

The economy principle

• R1. Ad hoc precision
  – The intent is to establish a set of semantic types, which will be useful for a variety of tasks without introducing undue complexity. The most specific semantic type in the semantic type hierarchy is assigned to the concept.

• R2. No hybrid types
  – Instead of creating a lattice structure, with hybrid types inheriting from two supertypes, the SN has a single inheritance tree structure. As a consequence, a Metathesaurus concept inheriting from two STs is assigned to both types.

• R3. No category “other”
  – Rather than proliferating the number of semantic types to encompass multiple additional subcategories, concepts that cannot be categorized by any sibling Semantic Type are simply assigned their common supertype.
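The three rules can be read as one assignment procedure over a single-inheritance tree. The sketch below is an illustration under assumptions, not the procedure used by the Metathesaurus editors: it takes the set of Semantic Types whose definition a concept satisfies (deciding that is outside the sketch) and keeps only the most specific ones. The subtree is the Manufactured Object example used later in the deck; the function names are hypothetical.

```python
# Single-inheritance tree: each Semantic Type has at most one parent.
PARENT = {
    "Manufactured Object": None,
    "Medical Device": "Manufactured Object",
    "Research Device": "Manufactured Object",
    "Clinical Drug": "Manufactured Object",
}


def ancestors(semantic_type):
    """All supertypes of a Semantic Type, nearest first."""
    result = []
    current = PARENT[semantic_type]
    while current is not None:
        result.append(current)
        current = PARENT[current]
    return result


def assign_types(applicable):
    """Keep the most specific applicable types.

    R1: a type is dropped when one of its subtypes also applies.
    R2: if two unrelated types remain, the concept gets both (no hybrid type).
    R3: if no subtype applies, the supertype itself is the assignment.
    """
    redundant = {a for st in applicable for a in ancestors(st)}
    return sorted(set(applicable) - redundant)


r1 = assign_types({"Manufactured Object", "Medical Device"})
r2 = assign_types({"Manufactured Object", "Medical Device", "Research Device"})
r3 = assign_types({"Manufactured Object"})
print(r1, r2, r3)
```

The third call is the R3 case: a manufactured object that fits none of the three sibling types stays at the supertype instead of going into an "other" category.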

The economy principle and the theory

• Intensions and extensions
  – Taxonomies (isa) are systems in which categories (intensions) are related to one another by means of subordination, or, in class parlance (extensions), systems in which classes are related to one another by means of class inclusion.

• Categories and classes
  – When a category K has subcategories K1, K2, …, Kn, its extension, the class CK, is the union of the classes for each of its subcategories, i.e. CK1, CK2, …, CKn.

Research Device: Manufactured object used primarily in carrying out scientific research or experimentation

Medical Device: Manufactured object used primarily in the diagnosis, treatment, or prevention of physiologic or anatomic disorders

Clinical Drug: Pharmaceutical preparation as produced by the manufacturer

Manufactured Object: physical object made by human beings

[Figure: Categories: Manufactured Object with subcategories Medical Device, Research Device, Clinical Drug. Classes: CMO containing CMD, CRD, CCD. Example concepts: 45 inch calibre bullet, magnetic tape, matches, corridor]
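Under the economy principle, the class of a category can be larger than the union of the classes of its subcategories. The sketch below shows this with sets, reading the four example concepts on the slide as concepts assigned directly to Manufactured Object (an interpretation of the figure). The members of the three subclasses are placeholders invented for the example, and the function names are hypothetical.

```python
# Concepts assigned directly to each Semantic Type.
DIRECT = {
    "Medical Device": {"medical device concept (placeholder)"},
    "Research Device": {"research device concept (placeholder)"},
    "Clinical Drug": {"clinical drug concept (placeholder)"},
    # Examples from the slide, read here as direct assignments.
    "Manufactured Object": {
        "45 inch calibre bullet", "magnetic tape", "matches", "corridor",
    },
}

SUBCATEGORIES = {
    "Manufactured Object": ["Medical Device", "Research Device", "Clinical Drug"],
}


def extension(category):
    """Class of a category: its direct members plus those of its subcategories."""
    members = set(DIRECT.get(category, set()))
    for sub in SUBCATEGORIES.get(category, []):
        members |= extension(sub)
    return members


def residual(category):
    """Members of the class that belong to none of its subclasses."""
    union_of_subclasses = set()
    for sub in SUBCATEGORIES.get(category, []):
        union_of_subclasses |= extension(sub)
    return extension(category) - union_of_subclasses


left_over = residual("Manufactured Object")
print(sorted(left_over))
```

A non-empty residual means that CMO is not equal to the union of CMD, CRD and CCD, which is where rule R3 departs from the class-union view of taxonomies stated above.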

Reasoning : relations

2- Associative relations

Diseases and Findings

[Figure: position in the SN hierarchy. Entity > Conceptual entity > Finding > Sign or Symptom. Event > Natural phenomenon or process > Pathologic function > Disease or syndrome]

Diseases and Findings: SN

[Figure: SN relations. Sign or Symptom is_a Finding. Relation labels towards Disease or syndrome: associated_with, evaluation_of, manifestation_of, diagnoses]

Relations SN

• Disease or Syndrome affects Disease or Syndrome
• Disease or Syndrome associated_with Disease or Syndrome
• Disease or Syndrome co-occurs_with Disease or Syndrome
• Disease or Syndrome complicates Disease or Syndrome
• Disease or Syndrome degree_of Disease or Syndrome
• Disease or Syndrome manifestation_of Disease or Syndrome
• Disease or Syndrome occurs_in Disease or Syndrome
• Disease or Syndrome precedes Disease or Syndrome
• Disease or Syndrome process_of Disease or Syndrome
• Disease or Syndrome result_of Disease or Syndrome

Relations in SNOMED CT vs SN

• Class A_SNCT = SNCT concepts assigned to the Semantic Type A

• Class DISEASES_SNCT = SNCT concepts assigned to ‘Disease or Syndrome’

[Figure: classes A, B, C within the MTH restricted to SNCT]

Relations in SNOMED CT

• MTH restricted to SNOMED CT
• Relations whose SAB = SNOMED CT
• 2,220,144 relations
• 1,392,380 associative relations (including inverse relations)
• 113 associative relationships (all have an inverse except associated_with)
• 18 relationships have fewer than 100 instances
  – has_time_aspect_of: 1
  – has_property: 77

• The most frequent:
  – has_onset: 114,173
  – has_finding_site: 99,156
  – has_method: 70,682
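Counts like these can be obtained by tallying the RELA field of the Metathesaurus relation file for one source. The sketch below assumes the pipe-delimited MRREL.RRF layout (REL in the 4th field, RELA in the 8th, SAB in the 11th) and treats every row with a non-empty RELA other than `isa` / `inverse_isa` as associative; that definition, the `SNOMEDCT` source abbreviation, the helper names, and the sample rows with made-up identifiers are all assumptions, not the procedure used for the slide.

```python
from collections import Counter

REL, RELA, SAB = 3, 7, 10           # 0-based field positions in MRREL.RRF
HIERARCHICAL = {"isa", "inverse_isa"}


def row(cui1, rel, cui2, rela, sab):
    """Build one MRREL-style line; unused fields are left empty."""
    fields = [cui1, "", "CUI", rel, cui2, "", "CUI", rela,
              "", "", sab, sab, "", "", "N", ""]
    return "|".join(fields) + "|"


def count_associative(lines, sab="SNOMEDCT"):
    """Tally associative relationship names asserted by one source."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= SAB or fields[SAB] != sab:
            continue
        rela = fields[RELA]
        if rela and rela not in HIERARCHICAL:
            counts[rela] += 1
    return counts


# Made-up rows with placeholder identifiers, for illustration only.
SAMPLE = [
    row("C9990001", "RO", "C9990002", "has_finding_site", "SNOMEDCT"),
    row("C9990003", "RO", "C9990004", "has_finding_site", "SNOMEDCT"),
    row("C9990005", "RO", "C9990006", "has_method", "SNOMEDCT"),
    row("C9990007", "CHD", "C9990008", "isa", "SNOMEDCT"),
    row("C9990009", "RO", "C9990010", "has_finding_site", "MSH"),
]

counts = count_associative(SAMPLE)
rare = sorted(name for name, n in counts.items() if n < 2)
print(counts.most_common(), rare)
```

On a real file the same loop would run over the opened MRREL.RRF, with the threshold for rare relationships set to 100 as on the slide.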

Relations in SNOMED CT

• Focus on Diseases and Findings

• Class DISEASES_SNCT = SNCT concepts assigned to ‘Disease or Syndrome’

• Class FINDINGS_SNCT = SNCT concepts assigned to {‘Finding’ + ‘Sign or Symptom’}

[Figure: classes Disease or Sd, Sign or symptom, Finding within the MTH restricted to SNCT]

Diseases-Diseases relations SNCT

• due_to

• definitional_manifestation_of

• associated_with

• occurs_before

• mapped_to

• has_finding_site

• has_associated_finding

• interprets

• has_associated_morphology

Diseases-Diseases relations SNCT/SN

SNCT:

• due_to
• definitional_manifestation_of
• associated_with
• occurs_before
• mapped_to
• has_finding_site
• has_associated_finding
• interprets
• has_associated_morphology

SN:

• result_of
• manifestation_of
• associated_with
• precedes, occurs_in, complicates?
• co-occurs_with
• degree_of
• process_of
• affects
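A first, purely lexical comparison of the two relation inventories can be done with set operations. The sketch below uses the relationship names listed on the slide; the suffix test used to flag near-matches is a heuristic introduced here for illustration and is not the alignment method of the talk.

```python
# Disease-disease relationship names as listed on the slide.
SNCT = {
    "due_to", "definitional_manifestation_of", "associated_with",
    "occurs_before", "mapped_to", "has_finding_site",
    "has_associated_finding", "interprets", "has_associated_morphology",
}
SN = {
    "result_of", "manifestation_of", "associated_with", "precedes",
    "occurs_in", "complicates", "co-occurs_with", "degree_of",
    "process_of", "affects",
}

shared = SNCT & SN            # identical names in both inventories
only_snct = sorted(SNCT - SN)
only_sn = sorted(SN - SNCT)

# Heuristic (an assumption): an SN name that ends an SNCT name is a near-match.
near_matches = sorted(
    (snct_name, sn_name)
    for snct_name in SNCT
    for sn_name in SN
    if snct_name != sn_name and snct_name.endswith(sn_name)
)

print(shared, near_matches)
print(len(only_snct), len(only_sn))
```

Only one name is shared outright, so pairs such as due_to and result_of, or occurs_before and precedes, cannot be found by name and need a semantic comparison.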

Findings-Diseases relations SNCT

• has_associated_finding / associated_finding_of

• has definitional manifestation/ definitional_manifestation_of

• interprets / is_interpreted_by/ has_interpretation

• occurs_after / occurs_before

• mapped_to /mapped from

• has_associated_morphology / associated_morphology_of

• due_to / cause_of

• focus_of

• has_finding_site

• isa / inverse is-a

Diseases-Findings relations SNCT/SN

SNCT:

• has_associated_finding / associated_finding_of
• has definitional manifestation / definitional_manifestation_of
• interprets / is_interpreted_by / has_interpretation
• occurs_after / occurs_before
• mapped_to / mapped from
• has_associated_morphology / associated_morphology_of
• due_to / cause_of
• focus_of
• has_finding_site
• isa / inverse is-a

SN:

• associated_with
• manifestation_of
• diagnoses
• evaluation_of

Diseases and Findings

[Figure: SN relations. Sign or Symptom is_a Finding. Relation labels towards Disease or syndrome: associated_with, evaluation_of, manifestation_of, diagnoses. Annotation: is_a, 5,592 instances]

Diseases and Findings

[Figure: position in the SN hierarchy. Entity > Conceptual entity > Finding > Sign or Symptom. Event > Natural phenomenon or process > Pathologic function > Disease or syndrome]

Diseases and Findings

[Figure: Semantic Types Finding, Sign or Symptom, Disease or syndrome. Example: C0000727 Abdomen, acute is_a C1300028 Disorder characterized by pain]

Diseases and Findings

[Figure: Semantic Types Finding, Sign or Symptom, Disease or syndrome. Example labels: C0008767 Scar; has_finding_site; C1300028 Endometriosis in scar of skin]

Diseases and Findings

[Figure: position in the SN hierarchy. Entity > Conceptual entity > Finding > Sign or Symptom. Event > Natural phenomenon or process > Pathologic function > Disease or syndrome]

Formal properties

• Guarino, Welty

• Rigidity
  – property that is essential to all the instances. Person (+R). Physician (not R).

• Identity
  – there is a property that is both necessary and sufficient for identifying an instance. Person (+I)

• Unity
  – instances are intrinsic wholes. Person (+U).

• Dependence
  – for all the instances x, necessarily some instance of Z must exist, which is not a part of x, nor a constituent of x (+D). Food (+D)

Formal properties Rules

• Rules
  – (not U) cannot subsume (+U), e.g., Substance cannot subsume Physical Object
  – […]

• Distinction between roles and sortal types
  – Roles: (Not Rigid) (+Dependent)
  – Sortal types: (+Rigid) (Not Dependent)
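The meta-properties and the rules above can be encoded directly. The sketch below is a minimal illustration: it covers only the one subsumption rule spelled out on the slide (the elided rules are not reconstructed) and the role / sortal distinction. The class and function names are hypothetical, and a meta-property the slides do not state is left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetaProps:
    """Meta-properties of a category; None means 'not stated'."""
    rigid: Optional[bool] = None
    identity: Optional[bool] = None
    unity: Optional[bool] = None
    dependent: Optional[bool] = None


def may_subsume(parent, child):
    """Apply the single rule given on the slide: (not U) cannot subsume (+U)."""
    return not (parent.unity is False and child.unity is True)


def kind(props):
    """Role = not rigid and dependent; sortal type = rigid and not dependent."""
    if props.rigid is False and props.dependent is True:
        return "role"
    if props.rigid is True and props.dependent is False:
        return "sortal type"
    return "undetermined"


# Values taken from the slide examples.
SUBSTANCE = MetaProps(unity=False)
PHYSICAL_OBJECT = MetaProps(unity=True)
PERSON = MetaProps(rigid=True, identity=True, unity=True)
# A category with the role profile given on the slide.
SOME_ROLE = MetaProps(rigid=False, dependent=True)

print(may_subsume(SUBSTANCE, PHYSICAL_OBJECT))
print(kind(SOME_ROLE), kind(PERSON))
```

Person comes out as undetermined because the slides give no dependence value for it; the sketch reports missing information instead of guessing.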

Formal properties: signs

• Signs or Symptoms are Roles

• Metathesaurus concepts that are assigned only to roles, with no sortal Semantic Type, form a large set of entities

• About 90% of the MTH concepts assigned to Findings, and Signs or Symptoms are not assigned to another Semantic Type.

Roles vs relations

• Findings?

• Sign or Symptom associated_with Disease or Syndrome

• Sign or Symptom diagnoses Disease or Syndrome

• Sign or Symptom evaluation_of Disease or Syndrome

• Sign or Symptom manifestation_of Disease or Syndrome

• Finding associated_with Disease or Syndrome

• Finding evaluation_of Disease or Syndrome

• Finding manifestation_of Disease or Syndrome

Diseases: frames

• Has_location

• Has_lesion: necrosis

• Has_process: infection
  – (has_agent)

• Has_discriminating_sign_or_finding
  – hematuria

• Occurs_in
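The slots above can be written down as a small frame structure. This is a sketch under assumptions: the slot names follow the slide, the class and method names are hypothetical, and the check that `has_agent` requires `has_process` is an interpretation of the slide's nesting, not a stated rule. The example values (necrosis, infection, hematuria) are the slide's slot illustrations and do not describe one particular disease.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class DiseaseFrame:
    """Frame for a disease; every slot except the name is optional."""
    name: str
    has_location: Optional[str] = None
    has_lesion: Optional[str] = None
    has_process: Optional[str] = None
    has_agent: Optional[str] = None
    has_discriminating_sign_or_finding: List[str] = field(default_factory=list)
    occurs_in: Optional[str] = None

    def __post_init__(self):
        # Assumption: the agent slot qualifies the process slot.
        if self.has_agent is not None and self.has_process is None:
            raise ValueError("has_agent requires has_process")

    def filled_slots(self):
        """Slots that carry a value, without the frame name."""
        return {
            slot: value
            for slot, value in asdict(self).items()
            if slot != "name" and value
        }


frame = DiseaseFrame(
    name="example disease (hypothetical)",
    has_lesion="necrosis",
    has_process="infection",
    has_discriminating_sign_or_finding=["hematuria"],
)
print(frame.filled_slots())
```

Keeping the discriminating signs as a list slot of the disease frame treats a sign as a filler in a relation rather than as a subtype of disease, which is the distinction the preceding slides draw between roles and relations.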

Discussion

Perspectives (1) : coverage

• Extend the SN ???
  – Economy principle vs adding general concepts

• Resource integration ???
  – Needs in BIOmedical
  – Clarify conceptual entities
  – Semantic Types corresponding to general entities

Perspectives (2) : compatibility

• Compatibility with general ontologies
  – Semantic web

• Alignment with existing domain ontologies
  – FMA (Zhang Medinfo 2004)
  – SNOMED CT (Burgun ongoing work on SN relations)

• Rules (classification), consistency SN-MTH
  – E.g. sign or symptom is-a disease

Perspectives (3): formal aspects

• Formal ontology
  – Make relations and concepts more explicit, e.g. roles (ULO), relationships between genes and diseases
  – Consistency, e.g. is-a relations between findings and diseases (studies and processing)
  – Classification of new concepts, e.g. upper MTH concepts (Bodenreider Medinfo 2004)
  – Inference, e.g. use relations between anatomical sites and diseases to suggest new relations between diseases (Burgun submitted AMIA 2005)

References

• Mougin F, Burgun A, Loreal O, Le Beux P. Towards the automatic generation of biomedical sources schema. Medinfo. 2004;2004:783-7.

• Welty C, Guarino N. Supporting ontological analysis of taxonomic relationships. Data & Knowledge Engineering. 2001. http://www.ladseb.pd.cnr.it/infor/Ontology/Papers

• Zhang S, Bodenreider O. Comparing Associative Relationships among Equivalent Concepts Across Ontologies. Medinfo. 2004;2004:459-66.
