UMLS and semantic integration
Olivier Bodenreider
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA
Semantic Data Integration Workshop Amsterdam, The Netherlands
May 18, 2009
Outline
Unified Medical Language System overview
  UMLS Metathesaurus
  UMLS Semantic Network
Data integration questions
Uses of biomedical ontologies
Knowledge management
  Annotating data and resources
  Accessing biomedical information
  Mapping across biomedical ontologies
Data integration, exchange and semantic interoperability
Decision support
  Data selection and aggregation
  Decision support
  NLP applications
  Knowledge discovery
[Bodenreider, YBMI 2008]
Motivation
Started in 1986
National Library of Medicine
“Long-term R&D project”
«[…] the UMLS project is an effort to overcome two significant barriers to effective retrieval of machine-readable information.
• The first is the variety of ways the same concepts are expressed in different machine-readable sources and by different people.
• The second is the distribution of useful information among many disparate databases and systems.»
The UMLS in practice
Database
  Series of relational files
Interfaces
  Web interface: Knowledge Source Server (UMLSKS)
  Application programming interfaces (Java and XML-based)
Applications
  lvg (lexical programs)
  MetamorphoSys (installation and customization)
  RRF browser (browsing subsets)
The UMLS is not an end-user application
UMLS 3 components
Lexical resources
  SPECIALIST Lexicon
  Lexical tools
Metathesaurus (terminological resources)
  Concepts
  Inter-concept relationships
Semantic Network (ontological resources)
  Semantic types
  Semantic network relationships
Metathesaurus Basic organization
Concepts
  Synonymous terms are clustered into a concept
  Properties are attached to concepts, e.g.,
    Unique identifier
    Definition
Relations
  Concepts are related to other concepts
  Properties are attached to relations, e.g.,
    Type of relationship
    Source
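The organization above can be sketched as two small record types. This is only an illustration of the idea, not the actual UMLS file layout: the class and field names are assumptions, and the relation's target identifier and source label are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Concept:
    """A cluster of synonymous terms with properties attached."""
    cui: str                                       # unique identifier
    names: Set[str] = field(default_factory=set)   # synonymous terms
    definition: Optional[str] = None


@dataclass(frozen=True)
class Relation:
    """A link between two concepts with properties attached."""
    source_cui: str
    target_cui: str
    rel_type: str     # type of relationship
    asserted_by: str  # source vocabulary asserting the relation


addison = Concept(
    cui="C0001403",
    names={"Addison's disease", "Addison Disease"},
)

# Hypothetical relation: the target identifier and source are placeholders.
example_relation = Relation(
    source_cui=addison.cui,
    target_cui="CUI-PLACEHOLDER",
    rel_type="isa",
    asserted_by="EXAMPLE-VOCABULARY",
)
```

Keeping the relation as a separate record, rather than a field of the concept, makes it easy to attach properties such as the relationship type and the asserting source to each link.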
Source Vocabularies
152 source vocabularies
  19 languages
Broad coverage of biomedicine
  9.7M names
  2.1M concepts
  >10M relations
Common presentation
(2009AA)
Biomedical terminologies
General vocabularies
  anatomy (UWDA, Neuronames)
  drugs (RxNorm, First DataBank, Micromedex)
  medical devices (UMD, SPN)
Several perspectives
  clinical terms (SNOMED CT)
  information sciences (MeSH, CRISP)
  administrative terminologies (ICD-9-CM, CPT-4)
  data exchange terminologies (HL7, LOINC)
Biomedical terminologies (cont’d)
Specialized vocabularies
  nursing (NIC, NOC, NANDA, Omaha, PCDS)
  dentistry (CDT)
  oncology (PDQ)
  psychiatry (DSM, APA)
  adverse reactions (COSTART, WHO ART)
  primary care (ICPC)
Terminology of knowledge bases (AI/Rheum, DXplain, QMR)
Integrating subdomains
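[Figure: the UMLS at the center, linked to subdomains through their vocabularies: biomedical literature (MeSH), genome annotations (GO), model organisms (NCBI Taxonomy), genetic knowledge bases (OMIM), clinical repositories (SNOMED CT), anatomy (FMA), other subdomains (…)]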
Integrating subdomains
[Figure: the subdomains alone, without the vocabularies or the UMLS: biomedical literature, genome annotations, model organisms, genetic knowledge bases, clinical repositories, anatomy, other subdomains]
Trans-namespace integration
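[Figure: the subdomain diagram (genome annotations/GO, model organisms/NCBI Taxonomy, genetic knowledge bases/OMIM, anatomy/FMA, other subdomains) with one example highlighted: "Addison Disease" (D000224) in MeSH (biomedical literature) and "Addison's disease" (363732003) in SNOMED CT (clinical repositories) are both linked to the UMLS concept C0001403]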
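The Addison's disease example can be sketched as a lookup that goes from a code in one vocabulary, through the shared UMLS concept, to the code in another vocabulary. The two codes and the concept identifier come from the slide; the table layout and the function name `translate` are assumptions made for illustration, not a UMLS API.

```python
# (vocabulary, code) -> UMLS concept identifier, as shown on the slide.
CODE_TO_CUI = {
    ("MeSH", "D000224"): "C0001403",
    ("SNOMED CT", "363732003"): "C0001403",
}


def translate(vocabulary, code, target_vocabulary):
    """Return the codes in target_vocabulary that share a concept with (vocabulary, code)."""
    cui = CODE_TO_CUI.get((vocabulary, code))
    if cui is None:
        return []
    return sorted(
        other_code
        for (other_vocab, other_code), other_cui in CODE_TO_CUI.items()
        if other_cui == cui and other_vocab == target_vocabulary
    )


snomed_codes = translate("MeSH", "D000224", "SNOMED CT")
```

The concept identifier acts as the pivot: neither vocabulary needs to know about the other, only about its own link to the shared concept.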
Addison’s Disease: Concept
Addison’s Disease
C0001403
Synonyms:
  ADRENAL INSUFFICIENCY (ADDISON'S DISEASE)
  ADRENOCORTICAL INSUFFICIENCY, PRIMARY FAILURE
  Hypoadrenalisms, Primary
  Melasma addisonii
  Primary adrenal deficiency
  Asthenia pigmentosa
  Bronzed disease
  Insufficiency, adrenal primary
  Primary adrenocortical insufficiency
  Addison's, disease
Names in other languages:
  Maladie d'Addison - French
  Addison-Krankheit - German
  Morbo di Addison - Italian
  Doença de Addison - Portuguese
  АДДИСОНОВА БОЛЕЗНЬ - Russian
  アジソン病 - Japanese
Definition: An adrenal disease characterized by the progressive destruction of the adrenal cortex, resulting in insufficient production of aldosterone and hydrocortisone. Clinical symptoms include anorexia; nausea; weight loss; muscle weakness; and hyperpigmentation of the skin due to increase in circulating levels of ACTH precursor hormone which stimulates melanocytes.
Semantic type: Disease or Syndrome
Sources: SNOMED CT, SNOMED Intl, MeSH, MedDRA, …
Metathesaurus Relationships
Symbolic relations: ~8 M pairs of concepts
Statistical relations: ~6 M pairs of concepts (co-occurring concepts)
Mapping relations: ~150,000
Categorization: Relationships between concepts and semantic types from the Semantic Network
[Figure: two layers. In the Metathesaurus layer, the concept Heart is linked to other concepts: Esophagus, Left Phrenic Nerve, Heart Valves, Fetal Heart, Mediastinum, Saccular Viscus, Angina Pectoris, Cardiotonic Agents, Tissue Donors. In the Semantic Network layer, the concepts are categorized by semantic types: Anatomical Structure; Fully Formed Anatomical Structure; Embryonic Structure; Body Part, Organ or Organ Component; Pharmacologic Substance; Disease or Syndrome; Population Group. Numeric labels in the figure: 38, 237, 49, 5, 16, 13, 22.]
Semantic Network
Semantic network relationships (54)
  hierarchical (isa = is a kind of)
    among types
      – Animal isa Organism
      – Enzyme isa Biologically Active Substance
    among relations
      – treats isa affects
  non-hierarchical
    Sign or Symptom diagnoses Pathologic Function
    Pharmacologic Substance treats Pathologic Function
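The two isa hierarchies above (one among types, one among relations) can be sketched with the same parent table and ancestor walk. This sketch assumes each node has at most one parent, and it contains only the three isa statements listed on the slide; the function names are illustrative.

```python
# Child -> parent, taken from the examples on the slide.
ISA_AMONG_TYPES = {
    "Animal": "Organism",
    "Enzyme": "Biologically Active Substance",
}
ISA_AMONG_RELATIONS = {
    "treats": "affects",
}


def ancestors(node, parents):
    """Walk up the isa hierarchy and return every ancestor of node, nearest first."""
    found = []
    while node in parents:
        node = parents[node]
        found.append(node)
    return found


def is_a(child, candidate, parents):
    """True if child is the candidate itself or one of its descendants."""
    return child == candidate or candidate in ancestors(child, parents)


treats_is_affects = is_a("treats", "affects", ISA_AMONG_RELATIONS)
animal_is_organism = is_a("Animal", "Organism", ISA_AMONG_TYPES)
```

Because relations have their own hierarchy, a query for concepts linked by `affects` can also collect those linked by the more specific `treats`.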
“Biologic Function” hierarchy (isa)
Biologic Function
  Physiologic Function
    Organism Function
      Mental Process
    Organ or Tissue Function
    Cell Function
    Molecular Function
      Genetic Function
  Pathologic Function
    Disease or Syndrome
      Mental or Behavioral Dysfunction
      Neoplastic Process
    Cell or Molecular Dysfunction
    Experimental Model of Disease
Associative (non-isa) relationships
[Figure: graph of associative relationships among semantic types. Semantic types shown: Organism; Embryonic Structure; Anatomical Abnormality; Congenital Abnormality; Acquired Abnormality; Fully Formed Anatomical Structure; Anatomical Structure; Organism Attribute; Body Substance; Body System; Body Part, Organ or Organ Component; Tissue; Cell; Cell Component; Gene or Genome; Body Space or Junction; Body Location or Region; Finding; Laboratory or Test Result; Sign or Symptom; Biologic Function; Physiologic Function; Pathologic Function; Injury or Poisoning. Relationship labels shown: process of; part of; conceptual part of; property of; contains, produces; evaluation of; adjacent to; location of; disrupts; co-occurs with.]
Why a semantic network?
Semantic Types serve as high level categories assigned to Metathesaurus concepts, independently of their position in a hierarchy
A relationship between 2 Semantic Types (ST) is a possible link between 2 concepts that have been assigned to those STs
The relationship may or may not hold at the concept level
Other relationships may apply at the concept level
Relationships can inherit semantics
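[Figure: two layers. In the Semantic Network, Fully Formed Anatomical Structure is location of Biologic Function; Body Part, Organ, or Organ Component isa Fully Formed Anatomical Structure; Disease or Syndrome isa Pathologic Function, which isa Biologic Function. In the Metathesaurus, Adrenal Cortex (a Body Part, Organ, or Organ Component) is location of Adrenal cortical hypofunction (a Disease or Syndrome).]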
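The inheritance shown on this slide can be sketched as a check of whether a relationship between two concepts is permitted by the Semantic Network once the isa ancestors of their semantic types are taken into account. The edges below are read from the figure labels on this slide (Fully Formed Anatomical Structure is location of Biologic Function, plus three isa links), which is an assumption about how the figure is wired; the table and function names are illustrative. A positive answer means only that the relationship is possible, not that it holds.

```python
# isa links among semantic types (child -> parent), as read from the figure.
TYPE_PARENT = {
    "Body Part, Organ, or Organ Component": "Fully Formed Anatomical Structure",
    "Disease or Syndrome": "Pathologic Function",
    "Pathologic Function": "Biologic Function",
}

# Relationship stated once, between high-level semantic types.
SN_RELATIONS = {
    ("Fully Formed Anatomical Structure", "location of", "Biologic Function"),
}

# Categorization of the two Metathesaurus concepts in the figure.
CONCEPT_TYPE = {
    "Adrenal Cortex": "Body Part, Organ, or Organ Component",
    "Adrenal cortical hypofunction": "Disease or Syndrome",
}


def lineage(semantic_type):
    """Return the semantic type followed by all of its isa ancestors."""
    chain = [semantic_type]
    while chain[-1] in TYPE_PARENT:
        chain.append(TYPE_PARENT[chain[-1]])
    return chain


def is_possible(subject, relation, obj):
    """True if the Semantic Network permits `subject relation obj` via inheritance."""
    return any(
        (s, relation, o) in SN_RELATIONS
        for s in lineage(CONCEPT_TYPE[subject])
        for o in lineage(CONCEPT_TYPE[obj])
    )


permitted = is_possible("Adrenal Cortex", "location of", "Adrenal cortical hypofunction")
```

The same check, run against relations already asserted between concepts, is one way to use the Semantic Network as a source of domain and range constraints.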
Semantic interoperability through the UMLS
Metathesaurus: Terminology/ontology integration
  Terms from various terminologies linked through UMLS
Semantic Network: Top domain ontology
  Framework for semantic categorization of concepts
  Template for potential relations among concepts
Potential contribution of UMLS to integration
Data consistency
  SN as a source of domain and range constraints for relations
Data query
  Resolve terms into concepts
  Source of synonymy
  [Lexical variants, normalization]
Service query
  Service interoperability
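Resolving terms into concepts through normalization can be sketched as follows. The `normalize` function is a deliberately simplified stand-in (lowercasing, dropping possessives and punctuation, sorting words); it is not the normalization performed by the UMLS lexical tools. The names and the concept identifier come from the Addison's disease slide; the index layout is an assumption.

```python
import re

# Names recorded for a concept, taken from the Addison's disease slide.
CONCEPT_NAMES = {
    "C0001403": [
        "Addison's disease",
        "Addison Disease",
        "Primary adrenocortical insufficiency",
    ],
}


def normalize(term):
    """Simplified normalization: lowercase, drop possessive 's, strip punctuation, sort words."""
    term = re.sub(r"'s\b", "", term.lower())
    words = re.findall(r"[a-z0-9]+", term)
    return " ".join(sorted(words))


# Normalized name -> concept identifier.
INDEX = {
    normalize(name): cui
    for cui, names in CONCEPT_NAMES.items()
    for name in names
}


def resolve(term):
    """Return the concept identifier for a term, or None if no name matches."""
    return INDEX.get(normalize(term))


resolved = resolve("Disease, Addison's")
```

Queries phrased with different word order, case, or punctuation then reach the same concept, and the concept's other names become available as synonyms for query expansion.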
Potential contribution of UMLS to integration
Provenance
  Rich source of metadata about terms
Data integration
  Map terms/concepts across vocabularies
  Data integration through terminology integration
Semantic mediation
  UMLS as the global schema
Reasoning
  Limited
[Mougin, DILS 2008]
Data, metadata and semantics
Not specifically in UMLS
caBIG
  Cancer Biomedical Informatics Grid
  http://cabig.cancer.gov/
  National Cancer Institute
Cancer Data Standards Registry and Repository (caDSR)
  http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview/cadsr
  Common data elements
  Metadata repository
Use of the Metathesaurus in applications
Indexing, semantic annotation, coding
Mapping across vocabularies
Aggregation
Support for Natural Language Processing applications (entity recognition)
Source of value sets for information models
Use of the Semantic Network in applications
Partition concepts into subdomains
Aggregation
Support for Natural Language Processing applications (language understanding)
Consistency checking of relations
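Partitioning concepts into subdomains can be sketched as grouping them by semantic type. The concept-to-type assignments below are illustrative pairings of concepts and semantic types named earlier in the talk, not an extract from the Metathesaurus; the function name is an assumption.

```python
from collections import defaultdict

# Illustrative categorization of a few concepts (assumed pairings).
CONCEPT_TYPES = {
    "Esophagus": "Body Part, Organ or Organ Component",
    "Heart Valves": "Body Part, Organ or Organ Component",
    "Fetal Heart": "Embryonic Structure",
    "Angina Pectoris": "Disease or Syndrome",
    "Cardiotonic Agents": "Pharmacologic Substance",
    "Tissue Donors": "Population Group",
}


def partition_by_type(concept_types):
    """Group concept names under their semantic type, in alphabetical order."""
    groups = defaultdict(list)
    for concept, semantic_type in sorted(concept_types.items()):
        groups[semantic_type].append(concept)
    return dict(groups)


subdomains = partition_by_type(CONCEPT_TYPES)
```

Grouping by a higher-level type instead (for example, every descendant of Anatomical Structure) gives coarser partitions and is one way to aggregate.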
Medical Ontology Research
Olivier Bodenreider
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA
References
UMLS: umlsinfo.nlm.nih.gov
UMLS browsers (free, but UMLS license required)
  Knowledge Source Server: umlsks.nlm.nih.gov
  Semantic Navigator: http://mor.nlm.nih.gov/perl/semnav.pl
  RRF browser (standalone application distributed with the UMLS)
References
Recent overviews
  Bodenreider O. (2004). The Unified Medical Language System (UMLS): Integrating biomedical terminology. Nucleic Acids Research; D267-D270.
  Bodenreider O. From terminology integration to information integration: Unified Medical Language System (UMLS). BioRDF Teleconference, W3C Semantic Web Health Care and Life Sciences Interest Group, June 5, 2006. http://mor.nlm.nih.gov/pubs/pres/060605-BioRDF.pdf