
Page 1: Unified Medical Language System

Unified Medical Language System
The graph behind the forest

Institute for Discrete Sciences
Workshop on Associating Semantics with Graphs

Rutgers University
April 16, 2007

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

Page 2: Unified Medical Language System

Biomedical trees

Page 3: Unified Medical Language System

http://www.tolweb.org/tree/

Page 4: Unified Medical Language System

http://www.ncbi.nlm.nih.gov/Taxonomy/

Page 5: Unified Medical Language System

Medical Subject Headings

http://www.nlm.nih.gov/mesh/2007/MBrowser.html

Page 6: Unified Medical Language System

Gene Ontology

http://amigo.geneontology.org/cgi-bin/amigo/go.cgi

Page 7: Unified Medical Language System

SNOMED Clinical Terms

http://www.clininfo.co.uk/clue5/clue.htm

Page 8: Unified Medical Language System

Biomedical trees revisited

Page 9: Unified Medical Language System

Medical Subject Headings

http://www.nlm.nih.gov/mesh/2007/MBrowser.html

Amino Acids, Peptides, and Proteins

Proteins

Contractile Proteins

Cytoskeletal Proteins

Membrane Proteins

Dystrophin

Muscle Proteins

Page 10: Unified Medical Language System

Gene Ontology

http://amigo.geneontology.org/cgi-bin/amigo/go.cgi

biological process

metabolic process

regulation of metabolic process

lipid metabolic process

regulation of lipid metabolic process

regulation of biological process

biological regulation

primary metabolic process

Page 11: Unified Medical Language System

SNOMED Clinical Terms

http://www.clininfo.co.uk/clue5/clue.htm

disorder of trunk

disorder of breast

neoplasm of thorax

neoplasm of breast

disorder of thorax

neoplasm of trunk

Page 12: Unified Medical Language System

Terminology integration
Unified Medical Language System

Page 13: Unified Medical Language System

Addison’s disease in medical vocabularies

Synonyms
  Addisonian syndrome
  Bronzed disease
  Addison melanoderma
  Asthenia pigmentosa
  Primary adrenal deficiency
  Primary adrenal insufficiency
  Primary adrenocortical insufficiency
  Chronic adrenocortical insufficiency

symptoms

clinical variants

eponym

Page 14: Unified Medical Language System

Organize terms

Synonymous terms clustered into a concept
Preferred term
Unique identifier (CUI)

Addison's disease

Addison Disease                        MeSH       D000224
Primary hypoadrenalism                 MedDRA     10036696
Primary adrenocortical insufficiency   ICD-10     E27.1
Addison's disease (disorder)           SNOMED CT  363732003

C0001403
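The clustering of synonymous terms under one identifier can be sketched in a few lines. This is only an illustration built from the four rows on the slide; the tuple layout and the function names `cluster_by_cui` and `cui_for` are assumptions made for the example and are not part of any UMLS tool.

```python
from collections import defaultdict

# (term, source vocabulary, source code, CUI) rows copied from the slide.
ATOMS = [
    ("Addison Disease", "MeSH", "D000224", "C0001403"),
    ("Primary hypoadrenalism", "MedDRA", "10036696", "C0001403"),
    ("Primary adrenocortical insufficiency", "ICD-10", "E27.1", "C0001403"),
    ("Addison's disease (disorder)", "SNOMED CT", "363732003", "C0001403"),
]


def cluster_by_cui(atoms):
    """Group source terms under their concept unique identifier (CUI)."""
    concepts = defaultdict(list)
    for term, source, code, cui in atoms:
        concepts[cui].append((source, code, term))
    return dict(concepts)


def cui_for(atoms, source, code):
    """Return the CUI for a (source, code) pair, or None if it is unknown."""
    for _term, src, src_code, cui in atoms:
        if (src, src_code) == (source, code):
            return cui
    return None


concepts = cluster_by_cui(ATOMS)
for source, code, term in concepts["C0001403"]:
    print(f"{source:10} {code:10} {term}")
```

Looking up any of the four source codes with `cui_for` leads to the same concept, which is what makes mapping between vocabularies possible.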

Page 15: Unified Medical Language System

Diseases of the endocrine system

Diseases of the Adrenal Glands

Addison’s Disease

Diseases/Diagnoses

SNOMED International

Page 16: Unified Medical Language System

Endocrine Diseases

Adrenal Gland Diseases

Addison’s Disease

Diseases

MeSH

Adrenal Gland Hypofunction

Page 17: Unified Medical Language System

Endocrine disorder

Adrenal disorder

Adrenal cortical disorder

Adrenal cortical hypofunction

Addison’s Disease

AOD

Page 18: Unified Medical Language System

Endocrine disorder

Disorder of adrenal gland

Hypoadrenalism

Adrenal Hypofunction

Corticoadrenal insufficiency

Addison’s Disease

Read Codes

Page 19: Unified Medical Language System

Primary adrenocortical insufficiency

Other disorders of adrenal gland

Disorders of other endocrine gland

ICD-10

Page 20: Unified Medical Language System

Organize concepts

Inter-concept relationships: hierarchies from the source vocabularies
Redundancy: multiple paths
One graph instead of multiple trees (multiple inheritance)

[Figure: several source trees over the nodes A to H, combined into a single graph]
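Merging several source hierarchies into one graph can be sketched as follows. The node letters echo the figure, but the specific edges, the source names, and the helper functions are assumptions chosen for illustration. They are not the Metathesaurus data model.

```python
from collections import defaultdict

# Hypothetical (parent, child) edges from two source hierarchies that
# share some concepts. The exact edges are an assumption.
SOURCE_TREES = {
    "source_1": [("A", "B"), ("B", "D"), ("B", "E"), ("E", "H")],
    "source_2": [("A", "C"), ("C", "E"), ("C", "F"), ("E", "H"), ("F", "G")],
}


def merge_hierarchies(trees):
    """Combine per-source trees into one graph, keeping edge provenance."""
    parents = defaultdict(set)      # child -> set of parents
    provenance = defaultdict(set)   # (parent, child) -> sources asserting it
    for source, edges in trees.items():
        for parent, child in edges:
            parents[child].add(parent)
            provenance[(parent, child)].add(source)
    return parents, provenance


def paths_to_root(node, parents):
    """List every upward path from node to a root (assumes no cycles)."""
    if not parents.get(node):
        return [[node]]
    return [
        [node] + rest
        for parent in sorted(parents[node])
        for rest in paths_to_root(parent, parents)
    ]


parents, provenance = merge_hierarchies(SOURCE_TREES)
print(sorted(parents["E"]))          # E has a parent in each source
print(paths_to_root("H", parents))   # more than one path to the root
```

In this toy input, node E ends up with two parents (multiple inheritance), and the edge from E to H is asserted by both sources (redundancy).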

Page 21: Unified Medical Language System

Adrenal Cortex Diseases

Hypoadrenalism

Adrenal Gland Hypofunction

Adrenal cortical hypofunction

Endocrine Diseases

Adrenal Gland Diseases

organize concepts

Addison’s Disease

UMLS

SNOMED

MeSH

AOD

Read Codes

Page 22: Unified Medical Language System

Endocrine Diseases

Adrenal Gland Diseases

Adrenal Cortex Diseases

Hypoadrenalism

Adrenal Gland Hypofunction

Adrenal cortical hypofunction

Addison’s Disease

Adrenal Cortex Dysfunction

Adrenal Dysfunction

Addison’s disease due to autoimmunity

Secondary hypocortisolism

Other disorders of adrenal gland

Disorders of other endocrine gland

Adrenal Glands

Adrenal Cortex

Endocrine System

Endocrine Glands

Abdominal organ Diseases

Page 23: Unified Medical Language System

Source Vocabularies

139 source vocabularies
  17 languages
Broad coverage of biomedicine
  5.5M names
  1.4M concepts
  16M relations
Common presentation

(2007AA)

Page 24: Unified Medical Language System

Heart

Metathesaurus (concepts)
  Esophagus
  Left Phrenic Nerve
  Heart Valves
  Fetal Heart
  Mediastinum
  Saccular Viscus
  Angina Pectoris
  Cardiotonic Agents
  Tissue Donors

Semantic Network (semantic types)
  Anatomical Structure
  Fully Formed Anatomical Structure
  Embryonic Structure
  Body Part, Organ or Organ Component
  Pharmacologic Substance
  Disease or Syndrome
  Population Group

[Figure: numeric labels shown in the diagram: 22, 225, 97, 4, 12, 9, 31]

Page 25: Unified Medical Language System

Biomedical forest vs. graph

Page 26: Unified Medical Language System

UMLS Knowledge Source Server

http://umlsks.nlm.nih.gov/

Page 27: Unified Medical Language System

Addison’s disease in UMLSKS (1)

Page 28: Unified Medical Language System

Addison’s disease in UMLSKS (2)

Page 29: Unified Medical Language System

Addison’s disease in UMLSKS (3)

Page 30: Unified Medical Language System

Addison’s disease in UMLSKS (4)

Page 31: Unified Medical Language System

Addison’s disease in UMLSKS (5)

Page 32: Unified Medical Language System

UMLS Semantic Navigator

http://mor.nlm.nih.gov/perl/semnav.pl

Page 33: Unified Medical Language System

AmiGO

http://amigo.geneontology.org/cgi-bin/amigo/go.cgi

Page 34: Unified Medical Language System

GenNav

http://mor.nlm.nih.gov/perl/gennav.pl

Page 35: Unified Medical Language System

Semantics of the UMLS graph
Issues and challenges

Page 36: Unified Medical Language System

Visualization of large graphs

Page 37: Unified Medical Language System

Visualization of large graphs

Page 38: Unified Medical Language System

Acyclicity

Reflexive: 13,000
Direct: 1800
Indirect: 120

[Figures: small example graphs over the nodes A, B, D, E, G, H, one for each kind of cycle]

“back edge” from a child concept to a parent concept
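The three kinds of cycle can be detected with standard graph techniques. The sketch below is an illustration on hypothetical toy edges, not the procedure used to obtain the counts on the slide: reflexive and direct cycles are found by inspecting the edge set, and longer cycles are found by a depth-first search that reports back edges.

```python
def classify_short_cycles(edges):
    """Return (reflexive nodes, direct two-node cycles) from parent->child edges."""
    edge_set = set(edges)
    reflexive = sorted(a for a, b in edge_set if a == b)
    direct = sorted(
        {tuple(sorted((a, b))) for a, b in edge_set if a != b and (b, a) in edge_set}
    )
    return reflexive, direct


def find_back_edges(edges):
    """Depth-first search; report edges pointing at a node still being visited."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])

    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in children}
    back_edges = []

    def visit(node):
        # Recursive for brevity; a very deep graph would need an explicit stack.
        color[node] = GRAY
        for child in children[node]:
            if color[child] == GRAY:
                back_edges.append((node, child))
            elif color[child] == WHITE:
                visit(child)
        color[node] = BLACK

    for node in sorted(children):
        if color[node] == WHITE:
            visit(node)
    return back_edges


# Hypothetical toy hierarchy: G points back to its ancestor A.
EDGES = [("A", "D"), ("A", "E"), ("D", "G"), ("E", "H"), ("G", "A")]
print(find_back_edges(EDGES))
print(classify_short_cycles([("A", "A"), ("A", "B"), ("B", "A")]))
```

Removing each reported back edge is one way to restore acyclicity, although deciding which edge in a cycle is actually wrong generally needs human review.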

Page 39: Unified Medical Language System

Underspecification of relationships

Relationship “attribute” not always present
Relations used to create hierarchies vs. hierarchical relations

Page 40: Unified Medical Language System

Which tasks?

Information integration
  Mapping
Depending on the degree of human involvement
  Hypothesis generation / validation
  Knowledge discovery
  Automated reasoning
Knowledge standardization
  Common format
  Common semantics

Page 41: Unified Medical Language System

Which formalisms?

SKOS - Thesaurus
  Simple Knowledge Organization System
RDF - Concept-Relationship-Concept triples
  Resource Description Framework
Description Logics / Frames
  OWL Web Ontology Language
  Protégé (frames / OWL)
  OBO Open Biomedical Ontology
Rule languages
  Formal logic

Page 42: Unified Medical Language System

Which identifiers?

For concepts
  Namespaces, ontologies, knowledge bases
    OBO - Open Biomedical Ontologies
    UMLS - Unified Medical Language System
    NCBI Entrez (Entrez Gene, GenBank, UniGene, …)
  Mappings across information sources
For relationships

Page 43: Unified Medical Language System

Conclusions

Page 44: Unified Medical Language System

Integrating subdomains

Biomedical literature (MeSH)
Genome annotations (GO)
Model organisms (NCBI Taxonomy)
Genetic knowledge bases (OMIM)
Clinical repositories (SNOMED)
Anatomy (UWDA)
Other subdomains

UMLS

Page 45: Unified Medical Language System

Integrating subdomains

Biomedical literature
Genome annotations
Model organisms
Genetic knowledge bases
Clinical repositories
Other subdomains
Anatomy

Page 46: Unified Medical Language System

From glycosyltransferase to congenital muscular dystrophy

MIM:608840 Muscular dystrophy, congenital, type 1D
EG:9215 LARGE
GO:0008375 acetylglucosaminyltransferase
GO:0008194
GO:0016758
GO:0016757 glycosyltransferase

Relationship labels: has_associated_phenotype, has_molecular_function, isa
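The path from a molecular function to a phenotype can be sketched as a traversal over concept-relationship-concept triples. The identifiers and relationship names below come from the slide, but the wiring of the `isa` edges among the GO identifiers is an assumption made for illustration, and the plain-tuple representation stands in for a real RDF store.

```python
# (subject, predicate, object) triples. The isa wiring is an assumption.
TRIPLES = [
    ("EG:9215", "has_molecular_function", "GO:0008375"),
    ("EG:9215", "has_associated_phenotype", "MIM:608840"),
    ("GO:0008375", "isa", "GO:0008194"),
    ("GO:0008375", "isa", "GO:0016758"),
    ("GO:0008194", "isa", "GO:0016757"),
    ("GO:0016758", "isa", "GO:0016757"),
]


def descendants(term, triples):
    """Return term plus every term that reaches it through isa links."""
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if predicate == "isa" and obj == current and subject not in found:
                found.add(subject)
                frontier.append(subject)
    return found


def phenotypes_for_function(term, triples):
    """Phenotypes of genes annotated with the function or any descendant of it."""
    functions = descendants(term, triples)
    genes = {
        s for s, p, o in triples
        if p == "has_molecular_function" and o in functions
    }
    return sorted(
        o for s, p, o in triples
        if p == "has_associated_phenotype" and s in genes
    )


# Start from glycosyltransferase (GO:0016757).
print(phenotypes_for_function("GO:0016757", TRIPLES))
```

The query starts from the general function, walks down the hierarchy to the specific function annotated on the gene, and then follows the gene's phenotype link.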

Page 47: Unified Medical Language System

Medical Ontology Research

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

Contact: [email protected]
Web: mor.nlm.nih.gov

Page 48: Unified Medical Language System

UMLS References

UMLS
  umlsinfo.nlm.nih.gov
UMLS browsers (free, but UMLS license required)
  Knowledge Source Server: umlsks.nlm.nih.gov
  Semantic Navigator: http://mor.nlm.nih.gov/perl/semnav.pl
  RRF browser (standalone application distributed with the UMLS)

Page 49: Unified Medical Language System

UMLS References

Gentle introduction
  Bodenreider O. (2004). The Unified Medical Language System (UMLS): Integrating biomedical terminology. Nucleic Acids Research; D267-D270. http://mor.nlm.nih.gov/pubs/pdf/2004-nar-ob.pdf

Seminal paper
  Lindberg, D. A., Humphreys, B. L., & McCray, A. T. (1993). The Unified Medical Language System. Methods Inf Med, 32(4), 281-91.

Page 50: Unified Medical Language System

Biomedical information integration through RDF

Biomedical perspective
  Sahoo S, Zeng K, Bodenreider O, Sheth AP. (2007). From “glycosyltransferase” to “congenital muscular dystrophy”: Integrating knowledge from NCBI Entrez Gene and the Gene Ontology. Proceedings of Medinfo (in press). http://mor.nlm.nih.gov/pubs/pdf/2007-medinfo-ss.pdf

Semantic Web perspective
  Sahoo S, Zeng K, Bodenreider O, Sheth AP. (2007). An experiment in integrating large biomedical knowledge resources with RDF: Application to associating genotype and phenotype information. Proceedings of the workshop on Health Care and Life Sciences Data Integration for the Semantic Web at the 16th International World Wide Web Conference (WWW2007) (in press). http://mor.nlm.nih.gov/pubs/pdf/2007-www_hcls-ss.pdf