Some common needs for the patient registries, Electronic Health Record (EHR) systems, and clinical research repositories of the future are: semantic interoperability, adoption of standardized clinical terminology, ad hoc and distributed querying interfaces, and integration with extant databases and web-based systems. A suite of standards has recently emerged from the consortium responsible for the development and oversight of the protocols of the World Wide Web (WWW). They were conceived to address data integration challenges associated with internet and intranet applications. Many of these standards and technologies are capable of addressing the challenges common to health information systems. This talk gives an introductory overview of these technologies, explains how they address these challenges, and briefly discusses projects where they have been used.
Semantic Web Technologies: A Paradigm for Medical Informatics
Chimezie Ogbuji (Owner, Metacognition LLC)
http://metacognition.info/presentations/SWTMedicalInformatics.pdf
http://metacognition.info/presentations/SWTMedicalInformatics.ppt
Who I am
Circa 2001: Introduced to web standards and Semantic Web technologies
2003-2011: Lead architect of the Cleveland Clinic Foundation (CCF) in-house clinical repository project
2006-2011: Member representative of CCF in the World Wide Web Consortium (W3C)
◦ Editor of various standards and Semantic Web Health Care and Life Sciences Interest Group chair
2011-2012: Senior Research Associate at the CWRU Center for Clinical Investigations
2012-current: Started a business providing resource and data management software for home healthcare agencies (Metacognition LLC)
Medical Informatics Challenges
Semantic interoperability
◦ Exchange of data with common meaning between sender and receiver
Most of the intended benefits of health information technology (HIT) depend on interoperability between systems
Difficulties integrating patient record systems with other information resources are among the major issues hampering their effectiveness
◦ Interoperability is a major goal for meaningful use of Electronic Health Records (EHR)
Rodrigues et al. 2013; Kadry et al. 2010; Shortliffe and Cimino, 2006
Requirements and Solutions
Semantic interoperability requires:
◦ Structured data
◦ A common controlled vocabulary
Solutions emphasize the meaning of data rather than how they are structured
◦ “Semantic” paradigms
Registries and Research DBs
Patient registries and clinical research repositories capture data elements in a uniform manner
The structure of the underlying data needs to be able to evolve along with the investigations they support
Thus, schema extensibility is important
Querying Interfaces
Standardized interfaces for querying facilitate:
◦ Accessibility to clinical information systems
◦ Distributed querying of data from where they reside
Requires:
◦ Semantically-equivalent data structures
Alternatively, data are centralized in data warehouses
Austin et al. 2007, “Implementation of a query interface for a generic record server”
Biomedical Ontologies
Ontologies are artifacts that conceptualize a domain as a taxonomy of classes and constraints on relationships between their members
Represented in a particular formalism
Increasingly adopted as a foundation for the next generation of biomedical vocabularies
Construction involves representing a domain of interest independently of the behavior of the applications that use the ontology
Important means towards achieving semantic interoperability
Biomedical Ontology Communities
Prominent examples of adoption by life science and healthcare terminology communities:
◦ The Open Biological and Biomedical Ontologies (OBO) Foundry
◦ Gene Ontology (GO)
◦ National Center for Biomedical Ontology (NCBO) BioPortal
◦ International Health Terminology Standards Development Organization (IHTSDO)
Semantic Web and Technologies
The Semantic Web is a vision of how the existing infrastructure of the World-wide Web (WWW) can be extended such that machines can interpret the meaning of data on it
Semantic Web technologies are the standards and technologies that have been developed to achieve the vision
An Analogy
(Technological) singularity is a theoretical moment when artificial intelligence (AI) will have progressed to a greater-than-human intelligence
Despite remaining in the realm of science fiction, it has motivated many useful developments along the way
◦ The use of ontologies for knowledge representation and IBM Watson capabilities, for example
Background: Graphs
Graphs are data structures comprising nodes and edges that connect them
The edges can be directional
Either the nodes, the edges, or both can be labeled
The labels provide meaning to the graphs (edge labels in particular)
Resource Description Framework
The Resource Description Framework (RDF) is a graph-based knowledge representation language for describing resources
Its edges are directional and both nodes and edges are labeled
It uses Uniform Resource Identifiers (URIs) for labeling
Foundation for Semantic Web technologies
RDF: Continued
The edges are statements (triples) that go from a subject to an object
Some objects are text values (literals)
Some subjects and objects can be left unlabeled (blank nodes)
◦ Anonymous resources: not important to label them uniquely
The URI of the edge is the predicate
Predicates used together for a common purpose are a vocabulary
Subject: Dr. X (a URI)
Object: Chime
Predicate: treats
Vocabulary:
◦ treats, subject of record, author, and full name
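The triple structure on this slide can be sketched in plain Python. This is only an illustration: the `http://example.org/` namespace, the `dx1` identifier, and the camel-cased predicate names are assumptions, since the talk does not give concrete URIs.

```python
# Hypothetical namespace; the slides do not specify real URIs.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
triples = {
    (EX + "DrX", EX + "treats", EX + "Chime"),
    (EX + "DrX", EX + "author", EX + "dx1"),
    (EX + "dx1", EX + "subjectOfRecord", EX + "Chime"),
    (EX + "Chime", EX + "fullName", "Chime"),  # object is a text value
}


def vocabulary(graph):
    """Return the distinct predicates used in the graph, sorted."""
    return sorted({predicate for _, predicate, _ in graph})


def objects(graph, subject, predicate):
    """Return every object reachable from `subject` along `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


print(vocabulary(triples))
print(objects(triples, EX + "DrX", EX + "treats"))
```

Because the graph is just a set of triples, adding a statement with a new predicate needs no change to the data structure.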
RDF vocabularies
How meaning is interpreted from an RDF graph
There are vocabularies that constrain how predicates are used
◦ Want a sense of treats where the subject is a clinician and the object is a patient
There is a predicate relating resources to the classes they are a member of (type)
There are vocabularies that define constraints on class hierarchies
These comprise a basic RDF Schema (RDFS) language
Represented as an RDF graph
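A minimal sketch of how such a schema gives meaning to the treats predicate, assuming shortened names (`treats`, `Clinician`, `Patient`, `Person`) in place of full URIs. In RDFS, domain and range statements entail class membership, so the sketch derives type statements rather than rejecting data. It covers only three entailment patterns and is not a complete RDFS reasoner.

```python
TYPE, DOMAIN, RANGE, SUBCLASS = (
    "rdf:type", "rdfs:domain", "rdfs:range", "rdfs:subClassOf")


def rdfs_closure(graph):
    """Add type statements implied by domain, range and subclass axioms."""
    graph = set(graph)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                if p2 == DOMAIN and s2 == p:
                    new.add((s, TYPE, o2))      # subject belongs to the domain
                elif p2 == RANGE and s2 == p:
                    new.add((o, TYPE, o2))      # object belongs to the range
                elif p2 == SUBCLASS and p == TYPE and o == s2:
                    new.add((s, TYPE, o2))      # membership moves up the hierarchy
        if new <= graph:                        # nothing left to add
            return graph
        graph |= new


# Hypothetical schema and data, written as an RDF graph like the data itself.
schema_and_data = {
    ("treats", DOMAIN, "Clinician"),
    ("treats", RANGE, "Patient"),
    ("Clinician", SUBCLASS, "Person"),
    ("DrX", "treats", "Chime"),
}

closure = rdfs_closure(schema_and_data)
print(sorted(t for t in closure if t[1] == TYPE))
```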
Ontologies for RDF
The Web Ontology Language (OWL) is used to describe ontologies for RDF graphs
More sophisticated constraints than RDFS
Commonly expressed as an RDF graph
Defines the meaning of RDF statements through constraints:
◦ On their predicates
◦ On the classes the resources they relate belong to
OWL Formats
Most common format for describing ontologies
Distribution format of ontologies in the NCBO BioPortal
SNOMED CT distributions include an OWL representation
◦ RDF graphs can describe medical content in a SNOMED CT-compliant way through the use of this vocabulary
Validation and Deduction
OWL is based on a formal, mathematical logic that can be used for validating the structure of an ontology and RDF data that conform to it (consistency checking)
Used to deduce additional RDF statements implied by the meaning of a given RDF graph (logical inference)
Logical reasoners are used for this
Inference
Can infer anatomical location from SNOMED CT definitions
Hypertension DX <-> 1201005 / “Benign essential hypertension (disorder)”
Querying RDF Graphs
SPARQL is the W3C-standard query language for RDF graphs
Comparable to relational query languages
◦ Primary difference: it queries RDF triples, whereas SQL queries tables of arbitrary dimensions
Includes various web protocols for querying RDF graphs
Foundation of SPARQL is the triple pattern
(?clinician, treats, ?patient)
◦ ?clinician and ?patient are variables (like a wildcard)
Which physicians have given essential hypertension diagnoses and to whom?
(?physician, author, ?dx)
(?physician, treats, ?patient)
(?dx, subject of record, ?patient)
(?dx, type, Hypertension DX)
?physician    ?patient    ?dx
Dr. X         Chime       …
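The way triple patterns combine into an answer can be sketched with a small matcher in plain Python. This is not a SPARQL engine. The sample graph, the `dx1`/`dx2` identifiers, the second physician, and the camel-cased names are assumptions made for illustration.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                 # variable: bind or compare
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:                      # constant: must match exactly
            return None
    return extended


def evaluate(patterns, graph):
    """Return all variable bindings that satisfy every pattern."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical data: one hypertension diagnosis and one unrelated diagnosis.
graph = [
    ("DrX", "author", "dx1"),
    ("DrX", "treats", "Chime"),
    ("dx1", "subjectOfRecord", "Chime"),
    ("dx1", "type", "HypertensionDX"),
    ("DrY", "author", "dx2"),
    ("DrY", "treats", "Pat"),
    ("dx2", "subjectOfRecord", "Pat"),
    ("dx2", "type", "DiabetesDX"),
]

query = [
    ("?physician", "author", "?dx"),
    ("?physician", "treats", "?patient"),
    ("?dx", "subjectOfRecord", "?patient"),
    ("?dx", "type", "HypertensionDX"),
]

results = evaluate(query, graph)
print(results)
```

A variable that appears in several patterns, such as `?dx`, acts as a join: every pattern must agree on its value.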
SPARQL over Relational Data
Most common implementations convert SPARQL to SQL and evaluate over:
◦ a relational database designed for RDF storage
◦ an existing relational database
There are products for both approaches
Former requires native storage of RDF
◦ Relational structure doesn’t change even as RDF vocabulary does (schema extensibility)
Elliott et al. 2009, “A Complete Translation from SPARQL into Efficient SQL”
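A minimal sketch of the first approach, assuming a single three-column `triples` table in SQLite. It translates a list of triple patterns into one SQL self-join. It is not the translation algorithm of the cited paper, and the table layout, function name, and sample data are hypothetical. It also assumes every query contains at least one variable.

```python
import sqlite3


def to_sql(patterns):
    """Translate triple patterns into a self-join over triples(s, p, o)."""
    first_seen = {}              # variable -> "alias.column" of first occurrence
    conditions, params = [], []
    for i, pattern in enumerate(patterns):
        for column, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in first_seen:           # repeated variable: join
                    conditions.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:                                # constant: bound parameter
                conditions.append(f"{ref} = ?")
                params.append(term)
    select = ", ".join(f"{ref} AS {var[1:]}" for var, ref in first_seen.items())
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    where = " AND ".join(conditions) or "1"
    return f"SELECT {select} FROM {tables} WHERE {where}", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("DrX", "treats", "Chime"),
    ("dx1", "subjectOfRecord", "Chime"),
    ("dx1", "type", "HypertensionDX"),
    ("DrY", "treats", "Pat"),
])

sql, params = to_sql([
    ("?clinician", "treats", "?patient"),
    ("?dx", "subjectOfRecord", "?patient"),
    ("?dx", "type", "HypertensionDX"),
])
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

New predicates become new rows rather than new columns, which is why the relational structure stays fixed as the RDF vocabulary grows.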
SPARQL over Existing Relational Data
“Virtual RDF view”
◦ Translation to SQL follows a given mapping from existing relational structures to an RDF vocabulary
◦ Allows non-disruptive evolution of existing systems
◦ Well-suited as a standard querying interface over clinical data repositories
◦ They can be queried via SPARQL, securely over encrypted HTTP
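The idea of a virtual RDF view can be sketched as a mapping from predicates to SQL over an existing schema, with triples computed on demand. The `patient` and `encounter` tables, their columns, and the predicate names are all hypothetical. Real systems rewrite whole SPARQL queries into SQL instead of enumerating triples as this sketch does.

```python
import sqlite3

# Hypothetical existing clinical schema; table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (patient_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE encounter (physician_id TEXT, patient_id TEXT);
    INSERT INTO patient VALUES ('p1', 'Chime');
    INSERT INTO encounter VALUES ('DrX', 'p1');
""")

# Mapping from RDF predicates to SQL that yields (subject, object) pairs.
MAPPING = {
    "treats": "SELECT physician_id, patient_id FROM encounter",
    "fullName": "SELECT patient_id, name FROM patient",
}


def virtual_triples(conn, predicate=None):
    """Yield triples computed on demand; nothing is copied into an RDF store."""
    for mapped_predicate, sql in MAPPING.items():
        if predicate is None or predicate == mapped_predicate:
            for subject, obj in conn.execute(sql):
                yield (subject, mapped_predicate, obj)


print(list(virtual_triples(conn, "treats")))
```

The existing tables are left untouched, which is what makes this route non-disruptive for systems already in production.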
Example: Cleveland Clinic (SemanticDB)
Content repository and data production system released in Jan. 2008
80 million (native) RDF statements
◦ Uses vocabulary from a patient record OWL ontology for the registry
Based on
◦ Existing registry of heart surgery and CV interventions
◦ 200,000 patient records
◦ Generating over 100 publications per year
Pierce et al. 2012, “SemanticDB: A Semantic Web Infrastructure for Clinical Research and Quality Reporting”
Cohort Identification
Interface developed in conjunction with Cycorp
Leverages their logical reasoning system (Cyc)
◦ Identifies cohorts using natural language (NL) sentence fragments
◦ Converts fragments to SPARQL
◦ SPARQL is evaluated against RDF store
Example: Mayo Clinic (MCLSS)
Mayo Clinic Life Sciences System (MCLSS)
◦ Effort to represent Mayo Clinic EHR data as RDF graphs
◦ Patient demographics, diagnoses, procedures, lab results, and free-text notes
◦ Goal was to wrap MCLSS relational database and expose as read-only, query-able RDF graphs that conform to standard ontologies
◦ Virtual RDF view
Pathak et al. 2012, “Using Semantic Web Technologies for Cohort Identification from Electronic Health Records for Clinical Research”
Example: Mayo Clinic (CEM)
Clinical Element Model (CEM)
◦ Represents logical structure of data in EHR
◦ Goal: translate CEM definitions into OWL and patient (instance) data into conformant RDF
◦ Use tools (logical reasoners) to check semantic consistency of the ontology and instance data, and to extract new knowledge via deduction
◦ Instance data validation: correct number of linked components, value within data range, existence of units, etc.
Tao et al. 2012, “A semantic-web oriented representation of the clinical element model for secondary use of electronic health records data”
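The kinds of instance checks listed on this slide can be illustrated with ordinary Python. The cited work performs them with OWL reasoners, so this sketch is a stand-in for the idea and not that method. The specification format, field names, and sample values are assumptions.

```python
def validate(instance, spec):
    """Return a list of problems found; an empty list means the instance passes."""
    problems = []
    # Correct number of linked components.
    for name, (minimum, maximum) in spec["components"].items():
        count = len(instance.get(name, []))
        if not minimum <= count <= maximum:
            problems.append(
                f"{name}: expected {minimum}-{maximum} items, found {count}")
    # Value within the permitted data range.
    low, high = spec["value_range"]
    value = instance.get("value")
    if value is None or not low <= value <= high:
        problems.append(f"value {value!r} is outside {low}-{high}")
    # Existence of units.
    if not instance.get("unit"):
        problems.append("unit is missing")
    return problems


# Hypothetical specification for a measurement-style element.
SPEC = {
    "components": {"subject": (1, 1), "qualifiers": (0, 2)},
    "value_range": (0, 300),
}

good = {"subject": ["p1"], "value": 120, "unit": "mm[Hg]"}
bad = {"subject": [], "value": 900}

print(validate(good, SPEC))
print(validate(bad, SPEC))
```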
Summary
Schema extensibility
◦ Use of RDF
Semantic interoperability
◦ Domain modeling using OWL and RDFS
Standardized query interfaces
◦ Querying over SPARQL
Incremental, non-disruptive adoption
◦ Virtual RDF views
Main challenge: highly disruptive innovation