Semantic Web Technologies as a Framework for Clinical Informatics
Chimezie Ogbuji (CCF)
Chris Pierce (CCF)
Chris Deaton (Cycorp)
Semantic Technology Conference, 16 June 2009
Me
I work in the Heart and Vascular Institute at the Cleveland Clinic
We store and query patient populations as an RDF dataset
Ph.D. student at Case Western Reserve University
Researching medical informatics methodology
Outline
Relevant methods in clinical informatics
Traditional challenges in cohort identification
RDF dataset and managing patient populations
Our cohort identification system
Challenges with current standards
What is Informatics?
The science of information:
• Gathering
• Analysis
• Representation
Scientific method:
• Body of techniques for investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge.
Bioinformatics
Bioinformatics:
• Discipline of gathering, analyzing, and representing the structure/function of genes and proteins and correlating these to disease and population variation.
Medical Informatics
Medical informatics:
• Discipline of gathering, analyzing, and representing longitudinal patient studies in health and disease while providing decision support or predictive tools to assist in the diagnosis and prognosis of clinical patient care.
Cohort Studies
Longitudinal study:
• Research study that involves repeated observations of the same items over long periods of time.
Cohort:
• Group of subjects, most often humans from a given population, characterized by the experience of an event in a particular time span.
Retrospective Cohort Studies
Observational clinical study:
• A longitudinal study that looks back in time
Dependent on curated patient record content
We primarily do observational studies from our cardiothoracic patient registry
Reasoning Methods in Biomedical Informatics
Areas of Applied Ontology
Controlled vocabulary standards and management
Reporting and export of patient record content for analysis and aggregation
Population-based research
• Identification of cohorts
Challenges in Traditional Cohort Identification
Domain-specific criteria are conceived by researchers who confer with DBAs
• DBAs translate these criteria into joins, aggregation, text matching, etc.
Mostly an exercise in navigation of data structure
Organization of content cannot easily evolve
Patient Records
Computer-based Patient Record:
• An electronic patient record that resides in a system specifically designed to support users through availability of complete and accurate data, practitioner reminders and alerts, clinical decision support systems, links to bodies of medical knowledge, and other aids.
Patient Records Cont.
Longitudinal patient record:
• Patient records from different times, providers, and sites of care that are linked to form a lifelong view of a patient’s health care experience or a single patient record system with the same characteristics.
RDF Datasets
• “A SPARQL query is executed against an RDF Dataset which represents a collection of graphs. An RDF Dataset comprises one graph, the default graph, which does not have a name, and zero or more named graphs, where each named graph is identified by an IRI.”
RDF Datasets Cont.
Similar to a document collection in XPath 2.0
The GRAPH operator can be used to scope query patterns to a particular graph or to match them across all named graphs
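As a rough illustration of the two scoping modes, the sketch below builds a SPARQL query string around a GRAPH clause. The helper name `graph_scoped_query` and the example IRI are hypothetical and are not part of the system described in these slides.

```python
def graph_scoped_query(pattern, graph_iri=None):
    """Wrap a basic graph pattern in a GRAPH clause.

    With graph_iri, the pattern is matched only inside that named graph.
    Without it, the variable ?g ranges over every named graph in the dataset.
    """
    graph_term = f"<{graph_iri}>" if graph_iri else "?g"
    return (
        "SELECT * WHERE {\n"
        f"  GRAPH {graph_term} {{\n"
        f"    {pattern}\n"
        "  }\n"
        "}"
    )


# Scope the pattern to one (hypothetical) named graph ...
print(graph_scoped_query("?s a ?type .", "http://example.org/record/1"))
# ... or match it within every named graph, binding ?g to the graph's IRI.
print(graph_scoped_query("?s a ?type ."))
```

In the second form, each solution carries the IRI of the graph it was found in, which is what makes a per-graph grouping of results possible.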
SPARQL & Cohort Identification
One named graph per patient record (a patient record graph)
Each patient record graph is allocated a URI
No significant cross-graph statements.
• Beyond cohort identification, most processing happens within a single patient record graph
Use of Named Graphs
In our vocabulary, there are instances of PatientRecord, Operation, Patient, etc.
PatientRecord resources share a URI with their containing graph
GRAPH operator can be used to optimize the search space
Use of Named Graphs Cont.
Easy to parallelize computation and optimal for cohort querying
• Constraints in the first part of a query are cross-graph, while those in the second part are intra-graph
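The cross-graph/intra-graph split can be sketched with a toy in-memory dataset: one step selects candidate record graphs, and a second step evaluates each selected graph independently, so it can run in parallel. Everything here (the `DATASET` contents, IRIs, predicate names, and function names) is made up for illustration and does not reflect the actual store or vocabulary.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy dataset: one named graph per patient record, keyed by a hypothetical IRI.
DATASET = {
    "urn:example:record:1": {
        ("urn:example:patient:1", "hasSex", "female"),
        ("urn:example:echo:1", "hasLeftAtriumDiameter", 4.1),
    },
    "urn:example:record:2": {
        ("urn:example:patient:2", "hasSex", "male"),
        ("urn:example:echo:2", "hasLeftAtriumDiameter", 3.9),
    },
    "urn:example:record:3": {
        ("urn:example:patient:3", "hasSex", "female"),
        ("urn:example:echo:3", "hasLeftAtriumDiameter", 3.5),
    },
}


def candidate_graphs(dataset, predicate, obj):
    """Cross-graph step: find every record graph containing (?, predicate, obj)."""
    return sorted(
        iri
        for iri, triples in dataset.items()
        if any(p == predicate and o == obj for _, p, o in triples)
    )


def max_diameter(triples):
    """Intra-graph step: largest left atrium diameter within one record graph."""
    values = [o for _, p, o in triples if p == "hasLeftAtriumDiameter"]
    return max(values) if values else None


def cohort(dataset, threshold):
    """Return the record graphs of female patients whose diameter exceeds threshold."""
    graphs = candidate_graphs(dataset, "hasSex", "female")
    # Record graphs share no statements, so each one can be evaluated independently.
    with ThreadPoolExecutor() as pool:
        diameters = list(pool.map(lambda iri: max_diameter(dataset[iri]), graphs))
    return [
        iri for iri, d in zip(graphs, diameters) if d is not None and d > threshold
    ]


print(cohort(DATASET, 3.8))
```

Because no statement spans two record graphs, the per-graph step needs no coordination between workers.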
Patient Record Ontology
3974+ OWL Classes, 171 Object properties, and 217 Datatype properties
Diseases, findings, symptoms, medication, procedures, etc…
SHOIN(D) expressiveness
• OWL-DL
Ontology: Diagnoses
Ontology: Coronary Anatomy
Ontology: Pathogens
Ontologies: Family History
Integration with Cyc KB
Patient record ontology is aligned to Cyc common sense ontology
Lexical metadata are added to facilitate natural language processing
Cyc SKSI protocol was extended to support SPARQL
Semantic Research Assistant
Cyc-based medical expert system for cohort identification
Natural-language driven interface composes logical queries
Queries are generated against a SPARQL Protocol service
Leverages ontology alignment
“Semantic Interface”
OWL serves as the schema for a cohort’s SPARQL protocol service
SPARQL is the query interlingua
The Cyc KB’s common sense ontology and NLP capabilities shield the researcher from SPARQL, RDF, and OWL
Screenshots
CycL Query
(thereExists ?ID
  (thereExists ?PATIENT
    (and
      (cCFhasLeftAtriumDiameter ?CATH-OR-ECHO ?DISTANCE)
      (patientTreated ?CATH-OR-ECHO ?PATIENT)
      (cCFCCFID ?PATIENT ?ID)
      (isa ?CATH-OR-ECHO Echocardiogram)
      (patientTreated ?CATH-OR-ECHO ?PATIENT)
      (or
        (and
          (patientSex ?PATIENT MaleHuman)
          (greaterThan ?DISTANCE ((Centi Meter) 4.2)))
        (and
          (patientSex ?PATIENT FemaleHuman)
          (greaterThan ?DISTANCE ((Centi Meter) 3.8))))
      (temporallyBetween-Inclusive ?CATH-OR-ECHO
        (MonthFn January (YearFn 2008))
        (DayFn 15 (MonthFn March (YearFn 2008)))))))
SPARQL Queries
SELECT ?VAR0 ?VAR1 ?VAR2 ?VAR3 ?VAR4 ?VAR5 ?VAR6
WHERE {
  ?VAR0 ptrec:hasSex ptrec:Sex_female .
  ?VAR0 a ptrec:Patient .
  ?VAR1 dnode:contains ?VAR0 .
  ?VAR1 a ptrec:PatientRecord .
  ?VAR1 dnode:contains ?VAR2 .
  ?VAR2 a ptrec:Event_evaluation_echocardiogram .
  ?VAR2 ptrec:hasLeftAtriumDiameter ?VAR3 .
  FILTER (?VAR3 > xsd:float(3.8))
  ?VAR2 dnode:contains ?VAR4 .
  ?VAR4 a ptrec:EventStartDate .
  ?VAR4 ptrec:hasDateTimeMax ?VAR5 .
  FILTER (?VAR5 > xsd:dateTime("2007-12-31T23:59:59"))
  FILTER (xsd:dateTime("2008-03-16T00:00:00") > ?VAR5)
  ?VAR0 ptrec:hasCCFID ?VAR6 .
}
Challenges
Representing negation in SPARQL is painfully cumbersome
• Patients who had X but not Y
No equivalent of SQL’s IN operator
• Find patients who had a diagnosis of myocardial infarction, renal failure, or atrial fibrillation
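To make both pain points concrete, the sketch below generates the usual SPARQL 1.0 workarounds: OPTIONAL combined with FILTER (!bound(...)) for "X but not Y", and a chain of UNION branches in place of IN. The helper names and the `ex:` terms are hypothetical and are not taken from the patient record ontology.

```python
def not_exists_pattern(subject_var, predicate, obj, tmp_var="?absent"):
    """Negation in SPARQL 1.0: try to match the unwanted triple optionally,
    then keep only the solutions where that match left tmp_var unbound."""
    return (
        f"OPTIONAL {{ {subject_var} {predicate} {tmp_var} . "
        f"FILTER ({tmp_var} = {obj}) }}\n"
        f"FILTER (!bound({tmp_var}))"
    )


def in_as_union(subject_var, predicate, values):
    """Emulate SQL's IN by emitting one UNION branch per allowed value."""
    branches = [f"{{ {subject_var} {predicate} {value} }}" for value in values]
    return " UNION ".join(branches)


# "Patients who had X but not Y" (ex:X and ex:Y are placeholder terms).
print("?p ex:hasDiagnosis ex:X .")
print(not_exists_pattern("?p", "ex:hasDiagnosis", "ex:Y"))

# "Diagnosis of any of three conditions" (placeholder terms again).
print(in_as_union(
    "?p",
    "ex:hasDiagnosis",
    ["ex:MyocardialInfarction", "ex:RenalFailure", "ex:AtrialFibrillation"],
))
```

Both workarounds grow with the number of conditions, which is where the cumbersomeness comes from. SPARQL 1.1, published after this talk, added NOT EXISTS, MINUS, and IN.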
Challenges Cont.
SPARQL specification doesn’t allow matching blank nodes by name
No sufficient, readily available medical record ontologies
• We created our own
The SPARQL Protocol offers no easy way to abort running queries
Questions?
Case Study: A Semantic Web Content Repository for Clinical Research
http://www.w3.org/2001/sw/sweo/public/UseCases/ClevelandClinic/
Email ogbujic@ccf.org for (updated) copy of slides