Page 1: Alan Rector, Luigi Iannone, Robert Stevens rector@cs.manchester.ac.uk

Quality Assurance of the Content of a Large DL-based Terminology using Mixed Lexical and Semantic Criteria: Experience with SNOMED CT

Page 2

“A report from the trenches”

► SNOMED CT - mandated terminology for electronic patient records in the UK and US, with worldwide aspirations

► The result of a merger of two other systems
  • SNOMED and Clinical Terms v3
  • Long history with much opportunity for error

► Expressed in a Description Logic and now available in OWL
  • Subset of EL++ without disjoint axioms

► Has been resistant to independent analysis, although there are many known problems
  • Despite several global QA attempts based on lexical criteria, which identified errors without explaining them

Page 3

It’s very big - and classification matters

► ~400,000 Concepts/Classes; >1,000,000 axioms

► Much of richness only evident in classified form

► Most errors only present in classified form

[Figure: stated form vs. classified form]

Page 4

…and some classification horrendously complicated (Skin of Ankle)

Page 5

An experiment of opportunity

► The opportunities
  ► Tried to use SNOMED for Commercial Collaboration on Clinical Systems
  ► Tried to use SNOMED as contribution to WHO’s revision of International Classification of Diseases (ICD-11)
  ► Problems with both
  ► Therefore, experiment to see if QA & repair were possible
    • Conventional wisdom said that it was not

► However, we had new resources
  ► Core Problem List Subset from NLM (8500 most used classes)
  ► Software to extract “modules”
  ► SNOROCKET Classifier for EL++
  ► 4-8GB machines

Page 6

Step 1: Cut it down & find a classifier

► Find a subset
  ► UMLS Core Problem List subset
  ► 8500 most used disease concepts
    • Collected by US National Library of Medicine by combining sets from 6 major institutions

► Extract a “Module” (built into OWL API v3)
  ► Use core subset as “signature”
  ► Guaranteed that all inferences amongst the classes in the “signature” that hold in the whole will also hold in the module
  ► 35,000 concepts - including most of anatomy

► Find a classifier that can cope - at least two for checking
  ► SNOROCKET (EL++, polynomial time subset of OWL) (30 sec)
  ► Pellet 2.1 (200 sec)
  ► FaCT++ (250 sec)
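
A rough sketch of the "cut it down from a signature" idea. The slides rely on the module extractor built into OWL API v3; the Python below is not that algorithm and gives no guarantee about preserved inferences. It only follows stated is-a links upwards from the signature. The class names and links are hypothetical toy data, and `upward_closure` is an illustrative helper.

```python
from collections import deque

def upward_closure(signature, parents):
    """Return the signature plus every class reachable via stated parents."""
    module = set(signature)
    queue = deque(signature)
    while queue:
        cls = queue.popleft()
        for parent in parents.get(cls, ()):
            if parent not in module:
                module.add(parent)
                queue.append(parent)
    return module

# Hypothetical stated is-a links (child -> parents), not real SNOMED CT content.
STATED_PARENTS = {
    "Myocardial infarction": ["Heart disease"],
    "Heart disease": ["Cardiovascular disorder"],
    "Cardiovascular disorder": ["Disorder"],
    "Fracture of ankle": ["Injury of ankle"],
    "Injury of ankle": ["Disorder"],
}

module = upward_closure({"Myocardial infarction"}, STATED_PARENTS)
print(sorted(module))
```

Classes outside the signature's ancestry (here the ankle injuries) are left out, which is what makes the result small enough to classify quickly.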

Page 7

Step 2: Pick some areas of interest to clinicians: some with anomalies already spotted

► Myocardial Infarction (Heart attack)
  ► Should be a kind of Ischemic Heart Disease, but wasn’t

► Hypertension (High blood pressure)
  ► Odd to find it a kind of Soft Tissue disorder

► Diabetes
  ► Odd to find it as a Disorder of the Abdomen

► Allergies
  ► Odd to find some but not all autoimmune disorders classified as Allergies

► …

Page 8

Look at classification: Most initial errors spotted looking upwards

► Look up hierarchy (with OWLViz)
  ► Let clinicians find important concepts and check them
    • Face validity and then look up the hierarchy
  ► Check any anomalies against the complete SNOMED in standard browser
    • Guard against artifacts in various transformations
  ► Trace anomalies to their root
  ► Decide which links to add or break
  ► Decide how to break them
  ► Edit, classify and check
    • Hierarchies
    • Usages
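
Tracing an anomaly to its root can be sketched as a search up the classified hierarchy for a path to the unexpected ancestor. The hierarchy below is hypothetical toy data, not the actual SNOMED CT path, and `path_to_ancestor` is an illustrative helper rather than part of OWLViz or any tool named in the slides.

```python
from collections import deque

def path_to_ancestor(cls, target, parents):
    """Breadth-first search upwards; return one shortest is-a path, or None."""
    queue = deque([[cls]])
    visited = {cls}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for parent in parents.get(path[-1], ()):
            if parent not in visited:
                visited.add(parent)
                queue.append(path + [parent])
    return None

# Hypothetical classified hierarchy (child -> inferred parents).
INFERRED_PARENTS = {
    "Hypertensive disorder": ["Disorder of blood vessel"],
    "Disorder of blood vessel": [
        "Disorder of cardiovascular system",
        "Disorder of soft tissue",
    ],
    "Disorder of cardiovascular system": ["Disorder"],
    "Disorder of soft tissue": ["Disorder"],
}

path = path_to_ancestor(
    "Hypertensive disorder", "Disorder of soft tissue", INFERRED_PARENTS
)
print(" -> ".join(path))
```

Each link on the returned path is a candidate for the "which links to add or break" decision.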

Page 9

OWLViz Upwards for Hypertension

Page 10

And check for the desired result

Page 11

Check in standard browser in full SNOMED (snob.eggbird.eu/)

Page 12

Examine definition & formulate solution

Disorder of blood vessel that (Finding site some Systemic arterial structure) and (Has definitional manifestation some Increased blood pressure)

Disorder of blood vessel that (Finding site some Cardiovascular system structure) and (Has definitional manifestation some Increased blood pressure)

Page 13

Then check usages for unwanted results - anything that should relate to arteries instead of Cardiovascular system?

Page 14

Also look down hierarchy: Combine lexical & semantic search

► Hard to spot what is missing
  ► Hypertensive disorders included some complications as well as kinds of hypertension. Did it contain them all?

► Use OPPL combining lexical, OWL semantics & queries

    ?C:CLASS=MATCH(".*[Hh]ypertensive.*")                     (lexical)
    SELECT ?C SubClassOf Thing                                (open world OWL semantics)
    WHERE FAIL ?C SubClassOf "Hypertensive disorder"          (closed world query)
    BEGIN ADD ?C SubClassOf Candidate_hypertensive END;       (action)

► Classify and look at odd cases…
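
The OPPL probe on this slide can be approximated outside OPPL. A minimal Python sketch, assuming the classified hierarchy is available as a child-to-parents dictionary: match names lexically, then keep only those that are not classified under the target. The class names and links are hypothetical toy data, and both helper functions are illustrative, not OPPL or OWL API calls.

```python
import re

def is_subclass_of(cls, target, parents):
    """True if `target` is `cls` itself or an ancestor of it via is-a links."""
    stack, seen = [cls], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return False

def lexical_candidates(pattern, target, parents):
    """Names matching `pattern` that are NOT classified under `target`."""
    regex = re.compile(pattern)
    return sorted(
        name for name in parents
        if regex.fullmatch(name) and not is_subclass_of(name, target, parents)
    )

# Hypothetical classified hierarchy (child -> inferred parents).
INFERRED_PARENTS = {
    "Hypertensive disorder": ["Disorder of blood vessel"],
    "Hypertensive encephalopathy": ["Hypertensive disorder"],
    "Hypertensive retinopathy": ["Disorder of retina"],
    "Disorder of retina": ["Disorder"],
    "Disorder of blood vessel": ["Disorder"],
    "Disorder": [],
}

candidates = lexical_candidates(
    r".*[Hh]ypertensive.*", "Hypertensive disorder", INFERRED_PARENTS
)
print(candidates)
```

The lexical step casts the net and the closed-world check discards what is already classified correctly, so only the odd cases remain for inspection.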

Page 15

Classify and look at odd cases

Page 16

Look for regularities

► Of hypertensive complications
  ► 1 linked to Hypertensive disorder by property due to
  ► 1 linked to Hypertensive disorder by property associated with
  ► 2 are subclasses of Hypertensive disorder
  ► 2 not linked at all

► No class for Hypertensive complication
  ► Although there is a class for Diabetic complication

► Regularise
  ► Create classes for
    • Hypertension,
    • Hypertensive complication and
    • Hypertension AND/OR Hypertensive complication
  ► Edit all complications to schema: Disorder due to some Hypertension
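
Looking for regularities amounts to tallying how each candidate class is linked to the target. A small sketch under stated assumptions: the six class names are placeholders, the links are invented so that the tally mirrors the counts on this slide, and `link_kind` is a hypothetical helper.

```python
from collections import Counter

def link_kind(cls, target, is_a, relations):
    """Classify how `cls` is linked to `target`: subclass, a property, or not at all."""
    if target in is_a.get(cls, ()):
        return "subclass"
    for prop, filler in relations.get(cls, ()):
        if filler == target:
            return prop
    return "not linked"

TARGET = "Hypertensive disorder"

# Placeholder classes; links chosen to mirror the counts reported on the slide.
IS_A = {
    "Complication A": [TARGET],
    "Complication B": [TARGET],
}
RELATIONS = {
    "Complication C": [("due to", TARGET)],
    "Complication D": [("associated with", TARGET)],
}
CANDIDATES = [f"Complication {letter}" for letter in "ABCDEF"]

tally = Counter(link_kind(c, TARGET, IS_A, RELATIONS) for c in CANDIDATES)
for kind, count in sorted(tally.items()):
    print(f"{count} {kind}")
```

A tally with several different link kinds for what should be one pattern is the signal that a single schema is needed.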

Page 17

Which concept should carry the old ID?

► Look at usages of Hypertensive disorder
  ► All fit Hypertension; none fit Hypertensive complication
  ► Therefore, label original ID for Hypertensive disorder as Hypertension
    • New Hierarchy:
      ‣ Hypertension AND/OR Hypertensive complication (new ID/concept)
        ‣ Hypertension (old ID/concept)
          ‣ … kinds of hypertension
        ‣ Hypertensive complication (new ID/concept)
          ‣ … kinds of hypertensive complication

Page 18

Looking down hierarchy: Analysis by categorisation

► Even short alphabetic lists are difficult to check

► Break it up logically

Page 19

Always trace errors to root to fix mish mash modelling

► Simple error
  ► The axiom that Skin is a kind of Soft tissue was omitted
  ► Therefore Injuries to skin are not listed as kinds of Soft tissue injuries

► Authors have noticed some cases and tried to compensate
  ► Cut of skin of foot is a kind of soft tissue injury, but Cut of the skin of lower limb was NOT a soft tissue injury
  ► One axiom to fix it all: Skin subClassOf SoftTissue
    • And then a script to find the redundant axioms
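
The script for finding redundant axioms is not shown in the slides. One simplified way to sketch the idea in Python: treat a directly asserted is-a link as redundant when the same parent is still reachable with that link ignored. This assumes a toy hierarchy in which the effect of the new Skin axiom is already reflected (the "Injury of skin" link below is hypothetical), and it ignores existential restrictions, which a real script over OWL axioms would have to handle.

```python
def reachable(start, goal, edges, skip):
    """True if `goal` can be reached from `start` without using the `skip` link."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        for parent in edges.get(node, ()):
            if (node, parent) == skip or parent in seen:
                continue
            if parent == goal:
                return True
            seen.add(parent)
            stack.append(parent)
    return False

def redundant_links(edges):
    """Asserted (child, parent) links that other links already imply."""
    return sorted(
        (child, parent)
        for child, parents in edges.items()
        for parent in parents
        if reachable(child, parent, edges, skip=(child, parent))
    )

# Hypothetical hierarchy after the fix, with the old workaround still asserted.
EDGES = {
    "Cut of skin of foot": ["Injury of skin", "Soft tissue injury"],
    "Injury of skin": ["Soft tissue injury"],
    "Soft tissue injury": ["Injury"],
}

redundant = redundant_links(EDGES)
print(redundant)
```

Only the compensating link is reported; the links that carry the hierarchy are left alone.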

Page 20

Trace errors to their roots: Incomplete modelling: Example

► Why is Myocardial Infarction not a kind of Ischemic Heart Disease?
    • Ischemia = “lack of blood supply”
    • Myocardium = “Heart muscle”
  ► Infarction not fully defined in SNOMED. References say…
    • “Tissue death due to ischemia”
  ► Ischemic heart disease not fully defined in SNOMED. References say…
    • Heart disease due to ischemia
  ► Ischemic disorder does not exist in SNOMED. Natural closure…
    • Disorder due to some Ischemia - NB always involves Cardiovascular system

► Add definitions and Myocardial infarction is classified correctly
  ► Also discover a long list of Ischemic diseases that have not been classified as cardiovascular

► Check lexically for other uses of “ischemic”
  ► None found in this subset

Page 21

Error in schema for anatomy: Conflates branches with parts

► Example
  ► Injury to artery of the ankle is located in the pelvis and in the abdomen (as well as the ankle)!

► Extends to all nerves & blood vessels

► Requires a generic change
  ► Simplest involves about 20 axioms for arteries
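
Why conflating branches with parts misplaces the ankle artery can be shown on a toy graph. This sketch does not reproduce the roughly 20-axiom fix mentioned above; it only contrasts following both relations with following part-of alone. The anatomy triples and the relation names `part_of` and `branch_of` are hypothetical simplifications, not SNOMED CT content.

```python
# Hypothetical, heavily simplified anatomy as (subject, relation, object) triples.
TRIPLES = [
    ("Artery of ankle", "part_of", "Ankle"),
    ("Artery of ankle", "branch_of", "Artery of leg"),
    ("Artery of leg", "branch_of", "Iliac artery"),
    ("Iliac artery", "part_of", "Pelvis"),
    ("Iliac artery", "branch_of", "Abdominal aorta"),
    ("Abdominal aorta", "part_of", "Abdomen"),
]
REGIONS = {"Ankle", "Pelvis", "Abdomen"}

def reachable(structure, triples, follow):
    """Everything reachable from `structure` via the relations named in `follow`."""
    found, stack = set(), [structure]
    while stack:
        node = stack.pop()
        for subject, relation, obj in triples:
            if subject == node and relation in follow and obj not in found:
                found.add(obj)
                stack.append(obj)
    return found

# Branch-of treated like part-of: the artery "is in" every region upstream.
conflated = reachable("Artery of ankle", TRIPLES, {"part_of", "branch_of"}) & REGIONS
# Only genuine part-of links followed: the artery stays in the ankle.
separated = reachable("Artery of ankle", TRIPLES, {"part_of"}) & REGIONS

print(sorted(conflated))
print(sorted(separated))
```

Any finding located in the artery inherits the same regions, which is how an ankle injury ends up in the abdomen.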

Page 22

Overgeneralisation – explains many arguments

► The dictionary says “Neuropathy” is a disease of nerves
  ► But in practice it is a “dysfunction” of nerves
    • Doctors don’t consider tumors or injuries to nerves to be neuropathies
  ► SNOMED often does not distinguish structural and functional disorders
    • Needs a consistent pattern:

Page 23

Naming issues

► All SNOMED terms have at least two names
  ► “Fully qualified name” & “Preferred name”
  ► “Fully qualified names” should be consistent but…

► Example - conflicting names
  ► “Immune hypersensitivity disorder (disorder)” = “Allergic disorder”
  ► Structure nodes in SEP triples
    • “Structure of X”, “X Structure”, X
      ‣ Leads to “Swelling of gums” being a kind of “Swelling of face”

Page 24

Doing everything in a separate module (insofar as possible)

Perform queries as “probes”

Keep changes in Modules

Compromise: System of diffs and merges
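
The diff-and-merge compromise can be illustrated with axioms held as plain sets. This is a sketch under stated assumptions, not the authors' tooling: axioms are modelled as hypothetical (subject, relation, object) tuples, and `diff` and `merge` are illustrative helpers.

```python
def diff(old, new):
    """Describe how to turn the axiom set `old` into `new`."""
    return {"added": new - old, "removed": old - new}

def merge(base, change):
    """Apply a recorded change to another copy of the ontology."""
    return (base - change["removed"]) | change["added"]

# Hypothetical axioms, not real SNOMED CT content.
WORKAROUND = ("Cut of skin of foot", "SubClassOf", "Soft tissue injury")
REPAIR = ("Skin", "SubClassOf", "Soft tissue")

release = {("Skin", "SubClassOf", "Body structure"), WORKAROUND}

# Edit a working copy: add the repair, drop the workaround it makes redundant.
working = set(release)
working.add(REPAIR)
working.discard(WORKAROUND)

change = diff(release, working)

# A later release with an unrelated new axiom can still take the change.
next_release = release | {("Nail", "SubClassOf", "Body structure")}
patched = merge(next_release, change)
print(sorted(patched))
```

Keeping the change as data rather than editing the release in place is what lets the same repair be replayed on the next release and logged.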

Page 25

Summary: QA of a large DL-based ontology is possible!

► Find a useful subset and use it as signature to extract a manageable module

► Start with things that are important to your experts
  ► Look upwards rather than downwards in the first instance
  ► Follow up analogies and patterns
  ► When looking downwards enrich categorization to reduce noise
    • Combine lexical and semantic techniques

► Analysis by synthesis
  ► Test alternative potential changes with classifier
  ► As far as possible in a separate module; scripting where possible

► Tooling gaps / weaknesses
  ► Scripting tools need work
  ► Combining filtering with imports
  ► Diffs & change management – needed, but current tools don’t do enough

► Log everything!