
8/7/2019 Guoqian Jiang, PhD - Quality Evaluation of Cancer Study

http://slidepdf.com/reader/full/guoqian-jiang-phd-quality-evaluation-of-cancer-study 1/25

Quality Evaluation of Cancer Study Common Data Elements

Using the UMLS Semantic Network

Guoqian Jiang, PhD, Harold R. Solbrig,

Christopher G. Chute, MD, DrPH

Division of Biomedical Statistics and Informatics, 

Department of Health Sciences Research,

Mayo Clinic College of Medicine, Rochester, MN, 55905

AMIA CRI Summit 2011. March 10, 2011. San Francisco, CA

Introduction

•  Semantic interoperability among terminologies, data elements, and information models is fundamental and critical for sharing information.

•  Consistent use of controlled terminology is essential to support efficient, end-to-end data flows, including the aggregation and analysis of large data sets as well as timely response to important clinical events.

NCI Cancer Common Ontologic Representation Environment (caCORE)

Komatsoulis GA, et al. caCORE version 3: Implementation of a model driven, service-oriented architecture for semantic interoperability. J Biomed Inform. 2008; 41(1):106-23.

ISO/IEC 11179 Data Element Structure

Issues

• The potential of terminology binding for data elements has not yet been fully explored.

• There is a very limited toolbox at present for quality assurance (QA) of meta-data registered in a repository such as the caDSR.

The UMLS Semantic Network (SN)

•  aims to provide a consistent categorization of all concepts represented in the UMLS Metathesaurus

•  and to provide a set of useful relationships between these concepts

•  It has been widely used in terminology quality assurance, structure validation, and new relationship discovery

Objective

• To explore the role of terminological annotations in the quality evaluation of the caDSR CDEs.

• We profiled the terminological concepts associated with the standard structure of the caDSR CDEs using the UMLS SN.

The linkage between the data element constructs and the UMLS SN

Data Collection

•  We accessed the caDSR CDE Browser

  •  Root node "caDSR Contexts"

  •  Workflow Status "RELEASED"

•  We extracted mappings between NCIt codes, UMLS Concept Unique Identifiers (CUI) and semantic types

  •  Data file "MRSTY.RRF" from NCI Metathesaurus (NCIM) version 200904D

  •  Data file of the mappings
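The MRSTY.RRF extraction step can be illustrated with a short Python sketch. It is not the authors' code; it assumes the standard pipe-delimited RRF column order for MRSTY (CUI|TUI|STN|STY|ATUI|CVF|) and builds a lookup from each CUI to its semantic types. The function name `load_semantic_types` and the sample rows are hypothetical placeholders.

```python
import io
from collections import defaultdict


def load_semantic_types(lines):
    """Build {CUI: {(TUI, STY), ...}} from MRSTY.RRF-style lines.

    Assumes the RRF column order CUI|TUI|STN|STY|ATUI|CVF| (pipe-delimited,
    with a trailing pipe). Only CUI, TUI and STY are kept.
    """
    cui_to_types = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        cui, tui, sty = fields[0], fields[1], fields[3]
        cui_to_types[cui].add((tui, sty))
    return dict(cui_to_types)


# Illustrative rows only: CUIs and ATUIs are placeholders, STN left empty.
sample = io.StringIO(
    "C0000001|T061||Therapeutic or Preventive Procedure|AT0000001||\n"
    "C0000002|T081||Quantitative Concept|AT0000002||\n"
)
types = load_semantic_types(sample)
print(types["C0000001"])
```

With a real NCIM release, the opened MRSTY.RRF file object (UTF-8) can be passed in directly, since a file object iterates line by line.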

Data Processing

• Extract the data element concept annotations.

• Link the NCIt concept annotations with the semantic types
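The linking step can be sketched as a join from NCIt codes to CUIs to semantic types. This is a minimal illustration under assumed data shapes: the tuple layout of `annotations`, the dictionaries `ncit_to_cui` and `cui_to_types`, the function `link_semantic_types`, and all identifiers below are hypothetical, not caDSR or NCIM APIs.

```python
def link_semantic_types(annotations, ncit_to_cui, cui_to_types):
    """Attach semantic types to concept annotations.

    annotations:  iterable of (data_element_id, category, ncit_code), where
                  category is "ObjectClass" or "Property" (hypothetical shape)
    ncit_to_cui:  {ncit_code: cui}, e.g. extracted from NCIM
    cui_to_types: {cui: {(tui, sty), ...}}, e.g. parsed from MRSTY.RRF

    Returns (linked, unmapped): linked holds one
    (data_element_id, category, ncit_code, tui) row per semantic type;
    unmapped holds annotations whose NCIt code has no CUI.
    """
    linked, unmapped = [], []
    for de_id, category, code in annotations:
        cui = ncit_to_cui.get(code)
        if cui is None:
            unmapped.append((de_id, category, code))  # report, do not drop silently
            continue
        for tui, _sty in sorted(cui_to_types.get(cui, ())):
            linked.append((de_id, category, code, tui))
    return linked, unmapped


# Hypothetical identifiers, for illustration only.
annotations = [
    ("DE-1", "ObjectClass", "NCIT-A"),
    ("DE-1", "Property", "NCIT-B"),
    ("DE-2", "ObjectClass", "NCIT-X"),
]
ncit_to_cui = {"NCIT-A": "CUI-A", "NCIT-B": "CUI-B"}
cui_to_types = {
    "CUI-A": {("T061", "Therapeutic or Preventive Procedure")},
    "CUI-B": {("T081", "Quantitative Concept")},
}
linked, unmapped = link_semantic_types(annotations, ncit_to_cui, cui_to_types)
```

Keeping unmapped annotations in a separate list makes coverage gaps in the NCIt-to-CUI mapping visible instead of silently shrinking the profile.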

Profiling analysis and evaluation

•  We then calculated the frequency of the semantic types for the object class concepts (category ObjectClass) and the property concepts (category Property).

•  To distinguish the category-specific semantic type group, we ranked the semantic types for each category and applied two filtering criteria: a frequency greater than 100, and a ratio of the frequencies between the two categories greater than 2.
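The frequency calculation and the two filtering criteria (frequency greater than 100, ratio between categories greater than 2) can be expressed as follows. This sketch assumes each input pair is one (category, semantic type) observation; the slides do not state whether frequencies were counted per data element or per distinct concept, nor how a zero count in the other category was handled (here it is assumed to pass the ratio test). The helper names, the type codes `T_A` to `T_D`, and all counts are synthetic.

```python
from collections import Counter


def category_frequencies(pairs):
    """Count semantic type (TUI) occurrences per category from (category, tui) pairs."""
    freq = {"ObjectClass": Counter(), "Property": Counter()}
    for category, tui in pairs:
        freq[category][tui] += 1
    return freq


def dominant_types(freq, category, other, min_freq=100, min_ratio=2.0):
    """Return the category-specific semantic types, ranked by frequency.

    A type qualifies if its frequency in `category` is greater than min_freq
    and its frequency ratio (category / other) is greater than min_ratio.
    A type absent from `other` is assumed to pass the ratio test.
    """
    result = []
    for tui, n in freq[category].most_common():
        if n <= min_freq:
            continue
        m = freq[other][tui]  # Counter returns 0 for unseen types
        if m == 0 or n / m > min_ratio:
            result.append(tui)
    return result


# Synthetic counts with placeholder type codes, not study data.
pairs = (
    [("ObjectClass", "T_A")] * 150 + [("Property", "T_A")] * 10
    + [("ObjectClass", "T_B")] * 120 + [("Property", "T_B")] * 400
    + [("ObjectClass", "T_C")] * 50 + [("Property", "T_D")] * 200
)
freq = category_frequencies(pairs)
print(dominant_types(freq, "ObjectClass", "Property"))  # ['T_A']
print(dominant_types(freq, "Property", "ObjectClass"))  # ['T_B', 'T_D']
```

Here `T_B` is frequent in both categories but passes only for Property (400/120 > 2), while `T_C` fails the frequency threshold.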

Profiling analysis and evaluation

•  We then isolated the set of Object Class and Property concepts that did not fit into the resulting profile, and

•  performed a preliminary evaluation on a small sample of these to determine:

  •  whether, in fact, these elements may have been misclassified, and

  •  by inference, whether the category-specific semantic types might be a useful auditing tool for data element curation.
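Isolating concepts that do not fit the profile, and drawing a small review sample, might look like this. The rule used here (a concept does not fit if none of its semantic types is dominant for its own category) is an assumption for illustration; a stricter variant could also flag concepts carrying a type dominant in the other category. The functions `isolate_outliers` and `draw_review_sample` and all identifiers are hypothetical.

```python
import random


def isolate_outliers(concept_types, dominant):
    """Return concepts that do not fit their category's profile, sorted.

    concept_types: {(category, concept_code): {tui, ...}}
    dominant:      {category: {dominant tui, ...}}
    Assumed rule: a concept does not fit if none of its semantic types is
    among the dominant types of its own category.
    """
    return sorted(
        key
        for key, tuis in concept_types.items()
        if not (tuis & dominant.get(key[0], set()))
    )


def draw_review_sample(outliers, k, seed=0):
    """Draw up to k outliers for manual review (seeded for reproducibility)."""
    rng = random.Random(seed)
    return rng.sample(outliers, min(k, len(outliers)))


# Hypothetical concepts and dominant-type sets.
concept_types = {
    ("ObjectClass", "NCIT-A"): {"T_A"},
    ("ObjectClass", "NCIT-C"): {"T_C"},
    ("Property", "NCIT-B"): {"T_B"},
    ("Property", "NCIT-D"): {"T_A", "T_D"},
}
dominant = {"ObjectClass": {"T_A"}, "Property": {"T_B"}}
outliers = isolate_outliers(concept_types, dominant)
sample = draw_review_sample(outliers, k=1)
```

Seeding the sampler keeps the reviewed subset reproducible, which matters when manual review results are reported alongside the sample.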

Results

•  In total, 42,426 data elements were registered in the caDSR database as of February 1, 2010.

•  Of them, 17,798 data elements had a workflow status of "RELEASED", and 17,526 primary object class/property concept pairs were identified.

•  Of these pairs, 6,625 were distinct, comprising 1,801 distinct object class concepts and 1,759 distinct property concepts.

Profiling by semantic types

[Figure panels: Object Class; Property]

20 sample data elements from T061 (Therapeutic or Preventive Procedure)

18 sample data elements from 9 other semantic types in category ObjectClass

Discussion

• The dominant semantic types can be used to trigger an auditing process for the curation of the CDEs.

• Our preliminary evaluation results validated the observation that a data element whose semantic annotation did not observe the principle of disjointness had a high probability of having issues with its modeling and curation.

Lack of constraints in ISO/IEC 11179 standard

Disjointness Principle

• Upper-level ontologies, such as

  • the Basic Formal Ontology (BFO) and the Relation Ontology by B. Smith,

  • or the four-category ontology by E.J. Lowe

Constraint Example by Basic Formal Ontology

•  Linking to the ISO/IEC 11179 model, a constraint can be stated such as “an object class concept has to be an independent continuant or processual entity, whereas a property concept cannot be such an entity”.

•  Accordingly, it would be ideal if the structure of the UMLS SN followed the disjointness principles defined in the BFO.
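The example constraint can be turned into a simple check once semantic types are assigned to BFO-style categories. The `BFO_CATEGORY` mapping below is a hypothetical, illustrative assignment for four semantic types, not an official UMLS SN to BFO alignment; as the slide notes, the UMLS SN does not currently follow such disjointness principles. The function `violates_constraint` is likewise a hypothetical name.

```python
# Hypothetical, illustrative mapping from UMLS semantic type names to coarse
# BFO-style categories; it is NOT an official UMLS SN to BFO alignment.
BFO_CATEGORY = {
    "Organic Chemical": "independent continuant",
    "Body Part, Organ, or Organ Component": "independent continuant",
    "Therapeutic or Preventive Procedure": "processual entity",
    "Quantitative Concept": "other",
}

ALLOWED_FOR_OBJECT_CLASS = {"independent continuant", "processual entity"}


def violates_constraint(category, semantic_type):
    """Check the example constraint for one annotation.

    An object class concept must be an independent continuant or processual
    entity; a property concept must not be. Returns True (violation),
    False (satisfied), or None when the semantic type is unmapped.
    """
    bfo = BFO_CATEGORY.get(semantic_type)
    if bfo is None:
        return None  # cannot be checked without a mapping
    is_allowed_kind = bfo in ALLOWED_FOR_OBJECT_CLASS
    if category == "ObjectClass":
        return not is_allowed_kind
    if category == "Property":
        return is_allowed_kind
    raise ValueError(f"unknown category: {category}")


print(violates_constraint("ObjectClass", "Quantitative Concept"))  # True
print(violates_constraint("Property", "Organic Chemical"))  # True
```

Returning None for unmapped types keeps "not checkable" separate from "satisfied", so gaps in the mapping are not mistaken for clean annotations.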

Contributing Factors of Misclassification Issues

• Certain structural problems of the UMLS SN itself may cause false positive results.

• The current content of the meta-data repository may represent only a portion of cancer study domains, so the dominant semantic types identified in this study may not be complete.

Summary

•  The UMLS SN-based profiling approach is feasible for the quality assurance of the cancer study CDEs.

•  We consider that this approach could provide useful insight into how to build quality assurance mechanisms in a meta-data repository, and would be useful for semantic infrastructure development in the next generation of the NCI caDSR.

Acknowledgement

• This study is supported in part by the NCI caBIG Vocabulary Knowledge Center.

Thank you!

Questions?