1
13 (163,682) 15 (223,616) 7 (110,847) 1 (27,073) 10 (464,933) 1,044 12,513 14,830 14,380 29,554 5 (72,321) Number of new vocabularies and approximate number of concepts to be incorporated to the different domains Standardisation of European medical vocabularies and its incorporation to the OMOP CDM Alexander Davydov 1 , Dmitry Dymshyts 1 , Vlad Korsik 1 , Oleg Zhuk 1 , Alina Vaziuro 1 , Christian Reich 2 1 Odysseus Data Services Inc., Cambridge, MA, USA; 2 IQVIA, Cambridge, MA, USA The healthcare systems of the European Union constantly generate a myriad of disparate electronic medical data, which due to its incompatibility and interoperability is not easy to analyse using standardized and systematic approaches. This is the result of different languages, healthcare constellations, EHR systems and policies [1]. The Electronic Health Data in a European Network (EDHEN) consortium is aimed at addressing the problems of data fragmentation: It will provide a harmonised model to address the structural heterogeneity and the use of different coding standards [2]. Since the standardisation of health data to the OMOP Common Data Model and the adoption of analytical tools developed by OHDSI is central to EHDEN, efficient collaboration might only be possible in a close interaction with comprehensive planning, development and maintenance of European medical vocabularies. The aims of this investigation are: assessment of the breadth of the European medical vocabularies; estimation of its current coverage in the OMOP Standardized Vocabularies; development of a plan on new vocabularies implementation; estimation of resources needed. Contact: [email protected] Background Country Domain Condition Procedure Drug Measurement Device Other/ mixed Austria ICD10 - WHO - BMG ICD - O - 3 APC APPC ATC LOINC - DE LKF - PCS Belgium ICD9 - CM ICD10 - CM - BE ICD - O - 3 ICD9 - CM ICD10 - PCS - BE GGR ATC LOINC - BE APR - DRG SNOMED - CT - BE* Czech Republic ICD10 - CZ (MKN) ICD11* ICF ICD - O - 3 - CV ORPHANET/OMIM/SSIEM TNM8 ATC SUKL ZT GMDN MeSH - CZ DRG - IR SNOMED - CT - CZ* MedDRA - CZE* VZP Denmark ICD10 - DK (SKS) ICD - O - 3 NCSP - DK ATC NPU - DK SKS DRG - DK SNOMED - CT - DK Estonia ICD10 - WHO - ET TNM8 ICD - O - 3 NCSP - ET ATC LOINC - ET EUDAMED MSA GMDN SNOMED - CT - ET* France ICD10 - FR (CIM) CCAM / ACPC CSARR BDPM ATC LOINC - FR LPP MeSH - FR Germany ICD10 - GM ICD10 - WHO - DE ICD11* ICD - O - 3 ICF Alpha ID - SE OPS AMIS LOINC - DE UCUM EDMA UMDNS GMDN* G - DRG MeSH - DE Ireland ICD10 - WHO - AM ACHI ATC LOINC ACS SNOMED - CT* Italy ICD10 - WHO - IT ICF & ICF - CY ICPC2* ICD11* ICD9 - CM ICHI* CIPI* LOINC - IT CND GMDN CMS - DRG Netherlands ICD10 - NL ICF ICPC2 - NL ICECI ICD - O - 3 CMSV MIB ATC LOINC - NL ISO 9999 - MD GMDN MeSH - DUT SNOMED - CT - NL Poland ICD10 - PL (MKC) ICF ICNP ICD9 - CM ATC LOINC* GMDN SNOMED - CT - PL* Portugal ICD9 - CM ICD10 - CM - PT ICF & ICF - CY ICD - O - 3 ESOD ICD9 - CM ICD10 - PCS CNPEM ATC LOINC - PT SNOMED - CT* CDM GMDN Spain ICD - 10 - BAQ ICD - 10 - CAT ICF & ICF - CY LOINC - ES GMDN SNOMED - CT - ES Sweden ICD10 - SE ICF ICD - O - 3 NPL NSL ATC KVA/SVF SNOMED - SV KSI NordDRG - SV MeSH - SV Switzerland ICD10 - GM ICF ICD - O - 3 CHOP ATC SwissMedic LOINC - DE SwissDRG United Kingdom ICD10 - WHO ICD - O - 3 OPCS - 4 NICIP dm+d Gemscript SNOMED - DrugExt LOINC dm+d SNOMED - CT - UK Read Results Results OMOPed; OMOPed but review / slightly changes are needed; partially OMOPed; not OMOPed, but approach was developed on similar vocabulary; not OMOPed and new approach is needed; * – not widely used / planned to be implemented. Table 1. The list of European medical vocabularies and its current coverage in the OMOP CDM Figure 1. WHO implementation database: ICD10 utilization in the healthcare system of Germany References: 1. http://www.ehden.eu/ 2. https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/topic-details/imi2-2017-12-04 3. https://www.who.int/classifications/en/ 4. http://apps.who.int/gho/data/node.whofic 5. https://ec.europa.eu/cefdigital/wiki/pages/viewpage.action?pageId=52603401 6. https://pharm-infolinks.webnode.cz/drug-databases/national-drug-databases-eu/ The latest available versions of open-access vocabularies were downloaded and compared to existing OMOP CDM concepts. To estimate the vocabularies coverage a series of SQL queries were used. To collect the necessary information, the following web-resources were accessed: official websites of standard committees in 16 European countries; the World Health Organisation's international classifications page and implementation database [3, 4]; other open sources [5, 6]. Material and Methods 0 5,000 10,000 15,000 20,000 25,000 30,000 35,000 40,000 45,000 ICD10-WHO ICD10-GM ICD10-CZ (MKN) ICD10-NL ICD10-SE ICD10-PL (MKC) ICD10-DK ICD10-BAQ ICD10-AM Number of concepts in different versions of ICD10 vocabulary and its matching to the existing OMOPed ICD10 vocabularies Do not exist in any ICD vocabulary Match to the US ICD-10-CM (with semantic variance) Equivalent to the original ICD10-WHO 3. Coverage of local vocabularies. Due to utilization in existing European data, some vocabularies are covered by OMOP: Fully integrated with automatic refreshes and mapping - GGR (Belgium), OPCS-4 (multi-country), dm+d, Read and Gemscript (UK); Partial integration – BDPM (France), AMIS (Germany), DRG (multi-country) and MedDRA (multi-country). 4. Heterogeneity in coverage among domains. Measurement: most covered, mainly because of LOINC wide usage in Europe. Drug: on a high level we can easily deal with any country drugs encoded by ATC, but for a Drug Product level we still need to OMOP a lot of national drug vocabularies. Condition: would be covered enough after the several international vocabularies (ICF, ICF-CY, TNM8) and ICD10 national extensions are processed. Procedure: the most heterogeneous group and still need to be mapped. The new mapping algorithm based on Procedure attributes might be applied to the vocabularies listed above. Device: still in a grey zone. 5. New challenges. An implementation of ICD-11 is a great challenge for the OMOP CDM, but is likely to greatly help the standardisation process in Europe. Not all the mentioned vocabularies (especially those with limited national implementation) need to be urgently incorporated into the OMOP CDM. Vocabulary-specific features that need to be taken into account. For example, the ICD-10 Dual Classification (Dagger and Asterisk) system allows recording of certain diagnostic subtleties like the combination of two codes: a primary and mandatory code for the underlying disease, marked with dagger sign (+), and an optional additional code for the manifestation in a particular organ or site, marked with an asterisk (*). That is why vocabulary mapping of asterisk codes may be a challenge requiring a synchronous usage of both codes from combination. 0 50,000 100,000 150,000 200,000 250,000 300,000 350,000 MedDRA DRG AMIS BDPM Gemscript Read dm+d OPCS-4 GGR Number of concepts in European local vocabularies and its coverage in the OMOP CDM OMOPed and mapped OMOPed, not mapped Not OMOPed Conclusions 1. OHDSI dissemination to the European countries is a continuous process that will intensify after the EHDEN project has launched. 2. Patient data transformation to the OMOP CDM starts with medical vocabularies being mapped to standard concepts, which requires serious efforts and resources. 3. Although a lot of work has already been done, the medical vocabularies used in Europe for healthcare coding purposes are not covered completely by the OMOP CDM. 4. Heterogeneity of national systems and existence of local editions of international vocabularies (ICD, SNOMED, DRG) will require additional attention. 5. The current incomplete coverage of vocabularies by OMOP CDM, the lack of precise information or vocabulary source files result in a great bulk of work need to be done. 6. Having a large variety of methods, tools and practical experience, the OHDSI Vocabulary Team is prepared to take on the onset of European vocabulary incorporation and will need to prepare a comprehensive sequential development plan. The general picture of European vocabularies coverage by OMOP CDM is characterised by the following patterns: 1. Lack of available structured information. Web resources are not perfect, often have just reduced English version. Not all the used vocabularies are precisely described or even mentioned. Even if mentioned, implementation level / application areas / current version often remains unclear. Source files are often not available or just text pdf version is provided. 2. Mostly sufficient coverage of widely used vocabularies. Some of them are represented as “linguistic variants”, i.e. translated to national languages without any semantic modifications (ICD9-CM, ICD10-CM, ICD-O-3, ATC, LOINC and MeSH). Once an international version of a vocabulary is implemented, all the national versions may be processed automatically storing the concept name interpretations as concept synonyms. Another variant of “national modification or extensions” use the original vocabularies as a basis and the application of a number of adjustments and alterations (ICD-10 and SNOMED). E.g., ICD10-GM, the German modification of ICD10-WHO is currently being revised annually and used for coding of diagnoses in the outpatient and inpatient care. Among 16,126 concept codes, 13,473 (83.5%) are equivalent to the original ICD10-WHO codes, 1,135 codes (7.0%) match to the US ICD-10-CM vocabulary (albeit with semantic variance), while 1,518 (9.4%) do not exist in any ICD vocabulary. The processing of this type of vocabularies generally needs some additional efforts, including reviewing, mapping, etc. But anyway this is much more easily than development of obscure one. It is noteworthy that an unaltered translation of ICD-10-WHO is simultaneously used in Germany for death reasons registration. National ICD10 modifications were tested on codes equivalence considered that ICD10-WHO is taken as a basis (if ICD10-WHO code match, then it’s completely equal). The US ICD10-CM can be helpful as well since codes are mostly the same, so it may be used in the future mapping (Figure 2). Results Figure 2. ICD10-AM and ICD10-CM matching and its usage for future mapping P74.31 Hyperkalaemia of newborn P74.31 Hyperkalemia of newborn 206491006 Transitory neonatal hyperkalemia S62.22 Fracture of shaft of first metacarpal bone S62.22 Rolando's fracture 69916004 Fracture of base of thumb S32.02 Fracture of lumbar vertebra, L2 S32.02 Fracture of second lumbar vertebra 721495002 Fracture of second lumbar vertebra S22.03 Fracture of thoracic vertebra, T5-T6 S22.03 Fracture of third thoracic vertebra 721417009 Fracture of third thoracic vertebra E09 Intermediate hyperglycaemia E09 Drug or chemical induced diabetes mellitus 5368009 Drug-induced diabetes mellitus R79.82 Elevated prostate specific antigen R79.82 Elevated C-reactive protein 166584001 C-reactive protein abnormal ICD10-AM source concepts ICD10-CM concepts SNOMED standard concepts code match Maps to Semantic variance – may not be used No semantic variance – may be used

Standardisation of European medical vocabularies and its ... · Lack of available structured information. • Web resources are not perfect, often have just reduced English version

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Standardisation of European medical vocabularies and its ... · Lack of available structured information. • Web resources are not perfect, often have just reduced English version

13 (163,682)

15 (223,616)

7 (110,847)

1 (27,073)

10 (464,933)

1,044

12,513

14,830

14,380

29,554

5 (72,321)

Number of new vocabularies and approximate number of concepts to be incorporated to the different domains

Standardisation of European medical vocabularies and its incorporation to the OMOP CDMAlexander Davydov1, Dmitry Dymshyts1, Vlad Korsik1, Oleg Zhuk1, Alina Vaziuro1, Christian Reich2

1Odysseus Data Services Inc., Cambridge, MA, USA; 2IQVIA, Cambridge, MA, USA

The healthcare systems of the European Union constantly generate a myriad of disparate electronic medical data, which due to its incompatibility and interoperability is not easy to analyse using standardized and systematic approaches. This is the result of different languages, healthcare constellations, EHR systems and policies [1]. The Electronic Health Data in a European Network (EDHEN) consortium is aimed at addressing the problems of data fragmentation: It will provide a harmonised model to address the structural heterogeneity and the use of different coding standards [2].Since the standardisation of health data to the OMOP Common Data Model and the adoption of analytical tools developed by OHDSI is central to EHDEN, efficient collaboration might only be possible in a close interaction with comprehensive planning, development and maintenance of European medical vocabularies.The aims of this investigation are:• assessment of the breadth of the European medical vocabularies;• estimation of its current coverage in the OMOP Standardized Vocabularies;• development of a plan on new vocabularies implementation;• estimation of resources needed.

Contact: [email protected]

Background

CountryDomain

Condition Procedure Drug Measurement Device Other/ mixed

Austria ICD10-WHO-BMGICD-O-3

APCAPPC

ATC LOINC-DE LKF-PCS

BelgiumICD9-CMICD10-CM-BEICD-O-3

ICD9-CMICD10-PCS-BE

GGRATC

LOINC-BE APR-DRGSNOMED-CT-BE*

Czech Republic

ICD10-CZ (MKN)ICD11*ICFICD-O-3-CVORPHANET/OMIM/SSIEMTNM8

ATCSUKL

ZTGMDN

MeSH-CZDRG-IRSNOMED-CT-CZ*MedDRA-CZE*VZP

DenmarkICD10-DK (SKS)ICD-O-3

NCSP-DK ATC NPU-DK SKSDRG-DKSNOMED-CT-DK

EstoniaICD10-WHO-ETTNM8ICD-O-3

NCSP-ET ATC LOINC-ET EUDAMEDMSAGMDN

SNOMED-CT-ET*

FranceICD10-FR (CIM) CCAM / ACPC

CSARRBDPMATC

LOINC-FR LPP MeSH-FR

Germany

ICD10-GMICD10-WHO-DEICD11*ICD-O-3ICFAlpha ID-SE

OPS AMIS LOINC-DEUCUM

EDMAUMDNSGMDN*

G-DRGMeSH-DE

Ireland ICD10-WHO-AM ACHI ATC LOINC ACSSNOMED-CT*

ItalyICD10-WHO-ITICF & ICF-CYICPC2*ICD11*

ICD9-CMICHI*CIPI*

LOINC-IT CNDGMDN

CMS-DRG

NetherlandsICD10-NLICFICPC2-NLICECIICD-O-3

CMSV MIBATC

LOINC-NL ISO 9999-MDGMDN

MeSH-DUTSNOMED-CT-NL

PolandICD10-PL (MKC)ICFICNP

ICD9-CM ATC LOINC* GMDN SNOMED-CT-PL*

PortugalICD9-CMICD10-CM-PTICF & ICF-CYICD-O-3ESOD

ICD9-CMICD10-PCS

CNPEMATC

LOINC-PTSNOMED-CT*

CDMGMDN

SpainICD-10-BAQICD-10-CATICF & ICF-CY

LOINC-ES GMDN SNOMED-CT-ES

SwedenICD10-SEICFICD-O-3

NPLNSLATC

KVA/SVFSNOMED-SVKSINordDRG-SVMeSH-SV

SwitzerlandICD10-GMICFICD-O-3

CHOP ATCSwissMedic

LOINC-DE SwissDRG

United Kingdom

ICD10-WHOICD-O-3

OPCS-4NICIP

dm+dGemscriptSNOMED-DrugExt

LOINC dm+d SNOMED-CT-UKRead

Results Results

OMOPed; OMOPed but review / slightly changes are needed; partially OMOPed; not OMOPed, but approach was developedon similar vocabulary; not OMOPed and new approach is needed; * – not widely used / planned to be implemented.

Table 1. The list of European medical vocabularies and its current coverage in the OMOP CDM

Figure 1. WHO implementation database: ICD10 utilization in the

healthcare system of Germany

References:1. http://www.ehden.eu/2. https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/topic-details/imi2-2017-12-043. https://www.who.int/classifications/en/4. http://apps.who.int/gho/data/node.whofic5. https://ec.europa.eu/cefdigital/wiki/pages/viewpage.action?pageId=526034016. https://pharm-infolinks.webnode.cz/drug-databases/national-drug-databases-eu/

The latest available versions of open-access vocabularies were downloaded and compared to existing OMOP CDM concepts. To estimate the vocabularies coverage a series of SQL queries were used.

To collect the necessary information, the following web-resources were accessed:• official websites of standard committees in 16 European countries;• the World Health Organisation's international classifications page and implementation database [3, 4];• other open sources [5, 6].

Material and Methods

0

5,000

10,000

15,000

20,000

25,000

30,000

35,000

40,000

45,000

ICD10-WHO ICD10-GM ICD10-CZ(MKN)

ICD10-NL ICD10-SE ICD10-PL(MKC)

ICD10-DK ICD10-BAQ ICD10-AM

Number of concepts in different versions of ICD10 vocabulary and its matching to the existing OMOPed ICD10 vocabularies

Do not exist in any ICD vocabularyMatch to the US ICD-10-CM (with semantic variance)Equivalent to the original ICD10-WHO

3. Coverage of local vocabularies.Due to utilization in existing European data, some vocabularies are covered by OMOP: • Fully integrated with automatic refreshes and mapping - GGR (Belgium), OPCS-4 (multi-country), dm+d, Read and

Gemscript (UK);• Partial integration – BDPM (France), AMIS (Germany), DRG (multi-country) and MedDRA (multi-country).

4. Heterogeneity in coverage among domains.• Measurement: most covered, mainly because of LOINC wide usage in Europe.• Drug: on a high level we can easily deal with any country drugs encoded by ATC, but for a Drug Product level we still

need to OMOP a lot of national drug vocabularies.• Condition: would be covered enough after the several international vocabularies (ICF, ICF-CY, TNM8) and ICD10

national extensions are processed.• Procedure: the most heterogeneous group and still need to be mapped. The new mapping algorithm based on

Procedure attributes might be applied to the vocabularies listed above.• Device: still in a grey zone.5. New challenges.• An implementation of ICD-11 is a great challenge for the OMOP CDM, but is likely to greatly help the standardisation

process in Europe.• Not all the mentioned vocabularies (especially those with limited national implementation) need to be urgently

incorporated into the OMOP CDM.• Vocabulary-specific features that need to be taken into account. For example, the ICD-10 Dual Classification (Dagger

and Asterisk) system allows recording of certain diagnostic subtleties like the combination of two codes: a primary and mandatory code for the underlying disease, marked with dagger sign (+), and an optional additional code for the manifestation in a particular organ or site, marked with an asterisk (*). That is why vocabulary mapping of asterisk codes may be a challenge requiring a synchronous usage of both codes from combination.

0 50,000 100,000 150,000 200,000 250,000 300,000 350,000

MedDRADRGAMIS

BDPMGemscript

Readdm+d

OPCS-4GGR

Number of concepts in European local vocabularies and its coverage in the OMOP CDM

OMOPed and mapped OMOPed, not mapped Not OMOPed

Conclusions1. OHDSI dissemination to the European countries is a continuous process that will intensify after the EHDEN project

has launched.2. Patient data transformation to the OMOP CDM starts with medical vocabularies being mapped to standard

concepts, which requires serious efforts and resources.3. Although a lot of work has already been done, the medical vocabularies used in Europe for healthcare coding

purposes are not covered completely by the OMOP CDM.4. Heterogeneity of national systems and existence of local editions of international vocabularies (ICD, SNOMED, DRG)

will require additional attention.5. The current incomplete coverage of vocabularies by OMOP CDM, the lack of precise information or vocabulary

source files result in a great bulk of work need to be done.6. Having a large variety of methods, tools and practical experience, the OHDSI Vocabulary Team is prepared to take

on the onset of European vocabulary incorporation and will need to prepare a comprehensive sequential development plan.

The general picture of European vocabularies coverage by OMOP CDM is characterised by the following patterns:

1. Lack of available structured information.• Web resources are not perfect, often have just reduced English version.• Not all the used vocabularies are precisely described or even mentioned.• Even if mentioned, implementation level / application areas / current version often remains unclear.• Source files are often not available or just text pdf version is provided.

2. Mostly sufficient coverage of widely used vocabularies.• Some of them are represented as “linguistic variants”, i.e. translated to national languages without any semantic

modifications (ICD9-CM, ICD10-CM, ICD-O-3, ATC, LOINC and MeSH). Once an international version of a vocabulary is implemented, all the national versions may be processed automatically storing the concept name interpretations as concept synonyms.

• Another variant of “national modification or extensions” use the original vocabularies as a basis and the application of a number of adjustments and alterations (ICD-10 and SNOMED). E.g., ICD10-GM, the German modification of ICD10-WHO is currently being revised annually and used for coding of diagnoses in the outpatient and inpatient care. Among 16,126 concept codes, 13,473 (83.5%) are equivalent to the original ICD10-WHO codes, 1,135 codes (7.0%) match to the US ICD-10-CM vocabulary (albeit with semantic variance), while 1,518 (9.4%) do not exist in any ICD vocabulary. The processing of this type of vocabularies generally needs some additional efforts, including reviewing, mapping, etc. But anyway this is much more easily than development of obscure one. It is noteworthy that an unaltered translation of ICD-10-WHO is simultaneously used in Germany for death reasons registration.

National ICD10 modifications were tested on codes equivalence considered that ICD10-WHO is taken as a basis (if ICD10-WHO code match, then it’s completely equal). The US ICD10-CM can be helpful as well since codes are mostly the same, so it may be used in the future mapping (Figure 2).

Results

Figure 2. ICD10-AM and ICD10-CM matching and its usage for future mapping

P74.31 Hyperkalaemia of newborn P74.31 Hyperkalemia of newborn 206491006 Transitory neonatal hyperkalemia

S62.22 Fracture of shaft of first metacarpal bone S62.22 Rolando's fracture 69916004 Fracture of base of thumb

S32.02 Fracture of lumbar vertebra, L2 S32.02 Fracture of second lumbar vertebra 721495002 Fracture of second lumbar vertebra

S22.03 Fracture of thoracic vertebra, T5-T6 S22.03 Fracture of third thoracic vertebra 721417009 Fracture of third thoracic vertebra

E09 Intermediate hyperglycaemia E09 Drug or chemical induced diabetes mellitus 5368009 Drug-induced diabetes mellitus

R79.82 Elevated prostate specific antigen R79.82 Elevated C-reactive protein 166584001 C-reactive protein abnormal

ICD10-AM source concepts ICD10-CM concepts SNOMED standard conceptscode

matchMaps

to

Semantic variance – may not be used

No semantic variance – may be used