Phenotype Capture in Genetic Variant Databases Peng Chen School of Computer and Information Science [email protected] Supervisor: Dr Jan Stanek Research Fields: Health Informatics

Phenotype Capture in Genetic Variant Databases


Page 1: Phenotype Capture in  Genetic Variant Databases

Phenotype Capture in Genetic Variant Databases

Peng Chen

School of Computer and Information Science

[email protected]

Supervisor: Dr Jan Stanek

Research Fields: Health Informatics

Health Computer Science

Health Information System

Page 2: Phenotype Capture in  Genetic Variant Databases

Outline

Motivation

Research Question

Literature

Methodology

Phenotype Data Review Result

The openEHR Archetypes Review Result

Phenotype Capture Experiment Result

Conclusion

Page 3: Phenotype Capture in  Genetic Variant Databases

Motivation

1950s health computer science, EHR (Electronic Health Record)

Slow development

Bio-medical research & EHR systems

Genotype – Phenotype correlation

Page 4: Phenotype Capture in  Genetic Variant Databases

Research Question

Can the existing standard openEHR be used to capture and store phenotype data/clinical data?

Hypothesis one: most of the phenotype data in genetic variant databases is not coded, has little clinical detail, and is not stored in a consistent manner.

Hypothesis two: openEHR is potentially suitable to store phenotype data as a standard.

Page 5: Phenotype Capture in  Genetic Variant Databases

Literature

Claustres et al. (2002) ‘Time for a Unified System of Mutation Description and Reporting: A Review of Locus-Specific Mutation Databases’

Mitropoulou et al. (2010) ‘Locus-specific database domain and data content analysis: evolution and content maturation toward clinical use’

Spath & Grimson (2011) ‘Applying the archetype approach to the database of a biobank information management system’

Chen et al. (2009) ‘Archetype-based conversion of EHR content models: pilot experience with a regional EHR system’

Page 6: Phenotype Capture in  Genetic Variant Databases

Methodology

Criteria form for phenotype review

1. Storage
    Collect phenotypes
    Internal storage
    Proprietary external storage
    Foreign external storage

2. Terminology
    Formal terminology
    Proprietary terms (mapped to a recognised terminology)
    External terminology used directly
    Recognised terminology

3. Coding standard
    Formal coding standard
    Proprietary codes (mapped to a recognised coding standard)
    External coding standard used directly
    Recognised coding standard

4. Granularity
    Overall granularity level
    Partial fine-grained phenotypes

5. Curation
    Curated

6. Multiple phenotypes
    Single phenotype
    Multiple phenotype

7. Case level
    Variant-level phenotypes
    Case-level phenotypes

8. Database
    Database family
    Platform
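One way to make the criteria form machine-readable is to record each reviewed database as a small typed record. The sketch below is only an illustration: the class names, field names and enum values (`DatabaseReview`, `Storage`, `Vocabulary`) are hypothetical, and the way the terminology and coding sub-options are folded into one enum is an assumption, not the form actually used in the study.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Storage(Enum):
    """Criterion 1: where the phenotype data is kept."""
    INTERNAL = "internal storage"
    PROPRIETARY_EXTERNAL = "proprietary external storage"
    FOREIGN_EXTERNAL = "foreign external storage"


class Vocabulary(Enum):
    """Criteria 2 and 3: how terms or codes are expressed (assumed grouping)."""
    NONE = "none"
    PROPRIETARY_MAPPED = "proprietary, mapped to a recognised standard"
    EXTERNAL_DIRECT = "external standard used directly"


@dataclass
class DatabaseReview:
    """One filled-in criteria form for a single reviewed database."""
    name: str
    collects_phenotypes: bool
    storage: Optional[Storage] = None
    terminology: Vocabulary = Vocabulary.NONE
    recognised_terminology: Optional[str] = None   # e.g. "SNOMED-CT"
    coding: Vocabulary = Vocabulary.NONE
    recognised_coding: Optional[str] = None
    fine_grained: bool = False                      # criterion 4, overall level
    partially_fine_grained: bool = False            # criterion 4, partial
    curated: bool = False                           # criterion 5
    multiple_phenotypes: bool = False               # criterion 6
    case_level: bool = False                        # criterion 7; False = variant level
    family: Optional[str] = None                    # criterion 8
    platform: Optional[str] = None                  # criterion 8

    def has_formal_terminology(self) -> bool:
        # A database counts as "formal" here if it uses any mapped or external terminology.
        return self.terminology is not Vocabulary.NONE


# Hypothetical example record, not taken from the reviewed data.
example = DatabaseReview(
    name="example locus-specific database",
    collects_phenotypes=True,
    storage=Storage.INTERNAL,
    curated=True,
    case_level=True,
    family="LOVD",
    platform="MySQL",
)
```

Keeping the record flat mirrors the form's eight sections, so counts such as "curated" or "case-level" can be tallied directly over a list of records.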

Page 7: Phenotype Capture in  Genetic Variant Databases

Methodology

The openEHR phenotype capture model

Page 8: Phenotype Capture in  Genetic Variant Databases

Methodology

Data integration workflow towards a proposed health care EHR integration architecture

Page 9: Phenotype Capture in  Genetic Variant Databases

Phenotype Data Review Result

Reviewed 1224 databases; 978 collect phenotype data, all in internal storage. Of these 978:

40 (4.1%) use a formal terminology; 30 (3.1%) use a formal coding standard.

959 (98%) store low-granularity phenotype data.

604 (62%) are curated by experts.

534 (54.6%) store single-phenotype data; 444 (45.4%) store multiple-phenotype data.

757 (77.4%) store phenotypes on a case basis; 221 (22.6%) on a variant basis.

Database:

Database family Number Platform

LOVD 614 MySQL

UMD 13 4D SQL DB

63% of databases are LOVD

Platform Number

MySQL DB 617

Web page table form 209

Web page free text 132

4D SQL DB 13

PDF table form 4

Excel table form 2

Web page bar chart 1
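The percentages on this slide can be recomputed from the raw counts. The following is a minimal sketch that assumes every percentage is taken relative to the 978 databases that collect phenotypes; the dictionary keys and the `share` helper are illustrative names, while the counts are copied from the slide.

```python
COLLECT_PHENOTYPES = 978  # databases that collect phenotype data

# Counts as reported on the slide.
counts = {
    "formal terminology": 40,
    "formal coding": 30,
    "low granularity": 959,
    "curated": 604,
    "single phenotype": 534,
    "multiple phenotypes": 444,
    "case-level": 757,
    "variant-level": 221,
    "LOVD family": 614,
}

platforms = {
    "MySQL DB": 617,
    "Web page table form": 209,
    "Web page free text": 132,
    "4D SQL DB": 13,
    "PDF table form": 4,
    "Excel table form": 2,
    "Web page bar chart": 1,
}


def share(count, total=COLLECT_PHENOTYPES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: share(n) for label, n in counts.items()}

# The platform breakdown should account for every phenotype-collecting database.
platform_total = sum(platforms.values())

if __name__ == "__main__":
    for label, pct in shares.items():
        print(f"{label}: {pct}%")
    print("platform total:", platform_total)
```

Pairs that partition the same set (single versus multiple phenotypes, case-level versus variant-level) should sum to 978, which gives a quick consistency check on the reported figures.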

Page 10: Phenotype Capture in  Genetic Variant Databases

Phenotype Data Review Result

Phenotype samples:

Sample 1: ‘MRX’, ‘ARRP’, ‘AMD’, ‘arCRD’, ‘CIPA or HSN IV (H406Y + G613V are polymorphisms)’, ‘Type I, type II, non syndromic recessive’

Sample 2: ‘Failure to thrive; Pneumocystis carinii pneumonia; Diarrhea; Marked lymphopenia’

Sample 3: Symptoms Other bacterial infections:

Symptoms Pseudomonas aeruginosa;

Symptoms Escherichia coli;

Symptoms Stenotrophomonas maltophilia;

Symptoms Other; Enterobacter cloacae

Symptoms Other symptoms: perirectal abscess and failure of the

Symptoms umbilical stump to involute, recurrent perirectal

Symptoms abscesses, an infected urachal cyst, a failure to heal

Symptoms surgical wounds, and the absence of pus in infected areas,

Symptoms leucocytosis, neutrophilia, hypochromic anemia

Treatment Bone marrow transplantation: Yes

Treatment Donor: matched sibling

Treatment Outcome: alive and well

Comment D57N mutant behaves in a dominant-negative fashion at the

Comment cellular level

Page 11: Phenotype Capture in  Genetic Variant Databases

The openEHR Archetypes Review Result

Reviewed 283 existing openEHR archetypes

Multilingual translation mechanism

Term binding mechanism

Criteria Result

Number of terms 7361

Number of term bindings 94

Coding system SNOMED-CT, LOINC

Has term binding 7 (2.5% of archetypes)

Has multilingual translations 83 (29.3% of archetypes)

Languages English, German, Arabic, Portuguese, Japanese, Russian, Dutch, Chinese, Spanish, Farsi

Compile failure 14

Page 12: Phenotype Capture in  Genetic Variant Databases

Multilingual translation mechanism - example

ontology
    terminologies_available = <"SNOMED-CT", ...>
    term_definitions = <
        ["zh-cn"] = <
            items = <
                ...
                ["at0004"] = <
                    text = <"收缩压">
                    description = <"一个血液循环周期中,系统性动脉血压高峰值。收缩期血压">
                >
            >
        >
        ["de"] = <
            items = <
                ...
                ["at0004"] = <
                    text = <"Systolisch">
                    description = <"Der höchste arterielle Blutdruck eines Zyklus - gemessen in der systolischen oder Kontraktionsphase des Herzens.">
                >
            >
        >
        ["en"] = <
            items = <
                ...
                ["at0004"] = <
                    text = <"Systolic">
                    description = <"Peak systemic arterial blood pressure - measured in systolic or contraction phase of the heart cycle.">
                >
            >
        >
    >

(ADL display)
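The translation mechanism shown in the ADL excerpt amounts to looking up the same at-code under a different language key. A minimal sketch of that lookup follows, assuming the term definitions have already been loaded into a nested dictionary; `TERM_DEFINITIONS` and `term_text` are hypothetical names, and the fallback to English is an assumption about how a client might behave, not part of the archetype itself.

```python
# Term texts for at0004, copied from the ADL excerpt.
TERM_DEFINITIONS = {
    "zh-cn": {"at0004": {"text": "收缩压"}},
    "de": {"at0004": {"text": "Systolisch"}},
    "en": {"at0004": {"text": "Systolic"}},
}


def term_text(code, language, original_language="en"):
    """Return the display text for an at-code in the requested language.

    Falls back to the original language when no translation exists
    (an assumed client-side policy).
    """
    for lang in (language, original_language):
        items = TERM_DEFINITIONS.get(lang, {})
        if code in items:
            return items[code]["text"]
    raise KeyError(f"no definition for {code!r} in {language!r} or {original_language!r}")


if __name__ == "__main__":
    for lang in ("zh-cn", "de", "en"):
        print(lang, term_text("at0004", lang))
```

Because the stored data refers only to `at0004`, the same record can be displayed in any language the archetype has been translated into without changing the data.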

The openEHR Archetypes Review Result

Page 13: Phenotype Capture in  Genetic Variant Databases

The openEHR Archetypes Review Result

Multilingual translation mechanism - compare

Page 14: Phenotype Capture in  Genetic Variant Databases

Term binding mechanism

term_bindings = <
    ["SNOMED-CT"] = <
        items = <
            ["at0000"] = <[SNOMED-CT(2003)::163020007]>
            ["at0004"] = <[SNOMED-CT(2003)::163030003]>
            ["at0005"] = <[SNOMED-CT(2003)::163031004]>
            ["at0013"] = <[SNOMED-CT(2003)::246153002]>
        >
    >
>

(ADL display)
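Each binding in the excerpt pairs a local at-code with a code in an external terminology, written as `[terminology(version)::code]`. The sketch below splits such a string into its parts with a regular expression; `BINDING_PATTERN`, `RAW_BINDINGS` and `parse_binding` are illustrative names, and the pattern is an assumption that covers only the form shown on this slide, not the full ADL grammar.

```python
import re

# Matches strings of the form "[SNOMED-CT(2003)::163020007]" (slide format only).
BINDING_PATTERN = re.compile(
    r"\[(?P<terminology>[^()\[\]]+)\((?P<version>[^)]*)\)::(?P<code>[^\]]+)\]"
)

# Bindings copied from the ADL excerpt.
RAW_BINDINGS = {
    "at0000": "[SNOMED-CT(2003)::163020007]",
    "at0004": "[SNOMED-CT(2003)::163030003]",
    "at0005": "[SNOMED-CT(2003)::163031004]",
    "at0013": "[SNOMED-CT(2003)::246153002]",
}


def parse_binding(raw):
    """Split a binding string into (terminology, version, code)."""
    match = BINDING_PATTERN.fullmatch(raw)
    if match is None:
        raise ValueError(f"unrecognised term binding: {raw!r}")
    return match.group("terminology"), match.group("version"), match.group("code")


# Map each at-code to its external code for quick lookup.
EXTERNAL_CODES = {at_code: parse_binding(raw)[2] for at_code, raw in RAW_BINDINGS.items()}

if __name__ == "__main__":
    for at_code, raw in RAW_BINDINGS.items():
        print(at_code, parse_binding(raw))
```

Two systems that bind their local terms to the same external code can then recognise that they mean the same concept, which is the basis of the interoperability argument made later in the talk.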

The openEHR Archetypes Review Result

Page 15: Phenotype Capture in  Genetic Variant Databases

Phenotype Capture Experiment Result

The chosen sample:

The mapping of concepts:

Diagnosis Wiskott Aldrich syndrome

Symptoms Platelets

Symptoms At date of diagnosis: Count: 28,000/µL

Treatment Bone marrow transplantation: Yes

Treatment Donor: mismatched family donor

Page 16: Phenotype Capture in  Genetic Variant Databases

Phenotype Capture Experiment Result

The openEHR archetypes mapping:

Evaluation: Diagnosis

Observation: Symptom

Action: Treatment

NO. Archetypes Entry items

1 openEHR-EHR-EVALUATION.problem-diagnosis.v1.adl Diagnosis

2 openEHR-EHR-OBSERVATION.lab_test-full_blood_count.v1.adl Platelet count

3 openEHR-EHR-ACTION.procedure.v1.adl Procedure, Comments
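The mapping in the table can be read as a routing rule: each category in the source record decides which archetype receives the value. The sketch below applies that rule to the chosen sample. The names `ARCHETYPE_FOR_CATEGORY`, `SAMPLE` and `group_by_archetype` are hypothetical, and routing every "Symptoms" row to the full blood count archetype is an assumption that holds only for this sample, where the symptom is a platelet count.

```python
# Category-to-archetype routing, taken from the mapping table on this slide.
ARCHETYPE_FOR_CATEGORY = {
    "Diagnosis": "openEHR-EHR-EVALUATION.problem-diagnosis.v1",
    "Symptoms": "openEHR-EHR-OBSERVATION.lab_test-full_blood_count.v1",  # sample-specific
    "Treatment": "openEHR-EHR-ACTION.procedure.v1",
}

# The chosen sample, as (category, value) rows.
SAMPLE = [
    ("Diagnosis", "Wiskott Aldrich syndrome"),
    ("Symptoms", "Platelets"),
    ("Symptoms", "At date of diagnosis: Count: 28,000/µL"),
    ("Treatment", "Bone marrow transplantation: Yes"),
    ("Treatment", "Donor: mismatched family donor"),
]


def group_by_archetype(rows):
    """Collect the values of each row under the archetype chosen for its category."""
    grouped = {}
    for category, value in rows:
        archetype = ARCHETYPE_FOR_CATEGORY[category]  # KeyError for unmapped categories
        grouped.setdefault(archetype, []).append(value)
    return grouped


if __name__ == "__main__":
    for archetype, values in group_by_archetype(SAMPLE).items():
        print(archetype)
        for value in values:
            print("   ", value)
```

An unmapped category raises an error rather than being dropped silently, so gaps in archetype coverage become visible during capture.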

Page 17: Phenotype Capture in  Genetic Variant Databases

Phenotype Capture Experiment Result

Phenotype capture snapshots:

Page 18: Phenotype Capture in  Genetic Variant Databases

Phenotype Capture Experiment Result

Phenotype capture snapshots:

Page 19: Phenotype Capture in  Genetic Variant Databases

Phenotype Capture Experiment Result

Phenotype capture snapshots:

Page 20: Phenotype Capture in  Genetic Variant Databases

Phenotype Capture Experiment Result

Phenotype capture snapshots:

Page 21: Phenotype Capture in  Genetic Variant Databases

A conceptual patient-centric EHR data warehouse schema

Page 22: Phenotype Capture in  Genetic Variant Databases

Conclusion

The research results support both hypotheses and match the expected outcomes.

The openEHR standard is potentially suitable for storing clinical data, even for integrating health information systems.

The multilingual translation mechanism and the term binding mechanism are strong evidence that openEHR can support semantic interoperability between heterogeneous systems.

We need international cooperation on managing the archetypes and completing a full set of archetypes for health concepts.

We need international agreement on choosing terminologies and enhancing the terminologies for resolving semantic conflicts.

Page 23: Phenotype Capture in  Genetic Variant Databases

Conclusion

The philosophy and the future

A health care EHR integration architecture

Archetype-ontology

Cognitive IS

Human friendly

Robust, scalable, integrated

Semantic interoperability

Syntactic consistency

Data modelling neutral

Start from learning terms and concepts

IS essentially for communication

Ubiquitous information computing