Cytogenetics payne lab_presentation_08282013


CytoGPS (CytoGenetic Pattern Sleuth)

Arka Pattanayak, Zachary Abrams

Informatics Research & Development, Dept. of Biomedical Informatics, The Ohio State University

08/28/2013


MOTIVATION: DATA

• Complex chromosomal aberration data: structure and knowledge.

• Inherently descriptive grammar: International System for Human Cytogenetic Nomenclature (ISCN).

SOLUTION: CytoGPS

• Parse karyotypes using Context-Free Grammar (CFG) rules.

• Extract morphological phrases.

• Map phrases to an abstract biological meta-model.

APPLICATIONS: MULTIPLE

• Discovery of important, obfuscated patterns in cytogenetic data.

• Targeted treatment.

• In-silico drug studies.

CytoGPS: 3-month status report

MOTIVATION

Existing cytogenetic data is:
• Structured.
• ISCN-conformant.
• Multi-dimensional.

Yet it is minimally exploited because of its informational complexity:
• Syntactic variability.
• Information density.
• Human error.


The Rulebook


CytoGPS Platform

• Smart Parser of Karyotypes: EBNF grammar rules, parser generators, and parse tree visitors.

• Biologically Abstracted Meta-Model: mapper DSL.

• Genetic Pattern Matching: ML algorithms.

• Phenotype-phenotype matching.


CytoGPS: CytoGenetic Pattern Sleuth


Parsing Complex Karyotypes using SPoK (Smart Parser of Karyotypes)


SPoK (Smart Parser of Karyotypes)

• Enables in-silico analyses of complex karyotypes.

• Based on well-studied fundamentals in computational parsing (CFG, EBNF).

• Disease-agnostic.

• Multi-disciplinary effort: Biomedical Informatics, Cytogenetics, Hematology.

• ~76% of 3000 publicly available ISCN 2009 karyotypes were successfully parsed with this method.


SPoK: Context-Free Grammar Rules


SPoK: Parser Generation

Deterministic Parser

ANTLR
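SPoK's deterministic parser is generated from EBNF rules with ANTLR. The actual grammar is not reproduced in these slides, so the sketch below is a hand-written stand-in: a recursive-descent parser (one method per rule) for a tiny, hypothetical subset of ISCN covering a chromosome count, sex chromosomes, simple aberrations such as del and t, and der(...) derivatives. The grammar in the comments and the names Parser, Aberration, Derivative and Karyotype are illustrative assumptions, not SPoK's rules or API.

```python
import re
from dataclasses import dataclass
from typing import List, Union

# HYPOTHETICAL EBNF for a tiny ISCN subset (not SPoK's real grammar):
#   karyotype  = count "," sex { "," aberration } ;
#   aberration = derivative | simple ;
#   derivative = "der" "(" chrom ")" simple { simple } ;
#   simple     = type "(" chrom { ";" chrom } ")" "(" band { ";" band } ")" ;
CHROM = r"\d{1,2}|[XY]"
BAND = r"[pq]\d+(?:\.\d+)?"


@dataclass
class Aberration:
    kind: str                # e.g. "del", "t"
    chromosomes: List[str]   # e.g. ["12", "15"]
    bands: List[str]         # e.g. ["p13", "q20"]


@dataclass
class Derivative:
    chromosome: str          # e.g. "4" in der(4)
    parts: List[Aberration]  # aberrations that produced the derivative


@dataclass
class Karyotype:
    count: int
    sex: str
    aberrations: List[Union[Aberration, Derivative]]


class Parser:
    """Recursive-descent parser over a string, tracking a cursor position."""

    def __init__(self, text: str):
        self.s, self.i = text, 0

    def expect(self, literal: str) -> None:
        if not self.s.startswith(literal, self.i):
            raise SyntaxError(f"expected {literal!r} at position {self.i}")
        self.i += len(literal)

    def match(self, pattern: str) -> str:
        m = re.compile(pattern).match(self.s, self.i)
        if m is None:
            raise SyntaxError(f"expected /{pattern}/ at position {self.i}")
        self.i = m.end()
        return m.group()

    def group(self, item: str) -> List[str]:
        # "(" item { ";" item } ")"
        self.expect("(")
        items = [self.match(item)]
        while self.s.startswith(";", self.i):
            self.i += 1
            items.append(self.match(item))
        self.expect(")")
        return items

    def simple(self) -> Aberration:
        kind = self.match(r"[a-z]+")
        return Aberration(kind, self.group(CHROM), self.group(BAND))

    def aberration(self) -> Union[Aberration, Derivative]:
        if self.s.startswith("der(", self.i):
            self.expect("der")
            chrom = self.group(CHROM)[0]
            parts = [self.simple()]
            # A derivative may be built from several aberrations in a row.
            while self.i < len(self.s) and self.s[self.i] != ",":
                parts.append(self.simple())
            return Derivative(chrom, parts)
        return self.simple()

    def karyotype(self) -> Karyotype:
        count = int(self.match(r"\d+"))
        self.expect(",")
        sex = self.match(r"[XY]+")
        aberrations = []
        while self.i < len(self.s):
            self.expect(",")
            aberrations.append(self.aberration())
        return Karyotype(count, sex, aberrations)


tree = Parser("46,XY,del(17)(p12),t(12;15)(p13;q20)").karyotype()
print(tree)
```

Run on the example karyotype 46,XY,del(17)(p12),t(12;15)(p13;q20), this returns a nested structure with one del and one t record. An ANTLR-generated parser would instead produce a full parse tree that visitors walk; the hand-written version simply builds the records directly.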


SPoK: A Parse Tree Showing the Morphological Deconstruction of a Complex Karyotype

46,XY,del(17)(p12),t(12;15)(p13;q20)

Functional Abstraction using LossGainFusion (Biologically Abstracted Meta-Model)


LGF (Biologically Abstracted Meta-Model)

• Abstracts ISCN aberrations observed in chromosomal bands to their biological functional outcomes.

• Uses a custom Domain-Specific Language (DSL).

• Karyotype complexity-agnostic.

• Converts human-readable karyotypes into a machine-readable construct.

• ~90% of parsed karyotypes were successfully mapped using this model.


LGF: Understanding Oncogenic Effects with an Abstracted Meta-Model


46,XY,del(17)(p12),t(12;15)(p13;q20)

del(17)(p12) → del 17p12 → del1:L

t(12;15)(p13;q20) → t 12p13 15q20 → t2:F,F

1. Complete karyotype
2. Chromosomal aberrations
3. ID and chromosomal locations
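The example traces each aberration from ISCN notation, to a phrase of ID plus chromosomal locations, to an LGF annotation. The Python sketch below reproduces those last two steps. It assumes, without confirmation from the slides, that an annotation such as del1:L reads as "aberration type, number of locations, then one Loss/Gain/Fusion code per location". The rule table LGF_RULES (including the dup → G entry) and the function names to_phrase and to_lgf are hypothetical; in CytoGPS the mapping is defined by its DSL. Only one band per chromosome is supported, so interstitial forms with two breakpoints on one chromosome are rejected.

```python
import re

# HYPOTHETICAL rule table standing in for the CytoGPS mapper DSL:
# aberration type -> functional effect at each location (L = loss, G = gain, F = fusion).
LGF_RULES = {"del": "L", "dup": "G", "t": "F"}

# A simple ISCN aberration: type(chromosomes)(bands), e.g. t(12;15)(p13;q20).
ABERRATION = re.compile(r"([a-z]+)\(([^)]*)\)\(([^)]*)\)")
BAND = re.compile(r"[pq]\d+(?:\.\d+)?")


def to_phrase(aberration: str) -> str:
    """'t(12;15)(p13;q20)' -> 't 12p13 15q20' (ID followed by chromosomal locations)."""
    m = ABERRATION.fullmatch(aberration)
    if m is None:
        raise ValueError(f"unsupported aberration: {aberration!r}")
    kind, chroms, bands = m.groups()
    chroms, bands = chroms.split(";"), bands.split(";")
    # Exactly one band per chromosome; band ranges such as q13q33 are rejected.
    if len(chroms) != len(bands) or not all(BAND.fullmatch(b) for b in bands):
        raise ValueError(f"unsupported band layout: {aberration!r}")
    return " ".join([kind] + [c + b for c, b in zip(chroms, bands)])


def to_lgf(phrase: str) -> str:
    """'t 12p13 15q20' -> 't2:F,F': type, location count, one effect per location."""
    kind, *locations = phrase.split()
    effect = LGF_RULES[kind]
    return f"{kind}{len(locations)}:" + ",".join(effect for _ in locations)


for abn in ["del(17)(p12)", "t(12;15)(p13;q20)"]:
    print(abn, "->", to_phrase(abn), "->", to_lgf(to_phrase(abn)))
# del(17)(p12) -> del 17p12 -> del1:L
# t(12;15)(p13;q20) -> t 12p13 15q20 -> t2:F,F
```

Keeping the effects in a lookup table rather than in code mirrors the idea of a separate mapping language: new aberration types can be added by editing rules, not the traversal.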


LGF: Morphological Decomposition of Karyotypes

der(4)t(4;13)(p14;p18)

der(4) t(4;13)(p14;p18)

t 4p14 13p18 der 4

A B C D E

A+C=F

B,D,E add up to 3

F3:B,D,E

We don't need B, so we don't put an annotation at that location. We need to put the biological responses for D and E in their respective locations.

F3:,FL,FG


LGF: Morphological Decomposition of Karyotypes (more complex example)


LGF: Domain-Specific Language for Mapping ISCN Aberrations to the Meta-Model.


Genetic Pattern Matching


GPM: Genetic Pattern Matching using C.A.R.T. (Classification And Regression Tree) Algorithm


Features (band locations) on the X-axis; karyotypes on the Y-axis:

1p36.3 1p36.2 1p36.1 … 1q44 2p25 … yp12
1 1 0 0 0 0 0 0
1 1 0 0 0 0 0 0
1 1 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 1 1 0 0
0 0 0 0 1 1 0 0

[Figure: first cut and second cut of the classification tree]
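CART grows a binary decision tree by repeatedly choosing the feature split that most reduces impurity. The minimal sketch below applies that idea, using Gini impurity, to the binary karyotype-by-band matrix shown on the slide. The slide shows no outcome variable, so the class labels A/B/C are hypothetical, as are max_depth and the helper names; the cuts it finds are not claimed to be the ones drawn on the slide. A real analysis would more likely use a library implementation such as scikit-learn's DecisionTreeClassifier.

```python
from collections import Counter

# Column headers as on the slide; the two "…" columns stand for elided bands.
BANDS = ["1p36.3", "1p36.2", "1p36.1", "…", "1q44", "2p25", "…", "yp12"]

# Karyotype-by-band matrix from the slide (one row per karyotype).
X = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
]
# HYPOTHETICAL class labels (not on the slide), e.g. disease subtypes.
y = ["A", "A", "A", "B", "B", "B", "B", "C", "C"]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Band index whose present/absent split lowers weighted Gini most, or None."""
    n = len(labels)
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        present = [lab for r, lab in zip(rows, labels) if r[j] == 1]
        absent = [lab for r, lab in zip(rows, labels) if r[j] == 0]
        if not present or not absent:
            continue  # this band does not separate these karyotypes
        score = len(present) / n * gini(present) + len(absent) / n * gini(absent)
        if score < best_score:  # strict: ties keep the earlier column
            best, best_score = j, score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a binary tree; internal nodes are dicts, leaves are majority labels."""
    j = best_split(rows, labels) if depth < max_depth else None
    if j is None:
        return Counter(labels).most_common(1)[0][0]
    split = {1: ([], []), 0: ([], [])}
    for r, lab in zip(rows, labels):
        split[r[j]][0].append(r)
        split[r[j]][1].append(lab)
    return {
        "band": j,
        "present": build_tree(*split[1], depth + 1, max_depth),
        "absent": build_tree(*split[0], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow present/absent branches until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["present"] if row[tree["band"]] == 1 else tree["absent"]
    return tree


tree = build_tree(X, y)
print("first cut:", BANDS[tree["band"]])
print("second cut:", BANDS[tree["absent"]["band"]])
```

With these made-up labels the first cut falls on band 1p36.1 (isolating group B) and the second on 1p36.3 (separating A from C). Different labels would produce different cuts.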

Applied Biomedical Informatics using CytoGPS: A Case Study


Case Study: In-silico Drug Studies

Workflow:
• Start from raw ISCN karyotypes.
• Parse them into a machine-readable construct.
• Map each ISCN aberration to a gene set.
• Map the gene sets to known chemical reagent databases.

An end-to-end in-silico solution for drug studies, compared to wet-lab testing:
• Significant cost savings.
• Rapid.
• Flatter learning curve to operate such a system.
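The case-study workflow is a chain of mappings from karyotype to gene set to compounds. The Python sketch below wires those steps together only to show the data flow. Every lookup table in it is a HYPOTHETICAL placeholder with made-up identifiers (GENE_A, COMPOUND_1, ...); real tables would come from a genome annotation and a chemical reagent database, neither of which is named in the slides. Parsing is reduced to a regular expression over simple type(chromosomes)(bands) aberrations.

```python
import re
from typing import Dict, List, Set, Tuple

# HYPOTHETICAL lookup tables with made-up identifiers; real ones would come from
# a genome annotation (band -> genes) and a chemical reagent database (gene -> compounds).
BAND_TO_GENES: Dict[str, List[str]] = {
    "17p12": ["GENE_A"],
    "12p13": ["GENE_B", "GENE_C"],
    "15q20": ["GENE_D"],
}
GENE_TO_COMPOUNDS: Dict[str, List[str]] = {
    "GENE_A": ["COMPOUND_1"],
    "GENE_C": ["COMPOUND_2", "COMPOUND_3"],
}

SIMPLE = re.compile(r"([a-z]+)\(([^)]*)\)\(([^)]*)\)")


def parse(karyotype: str) -> List[Tuple[str, List[str]]]:
    """Raw ISCN string -> machine-readable (type, [locations]) records."""
    records = []
    for part in karyotype.split(",")[2:]:  # skip chromosome count and sex chromosomes
        m = SIMPLE.fullmatch(part)
        if m is None:
            continue  # this sketch ignores anything but simple type(chr)(band) forms
        kind, chroms, bands = m.groups()
        locations = [c + b for c, b in zip(chroms.split(";"), bands.split(";"))]
        records.append((kind, locations))
    return records


def to_gene_set(records: List[Tuple[str, List[str]]]) -> Set[str]:
    """ISCN aberrations -> set of genes at the affected bands."""
    return {g for _, locs in records for loc in locs for g in BAND_TO_GENES.get(loc, [])}


def to_compounds(genes: Set[str]) -> Dict[str, List[str]]:
    """Gene set -> compounds listed for each gene in the (hypothetical) database."""
    return {g: GENE_TO_COMPOUNDS[g] for g in sorted(genes) if g in GENE_TO_COMPOUNDS}


records = parse("46,XY,del(17)(p12),t(12;15)(p13;q20)")
genes = to_gene_set(records)
print(records)
print(genes)
print(to_compounds(genes))
```

Keeping each step as a separate function means any stage (for example, a fuller parser or a finer-grained band-to-gene map) can be swapped without touching the others.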


Case Study: Extracting Genetic Information


Zachary Abrams
Lori Dalton, PhD
Philip R. O. Payne, PhD
Arka Pattanayak
Raj Muthusamy, PhD
Nyla Heerema, PhD
William Kenworthy
Sarah Yousef
Alex Mysiw
Yuxiang Kou
Michael Berkovich
