SysTox Comp Challenge ISMB TechTrack Challenge Intro

Page 1: SysTox Comp Challenge ISMB TechTrack Challenge Intro

www.sbvimprover.com

The Systems Toxicology Computational Challenge:

Marker of Exposure Response Identification

Overview

Carine Poussin, PhD

July 11th 2016, ISMB, Orlando

Page 2: SysTox Comp Challenge ISMB TechTrack Challenge Intro

2 © 2016 sbv IMPROVER

sbv IMPROVER in a Nutshell

Page 3: SysTox Comp Challenge ISMB TechTrack Challenge Intro

IMPROVER: Industrial Methodology for Process Verification in Research

Verifies individual methods using double blind performance assessment

Project initiated 6 years ago and funded by Philip Morris International R&D

Aims to verify methods & data in systems biology / toxicology

Nature Biotechnology 2011 Sep 8;29(9):811-5

Bioinformatics 2012 28(9):1193-1201

Page 4: SysTox Comp Challenge ISMB TechTrack Challenge Intro

Previous sbv IMPROVER crowd-sourcing challenges

1. Diagnostic Signature Challenge (2012)

The identification of gene expression signatures and computational methods for diagnostic classification

2. Species Translation Challenge (2013)

The translatability of biological system perturbations across species

3. Network Verification Challenge (2014-2015)

The verification and enhancement of biological causal networks representative of various biological processes

Challenge outcomes, including best-performing methods and lessons learned, were:

• Presented in symposium

• Published in peer-reviewed journals

https://sbvimprover.com/sbv-improver/publications/

Page 5: SysTox Comp Challenge ISMB TechTrack Challenge Intro

2016 - The Systems Toxicology Computational Challenge

https://sbvimprover.com/challenge-4

Page 6: SysTox Comp Challenge ISMB TechTrack Challenge Intro

Overview

To develop a classification approach that identifies a blood gene signature capable of assigning subjects to the correct exposure group

Blood samples → microarray-based gene expression profiles

Exposure groups:

• Smokers (S)

• Non-current smokers (NCS): Former smokers (FS) and Never smokers (NS)

Page 7: SysTox Comp Challenge ISMB TechTrack Challenge Intro

The Systems Toxicology Computational Challenge: Marker of Exposure Response Identification

Goal: to develop inductive blood gene signature-based classification models to predict smoking exposure or cessation status

Gene signatures must be robust and sparse, with no more than 40 genes

Sub-challenge 1 (SC1): Human blood signature as exposure response marker (human signatures)

Sub-challenge 2 (SC2): Species translatable blood gene signature as exposure response marker (species-independent signatures)
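
The 40-gene cap on a signature can be illustrated with a minimal sketch. It assumes a two-class problem and ranks genes by a simple separation score (difference of class means divided by the summed standard deviations). The function name `select_signature`, the data layout, and the scoring rule are hypothetical choices for illustration, not the method prescribed by the challenge.

```python
from statistics import mean, pstdev

MAX_SIGNATURE_SIZE = 40  # cap on signature size stated in the challenge


def select_signature(expr, labels, max_genes=MAX_SIGNATURE_SIZE):
    """Rank genes by a two-class separation score and keep the top ones.

    expr   : dict mapping gene name -> list of expression values (one per sample)
    labels : list of class labels (e.g. "S" / "NCS"), aligned with the samples
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")

    scores = {}
    for gene, values in expr.items():
        group_a = [v for v, lab in zip(values, labels) if lab == classes[0]]
        group_b = [v for v, lab in zip(values, labels) if lab == classes[1]]
        spread = pstdev(group_a) + pstdev(group_b)
        if spread == 0:
            spread = 1e-12  # avoid division by zero for constant genes
        scores[gene] = abs(mean(group_a) - mean(group_b)) / spread

    # Highest score first; truncate to the allowed signature size.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:max_genes]


# Toy usage with made-up values: g1 separates the classes, g2 does not.
toy_expr = {"g1": [1.0, 1.1, 5.0, 5.2], "g2": [3.0, 3.1, 3.0, 3.2]}
toy_labels = ["S", "S", "NCS", "NCS"]
print(select_signature(toy_expr, toy_labels, max_genes=1))
```

Any ranking criterion could replace the score used here; the only constraint taken from the challenge is the upper bound on the number of genes.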

Page 8: SysTox Comp Challenge ISMB TechTrack Challenge Intro

Classification models

Inductive classification model as opposed to transductive

Apply a trained classification model to predict the class label of any new individual sample (no retraining and no use of the training data)

[Figure: induction, deduction and transduction. Labels: Population; Training samples; Classification model F(x, y); New (y, x); Class label ?]
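
The inductive setting can be sketched as follows: the model is fitted once, and each new sample is then classified from the stored parameters alone, without access to the training data. The nearest-centroid classifier and the class name `NearestCentroid` are assumptions chosen for brevity, not a method named in the challenge.

```python
from math import dist


class NearestCentroid:
    """Minimal inductive classifier: keeps only one centroid per class."""

    def fit(self, samples, labels):
        sums, counts = {}, {}
        for x, y in zip(samples, labels):
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[y] = counts.get(y, 0) + 1
        # After fitting, the training samples themselves are no longer needed.
        self.centroids = {
            y: [s / counts[y] for s in acc] for y, acc in sums.items()
        }
        return self

    def predict(self, x):
        # Classify one new sample using only the stored centroids.
        return min(self.centroids, key=lambda y: dist(x, self.centroids[y]))


# Toy usage with made-up two-gene profiles.
model = NearestCentroid().fit(
    [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]],
    ["NCS", "NCS", "S", "S"],
)
print(model.predict([5.5, 4.0]))
```

A transductive approach would instead need the unlabeled test samples (or the training set) at prediction time, which is what the challenge rules out.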

Page 9: SysTox Comp Challenge ISMB TechTrack Challenge Intro

Training and Test data

1. Randomized samples

2. Test data split into 2 subsets (Subset 1, Subset 2), released sequentially

Common set of 64 samples

Timeline:

• 20 Nov 2015: data released

• 7 Apr 2016: submission closed

• 11 Apr 2016: data released

• 29 Apr 2016: submission closed

TRAIN: GEX data + class labels → train inductive model (use of additional public and private datasets allowed)

TEST: GEX data → apply model → predict class label; class label = gold standard

Page 10: SysTox Comp Challenge ISMB TechTrack Challenge Intro

Study datasets

Gene expression data generated from human and mouse blood samples

Participants are free to use two separate 2-class models, one for each step, or a single 3-class prediction model

https://sbvimprover.com/challenge-4/the-computational-challenge/hbs

Step-wise class predictions

Task 1: signature-based classification model 1 (S vs NCS)

Task 2: signature-based classification model 2 (FS vs NS)

S/3R4F: Smokers / 3R4F (exposure to smoke from a reference cigarette)

FS/Cess: Former smokers / Cessation

NS/Sham: Never smokers / Sham

NCS: Non-current smokers
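
The step-wise prediction with two 2-class models can be sketched as below. The sketch assumes that model 2 is consulted only for samples that model 1 assigns to NCS, and that a confidence above 0.5 decides each step. Both the 0.5 threshold and the function name `stepwise_label` are illustrative assumptions, not rules stated in the challenge.

```python
def stepwise_label(p_smoker, p_former):
    """Combine two 2-class confidences into one of three labels.

    p_smoker : confidence from model 1 that the sample is S (vs NCS)
    p_former : confidence from model 2 that the sample is FS (vs NS)
    """
    # Step 1: smoker vs non-current smoker.
    if p_smoker > 0.5:
        return "S"
    # Step 2: within non-current smokers, former vs never smoker.
    return "FS" if p_former > 0.5 else "NS"


# Toy usage with made-up confidence values.
for p1, p2 in [(0.9, 0.2), (0.1, 0.8), (0.3, 0.4)]:
    print(p1, p2, stepwise_label(p1, p2))
```

With a single 3-class model, the same labels would be obtained directly, for example by taking the class with the highest confidence.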

Number of samples per class:

Set     S     FS    NS    3R4F   Cess   Sham
TRAIN   109   57    58    40     27     45
TEST    27    26    28    12     8      13

[Figure: matrix of "x" marks indicating which datasets (dset1, dset2, dset5, dset4) are used in SC1 and SC2]

Page 11: SysTox Comp Challenge ISMB TechTrack Challenge Intro

Submissions and Timelines

Submissions for each sub-challenge:

• 4 class label prediction files

• 2 gene lists

• 1 write-up

• Px= confidence value that a sample belongs to a class

• P1+P2=1 ; P1≠P2

Sample ID   Smoker   Non-current smoker
Sample 1    P1       P2
Sample 2    0.95     0.05
Sample 3    0.94     0.06
…
Sample M    0.85     0.15

Sample ID   Former smoker   Never smoker
Sample 1    P1              P2
Sample 2    0.95            0.05
Sample 3    0.94            0.06
…
Sample M    0.85            0.15
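
The constraints on the confidence values (P1 + P2 = 1 and P1 ≠ P2) can be checked before submission with a small helper. This is a sketch only: the function name `check_row`, the tolerance, and the extra range check on [0, 1] are assumptions, and the official file format is not reproduced here.

```python
from math import isclose


def check_row(p1, p2, tol=1e-9):
    """Return a list of problems found in one prediction row (empty if valid)."""
    problems = []
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0):
        problems.append("confidence outside [0, 1]")
    if not isclose(p1 + p2, 1.0, abs_tol=tol):
        problems.append("P1 + P2 != 1")
    if p1 == p2:
        problems.append("P1 == P2 (tie)")
    return problems


# Toy usage with made-up rows: one valid, one not summing to 1, one tie.
for row in [(0.85, 0.15), (0.9, 0.2), (0.5, 0.5)]:
    print(row, check_row(*row))
```

The P1 ≠ P2 rule guarantees that every sample receives an unambiguous class call, which is why ties are flagged separately from the sum check.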