Click here to load reader
Upload
nguyenngoc
View
213
Download
1
Embed Size (px)
Citation preview
www.sbvimprover.com
The Systems Toxicology Computational Challenge:
Marker of Exposure Response Identification
Overview
Carine Poussin, PhDJuly 11th 2016, ISMB, Orlando
2 © 2016 sbv IMPROVER
sbv IMPROVER in a Nutshell
3 © 2016 sbv IMPROVER
IMPROVER: Industrial Methodology for Process Verification in Research
Verifies individual methods using double blind performance assessment
Project initiated 6 years ago and funded by Philip Morris International R&D
Aims to verify methods & data in systems biology / toxicology
Nature Biotechnology 2011 Sep 8;29(9):811-5
Bioinformatics 2012 28(9):1193-1201
4 © 2016 sbv IMPROVER
Previous sbv IMPROVER crowd-sourcing challenges
1. Diagnostic Signature Challenge (2012)
The identification of gene expression signatures and computational methods for diagnostic classification
2. Species Translation Challenge (2013)
The translatability of biological system perturbations across species
3. Network Verification Challenge (2014-2015)
The verification and enhancement of biological causal networks representative of various biological processes
Challenge outcome including best performing
methods and lessons learned were:
• Presented in symposium
• Published in peer-reviewed journals
https://sbvimprover.com/sbv-improver/publications/
5 © 2016 sbv IMPROVER
2016 - The Systems Toxicology Computational Challenge
https://sbvimprover.com/challenge-4
6 © 2016 sbv IMPROVER
Overview
To develop a classification approach that identifies a blood gene signature capable of associating subjects to the correct exposure group
Blood samples
Microarray-based geneexpression profiles Smokers (S)
Non-currentsmokers (NCS)
FormerSmokers
(FS)
NeverSmokers
(NS)
7 © 2016 sbv IMPROVER
The Systems Toxicology Computational ChallengeMarker of Exposure Response Identification
Goal: to develop inductive blood gene signature-based classification models to predict smoking exposure or cessation status
Robust and sparse gene signatures do not exceed 40 genes
Sub-challenge 1 (SC1)Human blood signature as exposure response marker
Human signatures Species-independent signatures
Sub-challenge 2 (SC2)Species translatable blood gene signature as exposure response marker
8 © 2016 sbv IMPROVER
Classification models
Inductive classification model as opposed to transductive
Apply a trained classification model to predict class label of any new individual sample (no retraining or use training data)
Trainingsamples
?
Classification model F(x, y)
Induction
Class label ?
New (y, x)
Deduction
Transduction
Population
9 © 2016 sbv IMPROVER
Training and Test data
1. Randomized samples2. Split test data in 2 subsets sequentially
released
Subset 1 Subset 2
Common set of 64 samples
Data released20 Nov 2015
Submission closed7 Apr 2016 Data released
11 Apr 2016
Submission closed29 Apr 2016
TRAIN
GEX DATA+
CLASS LABELS
Use of additionalpublic and privatedatasets allowed
TRAIN INDUCTIVE MODEL
PREDICT CLASS LABEL
APPLY
TEST
GEX DATA CLASS LABEL
GOLD STANDARD=
10 © 2016 sbv IMPROVER
Study datasetsGene expression data generated from human and mouse blood samples
Freedom to use two separate models for 2-class prediction for each step, or directly a 3-class prediction model
https://sbvimprover.com/challenge-4/the-computational-challenge/hbs
FS
NS Task 2:signature‐based
classification model 2(FS vs NS)
Step-wise class predictions
S
NCS
Task 1:signature‐basedclassification model 1(S vs NCS)
S/3R4F: Smokers / 3R4F (exposure to smoke from a reference cigarette)
FS/Cess: Former smokers / CessationNS/Sham: Never smokers / ShamNCS: Non-current smoker
TRAIN
S(109)
FS(57)
NS(58)
3R4F(40)
Cess(27)
Sham(45)
TEST
S(27)
FS(26)
NS(28)
3R4F(12)
Cess(8)
Sham(13)
SC1
SC2
x x
x x x x
dset1 dset2 dset5dset4
11 © 2016 sbv IMPROVER
Submissions and Timelines
Submissions for each sub-challenge• 4 class label prediction files• 2 gene lists• 1 write up
• Px= confidence value that a sample belongs to a class
• P1+P2=1 ; P1≠P2
Sample ID Smoker Non-current smoker
Sample 1 P1 P2
Sample 2 0.95 0.05
Sample 3 0.94 0.04
….
Sample M 0.85 0.15
Sample ID Former Smoker
Non-current smoker
Sample 1 P1 P2
Sample 2 0.95 0.05
Sample 3 0.94 0.04
….
Sample M 0.85 0.15