7/28/2019 An Empirical Framework for Genome-wide Single Nucleotide Polymorphism-based Predictive Modeling
1/16
An empirical framework for genome-wide single nucleotide polymorphism-based predictive modeling
Charalampos S. Floudas, MD, PhD, MS
Jeya Balaji Balasubramanian, MS
Marjorie Romkes, PhD
Vanathi Gopalakrishnan, PhD
Department of Biomedical Informatics
Department of Biomedical Informatics, PRoBE Lab
2 of 16
TBI 2013
A workflow for prediction in cancer
Predicting risk of early recurrence in early-stage non-small cell lung cancer (NSCLC)
SNPR workflow
Genome-wide Single Nucleotide
Polymorphisms (SNP)
Bayesian rule learning (BRL) system
Translational Bioinformatics
Includes prediction of clinical outcomes from available genomic data
Genomic data:
High-dimensional
Many modalities
Different aspects of disease
Translational Bioinformatics
Multiple clinical outcomes
Combinations of datasets and outcomes
Collaborative effort
Many tools available
Workflows
Flexibility of design
Reproducibility of research
Core elements
Subjects: 86 early-stage NSCLC patients
University of Pittsburgh Cancer Institute
Lung SPORE cohort
Importance
Predictors dataset: Affymetrix SNP Array 6.0, 1 million SNPs
Outcome: categorical disease-free survival (DFS), good vs. poor, cutoff at 1952 days
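The slide states only that DFS was dichotomized into good and poor at 1952 days. A minimal Python sketch of such a labeling rule follows; the event flag, the `>=` boundary, and leaving patients censored before the cutoff unlabeled are assumptions for illustration, not the authors' stated procedure.

```python
from typing import Optional

DFS_CUTOFF_DAYS = 1952  # cutoff reported on the slide


def dfs_category(dfs_days: float, event: bool,
                 cutoff: int = DFS_CUTOFF_DAYS) -> Optional[str]:
    """Label a patient 'good' or 'poor' DFS; None if undetermined.

    Assumed (hypothetical) inputs: days of disease-free follow-up and
    whether recurrence or death ended that follow-up.
    """
    if dfs_days >= cutoff:
        return "good"   # disease-free for at least the cutoff period
    if event:
        return "poor"   # recurrence or death before the cutoff
    return None         # censored before the cutoff: label unknown


if __name__ == "__main__":
    for days, event in [(2100, False), (800, True), (900, False)]:
        print(days, event, dfs_category(days, event))
```

Under these assumptions, censored-early patients would simply be excluded before modeling.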
Workflow tools
Affymetrix Genotyping Console
Quality control (QC), genotype calling
After QC: 67 samples (50 poor DFS, 17 good)
PLINK
QC of genotypes (MAF, etc.), feature selection (χ²) for BRL, and export of features
BRL system
Predictive rules (sets of SNPs) and metrics
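The PLINK steps above (MAF-based QC, then χ² ranking to pick features for BRL) can be sketched in plain Python to show what is computed. This is an illustration, not PLINK's code: genotypes are assumed coded as minor-allele counts 0/1/2 with None for missing, labels as 1 = poor DFS and 0 = good, and the 0.05 MAF threshold and SNP names are hypothetical. In PLINK itself these steps correspond to options such as `--maf` and `--assoc`.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Genotype = Optional[int]  # minor-allele count 0/1/2; None = missing call


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1.0 - p)


def allelic_chi_square(genotypes: Sequence[Genotype],
                       labels: Sequence[int]) -> float:
    """1-df chi-square on the 2x2 table of allele counts by outcome."""
    a = b = c = d = 0  # a, b: poor (1) counted/other allele; c, d: good (0)
    for g, y in zip(genotypes, labels):
        if g is None:
            continue
        if y == 1:
            a += g
            b += 2 - g
        else:
            c += g
            d += 2 - g
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def select_snps(data: Dict[str, List[Genotype]], labels: List[int],
                maf_min: float = 0.05, top_k: int = 100
                ) -> List[Tuple[str, float]]:
    """Drop low-MAF SNPs, then keep the top_k by chi-square statistic."""
    scored = [(snp, allelic_chi_square(genos, labels))
              for snp, genos in data.items()
              if minor_allele_frequency(genos) >= maf_min]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    toy = {"rs1": [2, 2, 2, 0, 0, 0],   # hypothetical SNP IDs
           "rs2": [1, 1, 1, 1, 1, 1],
           "rs3": [0, 0, 0, 0, 0, 0]}
    print(select_snps(toy, [1, 1, 1, 0, 0, 0]))
```

Setting `top_k=100` mirrors the 100 SNPs reported in the feature-selection results.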
BRL system elements
Rule learner (RL)
Bayesian Rule Learner (BRL)
Bayesian scoring induces Bayesian networks
Rule models
Global (GBRL): full
Local (LBRL): decision tree representation
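To make "Bayesian scoring" concrete, the sketch below scores a candidate set of SNPs as parents of the outcome with the K2 (Cooper-Herskovits) marginal likelihood and reads off one rule per observed SNP-value combination, as in a full-table (global) rule model. K2 is used here as one well-known Bayesian score; the score, priors, and search that BRL actually uses are not given on the slide.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple

Config = Tuple[int, ...]  # values of the chosen SNPs for one sample


def _counts(rows: Sequence[Config], labels: Sequence[int]) -> Dict[Config, Counter]:
    """Outcome counts N_jk for each observed parent configuration j."""
    counts: Dict[Config, Counter] = defaultdict(Counter)
    for config, y in zip(rows, labels):
        counts[config][y] += 1
    return counts


def k2_log_score(rows, labels, classes=(0, 1)) -> float:
    """Log K2 marginal likelihood of the outcome given the SNP parent set.

    Per observed configuration j, with r classes:
    log[(r-1)! / (N_j + r - 1)!] + sum_k log(N_jk!).
    Unobserved configurations contribute 0.
    """
    r = len(classes)
    score = 0.0
    for class_counts in _counts(rows, labels).values():
        n_j = sum(class_counts.values())
        score += math.lgamma(r) - math.lgamma(n_j + r)
        score += sum(math.lgamma(class_counts[k] + 1) for k in classes)
    return score


def rules_from_parents(rows, labels, classes=(0, 1)):
    """One rule per observed configuration: IF SNPs == config THEN class.

    The attached probability is the posterior mean (N_jk + 1) / (N_j + r)
    under the uniform prior that the K2 score assumes.
    """
    r = len(classes)
    rules = {}
    for config, class_counts in _counts(rows, labels).items():
        n_j = sum(class_counts.values())
        best = max(classes, key=lambda k: class_counts[k])
        rules[config] = (best, (class_counts[best] + 1) / (n_j + r))
    return rules


if __name__ == "__main__":
    rows = [(0,), (0,), (1,), (1,)]   # one hypothetical SNP, coded 0/1
    labels = [1, 1, 0, 0]             # 1 = poor DFS, 0 = good
    print(k2_log_score(rows, labels), rules_from_parents(rows, labels))
```

A search procedure would compare such scores across candidate SNP sets and keep the highest-scoring one.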
Workflow tools
SQLite
Fine selection of datasets and clinical parameters
Unix command line tools
Operations on datasets (Affymetrix genotypes to PLINK format, PLINK-selected features to BRL input)
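A rough Python equivalent of these two steps is sketched below: a query against a hypothetical SQLite table of samples, followed by writing one PLINK PED line per sample. The table and column names, the "AA"/"AB"/"BB" call strings, the per-SNP allele lookup, and coding poor DFS as the affected phenotype (2) are assumptions; the PED column layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype, then two alleles per SNP with 0 for missing) follows PLINK's documented format.

```python
import sqlite3
from typing import List, Sequence, Tuple


def select_samples(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """QC-passed samples with a usable DFS label (hypothetical schema)."""
    cur = conn.execute(
        "SELECT sample_id, dfs_class FROM patients "
        "WHERE qc_pass = 1 AND dfs_class IN ('good', 'poor') "
        "ORDER BY sample_id"
    )
    return cur.fetchall()


def to_ped_line(sample_id: str, dfs_class: str, calls: Sequence[str],
                alleles: Sequence[Tuple[str, str]]) -> str:
    """One PLINK PED line: FID IID PAT MAT SEX PHENO, then allele pairs.

    Assumptions: calls are 'AA'/'AB'/'BB' (anything else is missing),
    alleles[i] gives the (A, B) bases of SNP i, sex is unknown (0),
    and poor DFS is coded as affected (2), good as unaffected (1).
    """
    pheno = {"poor": "2", "good": "1"}.get(dfs_class, "-9")
    fields = [sample_id, sample_id, "0", "0", "0", pheno]
    for call, (a, b) in zip(calls, alleles):
        pair = {"AA": (a, a), "AB": (a, b), "BB": (b, b)}.get(call, ("0", "0"))
        fields.extend(pair)
    return " ".join(fields)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients "
                 "(sample_id TEXT, qc_pass INTEGER, dfs_class TEXT)")
    conn.executemany("INSERT INTO patients VALUES (?, ?, ?)",
                     [("S1", 1, "poor"), ("S2", 0, "good"), ("S3", 1, "good")])
    for sid, label in select_samples(conn):
        print(to_ped_line(sid, label, ["AA", "AB", "NoCall"],
                          [("A", "G"), ("C", "T"), ("G", "T")]))
```

Keeping the selection in SQL means the same sample list can be re-derived when clinical parameters change.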
Results - feature selection
100 SNPs from PLINK χ² test
44 intragenic SNPs -> 33 genes
Functional analysis (Ingenuity IPA)
Most significantly associated disease: cancer (9 of 33 genes)
Most significantly associated biological function: cell-to-cell signaling and interaction (8 of 33 genes)
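IPA's functional analysis is documented as ranking diseases and functions by a right-tailed Fisher's exact test p-value. The sketch below computes that tail probability from the hypergeometric distribution; apart from the 9 of 33 genes on this slide, the counts passed in (annotated genes and background size) are hypothetical placeholders, so no output value is shown.

```python
from math import comb


def right_tailed_fisher(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n selected).

    k: selected genes carrying the annotation (e.g. 9 of 33 for cancer).
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # HYPOTHETICAL background: N and K are placeholders, not IPA's values.
    N_BACKGROUND, K_ANNOTATED = 20000, 2000
    print(right_tailed_fisher(k=9, n=33, K=K_ANNOTATED, N=N_BACKGROUND))
```

Exact integer arithmetic via `math.comb` avoids underflow in the intermediate terms; only the final ratio is converted to a float.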
Results - feature selection
CHODL (chondrolectin) gene
associated with shorter survival in NSCLC
CDH13 (cadherin 13) gene
hypermethylated in NSCLC
CHST11 (carbohydrate (chondroitin 4) sulfotransferase 11) gene
associated with lung colonization in breast cancer
Results - BRL prediction
5-fold cross-validation
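The slide reports 5-fold cross-validation but not how folds were formed. With 50 poor and 17 good samples, stratifying by class keeps roughly the same ratio in every fold; the sketch below is a minimal stratified splitter in plain Python, with the shuffling seed as an assumption. In a pipeline built this way, any supervised step such as the χ² SNP selection would be refit on each training split.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_k_fold(labels: Sequence[int], k: int = 5, seed: int = 0
                      ) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs, one per fold."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)  # deal samples round-robin
        offset += len(idx)  # continue the rotation so fold sizes stay balanced
    everything = set(range(len(labels)))
    return [(sorted(everything - set(f)), sorted(f)) for f in folds]


if __name__ == "__main__":
    labels = [1] * 50 + [0] * 17  # 1 = poor DFS, 0 = good (counts from the slides)
    for train, test in stratified_k_fold(labels):
        print(len(train), len(test), sum(labels[i] for i in test))
```

Each test fold is used once for evaluation while the model is trained on the remaining four.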
Conclusions
Our empirical workflow (SNPR)
Efficiently overcomes challenges of prediction using high-dimensional datasets
Achieves biological relevance and good predictive performance
Can be generalized and adapted
Other experimental platforms, data mining tasks
Limitations
Small sample size
No independent testing cohort
Categorization of survival instead of time-to-event analysis
Acknowledgments
Cancer Biomarkers Facility of the University of Pittsburgh Cancer Institute, award P30CA047904
Grant support:
National Cancer Institute Award Number P50CA090440
National Library of Medicine Award Number R01LM010950
National Institute of General Medical Sciences Award Number R01GM100387
Thank you