Gene Extrapolation Models for Toxicogenomic Data
Daniel Gusenleitner
Nacho Caballero
Testing for carcinogenicity is costly
Genes show clustered responses
Expression correlates between platforms
[Figure: expression matrix of 2K arrays × 11000 genes, split into 1K landmark genes and 10K regular genes]
We want to extrapolate the expression of regular genes
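[Figure: the 1K landmark genes measured on 2K arrays form the predictor matrix X]
Expression Gene 1 = $X_1\beta^{(1)}_1 + X_2\beta^{(1)}_2 + \dots + X_{1K}\beta^{(1)}_{1K}$
Expression Gene 2 = $X_1\beta^{(2)}_1 + X_2\beta^{(2)}_2 + \dots + X_{1K}\beta^{(2)}_{1K}$
…
Expression Gene 10K = $X_1\beta^{(10K)}_1 + X_2\beta^{(10K)}_2 + \dots + X_{1K}\beta^{(10K)}_{1K}$
Predicted Expression = $X\beta$
Here $X_j$ is the expression of landmark gene $j$ across the 2K arrays, and $\beta^{(g)}_j$ is its coefficient in the model for regular gene $g$.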
We fit a linear model to each regular gene
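A minimal sketch of fitting one ordinary least-squares model per regular gene, assuming NumPy and a shared landmark matrix X. The function names `fit_per_gene_ols` and `predict` are hypothetical, and the toy data are far smaller than the real 2K arrays × 1K landmarks. Because every regular gene uses the same predictors, passing all regular genes as columns of a 2-D right-hand side to `np.linalg.lstsq` is equivalent to fitting each gene's model separately.

```python
import numpy as np

def fit_per_gene_ols(landmarks, regular):
    """Fit one ordinary least-squares model per regular gene (hypothetical helper).

    landmarks: array (n_arrays, n_landmarks), the predictor matrix X
    regular:   array (n_arrays, n_regular), one column per regular gene
    Returns (intercepts, coefs); coefs[:, g] is the beta vector for gene g.
    """
    n_arrays = landmarks.shape[0]
    # A leading column of ones gives every gene its own intercept.
    design = np.hstack([np.ones((n_arrays, 1)), landmarks])
    # With a 2-D right-hand side, lstsq solves each column independently,
    # i.e. one separate least-squares fit per regular gene.
    beta, *_ = np.linalg.lstsq(design, regular, rcond=None)
    return beta[0], beta[1:]

def predict(landmarks, intercepts, coefs):
    # Predicted expression = X beta (plus intercept) for all regular genes.
    return intercepts + landmarks @ coefs

# Hypothetical toy data: 50 arrays, 5 landmark genes, 3 regular genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_coefs = rng.normal(size=(5, 3))
Y = 1.5 + X @ true_coefs  # noise-free targets, so the fit should be exact
b0, B = fit_per_gene_ols(X, Y)
```

With roughly 2K arrays and 1K landmark predictors the system is overdetermined, so plain least squares is well defined; batching all genes into one call is only one design option.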
Elastic Net
[Figure: mean error vs. number of variables]
glmnet: Lasso and elastic-net regularized generalized linear models
http://cran.r-project.org/web/packages/glmnet/index.html
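To show what the elastic net adds over plain least squares, here is a minimal coordinate-descent sketch of the Gaussian elastic-net objective in the form glmnet documents, written with NumPy. It is not glmnet's implementation (glmnet fits a whole path of λ values with further refinements); it assumes standardized predictors and a centered response so no intercept is fitted, and the names `elastic_net` and `soft_threshold` are hypothetical. The L1 part drives uninformative landmark coefficients to exactly zero, which is why the error curve can be plotted against the number of variables kept.

```python
import numpy as np

def soft_threshold(z, gamma):
    # S(z, gamma) = sign(z) * max(|z| - gamma, 0)
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def elastic_net(X, y, lam, alpha=0.5, max_iter=500, tol=1e-10):
    """Cyclic coordinate descent (hypothetical helper) for
        (1 / 2n) * ||y - X b||^2 + lam * ((1 - alpha) / 2 * ||b||^2 + alpha * ||b||_1)
    Assumes X has standardized columns (mean 0, population variance 1) and a
    centered y, so no intercept is fitted.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()  # current residual y - X b
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            old = b[j]
            # Inner product with the partial residual that leaves out feature j;
            # the "+ old" term relies on X[:, j] @ X[:, j] / n == 1.
            rho = X[:, j] @ r / n + old
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            if new != old:
                r -= X[:, j] * (new - old)
                b[j] = new
                max_step = max(max_step, abs(new - old))
        if max_step < tol:
            break
    return b

# Hypothetical example: only 2 of 10 predictors carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)
true_b = np.zeros(10)
true_b[:2] = [2.0, -1.5]
y = X @ true_b + 0.1 * rng.normal(size=200)
y = y - y.mean()
b = elastic_net(X, y, lam=0.1, alpha=0.9)
```

Here `alpha` mixes the lasso (1) and ridge (0) penalties and `lam` sets their overall strength; the informative coefficients come back slightly shrunk while the rest are exactly zero.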
Neural Networks
[Figure: landmark genes → hidden layer → regular genes]
nnet: Feed-forward Neural Networks and Multinomial Log-Linear Models
http://cran.r-project.org/web/packages/nnet/index.html
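The diagram describes a single-hidden-layer network mapping landmark genes to regular genes. A minimal NumPy sketch of that architecture follows, assuming logistic hidden units and linear outputs (the setting nnet selects with `linout = TRUE`). Training uses plain full-batch gradient descent for brevity, which is not how nnet optimizes; all names and sizes here are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_landmarks, n_hidden, n_regular, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(n_landmarks, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(n_hidden, n_regular)),
        "b2": np.zeros(n_regular),
    }

def forward(params, X):
    # Landmark genes -> logistic hidden layer -> one linear output per regular gene.
    H = sigmoid(X @ params["W1"] + params["b1"])
    return H, H @ params["W2"] + params["b2"]

def train_step(params, X, Y, lr=0.1):
    """One gradient step on (1 / 2n) * sum of squared errors; returns the pre-step loss."""
    n = X.shape[0]
    H, P = forward(params, X)
    loss = 0.5 * np.sum((P - Y) ** 2) / n
    dP = (P - Y) / n                              # dLoss/dP
    dW2 = H.T @ dP
    db2 = dP.sum(axis=0)
    dZ = (dP @ params["W2"].T) * H * (1.0 - H)    # back through the sigmoid
    dW1 = X.T @ dZ
    db1 = dZ.sum(axis=0)
    for name, grad in (("W1", dW1), ("b1", db1), ("W2", dW2), ("b2", db2)):
        params[name] -= lr * grad
    return loss

# Hypothetical toy problem: 4 landmark genes, 8 hidden units, 3 regular genes.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
Y = np.tanh(X @ rng.normal(size=(4, 3)))
params = init_params(4, 8, 3)
losses = [train_step(params, X, Y) for _ in range(500)]
```

Unlike the per-gene linear models, one network predicts all regular genes at once through a shared hidden layer.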
Signal-to-noise
$\mathrm{SNR} = \dfrac{\text{intensity standard deviation}}{\text{extrapolation mean error}}$
[Figure: intensity variation ratio vs. mean fluorescent intensity]
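A minimal per-gene sketch of the signal-to-noise ratio, assuming the signal is the population standard deviation of a gene's measured intensity across arrays and the noise is the mean absolute extrapolation error. The slide names these quantities but not the exact estimators, so both choices, and the helper name `snr`, are assumptions.

```python
from statistics import mean, pstdev

def snr(measured, predicted):
    """Hypothetical per-gene signal-to-noise ratio.

    signal: population standard deviation of measured intensities (assumption)
    noise:  mean absolute difference between measured and extrapolated values (assumption)
    """
    errors = [abs(m - p) for m, p in zip(measured, predicted)]
    return pstdev(measured) / mean(errors)

measured = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
predicted = [2.5, 3.5, 4.5, 3.5, 5.5, 4.5, 7.5, 8.5]
print(snr(measured, predicted))  # 2.0 / 0.5 -> 4.0
```

A gene scores well when its extrapolation error is small relative to how much its expression naturally varies, so a higher SNR means a more useful extrapolation.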
Building 10451 models takes a long time…
Model               Total runtime   Runtime per model   Single-CPU runtime
linear regression   120 x 3 h       2 min               360 h
elastic net         120 x 16 h      11 min              1920 h
neural network      50 x 0.75 h     45 min              7800 h (?)
Signal-to-Noise Comparison

Gene                 E-net    LM      NN
ENSRNOG00000013133   135.58   19.62   13.21
ENSRNOG00000011861   209.82   28.82   12.08
ENSRNOG00000033466   190.81   26.58   11.86
ENSRNOG00000036816   197.82   23.09    9.93
ENSRNOG00000003515   273.62   29.35    8.68
ENSRNOG00000002254    53.43    8.83    7.21
ENSRNOG00000031266    76.19    8.19    6.70
ENSRNOG00000005963   145.06    6.99    6.49
ENSRNOG00000008613    38.86    3.97    6.07
ENSRNOG00000023095    13.57    2.70    5.98
ENSRNOG00000020947    17.27    2.41    5.04
ENSRNOG00000007258   103.77   13.71    4.91
ENSRNOG00000019813    16.53    3.01    4.68
ENSRNOG00000014232    61.69    9.17    4.05
ENSRNOG00000002454    50.71    5.58    3.80
ENSRNOG00000018201     5.04    1.64    3.39
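The values in the comparison table can be summarized directly. This sketch only re-reads the 16 listed genes (no new data): it computes each method's median SNR (about 68.9 for E-net, 8.5 for LM, 6.3 for NN) and checks how often one method beats another. The variable names are hypothetical.

```python
from statistics import median

# Signal-to-noise ratios copied from the comparison table: (E-net, LM, NN).
SNR = {
    "ENSRNOG00000013133": (135.58, 19.62, 13.21),
    "ENSRNOG00000011861": (209.82, 28.82, 12.08),
    "ENSRNOG00000033466": (190.81, 26.58, 11.86),
    "ENSRNOG00000036816": (197.82, 23.09, 9.93),
    "ENSRNOG00000003515": (273.62, 29.35, 8.68),
    "ENSRNOG00000002254": (53.43, 8.83, 7.21),
    "ENSRNOG00000031266": (76.19, 8.19, 6.70),
    "ENSRNOG00000005963": (145.06, 6.99, 6.49),
    "ENSRNOG00000008613": (38.86, 3.97, 6.07),
    "ENSRNOG00000023095": (13.57, 2.70, 5.98),
    "ENSRNOG00000020947": (17.27, 2.41, 5.04),
    "ENSRNOG00000007258": (103.77, 13.71, 4.91),
    "ENSRNOG00000019813": (16.53, 3.01, 4.68),
    "ENSRNOG00000014232": (61.69, 9.17, 4.05),
    "ENSRNOG00000002454": (50.71, 5.58, 3.80),
    "ENSRNOG00000018201": (5.04, 1.64, 3.39),
}

enet, lm, nn = zip(*SNR.values())
for name, values in (("E-net", enet), ("LM", lm), ("NN", nn)):
    print(name, round(median(values), 2))

# Does the elastic net beat linear regression on every listed gene?
enet_beats_lm = all(e > l for e, l, _ in SNR.values())
# On how many listed genes does the neural network beat linear regression?
nn_beats_lm = sum(n > l for _, l, n in SNR.values())
print(enet_beats_lm, nn_beats_lm)
```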
The elastic net outperforms standard linear regression
[Figure: signal-to-noise ratio, Elastic Net vs. Linear Regression]
Additional feature selection
Performance of extrapolation models on carcinogenicity classifiers
Correlation between Luminex and Affymetrix chips