View
617
Download
1
Category
Tags:
Preview:
DESCRIPTION
Nimbus is a ruby gem implementing random forest algorithm for genomic selection contexts. http://www.nimbusgem.org Presentation from the EAAP 2011 conference in Stavanger, Norway.
Citation preview
An opensource library to implement random forests in genomic contexts
Oscar González-Recio Juanjo Bazán Selma Forni
The Problem
The ProblemMassive amount of information from high troughput genotyping platforms.
The ProblemMassive amount of information from high troughput genotyping platforms.
Need to extract knowledge from large, noisy, redundant, missing and fuzzy data.
The ProblemMassive amount of information from high troughput genotyping platforms.
Massive amount of information consumes the attention of its recipients. We need to allocate that attention efciently.
Need to extract knowledge from large, noisy, redundant, missing and fuzzy data.
Why Random Forest?
Why Random Forest?Using Machine Learning techniques we can extract hidden relationships that exist in these huge volumes of data and do not follow a particular parametric design.
Why Random Forest?Using Machine Learning techniques we can extract hidden relationships that exist in these huge volumes of data and do not follow a particular parametric design.
Random Forest have desirable statistical properties.
Why Random Forest?Using Machine Learning techniques we can extract hidden relationships that exist in these huge volumes of data and do not follow a particular parametric design.
Random Forest have desirable statistical properties.
Random Forest scales well computationally.
Using Machine Learning techniques we can extract hidden relationships that exist in these huge volumes of data and do not follow a particular parametric design.
Random Forest have desirable statistical properties.
Random Forest scales well computationally.
Random Forest performs extremely well in a variety of possible complex domains (Breiman, 2001; Gonzalez-Recio & Forni, 2011).
Why Random Forest?
The Algorithm
Ensemble methods:
- Combination of diferent methods (usually simple models).- They have very good predictive ability because use additivity of models performances.
Based on Classifcation And Regression Trees (CART).
Use Randomization and Bagging.
Performs Feature Subset Selection.
Convenient for classifcation problems.
Fast computation.
Simple interpretation of results for human minds.
Previous work in genome-wide prediction (Gonzalez-Recio and Forni, 2011)
The Algorithm
The AlgorithmPerform bootstrap on data: Ψ* = (y, X)
Build a CART ( fi (y, X) = ht (x) ) using only mtry proportion of SNPs in each node.
Repeat M times to reduce residuals by a factor of M.
Average estimates c0 = μ ; ci = 1/M
Let Ψ = (y, X) be a set of data, with
y = vector of phenotypes (response variables)
X = (x1, x2) = matrix of features
y1 x11 x12… … ...yi xi1 xi2… … ...yn xn1 xn2
The AlgorithmClassifcation and regression trees:
The Algorithm
Nimbus library
Nimbus library
Written in Ruby
www.ruby-lang.org
Open source programming language
Syntax focused on simplicity
Natural to read and easy to write
Nimbus library
How to install:
> gem install nimbus
Prerequisites:
Ruby and Rubygems (default library manager) installed in the system
Nimbus library
How to run:
> nimbus
Confguration:
Via confg.yml fle
Nimbus libraryconfg.yml fle:
Nimbus libraryInput fles:
training
testing
Nimbus libraryFeatures, use cases:
Training of a prediction forestTraining of a prediction forest
Genomic Prediction of a testing sample
Using a training set of individuals
Nimbus creates a reutilizable forest
Nimbus calculates SNP importancesGeneralization error are computed for every tree in the forest
Using a custom/reused forest, specifi ed via confi g.yml
Using a new trained forest
OutputsRandom Forest fle
In standard YAML format
OutputsPredictions for the training sample
OutputsPredictions for the testing sample
OutputsSNP importances
More info:Nimbus website:
Source code:
Report bugs/request features:
www.nimbusgem.org
www.github.com/xuanxu/nimbus/issues
www.github.com/xuanxu/nimbus
Thank you!
Questions?
Recommended