Gist 2.3
John H. Phan
MIBLab Summer Workshop
June 28th, 2006
Overview
• Gist 2.3 Tools
– Support Vector Machine (SVM) classification
– Kernel Principal Component Analysis (KPCA)
Gist 2.3 Overview
• Gist is a set of command line programs written in C
– Primary programs
• SVM and KPCA
– Auxiliary programs
• Ranking and feature selection
– Web interface for the SVM component
Support Vector Machines
• Supervised classification method
• Maximal margin hyperplane
http://www.dtreg.com/svm.htm
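For reference, the maximal margin hyperplane for linearly separable training data can be written as the optimization problem below. This is the standard textbook hard-margin form, not necessarily the exact objective Gist optimizes; the symbols w, b, x_i and y_i are notation introduced here, with labels y_i in {-1, +1}.

```latex
\begin{aligned}
\min_{\mathbf{w},\,b}\quad & \tfrac{1}{2}\,\lVert \mathbf{w} \rVert^{2} \\
\text{subject to}\quad & y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \qquad i = 1,\dots,n
\end{aligned}
```

Under this formulation the margin width is 2 / ||w||, and a new point x is classified by the sign of the discriminant f(x) = w · x + b.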
Primary Gist Programs
• gist-train-svm – train support vector machine
• gist-classify – classify points with a trained support vector machine
• gist-fast-classify – linear optimized classification
• gist-kpca – kernel principal component analysis
• gist-project – project points onto KPCA components
Auxiliary Gist Programs
• gist-fselect – linear feature selection
• gist-matrix – basic matrix manipulations
• gist-score-svm – score the performance of gist-train-svm and gist-classify
• gist-rfe – recursive feature elimination
• gist-sigmoid – classification probabilities
• gist2html – convert output to HTML
• gist-kernel – create a square kernel matrix
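To make "square kernel matrix" concrete, here is a minimal Python sketch that evaluates a kernel on every pair of points. The function names, the parameters (degree, coef0, width) and the exact kernel formulas are generic textbook choices assumed for illustration; they are not taken from gist-kernel itself.

```python
import math

def linear_kernel(x, y):
    # Plain dot product.
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    # (x . y + coef0) ** degree; parameter names are illustrative.
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, width=1.0):
    # Gaussian radial basis function of the squared distance.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def kernel_matrix(points, kernel):
    # Entry [i][j] is kernel(points[i], points[j]); the result is n x n.
    return [[kernel(p, q) for q in points] for p in points]

points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = kernel_matrix(points, linear_kernel)
# K == [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 2.0]]
```

The matrix is symmetric, and with the radial basis kernel every diagonal entry is 1 because each point has zero distance to itself.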
gist-train-svm
• Train a support vector machine
– Input file is tab delimited but transposed
– Output file contains 5 columns
• Label, binary classification, SVM weights, predicted classification, discriminant value
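If a data file is in the opposite orientation from the one the program expects, it can be transposed before training. The sketch below assumes that "transposed" simply means rows and columns are swapped; which orientation Gist actually expects should be checked against its documentation. The helper name transpose_tsv and the sample labels are hypothetical.

```python
import csv
import io

def transpose_tsv(text):
    # Read tab-delimited text and swap its rows and columns.
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(zip(*rows))
    return out.getvalue()

# Hypothetical input: one row per gene, one column per sample.
genes_by_samples = "gene\ts1\ts2\ng1\t1\t2\ng2\t3\t4\n"
samples_by_genes = transpose_tsv(genes_by_samples)
# samples_by_genes == "gene\tg1\tg2\ns1\t1\t3\ns2\t2\t4\n"
```

Note that zip(*rows) truncates to the shortest row, so a ragged file should be checked or padded first.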
gist-fselect – Feature Selection
• Fisher Criterion Score
• t-test
• Welch t-test
• Mann-Whitney
• SAM (significance analysis of microarrays)
• Threshold number of mis-classifications
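As an illustration of the first metric, the Fisher criterion score for one feature is commonly defined as the squared difference of the class means divided by the sum of the class variances. The Python sketch below uses that common definition with sample variances; whether gist-fselect uses exactly this form is an assumption, and the gene names and values are made up.

```python
from statistics import mean, variance

def fisher_score(pos, neg):
    # (mean_pos - mean_neg)^2 / (var_pos + var_neg), using sample variances.
    # Raises ZeroDivisionError if both classes are constant for this feature.
    return (mean(pos) - mean(neg)) ** 2 / (variance(pos) + variance(neg))

# Hypothetical expression values: feature -> (positive class, negative class).
features = {
    "geneA": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),
    "geneB": ([2.0, 3.0, 4.0], [1.0, 3.0, 5.0]),
}

scores = {name: fisher_score(pos, neg) for name, (pos, neg) in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
# scores["geneA"] is 8.0, scores["geneB"] is 0.0, ranking == ["geneA", "geneB"]
```

A higher score means the two classes are better separated on that feature relative to its spread within each class.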
gist-score-svm
• Compute false and true positives on training and test sets
• Compute area under the ROC curves for training and test sets
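Both quantities can be computed directly from the true labels and the SVM discriminant values. The following Python sketch is a generic illustration, not the gist-score-svm implementation: it assumes labels of +1 and -1, a decision threshold of 0, and the rank-based definition of the area under the ROC curve.

```python
def confusion_counts(labels, discriminants, threshold=0.0):
    # Count true/false positives and negatives at the given threshold.
    tp = fp = tn = fn = 0
    for y, f in zip(labels, discriminants):
        predicted = 1 if f > threshold else -1
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == -1:
            fp += 1
        elif predicted == -1 and y == -1:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def roc_auc(labels, discriminants):
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    pos = [f for y, f in zip(labels, discriminants) if y == 1]
    neg = [f for y, f in zip(labels, discriminants) if y == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, -1, -1, 1, -1]
discriminants = [2.1, 0.4, -1.3, 0.7, 1.5, -0.2]
counts = confusion_counts(labels, discriminants)  # (3, 1, 2, 0)
auc = roc_auc(labels, discriminants)              # 8/9, about 0.889
```

Running the same two functions on the training set and on a held-out test set gives the pair of scores the slide describes.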
gist-rfe
• Recursive feature elimination – SVM
– Initialize the data to contain all features
– Train an SVM on the data
– Rank features according to SVM weights
– Eliminate lower 50% of features
– Repeat until 1 feature is left
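The loop above can be sketched in Python as follows. This is not the gist-rfe code: the trainer is a small linear SVM fitted by subgradient descent on the hinge loss, standing in for gist-train-svm, and the rounding rule for odd feature counts (drop the smaller half) is an assumption. Feature names and data are made up.

```python
def train_linear_svm(X, y, epochs=50, lr=0.01, reg=0.01):
    # Stand-in trainer: subgradient descent on L2-regularized hinge loss.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - lr * reg) for wj in w]  # regularization shrinkage
            if margin < 1.0:                         # point violates the margin
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w

def recursive_feature_elimination(X, y, names):
    # Returns the surviving feature names after each round.
    active = list(range(len(names)))
    history = [[names[j] for j in active]]
    while len(active) > 1:
        sub = [[row[j] for j in active] for row in X]
        w = train_linear_svm(sub, y)
        # Rank features by absolute SVM weight, largest first.
        order = sorted(range(len(active)), key=lambda k: abs(w[k]), reverse=True)
        n_keep = max(1, len(active) - len(active) // 2)
        active = [active[k] for k in sorted(order[:n_keep])]
        history.append([names[j] for j in active])
    return history

y = [1, -1, 1, -1, 1, -1]
X = [[3.0, 1.0, 0.5, 0.0], [-3.0, -1.0, 0.5, 0.0], [3.0, 1.0, -0.5, 0.0],
     [-3.0, -1.0, -0.5, 0.0], [3.0, 1.0, 0.2, 0.0], [-3.0, -1.0, -0.3, 0.0]]
history = recursive_feature_elimination(X, y, ["f0", "f1", "f2", "f3"])
# history == [["f0", "f1", "f2", "f3"], ["f0", "f1"], ["f0"]]
```

Halving the feature set each round keeps the number of SVM trainings logarithmic in the number of features, which matters when starting from thousands of genes.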
Gist SVM Web Interface
• SVM Training and Testing
• Normalize data by mean centering or z-score
• Adjust kernel settings (linear, polynomial, or radial basis)
• Demo (http://svm.sdsc.edu/svm-intro.html)
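The two normalization options can be illustrated with a short Python sketch applied column by column. This is a generic illustration rather than the web interface's code; in particular, the use of the sample standard deviation (n - 1 denominator) for the z-score is an assumption.

```python
from statistics import mean, stdev

def mean_center(values):
    # Subtract the mean so the values average to zero.
    m = mean(values)
    return [v - m for v in values]

def z_score(values):
    # Subtract the mean and divide by the sample standard deviation.
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

def normalize_columns(matrix, fn):
    # Apply fn to each column, then restore the row-major layout.
    columns = [fn(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]

data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
centered = normalize_columns(data, mean_center)
# centered == [[-1.0, -10.0], [0.0, 0.0], [1.0, 10.0]]
scaled = normalize_columns(data, z_score)
# scaled == [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

Mean centering removes only the offset of each column, while the z-score also puts columns with different ranges on a common scale.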
Comparison to MAGMA
MAGMA
• Normalizations
– Row (gene) mean center
– Row (gene) median center
– Column mean center
– Column median center
– Row z-score
– Column z-score
– Quantile
– Handles missing values
Gist (Web)
• Normalizations
– Column (sample) mean center
– Column (sample) z-score
Comparison to MAGMA
MAGMA
• Classifiers
– SVM
– Fisher’s Discriminant
– SDF
• Data Representation
– Visualization of classifiers
– Database storage
Gist (Web)
• Classifiers
– SVM
• Data Representation
– Text files
– HTML output
Comparison to MAGMA
MAGMA
• Ranking Methods
– Resubstitution
– Cross validation
– Bootstrap
– Bolstering
Gist (Web)
• Ranking Methods
– Fisher criterion
– t-test
– SAM
– Mann-Whitney
– Welch t-test