Gist 2.3 John H. Phan MIBLab Summer Workshop June 28th, 2006




Overview

• Gist 2.3 Tools

– Support Vector Machine (SVM) classification

– Kernel Principal Component Analysis (KPCA)


Gist 2.3 Overview

• Gist is a set of command-line programs written in C

– Primary programs

• SVM and KPCA

– Auxiliary programs

• Ranking and feature selection

– Web interface for the SVM component


Support Vector Machines

• Supervised classification method

• Maximal margin hyperplane

http://www.dtreg.com/svm.htm
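To make "maximal margin hyperplane" concrete, here is a minimal sketch of a linear decision rule and its geometric margin. The weights `w`, bias `b`, and the four data points are hypothetical values chosen for illustration; nothing here is taken from Gist itself, and no training is performed.

```python
import math

def discriminant(w, b, x):
    """Signed value of w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if discriminant(w, b, x) >= 0 else -1

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labeled point to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * discriminant(w, b, x) / norm
               for x, y in zip(points, labels))

# Hypothetical, linearly separable toy data (labels are +1 / -1).
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Hypothetical hyperplane x1 + x2 - 2 = 0.
w, b = (1.0, 1.0), -2.0

print(classify(w, b, (3.0, 3.0)))
print(geometric_margin(w, b, points, labels))
```

An SVM searches for the `w` and `b` that make this margin as large as possible; the points that attain the minimum are the support vectors.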


Primary Gist Programs

• gist-train-svm – train support vector machine

• gist-classify – classify points with a trained support vector machine

• gist-fast-classify – classification optimized for linear kernels

• gist-kpca – kernel principal component analysis

• gist-project – project points onto KPCA components


Auxiliary Gist Programs

• gist-fselect – linear feature selection

• gist-matrix – basic matrix manipulations

• gist-score-svm – score the performance of gist-train-svm and gist-classify

• gist-rfe – recursive feature elimination

• gist-sigmoid – classification probabilities

• gist2html – convert output to HTML

• gist-kernel – create a square kernel matrix


gist-train-svm

• Train a support vector machine

– Input file is tab-delimited, but transposed relative to the usual gene-by-sample layout

– Output file contains 5 columns

• Label, binary classification, SVM weights, predicted classification, discriminant value
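The transposition step can be sketched as follows. This assumes the source data is a gene-by-sample table with a header row, as is common for microarray data; the table contents and the helper names `transpose_table` and `to_tab_delimited` are hypothetical, and the exact layout Gist expects should be checked against its documentation.

```python
def transpose_table(rows):
    """Swap rows and columns of a rectangular table (list of lists)."""
    return [list(col) for col in zip(*rows)]

def to_tab_delimited(rows):
    """Render a table as tab-delimited text, one row per line."""
    return "\n".join("\t".join(str(cell) for cell in row) for row in rows)

# Hypothetical input: one row per gene, one column per sample.
genes_by_samples = [
    ["gene", "sample1", "sample2", "sample3"],
    ["geneA", 1.0, 2.0, 3.0],
    ["geneB", 4.0, 5.0, 6.0],
]

# After transposing: one row per sample, one column per gene.
samples_by_genes = transpose_table(genes_by_samples)
print(to_tab_delimited(samples_by_genes))
```

Note that `zip(*rows)` silently truncates ragged rows, so the input should be checked for a consistent column count first.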


gist-fselect – Feature Selection

• Fisher Criterion Score

• t-test

• Welch t-test

• Mann-Whitney

• SAM (significance analysis of microarrays)

• Threshold number of misclassifications
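As an illustration of the first metric in the list above, here is a minimal sketch of ranking features by a Fisher criterion score. It assumes the common definition (squared difference of class means divided by the sum of class variances) with sample variances; gist-fselect may define or scale the score differently. The feature names and values are hypothetical.

```python
from statistics import mean, variance

def fisher_score(values, labels):
    """(mean_pos - mean_neg)^2 / (var_pos + var_neg) for one feature.

    Assumed definition; uses sample variance, so each class needs
    at least two examples.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == -1]
    spread = variance(pos) + variance(neg)
    gap = (mean(pos) - mean(neg)) ** 2
    if spread == 0:
        # Both classes are constant: perfectly separated or identical.
        return float("inf") if gap > 0 else 0.0
    return gap / spread

def rank_features(columns, labels):
    """Return feature names ordered from highest to lowest score."""
    scores = {name: fisher_score(vals, labels)
              for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical data: six samples, labels +1 / -1, two features.
labels = [1, 1, 1, -1, -1, -1]
columns = {
    "geneA": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],  # well separated
    "geneB": [1.0, 5.0, 3.0, 2.0, 4.0, 3.0],  # same mean in both classes
}

print(rank_features(columns, labels))
```

The other metrics in the list would slot in as alternative scoring functions passed through the same ranking step.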


gist-score-svm

• Compute false and true positives on training and test sets

• Compute area under the ROC curve for training and test sets
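The two quantities above can be sketched from a list of true labels and discriminant values. This is an illustrative calculation only, not the gist-score-svm implementation: the decision threshold of 0, the pairwise AUC formula, and the data are assumptions.

```python
def confusion_counts(labels, discriminants, threshold=0.0):
    """Count true/false positives and negatives at a given threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, d in zip(labels, discriminants):
        predicted = 1 if d > threshold else -1
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["tn" if y == -1 else "fn"] += 1
    return counts

def roc_auc(labels, discriminants):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly; ties count as half."""
    pos = [d for y, d in zip(labels, discriminants) if y == 1]
    neg = [d for y, d in zip(labels, discriminants) if y == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (+1 / -1) and discriminant values.
labels = [1, 1, 1, -1, -1, -1]
discriminants = [2.0, 0.5, -0.3, -1.0, 0.2, -2.0]

print(confusion_counts(labels, discriminants))
print(roc_auc(labels, discriminants))
```

The pairwise formula is quadratic in the number of examples, which is fine for small sets; a sort-based version would be preferable for large ones.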


gist-rfe

• Recursive feature elimination (RFE) with an SVM

– Initialize the data to contain all features

– Train an SVM on the data

– Rank features according to SVM weights

– Eliminate the lowest-ranked 50% of features

– Repeat from the training step until 1 feature is left
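The loop above can be sketched as follows. To keep the example self-contained, `train_linear_weights` is a hypothetical stand-in that uses the difference of class means as per-feature weights; it is not an SVM and not what gist-rfe does internally. A real run would plug an SVM trainer into the `train` parameter. The data is also hypothetical.

```python
def train_linear_weights(rows, labels, features):
    """Stand-in for SVM training (assumption): one weight per feature,
    computed as mean(positive class) - mean(negative class)."""
    weights = {}
    for f in features:
        pos = [r[f] for r, y in zip(rows, labels) if y == 1]
        neg = [r[f] for r, y in zip(rows, labels) if y == -1]
        weights[f] = sum(pos) / len(pos) - sum(neg) / len(neg)
    return weights

def recursive_feature_elimination(rows, labels, train=train_linear_weights):
    """Halve the feature set each round until one feature remains.

    Returns the list of surviving feature sets, one per round.
    """
    features = list(rows[0].keys())          # start with all features
    history = [list(features)]
    while len(features) > 1:
        weights = train(rows, labels, features)   # retrain each round
        ranked = sorted(features, key=lambda f: abs(weights[f]),
                        reverse=True)
        keep = len(ranked) - len(ranked) // 2     # drop the lower half
        features = ranked[:keep]
        history.append(list(features))
    return history

# Hypothetical data: four samples, four features, labels +1 / -1.
rows = [
    {"g1": 5.0, "g2": 1.0, "g3": 0.2, "g4": 3.0},
    {"g1": 6.0, "g2": 1.2, "g3": 0.1, "g4": 2.0},
    {"g1": 1.0, "g2": 0.8, "g3": 0.2, "g4": 4.0},
    {"g1": 0.0, "g2": 1.0, "g3": 0.1, "g4": 5.0},
]
labels = [1, 1, -1, -1]

for surviving in recursive_feature_elimination(rows, labels):
    print(surviving)
```

With an odd number of features the sketch drops the smaller half, so the loop always terminates with exactly one feature. With a real SVM the weights change after each retraining, which is why ranking is repeated every round rather than done once.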


Gist SVM Web Interface

• SVM Training and Testing

• Normalize data by mean centering or z-score

• Adjust kernel settings (linear, polynomial, or radial basis)

• Demo (http://svm.sdsc.edu/svm-intro.html)
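The two normalization options mentioned above can be sketched for a single vector of values (for example, one sample's measurements). This is an illustrative calculation, not the web interface's code; the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def mean_center(values):
    """Subtract the mean so the values average to zero."""
    m = mean(values)
    return [v - m for v in values]

def z_score(values):
    """Mean-center, then divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample (n - 1) standard deviation
    if s == 0:
        raise ValueError("z-score is undefined for constant values")
    return [(v - m) / s for v in values]

# Hypothetical measurements for one sample.
values = [2.0, 4.0, 6.0]

print(mean_center(values))
print(z_score(values))
```

Mean centering removes an offset only, while the z-score also rescales, which matters for kernels such as the radial basis function that depend on distances between points.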


Comparison to MAGMA

MAGMA

• Normalizations

– Row (gene) mean center

– Row (gene) median center

– Column mean center

– Column median center

– Row z-score

– Column z-score

– Quantile

– Handles missing values

Gist (Web)

• Normalizations

– Column (sample) mean center

– Column (sample) z-score


Comparison to MAGMA

MAGMA

• Classifiers

– SVM

– Fisher’s Discriminant

– SDF

• Data Representation

– Visualization of classifiers

– Database storage

Gist (Web)

• Classifiers

– SVM

• Data Representation

– Text files

– HTML output


Comparison to MAGMA

MAGMA

• Ranking Methods

– Resubstitution

– Cross validation

– Bootstrap

– Bolstering

Gist (Web)

• Ranking Methods

– Fisher criterion

– t-test

– SAM

– Mann-Whitney

– Welch t-test