Page 1: Gist 2.3

Gist 2.3

John H. Phan

MIBLab Summer Workshop

June 28th, 2006

Page 2: Gist 2.3

Overview

• Gist 2.3 Tools

– Support Vector Machine (SVM) classification

– Kernel Principal Component Analysis (KPCA)

Page 3: Gist 2.3

Gist 2.3 Overview

• Gist is a set of command-line programs written in C

– Primary programs

• SVM and KPCA

– Auxiliary programs

• Ranking and feature selection

– Web interface for the SVM component

Page 4: Gist 2.3

Support Vector Machines

• Supervised classification method

• Maximal margin hyperplane

http://www.dtreg.com/svm.htm
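
For reference, the maximal margin hyperplane can be written as an optimization problem. This is the standard hard-margin formulation from the general SVM literature, not something stated on the slide; the symbols (training points x_i, labels y_i in {-1, +1}, weight vector w, offset b) are introduced here for illustration only.

```latex
\min_{\mathbf{w},\, b} \ \frac{1}{2} \lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1,
\qquad i = 1, \dots, n
```

Under this formulation the margin between the two classes has width 2 / ||w||, so minimizing ||w|| maximizes the margin.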

Page 5: Gist 2.3

Primary Gist Programs

• gist-train-svm – train support vector machine

• gist-classify – classify points with a trained support vector machine

• gist-fast-classify – classification optimized for linear kernels

• gist-kpca – kernel principal component analysis

• gist-project – project points onto KPCA components

Page 6: Gist 2.3

Auxiliary Gist Programs

• gist-fselect – linear feature selection

• gist-matrix – basic matrix manipulations

• gist-score-svm – performance of gist-train-svm and gist-classify

• gist-rfe – recursive feature elimination

• gist-sigmoid – classification probabilities

• gist2html – convert output to HTML

• gist-kernel – create a square kernel matrix
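
To illustrate what a square kernel matrix is, here is a minimal Python sketch. It is not Gist's implementation: the function names, the `width`, `degree`, and `coef0` parameters, and the exact kernel parameterizations are assumptions made for this example.

```python
import math

def linear_kernel(x, y):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) ** degree; parameter names are hypothetical."""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, width=1.0):
    """Radial basis kernel exp(-||x - y||^2 / (2 * width^2)); an assumed form."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def kernel_matrix(points, kernel):
    """Square matrix K with K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]

# Three made-up data points with two features each.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = kernel_matrix(points, rbf_kernel)
K_lin = kernel_matrix(points, linear_kernel)
```

The matrix has one row and one column per data point, so n points always give an n × n symmetric matrix regardless of the number of features.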

Page 7: Gist 2.3

gist-train-svm

• Train a support vector machine

– Input file is tab-delimited but transposed

– Output file contains 5 columns

• Label, binary classification, SVM weights, predicted classification, discriminant value

Page 8: Gist 2.3

gist-fselect – Feature Selection

• Fisher Criterion Score

• t-test

• Welch t-test

• Mann-Whitney

• SAM (significance analysis of microarrays)

• Threshold number of misclassifications
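
As a rough illustration of two of the scores listed above, the sketch below computes a Fisher criterion score and a Welch t statistic for one feature at a time and ranks features by the Fisher score. It is not gist-fselect itself: the feature names and values are made up, and the use of the sample variance is an assumption, since the slides do not say which variance estimate Gist uses.

```python
import math
from statistics import mean, variance

def fisher_score(pos, neg):
    """Squared difference of class means over the sum of class variances."""
    return (mean(pos) - mean(neg)) ** 2 / (variance(pos) + variance(neg))

def welch_t(pos, neg):
    """Welch t statistic, which does not assume equal class variances."""
    std_err = math.sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    return (mean(pos) - mean(neg)) / std_err

# Hypothetical values of two features in the positive and negative class.
features = {
    "gene_a": ([5.1, 4.9, 5.3, 5.0], [1.0, 1.2, 0.9, 1.1]),
    "gene_b": ([2.0, 3.5, 1.0, 2.8], [2.2, 1.9, 3.1, 2.5]),
}

# Rank features from most to least discriminative by Fisher score.
ranked = sorted(features, key=lambda name: fisher_score(*features[name]),
                reverse=True)
```

Here `gene_a` ranks first because its class means are far apart relative to its within-class spread.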

Page 9: Gist 2.3

gist-score-svm

• Compute false and true positives on training and test sets

• Compute area under the ROC curve for training and test sets
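
The two quantities above can be sketched in a few lines of Python. This is an illustration of the general definitions, not the gist-score-svm code: the labels and discriminant values are invented, and the decision threshold of 0 is an assumption.

```python
def confusion_counts(labels, discriminants, threshold=0.0):
    """Count true/false positives and negatives; labels are +1 or -1."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, value in zip(labels, discriminants):
        predicted = 1 if value > threshold else -1
        if predicted == 1:
            counts["tp" if label == 1 else "fp"] += 1
        else:
            counts["tn" if label == -1 else "fn"] += 1
    return counts

def roc_auc(labels, discriminants):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [v for l, v in zip(labels, discriminants) if l == 1]
    neg = [v for l, v in zip(labels, discriminants) if l == -1]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

# Made-up true labels and SVM discriminant values for six points.
labels = [1, 1, 1, -1, -1, -1]
discriminants = [2.1, 0.4, -0.3, -1.5, 0.2, -0.8]

counts = confusion_counts(labels, discriminants)
auc = roc_auc(labels, discriminants)
```

The pairwise definition of the area under the ROC curve is quadratic in the number of points, which is fine for a sketch; a sort-based computation would be preferable for large data sets.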

Page 10: Gist 2.3

gist-rfe

• Recursive feature elimination – SVM

– Initialize the data to contain all features

– Train an SVM on the data

– Rank features according to SVM weights

– Eliminate the lower-ranked 50% of features

– Repeat until 1 feature is left
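
The loop above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not gist-rfe: the trainer is a stand-in linear SVM fitted by batch subgradient descent (Gist's own optimizer is not shown in the slides), features are ranked by the absolute value of their weights, the halving rounds down for odd counts, and the data set is made up.

```python
def train_linear_svm(rows, labels, epochs=200, rate=0.1, lam=0.01):
    """Stand-in trainer: batch subgradient descent on the regularized hinge loss."""
    n = len(rows)
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # L2 regularization term
        grad_b = 0.0
        for x, y in zip(rows, labels):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score < 1.0:  # point is inside the margin or misclassified
                for j, xj in enumerate(x):
                    grad_w[j] -= y * xj / n
                grad_b -= y / n
        w = [wj - rate * gj for wj, gj in zip(w, grad_w)]
        b -= rate * grad_b
    return w

def recursive_feature_elimination(rows, labels):
    """Return the list of surviving feature indices after each round."""
    active = list(range(len(rows[0])))  # start with all features
    history = [list(active)]
    while len(active) > 1:
        subset = [[row[j] for j in active] for row in rows]
        weights = train_linear_svm(subset, labels)
        # Rank positions by absolute weight, largest first.
        order = sorted(range(len(active)), key=lambda k: abs(weights[k]),
                       reverse=True)
        keep = max(1, len(active) // 2)  # drop the lower-ranked half
        active = sorted(active[k] for k in order[:keep])
        history.append(list(active))
    return history

# Made-up data: only feature 2 separates the two classes.
rows = [
    [0.1, 0.5, 2.0, 0.3], [0.4, 0.2, 1.8, 0.6], [0.3, 0.6, 2.2, 0.1],
    [0.2, 0.4, -2.1, 0.5], [0.5, 0.3, -1.9, 0.2], [0.1, 0.7, -2.0, 0.4],
]
labels = [1, 1, 1, -1, -1, -1]
history = recursive_feature_elimination(rows, labels)
```

With four features the loop runs two rounds (4, then 2, then 1 feature), and the last entry of `history` holds the single surviving feature.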

Page 11: Gist 2.3

Gist SVM Web Interface

• SVM Training and Testing

• Normalize data by mean centering or z-score

• Adjust kernel settings (linear, polynomial, or radial basis)

• Demo (http://svm.sdsc.edu/svm-intro.html)
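
The two normalizations offered by the web interface can be sketched as below, applied to each column (sample) of a small matrix. This is only an illustration: the helper names are hypothetical, and the use of the population standard deviation is an assumption, since the slides do not say which estimate Gist uses.

```python
from statistics import mean, pstdev

def mean_center(values):
    """Subtract the mean so the values average to zero."""
    m = mean(values)
    return [v - m for v in values]

def z_score(values):
    """Mean center, then divide by the standard deviation.

    Raises ZeroDivisionError if all values are identical.
    """
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

def normalize_columns(matrix, transform):
    """Apply transform to every column of a row-major matrix."""
    columns = [transform(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]

# Made-up matrix: three rows (genes) by two columns (samples).
matrix = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
centered = normalize_columns(matrix, mean_center)
scaled = normalize_columns(matrix, z_score)
```

After z-scoring, every column has mean 0 and standard deviation 1, which puts samples measured on different scales on a common footing before SVM training.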

Page 12: Gist 2.3

Comparison to MAGMA

MAGMA

• Normalizations

– Row (gene) mean center

– Row (gene) median center

– Column mean center

– Column median center

– Row z-score

– Column z-score

– Quantile

– Handles missing values

Gist (Web)

• Normalizations

– Column (sample) mean center

– Column (sample) z-score

Page 13: Gist 2.3

Comparison to MAGMA

MAGMA

• Classifiers

– SVM

– Fisher’s Discriminant

– SDF

• Data Representation

– Visualization of classifiers

– Database storage

Gist (Web)

• Classifiers

– SVM

• Data Representation

– Text files

– HTML output

Page 14: Gist 2.3

Comparison to MAGMA

MAGMA

• Ranking Methods

– Resubstitution

– Cross validation

– Bootstrap

– Bolstering

Gist (Web)

• Ranking Methods

– Fisher criterion

– T-test

– SAM

– Mann-Whitney

– Welch t-test