Upload
alexis-blair
View
215
Download
0
Tags:
Embed Size (px)
Citation preview
• This supervised learning technique uses Bayes’ rule but is different in philosophy from the well known work of Aitken, Taroni, et al.
• Bayes’ rule:
• Pr is probability
• Equation means: “How does the probability of an item being a member of group change, given evidence x”
Bayesian Discriminant Analysis
Prior probabilityThis can be a problem!
Bayesian Discriminant Analysis• Bayes’ rule can be turned into a classification
rule:
=> Choose group 1
*If priors are both 0.5, decision boundaries are
where curves cross
• If the data is multivariate normal drawn from the same population, the decision rule becomes:
Bayes-Gaussian Discriminant Analysis
slope intercept
with the “distance” defined as:
and
• Note that if the data is just 1D this is just an equation for a line:
Like an average cov mat
• If the data is multivariate normal but drawn from different populations, the decision rule is the same but the “decision distance” becomes:
Bayes-Gaussian Discriminant Analysis
b c
• Note that if the data is just 1D this is an equation for a parabola:
New quadratic term
a
• The “quadratic” version is always called quadratic discriminant analysis, QDA
• The “linear” version is called by a number of names!• linear discriminant analysis, LDA
• Some combination of of the above with the words, Gaussian or classification
• A number of techniques use the name LDA!• Important to specify the equations used to tell the
difference!
Bayes-Gaussian Discriminant Analysis
Bayes-Gaussian Discriminant Analysis
Groups have similar covariance structure:linear discriminant rule should work well
Groups have different covariance structure:quadratic discriminant rule may work better
• This supervised technique is called Linear Discriminant Analysis (LDA) in R• Also called Fisher linear discriminant analysis
• CVA is closely related to linear Bayes-Gaussian discriminant analysis
• Works on a principle similar to PCA: Look for “interesting directions in data space”• CVA: Find directions in space which best separate
groups.• Technically: find directions which maximize ratio of
between group to within variation
Canonical Variate Analysis
Canonical Variate Analysis
Project on PC1:Not necessarily good group separation!
Project on CV1:Good group separation!
Note: There are #groups -1 or p CVswhich ever is smaller
• Use between-group to within-group covariance matrix, W-1B to find directions of best group separation (CVA loadings, Acv):
• CVA can be used for dimension reduction.
• Caution! These “dimensions” are not at right angles (i.e. not orthogonal)
• CVA plots can thus be distorted from reality
• Always check loading angles!
• Caution! CVA will not work well with very correlated data
Canonical Variate Analysis
Canonical Variate Analysis
2D CVA of gasoline data set: 2D PCA of gasoline data set:
• Distance metric used in CVA to assign group i.d. of an unknown data point:
• If data is Gaussian and group covariance structures are the same then CVA classification is the same as Bayes-Gaussian classification.
Canonical Variate Analysis
• PLS-DA is a supervised discrimination technique and very popular in chemometrics• Works well with highly correlated variables (like in
spectroscopy)• Lots of correlation causes CVA to fail!
• Group labels coded into a “response matrix” Y• PLS searches for directions of maximum covariance in X and Y.
• Loading for X can be used like PCA loading• Dimension reduction
• Loading plots
Partial Least Squares Discriminant Analysis
Partial Least Squares Discriminant Analysis• PLS-DA theory: Find an (approximate) linear
relationship between experimental (explanatory) variables and group labels (response variables)• Y=XB+E
• X=TPT+EX
• Y=UQT+EY
• So substituting: UQT=TPTB+E
exp. vars.lbls.
“error” or “residuals” matrix
PLS-scores PLS-loadings
*Use these “Y-scores” with a “soft-max” or “Bayes” to pick “most-likely” group label
Partial Least Squares Discriminant Analysis• How do we solve this for T, P and U??• Objective: maximize covariance between X and Y scores, T
and U.
• Various procedure to do this:
• Kernel-PLS
• SIMPLS
• NIPLS
• Give close, but slightly different numerical results
• In R, functions are:
• plsr (pls package)
• spls (spls package)
• Easiest: plsda (caret pakage)
Partial Least Squares Discriminant Analysis
2D PLS of gasoline data set: 2D PCA of gasoline data set:
• Group assignments of observation vectors are made by interpreting Y scores.• Typically “soft-max” function is used.
Partial Least Squares Discriminant Analysis
Observation Vectors
Y-s
core
s