Statistical Inference
Bayes Impact, 2014
arranged by Daniel Korenblum
The Inference Problem
estimation of an unknown quantity
Problem / Solution or Method / Algorithm or Statistic

Point Estimation
- Maximum Likelihood (ML): Gradient Descent
- Minimum-Variance Unbiased (MVU) Estimator: Least Squares
- Maximum a Posteriori (MAP/GMLE): Gradient Descent
- Posterior Mean (PM): Markov-chain Monte Carlo (MCMC)

Error Bars / Confidence (Estimator Error)
- Confidence Interval / Region: Covariance/Information, Resampling
- Credibility Interval / Region: Evidentiary Credible Region (2014)

Classification and Clustering (Pattern Recognition)
- Unsupervised Learning: Cluster Analysis
- (Semi-)Supervised Learning: Discriminant, Generative, SVM, kNN, trees
- Feature Selection: Ranking, Filtering, Greedy, Sparse

Model Selection
- Hypothesis Testing: Significance Tests (the “Holy Trinity”)
- Model Evidence: Marginal Likelihood
Inference/Estimation Subject Areas
1. Frequentist inference as an optimization problem: maximize the likelihood over all observations
2. Bayesian inference as distribution estimation: the posterior distribution estimate is “the inference”
3. Decision theory can be used to derive estimates from posteriors by minimizing decision risk/loss
Scope and Outline
1. Likelihood models and model comparison
2. Frequentist and Bayesian approaches
   2.1. Frequentist Inference
      2.1.1. Analytic: set the derivative of the sample log-likelihood equal to zero and solve
      2.1.2. Numerical: use local or global optimization algorithms (e.g. steepest descent)
   2.2. Bayesian Inference
      2.2.1. Choose a prior distribution
      2.2.2. The product of likelihood and prior yields the unnormalized posterior distribution
      2.2.3. Select an objective / risk / loss and minimize its expected value over the posterior
3. Statistics and algorithms
   3.1. Regression: using the noise distribution to choose an appropriate objective / risk / loss
   3.2. Estimator error: the bias-variance trade-off; a small bias can reduce variance and MSE
   3.3. Classification: choosing between generative, discriminative, or discriminant approaches
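Steps 2.1.1 (analytic) and 2.1.2 (numerical) can be put side by side in a toy sketch. Everything here is illustrative and assumes NumPy: synthetic Gaussian data with unit variance, where the analytic MLE of the mean is just the sample average, and steepest descent on the negative log-likelihood recovers the same answer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=500)   # made-up data, sigma = 1 known

# 2.1.1 Analytic: d/dmu sum_i log N(x_i | mu, 1) = sum_i (x_i - mu) = 0  =>  mu = mean(x)
mu_analytic = x.mean()

# 2.1.2 Numerical: steepest descent on the negative log-likelihood
mu = 0.0
lr = 0.001                      # step size chosen small enough to converge here
for _ in range(2000):
    grad = -(x - mu).sum()      # d(NLL)/dmu for sigma = 1
    mu -= lr * grad
mu_numeric = mu
```

Both routes agree; the numerical route is the one that generalizes to models with no closed-form stationary point.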
Topics covered
Topics not covered
1. Stochastic process models / methods (e.g. Markov models)
2. Time series analysis / 1-D signal processing, multidimensional signal processing
3. Black/gray box models (e.g. artificial neural networks, decision trees, ensembles)
4. Information-theoretic approaches (maximum entropy, mutual information, K-L divergence)
5. Control theory, duality theory, convex analysis, global optimization
Introduction to Statistical Inference
Frequentist Inference
Likelihood theory (Fisher ~1920)
Likelihood Theory
Likelihood functions are not probability density functions. The integral of a likelihood function is not in general 1.
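A quick numerical check of this point (NumPy assumed; the trial counts n = 10, k = 3 are arbitrary): integrate the Bernoulli likelihood L(θ) = θ^k (1−θ)^(n−k) over θ and see that it is nowhere near 1.

```python
import numpy as np

# Likelihood of theta after observing k = 3 successes in n = 10 Bernoulli trials
n, k = 10, 3
theta = np.linspace(0.0, 1.0, 100001)
L = theta**k * (1.0 - theta)**(n - k)

# Trapezoidal integration over the parameter; exact value is B(k+1, n-k+1) = 1/1320
integral = ((L[:-1] + L[1:]) / 2 * np.diff(theta)).sum()
```

Normalizing L over θ (times a prior) is exactly what the Bayesian posterior does; the raw likelihood itself carries no such normalization.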
In a probability density p(x | θ), the parameter θ is fixed and the data x is variable; in the likelihood L(θ | x), the data x is fixed and the parameter θ is variable.
Frequentist Inference & Decision Theory
Frequentist Risk/Loss Function:
Frequentist Risk Example: Squared Error
Frequentist Decision Theoretic Objective
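The frequentist risk R(θ, δ) = E_X[ L(θ, δ(X)) ] averages the loss over repeated datasets for a fixed true parameter. A Monte Carlo sketch of the squared-error risk of the sample mean (all values here are invented for illustration; for the sample mean the exact risk is σ²/n):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 2.0                       # fixed "true" parameter
n, trials = 20, 20000

# Draw many datasets, apply the estimator delta(X) = sample mean to each
X = rng.normal(theta, 1.0, size=(trials, n))
est = X.mean(axis=1)

# Monte Carlo estimate of R(theta, delta) under squared-error loss
risk = np.mean((est - theta)**2)  # should be close to sigma^2 / n = 0.05
```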
Bayesian Inference
posterior distribution & minimum risk/loss
Bayesian Conditional Distributions
Bayesian Update, Inverse Problems
Prior Function and Regularization Term
Bayesian Posterior Loss
When the prior is improper, an estimator which minimizes the posterior expected loss is referred to as a generalized Bayes estimator.
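The choice of loss determines which posterior summary comes out as the Bayes estimate: squared loss gives the posterior mean, absolute loss gives the posterior median. A grid sketch using an arbitrary Beta(3, 6)-shaped posterior (NumPy assumed):

```python
import numpy as np

# Discrete grid approximation to a skewed posterior over theta
theta = np.linspace(0.0, 1.0, 2001)
post = theta**2 * (1 - theta)**5           # unnormalized Beta(3, 6) shape
post /= post.sum()

# Posterior expected loss of each candidate point estimate a
actions = theta
sq_loss  = ((actions[:, None] - theta[None, :])**2 * post).sum(axis=1)
abs_loss = (np.abs(actions[:, None] - theta[None, :]) * post).sum(axis=1)

best_sq  = actions[sq_loss.argmin()]       # minimizer under squared loss
best_abs = actions[abs_loss.argmin()]      # minimizer under absolute loss

post_mean = (theta * post).sum()
post_median = theta[np.searchsorted(post.cumsum(), 0.5)]
```

The squared-loss minimizer lands on the posterior mean and the absolute-loss minimizer on the posterior median, up to grid resolution.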
Risk/Loss and Regularization Functions
Risk/Loss Functions and Derivatives
http://dl.acm.org.oca.ucsc.edu/citation.cfm?id=1281270
Point Estimation
maximum likelihood, least squares
Maximum Likelihood Estimation (MLE)
Linear Regression / Least Squares
Orthogonal Projections & Least Squares
http://en.wikipedia.org/wiki/Linear_least_squares_(mathematics)#Properties_of_the_least-squares_estimators
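The orthogonal-projection view can be verified directly: solving the normal equations (AᵀA)β = Aᵀy leaves a residual perpendicular to every column of the design matrix. A minimal sketch with synthetic data (NumPy assumed; the intercept/slope values are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
A = np.column_stack([np.ones(n), rng.normal(size=n)])   # design matrix [1, x]
beta_true = np.array([1.0, 2.0])
y = A @ beta_true + 0.1 * rng.normal(size=n)

# Normal equations: (A^T A) beta = A^T y
beta_hat = np.linalg.solve(A.T @ A, A.T @ y)

# Orthogonality: the residual is perpendicular to the column space of A
residual = y - A @ beta_hat
ortho = A.T @ residual           # should be numerically ~0
```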
Nonlinear Regression / Least Squares
Generalized Linear Models
Maximum Likelihood Noise Dependence
MLE Estimator: Gamma Distribution
MVU Estimator: Mean of Uniform Noise
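For uniform noise the sample mean is not the best unbiased estimator of location: the midrange (average of the sample min and max) is also unbiased but has much smaller variance, since the extremes carry most of the information about a uniform's endpoints. A simulation sketch (parameters invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 5.0
n, trials = 20, 50000

# X_i ~ Uniform(theta - 1/2, theta + 1/2)
X = rng.uniform(theta - 0.5, theta + 0.5, size=(trials, n))

mean_est = X.mean(axis=1)
mid_est  = (X.min(axis=1) + X.max(axis=1)) / 2   # midrange estimator

var_mean = mean_est.var()   # theory: 1/(12n)
var_mid  = mid_est.var()    # theory: 1/(2(n+1)(n+2)), much smaller
```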
Posterior Mean and Maximum Posterior
Median Posterior Density
Example: Changepoint Detection
Example: Changepoint Detection (cont.)
PM Example: Bayesian Prediction
Error Bars / Uncertainty
Fisher information, confidence regions
Negative Log-Likelihood & Uncertainty
Likelihood Geometry and Contours
Score Function & Fisher Information
Fisher Information / Precision
Estimator Error
proof of Cramer-Rao Lower-Bound:
http://ens.ewi.tudelft.nl/Education/courses/et4386/Slides/01.estimation.pdf
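The bound is easy to check by simulation in the case where it is attained: for the mean of n i.i.d. N(μ, σ²) draws, the Fisher information is I(μ) = n/σ², and the sample mean's variance equals the bound 1/I(μ) = σ²/n. Sketch (arbitrary parameter values, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, trials = 0.0, 2.0, 25, 40000

# Cramer-Rao lower bound for any unbiased estimator of mu: 1/I(mu) = sigma^2/n
crlb = sigma**2 / n

X = rng.normal(mu, sigma, size=(trials, n))
var_xbar = X.mean(axis=1).var()   # the sample mean attains the bound
```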
Bayesian Mean Squared Error
Bayesian Minimum Mean Squared Error
Classification
cluster analysis, supervised learning
Bayesian Classification
Bayes Classifier Risk/Loss
Bayesian Classifier Decision Error
Bayesian Classifier Posterior Density
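A concrete instance of the Bayes classifier: two classes with equal priors and unit-variance Gaussian class conditionals at means −1 and +1. Picking the class with the larger posterior reduces to thresholding at the midpoint, and the resulting error rate, Φ(−1) ≈ 0.159, is the Bayes error that no classifier can beat. A simulation sketch (class means and priors are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100000

# Equal priors; class-conditional densities N(-1, 1) and N(+1, 1)
labels = rng.integers(0, 2, size=n)
means = np.where(labels == 1, 1.0, -1.0)
x = rng.normal(means, 1.0)

# Bayes rule: argmax_c p(c) p(x | c); with equal priors and equal variances
# this is just a threshold at the midpoint of the two means
pred = (x > 0).astype(int)
error = (pred != labels).mean()   # close to Phi(-1) ~ 0.159
```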
Example: Support Vector Machine
Classifier Comparison Example
Feature Selection
ranking, filtering, greedy, sparse, hybrid
Introduction to Feature Selection
Feature Selection Approaches
Filtering / Subset Selection Algorithms
Exhaustive Search & Zero-norm Penalty
Basis Pursuit / LASSO / Elastic Net
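The LASSO objective ½‖y − Aβ‖² + λ‖β‖₁ can be minimized by proximal gradient descent (ISTA): a gradient step on the quadratic term followed by soft-thresholding, which is what produces exact zeros. A self-contained sketch on synthetic sparse data (design, noise level, and λ are all arbitrary choices here):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 100, 20
A = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]            # sparse ground truth
y = A @ beta_true + 0.1 * rng.normal(size=n)

lam = 5.0                                   # l1 penalty weight
L = np.linalg.norm(A, 2)**2                 # Lipschitz constant of the gradient
beta = np.zeros(p)
for _ in range(2000):
    g = A.T @ (A @ beta - y)                # gradient of 0.5 * ||y - A beta||^2
    z = beta - g / L
    # soft-thresholding: the proximal operator of (lam/L) * ||.||_1
    beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
```

The irrelevant coordinates are driven to exactly zero, which is the feature-selection behavior that distinguishes the l1 penalty from ridge's l2.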
Cluster Analysis
also known as unsupervised learning
Introduction to Cluster Analysis
Cluster Analysis Algorithm Categories
- Crisp, non-hierarchical: k-means
- Fuzzy, non-hierarchical: spectral clustering, fuzzy k-means
- Crisp, hierarchical: agglomerative clustering
- Fuzzy, hierarchical: hierarchical unsupervised fuzzy clustering (Geva 1999)
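The crisp, non-hierarchical case is the easiest to sketch: Lloyd's k-means alternates a hard assignment step (nearest center) with an update step (center moves to its cluster mean). Toy example on two synthetic blobs, with deterministic initialization for reproducibility (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(7)
# Two well-separated 2-D blobs of 100 points each
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)),
               rng.normal(5.0, 0.5, (100, 2))])

k = 2
centers = X[[0, 100]].copy()    # one initial center drawn from each blob
for _ in range(50):
    # Assignment step: crisp membership, each point to its nearest center
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    # Update step: each center moves to the mean of its cluster
    centers = np.array([X[assign == c].mean(axis=0) for c in range(k)])
```

The fuzzy variants replace the hard argmin with graded memberships; the hierarchical variants replace the flat partition with a tree of merges or splits.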
Hierarchical Agglomerative Clustering
Clustering Algorithm Comparisons
Model Selection
Cross-Validation, LR, ICs, model evidence
Parsimony and Occam’s Razor
Cross-Validation
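K-fold cross-validation estimates out-of-sample error by repeatedly fitting on k−1 folds and scoring on the held-out fold. A minimal sketch (polynomial regression on made-up linear data; the degrees compared are arbitrary) showing that CV penalizes a model that is too simple for the data:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 40
x = rng.uniform(-1, 1, n)
y = 1.0 + 2.0 * x + 0.2 * rng.normal(size=n)   # true model is linear

def cv_mse(degree, k=5):
    """k-fold cross-validated MSE of a polynomial fit of the given degree."""
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = np.polyfit(x[train], y[train], degree)
        errs.append(np.mean((np.polyval(coef, x[test]) - y[test])**2))
    return float(np.mean(errs))

mse_const  = cv_mse(0)   # underfit: constant model ignores the slope
mse_linear = cv_mse(1)   # matches the true model
```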
Likelihood Ratio Test for Nested Models
Akaike & Bayesian Information Criteria
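Both criteria trade fit against parameter count: AIC = 2k − 2 ln L̂ and BIC = k ln n − 2 ln L̂, where for Gaussian regression the maximized log-likelihood reduces to a function of the residual sum of squares. A sketch comparing polynomial degrees on synthetic linear data (data and degree range are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100
x = rng.uniform(-1, 1, n)
y = 1.0 + 2.0 * x + 0.3 * rng.normal(size=n)   # true model is degree 1

def aic_bic(degree):
    coef = np.polyfit(x, y, degree)
    rss = np.sum((np.polyval(coef, x) - y)**2)
    k = degree + 2   # polynomial coefficients plus the noise variance
    # Gaussian max log-likelihood with sigma^2 = RSS/n plugged in
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    aic = 2 * k - 2 * loglik
    bic = k * np.log(n) - 2 * loglik
    return aic, bic

scores = {d: aic_bic(d) for d in range(0, 6)}
```

Both criteria heavily penalize the constant model relative to the linear one, since the drop in likelihood dwarfs the one-parameter saving; BIC's ln n penalty is the harsher of the two on the higher-degree fits.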
Deviance Information Criterion
Model Selection Discussion
Bayesian Model Selection
Bayes Factors & Bias-Variance Tradeoffs
Bayesian Model Selection Example
interpretations, debates, and paradoxes
Philosophy
Bertrand Paradox
Bertrand Paradox: Jaynes’ Solution
Bertrand Paradox: Disambiguation
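The paradox is easy to reproduce numerically: three equally "uniform"-sounding ways of drawing a random chord of the unit circle give three different probabilities that the chord exceeds √3 (the side of the inscribed equilateral triangle), namely 1/3, 1/2, and 1/4. Simulation sketch (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(10)
N = 200000
s = np.sqrt(3)   # side length of the inscribed equilateral triangle

# Method 1: two independent uniform endpoints on the circle
a, b = rng.uniform(0, 2 * np.pi, (2, N))
len1 = 2 * np.abs(np.sin((a - b) / 2))
p1 = (len1 > s).mean()                      # -> 1/3

# Method 2: uniform midpoint distance along a random radius
d2 = rng.uniform(0, 1, N)
p2 = (2 * np.sqrt(1 - d2**2) > s).mean()    # -> 1/2

# Method 3: chord midpoint uniform over the disk
d3 = np.sqrt(rng.uniform(0, 1, N))          # radial distance of a uniform point
p3 = (2 * np.sqrt(1 - d3**2) > s).mean()    # -> 1/4
```

The disagreement is the whole point: "uniform at random" is underspecified until the sampling mechanism (the prior) is pinned down, which is Jaynes' resolution via invariance arguments.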
References
Lecture Notes
2004 Figueiredo, Lecture Notes on Bayesian Estimation and Classification
Martinez et al., Estimation and Detection
Books
2006 Bishop, Pattern Recognition and Machine Learning
2009 Hastie et al., The Elements of Statistical Learning
2012 MacKay, Information Theory, Inference, and Learning Algorithms
Wiki
http://wikipedia.org