Statistical Inference
Bayes Impact, 2014
arranged by Daniel Korenblum
The Inference Problem
estimation of an unknown quantity
Problem / Solution or Method / Algorithm or Statistic

Point Estimation
- Maximum Likelihood (ML): Gradient Descent
- Minimum-Variance Unbiased (MVU) Estimator: Least Squares
- Maximum a Posteriori (MAP/GMLE): Gradient Descent
- Posterior Mean (PM): Markov-chain Monte Carlo (MCMC)

Error Bars / Confidence (Estimator Error)
- Confidence Interval / Region: Covariance/Information, Resampling
- Credibility Interval / Region: Evidentiary Credible Region (2014)

Classification and Clustering (Pattern Recognition)
- Unsupervised Learning: Cluster Analysis
- (Semi-)Supervised Learning: Discriminant, Generative, SVM, kNN, trees
- Feature Selection: Ranking, Filtering, Greedy, Sparse

Model Selection
- Hypothesis Testing: Significance Tests (the “Holy Trinity”)
- Model Evidence: Marginal Likelihood
Inference/Estimation Subject Areas
1. Frequentist inference as an optimization problem: maximize the likelihood over all observations
2. Bayesian inference as distribution estimation: the posterior distribution estimate is “the inference”
3. Decision theory can be used to derive estimates from posteriors by minimizing decision risk/loss
Scope and Outline
1. Likelihood models and model comparison
2. Frequentist and Bayesian approaches
   2.1. Frequentist Inference
      2.1.1. Analytic: set the derivative of the sample log-likelihood equal to zero and solve
      2.1.2. Numerical: use local or global optimization algorithms (e.g. steepest descent)
   2.2. Bayesian Inference
      2.2.1. Choose a prior distribution
      2.2.2. The product of likelihood and prior yields the unnormalized posterior distribution
      2.2.3. Select an objective / risk / loss and minimize its expected value over the posterior
3. Statistics and algorithms
   3.1. Regression: using the noise distribution to choose an appropriate objective / risk / loss
   3.2. Estimator error: the bias-variance trade-off; a small bias can reduce variance and MSE
   3.3. Classification: choosing between generative, discriminative, or discriminant approaches
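Steps 2.1.1 (analytic) and 2.1.2 (numerical) can be put side by side in a toy sketch. Everything here is illustrative and assumes NumPy: synthetic Gaussian data with unit variance, where the analytic MLE of the mean is just the sample average, and steepest descent on the negative log-likelihood recovers the same answer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=500)   # made-up data, sigma = 1 known

# 2.1.1 Analytic: d/dmu sum_i log N(x_i | mu, 1) = sum_i (x_i - mu) = 0  =>  mu = mean(x)
mu_analytic = x.mean()

# 2.1.2 Numerical: steepest descent on the negative log-likelihood
mu = 0.0
lr = 0.001                      # step size chosen small enough to converge here
for _ in range(2000):
    grad = -(x - mu).sum()      # d(NLL)/dmu for sigma = 1
    mu -= lr * grad
mu_numeric = mu
```

Both routes agree; the numerical route is the one that generalizes to models with no closed-form stationary point.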
Topics covered
Topics not covered
1. Stochastic process models / methods (e.g. Markov models)
2. Time series analysis / 1-D signal processing, multidimensional signal processing
3. Black/gray box models (e.g. artificial neural networks, decision trees, ensembles)
4. Information-theoretic approaches (maximum entropy, mutual information, K-L divergence)
5. Control theory, duality theory, convex analysis, global optimization
Introduction to Statistical Inference
Frequentist Inference
Likelihood theory (Fisher ~1920)
Likelihood Theory
Likelihood functions are not probability density functions. The integral of a likelihood function is not in general 1.
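A quick numerical check of this point (NumPy assumed; the trial counts n = 10, k = 3 are arbitrary): integrate the Bernoulli likelihood L(θ) = θ^k (1−θ)^(n−k) over θ and see that it is nowhere near 1.

```python
import numpy as np

# Likelihood of theta after observing k = 3 successes in n = 10 Bernoulli trials
n, k = 10, 3
theta = np.linspace(0.0, 1.0, 100001)
L = theta**k * (1.0 - theta)**(n - k)

# Trapezoidal integration over the parameter; exact value is B(k+1, n-k+1) = 1/1320
integral = ((L[:-1] + L[1:]) / 2 * np.diff(theta)).sum()
```

Normalizing L over θ (times a prior) is exactly what the Bayesian posterior does; the raw likelihood itself carries no such normalization.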
In a probability density p(x | θ), the parameter θ is fixed and the data x is variable; in the likelihood L(θ | x), the data x is fixed and the parameter θ is variable.
Frequentist Inference & Decision Theory
Frequentist Risk/Loss Function:
Frequentist Risk Example: Squared Error
Frequentist Decision Theoretic Objective
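The frequentist risk R(θ, δ) = E_X[ L(θ, δ(X)) ] averages the loss over repeated datasets for a fixed true parameter. A Monte Carlo sketch of the squared-error risk of the sample mean (all values here are invented for illustration; for the sample mean the exact risk is σ²/n):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 2.0                       # fixed "true" parameter
n, trials = 20, 20000

# Draw many datasets, apply the estimator delta(X) = sample mean to each
X = rng.normal(theta, 1.0, size=(trials, n))
est = X.mean(axis=1)

# Monte Carlo estimate of R(theta, delta) under squared-error loss
risk = np.mean((est - theta)**2)  # should be close to sigma^2 / n = 0.05
```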
Bayesian Inference
posterior distribution & minimum risk/loss
Bayesian Conditional Distributions
Bayesian Update, Inverse Problems
Prior Function and Regularization Term
Bayesian Posterior Loss
When the prior is improper, an estimator which minimizes the posterior expected loss is referred to as a generalized Bayes estimator.
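The choice of loss determines which posterior summary comes out as the Bayes estimate: squared loss gives the posterior mean, absolute loss gives the posterior median. A grid sketch using an arbitrary Beta(3, 6)-shaped posterior (NumPy assumed):

```python
import numpy as np

# Discrete grid approximation to a skewed posterior over theta
theta = np.linspace(0.0, 1.0, 2001)
post = theta**2 * (1 - theta)**5           # unnormalized Beta(3, 6) shape
post /= post.sum()

# Posterior expected loss of each candidate point estimate a
actions = theta
sq_loss  = ((actions[:, None] - theta[None, :])**2 * post).sum(axis=1)
abs_loss = (np.abs(actions[:, None] - theta[None, :]) * post).sum(axis=1)

best_sq  = actions[sq_loss.argmin()]       # minimizer under squared loss
best_abs = actions[abs_loss.argmin()]      # minimizer under absolute loss

post_mean = (theta * post).sum()
post_median = theta[np.searchsorted(post.cumsum(), 0.5)]
```

The squared-loss minimizer lands on the posterior mean and the absolute-loss minimizer on the posterior median, up to grid resolution.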
Risk/Loss and Regularization Functions
Risk/Loss Functions and Derivatives
http://dl.acm.org.oca.ucsc.edu/citation.cfm?id=1281270
Point Estimation
maximum likelihood, least squares
Maximum Likelihood Estimation (MLE)
Linear Regression / Least Squares
Orthogonal Projections & Least Squares
http://en.wikipedia.org/wiki/Linear_least_squares_(mathematics)#Properties_of_the_least-squares_estimators
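The orthogonal-projection view can be verified directly: solving the normal equations (AᵀA)β = Aᵀy leaves a residual perpendicular to every column of the design matrix. A minimal sketch with synthetic data (NumPy assumed; the intercept/slope values are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
A = np.column_stack([np.ones(n), rng.normal(size=n)])   # design matrix [1, x]
beta_true = np.array([1.0, 2.0])
y = A @ beta_true + 0.1 * rng.normal(size=n)

# Normal equations: (A^T A) beta = A^T y
beta_hat = np.linalg.solve(A.T @ A, A.T @ y)

# Orthogonality: the residual is perpendicular to the column space of A
residual = y - A @ beta_hat
ortho = A.T @ residual           # should be numerically ~0
```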
Nonlinear Regression / Least Squares
Generalized Linear Models
Maximum Likelihood Noise Dependence
MLE Estimator: Gamma Distribution
MVU Estimator: Mean of Uniform Noise
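For uniform noise the sample mean is not the best unbiased estimator of location: the midrange (average of the sample min and max) is also unbiased but has much smaller variance, since the extremes carry most of the information about a uniform's endpoints. A simulation sketch (parameters invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 5.0
n, trials = 20, 50000

# X_i ~ Uniform(theta - 1/2, theta + 1/2)
X = rng.uniform(theta - 0.5, theta + 0.5, size=(trials, n))

mean_est = X.mean(axis=1)
mid_est  = (X.min(axis=1) + X.max(axis=1)) / 2   # midrange estimator

var_mean = mean_est.var()   # theory: 1/(12n)
var_mid  = mid_est.var()    # theory: 1/(2(n+1)(n+2)), much smaller
```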
Posterior Mean and Maximum Posterior
Median Posterior Density
Example: Changepoint Detection
Example: Changepoint Detection (cont.)
PM Example: Bayesian Prediction
Error Bars / Uncertainty
Fisher information, confidence regions
Negative Log-Likelihood & Uncertainty
Likelihood Geometry and Contours
Score Function & Fisher Information
Fisher Information / Precision
Estimator Error
proof of Cramer-Rao Lower-Bound:
http://ens.ewi.tudelft.nl/Education/courses/et4386/Slides/01.estimation.pdf
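The bound is easy to check by simulation in the case where it is attained: for the mean of n i.i.d. N(μ, σ²) draws, the Fisher information is I(μ) = n/σ², and the sample mean's variance equals the bound 1/I(μ) = σ²/n. Sketch (arbitrary parameter values, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, trials = 0.0, 2.0, 25, 40000

# Cramer-Rao lower bound for any unbiased estimator of mu: 1/I(mu) = sigma^2/n
crlb = sigma**2 / n

X = rng.normal(mu, sigma, size=(trials, n))
var_xbar = X.mean(axis=1).var()   # the sample mean attains the bound
```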
Bayesian Mean Squared Error
Bayesian Minimum Mean Squared Error
Classification
cluster analysis, supervised learning
Bayesian Classification
Bayes Classifier Risk/Loss
Bayesian Classifier Decision Error
Bayesian Classifier Posterior Density
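A concrete instance of the Bayes classifier: two classes with equal priors and unit-variance Gaussian class conditionals at means −1 and +1. Picking the class with the larger posterior reduces to thresholding at the midpoint, and the resulting error rate, Φ(−1) ≈ 0.159, is the Bayes error that no classifier can beat. A simulation sketch (class means and priors are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100000

# Equal priors; class-conditional densities N(-1, 1) and N(+1, 1)
labels = rng.integers(0, 2, size=n)
means = np.where(labels == 1, 1.0, -1.0)
x = rng.normal(means, 1.0)

# Bayes rule: argmax_c p(c) p(x | c); with equal priors and equal variances
# this is just a threshold at the midpoint of the two means
pred = (x > 0).astype(int)
error = (pred != labels).mean()   # close to Phi(-1) ~ 0.159
```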
Example: Support Vector Machine
Classifier Comparison Example
Feature Selection
ranking, filtering, greedy, sparse, hybrid
Introduction to Feature Selection
Feature Selection Approaches
Filtering / Subset Selection Algorithms
Exhaustive Search & Zero-norm Penalty
Basis Pursuit / LASSO / Elastic Net
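The LASSO objective ½‖y − Aβ‖² + λ‖β‖₁ can be minimized by proximal gradient descent (ISTA): a gradient step on the quadratic term followed by soft-thresholding, which is what produces exact zeros. A self-contained sketch on synthetic sparse data (design, noise level, and λ are all arbitrary choices here):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 100, 20
A = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]            # sparse ground truth
y = A @ beta_true + 0.1 * rng.normal(size=n)

lam = 5.0                                   # l1 penalty weight
L = np.linalg.norm(A, 2)**2                 # Lipschitz constant of the gradient
beta = np.zeros(p)
for _ in range(2000):
    g = A.T @ (A @ beta - y)                # gradient of 0.5 * ||y - A beta||^2
    z = beta - g / L
    # soft-thresholding: the proximal operator of (lam/L) * ||.||_1
    beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
```

The irrelevant coordinates are driven to exactly zero, which is the feature-selection behavior that distinguishes the l1 penalty from ridge's l2.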
Cluster Analysis
also known as unsupervised learning
Introduction to Cluster Analysis
Cluster Analysis Algorithm Categories
- Crisp, non-hierarchical: k-means
- Fuzzy, non-hierarchical: spectral clustering, fuzzy k-means
- Crisp, hierarchical: agglomerative clustering
- Fuzzy, hierarchical: hierarchical unsupervised fuzzy clustering (Geva 1999)
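The crisp, non-hierarchical case is the easiest to sketch: Lloyd's k-means alternates a hard assignment step (nearest center) with an update step (center moves to its cluster mean). Toy example on two synthetic blobs, with deterministic initialization for reproducibility (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(7)
# Two well-separated 2-D blobs of 100 points each
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)),
               rng.normal(5.0, 0.5, (100, 2))])

k = 2
centers = X[[0, 100]].copy()    # one initial center drawn from each blob
for _ in range(50):
    # Assignment step: crisp membership, each point to its nearest center
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    # Update step: each center moves to the mean of its cluster
    centers = np.array([X[assign == c].mean(axis=0) for c in range(k)])
```

The fuzzy variants replace the hard argmin with graded memberships; the hierarchical variants replace the flat partition with a tree of merges or splits.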
Hierarchical Agglomerative Clustering
Clustering Algorithm Comparisons
Model Selection
Cross-Validation, LR, ICs, model evidence
Parsimony and Occam’s Razor
Cross-Validation
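K-fold cross-validation estimates out-of-sample error by repeatedly fitting on k−1 folds and scoring on the held-out fold. A minimal sketch (polynomial regression on made-up linear data; the degrees compared are arbitrary) showing that CV penalizes a model that is too simple for the data:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 40
x = rng.uniform(-1, 1, n)
y = 1.0 + 2.0 * x + 0.2 * rng.normal(size=n)   # true model is linear

def cv_mse(degree, k=5):
    """k-fold cross-validated MSE of a polynomial fit of the given degree."""
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = np.polyfit(x[train], y[train], degree)
        errs.append(np.mean((np.polyval(coef, x[test]) - y[test])**2))
    return float(np.mean(errs))

mse_const  = cv_mse(0)   # underfit: constant model ignores the slope
mse_linear = cv_mse(1)   # matches the true model
```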
Likelihood Ratio Test for Nested Models
Akaike & Bayesian Information Criteria
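Both criteria trade fit against parameter count: AIC = 2k − 2 ln L̂ and BIC = k ln n − 2 ln L̂, where for Gaussian regression the maximized log-likelihood reduces to a function of the residual sum of squares. A sketch comparing polynomial degrees on synthetic linear data (data and degree range are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100
x = rng.uniform(-1, 1, n)
y = 1.0 + 2.0 * x + 0.3 * rng.normal(size=n)   # true model is degree 1

def aic_bic(degree):
    coef = np.polyfit(x, y, degree)
    rss = np.sum((np.polyval(coef, x) - y)**2)
    k = degree + 2   # polynomial coefficients plus the noise variance
    # Gaussian max log-likelihood with sigma^2 = RSS/n plugged in
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    aic = 2 * k - 2 * loglik
    bic = k * np.log(n) - 2 * loglik
    return aic, bic

scores = {d: aic_bic(d) for d in range(0, 6)}
```

Both criteria heavily penalize the constant model relative to the linear one, since the drop in likelihood dwarfs the one-parameter saving; BIC's ln n penalty is the harsher of the two on the higher-degree fits.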
Deviance Information Criterion
Model Selection Discussion
Bayesian Model Selection
Bayes Factors & Bias-Variance Tradeoffs
Bayesian Model Selection Example
interpretations, debates, and paradoxes
Philosophy
Bertrand Paradox
Bertrand Paradox: Jaynes’ Solution
Bertrand Paradox: Disambiguation
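The paradox is easy to reproduce numerically: three equally "uniform"-sounding ways of drawing a random chord of the unit circle give three different probabilities that the chord exceeds √3 (the side of the inscribed equilateral triangle), namely 1/3, 1/2, and 1/4. Simulation sketch (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(10)
N = 200000
s = np.sqrt(3)   # side length of the inscribed equilateral triangle

# Method 1: two independent uniform endpoints on the circle
a, b = rng.uniform(0, 2 * np.pi, (2, N))
len1 = 2 * np.abs(np.sin((a - b) / 2))
p1 = (len1 > s).mean()                      # -> 1/3

# Method 2: uniform midpoint distance along a random radius
d2 = rng.uniform(0, 1, N)
p2 = (2 * np.sqrt(1 - d2**2) > s).mean()    # -> 1/2

# Method 3: chord midpoint uniform over the disk
d3 = np.sqrt(rng.uniform(0, 1, N))          # radial distance of a uniform point
p3 = (2 * np.sqrt(1 - d3**2) > s).mean()    # -> 1/4
```

The disagreement is the whole point: "uniform at random" is underspecified until the sampling mechanism (the prior) is pinned down, which is Jaynes' resolution via invariance arguments.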
References
Lecture Notes
2004 Figueiredo, Lecture Notes on Bayesian Estimation and Classification
Martinez et al., Estimation and Detection
Books
2006 Bishop, Pattern Recognition and Machine Learning
2009 Hastie et al., The Elements of Statistical Learning
2012 MacKay, Information Theory, Inference, and Learning Algorithms
Wiki
http://wikipedia.org