Logistic Regression Rong Jin

Middle Term Exam

DESCRIPTION

Middle Term Exam: 02/28 (Thursday), take-home; turn in by noon on 02/29 (Friday). Project: 03/14 (Phase 1): 10% of the training data is available for algorithm development. 04/04 (Phase 2): the full training data and test examples are available.

Page 1: Middle Term Exam

Logistic Regression

Rong Jin

Page 2: Middle Term Exam

Logistic Regression

• Generative models often lead to a linear decision boundary

• Linear discriminative model

• Directly model the linear decision boundary

• w is the parameter to be decided
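A minimal sketch of the kind of model this slide describes, assuming the common form p(y|x) = 1 / (1 + exp(-y (w·x + c))) with labels y in {+1, -1}. The bias term `c` and the function names are illustrative assumptions, not taken from the slides.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def prob(y: int, x: list, w: list, c: float = 0.0) -> float:
    """p(y | x) for y in {+1, -1} under the linear score w.x + c (assumed form)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + c
    return sigmoid(y * score)

# The decision boundary is the set of x where w.x + c = 0, i.e. p(+1 | x) = 0.5.
p_on_boundary = prob(+1, [1.0, 2.0], [0.5, -0.25])
```

The boundary is linear in x because the probability depends on x only through the score w·x + c.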

Page 3: Middle Term Exam

Logistic Regression

Page 4: Middle Term Exam

Logistic Regression

Learn parameter w by Maximum Likelihood Estimation (MLE)

• Given training data
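The formulas for this slide did not survive extraction. As a hedged reconstruction, the standard MLE objective for logistic regression with labels in {+1, -1} can be written as follows; the symbols n, c, and l(w) are assumptions and may differ from the original slide's notation.

```latex
\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}, \qquad y_i \in \{+1, -1\}

l(w) = \sum_{i=1}^{n} \log p(y_i \mid x_i)
     = -\sum_{i=1}^{n} \log\!\left(1 + \exp\!\bigl(-y_i\,(w \cdot x_i + c)\bigr)\right)

w^{*} = \arg\max_{w} \; l(w)
```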

Page 5: Middle Term Exam

Logistic Regression

• Convex objective function, global optimum

• Gradient descent

Classification error
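A sketch of plain gradient descent on the negative log-likelihood, assuming labels in {+1, -1} and a fixed step size. The factor `1 - p(y | x)` in the gradient is large when the model is wrong about an example, which is plausibly what the "Classification error" label refers to. The toy data and all names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def neg_log_likelihood(w, data):
    """Sum over examples of log(1 + exp(-y * w.x)), computed stably."""
    total = 0.0
    for x, y in data:
        m = y * dot(w, x)
        total += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))
    return total

def gradient(w, data):
    """Gradient of the negative log-likelihood with respect to w."""
    g = [0.0] * len(w)
    for x, y in data:
        err = 1.0 - sigmoid(y * dot(w, x))  # 1 - p(y | x)
        for j, xj in enumerate(x):
            g[j] -= y * xj * err
    return g

def gradient_descent(data, dim, step=0.1, iters=200):
    """Fixed-step gradient descent starting from w = 0."""
    w = [0.0] * dim
    for _ in range(iters):
        g = gradient(w, data)
        w = [wi - step * gi for wi, gi in zip(w, g)]
    return w

# Hypothetical toy data (not from the slides): features are [1 (bias), x].
toy_data = [([1.0, 1.0], -1), ([1.0, 2.0], 1), ([1.0, 3.0], -1), ([1.0, 4.0], 1)]
w_fit = gradient_descent(toy_data, dim=2)
```

Because the objective is convex, any point where the gradient vanishes is a global optimum; the fixed step here is small enough for this toy problem but is not a general-purpose choice.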

Page 7: Middle Term Exam

Illustration of Gradient Descent

Page 8: Middle Term Exam

How to Decide the Step Size?

• Backtracking line search
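A minimal sketch of backtracking line search using the Armijo sufficient-decrease condition. The parameters `alpha`, `beta`, `t0`, and `max_shrinks` are conventional but assumed here; the slides do not give specific values.

```python
def backtracking_step(f, grad_f, w, t0=1.0, alpha=0.3, beta=0.5, max_shrinks=50):
    """Shrink the step t until f(w - t*g) <= f(w) - alpha * t * ||g||^2."""
    g = grad_f(w)
    g_norm2 = sum(gi * gi for gi in g)
    fw = f(w)
    t = t0
    w_new = list(w)
    for _ in range(max_shrinks):
        w_new = [wi - t * gi for wi, gi in zip(w, g)]
        if f(w_new) <= fw - alpha * t * g_norm2:
            break
        t *= beta  # step too large: shrink and try again
    return t, w_new

# Usage on a simple quadratic f(w) = sum(w_i^2), whose gradient is 2w.
f = lambda w: sum(wi * wi for wi in w)
grad_f = lambda w: [2.0 * wi for wi in w]
t, w_next = backtracking_step(f, grad_f, [1.0, 1.0])
```

On this quadratic the full step t = 1 overshoots to [-1, -1] and is rejected, while the halved step t = 0.5 lands on the minimizer and is accepted.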

Page 9: Middle Term Exam

Example: Heart Disease

• Input feature x: age group id

• Output y: whether the person has heart disease

• y=1: having heart disease

• y=-1: no heart disease

Age groups: 1: 25-29, 2: 30-34, 3: 35-39, 4: 40-44, 5: 45-49, 6: 50-54, 7: 55-59, 8: 60-64

[Figure: bar chart of number of people (0 to 10) by age group (1 to 8); series: No heart disease, Heart disease]

Page 10: Middle Term Exam

Example: Heart Disease

[Figure: bar chart of number of people (0 to 10) by age group (1 to 8); series: No heart disease, Heart disease]

Page 11: Middle Term Exam

Example: Text Categorization

Learn to classify text into two categories

• Input d: a document, represented by a word histogram

• Output y=±1: +1 for political document, -1 for non-political document
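A small sketch of the word-histogram representation and the linear score a logistic regression model would compute on it. The tokenizer, the example sentence, and the weights are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

def word_histogram(text):
    """Map a document to word counts (lowercased, letters only)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

def linear_score(hist, weights, bias=0.0):
    """Score = bias + sum over words of count(word) * weight(word)."""
    return bias + sum(count * weights.get(word, 0.0) for word, count in hist.items())

# Hypothetical document and weights, for illustration only.
hist = word_histogram("The senate passed the bill")
score = linear_score(hist, {"senate": 1.5, "the": -0.25})
```

A positive score would favor y = +1 (political) and a negative score y = -1, once the score is passed through the logistic function.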

Page 12: Middle Term Exam

Example: Text Categorization

• Training data

Page 13: Middle Term Exam

Example 2: Text Classification

• Dataset: Reuters-21578

• Classification accuracy

• Naïve Bayes: 77%

• Logistic regression: 88%

Page 14: Middle Term Exam

Logistic Regression vs. Naïve Bayes

• Both have linear decision boundaries

• Naïve Bayes:

• Logistic regression: learn weights by MLE

• Both can be viewed as modeling p(d|y)

• Naïve Bayes: independence assumption

• Logistic regression: assume an exponential family distribution for p(d|y) (a broad assumption)

Page 15: Middle Term Exam

Logistic Regression vs. Naïve Bayes

Page 16: Middle Term Exam

Discriminative vs. Generative

Discriminative Models

Model P(y|x)

Pros

• Usually good performance

Cons

• Slow convergence

• Expensive computation

• Sensitive to noisy data

Generative Models

Model P(x|y)

Pros

• Usually fast convergence

• Cheap computation

• Robust to noisy data

Cons

• Usually performs worse

Page 17: Middle Term Exam

Overfitting Problem

Consider text categorization

• What is the weight for a word j that appears in only one training document dk?
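One way to work out the answer, assuming the unregularized log-likelihood l(w) with labels in {+1, -1} and writing x_{ij} for the count of word j in document d_i (notation assumed here, not taken from the slide). Since x_{ij} = 0 for every i other than k, only document d_k contributes to the derivative:

```latex
\frac{\partial l}{\partial w_j}
  = \sum_{i} y_i\, x_{ij}\,\bigl(1 - p(y_i \mid d_i)\bigr)
  = y_k\, x_{kj}\,\bigl(1 - p(y_k \mid d_k)\bigr)
```

Because 0 < p(y_k | d_k) < 1 and x_{kj} > 0, this derivative always has the sign of y_k and never reaches zero, so maximizing the likelihood pushes w_j toward +∞ when y_k = +1 and toward -∞ when y_k = -1.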

Page 18: Middle Term Exam

Overfitting Problem

Page 19: Middle Term Exam

Overfitting Problem

[Figure: curves plotted against iteration; series: Using regularization, Without regularization]

Decrease in the classification accuracy of test data

Page 20: Middle Term Exam

Solution: Regularization

Regularized log-likelihood

The effects of the regularizer

• Favor small weights

• Guarantee bounded norm of w

• Guarantee the unique solution
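A sketch of the bounded-norm effect, assuming an L2 penalty of the form lam * w**2 subtracted from the log-likelihood (the slide's exact regularizer did not survive extraction). On linearly separable data the unregularized weight keeps growing with more iterations, while the regularized weight settles at a finite value. The data and parameter values are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_1d(data, lam, step=0.5, iters=500):
    """Gradient descent on sum_i log(1 + exp(-y_i * w * x_i)) + lam * w**2."""
    w = 0.0
    for _ in range(iters):
        g = 2.0 * lam * w  # gradient of the L2 penalty
        for x, y in data:
            g -= y * x * (1.0 - sigmoid(y * w * x))
        w -= step * g
    return w

# Hypothetical, perfectly separable 1-D data: (feature, label).
separable = [(1.0, 1), (-1.0, -1)]
w_plain = fit_1d(separable, lam=0.0)  # keeps growing as iters increases
w_reg = fit_1d(separable, lam=0.1)    # converges to a finite value
```

With a strictly positive `lam` the objective becomes strictly convex, which is consistent with the uniqueness claim in the last bullet.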

Page 21: Middle Term Exam

Regularized Logistic Regression

[Figure: classification performance against iteration; series: Using regularization, Without regularization]

Classification performance by regularization

Page 22: Middle Term Exam

Regularization as Robust Optimization

• Assume each data point is unknown but bounded in a sphere of radius r and center xi
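A short worked step showing why this assumption leads to a norm penalty, assuming an L2 ball and a perturbation vector δ_i (both are notation introduced here; the slide's own derivation is not preserved). The worst-case margin of example i over all points in the sphere is

```latex
\min_{\|\delta_i\|_2 \le r} \; y_i\, w \cdot (x_i + \delta_i)
  = y_i\, w \cdot x_i - r\,\|w\|_2
```

The minimum is attained at δ_i = -r y_i w / ||w||_2 by the Cauchy-Schwarz inequality. Optimizing against the worst case therefore reduces every margin by r||w||_2, which penalizes large weights in much the same way a regularizer does.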

Page 23: Middle Term Exam

Sparse Solution by Lasso Regularization

RCV1 collection:

• 800K documents

• 47K unique words

Page 24: Middle Term Exam

Sparse Solution by Lasso Regularization

How to solve the optimization problem?

• Subgradient descent

• Minimax
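A minimal sketch of subgradient descent for L1-regularized logistic regression, assuming the objective sum_i log(1 + exp(-y_i w·x_i)) + lam * ||w||_1 and a diminishing step t0 / sqrt(k + 1). The toy data, `lam`, and `t0` are hypothetical. Subgradient steps are not guaranteed to decrease the objective every iteration, so the sketch tracks the best iterate seen.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def objective(w, data, lam):
    """Negative log-likelihood plus lam * ||w||_1."""
    nll = 0.0
    for x, y in data:
        m = y * dot(w, x)
        nll += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))
    return nll + lam * sum(abs(wj) for wj in w)

def sign(v):
    """Sign of v; 0 at v = 0, which is a valid subgradient of |v| there."""
    return (v > 0) - (v < 0)

def subgradient_descent(data, dim, lam, t0=0.1, iters=300):
    w = [0.0] * dim
    best_w, best_obj = list(w), objective(w, data, lam)
    for k in range(iters):
        g = [0.0] * dim
        for x, y in data:
            err = 1.0 - sigmoid(y * dot(w, x))
            for j, xj in enumerate(x):
                g[j] -= y * xj * err
        t = t0 / math.sqrt(k + 1)  # diminishing step size
        w = [wj - t * (gj + lam * sign(wj)) for wj, gj in zip(w, g)]
        obj = objective(w, data, lam)
        if obj < best_obj:
            best_w, best_obj = list(w), obj
    return best_w, best_obj

# Hypothetical toy data: features are [1 (bias), x].
toy_data = [([1.0, 1.0], -1), ([1.0, 2.0], 1), ([1.0, 3.0], -1), ([1.0, 4.0], 1)]
w_l1, obj_l1 = subgradient_descent(toy_data, dim=2, lam=0.1)
```

Plain subgradient descent rarely produces exact zeros in w; methods that do (for example, soft-thresholding updates) are a different technique and are not shown here.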

Page 25: Middle Term Exam

Bayesian Treatment

• Compute the posterior distribution of w

• Laplace approximation
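The slide's formulas are missing from the extraction. As a hedged sketch, the approximation in its textbook form replaces the posterior with a Gaussian centered at the mode. The prior N(0, σ_0² I), the labels in {+1, -1}, and the symbols w_MAP, H, and σ_i are assumptions introduced here.

```latex
w_{\mathrm{MAP}} = \arg\max_{w} \; \log p(w \mid \mathcal{D})

H = -\nabla^2_w \log p(w \mid \mathcal{D}) \Big|_{w = w_{\mathrm{MAP}}}
  = \sum_{i=1}^{n} \sigma_i (1 - \sigma_i)\, x_i x_i^{\top} + \frac{1}{\sigma_0^{2}} I,
  \qquad \sigma_i = \frac{1}{1 + \exp(-w_{\mathrm{MAP}} \cdot x_i)}

p(w \mid \mathcal{D}) \approx \mathcal{N}\!\left(w \mid w_{\mathrm{MAP}},\; H^{-1}\right)
```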

Page 26: Middle Term Exam

Bayesian Treatment

• Laplace approximation

Page 27: Middle Term Exam

Multi-class Logistic Regression

• How to extend the logistic regression model to multi-class classification?

Page 28: Middle Term Exam

Conditional Exponential Model

• Let classes be

• Need to learn

Normalization factor (partition function)
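A minimal sketch of a conditional exponential model, assuming the usual form p(y|x) = exp(w_y·x) / Z(x), where Z(x) sums exp(w_y'·x) over all classes. The function name and the list-of-lists weight layout are illustrative assumptions.

```python
import math

def class_probabilities(x, weights):
    """Return p(y | x) for each class y, where weights[y] is the vector w_y."""
    scores = [sum(wj * xj for wj, xj in zip(w_y, x)) for w_y in weights]
    top = max(scores)                     # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)                         # normalization factor (partition function)
    return [e / z for e in exps]

# Three classes with identical (zero) weights give a uniform distribution.
probs = class_probabilities([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
```

Subtracting the largest score before exponentiating leaves the probabilities unchanged because the common factor cancels between numerator and Z(x).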

Page 29: Middle Term Exam

Conditional Exponential Model

• Learn weights ws by maximum likelihood estimation

• Any problem?
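One commonly noted issue with this parameterization, which may or may not be the one the slide has in mind, is that the weights are not uniquely determined. Assuming p(y|x) = exp(w_y·x) / Σ_{y'} exp(w_{y'}·x), adding the same vector v to every class weight leaves all probabilities unchanged:

```latex
\frac{\exp\bigl((w_y + v) \cdot x\bigr)}{\sum_{y'} \exp\bigl((w_{y'} + v) \cdot x\bigr)}
  = \frac{\exp(v \cdot x)\,\exp(w_y \cdot x)}{\exp(v \cdot x)\,\sum_{y'} \exp(w_{y'} \cdot x)}
  = \frac{\exp(w_y \cdot x)}{\sum_{y'} \exp(w_{y'} \cdot x)}
```

A frequent remedy is to fix the weights of one reference class to zero, or to add a regularizer, so that the maximum likelihood solution becomes unique.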

Page 30: Middle Term Exam

Modified Conditional Exponential Model