
FA17 10-701 Homework 2 Recitation 2

Logan Brooks, Matthew Oresky, Guoquan Zhao

October 2, 2017


Outline

1 Code Vectorization
    Why
    Linear Regression Example

2 Regularization


How to Vectorize Code

The goal of vectorization is to write code that avoids explicit loops and instead uses whole-array operations. Why?

Efficient!

Whole-array operations dispatch to advanced algorithms for computation on matrices and vectors.
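
As a quick illustration (not from the original slides, and the exact timings will vary by machine), here is a minimal Octave micro-benchmark comparing a scalar loop against a single whole-array operation:

    % Compare a scalar loop with a vectorized inner product.
    n = 1e6;
    a = rand(n, 1);
    b = rand(n, 1);

    tic;
    s_loop = 0;
    for i = 1:n
      s_loop = s_loop + a(i) * b(i);   % one multiply-add per iteration
    end
    t_loop = toc;

    tic;
    s_vec = a' * b;                    % one whole-array operation
    t_vec = toc;

    printf("loop: %.4f s, vectorized: %.4f s\n", t_loop, t_vec);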


Linear Regression Example

Objective function:

    L = \sum_{i=1}^{n} (X^{(i)T} w - y_i)^2

X^{(i)T} is the ith row of the data matrix (1 x p).

w is a p x 1 column vector.

y is an n x 1 column vector.

    \frac{dL}{dw_j} = ?

    \frac{dL}{dw_j} = \sum_{i=1}^{n} 2 X^{(i)}_j (X^{(i)T} w - y_i)


Octave Code:
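
The code on the original slide was an image and did not survive extraction; below is a sketch of what a fully loop-based version plausibly looked like, assuming X is the n x p data matrix, w the p x 1 weight vector, and y the n x 1 target vector:

    % Fully loop-based gradient: dL/dw_j = sum_i 2*X(i,j)*(X(i,:)*w - y(i)),
    % with even the inner product X(i,:)*w computed by an explicit loop.
    grad = zeros(p, 1);
    for j = 1:p
      for i = 1:n
        xw = 0;
        for k = 1:p
          xw = xw + X(i, k) * w(k);
        end
        grad(j) = grad(j) + 2 * X(i, j) * (xw - y(i));
      end
    end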

How can we vectorize it?

One way: directly rewrite the formula. Remove the summation. We can easily remove one of the for loops, because the inner product xw can be computed as X(i,:)*w.
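
A sketch of that step, under the same assumptions as above:

    % The innermost loop is replaced by the row-vector product X(i,:)*w;
    % the remaining sum over i could similarly be replaced by sum(...).
    grad = zeros(p, 1);
    for j = 1:p
      for i = 1:n
        grad(j) = grad(j) + 2 * X(i, j) * (X(i, :) * w - y(i));
      end
    end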


Sometimes, however, the vectorization is not very intuitive.


Another way: build the matrix form from the formula. We already know that if the formula can be rewritten in matrix form, each entry of the result is a sum of products between a row and a column.

Pretend that you know the result is given by a matrix multiplication. Construct the matrices by figuring out which element should be put into which slot: draw two empty matrices and fill in the blanks.


    \frac{dL}{dw_j} = \sum_{i=1}^{n} 2 X^{(i)}_j (X^{(i)T} w - y_i)

According to the formula, the jth entry of dL/dw can be written as the product of a row and a column. The row could be

    [X^{(1)}_j, X^{(2)}_j, \ldots, X^{(n)}_j]

and the column could be

    [(X^{(1)T} w - y_1), (X^{(2)T} w - y_2), \ldots, (X^{(n)T} w - y_n)]^T

If we stack the data points row by row,

    X = \begin{bmatrix} X^{(1)T} \\ X^{(2)T} \\ \vdots \\ X^{(n)T} \end{bmatrix}
      = \begin{bmatrix}
          X^{(1)}_1 & X^{(1)}_2 & \cdots & X^{(1)}_p \\
          X^{(2)}_1 & X^{(2)}_2 & \cdots & X^{(2)}_p \\
          \vdots    & \vdots    & \ddots & \vdots    \\
          X^{(n)}_1 & X^{(n)}_2 & \cdots & X^{(n)}_p
        \end{bmatrix}    (1)

we can see that the jth row of X^T (i.e., the jth column of X) is exactly the row we constructed above, so the first matrix should be X^T. The column can be calculated as Xw - y. Stacking the entries over j,

    \frac{dL}{dw_j} = \sum_{i=1}^{n} 2 X^{(i)}_j (X^{(i)T} w - y_i)
    \quad\Longrightarrow\quad
    \frac{dL}{dw} = 2 X^T (Xw - y)
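
In Octave, the whole gradient then becomes a single whole-array expression (same assumed X, w, y as in the sketches above):

    % Fully vectorized gradient: no loops at all.
    grad = 2 * X' * (X * w - y);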


Use the same method for logistic regression if you cannot pass Autolab's autograder.
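
For instance, under the common convention of labels y in {0, 1} and the negative log-likelihood loss (an assumption; the homework's exact loss and conventions may differ), the analogous vectorized gradient is:

    % Vectorized logistic-regression gradient, assuming y in {0,1} and
    % L(w) = -sum_i [y_i*log(s_i) + (1 - y_i)*log(1 - s_i)],
    % where s_i = sigmoid(X(i,:)*w).
    sigmoid = @(z) 1 ./ (1 + exp(-z));
    grad = X' * (sigmoid(X * w) - y);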


Question 1

Which of the following is true about regularization?

a) One of the goals of regularization is to combat underfitting.

b) The L0 norm is rarely used in practice because it is non-differentiable and non-convex.

c) A model with regularization fits the training data better than a model without regularization.

d) One way to understand regularization is that it makes the learning algorithm prefer "more complex" solutions.

Answer

b) The goal of regularization is to combat overfitting, so a model without regularization will fit the training data better. Regularization makes the solution "simpler" by penalizing large or non-zero weights.
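
As a hedged illustration of penalizing large weights (not from the slides): adding an L2 penalty \lambda \|w\|^2 to the squared-error objective used earlier simply adds one term to the vectorized gradient:

    % Gradient of sum_i (X(i,:)*w - y(i))^2 + lambda * (w' * w),
    % where lambda is a regularization strength we introduce here.
    grad = 2 * X' * (X * w - y) + 2 * lambda * w;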


Question 2

For a typical minimization problem with L1 regularization:

    \min_\beta f(\beta) + \lambda \|\beta\|_1

where f is the objective function, which statement is true about λ?

a) Larger λ leads to a sparser solution β and a possible underfit of the data.

b) Larger λ leads to a sparser solution β and a possible overfit of the data.

c) Larger λ leads to a denser solution β and a possible underfit of the data.

d) Larger λ leads to a denser solution β and a possible overfit of the data.

Answer

a) Larger λ leads to a sparser solution β and a possible underfit of the data.
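
A hedged one-dimensional illustration of why larger λ gives sparsity: for the special case f(\beta) = \frac{1}{2}(\beta - z)^2, the L1-regularized minimizer is the soft-thresholding operator, which zeroes out small coordinates as λ grows:

    % soft(z, lambda) minimizes 0.5*(b - z).^2 + lambda*abs(b), elementwise.
    soft = @(z, lambda) sign(z) .* max(abs(z) - lambda, 0);
    z = [3; -1.5; 0.4; -0.2];
    disp(soft(z, 0.5))   % -> [2.5; -1; 0; 0]  (two entries already zeroed)
    disp(soft(z, 2.0))   % -> [1; 0; 0; 0]     (only the largest survives)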


Question 3

Consider a binary classification problem. Suppose I have trained a model on a linearly separable training set, and then I get a new labeled data point which is correctly classified by the model and is far away from the decision boundary. If I now append this new point to my training set and re-train, would the learned decision boundary change for the following models?

a) Perceptron

b) Logistic Regression (using gradient descent)

Answer

a) No. If the perceptron has already converged, a correctly classified point will have no effect.

b) Yes. Logistic regression will include this data point's gradient in its gradient calculations, and this will affect the learned weight vector. (However, the weight vector likely won't change by a meaningful amount.)
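
The perceptron case can be read directly off its update rule (a sketch, assuming labels y_i in {-1, +1} and column vectors w and x_i):

    % Perceptron update: the weights move only on a mistake, so a point
    % that is already classified correctly leaves w unchanged.
    if y_i * (w' * x_i) <= 0      % misclassified (or on the boundary)
      w = w + y_i * x_i;
    end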


For Further Reading

Octave documentation, Basic Vectorization: https://www.gnu.org/software/octave/doc/v4.2.1/Basic-Vectorization.html
