CSC 4510 – Machine Learning
Dr. Mary-Angela Papalaskari
Department of Computing Sciences
Villanova University

Course website: www.csc.villanova.edu/~map/4510/

5: Multivariate Regression

The slides in this presentation are adapted from:
• Andrew Ng’s ML course http://www.ml-class.org/


Regression topics so far
• Introduction to linear regression
• Intuition – least squares approximation
• Intuition – gradient descent algorithm
• Hands on: Simple example using Excel
• How to apply gradient descent to minimize the cost function for regression
• Linear algebra refresher


What’s next?
• Multivariate regression
• Gradient descent revisited
  – Feature scaling and normalization
  – Selecting a good value for α
• Non-linear regression
• Solving for θ analytically (Normal Equation)
• Using Octave to solve regression problems


Multiple features (variables).

Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
2104           5                    1                  45                    460
1416           3                    2                  40                    232
1534           3                    2                  30                    315
852            2                    1                  36                    178
…              …                    …                  …                     …


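Notation:
• $n$ = number of features
• $x^{(i)}$ = input (features) of the $i$-th training example
• $x_j^{(i)}$ = value of feature $j$ in the $i$-th training example

For the housing data, $n = 4$; the second training example is $x^{(2)} = (1416, 3, 2, 40)$, so $x_3^{(2)} = 2$.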


Multiple features (variables).

Size (feet²)   Price ($1000)
2104           460
1416           232
1534           315
852            178
…              …


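Multivariate linear regression

Hypothesis:
Previously: $h_\theta(x) = \theta_0 + \theta_1 x$

Now: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$

For convenience of notation, define $x_0 = 1$, so that $h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta^T x$.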


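Hypothesis: $h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n$

Parameters: $\theta_0, \theta_1, \ldots, \theta_n$

Cost function: $J(\theta_0, \theta_1, \ldots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Gradient descent:
Repeat {
  $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \ldots, \theta_n)$
}
(simultaneously update for every $j = 0, \ldots, n$)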


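Gradient Descent

Previously (n=1):
Repeat {
  $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
  $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}
(simultaneously update $\theta_0, \theta_1$)

New algorithm ($n \ge 1$):
Repeat {
  $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
}
(simultaneously update $\theta_j$ for $j = 0, \ldots, n$)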
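The update rule for gradient descent with several features can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the toy data set (generated from y = 1 + 2·x1 + 3·x2), the learning rate 0.1 and the iteration count are all made up for the example and are not taken from the slides. Each row of X already includes x0 = 1.

```python
def predict(theta, x):
    """Hypothesis h_theta(x) = theta^T x, where x already includes x_0 = 1."""
    return sum(t * xj for t, xj in zip(theta, x))


def cost(theta, X, y):
    """Squared-error cost J(theta) = 1/(2m) * sum of squared residuals."""
    m = len(X)
    return sum((predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)


def gradient_step(theta, X, y, alpha):
    """One simultaneous update of every theta_j."""
    m = len(X)
    errors = [predict(theta, xi) - yi for xi, yi in zip(X, y)]
    # Every new theta_j is computed from the old theta before any is replaced.
    return [
        theta[j] - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
        for j in range(len(theta))
    ]


def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Start from theta = 0 and repeat the update a fixed number of times."""
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        theta = gradient_step(theta, X, y, alpha)
    return theta


# Toy data (hypothetical): y = 1 + 2*x1 + 3*x2, with x_0 = 1 prepended to each row.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]

theta = gradient_descent(X, y)
print([round(t, 3) for t in theta])
print(cost(theta, X, y))
```

Building the whole new parameter list before assigning it is what makes the update simultaneous; updating theta in place inside the loop over j would mix old and new values.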


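Feature Scaling
Idea: Make sure features are on a similar scale.

E.g. $x_1$ = size (0–2000 feet²)
     $x_2$ = number of bedrooms (1–5)

$x_1 = \frac{\text{size (feet}^2)}{2000}$, $x_2 = \frac{\text{number of bedrooms}}{5}$

Get every feature into approximately a $-1 \le x_i \le 1$ range.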


Mean normalization
Replace $x_i$ with $x_i - \mu_i$ to make features have approximately zero mean (do not apply to $x_0 = 1$).

E.g. with $x_1$ = size (0–2000 feet²) and $x_2$ = number of bedrooms (1–5):
$x_1 = \frac{\text{size} - 1000}{2000}$, $x_2 = \frac{\#\text{bedrooms} - 2}{5}$
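A minimal sketch of mean normalization in plain Python follows. It is an illustration, not the course's reference code: the function name is hypothetical, and dividing by the observed range (max minus min) is one assumed choice of scale, where the slide divides by an approximate range. The rows hold only the size and bedroom columns of the housing table; x0 = 1 is prepended after scaling so that it is left untouched.

```python
def mean_normalize(X):
    """Scale each column of X to (value - mean) / (max - min).

    X is a list of rows that does NOT yet include the constant x_0 = 1.
    Returns the scaled rows plus the per-feature means and ranges, which
    are needed later to scale new inputs the same way.
    """
    m, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(n)]
    ranges = [max(row[j] for row in X) - min(row[j] for row in X) for j in range(n)]
    # A constant column has range 0; leave it centered but unscaled.
    ranges = [r if r != 0 else 1.0 for r in ranges]
    scaled = [[(row[j] - means[j]) / ranges[j] for j in range(n)] for row in X]
    return scaled, means, ranges


# Size (feet^2) and number of bedrooms from the housing table.
X = [[2104, 5], [1416, 3], [1534, 3], [852, 2]]
scaled, means, ranges = mean_normalize(X)

# Prepend x_0 = 1 only after scaling.
X_design = [[1.0] + row for row in scaled]
for row in X_design:
    print([round(v, 3) for v in row])
```

Keeping `means` and `ranges` matters in practice: a new house must be transformed with the same values before the learned parameters are applied to it.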


Gradient descent

- “Debugging”: How to make sure gradient descent is working correctly.

- How to choose learning rate $\alpha$.


Making sure gradient descent is working correctly.

[Plot: $J(\theta)$ against the number of iterations (0 to 400)]

- For sufficiently small $\alpha$, $J(\theta)$ should decrease on every iteration.
- But if $\alpha$ is too small, gradient descent can be slow to converge.

Declare convergence if $J(\theta)$ decreases by less than some small threshold $\epsilon$ in one iteration?


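Summary: Choosing $\alpha$
- If $\alpha$ is too small: slow convergence.
- If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration; may not converge.

To choose $\alpha$, try a range of values, e.g. …, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …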
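One way to compare learning rates is to record $J(\theta)$ after every iteration and check whether it keeps falling. The sketch below does that in plain Python on a made-up data set (y = 1 + 2·x1 + 3·x2); the function names, the candidate values of α and the 50-iteration budget are assumptions chosen for illustration only.

```python
def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum of squared residuals."""
    m = len(X)
    return sum(
        (sum(t * v for t, v in zip(theta, row)) - target) ** 2
        for row, target in zip(X, y)
    ) / (2 * m)


def step(theta, X, y, alpha):
    """One simultaneous gradient descent update of all parameters."""
    m = len(X)
    errors = [
        sum(t * v for t, v in zip(theta, row)) - target
        for row, target in zip(X, y)
    ]
    return [
        theta[j] - alpha * sum(e * row[j] for e, row in zip(errors, X)) / m
        for j in range(len(theta))
    ]


def cost_history(X, y, alpha, iterations=50):
    """Run gradient descent from theta = 0 and record J after every step."""
    theta = [0.0] * len(X[0])
    history = [cost(theta, X, y)]
    for _ in range(iterations):
        theta = step(theta, X, y, alpha)
        history.append(cost(theta, X, y))
    return history


# Toy data (hypothetical), with x_0 = 1 as the first column.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]

results = {}
for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
    history = cost_history(X, y, alpha)
    decreasing = all(later < earlier for earlier, later in zip(history, history[1:]))
    results[alpha] = (decreasing, history[-1])
    print(f"alpha={alpha}: decreasing={decreasing}, final J={history[-1]:.4g}")
```

On this data the smallest rates decrease steadily but have barely moved after 50 iterations, while the largest one makes the cost grow, which matches the two failure modes listed in the summary.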


Housing prices prediction


Polynomial regression

[Plot: Price (y) against Size (x)]


Choice of features

[Plot: Price (y) against Size (x)]
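Polynomial regression can reuse the linear machinery by treating powers of the size as extra features. The sketch below builds such features in plain Python; the function name, the cubic degree and the division by 1000 are assumptions made for illustration, not choices stated on the slides.

```python
def polynomial_features(x, degree):
    """Map a single input x to the feature vector [1, x, x^2, ..., x^degree]."""
    return [x ** d for d in range(degree + 1)]


# Sizes (feet^2) from the housing table, divided by 1000 (an assumed scaling)
# so that the higher powers stay in a comparable range.
sizes = [852, 1416, 1534, 2104]
X = [polynomial_features(s / 1000, 3) for s in sizes]

for row in X:
    print([round(v, 3) for v in row])
```

Each row can then be fed to ordinary multivariate linear regression. Because size, size² and size³ differ by orders of magnitude when left unscaled, feature scaling becomes especially important with features like these.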


Gradient Descent

Normal equation: Method to solve for $\theta$ analytically.


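Intuition: If 1D ($\theta \in \mathbb{R}$): $J(\theta) = a\theta^2 + b\theta + c$. Set $\frac{d}{d\theta} J(\theta) = 0$ and solve for $\theta$.

For $\theta \in \mathbb{R}^{n+1}$: set $\frac{\partial}{\partial \theta_j} J(\theta) = 0$ (for every $j$) and solve for $\theta_0, \theta_1, \ldots, \theta_n$.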


Examples: $m = 4$.

$x_0$   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1       2104           5                    1                  45                    460
1       1416           3                    2                  40                    232
1       1534           3                    2                  30                    315
1       852            2                    1                  36                    178

$X = \begin{bmatrix} 1 & 2104 & 5 & 1 & 45 \\ 1 & 1416 & 3 & 2 & 40 \\ 1 & 1534 & 3 & 2 & 30 \\ 1 & 852 & 2 & 1 & 36 \end{bmatrix}$, $y = \begin{bmatrix} 460 \\ 232 \\ 315 \\ 178 \end{bmatrix}$


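$m$ examples $(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})$; $n$ features.

$x^{(i)} = \begin{bmatrix} x_0^{(i)} & x_1^{(i)} & \cdots & x_n^{(i)} \end{bmatrix}^T \in \mathbb{R}^{n+1}$, and the matrix $X$ has $(x^{(i)})^T$ as its $i$-th row.

E.g. If $x^{(i)} = \begin{bmatrix} 1 \\ x_1^{(i)} \end{bmatrix}$, then $X = \begin{bmatrix} 1 & x_1^{(1)} \\ \vdots & \vdots \\ 1 & x_1^{(m)} \end{bmatrix}$.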


$\theta = (X^T X)^{-1} X^T y$

$(X^T X)^{-1}$ is the inverse of the matrix $X^T X$.

Octave: pinv(X'*X)*X'*y
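For readers without Octave at hand, the same computation can be sketched in plain Python. Note the swapped-in technique: instead of forming a pseudo-inverse as `pinv` does, this sketch solves the linear system $(X^T X)\theta = X^T y$ by Gaussian elimination, so it fails when $X^T X$ is singular. That is the case for the four-row housing table above, which has five columns but only four rows. The helper names and the toy data (y = 1 + 2·x1 + 3·x2) are hypothetical.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Matrix product of two lists-of-rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular (or nearly so)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def normal_equation(X, y):
    """Return theta solving (X^T X) theta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Toy data (hypothetical), with x_0 = 1 as the first column.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]

theta = normal_equation(X, y)
print([round(t, 6) for t in theta])
```

No learning rate and no iteration count appear anywhere in this version; the cost is the elimination step, which grows quickly with the number of features.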


$m$ training examples, $n$ features.

Gradient Descent
• Need to choose $\alpha$.
• Needs many iterations.
• Works well even when $n$ is large.

Normal Equation
• No need to choose $\alpha$.
• Don’t need to iterate.
• Need to compute $(X^T X)^{-1}$.
• Slow if $n$ is very large.


Notes on Supervised learning and Regression

http://see.stanford.edu/materials/aimlcs229/cs229-notes1.pdf

Octave: http://www.gnu.org/software/octave/

Wiki: http://www.octave.org/wiki/index.php?title=Main_Page

Documentation: http://www.gnu.org/software/octave/doc/interpreter/


Exercise
For next class:

1. Download and install Octave. (Alternative: if you have MATLAB, you can use it instead.)

2. Verify that it is working by typing in an Octave command window:

   x = [0 1 2 3]
   y = [0 2 4 6]
   plot(x,y)

   This example defines two vectors, x and y, and should display a plot showing a straight line (the line y=2x). If you get an error at this point, it may be that gnuplot is not installed or cannot access your display. If you are unable to get this to work, you can still do the rest of this exercise, because it does not involve any plotting (just restart Octave). You might refer to the Octave wiki for installation help, but if you are stuck, you can get some help troubleshooting this on Friday afternoon 3-4pm in the software engineering lab (Mendel 159).

3. Create a few matrices and vectors, e.g.:

   A = [1 2; 3 4; 5 6]
   V = [3 5 -1 0 7]

4. Try some of the elementary matrix and vector operations from our linear algebra slides (adding and multiplying matrices, vectors and scalars).

5. Print out a log of your session.