
02 Linear Regression with One Variable - Andrew Ng


Page 1:

Lecture 2: Supervised Learning - Linear Regression with One Variable


Reading:
• Chapter 17, “Bayesian Reasoning and Machine Learning”, pages 345-348
• Chapter 3, “Pattern Recognition and Machine Learning”, Christopher M. Bishop, page 137
• Chapter 11, “Data Mining: A Knowledge Discovery Approach”, from page 346
• Chapter 18, “Artificial Intelligence: A Modern Approach”, from page 718

Page 2:

Model representation


Page 3:

[Figure: Housing Prices (Portland, OR) - Price (in 1000s of dollars) vs. Size (feet2).]

Supervised Learning

Given the “right answer” for each example in the data.

Regression problem: predict real-valued output.

Classification problem: predict discrete-valued output.

Page 4:

Notation:

m = number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable

Training set of housing prices (Portland, OR):

Size in feet2 (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

(x, y) - one training example
(x(i), y(i)) - the ith training example; i is an index into the training set

x(1) = 2104, x(3) = 1534, y(4) = 178, y(2) = 232
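As a concrete sketch of this notation in Python (the array values are the table rows above; note that the slides' notation is 1-indexed while Python arrays are 0-indexed):

import numpy as np

# Training set from the table above (only the 4 rows shown)
x = np.array([2104, 1416, 1534, 852])   # size in feet^2
y = np.array([460, 232, 315, 178])      # price ($) in 1000's

m = len(x)      # m = number of training examples
print(x[0])     # x(1) in the slides' notation -> 2104
print(y[3])     # y(4) -> 178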

Page 5:

Training Set → Learning Algorithm → h

Size of house → h → Estimated price

How do we represent h?

x → hypothesis h → estimated value of y

h is a function: h maps from x’s to y’s.

hθ(x) = θ0 + θ1x        (shorthand: h(x))

This model, linear regression with one variable, is also called univariate linear regression.
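A minimal sketch of this hypothesis in Python (the parameter values below are made up purely for illustration):

def h(x, theta0, theta1):
    """Univariate linear regression hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Estimated price (in $1000's) of a 2104 ft^2 house for made-up parameters
print(h(2104, 50.0, 0.12))   # -> 302.48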

Page 6:

In summary

• A hypothesis h takes in some variable(s)
• Uses parameters determined by a learning system
• Outputs a prediction based on that input


Page 7:

Cost function

• A cost function lets us measure how well a given straight line fits our data, so we can choose the best one


Page 8:

How do we choose the parameters θ0, θ1?

Training set:

Size in feet2 (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Hypothesis: hθ(x) = θ0 + θ1x
θ0, θ1: parameters

Page 9:

[Figure: three plots of hθ(x) for different parameter values.]

h(x) = 1.5 + 0·x        (θ0 = 1.5, θ1 = 0: a horizontal line)
h(x) = 0 + 0.5·x        (θ0 = 0, θ1 = 0.5)
h(x) = 1 + 0.5·x        (θ0 = 1, θ1 = 0.5)

Different parameter values give different functions. The slope is positive if θ1 > 0.

Page 10:


• hθ(x) is a "y imitator": it tries to map each x to its y
• Since we already have the y values, we can evaluate how well hθ(x) does this

Idea: Choose θ0, θ1 so that hθ(x) is close to y for our training examples (x, y).

Page 11:

Idea: Choose θ0, θ1 so that hθ(x) is close to y for our training examples (x, y).

This is a minimization problem: minimize the deviation of hθ(x) from y.

Page 12:

hθ(x(i)) = θ0 + θ1x(i)

Cost function:
J(θ0, θ1) = (1/(2m)) Σ_{i=1}^{m} (hθ(x(i)) − y(i))²

Goal: minimize J(θ0, θ1) over θ0 and θ1.

Page 13:

J(θ0, θ1) = (1/(2m)) Σ_{i=1}^{m} (hθ(x(i)) − y(i))²

• This cost function is also called the squared error cost function
 – A reasonable choice for most regression problems
 – Probably the most commonly used cost function (see the sketch below)
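A direct Python translation of this cost function (a sketch; x and y are NumPy arrays holding the m training inputs and targets):

import numpy as np

def J(theta0, theta1, x, y):
    """Squared error cost: J(theta0, theta1) = 1/(2m) * sum_i (h(x_i) - y_i)^2."""
    m = len(x)
    predictions = theta0 + theta1 * x            # h_theta(x_i) for all i at once
    return np.sum((predictions - y) ** 2) / (2 * m)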

Page 14:

Cost function intuition I (simplified version)

Hypothesis: hθ(x) = θ0 + θ1x
Parameters: θ0, θ1
Cost function: J(θ0, θ1) = (1/(2m)) Σ_{i=1}^{m} (hθ(x(i)) − y(i))²
Goal: minimize J(θ0, θ1)

Simplified version: set θ0 = 0, so hθ(x) = θ1x (a line through the origin) and we minimize J(θ1) over θ1 alone.

Page 15:

[Figure: left, training set {(1, 1), (2, 2), (3, 3)} with the line hθ(x) = θ1x for θ1 = 1; right, J(θ1) plotted as a function of θ1.]

(For fixed θ1, hθ(x) is a function of x; J(θ1) is a function of the parameter θ1.)

J(θ1) = (1/(2m)) Σ_{i=1}^{m} (hθ(x(i)) − y(i))²

For θ1 = 1 (and m = 3):
J(1) = (1/(2×3)) Σ_{i=1}^{m} (θ1x(i) − y(i))² = (1/6)(0² + 0² + 0²) = 0

Page 16:

[Figure: left, the training data with the line hθ(x) = 0.5x; right, the point (0.5, J(0.5)) added to the plot of J(θ1).]

(For fixed θ1, hθ(x) is a function of x; J(θ1) is a function of the parameter θ1.)

For θ1 = 0.5:
J(0.5) = (1/(2×3)) ((0.5 − 1)² + (1 − 2)² + (1.5 − 3)²) = (1/6)(0.5² + 1² + 1.5²) = 3.5/6 ≈ 0.58

Page 17:

[Figure: left, the training data with the horizontal line hθ(x) = 0; right, the point (0, J(0)) on the plot of J(θ1).]

(For fixed θ1, hθ(x) is a function of x; J(θ1) is a function of the parameter θ1.)

J(θ1) = (1/(2m)) Σ_{i=1}^{m} (θ1x(i) − y(i))²

For θ1 = 0:
J(0) = (1/6)((0×1 − 1)² + (0×2 − 2)² + (0×3 − 3)²) = (1/6)(1 + 4 + 9) = 14/6 ≈ 2.3

Page 18:

[Figure: left, the training data with the line hθ(x) = −0.5x; right, the point (−0.5, J(−0.5)) on the plot of J(θ1).]

(For fixed θ1, hθ(x) is a function of x; J(θ1) is a function of the parameter θ1.)

For θ1 = −0.5:
J(−0.5) = (1/6)((−0.5 − 1)² + (−1 − 2)² + (−1.5 − 3)²) = 31.5/6 = 5.25
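These worked values are easy to check numerically on the toy training set {(1, 1), (2, 2), (3, 3)} used above (a minimal sketch):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def J(theta1):
    """Simplified cost with theta0 = 0."""
    return np.sum((theta1 * x - y) ** 2) / (2 * len(x))

for t in (1.0, 0.5, 0.0, -0.5):
    print(t, round(J(t), 2))    # -> 0.0, 0.58, 2.33, 5.25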

Page 19:

• If we compute J(θ1) for a range of θ1 values and plot J(θ1) vs. θ1, we get a bowl-shaped curve; for this hypothesis, J(θ1) is a quadratic in θ1

Page 20:

Cost function intuition II

Hypothesis: hθ(x) = θ0 + θ1x
Parameters: θ0, θ1
Cost function: J(θ0, θ1) = (1/(2m)) Σ_{i=1}^{m} (hθ(x(i)) − y(i))²
Goal: minimize J(θ0, θ1)

(Now both parameters are kept, so J is a function of two variables.)

Page 21:

(For fixed θ0, θ1, hθ(x) is a function of x; J is a function of the parameters θ0, θ1.)

[Figure: housing data, Price ($) in 1000's vs. Size in feet2, with a candidate line hθ(x) drawn for some guessed values of θ0 and θ1.]

Pages 22-26: [Figure sequence: a 3D surface plot of J(θ0, θ1), then its contour plot. Each subsequent slide pairs a hypothesis hθ(x) drawn over the housing data (for fixed θ0, θ1, a function of x) with the corresponding point on the contour plot of J(θ0, θ1) (a function of the parameters); points nearer the center of the contours correspond to lines that fit the data better.]

Page 27:

• Doing this manually is painful
• What we really want is an efficient algorithm for finding the values of θ0 and θ1 that minimize J

Page 28:

Gradient descent algorithm

• Minimizes the cost function J
• Used all over machine learning for minimization

Page 29:

Gradient descent algorithm

Have some function J(θ0, θ1)
Want min_{θ0,θ1} J(θ0, θ1)

Outline:
• Start with some θ0, θ1
• Keep changing θ0, θ1 to reduce J(θ0, θ1), until we hopefully end up at a minimum

Page 30:

• Local search methods for optimization: hill climbing, simulated annealing, the gradient descent algorithm, etc.

Page 31:

Local Search Methods
• Applicable when we seek a goal state and don't care how we get there. E.g.:
 – N-queens
 – finding shortest/cheapest round trips (Traveling Salesman Problem, Vehicle Routing Problem)
 – finding models of propositional formulae (SAT solvers)
 – VLSI layout, planning, scheduling, time-tabling, …
 – map coloring
 – resource allocation
 – protein structure prediction
 – genome sequence assembly

Page 32:

Local search key idea (surprisingly simple):

1. Select a (random) initial state (generate an initial guess)

2. Make a local modification to improve the current state (evaluate the current state and move to a neighboring state)

3. Repeat step 2 until a goal state is found (or we run out of time)

Page 33:

Gradient descent algorithm

Have some function J(θ0, θ1)
Want min_{θ0,θ1} J(θ0, θ1)

Outline:
• Start with some θ0, θ1
• Keep changing θ0, θ1 to reduce J(θ0, θ1), until we hopefully end up at a minimum

Page 34:

[Figure: surface plot of J(θ0, θ1) over the θ0, θ1 plane; from one starting point, repeatedly stepping downhill leads to one local minimum.]

Page 35:

[Figure: the same surface; starting from a slightly different point, the downhill path can end at a different local minimum.]

Page 36:

Gradient descent algorithm

Repeat until convergence:
  θj := θj − α (∂/∂θj) J(θ0, θ1)    (for j = 0 and j = 1)

α: learning rate (should be a small number)
• Large α: huge steps
• Small α: baby steps

(∂/∂θj) J(θ0, θ1): the derivative term

Page 37:

Gradient descent algorithm

Correct (simultaneous update of θ0, θ1):
  temp0 := θ0 − α (∂/∂θ0) J(θ0, θ1)
  temp1 := θ1 − α (∂/∂θ1) J(θ0, θ1)
  θ0 := temp0
  θ1 := temp1

Incorrect (sequential update): updating θ0 first and then using the new θ0 when computing θ1's update.
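In code, the simultaneous update means computing both new values before assigning either. A minimal sketch on a toy cost J(θ0, θ1) = θ0² + θ1², chosen only so the partial derivatives are trivial:

# Toy cost J(t0, t1) = t0^2 + t1^2, so dJ/dt0 = 2*t0 and dJ/dt1 = 2*t1
def dJ_dt0(t0, t1):
    return 2 * t0

def dJ_dt1(t0, t1):
    return 2 * t1

alpha = 0.1
theta0, theta1 = 3.0, -2.0

# Correct: simultaneous update -- both temporaries use the old values
temp0 = theta0 - alpha * dJ_dt0(theta0, theta1)
temp1 = theta1 - alpha * dJ_dt1(theta0, theta1)
theta0, theta1 = temp0, temp1

# Incorrect: sequential update -- theta1's step would already see the new theta0
# theta0 = theta0 - alpha * dJ_dt0(theta0, theta1)
# theta1 = theta1 - alpha * dJ_dt1(theta0, theta1)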

Page 38:

Gradient descent intuition

• To understand the intuition, we'll return to a simpler function and minimize a single parameter: min_{θ1} J(θ1), where θ1 ∈ ℝ

Page 39:

Two key terms in the algorithm:
• the derivative term (d/dθ1) J(θ1)
• the learning rate α

Page 40:

θ1 := θ1 − α (d/dθ1) J(θ1)

If the derivative at the current θ1 is positive:
θ1 := θ1 − α × (positive number), so θ1 decreases, moving toward the minimum.

Partial derivative vs. derivative:
• Use a partial derivative (∂) when the function has multiple variables but we differentiate with respect to just one of them
• Use an ordinary derivative (d) when the function has a single variable

The derivative takes the tangent to the curve at the point (the straight red line) and computes the slope of this tangent line: slope = vertical change / horizontal change.

Page 41:

θ1 := θ1 − α (d/dθ1) J(θ1)

If the derivative at the current θ1 is negative:
θ1 := θ1 − α × (negative number), so θ1 increases, again moving toward the minimum.

Page 42:

• The derivative term can be positive, negative, or zero
• At each point, the red line is tangent to the curve; its slope is the derivative at that point

Page 43:

α: learning rate

If α is too small, gradient descent can be slow.

If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
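A quick numeric sketch of both failure modes, using the simplified one-parameter cost on the toy data {(1, 1), (2, 2), (3, 3)}: with α = 0.01 progress is slow, α = 0.1 converges quickly, and α = 0.5 overshoots and diverges.

x = [1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0]
m = len(x)

def dJ(t1):
    # d/dtheta1 of 1/(2m) * sum_i (t1*x_i - y_i)^2
    return sum((t1 * xi - yi) * xi for xi, yi in zip(x, y)) / m

for alpha in (0.01, 0.1, 0.5):
    t1 = 0.0
    for _ in range(10):
        t1 -= alpha * dJ(t1)
    print(alpha, t1)   # 0.01: ~0.38 (slow); 0.1: ~1.0 (converged); 0.5: ~-17 (diverged)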

Page 44:

Question: what happens when the current value of θ1 is already at a local optimum?

At a local optimum the derivative term equals 0, so the update becomes
θ1 := θ1 − α × 0
and θ1 remains the same.

Page 45:

Gradient descent can converge to a local minimum, even with the learning rate α fixed.

As we approach a local minimum, gradient descent will automatically take smaller steps. So, no need to decrease α over time.


Page 46:

Gradient descent for linear regression

We apply the gradient descent algorithm, θj := θj − α (∂/∂θj) J(θ0, θ1), to the linear regression model hθ(x) = θ0 + θ1x with the squared error cost function J(θ0, θ1).

Page 47:


Gradient descent for linear regression

Page 48:

The derivative term:

∂/∂θj J(θ0, θ1) = ∂/∂θj [ (1/(2m)) Σ_{i=1}^{m} (hθ(x(i)) − y(i))² ]
                = ∂/∂θj [ (1/(2m)) Σ_{i=1}^{m} (θ0 + θ1x(i) − y(i))² ]

For j = 0:  ∂/∂θ0 J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (hθ(x(i)) − y(i))
For j = 1:  ∂/∂θ1 J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (hθ(x(i)) − y(i)) · x(i)
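These two expressions translate directly into code (a sketch; x and y are NumPy arrays of the training data):

import numpy as np

def gradients(theta0, theta1, x, y):
    """Partial derivatives of the squared error cost for h(x) = theta0 + theta1*x."""
    m = len(x)
    err = theta0 + theta1 * x - y       # h_theta(x_i) - y_i for every example
    dtheta0 = np.sum(err) / m           # (1/m) * sum of the errors
    dtheta1 = np.sum(err * x) / m       # (1/m) * sum of the errors times x_i
    return dtheta0, dtheta1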

Page 49:

Gradient descent algorithm for linear regression:

repeat until convergence {
  θ0 := θ0 − α · (1/m) Σ_{i=1}^{m} (hθ(x(i)) − y(i))
  θ1 := θ1 − α · (1/m) Σ_{i=1}^{m} (hθ(x(i)) − y(i)) · x(i)
}

Update θ0 and θ1 simultaneously.
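Putting the update rule together into a minimal batch gradient descent loop (a sketch; the toy data, α, and the iteration count are illustrative choices):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(x)

alpha = 0.1
theta0, theta1 = 0.0, 0.0

for _ in range(1000):
    err = theta0 + theta1 * x - y    # h_theta(x_i) - y_i
    # simultaneous update: both gradients are computed from the old values
    grad0 = np.sum(err) / m
    grad1 = np.sum(err * x) / m
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(theta0, theta1)   # approaches theta0 = 0, theta1 = 1 for this data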

Page 50:

[Figure: surface plot of J(θ0, θ1) for linear regression; the squared error cost is convex (bowl-shaped), with a single global minimum.]

Pages 51-60: [Figure sequence: successive iterations of gradient descent on the housing data. Each slide shows the current hypothesis hθ(x) over the data (for fixed θ0, θ1, a function of x) together with the corresponding point on the contour plot of J(θ0, θ1) (a function of the parameters), moving step by step toward the minimum.]

Page 61:

Another name: “Batch” Gradient Descent

“Batch”: each step of gradient descent uses all the training examples.

Another algorithm that solves the same minimization problem is the normal equations method.

The gradient descent algorithm scales better than the normal equations method to larger datasets.
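For comparison, a sketch of the normal equations method in NumPy, which solves for the parameters in one shot instead of iterating:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

# Design matrix: a column of ones (for theta0) next to the inputs (for theta1)
X = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (X^T X) theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)   # -> [0. 1.] for this data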

Page 62:

Generalization of the gradient descent algorithm

• Learn with a larger number of features
• With many parameters the cost function becomes difficult to plot

Page 63:

[Figure: a matrix holding the features of each training example (size, number of bedrooms, number of floors, age of home) in one variable, and a vector y showing the corresponding prices.]

Page 64:

• We need linear algebra for more complex linear regression models
• Linear algebra is good for building computationally efficient models (as we'll see later)
 – It provides a good way to work with large data sets
 – Vectorizing a problem is a common optimization technique (a small sketch follows)
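As a small sketch of vectorization: with the features in a matrix X and the parameters in a vector, predictions for all examples come from one matrix-vector product instead of a loop (the parameter values are made up for illustration):

import numpy as np

# One row per training example: a leading 1 (for theta0), then size in feet^2
X = np.array([[1.0, 2104.0],
              [1.0, 1416.0],
              [1.0, 1534.0],
              [1.0,  852.0]])
theta = np.array([50.0, 0.12])     # made-up parameters for illustration

predictions = X @ theta            # h_theta(x) for every example at once
print(predictions)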

Page 65:

End