2 - 4 - Cost Function - Intuition II (9 Min)

  • 7/27/2019 2 - 4 - Cost Function - Intuition II (9 Min)

1
00:00:00,960 --> 00:00:05,684
In this video, let's delve deeper and get
even better intuition about what the cost

2
00:00:05,684 --> 00:00:10,523
function is doing. This video assumes that
you're familiar with contour plots. If you

3
00:00:10,523 --> 00:00:15,189
are not familiar with contour plots or
contour figures, some of the illustrations

4
00:00:15,189 --> 00:00:20,144
in this video may or may not make sense to
you, but that's okay, and if you end up skipping

5
00:00:20,144 --> 00:00:24,522
this video or some of it does not quite
make sense because you haven't seen

6
00:00:24,522 --> 00:00:29,246
contour plots before, that's okay, and you will
still understand the rest of this course

7
00:00:29,246 --> 00:00:34,935
without those parts of this. Here's our
problem formulation as usual, with the

8
00:00:34,935 --> 00:00:39,882
hypothesis, parameters, cost function, and
our optimization objective. Unlike

9
00:00:39,882 --> 00:00:45,163
before, unlike the last video, I'm
going to keep both of my parameters, theta

10
00:00:45,163 --> 00:00:50,573
zero and theta one, as we generate our
visualizations for the cost function. So, same

11
00:00:50,573 --> 00:00:57,204
as last time, we want to understand the
hypothesis H and the cost function J. So,

12
00:00:57,204 --> 00:01:04,167
here's my training set of housing prices
and let's make some hypothesis. You know,

13
00:01:04,167 --> 00:01:10,219
like that one. This is not a particularly
good hypothesis. But, if I set theta

14
00:01:10,219 --> 00:01:16,270
zero = 50 and theta one = 0.06, then I end up
with this hypothesis down here and that

15
00:01:16,270 --> 00:01:22,190
corresponds to that straight line. Now given
these values of theta zero and theta one,

16
00:01:22,190 --> 00:01:27,511
we want to plot the corresponding, you
know, cost function on the right. What we

17
00:01:27,511 --> 00:01:33,150
did last time was, right, when we only had
theta one. In other words, drawing plots

18
00:01:33,150 --> 00:01:37,814
that look like this as a function of theta
one. But now we have two parameters, theta

19
00:01:37,814 --> 00:01:42,340
zero and theta one, and so the plot gets
a little more complicated. It turns out

20
00:01:42,340 --> 00:01:47,699
that when we have only one parameter,
the plots we drew had this sort of bowl

21
00:01:47,699 --> 00:01:52,925
shaped function. Now, when we have two
parameters, it turns out the cost function

22
00:01:52,925 --> 00:01:58,218
also has a similar sort of bowl shape. And,
in fact, depending on your training set,

23
00:01:58,218 --> 00:02:03,511
you might get a cost function that maybe
looks something like this. So, this is a

24
00:02:03,511 --> 00:02:09,404
3-D surface plot, where the axes
are labeled theta zero and theta one. So

25
00:02:09,404 --> 00:02:15,326
as you vary theta zero and theta one, the two
parameters, you get different values of the

26
00:02:15,326 --> 00:02:20,964
cost function J(theta zero, theta one),
and the height of this surface above a

27
00:02:20,964 --> 00:02:26,347
particular point of theta zero, theta one.
Right, that's, that's the vertical axis. The

28
00:02:26,347 --> 00:02:31,200
height of the surface above the points
indicates the value of J of theta zero,

29
00:02:31,200 --> 00:02:36,471
theta one. And you can see it sort of
has this bowl-like shape. Let me show you

30
00:02:36,471 --> 00:02:46,351
the same plot in 3D. So here's the same
figure in 3D, horizontal axis theta one and

31
00:02:46,351 --> 00:02:52,122
vertical axis J(theta zero, theta one), and if I rotate
this plot around, you kind of

32
00:02:52,122 --> 00:02:57,608
get a sense, I hope, of this bowl
shaped surface, as that's what the cost

33
00:02:57,608 --> 00:03:03,592
function J looks like. Now for the purpose
of illustration in the rest of this video

34
00:03:03,592 --> 00:03:09,077
I'm not actually going to use these sort
of 3D surfaces to show you the cost

35
00:03:09,077 --> 00:03:16,475
function J. Instead I'm going to use
contour plots, or what I also call contour

36
00:03:16,475 --> 00:03:24,748
figures (I guess they mean the same thing),
to show you these surfaces. So here's an

37
00:03:24,748 --> 00:03:31,135
example of a contour figure, shown on the
right, where the axes are theta zero and

38
00:03:31,135 --> 00:03:37,602
theta one. And what each of these ovals,
what each of these ellipses shows is a set

39
00:03:37,602 --> 00:03:43,757
of points that take on the same value for
J(theta zero, theta one). So

40
00:03:43,757 --> 00:03:50,514
concretely, for example this, you'll take
that point and that point and that point.

41
00:03:50,514 --> 00:03:55,583
All three of these points that I just
drew in magenta, they have the same value

42
00:03:55,583 --> 00:03:59,730
for J(theta zero, theta one). Okay.
Where, right, these, this is the theta

43
00:03:59,730 --> 00:04:04,774
zero, theta one axes, but those three have
the same value for J(theta zero, theta one).

44
00:04:04,774 --> 00:04:10,218
And if you haven't seen contour
plots much before, think of, imagine if you

45
00:04:10,218 --> 00:04:14,992
will, a bowl shaped function that's coming
out of my screen. So that the minimum, so

46
00:04:14,992 --> 00:04:19,668
the bottom of the bowl is this point right
there, right? This middle, the middle of

47
00:04:19,668 --> 00:04:24,285
these concentric ellipses. And imagine a
bowl shape that sort of grows out of my

48
00:04:24,285 --> 00:04:28,786
screen like this, so that each of these
ellipses, you know, has the same height

49
00:04:28,786 --> 00:04:33,345
above my screen. And the minimum of the
bowl, right, is right down there. And so

50
00:04:33,345 --> 00:04:37,787
the contour figure is a, is a way to,
is maybe a more convenient way to

51
00:04:37,787 --> 00:04:45,185
visualize my function J. [sound] So, let's
look at some examples. Over here, I have a

52
00:04:45,185 --> 00:04:53,275
particular point, right? And so this is,
with, you know, theta zero equals maybe

53
00:04:53,275 --> 00:05:01,964
about 800, and theta one equals maybe
-0.15. And so this point, right, this

54
00:05:01,964 --> 00:05:07,322
point in red corresponds to one
pair of values of theta zero, theta one,

55
00:05:07,322 --> 00:05:12,092
and it corresponds, in fact, to that
hypothesis, right, theta zero is

56
00:05:12,092 --> 00:05:17,189
about 800, that is, where it intersects
the vertical axis is around 800, and this is a

57
00:05:17,189 --> 00:05:21,763
slope of about -0.15. Now this line is
really not such a good fit to the

58
00:05:21,763 --> 00:05:26,859
data, right? This hypothesis, h(x),
with these values of theta zero,

59
00:05:26,859 --> 00:05:32,283
theta one, it's really not such a good fit
to the data. And so you find that its

60
00:05:32,283 --> 00:05:37,531
cost is a value that's out here, that's,
you know, pretty far from the minimum, right?

61
00:05:37,531 --> 00:05:42,901
It's pretty far. This is a pretty high cost
because this is just not that good a fit

62
00:05:42,901 --> 00:05:47,247
to the data. Let's look at some more
examples. Now here's a different

63
00:05:47,247 --> 00:05:52,489
hypothesis that's, you know, still not a
great fit for the data, but maybe slightly

64
00:05:52,489 --> 00:05:57,986
better. So here, right, that's my point, that
those are my parameters theta zero, theta

65
00:05:57,986 --> 00:06:07,387
one, and so my theta zero value, right,
that's about 360, and my value for theta

66
00:06:07,387 --> 00:06:14,047
one is equal to zero. So, you know, let's
break it out. Let's take theta zero equals

67
00:06:14,047 --> 00:06:20,063
360, theta one equals zero. And this pair
of parameters corresponds to that

68
00:06:20,063 --> 00:06:26,161
hypothesis, corresponds to a flat line,
that is, h(x) equals 360 plus zero

69
00:06:26,161 --> 00:06:32,421
times x. So that's the hypothesis. And
this hypothesis again has some cost, and

70
00:06:32,421 --> 00:06:38,600
that cost is, you know, plotted as the
height of the J function at that point.
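
The link between a pair of parameters and its cost can be sketched in Python. This is a minimal sketch under stated assumptions, not the lecture's own code. The training set is invented for illustration (the video's housing data is not given in the transcript), and the cost is assumed to be half the mean squared error, J = (1 / 2m) * sum of (h(x) - y)^2, since this video only describes the cost as a sum of squared errors. The names `TRAINING_SET`, `hypothesis`, and `cost` are hypothetical. The three parameter pairs are the ones mentioned in the video.

```python
# Hypothetical training set, invented for illustration only:
# (house size in square feet, price in thousands of dollars).
TRAINING_SET = [(1000, 200.0), (1500, 280.0), (2000, 370.0), (2500, 450.0)]


def hypothesis(theta0, theta1, x):
    """Straight-line hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, data=TRAINING_SET):
    """Assumed cost J(theta0, theta1): half the mean of the squared errors."""
    m = len(data)
    squared_errors = [(hypothesis(theta0, theta1, x) - y) ** 2 for x, y in data]
    return sum(squared_errors) / (2 * m)


# Parameter pairs mentioned in the video. Each pair is one point in the
# (theta0, theta1) plane, and J gives the height of the surface above it.
candidates = [(50, 0.06), (800, -0.15), (360, 0.0)]
for theta0, theta1 in candidates:
    print(f"theta0={theta0}, theta1={theta1}: J={cost(theta0, theta1):.2f}")
```

On this made-up data the flat line (360, 0) has a lower cost than the downward-sloping line (800, -0.15), which matches the ordering described in the video. The actual numbers depend entirely on the training set.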

71
00:06:38,791 --> 00:06:44,886
Let's look at just a couple of examples.
Here's one more, you know, at this value

72
00:06:44,886 --> 00:06:52,231
of theta zero, and at that value of theta
one, we end up with this hypothesis, h(x),

73
00:06:52,231 --> 00:06:58,599
and again, not a great fit to the data,
and is actually further away from the minimum. Last example, this is

74
00:06:58,599 --> 00:07:03,450
actually not quite at the minimum, but
it's pretty close to the minimum. So this

75
00:07:03,450 --> 00:07:08,486
is not such a bad fit to the, to the data,
where, for a particular value of theta

76
00:07:08,486 --> 00:07:13,337
zero, which, one of them has value, as in,
for a particular value for theta one, we

77
00:07:13,337 --> 00:07:18,004
get a particular h(x). And this is, this
is not quite at the minimum, but it's

78
00:07:18,004 --> 00:07:23,039
pretty close. And so the sum of squared
errors is the sum of squared distances between

79
00:07:23,039 --> 00:07:28,259
my training samples and my hypothesis.
Really, that's a sum of squared distances,

80
00:07:28,259 --> 00:07:32,548
right, of all of these errors. This is
pretty close to the minimum even though

81
00:07:32,548 --> 00:07:37,096
it's not quite the minimum. So with these
figures I hope that gives you a better

82
00:07:37,096 --> 00:07:41,869
understanding of what values of the cost
function J, how they are and how that

83
00:07:41,869 --> 00:07:47,324
corresponds to different hypotheses, and also
how better hypotheses may correspond to points

84
00:07:47,324 --> 00:07:52,983
that are closer to the minimum of this cost
function J. Now of course what we really

85
00:07:52,983 --> 00:07:57,619
want is an efficient algorithm, right, an
efficient piece of software for

86
00:07:57,619 --> 00:08:02,218
automatically finding the value of theta
zero and theta one that minimizes the

87
00:08:02,218 --> 00:08:06,566
cost function J, right? And what we, what
we don't wanna do is to, you know, have to

88
00:08:06,566 --> 00:08:10,697
write software, to plot out this point,
and then try to manually read off the

89
00:08:10,697 --> 00:08:15,263
numbers. This is not a good way to do
it. And, in fact, we'll see later that

90
00:08:15,426 --> 00:08:19,938
when we look at more complicated examples,
we'll have high dimensional figures with

91
00:08:19,938 --> 00:08:23,906
more parameters, that, it turns out,
we'll see in a few, we'll see later in

92
00:08:23,906 --> 00:08:28,091
this course, examples where this figure,
you know, cannot really be plotted, and

93
00:08:28,091 --> 00:08:33,664
this becomes much harder to visualize. And
so, what we want is to have software

94
00:08:33,664 --> 00:08:37,729
to find the value of theta zero, theta one
that minimizes this function, and

95
00:08:37,916 --> 00:08:42,914
in the next video we start to talk about
an algorithm for automatically finding

96
00:08:42,914 --> 00:08:47,600
that value of theta zero and theta one
that minimizes the cost function J.
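
The "plot it out and read off the numbers" approach that the video advises against can be sketched as a brute-force grid evaluation. This is only an illustration under stated assumptions, not the algorithm introduced in the next video. The training set is invented so that its points lie exactly on h(x) = 50 + 0.06x, the cost is assumed to be half the mean squared error, and the names `cost` and `grid_minimum` are hypothetical.

```python
# Hypothetical training set, invented for illustration: the points lie
# exactly on the line y = 50 + 0.06 * x, so the best parameters are known.
TRAINING_SET = [(1000, 110.0), (2000, 170.0), (3000, 230.0)]


def cost(theta0, theta1, data):
    """Assumed cost: half the mean squared error of h(x) = theta0 + theta1 * x."""
    m = len(data)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in data) / (2 * m)


def grid_minimum(theta0_values, theta1_values, data):
    """Evaluate J at every grid point; return (cost, theta0, theta1) of the lowest."""
    return min(
        (cost(t0, t1, data), t0, t1)
        for t0 in theta0_values
        for t1 in theta1_values
    )


theta0_grid = [10 * i for i in range(11)]    # 0, 10, ..., 100
theta1_grid = [i / 100 for i in range(13)]   # 0.00, 0.01, ..., 0.12

best_cost, best_theta0, best_theta1 = grid_minimum(
    theta0_grid, theta1_grid, TRAINING_SET
)
print(f"lowest grid cost {best_cost:.4f} at theta0={best_theta0}, theta1={best_theta1}")
print(f"grid points evaluated: {len(theta0_grid) * len(theta1_grid)}")
```

A grid like this is the data behind a surface or contour plot. With k values per parameter and n parameters it needs k to the power n evaluations, and it only finds the minimum if the grid happens to contain it, which is one way to see why this does not extend to models with more parameters.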