7/27/2019 2 - 4 - Cost Function - Intuition II (9 Min)
1
00:00:00,960 --> 00:00:05,684
In this video, let's delve deeper and get
even better intuition about what the cost
2
00:00:05,684 --> 00:00:10,523
function is doing. This video assumes that
you're familiar with contour plots. If you
3
00:00:10,523 --> 00:00:15,189
are not familiar with contour plots or
contour figures, some of the illustrations
4
00:00:15,189 --> 00:00:20,144
in this video may or may not make sense to
you, but that's okay, and if you end up skipping
5
00:00:20,144 --> 00:00:24,522
this video or some of it does not quite
make sense because you haven't seen
6
00:00:24,522 --> 00:00:29,246
contour plots before, that's okay and you will
still understand the rest of this course
7
00:00:29,246 --> 00:00:34,935
without those parts of this. Here's our
problem formulation as usual, with the
8
00:00:34,935 --> 00:00:39,882
hypothesis, parameters, cost function, and
our optimization objective. Unlike
9
00:00:39,882 --> 00:00:45,163
before, unlike the last video, I'm
going to keep both of my parameters, theta
10
00:00:45,163 --> 00:00:50,573
zero and theta one, as we generate our
visualizations for the cost function. So, same
11
00:00:50,573 --> 00:00:57,204
as last time, we want to understand the
hypothesis h and the cost function J. So,
12
00:00:57,204 --> 00:01:04,167
here's my training set of housing prices
and let's make some hypothesis. You know,
13
00:01:04,167 --> 00:01:10,219
like that one, this is not a particularly
good hypothesis. But, if I set theta
14
00:01:10,219 --> 00:01:16,270
zero = 50 and theta one = 0.06, then I end up
with this hypothesis down here and that
15
00:01:16,270 --> 00:01:22,190
corresponds to that straight line. Now given
these values of theta zero and theta one,
16
00:01:22,190 --> 00:01:27,511
we want to plot the corresponding, you
know, cost function on the right. What we
17
00:01:27,511 --> 00:01:33,150
did last time was, right, when we only had
theta one. In other words, drawing plots
18
00:01:33,150 --> 00:01:37,814
that look like this as a function of theta
one. But now we have two parameters, theta
19
00:01:37,814 --> 00:01:42,340
zero and theta one, and so the plot gets
a little more complicated. It turns out
20
00:01:42,340 --> 00:01:47,699
that when we have only one parameter,
the plots we drew had this sort of bowl
21
00:01:47,699 --> 00:01:52,925
shaped function. Now, when we have two
parameters, it turns out the cost function
22
00:01:52,925 --> 00:01:58,218
also has a similar sort of bowl shape. And,
in fact, depending on your training set,
23
00:01:58,218 --> 00:02:03,511
you might get a cost function that maybe
looks something like this. So, this is a
24
00:02:03,511 --> 00:02:09,404
3-D surface plot, where the axes
are labeled theta zero and theta one. So
25
00:02:09,404 --> 00:02:15,326
as you vary theta zero and theta one, the two
parameters, you get different values of the
26
00:02:15,326 --> 00:02:20,964
cost function J(theta zero, theta one),
and the height of this surface above a
27
00:02:20,964 --> 00:02:26,347
particular point of theta zero, theta one,
right, that's the vertical axis. The
28
00:02:26,347 --> 00:02:31,200
height of the surface at each point
indicates the value of J of theta zero,
29
00:02:31,200 --> 00:02:36,471
theta one. And you can see it sort of
has this bowl-like shape. Let me show you
30
00:02:36,471 --> 00:02:46,351
the same plot in 3D. So here's the same
figure in 3D, horizontal axis theta one and
31
00:02:46,351 --> 00:02:52,122
vertical axis J(theta zero, theta one), and if I rotate
this plot around, you kind of
32
00:02:52,122 --> 00:02:57,608
get a sense, I hope, of this bowl
shaped surface, as that's what the cost
33
00:02:57,608 --> 00:03:03,592
function J looks like. Now for the purpose
of illustration in the rest of this video
34
00:03:03,592 --> 00:03:09,077
I'm not actually going to use these sort
of 3D surfaces to show you the cost
35
00:03:09,077 --> 00:03:16,475
function J. Instead I'm going to use
contour plots, or what I also call contour
36
00:03:16,475 --> 00:03:24,748
figures, I guess they mean the same thing,
to show you these surfaces. So here's an
37
00:03:24,748 --> 00:03:31,135
example of a contour figure, shown on the
right, where the axes are theta zero and
38
00:03:31,135 --> 00:03:37,602
theta one. And what each of these ovals,
what each of these ellipses shows is a set
39
00:03:37,602 --> 00:03:43,757
of points that takes on the same value for
J(theta zero, theta one). So
40
00:03:43,757 --> 00:03:50,514
concretely, for example, you'll take
that point and that point and that point.
41
00:03:50,514 --> 00:03:55,583
All three of these points that I just
drew in magenta, they have the same value
42
00:03:55,583 --> 00:03:59,730
for J(theta zero, theta one). Okay.
Where, right, these are the theta
43
00:03:59,730 --> 00:04:04,774
zero, theta one axes, but those three have
the same value for J(theta zero, theta one).
44
00:04:04,774 --> 00:04:10,218
And if you haven't seen contour
plots much before, think of, imagine if you
45
00:04:10,218 --> 00:04:14,992
will, a bowl shaped function that's coming
out of my screen. So that the minimum, so
46
00:04:14,992 --> 00:04:19,668
the bottom of the bowl is this point right
there, right? This middle, the middle of
47
00:04:19,668 --> 00:04:24,285
these concentric ellipses. And imagine a
bowl shape that sort of grows out of my
48
00:04:24,285 --> 00:04:28,786
screen like this, so that each of these
ellipses, you know, has the same height
49
00:04:28,786 --> 00:04:33,345
above my screen. And the minimum of the
bowl, right, is right down there. And so
50
00:04:33,345 --> 00:04:37,787
the contour figure is a way to,
is maybe a more convenient way to
51
00:04:37,787 --> 00:04:45,185
visualize my function J. [sound] So, let's
look at some examples. Over here, I have a
52
00:04:45,185 --> 00:04:53,275
particular point, right? And so this is,
with, you know, theta zero equals maybe
53
00:04:53,275 --> 00:05:01,964
about 800, and theta one equals maybe
-0.15. And so this point, right, this
54
00:05:01,964 --> 00:05:07,322
point in red corresponds to one
pair of values of theta zero, theta one
55
00:05:07,322 --> 00:05:12,092
and they correspond, in fact, to that
hypothesis, right, theta zero is
56
00:05:12,092 --> 00:05:17,189
about 800, that is, where it intersects
the vertical axis is around 800, and this is a
57
00:05:17,189 --> 00:05:21,763
slope of about -0.15. Now this line is
really not such a good fit to the
58
00:05:21,763 --> 00:05:26,859
data, right. This hypothesis, h(x),
with these values of theta zero,
59
00:05:26,859 --> 00:05:32,283
theta one, it's really not such a good fit
to the data. And so you find that its
60
00:05:32,283 --> 00:05:37,531
cost is a value that's out here, that's,
you know, pretty far from the minimum, right,
61
00:05:37,531 --> 00:05:42,901
it's pretty far, this is a pretty high cost
because this is just not that good a fit
62
00:05:42,901 --> 00:05:47,247
to the data. Let's look at some more
examples. Now here's a different
63
00:05:47,247 --> 00:05:52,489
hypothesis that's, you know, still not a
great fit for the data but maybe slightly
64
00:05:52,489 --> 00:05:57,986
better. So here, right, that's my point, that
those are my parameters theta zero, theta
65
00:05:57,986 --> 00:06:07,387
one, and so my theta zero value, right,
that's about 360 and my value for theta
66
00:06:07,387 --> 00:06:14,047
one is equal to zero. So, you know, let's
break it out. Let's take theta zero equals
67
00:06:14,047 --> 00:06:20,063
360, theta one equals zero. And this pair
of parameters corresponds to that
68
00:06:20,063 --> 00:06:26,161
hypothesis, corresponds to a flat line,
that is, h(x) equals 360 plus zero
69
00:06:26,161 --> 00:06:32,421
times x. So that's the hypothesis. And
this hypothesis again has some cost, and
70
00:06:32,421 --> 00:06:38,600
that cost is, you know, plotted as the
height of the J function at that point.
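To make the comparison so far concrete, here is a minimal Python sketch that evaluates the cost for the two parameter pairs mentioned up to this point, (800, -0.15) and (360, 0). The training set is hypothetical, since the lecture's housing data is not given in this transcript, and the 1/(2m) scaling of the squared-error cost is assumed to follow the definition from earlier in the course. The names `sizes`, `prices`, `hypothesis`, and `cost` are illustrative only.

```python
# Hypothetical training set (house size, price); NOT the lecture's data.
sizes = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
prices = [200.0, 280.0, 370.0, 450.0, 540.0]


def hypothesis(theta0, theta1, x):
    """Straight-line hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1).

    Assumes the 1/(2m) scaling, where m is the number of training examples.
    """
    m = len(xs)
    squared_errors = (
        (hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)
    )
    return sum(squared_errors) / (2 * m)


# The downward-sloping line: theta zero about 800, theta one about -0.15.
j_steep = cost(800.0, -0.15, sizes, prices)

# The flat line: h(x) = 360 + 0 * x.
j_flat = cost(360.0, 0.0, sizes, prices)

print("J(800, -0.15) =", j_steep)
print("J(360, 0)     =", j_flat)
```

On this made-up data the flat line happens to have the lower cost of the two, which agrees with the remark that it may be a slightly better fit. A different training set could rank them differently.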
71
00:06:38,791 --> 00:06:44,886
Let's look at just a couple of examples.
Here's one more, you know, at this value
72
00:06:44,886 --> 00:06:52,231
of theta zero, and at that value of theta
one, we end up with this hypothesis, h(x),
73
00:06:52,231 --> 00:06:58,599
and again, not a great fit to the data,
and it's actually further away from the minimum. Last example, this is
74
00:06:58,599 --> 00:07:03,450
actually not quite at the minimum, but
it's pretty close to the minimum. So this
75
00:07:03,450 --> 00:07:08,486
is not such a bad fit to the data,
where, for a particular value of theta
76
00:07:08,486 --> 00:07:13,337
zero and
for a particular value of theta one, we
77
00:07:13,337 --> 00:07:18,004
get a particular h(x). And this is, this
is not quite at the minimum, but it's
78
00:07:18,004 --> 00:07:23,039
pretty close. And so the sum of squared
errors is the sum of squared distances between
79
00:07:23,039 --> 00:07:28,259
my training samples and my hypothesis.
Really, that's a sum of squared distances,
80
00:07:28,259 --> 00:07:32,548
right, of all of these errors. This is
pretty close to the minimum even though
81
00:07:32,548 --> 00:07:37,096
it's not quite the minimum. So with these
figures I hope that gives you a better
82
00:07:37,096 --> 00:07:41,869
understanding of the values of the cost
function J, what they are, and how that
83
00:07:41,869 --> 00:07:47,324
corresponds to different hypotheses, as well as
how better hypotheses may correspond to points
84
00:07:47,324 --> 00:07:52,983
that are closer to the minimum of this cost
function J. Now of course what we really
85
00:07:52,983 --> 00:07:57,619
want is an efficient algorithm, right, an
efficient piece of software for
86
00:07:57,619 --> 00:08:02,218
automatically finding the value of theta
zero and theta one that minimizes the
87
00:08:02,218 --> 00:08:06,566
cost function J, right? And what we, what
we don't wanna do is to, you know, have to
88
00:08:06,566 --> 00:08:10,697
write software to plot out this point,
and then try to manually read off the
89
00:08:10,697 --> 00:08:15,263
numbers, that is not a good way to do
it. And, in fact, we'll see later that
90
00:08:15,426 --> 00:08:19,938
when we look at more complicated examples,
we'll have high dimensional figures with
91
00:08:19,938 --> 00:08:23,906
more parameters, and, it turns out,
we'll see in a few, we'll see later in
92
00:08:23,906 --> 00:08:28,091
this course, examples where this figure,
you know, cannot really be plotted, and
93
00:08:28,091 --> 00:08:33,664
this becomes much harder to visualize. And
so, what we want is to have software
94
00:08:33,664 --> 00:08:37,729
to find the value of theta zero, theta one
that minimizes this function, and
95
00:08:37,916 --> 00:08:42,914
in the next video we start to talk about
an algorithm for automatically finding
96
00:08:42,914 --> 00:08:47,600
that value of theta zero and theta one
that minimizes the cost function J.
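As a rough illustration of why reading the minimum off a plot does not scale, the sketch below evaluates J on a coarse grid of (theta zero, theta one) values and picks the lowest cell. This brute-force search is not the algorithm introduced in the next video; it only stands in for "plot it and read off the numbers". The data are hypothetical and chosen to lie exactly on the line h(x) = 50 + 0.06x, and the grid ranges and the 1/(2m) scaling of the cost are assumptions.

```python
# Hypothetical training set lying exactly on price = 50 + 0.06 * size.
sizes = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
prices = [50.0 + 0.06 * x for x in sizes]


def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1), assuming the 1/(2m) scaling."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Assumed grid: theta0 in {0, 10, ..., 100}, theta1 in {0.00, 0.01, ..., 0.12}.
theta0_grid = [10.0 * i for i in range(11)]
theta1_grid = [i / 100.0 for i in range(13)]

# Evaluate J at every grid cell and keep the cell with the smallest cost.
best_cost, best_theta0, best_theta1 = min(
    (cost(t0, t1, sizes, prices), t0, t1)
    for t0 in theta0_grid
    for t1 in theta1_grid
)

# Number of cost evaluations the grid required.
evaluations = len(theta0_grid) * len(theta1_grid)

print("lowest grid cell:", best_theta0, best_theta1, "with J =", best_cost)
print("cost evaluations:", evaluations)
```

Even this small grid needs 11 x 13 = 143 cost evaluations for two parameters, and the count multiplies by the grid size for every additional parameter, which is one reason an automatic minimization algorithm is preferable.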