MAT 1000 Mathematics in Today's World

Last Time

We learned how to calculate correlation (abbreviated $r$).

This is a number that tells us about the strength and direction of an association, when that association has the form of a straight line.

If $r$ is positive, the association has a positive direction; if $r$ is negative, the association has a negative direction.

The closer $r$ is to either 1 or -1, the stronger the association, and this means that a scatterplot of the data will look more like a straight line.

Today

First: a warning about interpreting correlation.

We will also talk about least-squares regression.

This is a way to calculate the line that is the “best fit” for the data; in other words, a line that is a good approximation of the scatterplot.

Least-squares regression is important because it allows us to make predictions where we don’t have any data. These predictions are based on the pattern in the data we do have.

“Correlation is not causation”

You may have heard this expression before. What does it mean?

Correlation can be good evidence for a cause-and-effect relationship between two variables. If there is such a relationship, the variables will often have a strong correlation.

On the other hand, variables can have a strong correlation even though there is no cause-and-effect relationship.

“Correlation is not causation”

Example

Ice cream sales are correlated with drowning deaths.

This is obviously not a cause-and-effect relationship.

In this case the explanation is that ice cream sales and drowning deaths are both related to the weather. More ice cream is sold in the summer, and more people go swimming in the summer.

We call this relationship between ice cream sales and drowning deaths “mutual response.”
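The ice cream example can be imitated with a small simulation. The sketch below is only an illustration with made-up numbers: the temperatures, the coefficients, and the noise levels are all assumptions, and `correlation` is a hypothetical helper written for this example. Both simulated quantities depend on temperature, but neither one depends on the other.

```python
import random

def correlation(xs, ys):
    """Pearson correlation r of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical daily temperatures (degrees Celsius) for one year.
temperature = [rng.uniform(0, 35) for _ in range(365)]

# Each quantity responds to temperature plus its own independent noise.
# Neither quantity appears in the formula for the other.
ice_cream_sales = [10 * t + rng.gauss(0, 30) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 1) for t in temperature]

r = correlation(ice_cream_sales, drownings)
print(f"r between ice cream sales and drownings: {r:.2f}")
```

With these assumed settings the two series come out strongly positively correlated, even though the only link between them is the shared dependence on temperature.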

“Correlation is not causation”

Correlation may not even be due to mutual response.

Example (The Pirate Effect)

The number of pirates is correlated with global average temperature: over the past few centuries the number of pirates has decreased, and global average temperatures have increased.

Is global warming caused by lack of pirates?

This is just a coincidence.

People call this kind of relationship a “nonsense correlation.”

For more nonsense correlations: www.tylervigen.com

Approximating scatterplots

Last time we calculated the correlation between the heights and weights of five male adults. Here is that same data as a scatterplot.

Approximating scatterplots

If you had to draw by hand a line that approximated the shape of this scatterplot, you could end up with any number of lines.

Approximating scatterplots

For example, maybe you would draw this line

Approximating scatterplots

Or this one

Approximating scatterplots

But there is only one “least-squares regression line”:

Review of linear functions

The goal is to take a set of pairs of data and produce a line that approximates that data.

First, we need to review some facts about lines.

In mathematics we describe a line using a linear function.

Linear functions can be put into a special form, called slope-intercept form. This looks like:

$$y = mx + b$$

Review of linear functions

In the equation $y = mx + b$, the numbers $m$ and $b$ are called “constants.” This just means that they should have specific values.

For example

is the equation of a line. Here we specify particular values for $m$ and $b$. We never specify $x$ or $y$; both of these are “variables.”

We usually call the number $m$ the slope of the line, and $b$ is called the intercept.

Review of linear functions

But how does the equation

describe a line?

There are many pairs of numbers $x$ and $y$ that satisfy this equation.

For example, the pair and .

How can we tell?

Plug in for and for :

This is a true equation.

Review of linear functions

Of course, not every pair of numbers satisfies the equation

For example, the pair and

Try plugging in for and for :

This is not a true equation.
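The "plug in and check" test can be sketched in a few lines of Python. The line $y = 2x + 1$ used here is a hypothetical stand-in chosen for illustration, not the example from the slides, and `satisfies` is a helper name invented for this sketch.

```python
def satisfies(m, b, x, y, tol=1e-9):
    """Return True if the pair (x, y) satisfies y = m*x + b."""
    return abs(y - (m * x + b)) <= tol

# Hypothetical line y = 2x + 1 (an assumption, not the slides' example).
m, b = 2, 1

# Plugging in x = 3 and y = 7 gives 7 = 2*3 + 1, which is a true equation.
print(satisfies(m, b, x=3, y=7))

# Plugging in x = 3 and y = 8 gives 8 = 2*3 + 1, which is not true.
print(satisfies(m, b, x=3, y=8))
```

The small tolerance `tol` is there only so that the check still works when the numbers are decimals that a computer stores approximately.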

Review of linear functions

To find the graph of the line described by the equation, we need two pairs $x$ and $y$ that satisfy it.

We have one, namely the pair and , let’s find one more.

What we can do is pick any number for $x$ and solve for $y$.

Let’s say . Plug that into the equation:

If you simplify this, you will find that must be equal to .

Review of linear functions

Now we use these pairs to plot two points. The number $x$ gives the horizontal location of the point, and the number $y$ gives the vertical location.

Review of linear functions

Plot the pair and

Review of linear functions

Add in the pair and

Review of linear functions

Now connect these two points with a line
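The idea that two points are enough to pin down a line can be checked numerically. This is a minimal sketch using the hypothetical line $y = 2x + 1$ (an assumption for illustration, not the slides' example); `point_on_line` and `line_through` are names made up for this sketch.

```python
def point_on_line(m, b, x):
    """Pick a value of x, then solve y = m*x + b for y."""
    return (x, m * x + b)

def line_through(p, q):
    """Recover slope and intercept from two points with different x values."""
    (x1, y1), (x2, y2) = p, q
    m = (y2 - y1) / (x2 - x1)   # rise over run
    b = y1 - m * x1             # solve y1 = m*x1 + b for b
    return m, b

m, b = 2, 1                     # hypothetical slope and intercept
p = point_on_line(m, b, 0)      # choose x = 0
q = point_on_line(m, b, 4)      # choose x = 4
print(p, q)
print(line_through(p, q))       # gives back the slope and intercept we started with
```

Any two different choices of $x$ would work equally well; the line connecting the two points is always the same.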

The least-squares regression line

A line will be determined by knowing the values of $m$ and $b$.

We will give formulas for finding each of these.

In the equation $y = mx + b$ we have two variables:

$x$ is the explanatory variable

$y$ is the response variable

The least-squares regression line

Remember that our starting point in all of this is a collection of paired data.

Therefore we have two variables.

We also have their means $\bar{x}$ and $\bar{y}$,

and their standard deviations $s_x$ and $s_y$.

We can also calculate their correlation $r$.

The formulas for the least-squares regression line use all of these numbers.

The least-squares regression line

The slope of the least-squares regression line is

$$m = r\,\frac{s_y}{s_x}$$

The intercept is

$$b = \bar{y} - m\bar{x}$$

Notice the $m$ in the equation for the intercept: this is the slope (the same one we find with the previous formula).
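The two formulas translate directly into code. The following is a minimal sketch, assuming the standard formulas slope $= r \cdot s_y / s_x$ and intercept $= \bar{y} - \text{slope} \cdot \bar{x}$; the function name `least_squares_line` and the small data set are made up for illustration and are not the height and weight data from the slides.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)          # sample standard deviations
    # Correlation r from the means and standard deviations.
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    slope = r * s_y / s_x                    # slope formula
    intercept = y_bar - slope * x_bar        # intercept formula (uses the slope)
    return slope, intercept

# Made-up data lying exactly on y = 2x + 1, so the answer is easy to check.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)
```

Because the made-up points lie exactly on a line, the correlation is 1 and the formulas return that same line (up to rounding). With real data the points scatter around the line instead.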

The least-squares regression line

Example

Last time we considered the heights and weights of five adult males. We found the following numbers:

So the least-squares regression line has slope

$$m = r\,\frac{s_y}{s_x} \approx 7.73$$

The intercept is

$$b = \bar{y} - m\bar{x} \approx -358$$

The least-squares regression line

Example

Putting these together, the least-squares regression line is:

$$y = 7.73x - 358$$

The least-squares regression line

Note that none of the data actually lies on the line.

For a line to be the least-squares regression line, the sum of the squared vertical distances from the data points to the line must be as small as possible.

Nevertheless, the line need not (and usually does not) contain any of the data values.
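The name "least squares" refers to the quantity being made small: the sum of the squared vertical distances from the points to the line. The sketch below uses made-up data (not the slides' data) and hypothetical helper names to compare that sum for the fitted line against a few slightly different lines. The slope is computed with an algebraically equivalent form of $r \cdot s_y / s_x$.

```python
def sum_squared_errors(m, b, xs, ys):
    """Sum of squared vertical distances from the points to y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Equivalent to r * s_y / s_x after cancelling common factors.
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    b = y_bar - m * x_bar
    return m, b

# Made-up data; none of these points ends up on the fitted line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = least_squares_line(xs, ys)
best = sum_squared_errors(m, b, xs, ys)
print(f"fitted line: y = {m:.2f}x + {b:.2f}, sum of squares = {best:.2f}")

# Nudging the slope or the intercept always makes the sum larger.
for dm, db in [(0.1, 0), (-0.1, 0), (0, 0.5), (0, -0.5)]:
    other = sum_squared_errors(m + dm, b + db, xs, ys)
    print(f"slope {m + dm:.2f}, intercept {b + db:.2f}: sum of squares = {other:.2f}")
```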

Predictions

The most important application of least-squares regression lines is for making predictions.

If a scatterplot has a linear form, this suggests an underlying pattern.

Mathematically, that pattern is described by the least-squares regression line.

We can then make predictions based on the pattern we see in the data we’ve collected.

Predictions

Let’s use our least-squares regression line to make predictions.

What does our data predict will be the weight of a man who is 70 inches tall?

To find this, we plug 70 in for $x$ and find the corresponding $y$.

How about a man who is 76 inches tall?
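These two predictions can be worked out with a short calculation. The slope 7.73 and intercept -358 used below are an assumption: they are the values implied by the two predictions quoted at the end of these slides (105.8 pounds at 60 inches and 28.5 pounds at 50 inches). The function name `predicted_weight` is made up for this sketch.

```python
# Assumed regression line, inferred from the predictions quoted for
# 60 inches (105.8 lb) and 50 inches (28.5 lb) later in the slides.
SLOPE = 7.73        # pounds per inch
INTERCEPT = -358.0  # pounds

def predicted_weight(height_in):
    """Substitute the height for x in y = SLOPE * x + INTERCEPT."""
    return SLOPE * height_in + INTERCEPT

for height in (70, 76):
    print(f"{height} inches -> {predicted_weight(height):.1f} pounds")
```

Under that assumption, the line predicts about 183.1 pounds for a man who is 70 inches tall and about 229.5 pounds for a man who is 76 inches tall.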

Predictions

In general, when you have paired data, you can find the least-squares regression line $y = mx + b$.

The variable $x$ always corresponds to the explanatory variable (if there is one), and $y$ is always the response variable.

You can use the regression line to predict values of the response variable for different values of the explanatory variable.

Just take the value of the explanatory variable, and substitute it for $x$ in the equation. The number you get for $y$ is the predicted response.

Predictions

One danger in using least-squares regression for predictions is extrapolation.

Within the range of our data, the least-squares regression line should give reasonable predictions. But if we plug in numbers too far outside that range, the predictions may no longer be reasonable.

In our original height and weight data, the heights range from 67 inches to 77 inches.

We can be confident that our least-squares regression line gives reasonable predictions for any height in this range.

Predictions

What weight does our regression line predict for a man who is 5 feet tall (60 inches)?

105.8 pounds.

This is quite low: even 120 pounds is considered a low weight for a 5-foot-tall man.

What about 50 inches?

28.5 pounds. This is obviously preposterous.
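One simple safeguard is to flag any prediction whose input falls outside the range of the original data. The sketch below assumes the line $y = 7.73x - 358$, which is the line implied by the two predictions above, and uses the height range of 67 to 77 inches stated earlier. The function name and the flagging approach are illustrative choices, not part of the original slides.

```python
# Assumed regression line, inferred from the predictions quoted above.
SLOPE = 7.73
INTERCEPT = -358.0

# Range of heights in the original data, as stated in the slides.
MIN_HEIGHT = 67
MAX_HEIGHT = 77

def predict_weight(height_in):
    """Return (predicted weight, True if the height is outside the data range)."""
    predicted = SLOPE * height_in + INTERCEPT
    extrapolated = not (MIN_HEIGHT <= height_in <= MAX_HEIGHT)
    return predicted, extrapolated

for height in (70, 60, 50):
    weight, extrapolated = predict_weight(height)
    note = " (extrapolation: treat with caution)" if extrapolated else ""
    print(f"{height} inches -> {weight:.1f} pounds{note}")
```

The flag does not say how wrong an extrapolated prediction is. It only marks the cases where the data gives no direct support for the answer.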