
Exercise 2F - My Mathstariqphy.weebly.com/.../2/8/0/52805335/chapter_4_biva… · Web viewThe process of ‘fitting’ straight lines to bivariate data enables us to analyse relationships


4A – SCATTERPLOTS

Bivariate data result from measurements being made on each of the two variables for a given set of items.

Bivariate data can be graphed on a scatterplot (or scattergraph) as shown at left.

Each of the data points is represented by a single visible point on the graph.

When drawing a scatterplot, we need to choose the correct variable to assign to each of the axes.

The convention is to place the independent variable on the x-axis and the dependent variable on the y-axis.

The independent variable in an experiment or investigation is the variable that is deliberately controlled or adjusted by the investigator.

The dependent variable is the variable that responds to changes in the independent variable.

The relationship between two variables is called their correlation.

Example 1

The operators of a casino keep records of the number of people playing a ‘Jackpot’ type game. The table below shows the number of players for different prize amounts.

a) Draw a scatter plot of the data (no calculator)

Drawing conclusions/causation

When data are graphed, we can often estimate by eye (rather than measure) the type of correlation involved. Our ability to make these qualitative judgements can be seen from the following examples, which summarise the different types of correlation that might appear in a scatterplot.


Example 2

Using the same data in the first example:


a) Draw a scatter plot of the data using your CAS calculator.

b) State the type of correlation that the scatterplot shows.

c) Suggest why the plot is not perfectly linear.

2F (Further Maths)

Pearson’s Product-Moment Correlation Coefficient (r)

A more precise tool to measure the correlation between the two variables is Pearson’s product-moment correlation coefficient (denoted by the symbol r). It is used to measure the strength of the linear relationship between two variables. The value of r ranges from −1 to 1. That is, −1 ≤ r ≤ 1.
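As a rough illustration of how r behaves at its extremes, the sketch below computes it from the usual sums of deviations from the means. The function name `pearson_r` and the three data sets are hypothetical and are not taken from these notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data sets (for illustration only)
x = [1, 2, 3, 4, 5]
rising = [3, 5, 7, 9, 11]      # lies exactly on y = 2x + 1
falling = [10, 8, 6, 4, 2]     # lies exactly on y = -2x + 12
scattered = [4, 9, 5, 10, 7]   # upward trend with a lot of scatter

for label, y in [("rising", rising), ("falling", falling), ("scattered", scattered)]:
    print(label, round(pearson_r(x, y), 2))
```

Points lying exactly on a rising line give r = 1, points exactly on a falling line give r = −1, and scattered data give a value somewhere in between.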

Following is a gallery of scatterplots with the corresponding value of r for each.


Exercise 2F


4C – LINEAR MODELLING

If a linear relationship exists between a pair of variables then it is useful to be able to summarise the relationship in terms of an equation. This equation can then be used to make predictions about the levels of one variable given the value of the other. The process of finding the equation is known as linear modelling.

An equation can be found to represent the line which passes through any two points by using two coordinate geometry formulas.

The gradient of the line passing through $(x_1, y_1)$ and $(x_2, y_2)$ is given by:

$m = \frac{y_2 - y_1}{x_2 - x_1}$

The equation of a straight line with gradient $m$ passing through $(x_1, y_1)$ is given by:

$y - y_1 = m(x - x_1)$

$y = m(x - x_1) + y_1$

Example 1

Find the equation of the line passing through the points (2, 6) and (5, 12).
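One way to check a hand calculation for this example is the short sketch below, which applies the gradient formula and then expands y = m(x − x1) + y1 to read off the y-intercept. The function name `line_through` is a hypothetical helper, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Return (m, c) for the line y = m*x + c through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: the gradient is undefined")
    m = Fraction(y2 - y1, x2 - x1)   # gradient formula
    c = y1 - m * x1                  # from expanding y = m(x - x1) + y1
    return m, c

m, c = line_through((2, 6), (5, 12))
print(f"y = {m}x + {c}")
```

For the points (2, 6) and (5, 12) this gives a gradient of 2 and a y-intercept of 2.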

To find the equation for a scatterplot that consists of many points we need to fit a straight line through the whole set of points.

The process of fitting a line to a set of points is often referred to as regression. The regression line or trend line (also known as the line of best fit) may be placed on a scatterplot by eye or by using the 3-median method (covered in exercise 3B).

The line of best fit is the straight line which most closely fits the data.


Ski Resort Data

Its equation can then be found by using the method in the previous example by choosing any two points that are on the line.

The y-intercept is the value of y when the level of x is zero, that is, where the line touches the y-axis. The gradient (slope) of the equation represents the rate of change of variable y with changing x.

Sometimes after drawing a scatterplot it is clear that the points represent a relationship that is not linear. The relationship might be one of the non-linear types shown below.

In such cases it is not appropriate to try to model the data by attempting to fit a straight line through the points and find its equation. It is similarly inappropriate to attempt to fit a linear model (straight line) through a scatterplot if it shows that there is no correlation between the variables.

Example 2

The following table shows the fare charged by a bus company for journeys of differing length.


a) Represent the data using a scatterplot and place in the trend line by eye.

b) Find an equation which relates fare, F, to distance travelled, d.

c) Explain in words the meaning of the y-intercept and gradient of the line.

Example 3

The table below gives the times (in hours) spent by 8 students studying for a measurement test and the marks (in %) obtained on the test.

a) Draw the scatterplot to represent the data. Use your Calculator.


b) Using your calculator find the equation of the line of best fit. Write your equation in terms of the variables: time spent studying and test mark.

4D – MAKING PREDICTIONS

The equation of the trend line may be used to make predictions about the variables by substituting a value into the equation.

Example 1

It is found that the relationship between the number of people playing a casino Jackpot game and the prize money offered is given by the equation N = 0.07p + 220, where N is the number of people playing and p is the prize money.

a) Find the number of people playing when the prize money is $2500.


b) Find the likely prize on offer if there were 500 people playing.
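The two substitutions in this example can be checked with a short sketch. The function names are hypothetical; the only thing taken from the notes is the equation N = 0.07p + 220, which part (b) rearranges to p = (N − 220) / 0.07.

```python
def players_for_prize(p):
    """Number of players N predicted by N = 0.07p + 220."""
    return 0.07 * p + 220

def prize_for_players(n):
    """Prize money p obtained by rearranging N = 0.07p + 220."""
    return (n - 220) / 0.07

# a) prize money of $2500
print(round(players_for_prize(2500)))   # 395 players

# b) 500 people playing
print(round(prize_for_players(500)))    # $4000
```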

Using technology:

Alternatively, a prediction could be made from the graph’s trend line.

Example 2

The scatterplots below show the depth of snow and the corresponding number of skiers. From the graph’s trend line find:

a) the number of skiers when snow depth was 3 m.

b) the depth of snow that would attract about 400 people.


Interpolation and Extrapolation

We use the term interpolation when we make predictions from a graph’s trend line from within the bounds of the original experimental data.

We use the term extrapolation when we make predictions from a graph’s trend line from outside the bounds of the original experimental data.

Data can be interpolated or extrapolated either algebraically or graphically.
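The distinction can be written as a simple range check. In the sketch below, the function name `prediction_type` and the snow-depth values are hypothetical and serve only to illustrate the definitions above.

```python
def prediction_type(x_data, x_new):
    """Classify a prediction by whether x_new lies inside the observed x-range."""
    lowest, highest = min(x_data), max(x_data)
    if lowest <= x_new <= highest:
        return "interpolation"
    return "extrapolation"

# Hypothetical snow depths (m) used to draw a trend line
snow_depths = [0.5, 0.8, 1.2, 1.6, 2.1, 2.7, 3.4]

print(prediction_type(snow_depths, 3.0))   # inside 0.5 to 3.4
print(prediction_type(snow_depths, 5.0))   # beyond the largest observed depth
```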

Reliability of Results

Results predicted (whether algebraically or graphically) from the trend line of a scatterplot can be considered reliable only if:

1. a reasonably large number of points were used to draw the scatterplot,

2. a reasonably strong correlation was shown to exist between the variables (the stronger the correlation, the greater the confidence in predictions),

3. the predictions were made using interpolation and not extrapolation. Extrapolated results can never be considered to be reliable because when extrapolation is used we are assuming that the relationship holds true for untested values.

3A INTRODUCTION TO REGRESSION (Further Maths)

The process of ‘fitting’ straight lines to bivariate data enables us to analyse relationships between the data and possibly make predictions based on the given data set.

Regression analysis is concerned with finding these straight lines using various methods so that the number of points above and below the line is ‘balanced’.

3A Method of Fitting Lines by Eye

There should be an equal number of points above and below the line.
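A line drawn by eye can be checked against this rule by counting the points on each side of it. The sketch below assumes the line is written as y = mx + c; the function name `count_above_below`, the points and the trial line are all hypothetical.

```python
def count_above_below(points, m, c):
    """Count the points lying strictly above and strictly below y = m*x + c."""
    above = sum(1 for x, y in points if y > m * x + c)
    below = sum(1 for x, y in points if y < m * x + c)
    return above, below

# Hypothetical scatterplot points and a trial line y = x + 1
points = [(1, 1.5), (2, 3.5), (3, 3), (4, 5.5), (5, 5), (6, 7.5)]
above, below = count_above_below(points, m=1, c=1)
print(above, below)   # equal counts mean the line is 'balanced'
```

Points that sit exactly on the line are counted on neither side, so they do not upset the balance.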

Example 1:

Fit a straight line to the data in the figure using the equal-number-of-points method.

Exercise 3A

1. Fit a straight line to the data in the scatterplots using the equal-number-of-points method.


3B Fitting a straight line: the 3-median method

Fitting lines by eye is useful but it is not the most accurate of methods.

We can find the line of best fit in the form of ___________________________________

One method to find the line of best fit is called the 3-median method. This method is as follows:

Step 1. Plot the points on a scatterplot.

Step 2. Divide the points into 3 groups (lower, middle and upper) using vertical divisions.

(a) If the number of points is divisible by 3, divide them into 3 equal groups.

(b) If there is 1 extra point, put the extra point in the middle group.

(c) If there are 2 extra points, put 1 extra point in each of the outer groups.

Step 3. Find the median point of each of the 3 groups and mark each median on the scatterplot (the median of the x-values and the median of the y-values in the group).

(a) The median of the lower group is denoted by $(x_L, y_L)$.

(b) The median of the middle group is denoted by $(x_M, y_M)$.

(c) The median of the upper group is denoted by $(x_U, y_U)$.

Note: Although the x-values are already in ascending order on the scatterplot, the y-values within each group may need re-ordering before you can find the median.

Steps 4 and 5 can be completed using 2 different approaches: graphical or arithmetic.

Graphical approach

Step 4. Place your ruler so that it passes through the lower and upper medians. Move the ruler a third of the way toward the middle group median while maintaining the slope. Hold the ruler there and draw the line.

Step 5. Find the equation of the line (general form y = mx + c). There are two general methods.

(a) Method A: Choose two points which lie on the line and use these to find the gradient of the line and then the equation of the line.

$m = \frac{y_2 - y_1}{x_2 - x_1}$

Substitute the coordinates of one point and m into the equation to find c.

(b) Method B: If the scale on the axes begins at zero, you can read off the y-intercept of the line and calculate the gradient of the line.

Arithmetic approach

Step 4. Calculate the gradient (m) of the line. Use the rule:

$m = \frac{y_U - y_L}{x_U - x_L}$

Step 5. Calculate the y-intercept (c) of the line. Use the rule:

$c = \frac{1}{3}\left[(y_L + y_M + y_U) - m(x_L + x_M + x_U)\right]$

Thus, the equation of the regression line is y = mx + c.
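The arithmetic approach can be sketched in a few lines of code. This is only an illustration: the function name `three_median_line` is hypothetical, the data are borrowed from the x and y table used in the least-squares example later in these notes, and repeated x-values that straddle a group boundary are not treated specially.

```python
from statistics import median

def three_median_line(xs, ys):
    """Gradient m and y-intercept c of the 3-median line y = m*x + c."""
    pts = sorted(zip(xs, ys))              # order the points by x-value
    base, extra = divmod(len(pts), 3)
    # Step 2: 1 extra point goes in the middle group,
    # 2 extra points go one in each outer group.
    if extra == 0:
        sizes = (base, base, base)
    elif extra == 1:
        sizes = (base, base + 1, base)
    else:
        sizes = (base + 1, base, base + 1)
    lower = pts[:sizes[0]]
    middle = pts[sizes[0]:sizes[0] + sizes[1]]
    upper = pts[sizes[0] + sizes[1]:]

    # Step 3: median of the x-values and median of the y-values in a group
    def median_point(group):
        return median([x for x, _ in group]), median([y for _, y in group])

    (xl, yl), (xm, ym), (xu, yu) = (median_point(g) for g in (lower, middle, upper))
    m = (yu - yl) / (xu - xl)                          # Step 4
    c = ((yl + ym + yu) - m * (xl + xm + xu)) / 3      # Step 5
    return m, c

x = [1, 3, 4, 7, 10, 12, 14, 15]
y = [11, 9, 10, 6, 8, 4, 3, 1]
m, c = three_median_line(x, y)
print(f"y = {m:.2f}x + {c:.2f}")
```

With 8 points there are 2 extra, so the groups hold 3, 2 and 3 points and the median points are (3, 10), (8.5, 7) and (14, 3).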

Example 1: Find the equation of the regression line for the data in the table at right using the 3-median method. Give coefficients correct to 2 decimal places.

1. Sketch the scatterplot then divide it into 3 groups.


2. Use the graphical approach to find the equation of the line of best fit.

3. Use the arithmetic approach to find the equation of the line of best fit.

i. Find the gradient of the line

ii. Find y-intercept

iii. Find the equation of the line

CAS CALCULATOR

Fitting a Straight Line Using the 3 Median Method

Example 2

Find the equation of the regression line for the data in the table below using the 3-median method. Give coefficients correct to 2 decimal places.

On a Lists & Spreadsheet page, enter x-values into column A and y-values into column B. Label the columns accordingly.


To draw a scatterplot of the data, add a Data & Statistics page.

Tab to each axis to select ‘Click to add variable’. Place x on the horizontal axis and y on the vertical axis.

The graph should appear as shown. If you move the pointer over any point and press Click twice, the coordinates for that point will be displayed.

To fit a regression line, complete the following steps. Press:

MENU → 4: Analyse → 6: Regression → 3: Show Median–Median

Exercise 3B


3C Fitting a straight line: least-squares regression

Another method for finding the equation of a straight line which is fitted to data is known as the method of least-squares regression. It is used when data show a linear relationship and have no obvious outliers.

To understand the underlying theory behind least-squares, consider the regression line shown below.

We wish to minimise the total of the vertical lines, or ‘errors’ in some way. For example, balancing the errors above and below the line. This is reasonable, but for sophisticated mathematical reasons it is preferable to minimise the sum of the squares of each of these errors. This is the essential mathematics of least-squares regression.
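To make the idea of minimising the sum of squared errors concrete, the sketch below compares that sum for the least-squares line and for another candidate line. The data come from the x and y table in the example later in this section; the function names and the comparison line y = −0.6x + 12 (imagined as placed by eye) are hypothetical.

```python
def least_squares_line(xs, ys):
    """Slope and intercept that minimise the sum of squared vertical errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

def sum_squared_errors(xs, ys, m, c):
    """Sum of the squares of the vertical distances from the points to the line."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

x = [1, 3, 4, 7, 10, 12, 14, 15]
y = [11, 9, 10, 6, 8, 4, 3, 1]

m_ls, c_ls = least_squares_line(x, y)
m_eye, c_eye = -0.6, 12.0        # hypothetical line placed by eye

sse_ls = sum_squared_errors(x, y, m_ls, c_ls)
sse_eye = sum_squared_errors(x, y, m_eye, c_eye)
print(f"least squares: {sse_ls:.2f}   by eye: {sse_eye:.2f}")
```

Whatever other line is tried, its sum of squared errors cannot be smaller than that of the least-squares line.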

Choosing Between 3-Median and Least-Squares Regression

The 3-median method should be used in preference to the least-squares regression method if there are clear outliers in the data.

Calculating the least-squares regression line by hand

Summary data needed:

$\bar{x}$ the mean of the independent variable (x-variable)

$\bar{y}$ the mean of the dependent variable (y-variable)

$s_x$ the standard deviation of the independent variable

$s_y$ the standard deviation of the dependent variable

$r$ Pearson’s product–moment correlation coefficient.

Formula to use:

The general form of the least-squares regression line is

$y = mx + c$

where the slope of the regression line is

$m = r\frac{s_y}{s_x}$

and the y-intercept of the regression line is

$c = \bar{y} - m\bar{x}$

Example 3: For the table below, find the regression line using the least-squares method.

x    1    3    4    7    10    12    14    15
y    11   9    10   6    8     4     3     1

To find $\bar{x}$, $\bar{y}$, $s_x$, $s_y$ and r from the two sets of data using the CAS calculator, press:

MENU → 4: Statistics → 1: Stat Calculations → 2: Two-Variable Statistics

$\bar{x}$ = ____________

$\bar{y}$ = ____________

$s_x$ = ___________ $s_y$ = ____________ r = ____________

(a) Find the gradient of the regression line

(b) Find the y-intercept of the regression line

(c) Find the equation of the regression line
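Calculator results for this example can be cross-checked with the sketch below. It works from the same summary statistics, using the standard relations slope = r × s_y / s_x and intercept = ȳ − slope × x̄. Sample standard deviations are assumed here; the ratio s_y / s_x is the same whichever version is used.

```python
from statistics import mean, stdev

x = [1, 3, 4, 7, 10, 12, 14, 15]
y = [11, 9, 10, 6, 8, 4, 3, 1]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)      # sample standard deviations (assumption)

# Pearson's r from the deviations about the means
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

m = r * s_y / s_x                  # (a) gradient
c = y_bar - m * x_bar              # (b) y-intercept

print(f"x_bar = {x_bar}, y_bar = {y_bar}")
print(f"s_x = {s_x:.2f}, s_y = {s_y:.2f}, r = {r:.2f}")
print(f"y = {m:.2f}x + {c:.2f}")   # (c) equation of the regression line
```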

Example 4:

A study to find a relationship between the height of husbands and the height of their wives revealed the following details.

Mean height of the husbands: 180 cm
Mean height of the wives: 169 cm
Standard deviation of the height of the husbands: 5.3 cm
Standard deviation of the height of the wives: 4.8 cm
Correlation coefficient, r = 0.85

The form of the least-squares regression line is to be: Height of wife = m × height of husband + c

(a) Which variable is the dependent variable? ______________________________

(b) Calculate the value of m for the regression line (to 2 decimal places).

(c) Calculate the value of c for the regression line (to 2 decimal places).

(d) Use the equation of the regression line to predict the height of a wife whose husband is 195 cm tall (to the nearest cm).
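The calculations for this example can be sketched as follows, using the standard relations m = r × s_y / s_x and c = ȳ − m × x̄ with the wife's height as the dependent variable. The variable names are hypothetical, and r is assumed to be positive 0.85. The unrounded value of m is carried through to c and to the prediction.

```python
# Summary data from the example (heights in cm)
mean_husband, mean_wife = 180, 169
sd_husband, sd_wife = 5.3, 4.8
r = 0.85                               # assumed positive

m = r * sd_wife / sd_husband           # (b) slope
c = mean_wife - m * mean_husband       # (c) y-intercept
predicted_wife = m * 195 + c           # (d) husband 195 cm tall

print(f"m = {m:.2f}")
print(f"c = {c:.2f}")
print(f"predicted height of wife = {round(predicted_wife)} cm")
```

Note that 195 cm is nearly three standard deviations above the mean height of the husbands, so if it lies outside the heights in the original study this prediction is an extrapolation.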


The calculation of the equation of a least-squares regression line is simple using a CAS calculator.

Example 5:

A study shows the more calls a teenager makes on their mobile phone, the less time they spend on each call. Find the equation of the linear regression line for the number of calls made plotted against call time in minutes using the least-squares method on a CAS calculator. Express coefficients correct to 2 decimal places.

Number of minutes (x)    1    3    4    7    10    12    14    15
Number of calls (y)      11   9    10   6    8     4     3     1

On a Lists & Spreadsheet page, enter the minutes values into column A and the number of calls values into column B. Label the columns accordingly.

To draw a scatterplot of the data in a Data & Statistics page, tab to each axis to select ‘Click to add variable’. Place minutes on the horizontal axis and calls on the vertical axis. The graph will appear as shown.

To fit a least-squares regression line, complete the following steps. Press:

MENU → 4: Analyse → 6: Regression → 1: Show Linear (mx+b)

To find r and r², return to the Lists & Spreadsheet page by pressing Ctrl and then the left arrow. Summary variables are found by pressing:

MENU → 4: Statistics → 1: Stat Calculations → 3: Linear Regression (mx+b)

Complete the table as shown below and press OK to display the statistical parameters. Notice that the equation is stored and labelled as function f1.

The regression information is stored in the first available column on the spreadsheet.


Exercise 3C


3E Residual analysis

There are situations where the mere fitting of a regression line to some data is not enough to convince us that the data set is truly linear. Even if the correlation is close to 1 or −1 it still may not be convincing enough.

The next stage is to analyse the residuals, or deviations, of each data point from the straight line.

A residual is the vertical difference between each data point and the regression line.

When we plot the residual values against the original x-values and the points are randomly scattered above and below zero (x-axis), then the original data is most likely to have a linear relationship.


If the residual plot shows some sort of pattern then the original data probably is not linear.

Residual Plot

To produce a residual plot, carry out the following steps:

Step 1. Draw up a table as follows

x                        1    2    3    4    5    6    7    8    9    10
y                        5    6    8    15   24   47   77   112  187  309
ypred
Residuals (y − ypred)

Step 2. Find the equation of the least-squares regression line y = mx + b using the graphics calculator.

Step 3. Calculate the predicted y-values (ypred) using the least-squares regression equation. The predicted y-values are the y-values on the regression line. Put these values into the table.

Step 4. Calculate the residuals.

Residual value = y − ypred

where y is the actual data value and ypred is the y-value from the regression line.

Enter these values into the table.

Note: the sum of all the residuals will always add to zero (or very close).

Step 5. Plot the residual values against the original x-values.

If the data points in the residual plot are randomly scattered above and below zero (the x-axis), then the original data will probably be linear.

If the residual plot shows a pattern then the original data is not linear.
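Steps 2 to 4 can be sketched in code for the x and y values in the table above. This is a cross-check only, not a replacement for the calculator steps: the regression coefficients are computed directly from the least-squares formulas, and the variable names are hypothetical.

```python
x = list(range(1, 11))
y = [5, 6, 8, 15, 24, 47, 77, 112, 187, 309]

# Step 2: least-squares regression line y = mx + b
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
m = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
b = mean_y - m * mean_x

# Step 3: predicted y-values on the regression line
y_pred = [m * xi + b for xi in x]

# Step 4: residual = actual y-value minus predicted y-value
residuals = [yi - yp for yi, yp in zip(y, y_pred)]

print(f"y = {m:.2f}x + {b:.2f}")
for xi, yi, yp, res in zip(x, y, y_pred, residuals):
    print(f"{xi:>2} {yi:>4} {yp:>8.2f} {res:>8.2f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```

The printed residual column is what gets plotted against x in Step 5. Looking at where the signs of the residuals change is a quick way to see whether they follow a pattern.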

Example 8

Use the data below to produce a residual plot and comment on the likely linearity of the data.

Step 1.

x                       1    2    3    4    5    6    7    8    9    10
y                       5    6    8    15   24   47   77   112  187  309
ypred
Residual (y − ypred)

Step 2. Equation of the least-squares regression line.

y = ax + b

Step 3. Calculate the predicted y-values using the equation _________________________________

When x = 1 ypred =

=

=

When x = 2 ypred =

=

=

Or use the CAS calculator to get the ypred values from the regression line: open a Graphs & Geometry page, enter the equation of the least-squares regression line and press enter.

Once you have the graph, press MENU → 5: Trace → 1: Graph Trace. Type in the x value and the corresponding y value will appear.

Step 4. Calculate the residuals.

Residual = y − ypred

Residual =

=

=

Residual =

=

=

Calculate the rest of the residuals and enter them into the table. Add all residuals to check it equals zero.

Step 5. Plot residual values against original x-values.

[Grid for the residual plot: horizontal axis x from 0 to 10; vertical axis Residual from −50 to 100]

The residual plot shows ______________________________________________________________________

_________________________________________________________________________________________

_________________________________________________________________________________________

Using a CAS calculator

Find the equation of a least-squares regression line.

Enter the data on a Lists & Spreadsheet page.

To find the values of m and b for the equation y = mx + b, press:

MENU → 4: Statistics → 1: Stat Calculations → 3: Linear Regression (mx + b)

To generate the residual values in their own column, move to the shaded cell in column E and press:

Ctrl MENU → 4: Variables … → 3: Link To:

Select the list stat6.resid

Write down all of the residuals displayed in the column. Scroll down for the complete list of values.

Note: The stat number will vary depending on the calculator and previously stored data.


Example 9

Using the same data as in Example 8, plot the residuals and discuss the features of the residual plot.

Generate the list of residuals as demonstrated in Example 8.

On the Data & Statistics page select x for the x-axis and stat.resid for the y-axis.

To identify if a pattern exists, it is useful to join the residual points. To do this, press:

MENU → 2: Plot Properties → 1: Connect Data Points


Exercise 3E
