
Questions - Regression and Correlation


A LEVEL MATHEMATICS QUESTIONBANKS

REGRESSION AND CORRELATION

1. Sketch scatter diagrams with at least 5 points to illustrate the following:

a) Data with a product moment correlation coefficient of –1 [1]

b) Data with a rank correlation coefficient of 1, but product moment correlation coefficient less than 1 [1]

c) Data with a product moment correlation coefficient of 0.1 [1]

2. The following data were obtained on the heights (in cm) and masses (in kg) of 10 children:

Child         A    B    C    D    E    F    G    H    I    J
Height (H)  143  120  131  128  118  106  118  138  144  101
Mass (M)     37   34   30   38   29   25   50   42   38   18

ΣH = 1247; ΣM = 341; ΣH² = 157 459; ΣHM = 43 223; ΣM² = 12 367

a) Plot a scatter diagram of M on H [2]

One child is significantly overweight.

b) Use your diagram to identify this child, explaining the reasons for your choice [2]

c) Omitting the child identified in b), calculate the equation of a suitable regression line for estimating the mass of a child of height 124 cm, giving all values correct to three significant figures

[9]

d) Explain why your line would not be suitable for estimating the weight of a baby of height 54 cm [1]


3. The scatter diagram shows the cost (in pounds) and length of trip (in days) for the business trips taken by the employees (A to H) of a certain firm last year.

a) Two of the trips were abroad. Identify, with reasons, the two employees who made these trips [2]

A regression line of cost on length of trip is found.

b) Explain the significance of the gradient and intercept of this line in terms of trip costs, giving examples to illustrate your answer

[4]

4.

X   -2   -1    0    1    2    3    4    5    6
Y   -3   -2    0    1    5   14   31   60  112

a) Plot a scatter diagram of the data in the table above.

[2]

b) Without further calculation, state the value of Spearman's rank correlation coefficient for this data [1]

c) Calculate the value of the product moment correlation coefficient of this data (Σx² = 96; Σy² = 17340; Σxy = 1157; Σx = 18; Σy = 218)

[4]

It is suggested that there would be a higher correlation between U and Y, where U = X³

d) Comment on this suggestion with reference to your graph. [1]
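
A minimal sketch, assuming Python, of how the product moment correlation coefficient in part c) could be obtained from summary sums of the kind quoted there. The function name pmcc_from_sums is hypothetical; the data are the X and Y values tabulated in question 4.

```python
from math import sqrt

def pmcc_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Product moment correlation coefficient from summary sums (hypothetical helper)."""
    s_xx = sum_xx - sum_x ** 2 / n      # Sxx = Σx² - (Σx)²/n
    s_yy = sum_yy - sum_y ** 2 / n      # Syy = Σy² - (Σy)²/n
    s_xy = sum_xy - sum_x * sum_y / n   # Sxy = Σxy - ΣxΣy/n
    return s_xy / sqrt(s_xx * s_yy)

# Data from the table in question 4
x = [-2, -1, 0, 1, 2, 3, 4, 5, 6]
y = [-3, -2, 0, 1, 5, 14, 31, 60, 112]

r = pmcc_from_sums(
    len(x),
    sum(x),
    sum(y),
    sum(a * a for a in x),
    sum(b * b for b in y),
    sum(a * b for a, b in zip(x, y)),
)
print(round(r, 3))
```

Because Y increases whenever X increases, the ranks agree exactly even though r falls short of 1, which is the contrast that parts b) and c) draw out.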


[Scatter diagram for question 3: Cost (£) against Length of trip (days), with points labelled A to H]


5. In a biology experiment a student applied different volumes of water (W) to seven different tomato plants. He then measured the yield (Y) of tomatoes produced. He obtained the following results.

Plant   A    B    C    D    E    F    G
W       0   10   20   30   40   50   60
Y       0    3    8   12    9    5    0

a) Draw a scatter diagram of Y on W [2]

b) Explain why you would not expect a high value of the product-moment correlation coefficient between W and Y

[2]

The student carries out a revised experiment in which he only varies W between 0 and 30. He again uses 7 plants and obtains: ΣW = 105; ΣY = 41; ΣW² = 2275; ΣY² = 375; ΣWY = 920

c) Calculate the product-moment correlation coefficient between W and Y and comment on your value [5]

d) Explain why it is not true to say “the more water, the higher the yield” [2]

6. x and y are the scores obtained by 8 children in tests on English and Mathematics respectively. Σx = 544; Σx² = 39904; Σy = 513; Σy² = 34691; Σxy = 36946

a) Calculate the product moment correlation coefficient between x and y [4]

b) Test whether there is significant positive correlation between x and y at the 5% level. [3]

c) It is suggested that pupils who are good at English are rarely good at maths. Use your results to comment on this assertion.

[2]


7. A geography student is investigating the relationship between the size of a shopping centre (measured by the number of shops it contains) and the mean distance travelled by shoppers to reach the shopping centre. She obtains the following data:

Centre                A     B     C     D     E     F
No. of shops          6    20    15    30    30    60
Mean distance (km)  0.5   2.1   2.4   4.1   3.9   6.2

a) Calculate Spearman’s rank correlation coefficient between mean distance travelled and number of shops [7]

The student later obtains additional data for shopping centres G, H and I, as shown below:

Centre                G     H     I
No. of shops          7    15    17
Mean distance (km)  4.2   3.8   4.0

b) Without doing any further calculations, explain with reasons whether this additional data will cause the rank correlation coefficient to increase, decrease or remain unaltered

[2]
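
A minimal sketch, assuming Python, of one way the calculation in part a) could be carried out. It assumes that tied values (the two centres with 30 shops) share the mean of the ranks they occupy, and it applies the formula 1 - 6Σd²/(n(n² - 1)), which is only approximate when ties are present. The function names average_ranks and spearman are hypothetical.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rank correlation coefficient via the sum of squared rank differences."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Centres A to F from question 7
shops = [6, 20, 15, 30, 30, 60]
distance = [0.5, 2.1, 2.4, 4.1, 3.9, 6.2]

r_s = spearman(shops, distance)
print(round(r_s, 3))
```

Ranking from largest to smallest instead would give the same coefficient, provided both variables are ranked in the same direction.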

8. The following table shows the ranks given to the 10 contestants in a beauty contest by two judges:

Contestant   A   B   C   D   E   F   G   H   I   J
Judge X      1   6   2   9   5   3   8  10   7   4
Judge Y      2   5   2   8   4   1   9  10   7   6

a) Calculate Spearman's rank correlation coefficient for this data

[7]

b) Test, at the 1% level, whether there is significant agreement between the judges. [3]


9. a) Give two circumstances when it would be appropriate to use Spearman's rank correlation coefficient instead of the product moment correlation coefficient.

[2]

The following are the times taken (in seconds) by 8 children to run two different races:

Child      A     B     C     D     E     F     G     H
Race 1  12.1  13.6  14.2  13.8  12.4  12.9  12.8  13.6
Race 2  21.4  23.0  32.2  26.2  24.4  23.0  23.0  27.7

b) Calculate Spearman's rank correlation coefficient for this data, and test at the 5% level whether it is significantly greater than zero.

[11]

Two children make the following assertions:

Andrea: “A graph of time in race 1 against time in race 2 would be close to a straight line”

Bijal: “If you get a high place in one race you will usually get a high place in the other race”

c) State whether either of these statements is justified solely on the basis of the calculations you have already carried out. Explain your answer.

[4]

10. Spearman’s rank correlation coefficient was calculated as –0.46, based on 16 pairs of data.

a) Test whether this is significantly less than zero using a 5% level of significance [3]

b) Test whether this is significantly different to zero using a 5% level of significance [3]


11. The diagram below shows the relationship between inflation (I) and unemployment (U) for various countries:

a) Estimate Spearman's rank correlation coefficient between U and I [2]

b) Comment on the assertion that “It is impossible to have low inflation with low unemployment” [2]

c) Comment on the suitability of the model I = A – BU, where A and B are positive constants [2]

d) Using this model, give an interpretation of the values of A and B

[2]

e) Suggest an improved model [1]

12. The table below gives data on pollution levels (L) and distance from the city centre (D)

D   0.6   1.3   1.6   2.1   2.5   3.3   4.1   4.6
L    18    15    13    16    11     9     7     5

ΣD = 20.1; ΣD² = 64.13; ΣL = 94; ΣL² = 1250; ΣDL = 193.6

a) Calculate the equation of the regression line of L on D [7]

b) Use your equation to estimate the pollution level at a point 3 km away from the city centre. [2]

c) Explain the significance of the coefficients of your regression line. [2]
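
A minimal sketch, assuming Python, of how a least-squares regression line of L on D could be computed from the summary sums given in question 12. The function name regression_from_sums is hypothetical, and n = 8 is taken from the number of data pairs in the table.

```python
def regression_from_sums(n, sum_x, sum_y, sum_xx, sum_xy):
    """Return (a, b) for the least-squares line y = a + b*x (hypothetical helper)."""
    s_xx = sum_xx - sum_x ** 2 / n      # Sxx = Σx² - (Σx)²/n
    s_xy = sum_xy - sum_x * sum_y / n   # Sxy = Σxy - ΣxΣy/n
    b = s_xy / s_xx                     # gradient
    a = sum_y / n - b * sum_x / n       # the line passes through the mean point
    return a, b

# Summary sums from question 12, with x = D (distance) and y = L (pollution level)
a, b = regression_from_sums(n=8, sum_x=20.1, sum_y=94, sum_xx=64.13, sum_xy=193.6)

# Estimate at D = 3, which lies inside the observed range 0.6 to 4.6
estimate_at_3 = a + b * 3
print(round(a, 2), round(b, 2), round(estimate_at_3, 1))
```

Here b is the estimated change in pollution level for each additional unit of distance, and a is the estimated level at D = 0.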


[Scatter diagram for question 11: axes labelled U and I]


13. The table below shows the test marks in science (S) (out of 100) and Maths (M) (out of 120) for 8 pupils.

S    60   65   70   75   80   85   90   95
M   120  109   97   85   72   59   45   46

A student is required to calculate the regression line of M on S. To make the calculations easier, he decides to use variables U and V instead of S and M, where U = 0.2(S - 80) and V = M - 90

a) Calculate the values of U and V [2]

b) Find the equation of the regression line of V on U, given that ΣU² = 44 and ΣUV = -439 [7]

c) Hence find the regression line of M on S [3]

d) One student is absent for the mathematics test. He obtained 40 on the science test. Use your regression line to obtain an estimate for his mathematics score, and comment on the reliability of your estimate.

[4]
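
A minimal sketch, assuming Python, of the coding idea in question 13: fit the line of V on U, convert it back to a line of M on S, and check that it agrees with a line fitted directly to S and M. The helper fit_line and all variable names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b). Hypothetical helper."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Marks from the table in question 13
S = [60, 65, 70, 75, 80, 85, 90, 95]
M = [120, 109, 97, 85, 72, 59, 45, 46]

# Coded variables U = 0.2(S - 80) and V = M - 90
U = [0.2 * (s - 80) for s in S]
V = [m - 90 for m in M]

a_uv, b_uv = fit_line(U, V)

# V = a + bU  means  M - 90 = a + 0.2*b*(S - 80),
# so  M = (90 + a - 16*b) + (0.2*b)*S
slope_ms = 0.2 * b_uv
intercept_ms = 90 + a_uv - 16 * b_uv

# Fitting S and M directly should give the same line
a_direct, b_direct = fit_line(S, M)
print(round(intercept_ms, 1), round(slope_ms, 2))
```

A science mark of 40 lies well below the smallest observed value of 60, so any estimate made there is an extrapolation.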

14. a) Explain the difference between the regression line of y on x and the regression line of x on y, and show in a sketch the deviations that are to be minimised in each case.

[4]

An investigation is being carried out into the relationship between house-price (H) and earnings (E). Data is collected from people who have bought their own houses. The equation of the regression line of H on E is found to be H = 3.1E + 8000

b) Explain why this is the most appropriate regression line to use [1]

c) Explain the significance of the coefficients of the regression line [2]

Mr Windsor has inherited his house.

d) Explain why the use of this regression line may give an inaccurate estimate for the value of Mr Windsor’s house. State whether you expect it to be an over- or under-estimate and explain your answer.

[3]

e) What would be the new equation if, within the sample:

i) House prices all increase by £5000

[1]

ii) House prices all increase by 10% [3]


15. The following data were recorded in a science experiment to investigate the relationship between the length of a pendulum (L cm) and the time taken for one oscillation (T s):

L     20    25    30    35    40    45    50    60    70    80    90   100
T   0.89  1.01  1.10  1.20  1.25  1.36  1.40  1.53  1.66  1.80  1.89  2.01

It is suggested that there is a relationship between T² and L

a) Calculate the product moment correlation coefficient between T² and L (ΣT² = 25.783; ΣT⁴ = 67.56280342; ΣL = 645; ΣL² = 42275; ΣT²L = 1689.8215)

[4]

b) Calculate the regression line of T² on L [4]

c) Give a physical interpretation of the gradient of your line [1]

d) Suggest why it would not be appropriate to use this equation to find the oscillation time for a pendulum of length 1cm.

[2]


16. A student is investigating the relationship between the average Mathematics test mark obtained by sixth-formers (M) and the number of hours per week they watch TV (T). She obtains the following data:

Sixth-former   A   B   C   D   E   F   G   H   I   J   K   L
M             44  72  91  33  66  65  80  55  47  53  77  22
T             21   5  23  28  10  12   4  14  17  17   7  28

a) Draw a scatter diagram to illustrate this data [2]

b) Without any calculations explain why you would not expect a particularly high value for the product moment correlation coefficient of this data.

[1]

The student decides to “adjust” her results by removing one sixth-former’s data to improve the correlation.

c) Which sixth-former’s data does she remove? [1]

d) Calculate the product moment correlation coefficient for the data with this individual removed. [7]

The student says in her conclusion “The more hours TV watched, the lower the test mark”.

e) Using your result from d), test at the 1% level whether this assertion is justified. [3]

The student decides to test her findings by using a regression line to predict the average test mark of Janice, who watches 17 hours TV per week

f) Omitting the same individual as before, calculate the equation of the appropriate regression line, and estimate Janice’s average test mark.

[8]

g) State with reasons whether this line could be used to do the following:

i) Estimate the average test mark for Mike, who watches 35 hours TV per week

[2]

ii) Estimate the hours of TV watched by Saleem, who has an average test mark of 68. [2]


17. The following data were obtained for variables X and Y

X   0.5    1    2    4    5    8   10   16   20   25   40
Y    43   22   11    7    6    4    4    3  2.5    2  1.5

a) Draw a scatter diagram to represent this data [2]

The model Y = A + BX was suggested for this data, where A and B are constants

b) With reference to your diagram, explain why this model is not suitable [1]

It is suggested that the model Y = P + QZ (where Z = 1/X) would be more suitable

c) With reference to your diagram, comment on this suggestion [1]

d) By calculating the equation of an appropriate regression line, obtain estimates for the values of P and Q

(ΣY = 106; ΣZ = 4.3525; ΣZ² = 5.38675625; ΣZY = 117.78)

[6]

18. The data below were obtained from observations of the radioactivity (as measured by a Geiger counter) of a sample of a chemical (R) and the time (t) since the beginning of the experiment.

t     0    1    2    3    4    5    6    7    8
R   401  280  200  142   98   68   50   34   23

a) Draw a scatter diagram of R against t [2]

A student suggests using the equation R = At + B, where A and B are constants, to model the data.

b) With reference to your diagram, explain why this would not be a suitable choice. [2]

An alternative model suggested is lnR = P + Qt, where P and Q are constants.

Given Σt = 36; ΣlnR = 41.26; Σt² = 204; Σ(lnR)² = 196.71; ΣtlnR = 143.78

c) Calculate the equation of a suitable regression line to obtain the values of P and Q [7]

d) Estimate the value of the radioactivity count obtained at t = 4.5, giving your answer to the nearest whole number.

[3]
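
A minimal sketch, assuming Python, of how the linearised model lnR = P + Qt in question 18 could be fitted from the summary sums given, and how a prediction is converted back to a count with the exponential function. The name predicted_count is hypothetical, and n = 9 is taken from the number of observations in the table.

```python
from math import exp

# Summary sums from question 18 (n = 9 observations, t = 0 to 8)
n = 9
sum_t, sum_lnR = 36, 41.26
sum_tt, sum_t_lnR = 204, 143.78

s_tt = sum_tt - sum_t ** 2 / n             # Stt = Σt² - (Σt)²/n
s_ty = sum_t_lnR - sum_t * sum_lnR / n     # spread of t with lnR

Q = s_ty / s_tt                            # gradient of lnR on t
P = sum_lnR / n - Q * sum_t / n            # intercept

def predicted_count(t):
    """Convert the fitted value of lnR back to a count R (hypothetical helper)."""
    return exp(P + Q * t)

print(round(P, 3), round(Q, 3), round(predicted_count(4.5)))
```

The regression is carried out on lnR, so the fitted line gives lnR directly and R is recovered only after exponentiating.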


19. The following table gives data on how the price per leaflet (P) varies with the number of leaflets produced (N) by a printer

N           100   500  1000  1500  2000  3000  5000  10000
P (pence)   150   100    75    70    63    50    43     35

ΣN = 23 100; ΣN² = 141 510 000; ΣP = 586; ΣP² = 52 568; ΣNP = 1 086 000

a) Calculate the equation of the regression line of price per leaflet on number of leaflets, giving the coefficients correct to 3 significant figures

[7]

b) Use your equation to find an estimate of the total price for 2500 leaflets [3]

c) Find the points at which the regression line crosses the coordinate axes. [2]

d) Give an interpretation of these coordinates [2]

e) Comment on the limitations of this model [2]

f) The company purchasing the leaflets estimate they can spend no more than £2570 in total on leaflets. Find the maximum number of leaflets they can order.

[4]


20. The table shows data on the mean annual temperature (T), the electricity consumption (E) and the gas consumption (G) for some households in various countries:

T   10.0  11.3   6.5  15.2   4.1  19.6   9.4  21.0   8.4   7.2
E    400   150   640   300   700   150   300   340   460   400
G    110   330   340   100   400   160   320    20   300   420

ΣT = 112.7; ΣT² = 1553.71; ΣS = 6340; ΣST = 59236; ΣE = 3840; ΣE² = 1771800; ΣG = 2500; ΣG² = 801400; ΣEG = 1053900

a) Without carrying out any calculations, state, giving your reasons, between which of the following pairs of variables you would expect the highest correlation:

T and E

T and G

T and S, where S = E + G

[2]

b) Calculate the equation of the regression line of S on T [7]

c) Explain why it would not be appropriate to calculate the regression line of T on S [1]

d) Explain why the regression line might not give correct predictions for large values of T [2]

e) Calculate the product moment correlation coefficient between E and G, and test at the 5% level whether there is any significant correlation

[7]

f) Explain why regression was used in b) and correlation in e), not vice versa. [2]
