UNIVERSITY OF TORONTO
Faculty of Arts and Science

SUMMER EXAMINATIONS - JUNE 2000

STA220F

Duration - 3 hours

AIDS ALLOWED: (to be supplied by the student)
Non-programmable calculator
One handwritten 8 1/2'' x 11'' aid sheet (both sides may be used)

NAME_____________________________________________________
STUDENT NUMBER___________________________________________

GIVE YOUR ANSWERS ON THE SEPARATE SCANTRON ANSWER SHEET WHICH MUST BE FILLED IN, IN PENCIL. WRITE IN YOUR NAME & STUDENT NUMBER WHERE REQUIRED ON THE SCANTRON AND FILL IN THE CORRESPONDING BUBBLES IN THE SPACES BELOW. IF YOU MAKE A MISTAKE, ERASE IT.

There are 30 questions. Each is worth 3.33 marks. There are no penalties (negative marks) for wrong answers. A blank response receives 0.67 marks. If answers are given to say 2 decimal places, then that is the desired accuracy. N (0, 1) and t tables are attached at the end. If, for whatever reason, the correct numerical answer does not appear as one of the 5 choices for a numerical question, the closest answer will be deemed to be the correct one.

PLEASE CHECK AND MAKE SURE THAT THERE ARE NO MISSING PAGES IN THIS BOOKLET.


[Plot 1: display of the distribution of the measurements, horizontal axis ticks at 10, 40, 70, 100, 130, 160, 190. Figure not reproduced.]

(1) Consider the distribution of measurements above. Which of the following are true?
(I) The upper quartile is bigger than 130.
(II) The standard deviation is smaller than 80.
(III) The distribution of the square roots of the measurements will be less skewed to the right (positively skewed) than the original measurements.
(IV) If we standardize each of the measurements above, the new measurements will possess a roughly symmetric and bell shaped appearance.
A) II, III and IV   B) I and IV   C) I, II and III   D) II and III   E) III and IV

(2) In order to test H0: μ = 60 vs Ha: μ ≠ 60, a random sample of 9 observations (normally distributed) is obtained, yielding x̄ = 55 and s = 5. What is the P-value of the test for this sample?
A) greater than .10   B) between .05 and .10   C) between .025 and .05   D) between .01 and .025   E) less than .01
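For question (2), the test statistic and its two-sided P-value can be checked numerically. The sketch below is only an illustration, assuming a one-sample t test with n - 1 = 8 degrees of freedom. The helper names `t_pdf` and `t_two_sided_p` are hypothetical, and the tail area is approximated with Simpson's rule instead of being read from the attached t table.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=2000):
    """Two-sided P-value 2 * P(T > |t|), via Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # approximates P(0 < T < |t|)
    return 2 * (0.5 - central)

n, xbar, s, mu0 = 9, 55.0, 5.0, 60.0
t_stat = (xbar - mu0) / (s / math.sqrt(n))  # (55 - 60) / (5 / 3)
p_value = t_two_sided_p(t_stat, df=n - 1)
print(round(t_stat, 2), round(p_value, 3))
```

The printed P-value can then be compared with the ranges offered in choices A) to E).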


A number of questions below will all refer to the following study. A medical centre collected data on the blood cholesterol levels of heart attack patients. A total of 28 heart-attack patients had their cholesterol levels measured 2 days after the attack, 4 days after, and 14 days after (data not obtainable for some patients at 14 days - indicated by asterisk). Additionally, cholesterol levels were recorded for a control group of 30 people who had not had heart attacks. The data is below:

MTB > Retrieve 'cholest.MTW'
Worksheet retrieved from file: cholest.MTW
MTB > information

COLUMN   NAME      COUNT   MISSING
C1       2-DAY        28
C2       4-DAY        28
C3       14-DAY       28         9
C4       CONTROL      30

MTB > print c1-c4

 ROW   2-DAY   4-DAY  14-DAY  CONTROL
   1     270     218     156      196
   2     236     234       *      232
   3     210     214     242      200
   4     142     116       *      242
   5     280     200       *      206
   6     272     276     256      178
   7     160     146     142      184
   8     220     182     216      198
   9     226     238     248      160
  10     242     288       *      182
  11     186     190     168      182
  12     266     236     236      198
  13     206     244       *      182
  14     318     258     200      238
  15     294     240     264      198
  16     282     294       *      188
  17     234     220     264      166
  18     224     200       *      204
  19     276     220     188      182
  20     282     186     182      178
  21     360     352     294      212
  22     310     202     214      164
  23     280     218       *      230
  24     278     248     198      186
  25     288     278       *      162
  26     288     248     256      182
  27     244     270     280      218
  28     236     242     204      170
  29                              200
  30                              176


MTB > nscores of '2-day' put into c90
MTB > name c90 'nscores'
MTB > plot '2-day' vs 'nscores'

[Plot 2: normal scores plot of 2-DAY vs nscores. Figure not reproduced.]

MTB > nscores of '4-day' put into c90
MTB > plot '4-day' vs 'nscores'

[Plot 3: normal scores plot of 4-DAY vs nscores. Figure not reproduced.]

MTB > nscores of '14-day' put into c90
MTB > plot '14-day' vs 'nscores'

[Plot 4: normal scores plot of 14-DAY vs nscores. Figure not reproduced.]

MTB > nscores of 'control' put into c90
MTB > plot 'control' vs 'nscores'

[Plot 5: normal scores plot of CONTROL vs nscores. Figure not reproduced.]

            N    N*     MEAN   STDEV
2-DAY      28     0   253.93   47.71
4-DAY      28     0   230.64   46.97
14-DAY     19     9   221.47   49.18
CONTROL    30     0   193.13   21.30
------------------------------------------------------------

(3) Suppose that we wished to test the null hypothesis that the mean cholesterol level of the heart attack patients at 14 days is equal to the mean cholesterol level of the controls. Which of the following are true?
I) It would be appropriate to use the pooled t-test based on the t distribution with 47 d.f.
II) “High power” for the test implies that there is a high probability of correctly concluding that the mean cholesterol level of the heart attack patients at 14 days is the same as the mean cholesterol level of the controls, when in fact they are the same.
III) A type I error would be made if we conclude that the mean cholesterol level of heart attack patients at 14 days is the same as the mean level for controls, when in fact they are not the same.
A) None   B) I   C) II   D) III   E) I and III

(4) Which of the following are true?
I) Strongest indication of negative skewness is for 2 day levels
II) The distribution of 4 day levels is a longer (heavier/fatter) tailed distribution than the 14 day levels.
III) Controls have the strongest indication of positive skewness
A) II   B) I and II   C) II and III   D) I and III   E) All


Here is some more minitab output:

MTB > let c10 = '2-day' - '14-day'
MTB > name c10 'change'
MTB > # Note that 'change' in c10 is the level at 2 days minus the level at 14 days
MTB > describe 'change'

            N    N*    MEAN  MEDIAN  TRMEAN   STDEV  SEMEAN
change     19     9    38.0    30.0    37.6    50.4    11.6

          MIN     MAX      Q1      Q3
change  -36.0   118.0     4.0    88.0

MTB > tinterval 'change'

            N    MEAN   STDEV  SE MEAN   95.0 PERCENT C.I.
change     19    38.0    50.4     11.6   (13.7, 62.3)

MTB > ttest for mean of 0 on 'change'

TEST OF MU = 0.0 VS MU N.E. 0.0

            N    MEAN   STDEV  SE MEAN      T   P VALUE
change     19    38.0    50.4     11.6   3.29    0.0041

MTB > twosample-t test for '2-day' vs '14-day';
SUBC> pooled test.

TWOSAMPLE T FOR 2-DAY VS 14-DAY
            N    MEAN   STDEV  SE MEAN
2-DAY      28   253.9    47.7      9.0
14-DAY     19   221.5    49.2     11.3

95 PCT CI FOR MU 2-DAY - MU 14-DAY: (4.9, 60.0)
TTEST MU 2-DAY = MU 14-DAY (VS NE): T= 2.38 P=0.022 DF= 45
POOLED STDEV = 48.3
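The paired summary for 'change' can be recomputed from the worksheet printed earlier. The following is a minimal sketch, assuming the 19 complete (2-DAY, 14-DAY) pairs are the rows without an asterisk; the variable names are illustrative and not part of the Minitab session.

```python
import math
import statistics

# (2-day, 14-day) levels for the 19 patients with a 14-day reading,
# copied from the worksheet listing (rows with '*' are left out).
pairs = [
    (270, 156), (210, 242), (272, 256), (160, 142), (220, 216),
    (226, 248), (186, 168), (266, 236), (318, 200), (294, 264),
    (234, 264), (276, 188), (282, 182), (360, 294), (310, 214),
    (278, 198), (288, 256), (244, 280), (236, 204),
]
change = [two - fourteen for two, fourteen in pairs]

n = len(change)
mean_change = statistics.mean(change)
sd_change = statistics.stdev(change)   # sample standard deviation (n - 1)
se_mean = sd_change / math.sqrt(n)
t_stat = mean_change / se_mean         # test of MU = 0 on the differences

print(n, mean_change, round(sd_change, 1), round(se_mean, 1), round(t_stat, 2))
```

These values should agree with the MEAN, STDEV, SE MEAN and T columns reported for 'change'.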


MTB > twosample-t test 'control' vs '14-day'

TWOSAMPLE T FOR CONTROL VS 14-DAY
            N    MEAN   STDEV  SE MEAN
CONTROL    30   193.1    21.3      3.9
14-DAY     19   221.5    49.2     11.3

95 PCT CI FOR MU CONTROL - MU 14-DAY: (-50.5, -6.2)
TTEST MU CONTROL = MU 14-DAY (VS NE): T= -2.65 P=0.014 DF= 24
----------------------------------------------------------------------------

(5) Which of the following are true?
I) A test of the null hypothesis: “Mean cholesterol level of controls equals mean level of 14 day patients” versus the alternative: “Mean level of controls is less than the mean level of 14 day patients” has P-value = .007.
II) A test of the null hypothesis: “Mean cholesterol level of controls equals mean level of 14 day patients” versus the alternative: “Mean level of controls does not equal the mean level of 14 day patients” has P-value = .014
III) A test of the null hypothesis: “Mean cholesterol level of controls equals mean level of 14 day patients” versus the alternative: “Mean level of controls is greater than the mean level of 14 day patients” has P-value = .993
A) All   B) I and II   C) II and III   D) I and III   E) II only

(6) Which of the following are true?
I) More than 75% of patients have lower cholesterol levels at 14 days than at 2 days.
II) The 95% confidence interval for the average drop in cholesterol level, between 2 days and 14 days after attack, is (4.9, 60.0).
III) If we calculated a 68% confidence interval for the average drop in cholesterol level between 2 days and 14 days after attack, it would include the value ‘57.0’.
A) All   B) I and II   C) II and III   D) I and III   E) I
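The three P-values in question (5) are related through the symmetry of the t distribution. A small sketch of that conversion follows, assuming the two-sided P-value and the sign of T are taken from the output above; the function name `one_sided_p` is hypothetical.

```python
def one_sided_p(p_two_sided, t_stat, alternative):
    """Convert a two-sided t-test P-value into a one-sided one.

    alternative is "less" (Ha: difference < 0) or "greater" (Ha: difference > 0).
    """
    half = p_two_sided / 2
    if alternative == "less":
        return half if t_stat < 0 else 1 - half
    if alternative == "greater":
        return half if t_stat > 0 else 1 - half
    raise ValueError("alternative must be 'less' or 'greater'")

# Values read from the CONTROL vs 14-DAY output: T = -2.65, P = 0.014.
p_less = one_sided_p(0.014, -2.65, "less")
p_greater = one_sided_p(0.014, -2.65, "greater")
print(round(p_less, 3), round(p_greater, 3))
```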


More output:

MTB > plot '14-day' vs '4-day'

[Character scatterplot of 14-DAY (vertical axis, ticks at 150, 200, 250, 300) vs 4-DAY (horizontal axis, ticks at 160, 200, 240, 280, 320, 360); N* = 9. Point positions are not recoverable from the flattened text.]

MTB > correlations among '2-day' '4-day' '14-day'

           2-DAY   4-DAY
4-DAY      0.673
14-DAY     0.393   0.712

MTB > regress c3 on 1 predictor= c2 , store standardized resids in c99

The regression equation is
14-DAY = 59.7 + 0.701 4-DAY

19 cases used 9 cases contain missing values

Predictor      Coef    Stdev   t-ratio       p
Constant      59.68    39.35      1.52   0.148
4-DAY        0.7009   0.1676      4.18   0.001

s = 31.20    R-sq = omitted    R-sq(adj) = omitted

Analysis of Variance
SOURCE        DF      SS      MS       F       p
Regression     1   17018   17018   17.48   0.001
Error         17   16549     973
Total         18   33567


Unusual Observations
Obs.   4-DAY   14-DAY      Fit   Stdev.Fit   Residual   St.Resid
 21      352   294.00   306.39       21.53     -12.39     -0.55 X

X denotes an obs. whose X value gives it large influence.

MTB > plot 'stresids' vs '4-day'
MTB > plot c99 c2

[Character plot of stresids (vertical axis, ticks at -1.2, 0.0, 1.2) vs 4-DAY (horizontal axis, ticks at 160, 200, 240, 280, 320, 360); N* = 9. Point positions are not recoverable from the flattened text.]
---------------------------------------------------------------------------

(7) Regarding the regression of 14-day levels on 4-day levels, which of the following are true?
I) 51% of the variation in 14-day levels can be explained by the linear relation with 4-day levels.
II) The point with 4-day level = 352 and 14-day level = 294 has an unusually large residual.
III) A non-linear (curved) type of model would be more appropriate for these data than the straight line model.
A) none are true   B) I   C) II   D) III   E) I and II
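Although R-sq is omitted from the output for the regression of 14-DAY on 4-DAY, it can be recovered two ways from quantities that are printed. The sketch below assumes the usual simple-regression identities R-sq = SS(Regression) / SS(Total) = r squared; the variable names are illustrative.

```python
# Values read from the Analysis of Variance table and the correlation matrix.
ss_regression = 17018.0
ss_total = 33567.0
r_4day_14day = 0.712

r_squared_from_anova = ss_regression / ss_total
r_squared_from_r = r_4day_14day ** 2

print(round(r_squared_from_anova, 3), round(r_squared_from_r, 3))
```

Both routes give the same percentage to the nearest whole number, which is the figure statement I) of question (7) refers to.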


More output:

MTB > plot '14-day' vs '2-day'

[Character scatterplot of 14-DAY (vertical axis, ticks at 150, 200, 250, 300) vs 2-DAY (horizontal axis, ticks at 160, 200, 240, 280, 320, 360); N* = 9. Point positions are not recoverable from the flattened text.]

MTB > regress '14-day' on 1 predictor: '2-day' store stresids in c99

The regression equation is
14-DAY = 130 + 0.354 2-DAY

19 cases used 9 cases contain missing values

Predictor      Coef    Stdev   t-ratio       p
Constant     129.69    52.96      2.45   0.025
2-DAY        0.3537   0.2009      1.76   0.096

s = 40.86    R-sq = 15.4%    R-sq(adj) = 10.5%

Analysis of Variance
SOURCE        DF      SS      MS       F       p
Regression     1    5178    5178    3.10   0.096
Error         17   28388    1670
Total         18   33567


MTB > name c99 'stresids'
MTB > plot 'stresids' vs '2-day'

[Character plot of stresids (vertical axis, ticks at -1.2, 0.0, 1.2) vs 2-DAY (horizontal axis, ticks at 160, 200, 240, 280, 320, 360); N* = 9. Point positions are not recoverable from the flattened text.]

MTB > hist 'stresids'

Histogram of stresids   N = 19   N* = 9

Midpoint   Count
    -1.5       2  **
    -1.0       4  ****
    -0.5       2  **
     0.0       2  **
     0.5       2  **
     1.0       5  *****
     1.5       2  **

MTB > nscores of 'stresids' put into c90

[Plot 6: normal scores plot of stresids. Figure not reproduced.]

MTB > boxplot 'stresids'

[Character boxplot of stresids on an axis with ticks at -2.10, -1.40, -0.70, 0.00, 0.70, 1.40. Box and whisker positions are not recoverable from the flattened text.]

(8) Utilizing the fitted model above (appropriateness of this model we’ll ponder later), which of the following statements are true?
(I) 15.4% of the variation in the 14-day cholesterol levels has been accounted for by the linear relationship with 2-day levels
(II) On average, an increase of 10 in 2-day levels is accompanied by an increase of 3.54 in 14-day levels.
(III) The 14-day level for someone whose 2-day level is 300 is estimated to be 236 (nearest 1.0).
A) All   B) I and III   C) II and III   D) I only   E) I and II

(9) In assessing the adequacy of a model like the one above, the following possible problems are typically checked for. Which of the following conditions appear to be present in our analysis above?
(I) An inspection of the scatterplot shows that a least squares line will not well represent the general relationship, due to either the influence of a small number of points or the existence of distinct clusters, etc.
(II) The residual vs explanatory variable plot shows suspect or problem patterns (such as curvature, etc.).
(III) The residuals have a clearly non-normal distribution.
A) III only   B) I and II   C) I and III   D) II and III   E) All
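The three numerical statements in question (8) can be checked directly from the printed regression of 14-DAY on 2-DAY. This is a rough sketch using only the coefficients and sums of squares from that output; `predict_14_day` is a hypothetical helper name.

```python
# Coefficients and sums of squares read from the regression output.
intercept, slope = 129.69, 0.3537
ss_regression, ss_total = 5178, 33567

def predict_14_day(two_day_level):
    """Fitted 14-day level for a given 2-day level."""
    return intercept + slope * two_day_level

r_squared_pct = 100 * ss_regression / ss_total   # share of variation explained
rise_per_10 = slope * 10                         # change in fit per +10 in 2-DAY
fitted_at_300 = predict_14_day(300)

print(round(r_squared_pct, 1), round(rise_per_10, 2), round(fitted_at_300))
```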


(10) For the first 4 patients only, shown below, the regression of 4 day levels on 2 day levels produces:

 ROW   2-DAY   4-DAY
   1     270     218
   2     236     234
   3     210     214
   4     142     116

MTB > regress '4-day' on 1 predictor '2-day'

The regression equation is
4-DAY = 6.8 + 0.880 2-DAY

Predictor      Coef    Stdev   t-ratio       p
Constant       6.83    70.69      0.10   0.932
2-DAY        0.8796   0.3219      2.73   0.112

s = 30.23    R-sq = 78.9%    R-sq(adj) = 68.3%

Analysis of Variance
SOURCE        DF       SS       MS      F       p
Regression     1   6822.9   6822.9   7.46   0.112
Error          2   1828.1    914.1
Total          3   8651.0

If you were to consider various straight lines of the form: “4-day” = a + b “2-day” as potential fits to the four bivariate observations above, what is the smallest possible sum of squared residuals that might result (nearest 10)?
A) 6820   B) 1830   C) 8650   D) 80   E) 910
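Question (10) rests on the fact that the least squares line is, by definition, the line with the smallest sum of squared residuals. The sketch below refits the four points with the textbook formulas for slope and intercept (an assumption about how the fit is computed, not a transcript of Minitab's algorithm) and evaluates that minimum.

```python
x = [270, 236, 210, 142]   # 2-DAY levels for the first four patients
y = [218, 234, 214, 116]   # 4-DAY levels for the same patients

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates for y = a + b x.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Sum of squared residuals at the least squares line (the Error SS).
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

print(round(intercept, 2), round(slope, 4), round(sse, 1))
```

Any other choice of a and b gives a larger sum of squared residuals than `sse`.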


(11) Given the following stem and leaf display of cholesterol levels of controls:

Stem-and-leaf of CONTROL   N = 30
Leaf Unit = 1.0

    4   16  0246
    8   17  0688
   (8)  18  22222468
   14   19  6888
   10   20  0046
    6   21  28
    4   22
    4   23  028
    1   24  2

The IQR is closest to:
A) 32   B) 27   C) 22   D) 17   E) 12
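The quartiles for question (11) can be read off the stem and leaf display or computed from the 30 values it encodes. The sketch below uses Python's `statistics.quantiles` with its default "exclusive" method, which places the quartiles at positions (n + 1)/4 and 3(n + 1)/4; other quartile conventions can give slightly different numbers, so treat the result as approximate.

```python
import statistics

# The 30 control values, expanded from the stem and leaf display.
control = [
    160, 162, 164, 166,
    170, 176, 178, 178,
    182, 182, 182, 182, 182, 184, 186, 188,
    196, 198, 198, 198,
    200, 200, 204, 206,
    212, 218,
    230, 232, 238,
    242,
]

q1, median, q3 = statistics.quantiles(control, n=4)
iqr = q3 - q1

print(q1, median, q3, iqr)
```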


Consider the following Minitab generated boxplots of pulse rates for a group of students, shown separately according to indicated ‘level of regular physical activity’, where 1 = low level (inactive), 2 = moderate level, 3 = high level (very active)

MTB > boxplot of 'pulse1';
SUBC> by 'activity'.

[Character boxplots of PULSE1 for ACTIVITY = 1, 2 and 3 on a common axis with ticks at 50, 60, 70, 80, 90, 100. Box and whisker positions are not recoverable from the flattened text.]

(12) Which of the following statements are true?
(I) None of the IQRs exceed 20
(II) Only the distribution for moderately active people shows left (negative) skewness.
(III) The biggest median differs from the smallest by less than 15.
A) None   B) Only II and III   C) Only III   D) Only I   E) Only II

(13) Which of the following statements are true?
(I) Over 75% of the moderately active have pulse rates below 78.
(II) More than 25% of the students with low activity have pulse rates as high or higher than the highest rate of any of the very active.
(III) The upper quartile pulse rate of the very active exceeds the upper quartile of the inactive by over 15.
A) None   B) Only II and III   C) Only III   D) Only I   E) Only II


(14) Which of the following are consequences of the Central Limit Theorem?
(I) A large sample from a skewed population will have an approximately normal shaped histogram.
(II) The mean of a population will be normally distributed if the population is quite large.
(III) The average blood cholesterol level recorded in a SRS (simple random sample) of 100 students selected from a large population will be approximately normally distributed.
(IV) The proportion of people who have 1998 incomes over $200,000, in a SRS of 10 people, selected from all Canadian income tax filers, will be approximately normally distributed.
(V) The number of non-smokers counted in a simple random sample of 700 Ontario adults, is approximately normally distributed
A) III and V   B) I, IV, and V   C) I, II, III and IV   D) I, III, IV and V   E) All

(15) A manufacturing company claims that its new floodlight will last 1000 hours. After collecting a simple random sample of size ten, you determine that a 90% confidence interval for the true mean number of hours that the floodlights will last, µ, is (945, 975). Which of the following are true? (Assume that all tests are non-directional or two-sided.)
(I) For any α less than .10, we are not able to reject the null hypothesis that the true mean is 980.
(II) If a 99% confidence interval for the mean were determined here, the numerical value 946 would have to lie in this interval.
(III) If we wished to test the null hypothesis H0: µ = 968, we could say that the P-value must be > .10.
A) Only II and III   B) Only I and III   C) Only I and II   D) Only II   E) All


(16) When testing H0: µ = 5 vs Ha: µ ≠ 5 at α = .01 with n = 40, suppose that you know that β, the probability of a type II error, is equal to .02 when µ = 2. Which of the following statements are true?
(I) β > .02 when µ = 3
(II) β > .02 if the sample size was 50 (at µ = 2)
(III) β > .02 if σ had been twice as large (at µ = 2)
(IV) The power of the test at µ = 2 is .99
A) I, II, IV   B) II, III   C) II, IV   D) I, III   E) II, III, IV

(17) A manufacturing plant is served by three workshops. In workshop 1 about 20 jobs are completed each day; in workshop 2, about 40 jobs are completed each day; in workshop 3, about 60 jobs are completed each day. At each workshop, the time to complete a job is approximately normally distributed with a mean of 5.0 hours. The variability in job completion times is similar at all three workshops. For a period of 1 year, each workshop recorded the days on which the daily average job completion time was less than 4.0 hours. Which workshop recorded the fewest such days? (in all likelihood)
A) about the same number of days for each of the workshops   B) workshop 1   C) workshop 2   D) workshop 3   E) insufficient information to answer

(18) General Motors of Canada has a new deal: “an oil, filter and lube job in 25 minutes or the next one free”. Suppose that you worked for GM, and after accumulating data, found that the time needed to provide these services was approximately normally distributed with a mean of 15 minutes and standard deviation of 3.0 minutes. How many minutes would you have recommended in the ad quoted above if it was decided that about 5 free services for every 1000 customers was reasonable? (round to the nearest minute)
A) 15   B) 25   C) 17   D) 20   E) 23
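Question (18) asks for an upper percentile of a normal distribution: the guaranteed time should be exceeded by only about 5 services in 1000. A minimal sketch with Python's `statistics.NormalDist`, assuming service times are exactly N(15, 3.0) as stated; the variable names are illustrative.

```python
from statistics import NormalDist

service_time = NormalDist(mu=15, sigma=3.0)   # minutes, as given in the question
free_rate = 5 / 1000                          # acceptable share of free services

# Time that is exceeded with probability free_rate (the 99.5th percentile).
guarantee_minutes = service_time.inv_cdf(1 - free_rate)

print(round(guarantee_minutes, 2), round(guarantee_minutes))
```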


(19) A special roulette wheel has the numbers 1 through 36, as well as 0. If you bet that an odd number comes up, you win or lose $10 according to whether or not that event occurs. If X denotes your ‘net’ gain, X = 10 with probability 18/37 and X = -10 with probability 19/37. Suppose that you play this game 100 times. Let Y be your net gain after these 100 plays. The mean (expectation) and standard deviation of Y are, respectively (to two significant figures):
A) None of the other answers   B) -10; 99   C) -19; 1000   D) -100; 27   E) -27; 100

(20) A disease is known to affect 1 out of every 100 people in a population and there is a clinical test being used to test for the presence of this disease. When a person has the disease the test comes back positive 99.5% of the time. The test also produces some false positives: 1% of the uninfected patients test positive. You have just tested positive. What is the probability that you do not have the disease?
A) .98   B) .67   C) .55   D) .52   E) .50

(21) In a study, there is a response variable y measured in degrees Celsius. The correlation with a quantitative explanatory variable x (measured in inches) is = .50; the slope of the regression upon x is = 2.0; the mean response in lab A = 20; the standard deviation of the response in lab A = 5; the pooled two-sample t-statistic for comparing the mean response in lab A and in lab B = 1.0. If we convert the response variable to Fahrenheit, we should find: [Note that Fahrenheit degrees = 32 + (9/5) Celsius degrees]
(I) correlation coefficient (with x) = .50
(II) regression line (on x) slope = 2.0
(III) mean response in lab A = 36
(IV) standard deviation of the response in lab A = 41
(V) the numerical value of the pooled two-sample t-statistic now = 1.0
Which of the above are correct?
A) I and V   B) I, II and IV   C) II, III and IV   D) I, IV and V   E) II, III and V
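The arithmetic behind questions (19) and (20) is short enough to check by machine. The sketch below assumes the 100 plays are independent (so variances add) and applies Bayes' rule with the rates quoted in the question; all variable names are illustrative.

```python
import math

# Question (19): X = +10 with probability 18/37, -10 with probability 19/37.
p_win, p_lose = 18 / 37, 19 / 37
mean_x = 10 * p_win - 10 * p_lose
var_x = 100 * p_win + 100 * p_lose - mean_x ** 2   # E[X^2] - (E[X])^2
plays = 100
mean_y = plays * mean_x
sd_y = math.sqrt(plays * var_x)   # independence of plays is assumed

# Question (20): Bayes' rule for P(no disease | positive test).
prevalence = 0.01
sensitivity = 0.995
false_positive_rate = 0.01
p_positive = prevalence * sensitivity + (1 - prevalence) * false_positive_rate
p_healthy_given_positive = (1 - prevalence) * false_positive_rate / p_positive

print(round(mean_y), round(sd_y), round(p_healthy_given_positive, 2))
```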


(22) Suppose that many different researchers carry out tests of the same null hypothesis, with a total of 200 tests being conducted worldwide (independently). In exactly twelve tests out of the 200, results are statistically significant at the 5% level. Which of the following statements are true?
(I) If the null hypothesis were in fact true, the number of statistically significant test results (5% level), out of 200 tests, should follow a Binomial distribution, with a mean of 10 and stdev of 3.08.
(II) We should report to the news media that we have found statistically significant evidence enabling us to reject the null hypothesis.
(III) The null hypothesis has not been disproven, since the results are not at all unusual, i.e. the level of statistical significance for these findings should be considered to be unimpressive.
A) I only   B) II only   C) III only   D) I and II   E) I and III
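The binomial mean and standard deviation quoted in statement (I) of question (22), and how unusual twelve significant results would be under the null hypothesis, can be checked as follows. This is a sketch assuming 200 independent tests each with a 5% chance of a false positive; `binom_pmf` is a hypothetical helper.

```python
import math

n_tests, alpha = 200, 0.05

mean = n_tests * alpha
sd = math.sqrt(n_tests * alpha * (1 - alpha))

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Chance of 12 or more significant results when every null hypothesis is true.
p_at_least_12 = 1 - sum(binom_pmf(k, n_tests, alpha) for k in range(12))

print(round(mean, 2), round(sd, 2), round(p_at_least_12, 3))
```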


(23) A simple random sample of 600 voters in one province were cross-classified by income level and political party of choice in an upcoming election, as shown below:

INCOME LEVEL            POLITICAL PARTY
--------------   ---------------------------------------
                 P.C.   Liberal   N.D.P  |  Total:
-----------------------------------------|---------
Low                20        40      30  |     90
Moderate          150       100      20  |    270
High              110        60      10  |    180
Very High          40        15       5  |     60
-----------------------------------------|---------
Total:            320       215      65  |    600

Which of the following statements are true?
(I) The proportion of the high income earners who are Liberal supporters has estimated variance: [(60/180)(120/180)] / 180 = .00123
(II) The estimate of the proportion of the voters who are either NDP supporters or low income earners is 155/600
(III) A 95 percent confidence interval for the difference between the proportion of high income earners who are Liberal supporters and the proportion of low income earners who are Liberal supporters is: .11 ± (1.96)(.095) (To the number of decimals displayed.)
A) All of them   B) I and III only   C) II and III   D) I and II   E) I only
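The quantities in question (23) all come from cell counts in the table. A minimal sketch, assuming the usual estimated variance p(1 - p)/n for a sample proportion, the addition rule for "either/or" counts, and independent income groups for the standard error of the difference; variable names are illustrative.

```python
import math

# Counts read from the income by party table.
liberal_high, n_high = 60, 180
liberal_low, n_low = 40, 90
ndp_total, low_total, ndp_and_low, n_voters = 65, 90, 30, 600

p_high = liberal_high / n_high
p_low = liberal_low / n_low

# (I) estimated variance of the Liberal proportion among high income earners
var_p_high = p_high * (1 - p_high) / n_high

# (II) NDP or low income: subtract the overlap counted twice
count_ndp_or_low = ndp_total + low_total - ndp_and_low
p_ndp_or_low = count_ndp_or_low / n_voters

# (III) standard error of the difference between two independent proportions
se_diff = math.sqrt(p_high * (1 - p_high) / n_high + p_low * (1 - p_low) / n_low)

print(round(var_p_high, 5), count_ndp_or_low, round(se_diff, 3))
```

The printed values can be compared against the figures asserted in statements (I) to (III).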


(24) Suppose that one person is randomly selected from the 600 voters mentioned above. Which of the following are true?
(I) The probability that the selected person is a Liberal supporter is: 215/600
(II) The probability that the selected person is an NDP supporter and is of low income is (90/600) x (65/600)
(III) The selected person turns out to be of high income. The probability that this voter will also be classified as a Liberal supporter is 60/600
A) none are true   B) I   C) III   D) II   E) I and II

(25) Assume that a U.S. study and a Canadian study to estimate the proportion of adults in favour of capital punishment are conducted using simple random samples (not really practical). Assume the true unknown proportions in the 2 countries are fairly similar. The U.S. survey uses a sample 16 times bigger than the Canadian sample. Both samples are quite large. The U.S. population is 9 times bigger than the Canadian population. The confidence interval in the U.S. study is approximately
A) 4 times wider than the Canadian study confidence interval
B) the same width as the Canadian study confidence interval
C) one sixteenth of the width of the Canadian study confidence interval
D) 3 times narrower than the Canadian study confidence interval
E) one quarter the width of the Canadian study confidence interval


(26) A large shipment of items is accepted by a quality checker only if a random sample of 5 items contains no defective ones. Suppose that in fact 20% of all items produced are defective. Find the probability that the next two shipments will both be rejected. (Hint: First determine the probability that a single shipment is rejected; then consider the case of 2 shipments)
A) .28   B) .82   C) .45   D) .32   E) .64

(27) Suppose that the weights of airline passengers are known to have a distribution with a mean of 77 kg and standard deviation of 10.0 kg. A certain plane has a passenger weight capacity of 7900 kg. What is the probability that a flight with 100 passengers will exceed the capacity? [Hint: find the mean and standard deviation of the total (or average) weight of 100 passengers]
A) .02   B) .05   C) .08   D) .11   E) .14
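Both hints in questions (26) and (27) translate into a couple of lines of arithmetic. The sketch below assumes independent items and shipments for (26), and independent passengers with the total weight treated as approximately normal (Central Limit Theorem) for (27); the variable names are illustrative.

```python
from statistics import NormalDist

# Question (26): a shipment is accepted only if all 5 sampled items are good.
p_defective, sample_size = 0.20, 5
p_accept = (1 - p_defective) ** sample_size
p_reject = 1 - p_accept
p_both_rejected = p_reject ** 2   # two shipments, assumed independent

# Question (27): total weight of 100 passengers, approximately normal.
n_passengers, mu, sigma, capacity = 100, 77.0, 10.0, 7900.0
total_weight = NormalDist(mu=n_passengers * mu, sigma=sigma * n_passengers ** 0.5)
p_over_capacity = 1 - total_weight.cdf(capacity)

print(round(p_both_rejected, 2), round(p_over_capacity, 2))
```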


Patient:   Drug:   Recovery Time (in days):
    1        A        29
    2        A        11
    3        B        26
    4        B        23
    5        A        25
    6        B        28
    7        B        14
    8        B        17
    9        A        16
   10        A        21

(28) The above data were obtained in an attempt to discover whether a change in a drug would result in improved average recovery time. The investigator had 10 patients; 5 were given drug A, and the remaining 5 were given the supposedly improved drug B. The drugs were assigned to the patients as shown above. The investigator arrived at this arrangement by taking 10 playing cards, 5 red corresponding to drug A and 5 black corresponding to drug B. The cards were thoroughly shuffled and then dealt out to give the sequence shown in the table. The first card was red, the second card was red, the third was black, and so forth. How many of the following statements are TRUE?
(I) This design is not properly randomized in the sense that the investigator has not assigned experimental units to treatments in a way to ensure that all possible assignments of subjects to drugs are equally likely.
(II) The above experiment is an example of a completely randomized design.
(III) The investigator could have matched the patients by age and/or physical condition into pairs and then randomly assigned the two drugs to the two subjects in each pair. This approach is called ‘blocking’ and usually has the effect of decreasing the relevant standard error.
(IV) The type of drug is the response variable here, while the recovery time is the explanatory variable (or factor).
A) all are true   B) three of them are true   C) two of them are true   D) one of them is true   E) none of them are true


(29) Which of the following statements makes the LEAST SENSE?
A) The P-value is calculated using the assumption that the null hypothesis is true.
B) The P-value decreases as the weight of evidence against the null hypothesis increases.
C) A P-value below .05 indicates that an outcome this (or even more) unusual occurs no more than 5 times in a hundred when the null hypothesis is true.
D) For moderate sample sizes, P-values for non-directional t-tests will be fairly accurate even when the distribution shapes are somewhat skewed
E) When the P-value is above .05, we know that the null hypothesis will be true more than 5 times in 100.

(30) Assume that the distribution of blood types among Canadians is approximately as follows: 20% type A, 20% type B, 50% type O, and 10% type AB. Suppose that the blood types of married couples are independent and that both the husband and wife follow this distribution. What is the probability that in a randomly chosen married couple, one will have type O blood and the other type B blood? (pick the closest answer)
A) .25   B) .20   C) .15   D) .10   E) .05

“Total Pages = 25”


Solns to summer 2000 exam : ddaea ebaeb bcaaa ddeee aeebe caceb part marks: 5-b +2 6-b +2