

  • PubHlth 640 Exam 1 – Spring 2015 (v3 2/24/2015) Name ________________________________________________

    Z:\bigelow\...\2015\BE640 Exam 1 2015.doc Page 1 of 15

    PubHlth 640 Intermediate Biostatistics

    Spring 2015 Examination 1

    Units 1 and 2 – Review of Introductory Biostatistics & Regression and Correlation Due: Monday March 2, 2015

    Before you begin: This is a “take-home” exam. You are welcome to use any reference materials you wish. You are welcome to use the computer as you wish, too. However, you MUST work this exam by yourself and you may not consult with anyone.

    Instructions and Checklist:
    __ 1. Start each problem on a new page.
    __ 2. Write your name on every page.
    __ 3. Make a photocopy of your exam for safekeeping prior to submission.
    __ 4. Complete the signature page.
    __ 5. Please DO NOT submit a copy of the exam questions!! I have them….

    How to submit your exam (sorry – faxed exams are NOT permitted):

    (1) ONLINE Students. Please be sure your name is somewhere on your submission. Next, save it as a SINGLE FILE pdf using the naming convention lastname_exam1.pdf. Email it to me at: [email protected]

    (2) Worcester Section. Please bring your exam (stapled please please please) to class on Monday March 2, 2015. If you are unable to come to class that day, I will accept a pdf (see instructions for online students).

    (3) Amherst Section. Please put your exam (stapled please please please) in my mailbox, located in the mail room on the 4th floor of Arnold House. If you are unable to come to Arnold House on Monday the 2nd, I will accept a pdf (see instructions for online students).

    (4) ALL. I will also accept exams sent by U.S. Post. Please mail with postmark no later than March 2, 2015 to:

        Carol Bigelow
        School of Public Health/402 Arnold House
        University of Massachusetts/Amherst
        715 North Pleasant Street
        Amherst, MA 01003-9304
        Tel. 413-545-1319


    Signature

    This is to confirm that in completing this exam, I worked independently and did not consult with anyone.

    Name: ___________________________________________________________ Date: ___________________________

    Thank you!


    1. (10 points, total) Two healthy parents have a child with a severe autosomal recessive condition that cannot be identified by prenatal diagnosis. They realize that the risk of this condition for subsequent offspring is 0.25, but wish to embark on a second pregnancy. During the early stages of the pregnancy, an ultrasound test determines that there are twins.

    1a. (3 points) Suppose the twins are identical (monozygotic). What is the probability that both twins are affected? What is the probability that exactly one twin is affected?

    1b. (3 points) Suppose the twins are fraternal (dizygotic). What is the probability that both twins are affected? What is the probability that exactly one twin is affected?

    1c. (4 points) Suppose the probability that the twins are identical is 0.33 and, thus, the probability that the twins are fraternal is 0.67. What is the overall probability that both twins are affected? What is the probability that exactly one twin is affected?
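The reasoning in problem 1 can be sketched in a few lines of Python. This is only an illustration, not an official solution: it assumes identical twins share a single genotype (so they are either both affected or both unaffected) and that fraternal twins behave like two independent conceptions, each with risk 0.25. The function names are made up for this sketch.

```python
# Sketch only: twin probabilities for a recessive condition with risk 0.25.
RISK = 0.25          # per-conception risk stated in the problem
P_IDENTICAL = 0.33   # probability the twins are identical (part 1c)
P_FRATERNAL = 0.67   # probability the twins are fraternal (part 1c)


def identical_twins(risk):
    """Assumption: identical twins share one genotype, so both or neither are affected."""
    return {"both": risk, "exactly_one": 0.0}


def fraternal_twins(risk):
    """Assumption: fraternal twins are two independent conceptions."""
    return {"both": risk * risk, "exactly_one": 2 * risk * (1 - risk)}


def overall(risk, p_identical, p_fraternal):
    """Law of total probability, averaging over zygosity."""
    mz = identical_twins(risk)
    dz = fraternal_twins(risk)
    return {key: p_identical * mz[key] + p_fraternal * dz[key] for key in mz}


result = overall(RISK, P_IDENTICAL, P_FRATERNAL)
print("identical:", identical_twins(RISK))
print("fraternal:", fraternal_twins(RISK))
print("overall:  ", result)
```

The weighting step in `overall` is the part that corresponds to 1c; the first two functions correspond to 1a and 1b.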


    2. (10 points, total) An outbreak of acute gastroenteritis occurred at a nursing home in Baltimore, Maryland in December 1980. A total of 46 out of 98 residents of the nursing home became ill. People living in the nursing home shared rooms: 13 rooms contained 2 occupants, 4 rooms contained 3 occupants, and 15 rooms contained 4 occupants. One question that arises is whether or not a clustering of disease occurred for persons living in the same room. The following table shows the observed distribution of cases.

    2a. (2 points) If the binomial distribution holds, what is the probability of finding two cases of acute gastroenteritis in a room with 2 occupants?

    2b. (2 points) If the binomial distribution holds, what is the probability of finding two or more cases of acute gastroenteritis in a room with 3 occupants?

    2c. (2 points) If the binomial distribution holds, what is the probability of finding two or more cases of acute gastroenteritis in a room with 4 occupants?

    2d. (2 points) One useful measure of geographical clustering is the number of rooms for which the number of cases of acute gastroenteritis is two or more. If the binomial distribution holds, what is the expected number of rooms with two or more cases of acute gastroenteritis over the entire nursing home?

    2e. (2 points) Finally, compare your answer to question #2d (the expected number of rooms with 2 or more cases) with the observed number of rooms with 2 or more cases. In 1-2 sentences, what is your assessment of whether or not there is any evidence of clustering of disease within rooms?
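One possible way to set up the binomial calculations in problem 2 is sketched below. It assumes each resident is independently ill with probability 46/98 (the overall attack rate), which is one reading of "if the binomial distribution holds". The table of observed cases is not reproduced in this copy, so the sketch covers only the expected side of the comparison in 2e. Helper names are illustrative.

```python
from math import comb

# Sketch only: binomial model with the overall attack rate as the per-person probability.
CASES, RESIDENTS = 46, 98
p = CASES / RESIDENTS

# Room size -> number of rooms of that size, as stated in the problem.
ROOMS = {2: 13, 3: 4, 4: 15}


def binom_pmf(k, n, prob):
    """P(exactly k cases among n occupants)."""
    return comb(n, k) * prob**k * (1 - prob) ** (n - k)


def prob_at_least(k_min, n, prob):
    """P(k_min or more cases among n occupants)."""
    return sum(binom_pmf(k, n, prob) for k in range(k_min, n + 1))


# Probability of two or more cases, by room size (parts 2a-2c).
prob_two_plus = {size: prob_at_least(2, size, p) for size in ROOMS}

# Expected number of rooms with two or more cases (part 2d).
expected_rooms = sum(count * prob_two_plus[size] for size, count in ROOMS.items())

for size, prob in prob_two_plus.items():
    print(f"{size} occupants: P(2 or more cases) = {prob:.4f}")
print(f"expected rooms with 2 or more cases = {expected_rooms:.2f}")
```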


    3. (10 points total) The distribution of serum levels of alpha tocopherol (serum vitamin E) is approximately normal with mean µ = 860 µg/dL and standard deviation σ = 340 µg/dL .

    3a. (3 points) What percent of people have serum alpha tocopherol levels between 400 and 1000 µg/dL?

    3b. (3 points) Suppose a person is identified as having toxic levels of alpha tocopherol if his or her serum level is > 2000 µg/dL. What percentage of people will be so identified?

    3c. (4 points) A study is undertaken for evidence of toxicity among 2000 people who regularly take vitamin-E supplements. The investigators found that 4 people have serum alpha tocopherol levels > 2000 µg/dL. Is this an unusual number of people with toxic levels of serum alpha tocopherol?
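The normal-distribution calculations in problem 3 could be approached as in the sketch below, using only the Python standard library. For 3c it assumes the 2000 supplement users can be treated as independent draws from the same normal population, so the count above 2000 µg/dL is binomial; that modeling choice is an assumption of this sketch, not something the exam prescribes.

```python
from math import comb
from statistics import NormalDist

# Sketch only: serum alpha tocopherol ~ Normal(mean 860, sd 340), in µg/dL.
serum = NormalDist(mu=860, sigma=340)

# 3a: proportion between 400 and 1000 µg/dL.
p_between = serum.cdf(1000) - serum.cdf(400)

# 3b: proportion above the toxicity threshold of 2000 µg/dL.
p_toxic = 1 - serum.cdf(2000)

# 3c: among 2000 people, how surprising are 4 or more above the threshold?
N_PEOPLE, OBSERVED = 2000, 4
expected_toxic = N_PEOPLE * p_toxic
p_four_or_more = 1 - sum(
    comb(N_PEOPLE, k) * p_toxic**k * (1 - p_toxic) ** (N_PEOPLE - k)
    for k in range(OBSERVED)
)

print(f"P(400 < X < 1000)  = {p_between:.4f}")
print(f"P(X > 2000)        = {p_toxic:.5f}")
print(f"expected count     = {expected_toxic:.2f}")
print(f"P(count >= 4)      = {p_four_or_more:.4f}")
```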


    4. (10 points total) Hypertensive patients are screened at a neighborhood health clinic and are given methyl dopa, a strong antihypertensive medication for their condition. They are asked to come back 1 week later and have their blood pressures measured again. Suppose the initial and follow-up systolic blood pressures (SBP) of the patients are given in the table below.

    Patient id   Initial SBP   Follow-up SBP
         1          200.0          188.0
         2          194.0          212.0
         3          236.0          186.0
         4          163.0          150.0
         5          240.0          200.0
         6          225.0          222.0
         7          203.0          190.0
         8          180.0          154.0
         9          177.0          180.0
        10          240.0          225.0

    To test the effectiveness of the drug, we want to measure the difference (D = Initial – Follow-up) between initial and follow-up SBP for each person.

    4a. (2 points) What are the sample mean and sd of D?

    4b. (2 points) What is the estimated standard error of the mean difference?

    4c. (3 points) Assume that D is normally distributed. Construct a 99% confidence interval for µD.

    4d. (3 points) Using your answer to question #4c, what is your opinion regarding the effectiveness of methyl dopa in reducing systolic blood pressure?
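A minimal sketch of the paired-difference calculations in problem 4 follows. The t critical value is hard-coded from a standard t table (df = 9, two-sided 99%), because the Python standard library has no t distribution; treat that constant as an assumption to verify against your own table or software.

```python
from math import sqrt
from statistics import mean, stdev

# Sketch only: paired differences D = Initial - Follow-up, from the table above.
initial = [200, 194, 236, 163, 240, 225, 203, 180, 177, 240]
follow_up = [188, 212, 186, 150, 200, 222, 190, 154, 180, 225]

diffs = [a - b for a, b in zip(initial, follow_up)]
n = len(diffs)

mean_d = mean(diffs)      # 4a: sample mean of D
sd_d = stdev(diffs)       # 4a: sample sd of D (n - 1 in the denominator)
se_d = sd_d / sqrt(n)     # 4b: estimated standard error of the mean difference

# 4c: 99% CI; 3.250 is the tabled 0.995 quantile of t with 9 df (assumed).
T_CRIT_99_DF9 = 3.250
ci_99 = (mean_d - T_CRIT_99_DF9 * se_d, mean_d + T_CRIT_99_DF9 * se_d)

print(f"differences = {diffs}")
print(f"mean = {mean_d:.2f}, sd = {sd_d:.2f}, se = {se_d:.2f}")
print(f"99% CI = ({ci_99[0]:.2f}, {ci_99[1]:.2f})")
```

Whether the interval contains 0 is the piece of information 4d asks you to interpret.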


    5. (10 points total) Suppose that, among 25-34 year old males in the general population, the average daily intake of linoleic acid is 15 g. As part of a dietary-instruction program, ten 25-34 year old males adopted a vegetarian diet for one month. While on the diet, the average daily intake of linoleic acid was a sample mean =13 g with a sample standard deviation = 4 g. Suppose we are uncertain what effect a vegetarian diet will have on the level of linoleic-acid.

    5a. (2 points) What are the null and alternative hypotheses in this case?

    5b. (2 points) Using the p-value method, carry out the appropriate statistical hypothesis test to compare the mean level of linoleic acid in the vegetarian population with that of the general population.

    5c. (2 points) Suppose the sample standard deviation, based on a sample of n = 20 subjects, is s = 5. Using the critical region method, test the null hypothesis H0: σ² = 16 versus HA: σ² ≠ 16.

    5d. (2 points) Next, consider a sample of n = 20. Suppose the sample standard deviation in this sample is s = 5. Use the p-value method to test the null hypothesis H0: σ² = 16 versus HA: σ² ≠ 16.

    5e. (2 points) Now go back to the sample size n = 10. Using the summary statistics provided at the beginning of this question (sample mean =13 g and sample standard deviation = 4 g), compute a 90% confidence interval estimate of the true mean intake of linoleic acid in the vegetarian population of 25-34 year old males.
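The test statistics behind problem 5 can be sketched as below. The sketch stops at the statistics themselves: p-values and critical regions require t and chi-square distributions, which the Python standard library does not provide, so those lookups are left to tables or statistical software. The critical value 1.833 (0.95 quantile of t with 9 df) is taken from a standard table and is an assumption of this sketch.

```python
from math import sqrt

# Sketch only: one-sample t statistic for the mean (5b) and its 90% CI (5e).
MU_0 = 15.0                 # general-population mean intake, g
n, x_bar, s = 10, 13.0, 4.0

se = s / sqrt(n)
t_stat = (x_bar - MU_0) / se    # refer to a t distribution with n - 1 = 9 df

T_CRIT_90_DF9 = 1.833           # tabled 0.95 quantile of t, 9 df (assumed)
ci_90 = (x_bar - T_CRIT_90_DF9 * se, x_bar + T_CRIT_90_DF9 * se)

# Chi-square statistic for a single variance (5c, 5d).
SIGMA0_SQ = 16.0
n_var, s_var = 20, 5.0
chi2_stat = (n_var - 1) * s_var**2 / SIGMA0_SQ   # refer to chi-square, 19 df

print(f"t = {t_stat:.3f} on {n - 1} df")
print(f"90% CI = ({ci_90[0]:.2f}, {ci_90[1]:.2f})")
print(f"chi-square = {chi2_stat:.4f} on {n_var - 1} df")
```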


    6. (10 points total) In a study of crop losses due to air pollution, plots of Blue Lake snap beans were grown in n = 12 open-top field chambers, which were fumigated with various concentrations of sulfur dioxide (X), in ppm. After a month of fumigation, the plants were harvested and the total yield (Y) of bean pods, in kg, was recorded for each chamber. Some preliminary calculations have been performed for you.

    n = 12           x̄ = 0.12          ȳ = 1.117
    sx = 0.11724     sy = 0.31175      rxy = −0.8506
    SSQ (residual) = 0.2955

    6a. (2 points) Calculate the linear regression of Y on X by obtaining the values of the estimated slope ( β̂1 ) and intercept ( β̂0 ).

    6b. (2 points) Produce the analysis of variance table by completing the “?” entries in the table below.

    Source        Sum of Squares    DF       Mean Square      F-Ratio       P
    Regression    ?___________      ?____    ?____________    ?_________    ?_____
    Residual      ?___________      ?____    ?____________
    Total         ?_____________    ?___

    6c. (2 points) Under the assumption that the linear model is applicable, calculate a 95% confidence interval estimate of an individual (single chamber) yield of beans exposed to x=0.24 ppm of sulfur dioxide.

    6d. (2 points) Under the assumption that the linear model is applicable, calculate a 95% confidence interval estimate of the mean yield of beans grown under conditions of exposure to x=0.24 ppm of sulfur dioxide.

    6e. (2 points) What percent of the observed variability in yield is explained by the fitted model?
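The summary statistics in problem 6 are enough to recover the fitted line and the sums of squares. The sketch below shows one way to do it, assuming the values printed as x and y in the preliminary calculations are the sample means of X and Y. The interval estimates in 6c and 6d additionally need a t quantile with n − 2 df and are not computed here.

```python
# Sketch only: simple linear regression from summary statistics.
n = 12
x_bar, y_bar = 0.12, 1.117      # assumed to be the sample means of X and Y
s_x, s_y = 0.11724, 0.31175     # sample standard deviations
r_xy = -0.8506                  # sample correlation
sse = 0.2955                    # residual sum of squares, as given

# 6a: least-squares slope and intercept.
slope = r_xy * s_y / s_x
intercept = y_bar - slope * x_bar

# 6b: analysis of variance decomposition.
sst = (n - 1) * s_y**2          # total sum of squares, corrected
ssr = sst - sse                 # regression sum of squares
df_reg, df_res, df_tot = 1, n - 2, n - 1
msr, mse = ssr / df_reg, sse / df_res
f_ratio = msr / mse

# 6e: proportion of variability explained.
r_squared = ssr / sst

# Fitted value at x = 0.24 ppm (the centre of the intervals in 6c and 6d).
y_hat_024 = intercept + slope * 0.24

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"SSR = {ssr:.4f}, SSE = {sse:.4f}, SST = {sst:.4f}")
print(f"F = {f_ratio:.2f} on ({df_reg}, {df_res}) df, R^2 = {r_squared:.4f}")
print(f"fitted yield at x = 0.24: {y_hat_024:.4f}")
```

As a consistency check, `ssr / sst` should agree closely with `r_xy ** 2`.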


    7. (10 points total) To assess physical conditioning in normal individuals, it is useful to know how much energy they are capable of expending. Since the process of expending energy requires oxygen, one way to evaluate this is to look at the rate at which they use oxygen at peak physical activity. To examine the peak physical activity, tests have been designed where the individual runs on a treadmill. At specified time intervals, the speed at which the treadmill moves and the grade of the treadmill both increase. The individual is then systematically run to maximum physical capacity. The maximum capacity is determined by the individual; the person stops when unable to go further. Because physical conditioning is relative to the size of the individual, such measures take into account body size. One of these is VO2 MAX (ml/kg/min); this is computed by looking at the volume of oxygen used per minute per kilogram of body weight. Consider the following multiple predictor regression analysis of n=94 sedentary males with treadmill tests. The dependent (outcome) variable is Y = VO2 MAX . There are four predictors:

    X1 = treadmill duration (seconds)
    X2 = maximum heart rate (beats/minute)
    X3 = height (centimeters)
    X4 = weight (kilograms)

    A partial display of the regression results is provided.

    Coefficients Table

    Constant or Predictor        β̂          SE(β̂)
    X1 = treadmill duration      0.0510      0.00416
    X2 = max heart rate          0.0191      0.0258
    X3 = height                 -0.0320      0.0444
    X4 = weight                  0.0089      0.0520
    Constant (intercept)         2.89        11.17

    Analysis of Variance Table

    Source        Sum of Squares    DF       Mean Square      F-Ratio       P
    Regression    4,314.69          ?____    ?____________    ?_________    ?_____
    Residual      ?___________      ?____    ?____________
    Total         5,245.31          ?____


    7a. (2 points) Compute the t-statistic value for testing the adjusted statistical significance of X1 = treadmill duration. What is its achieved significance (the p-value)? Do we reject β1 = 0 at the 10% significance level?

    7b. (3 points) Fill in the missing values in the analysis of variance table (-- corrected 2/17/2015 --).

    Source        Sum of Squares    DF       Mean Square      F-Ratio       P
    Regression    4,314.69          ?____    ?____________    ?_________    ?_____
    Residual      ?___________      ?____    ?____________
    Total         5,245.31          ?____

    7c. (3 points) Next, test the overall significance of the fitted model. In developing your answer, be sure to state the null and alternative hypotheses. In 1-2 sentences, interpret your findings.

    7d. (2 points) What is R²? In reporting your answer, give its numerical value and, in 1 sentence, explain its meaning.
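The arithmetic behind problem 7 can be laid out as in the following sketch. It fills in the analysis of variance entries from the two sums of squares that are given, and forms the t statistic for X1 from the coefficients table. The p-values themselves need t and F distributions, which are not in the Python standard library, so they are left to tables or statistical software.

```python
# Sketch only: multiple regression with n = 94 subjects and k = 4 predictors.
n, k = 94, 4
ssr, sst = 4314.69, 5245.31     # regression and total sums of squares, as given

# 7b: remaining analysis of variance entries.
sse = sst - ssr
df_reg, df_res, df_tot = k, n - k - 1, n - 1
msr, mse = ssr / df_reg, sse / df_res

# 7c: overall F statistic, referred to an F distribution on (df_reg, df_res) df.
f_ratio = msr / mse

# 7d: proportion of variability in VO2 MAX explained by the four predictors.
r_squared = ssr / sst

# 7a: t statistic for X1 = treadmill duration, on df_res degrees of freedom.
beta_x1, se_x1 = 0.0510, 0.00416
t_x1 = beta_x1 / se_x1

print(f"SSE = {sse:.2f}; df = {df_reg}, {df_res}, {df_tot}")
print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F = {f_ratio:.2f}")
print(f"R^2 = {r_squared:.4f}")
print(f"t for X1 = {t_x1:.2f} on {df_res} df")
```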


    8. (10 points total) Consider a multiple linear regression to evaluate some hypothesized associations with plasma lipid levels of total cholesterol (Y), mg/dL, in a sample of 25 patients suffering from hyperlipoproteinemia. Two predictors were considered:

    X1 = weight (kg)
    X2 = age (years)

    Three models were fit. The table below shows the estimated regression model and the residual sum of squares (SSE) for each model. The total sum of squares, corrected, is SSY = 145,377.04.

    Model    Fitted line                           Sum of Squares Residual, SSE
    1        Ŷ = 199.2975 + 1.622X1                135,145.3138
    2        Ŷ = 102.5751 + 5.321X2                43,444.3743
    3        Ŷ = 77.983 + 0.417X1 + 5.217X2        42,806.2254

    8a. (3 points) For each model, what is the predicted cholesterol level Ŷ for a 30-year-old patient who weighs 70 kg? Next, suppose the observed cholesterol for this patient is Y = 263 mg/dL. In 1-2 sentences, compare each of the 3 predicted values Ŷ with the observed Y = 263 mg/dL.

    8b. (3 points) For each model, what is the R² value?

    8c. (4 points) If you use R² and model simplicity as your selection criteria, what model appears to be the best predictive model?
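For problem 8, the predictions and R² values follow directly from the fitted lines and residual sums of squares in the table. The sketch below is one way to organize that arithmetic; the dictionary layout and function names are illustrative only.

```python
# Sketch only: predictions and R^2 for the three cholesterol models.
SSY = 145377.04                      # total sum of squares, corrected
WEIGHT_KG, AGE_YEARS = 70, 30        # the patient described in 8a
OBSERVED_Y = 263                     # observed cholesterol, mg/dL

# Each entry: (prediction function of weight and age, residual sum of squares).
MODELS = {
    1: (lambda w, a: 199.2975 + 1.622 * w, 135145.3138),
    2: (lambda w, a: 102.5751 + 5.321 * a, 43444.3743),
    3: (lambda w, a: 77.983 + 0.417 * w + 5.217 * a, 42806.2254),
}

predictions = {m: fit(WEIGHT_KG, AGE_YEARS) for m, (fit, _) in MODELS.items()}
r_squared = {m: 1 - sse / SSY for m, (_, sse) in MODELS.items()}

for m in MODELS:
    error = OBSERVED_Y - predictions[m]
    print(
        f"model {m}: predicted = {predictions[m]:.2f}, "
        f"observed - predicted = {error:.2f}, R^2 = {r_squared[m]:.4f}"
    )
```

Comparing the R² of model 3 with that of model 2 shows how little is gained by the extra predictor, which is the trade-off 8c asks about.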


    9. (10 points total) A psychologist performed a multiple predictor linear regression analysis of anxiety level (Y), measured on a scale ranging from 1 to 50, as the average of an index determined at three points in a 2-week period. Three predictors were considered:

    X1 = systolic blood pressure (mm Hg)
    X2 = IQ
    X3 = Job satisfaction, measured on a scale ranging 1 to 25.

    The following table summarizes the results obtained from a “variables-added-in-order” regression on data from a sample of size n=22.

    Source                DF     Sum of Squares
    Regression
      X1                  1      981.326
      X2 | X1             1      190.232
      X3 | (X1, X2)       1      129.431
    Residual, SSE         18     442.292

    9a. (5 points) Test for the significance of each independent variable as it enters the model. For each test, state the null and alternative hypotheses in terms of regression coefficient parameters.

    9b. (5 points) Test for the significance of adding both X2 and X3 to a model already containing X1. State the null and alternative hypotheses in terms of regression coefficient parameters.
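The group test in 9b is a partial F test built from the sequential sums of squares. A minimal sketch follows, assuming the residual mean square of the full three-predictor model is used as the denominator. For the one-at-a-time tests in 9a, textbooks differ on which residual mean square to use at each step, so that part is left out. The resulting statistic is referred to an F distribution with (2, 18) df, which requires tables or statistical software.

```python
# Sketch only: partial F test for adding X2 and X3 to a model containing X1.
SEQ_SS = {"X1": 981.326, "X2|X1": 190.232, "X3|X1,X2": 129.431}
SSE_FULL, DF_RES_FULL = 442.292, 18

# Denominator: residual mean square of the full model (assumed convention).
mse_full = SSE_FULL / DF_RES_FULL

# Numerator: extra regression sum of squares due to X2 and X3, per added term.
added_ss = SEQ_SS["X2|X1"] + SEQ_SS["X3|X1,X2"]
n_added = 2
f_added = (added_ss / n_added) / mse_full

print(f"MSE (full model) = {mse_full:.4f}")
print(f"extra SS for X2, X3 given X1 = {added_ss:.3f}")
print(f"partial F = {f_added:.3f} on ({n_added}, {DF_RES_FULL}) df")
```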


    10. (10 points total) A randomized controlled trial is performed to test the effectiveness of a new medication in reducing the duration of “heartburn” following meals. 120 subjects give informed consent and participate in the trial. Prior to randomization, a baseline (before medication) average time of heartburn in minutes (X1) is recorded. Participants are then randomized. 60 subjects are randomized to receive the new medication. The other 60 are randomized to receive a placebo. X2 is the indicator (dummy) variable, indicating receipt of the new medication. The response variable of interest in this trial is the participant’s average time of heartburn (Y) on the “post-treatment” occasion of measurement. For your reference, the total sum of squares, corrected is TSS = 17,851.7. Thus, this is a multiple predictor regression setting with two predictors:

    X1 = baseline average time of heartburn (minutes)
    X2 = Indicator of randomization to the “active” treatment

    10a. (1 point) State the definition of X2, the indicator of randomization to the “active” treatment. State the definition of an appropriate interaction variable, for the interaction of X1 and X2.

    10b. (1 point) What is the linear model for the expected value of Y = post-treatment average duration of heartburn if it is related to treatment only, with no confounding and no modification by baseline? In developing your answer, use terms such as E [ Y | X1, X2] for the expected value of Y for given values of X1 and X2, β0 for the intercept term, etc.

    10c. (1 point) What is the linear model for the expected value of Y = post-treatment average duration of heartburn if it is related to treatment, with confounding but no modification by baseline? In developing your answer, use terms such as E [ Y | X1, X2] for the expected value of Y for given values of X1 and X2, β0 for the intercept term, etc.

    10d. (1 point) What is the linear model for the expected value of Y = post-treatment average duration of heartburn if it is related to treatment, and differently so (modified by) depending on baseline? In developing your answer, use terms such as E [ Y | X1, X2] for the expected value of Y for given values of X1 and X2, β0 for the intercept term, etc.


    10e. (1 point) Complete the following “model 1” analysis of variance table by filling in the “?_____”.

    Source               DF         Sum of Squares    Mean Square    F          p-value
    Regression
      X1                 ?_____     ?______           ?_____         ?_____     ?_____
    Residual, SSE        ?______    ?_____            73.43
    Total, corrected     ?____      ?_______

    10f. (1 point) Complete the following “model 2” analysis of variance table by filling in the “?_____”.

    Source               DF         Sum of Squares    Mean Square    F          p-value
    Regression
      X2                 ?_____     ?______           ?_____         ?_____     ?_____
    Residual, SSE        ?______    ?_____            142.73
    Total, corrected     ?____      ?_______

    10g (1 point) Complete the following “model 3” analysis of variance table by filling in the “?_____”.

    Source               DF         Sum of Squares    Mean Square    F          p-value
    Regression
      X1, X2             ?_____     ?______           ?_____         ?_____     ?_____
    Residual, SSE        ?______    ?_____            71.66
    Total, corrected     ?____      ?_______


    10h. (1 point) Complete the following “model 4” analysis of variance table by filling in the “?_____”.

    Source               DF         Sum of Squares    Mean Square    F          p-value
    Regression
      X1, X2, X1*X2      ?_____     ?______           ?_____         ?_____     ?_____
    Residual, SSE        ?______    ?_____            72.28
    Total, corrected     ?____      ?_______
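Each of the four analysis of variance tables in 10e through 10h can be completed from three ingredients: the total sum of squares (TSS = 17,851.7), the residual mean square printed in the table, and the degrees of freedom implied by n = 120 and the number of predictors. The helper below is a sketch of that bookkeeping; its name and return layout are made up for illustration, and p-values are omitted because the Python standard library has no F distribution.

```python
# Sketch only: fill in an ANOVA table from TSS, the residual mean square, and n.
TSS, N = 17851.7, 120


def anova_from_mse(tss, mse, n, n_predictors):
    """Recover the remaining ANOVA entries when only TSS and MSE are known."""
    df_reg = n_predictors
    df_res = n - n_predictors - 1
    sse = mse * df_res                 # residual SS = mean square times its df
    ssr = tss - sse                    # regression SS by subtraction
    msr = ssr / df_reg
    return {
        "df_reg": df_reg, "df_res": df_res, "df_tot": n - 1,
        "ssr": ssr, "sse": sse, "msr": msr, "mse": mse,
        "f": msr / mse,
    }


# Model label -> (residual mean square from the table, number of predictors).
MODEL_INPUTS = {
    "model 1": (73.43, 1),     # X1
    "model 2": (142.73, 1),    # X2
    "model 3": (71.66, 2),     # X1, X2
    "model 4": (72.28, 3),     # X1, X2, X1*X2
}

tables = {
    name: anova_from_mse(TSS, mse, N, p) for name, (mse, p) in MODEL_INPUTS.items()
}

for name, t in tables.items():
    print(
        f"{name}: df = ({t['df_reg']}, {t['df_res']}), "
        f"SSR = {t['ssr']:.2f}, SSE = {t['sse']:.2f}, F = {t['f']:.2f}"
    )
```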

    10i. (2 points) Your turn! Just give it a try. It’s only 2 points. Carry out the appropriate assessments of these fitted models to assess the benefit of the experimental treatment. Is there an overall benefit? Is it confounded by baseline? Is it modified by baseline? Report your findings in a 1-paragraph report. Here are the estimated betas and associated estimated standard errors for your use.

                                  Model 1    Model 2    Model 3    Model 4
    X1 = baseline     β̂           0.81       -          0.89       0.89
                      sê(β̂)       0.07       -          0.08       0.14
    X2 = treatment    β̂           -          5.80       -3.49      -3.70
                      sê(β̂)       -          2.18       1.76       5.91
    X1 * X2           β̂           -          -          -          0.01
                      sê(β̂)       -          -          -          0.17