1dsmall/stat112-02/hom… · Web viewFARMW = dummy variable to measure whether subject is a farm worker (1 if he is, 0 if not) `The regression output from a multiple regression of

Name:

Statistics 112, Fall 2002 - Final

Instructions: Please Read Carefully

This exam is closed book. You may have three pages of notes (double-sided). JMP outputs included for each question may contain unnecessary parts. You may use a calculator. You may use a pen or pencil. Show all your work. If a question says to explain, you will get no credit without an explanation.

However, it is not necessary to fill up all the room on the exam. A brief explanation will usually suffice.

Write on the back of the pages if you need more room. All necessary outputs are attached. The statistical tables are in a separate attachment. There are blank pages at the end of the statistical tables for scratch work. Please write your answers legibly.

Question                                 Max. Score   Score Received
I – 10 Multiple Choice (1.5 pts each)    15
II – 5 Choose Test (1.5 pts each)        7.5
III                                      11
IV                                       7
V                                        9.5
VI                                       12
VII                                      10
VIII                                     10
IX                                       18
Total Score                              100

I. (15) Write the best answer for the following multiple choice questions below the question. Do not give any explanation. (1.5 points each)

(1) In a statistical report, the statement is made that the 95% confidence interval for the percentage of babies who are boys is between 51% and 55% (i.e., 53% ± 2%). This means that if, in the future, a 95% confidence interval is computed in the same way for each of a large number of random samples of the same size,

(a) 95% of such intervals will cover (contain) the midpoint 53%

(b) 95% of such intervals will cover (contain) the population percentage of boys.

(c) 95% of such intervals will overlap (intersect) the interval 51% to 55%

(d) 95% of such intervals will completely cover (contain) the interval 51% to 55%
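The repeated-sampling reading of a confidence interval in question (1) can be illustrated with a small simulation. This is only a sketch: the true percentage of boys (`p_true = 0.53`), the sample size (`n = 2400`, chosen so that the margin of error is roughly 2 percentage points), and the function names are assumptions made for illustration, not values taken from the report.

```python
import math
import random


def ci_covers(p_true, n, rng, z=1.96):
    """Draw one random sample of size n and report whether the
    large-sample 95% interval for a proportion contains p_true."""
    boys = sum(rng.random() < p_true for _ in range(n))
    p_hat = boys / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin <= p_true <= p_hat + margin


def coverage_rate(p_true=0.53, n=2400, reps=1000, seed=0):
    """Fraction of repeated intervals that contain the (assumed) true value."""
    rng = random.Random(seed)
    hits = sum(ci_covers(p_true, n, rng) for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    rate = coverage_rate()
    print(f"fraction of intervals covering the true percentage: {rate:.3f}")
```

Each simulated interval has its own midpoint and endpoints, so the quantity that stays fixed across repetitions is the population percentage, not any one interval.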

(2) A study of human development showed two types of movies to groups of children. Crackers were available in a bowl, and the investigators compared the number of crackers eaten by children watching the different kinds of movies. One kind of movie was shown at 8 A.M. (right after the children had breakfast) and another at 11 A.M. (right before the children had lunch). It was found that more crackers were eaten during the movie shown at 11 A.M. than during the movie shown at 8 A.M. The investigators concluded that the different types of movies had an effect on appetite. The results cannot be trusted because

(a) the study was not double blind. Neither the investigators nor the children should have been aware of which movie was being shown.

(b) the investigators were biased. They knew beforehand what they hoped to show.

(c) the investigators should have used several bowls, with crackers randomly placed in each.

(d) the time the movie was shown is a lurking variable

(3) A group of college students believes that herbal tea has remarkable restorative powers. To test its theory, the group makes weekly visits to a local nursing home, visiting with residents, talking with them and serving them herbal tea. After several months, many of the residents are more cheerful and healthy. Which of the following may be correctly concluded from this study?

(a) herbal tea does improve one’s emotional state, at least for the residents of nursing homes.

(b) there is some evidence that herbal tea may improve one’s emotional state. The results would be completely convincing if a scientist had conducted the study rather than a group of college students.

(c) the results of the study are not convincing since only a local nursing home was used and only for a few months.

(d) the results of the study are not convincing since the effect of herbal tea is confounded with several other factors.

(4) Medical researchers are excited about a new cancer treatment that destroys tumors by cutting off their blood supply. To date, the treatment has been tried only on mice, but in mice it has been nearly 100% effective in eradicating tumors and appears to have no side effects. As evidence of the effectiveness of the new treatment in treating cancer in humans, these studies

(a) display a high degree of statistical significance and so with nearly 100% certainty will work in humans

(b) are convincing, assuming the results have been replicated in a large number of mice

(c) are convincing, assuming that proper randomization and control were used

(d) suffer from lack of realism

(5) A news release for a diet products company reports: “There’s good news for the 65 million Americans currently on a diet.” Its study showed that people who lose weight can keep it off. The sample was 20 graduates of the company’s program who endorse it in commercials. The results of the sample are probably

(a) biased, overstating the effectiveness of the diet

(b) biased, understating the effectiveness of the diet

(c) unbiased because these are nationally recognized individuals

(d) unbiased, but they could be more accurate. A larger sample size should be used.

(6) Which of the following will increase the value of the power in a statistical test of hypotheses?

(a) increase the significance level

(b) increase the sample size

(c) consider computing the power for a value of the alternative that is further from the value of the parameter of interest under the null hypothesis

(d) all of the above.
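How the three factors in question (6) enter a power calculation can be sketched for the simplest case, an upper-tailed one-sample z-test with known standard deviation. The numbers used below (`mu0`, `mu_alt`, `sigma`, `n`, `alpha`) are hypothetical and chosen only to show the direction of each effect.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(mu0, mu_alt, sigma, n, alpha):
    """Power of the upper-tailed z-test of H0: mu = mu0
    when the true mean is mu_alt and sigma is known."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)          # rejection cutoff on the z scale
    shift = (mu_alt - mu0) / (sigma / sqrt(n))      # standardized distance of the alternative
    return 1 - std_normal.cdf(z_crit - shift)


if __name__ == "__main__":
    # Hypothetical baseline values, not taken from any exam question.
    base = dict(mu0=0.0, mu_alt=0.2, sigma=1.0, n=50, alpha=0.05)
    print("baseline:           ", round(power_one_sided_z(**base), 3))
    print("larger alpha:       ", round(power_one_sided_z(**{**base, "alpha": 0.10}), 3))
    print("larger sample:      ", round(power_one_sided_z(**{**base, "n": 200}), 3))
    print("farther alternative:", round(power_one_sided_z(**{**base, "mu_alt": 0.4}), 3))
```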

(7) We measure a response variable y at each of several times. A scatterplot of log y versus time of measurement looks approximately like a positively sloping straight line. We may conclude

(a) the correlation between time of measurement and y is negative, because logarithms of positive fractions (such as correlations) are negative.

(b) the rate of growth of y is positive, but sloping down over time.

(c) an exponential growth model would approximately describe the relationship between y and time of measurement.

(d) A mistake has been made. It would have been better to plot y versus the logarithm of the time of measurement.

(8) It is easy to measure the “diameter at breast height” of a tree. It’s hard to measure the total “aboveground biomass” of a tree, because to do this you must cut and weigh the tree. Ecologists commonly estimate the biomass using a power law. Based on data on 378 trees in tropical rain forests, the following relationship between biomass y measured in kilograms and diameter x measured in centimeters was estimated by least squares

What would you predict the biomass of a tropical tree 40 centimeters in diameter to be approximately?

(a) 7

(b) 95

(c) 1020

(d) 1,020,000

(9) Does taking gingko tablets twice a day provide significant improvement in mental performance? To investigate this issue, a researcher conducted a study with 150 adult subjects who took gingko tablets twice a day for a period of six months. At the end of the study, 200 variables related to the mental performance of the subjects were measured on each subject and the means compared to known means for these variables in the population of all adults. Nine of these variables were significantly better (in the sense of statistical significance) at the 5% level for the group taking the gingko tablets as compared to the population as a whole, and one variable was significantly better at the 1% level for the group taking the gingko tablets as compared to the population as a whole. It would be correct to conclude

(a) there is good statistical evidence that taking gingko tablets twice a day provides some improvement in mental performance.

(b) there is good statistical evidence that taking gingko tablets twice a day provides improvement for the variable that was significant at the 1% level. We should be somewhat cautious about making claims for the variables that were significant at the 5% level.

(c) these results would have provided good statistical evidence that taking gingko tablets twice a day provides some improvement in mental performance if the number of subjects had been larger. It is premature to draw statistical conclusions from studies in which the number of subjects is less than the number of variables measured.

(d) none of the above.
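When 200 variables are each tested, some will reach significance by chance alone. The sketch below counts how many such chance findings to expect, under two simplifying assumptions that are not stated in the question: the tablets have no effect on any variable, and the 200 tests behave independently. The function names are illustrative.

```python
from math import comb


def expected_false_positives(n_tests, alpha):
    """Expected number of tests significant at level alpha when every null is true."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha):
    """P(at least k of n_tests independent tests are significant at level alpha)
    when every null hypothesis is true (binomial model)."""
    below_k = sum(
        comb(n_tests, j) * alpha**j * (1 - alpha) ** (n_tests - j)
        for j in range(k)
    )
    return 1 - below_k


if __name__ == "__main__":
    print("expected at 5% level:", expected_false_positives(200, 0.05))
    print("expected at 1% level:", expected_false_positives(200, 0.01))
    print("P(at least 9 at 5%): ", round(prob_at_least(9, 200, 0.05), 3))
    print("P(at least 1 at 1%): ", round(prob_at_least(1, 200, 0.01), 3))
```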

(10) Are proficiency test scores affected by the education of the child’s parents? To answer this question, a random sample of 9-year-old children was drawn. Each child’s test score and the education level of the parent with the higher level were recorded. The education categories are less than high school, high school graduate, some college, and college graduate. The null hypothesis for the one-way analysis of variance F test is that the population mean test scores are the same for all four education categories. The alternative hypothesis is

(a) that the population mean test score is larger for children of college graduates than for the other three educational categories

(b) that the population mean test score is smaller for children whose parents both did not graduate from high school than for the other three educational categories

(c) that the population mean test score for children of college graduates is larger than the population mean test score for children whose parents both did not graduate from high school

(d) none of the above.

II. (7.5) The following is a list of some of the statistical methods that you have learned about in this course:

One sample t-test
One sample z-test
Two sample t-test (pooled or unpooled)
Two sample z-test
Matched pairs t-test
Simple linear regression
Multiple linear regression
One-way ANOVA

For each of the situations described below, state the technique (from the list above) that you believe is the most appropriate for answering the question of interest. If none are applicable, answer “none of the above.” No explanation is necessary. (1.5 points each)

Note: If a technique is appropriate that is a special case of another technique, give the answer of the less general technique. In other words, if one-way ANOVA is appropriate, answer one-way ANOVA rather than multiple linear regression. Similarly, if a two-sample t-test is appropriate, answer two-sample t-test rather than one-way ANOVA.

(1) A company has devised a new ink-jet cartridge for its plain-paper fax machine that it believes has a longer lifetime (on average) than the one currently being produced. To investigate its length of life, 225 of the new cartridges were tested by counting the number of high-quality printed pages each was able to produce. The sample mean and standard deviation were 1,511.4 pages and 35.7 pages respectively. The historical average lifetime for cartridges produced by the current process is 1,502.5 pages.

(2) A large proportion of U.S. teenagers work while attending high school. These heavy workloads often result in underachievement in the classroom and lower grades. A study of high school students in California showed that those who worked only a few hours per week had the highest grade point averages. The study collected data on the number of hours worked per week and grade point average (on a scale of 0 to 4.0) for a sample of 500 high school students. Does grade point average tend to decrease as number of hours worked increases?

(3) A bank would like to develop a model to predict the total sum of money customers withdraw from automatic teller machines (ATMs) on a weekend so that they can be sure to stock an adequate amount of money in each of the machines. They have data on the amount of money withdrawn last weekend for a random sample of 35 ATM machines throughout the city. They believe several factors can be useful in predicting the amount of money withdrawn including the average assessed value of houses in the vicinity of the ATM machine, how far away the nearest branch office of the bank is from the ATM machine, and whether or not the ATM machine is located in a shopping center.

(4) Research scientists at a pharmaceutical company have recently developed a new nonprescription sleeping pill and want to determine whether or not the drug is effective. They decide to test its effectiveness by measuring the time it takes for people to fall asleep after taking the pill. A random sample of 100 volunteers who regularly suffer from insomnia is chosen. Each person is given one pill containing the newly developed drug and one placebo. Participants are told to take one pill one night and the second pill one night a week later (They do not know whether the pill they are taking is the placebo or the real thing, and the order of use is random). Each participant is fitted with a device that measures the time until sleep occurs and this time is recorded. Is the drug effective?

(5) In marketing children’s products, it’s extremely important to produce television commercials that hold the attention of the children who view them. A psychologist hired by a marketing research firm wants to determine whether differences in attention span exist between children watching advertisements for different types of products. One hundred fifty children under 10 years of age were recruited for an experiment. One third watched a 60-second commercial for a new computer game, one-third watched a commercial for a breakfast cereal, and another third watched a commercial for children’s clothes. Their attention spans were measured. Is there a difference in attention span between the three products advertised?

III. (11) A study was conducted to assess the effects of marijuana use during pregnancy on birth weight. A random sample of 21 women who used marijuana during pregnancy was taken. The mean birthweight of the children in the sample was 6.1 pounds and the standard deviation of the birthweights of the children in the sample was pounds.

(a) (3) Construct a 95% confidence interval for the mean birthweight of children in the population whose mothers use marijuana during pregnancy.

(b) (4) The mean birthweight of all children in the population is known to be 7 pounds. Does this study provide strong evidence that the mean birthweight of children whose mothers use marijuana during pregnancy is not equal to the mean birthweight of all children? Use a hypothesis test to address this question – state H0 and Ha, compute the test statistic and the P-value, and answer the question by interpreting your result.

(c) (4) Heavy drinking of alcohol is believed to cause lower birthweight in children. Mothers who use marijuana during pregnancy are more likely to drink heavily during pregnancy than mothers who do not use marijuana during pregnancy. Suppose that in addition to the random sample of 21 women who used marijuana during pregnancy, we also have available a random sample of 21 women who did not use marijuana during pregnancy and that we have recorded the daily average amount of alcohol drunk during pregnancy for both samples of women. Briefly outline how you would analyze these data to address the question of how much using marijuana during pregnancy reduces birthweight after the effect of alcohol use on birthweight is taken into account.

IV. (7) The nicotine content in cigarettes of a certain brand is normally distributed with mean (in milligrams) and standard deviation The brand advertises that the mean nicotine content of its cigarettes is 1.5, but you are suspicious and plan to investigate the advertised claim by testing the hypotheses

at the 5% significance level. You will do so by measuring the nicotine content of 100 randomly selected cigarettes of this brand and computing the mean nicotine content of your measurements. What is the power of your test for the alternative

V. (9.5) One general belief held by observers of the business world is that taller men earn more money than shorter men. In a University of Pittsburgh study (reported in the Wall Street Journal, December 30, 1986), 250 MBA graduates, all about 30 years old, were polled and asked to report their height (in inches) and their annual income (to the nearest $1,000). The JMP output from a simple linear regression of income on height is shown below.

[Figure: Bivariate Fit of Income By Height. Scatterplot of Income (vertical axis) versus Height (horizontal axis).]

Linear Fit

Income = 17.933325 + 0.6041112 Height

Summary of Fit
RSquare                       0.050545
RSquare Adj                   0.046717
Root Mean Square Error
Mean of Response              59.588
Observations (or Sum Wgts)    250

Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio    Prob > F
Model         1          905.597        905.597    13.2025      0.0003
Error       248        17010.967         68.593
C. Total    249        17916.564

Parameter Estimates
Term         Estimate     Std Error    t Ratio    Prob>|t|
Intercept    17.933325    11.47593     1.56       0.1194
Height       0.6041112    0.16626      3.63       0.0003

[Figure: Residual plot. Residual (vertical axis) versus Height (horizontal axis).]

(a) (4) Do these data provide strong evidence that taller MBAs earn more money than shorter ones? State this question in terms of a hypothesis test and report the results of the test at the 0.05 significance level.

(b) (1.5) The root mean square error is left blank. What is a reasonable estimate for this value?

(i) 2.42

(ii) 8.29

(iii) 25.73

(iv) 321.46
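The entries of a simple linear regression report are tied together by a few standard identities: the root mean square error is the square root of the error mean square, R-square is the model sum of squares divided by the total sum of squares, and the F ratio equals the square of the slope's t ratio. The sketch below applies these identities to the numbers printed in the income-on-height output; the variable and function names are illustrative.

```python
from math import sqrt

# Values copied from the Analysis of Variance and Parameter Estimates tables.
SS_MODEL = 905.597
SS_TOTAL = 17916.564
MS_ERROR = 68.593
SLOPE = 0.6041112
SLOPE_STD_ERROR = 0.16626


def root_mean_square_error(ms_error):
    """RMSE is the square root of the error mean square."""
    return sqrt(ms_error)


def r_square(ss_model, ss_total):
    """Share of total variation explained by the regression."""
    return ss_model / ss_total


def slope_t_ratio(estimate, std_error):
    """t ratio for testing that the slope is zero."""
    return estimate / std_error


if __name__ == "__main__":
    t = slope_t_ratio(SLOPE, SLOPE_STD_ERROR)
    print("RMSE      :", round(root_mean_square_error(MS_ERROR), 2))
    print("RSquare   :", round(r_square(SS_MODEL, SS_TOTAL), 6))
    print("t ratio   :", round(t, 2))
    print("t squared :", round(t * t, 2), "(compare with the reported F ratio)")
```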

(c) (4) A group of 20 men who are five foot, two inches and 30 years old bring a class action suit against a large company, claiming that men this short are discriminated against because of their height. The court defines discrimination as occurring when the mean salary of a particular group (e.g., a group of a particular height, race or sex) is less than the mean salary of all comparable employees in the company. The group of short men take a random sample of 100 30-year-old men in the company, record their height and salary and compute the least squares regression of salary on height. Based on their least squares regression, they compute a 95% confidence interval for the mean earnings of five foot, two inch 30-year-old men in the company of ($53,890, $56,940). It is known that the mean earnings of all 30-year old men in the company is $60,000. The group of five foot, two inch men argues that this analysis provides strong evidence of discrimination because the upper end of the confidence interval is less than $60,000.

The company’s defense lawyers counter this argument by calculating the 95% prediction interval for the mean earnings of a 30-year old man in the company who is five foot, two inches, which turns out to be ($38,330, $70,580). The defense lawyers argue that because the upper end of the prediction interval is greater than $60,000, there is not strong evidence that men who are five feet, two inches are discriminated against.

Which side makes a more compelling argument – the group of five foot, two inch men or the company’s defense lawyers? Explain briefly.

VI. (12) In each of the following settings, we want to predict the value of a response variable based on an explanatory variable. However, we would be reluctant to use the least squares regression line for prediction. State briefly what general caution about simple linear regression each example best illustrates (State only one caution that the example best illustrates).

(a) (4) We want to predict a car’s fuel consumption from its speed. The scatterplot below shows data on the British Ford Escort. The residual plot from the least squares regression is depicted below.

[Figure: Bivariate Fit of Fuel Used By Speed. Scatterplot of Fuel Used (vertical axis) versus Speed (horizontal axis).]

[Figure: Residual plot. Residual (vertical axis) versus Speed (horizontal axis).]

(b) (4) A golf coach wants to predict the second round scores of his players based on their first round scores. The scatterplot below shows the second round scores and first round scores of 11 members of the team in two rounds of tournament play. The residual plot from the least squares regression is depicted below.

[Figure: Bivariate Fit of Round 2 By Round 1. Scatterplot of Round 2 (vertical axis) versus Round 1 (horizontal axis).]

[Figure: Residual plot. Residual (vertical axis) versus Round 1 (horizontal axis).]

(c) (4) A community in the Philadelphia area is interested in predicting how cutting its crime rate will affect property value. If low crime rates increase property values, the community might be able to cover the costs of increased police protection by gains in tax revenues from higher property values. The community gathered data for itself and 38 other similar communities in Pennsylvania. The scatterplot below shows house price (average house price for sales during the most recent year) versus crime rate (rate of crimes per 1000 population) and the least squares regression line. The residual plot from the least squares regression line is depicted below.

[Figure: Bivariate Fit of house price By crime rate. Scatterplot of house price (vertical axis) versus crime rate (horizontal axis) with the least squares regression line.]

[Figure: Residual plot. Residual (vertical axis) versus crime rate (horizontal axis).]

VII. (10) As part of their training, air force pilots make two practice landings with instructors, and are rated on performance. The instructors discuss the ratings with the pilots after each landing and offer suggestions for improvement. The JMP output below shows distributional summaries of ratings on the first and second landings and the ratings on the second landing minus the first landing, a scatterplot of ratings on the first and second landings and the least squares regression of the second landing rating on the first landing rating. State whether each of the below statements is true or false and explain your answer briefly (2.5 points for each part).

(a) (2.5) The data indicate that the pilots tended to become more homogenous in their ratings on the second landing – the instructors’ suggestions seem to have helped those pilots doing poorly more than those pilots doing well.

(b) (2.5) The data indicate that it is difficult for pilots to stay near the top of the ratings from landing to landing (specifically, from the first landing to the second landing).

(c) (2.5) The data indicate that some factor (perhaps the instructors’ suggestions or experience) has caused the ratings to increase in the second landing compared to the first landing.

(d) (2.5) To predict the rating on the second landing for a pilot having a rating of 10 on the first landing, we should use the line (shown in the scatterplot of ratings on the first and second landings) and hence predict a rating of 10 on the second landing.
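Two quantities bear directly on these statements: the value the fitted least squares line gives for a particular first-landing rating, and the size of the mean change between landings relative to its standard error. The sketch below computes both from the numbers reported in the JMP output for this question; the function names are illustrative, and the code does not decide whether any statement is true or false.

```python
# Values copied from the JMP output for the landing ratings.
INTERCEPT = 4.0934433          # least squares fit of second landing on first landing
SLOPE = 0.4885476
MEAN_FIRST = 7.9191667
MEAN_DIFF = 0.0431667          # second landing minus first landing
STD_ERR_MEAN_DIFF = 0.2527259


def predicted_second(first_rating):
    """Rating on the second landing predicted by the fitted regression line."""
    return INTERCEPT + SLOPE * first_rating


def paired_t_statistic(mean_diff, std_err):
    """t statistic for testing that the mean change between landings is zero."""
    return mean_diff / std_err


if __name__ == "__main__":
    for first in (5, MEAN_FIRST, 10):
        print(f"first = {first:6.3f} -> predicted second = {predicted_second(first):.3f}")
    print("paired t statistic:", round(paired_t_statistic(MEAN_DIFF, STD_ERR_MEAN_DIFF), 3))
```

Because the fitted slope is below 1, predictions for pilots far from the mean on the first landing are pulled back toward the mean on the second.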

Distributions: first landing

[Figure: distribution plot of first landing ratings, axis from 3 to 12.]

Quantiles
100.0%   maximum    11.850
 99.5%              11.850
 97.5%              11.603
 90.0%              10.318
 75.0%   quartile    9.117
 50.0%   median      7.880
 25.0%   quartile    6.667
 10.0%               5.254
  2.5%               4.062
  0.5%               3.700
  0.0%   minimum     3.700

Moments
Mean              7.9191667
Std Dev           1.7814775
Std Err Mean      0.2299878
upper 95% Mean    8.3793711
lower 95% Mean    7.4589622
N                 60

Distributions: second landing

[Figure: distribution plot of second landing ratings, axis from 3 to 13.]

Quantiles
100.0%   maximum    12.730
 99.5%              12.730
 97.5%              12.452
 90.0%              10.694
 75.0%   quartile    8.900
 50.0%   median      7.990
 25.0%   quartile    6.707
 10.0%               5.125
  2.5%               3.971
  0.5%               3.960
  0.0%   minimum     3.960

Moments
Mean              7.9623333
Std Dev           1.9389508
Std Err Mean      0.2503175
upper 95% Mean    8.4632174
lower 95% Mean    7.4614492
N                 60

Distributions: second landing minus first landing

[Figure: distribution plot of second landing minus first landing, axis from -7 to 4.]

Quantiles
100.0%   maximum     3.700
 99.5%               3.700
 97.5%               3.501
 90.0%               2.689
 75.0%   quartile    1.187
 50.0%   median      0.170
 25.0%   quartile   -0.975
 10.0%              -2.472
  2.5%              -5.556
  0.5%              -6.070
  0.0%   minimum    -6.070

Moments
Mean              0.0431667
Std Dev           1.9576062
Std Err Mean      0.2527259
upper 95% Mean    0.54887
lower 95% Mean    -0.462537
N                 60

[Figure: Bivariate Fit of second landing By first landing. Scatterplot of second landing (vertical axis) versus first landing (horizontal axis).]

Linear Fit

second landing = 4.0934433 + 0.4885476 first landing

Summary of Fit
RSquare                       0.201484
RSquare Adj                   0.187717
Root Mean Square Error        1.747514
Mean of Response              7.962333
Observations (or Sum Wgts)    60

Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio    Prob > F
Model         1         44.69166        44.6917    14.6348      0.0003
Error        58        177.12061         3.0538
C. Total     59        221.81227

Parameter Estimates
Term             Estimate     Std Error    t Ratio    Prob>|t|
Intercept        4.0934433    1.03619      3.95       0.0002
first landing    0.4885476    0.127707     3.83       0.0003

VIII. (10) Lotteries have become important sources of revenue for governments. Many people have criticized lotteries, however, referring to them as a tax on the poor and uneducated. In an examination of the issue, a random sample of 100 adults was asked how much they spend on lottery tickets and was interviewed about various socioeconomic variables. The following data was recorded: amount spent on lottery tickets as a percentage of total household income (Lottery), number of years of education (Education), age (Age), number of children (Children) and personal income in thousands of dollars (Income). The regression output from the multiple regression of Lottery on Education, Age, Children and Income is shown below.

Response Lottery

Whole Model

[Figure: Actual by Predicted Plot. Lottery Actual (vertical axis) versus Lottery Predicted (horizontal axis); P<.0001, RSq=0.43, RMSE=2.9098.]

Summary of Fit
RSquare                       0.433474
RSquare Adj                   0.40962
Root Mean Square Error        2.90978
Mean of Response              5.39
Observations (or Sum Wgts)    100

Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio    Prob > F
Model         4         615.4421        153.861    18.1722      <.0001
Error        95         804.3479          8.467
C. Total     99        1419.7900

Parameter Estimates
Term         Estimate     Std Error    t Ratio    Prob>|t|
Intercept    11.906094    1.785197      6.67      <.0001
Education    -0.430018    0.132072     -3.26      0.0016
Age          0.0291899    0.025228      1.16      0.2501
Children     0.0934351    0.224313      0.42      0.6780
Income       -0.074471    0.027726     -2.69      0.0085

Effect Tests
Source       Nparm    DF    Sum of Squares    F Ratio    Prob > F
Education        1     1         89.758072    10.6012      0.0016
Age              1     1         11.335241     1.3388      0.2501
Children         1     1          1.469028     0.1735      0.6780
Income           1     1         61.082022     7.2143      0.0085

(a) (3) Below is a correlation table. From this table, which explanatory variable does the best job of explaining lottery spending in a simple linear regression with only one variable? Explain briefly how you know.

Multivariate Correlations

             Lottery    Education    Age        Children    Income
Lottery       1.0000     -0.6202      0.1767    -0.0230     -0.5891
Education    -0.6202      1.0000     -0.1782     0.1073      0.7339
Age           0.1767     -0.1782      1.0000     0.1072     -0.0418
Children     -0.0230      0.1073      0.1072     1.0000      0.0801
Income       -0.5891      0.7339     -0.0418     0.0801      1.0000

(b) (3) Based on the multiple linear regression output above, what would you predict the lottery spending of a 30 year old woman with 12 years of education, two children and an income of $30,000 to be?
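A prediction from a fitted multiple regression is the intercept plus each coefficient times the matching explanatory value. The sketch below plugs the values from part (b) into the coefficients printed in the Parameter Estimates table. Note the units: Income is recorded in thousands of dollars, so $30,000 enters as 30, and the response is lottery spending as a percentage of household income. The dictionary and function names are illustrative.

```python
# Coefficients copied from the Parameter Estimates table for Response Lottery.
COEFFICIENTS = {
    "Intercept": 11.906094,
    "Education": -0.430018,
    "Age": 0.0291899,
    "Children": 0.0934351,
    "Income": -0.074471,
}


def predict_lottery(education, age, children, income_thousands):
    """Predicted lottery spending as a percentage of household income."""
    return (
        COEFFICIENTS["Intercept"]
        + COEFFICIENTS["Education"] * education
        + COEFFICIENTS["Age"] * age
        + COEFFICIENTS["Children"] * children
        + COEFFICIENTS["Income"] * income_thousands
    )


if __name__ == "__main__":
    pred = predict_lottery(education=12, age=30, children=2, income_thousands=30)
    print(f"predicted Lottery: {pred:.2f}% of household income")
```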

(c) (4) A goal of the study was to test the following theories:

(i) Relatively uneducated people spend different amounts on lottery tickets than do relatively educated people, all other things being equal

(ii) Older people spend different amounts on lottery tickets than younger people, all other things being equal

Assuming that there are no lurking variables, translate these theories into appropriate null and alternative hypotheses about the multiple linear regression model’s parameters. Test each of these theories at the 0.01 significance level and state your conclusions.

IX. (18) An observational study was conducted to see how lung function related to age and occupation. The study involved 45 male subjects from four occupations (physician, chemical worker, fireman and farm worker). The following abbreviations for the variables were used:

AIRCAP = air capacity (cubic centimeters) that the subject can expire in one second

AGE = age (years)

CHEMW = dummy variable to measure whether subject is a chemical worker (1 if he is, 0 if not)

FIREW = dummy variable to measure whether subject is a fireman (1 if he is, 0 if not)

FARMW = dummy variable to measure whether subject is a farm worker (1 if he is, 0 if not)

The regression output from a multiple regression of AIRCAP on AGE, CHEMW, FIREW and FARMW and from a simple linear regression analysis of AIRCAP on AGE are shown below. Answer the questions following the output.

Response aircap
Whole Model
Actual by Predicted Plot

[Plot: aircap Actual (vertical axis, 0 to 3500) against aircap Predicted (horizontal axis, 0 to 3500). P<.0001 RSq=0.71 RMSE=365.25]

Summary of Fit
RSquare                      0.707756
RSquare Adj                  0.678531
Root Mean Square Error       365.245
Mean of Response             1726.022
Observations (or Sum Wgts)   45

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio
Model       4   12923069         3230767       24.2179
Error      40    5336157          133404       Prob > F
C. Total   44   18259225                       <.0001

Parameter Estimates
Term        Estimate     Std Error   t Ratio   Prob>|t|
Intercept   3555.966     211.4725    16.82     <.0001
age         -39.90086    4.198496    -9.50     <.0001
chemw       -340.6008    152.5197    -2.23     0.0312
firew       -38.69081    155.9672    -0.25     0.8053
farmw       -220.1845    155.7976    -1.41     0.1653

Effect Tests
Source   Nparm   DF   Sum of Squares   F Ratio   Prob > F
age      1       1    12048850         90.3186   <.0001
chemw    1       1      665286          4.9870   0.0312
firew    1       1        8210          0.0615   0.8053
farmw    1       1      266453          1.9973   0.1653

Bivariate Fit of aircap By age

[Plot: aircap (vertical axis, 0 to 3500) against age (horizontal axis, 20 to 60), with the linear fit line]

Linear Fit
aircap = 3399.5034 - 39.844785 age

Summary of Fit
RSquare                      0.659937
RSquare Adj                  0.652028
Root Mean Square Error       380.0027
Mean of Response             1726.022
Observations (or Sum Wgts)   45

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio
Model       1   12049936         12049936      83.4471
Error      43    6209289         144402.07     Prob > F
C. Total   44   18259225                       <.0001

Parameter Estimates
Term        Estimate     Std Error   t Ratio   Prob>|t|
Intercept   3399.5034    191.7539    17.73     <.0001
age         -39.84479    4.361801    -9.13     <.0001

(a) (4) Based on the multiple regression of AIRCAP on AGE, CHEMW, FIREW and FARMW, (i) what is the estimated intercept and estimated slope for the regression line of AIRCAP on AGE for physicians; (ii) what is the estimated intercept and estimated slope for the regression line of AIRCAP on AGE for chemical workers?
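Because the model has three dummy variables for four occupations and no interaction with age, physicians are the baseline group (all dummies equal 0), each dummy coefficient shifts the intercept, and the age slope is shared by all occupations. The Python sketch below shows how the occupation-specific lines follow from the Parameter Estimates table; the dictionary names are illustrative.

```python
# Estimates copied from the multiple regression Parameter Estimates table.
intercept = 3555.966
age_slope = -39.90086

# Intercept shift for each occupation; physicians have all dummies equal to 0.
shifts = {
    "physician": 0.0,
    "chemical worker": -340.6008,  # chemw
    "fireman": -38.69081,          # firew
    "farm worker": -220.1845,      # farmw
}

# (intercept, slope) of the AIRCAP-on-AGE line for each occupation.
lines = {occ: (intercept + shift, age_slope) for occ, shift in shifts.items()}

for occ, (b0, b1) in lines.items():
    print(f"{occ}: aircap = {b0:.4f} + ({b1:.5f}) * age")
```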

(b) (3) Based on the multiple regression of AIRCAP on AGE, CHEMW, FIREW and FARMW, what is the p-value for the test of the hypothesis that physicians and fire workers of the same age have the same mean air capacity?

(c) (3) Based on the multiple regression of AIRCAP on AGE, CHEMW, FIREW and FARMW, what is a 95% confidence interval for the difference between the mean air capacity of physicians and farm workers of the same age?
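A confidence interval for a single regression coefficient has the form estimate plus or minus a t critical value times the standard error, with degrees of freedom equal to the error DF (40 here). The sketch below applies that formula to the farmw coefficient, which estimates the mean for farm workers minus the mean for physicians at a fixed age. The critical value 2.021 is the usual tabled value for 40 degrees of freedom and is hard-coded as an assumption; it should be checked against the statistical tables supplied with the exam.

```python
# farmw row of the multiple regression Parameter Estimates table.
estimate = -220.1845
std_error = 155.7976

# ASSUMPTION: two-sided 95% t critical value with 40 error degrees of freedom,
# taken from a standard t table.
T_CRIT_40_DF = 2.021

half_width = T_CRIT_40_DF * std_error
lower = estimate - half_width
upper = estimate + half_width

print(f"95% CI for (farm worker - physician): ({lower:.2f}, {upper:.2f})")
```

Reversing the sign of both endpoints gives the interval for physicians minus farm workers.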

(d) (4) Is there strong evidence that physicians, chemical workers, fire workers and farm workers of the same age do not all have the same mean air capacity? Justify your statement with a hypothesis test.
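One way to test whether the three occupation dummies are jointly zero is a partial F test that compares the error sum of squares of the full model (age plus dummies) with that of the reduced model (age only), both of which appear in the Analysis of Variance tables. The Python sketch below computes that statistic. The 0.05 level and the critical value 2.84 for (3, 40) degrees of freedom are assumptions taken from a standard F table, since the question does not name a level.

```python
# Error sums of squares and degrees of freedom from the two ANOVA tables.
sse_reduced, df_reduced = 6209289, 43  # AIRCAP on AGE only
sse_full, df_full = 5336157, 40        # AIRCAP on AGE, CHEMW, FIREW, FARMW

# Number of coefficients set to zero under H0 (the three dummies).
num_restrictions = df_reduced - df_full

# Partial F statistic: extra explained variation per restriction,
# divided by the full model's mean squared error.
f_stat = ((sse_reduced - sse_full) / num_restrictions) / (sse_full / df_full)

# ASSUMPTION: 5% critical value of F with (3, 40) degrees of freedom,
# read from a standard F table.
F_CRIT_3_40 = 2.84
reject = f_stat > F_CRIT_3_40

print(f"F = {f_stat:.4f} on ({num_restrictions}, {df_full}) df; reject H0: {reject}")
```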

(e) (4) Occupational epidemiologists would like to use this multiple regression analysis to determine whether working in certain occupations causes people to have worse lung functioning. A variable which measures amount smoked per day was not included in the multiple regression analysis. How does the fact that the amount smoked per day is not included in the multiple regression analysis above affect the epidemiologists’ ability to attain their goal of determining whether working in certain occupations causes people to have worse lung functioning?
