c.ymcdn.com  · Web viewNow, copy and paste the questions below into a Word document and answer the questions. 1. (a) ... Add an eighth point to the scatter diagram and move it around

Activities in Statistics

Michael Sullivan

I. Opening Day of Class Survey

Students submit questions to be included in a StatCrunch survey (a great activity for teaching students to write survey questions).

II. Ebola

In October 2014, there was an Ebola outbreak in West Africa. At the time, there was no vaccine for the virus; however, there were some experimental drugs that had not yet been approved for use in humans. Because the spread of the disease was reaching epidemic proportions, there were calls to initiate randomized trials of an experimental drug on human subjects right away. (a) Discuss how you would go about designing a randomized trial to assess the efficacy of an experimental Ebola vaccine. (b) Doctors Without Borders was on the record, prior to any randomized trial, as saying that trials in which subjects are assigned to a control group are unethical. Discuss the ethics behind a randomized trial of a potentially life-saving vaccine to test its efficacy while an epidemic is raging.

III. Exploring the Properties of the Linear Correlation Coefficient

 In this activity, you will use the correlation by eye applet to create scatter diagrams and determine the linear correlation coefficient. You will have the opportunity to observe important properties and limitations of the linear correlation coefficient.

Open the Correlation by Eye  applet by right-clicking the link and selecting "open link in new tab".     Now, copy and paste the questions below into a Word document and answer the questions.  

 1. (a) If data currently exists in the scatter diagram, click the Trash can to delete the data.  Create a scatter diagram of 12 to 15 observations with positive association.  Click Show to show the correlation coefficient of the data in the scatter diagram.  Copy the applet and paste it into your document. (b) Move some of the observations from the scatter diagram and note how the correlation coefficient changes as the positive association strengthens and weakens. (c) Align the points in the scatter diagram in a straight line with positive slope.   What is the value of the linear correlation coefficient?  

2. (a) Click the Trash can to delete the data.  Create a scatter diagram of 12 to 15 observations with negative association.  Click Show to show the correlation coefficient of the data in the scatter diagram. (b) Move some of the observations from the scatter diagram and note how the correlation coefficient changes as the negative association strengthens and weakens. (c) Align the points in the scatter diagram in a straight line with negative slope.  What is the value of the linear correlation coefficient?

3. (a) Click the Trash can to delete the data.  Draw a scatter diagram with no association.  Copy the applet and paste it into your document. What is the value of the correlation coefficient? (b) Click the Trash can to delete the data.  Draw a scatter diagram in an upside-down U-shaped pattern.  Copy the applet and paste it into your document.  What is the value of the correlation coefficient? (c) What does a correlation coefficient of 0 suggest?

4. (a) Click the Trash can to delete the data.  In the lower-left corner of the applet, draw a scatter diagram of 8 to 10 observations with a correlation coefficient around 0.8. (b) Add another point in the upper-right corner of the applet that roughly lines up with the other points in the scatter diagram.  What is the value of the correlation coefficient? (c) Move the additional point around the scatter diagram and note how the correlation coefficient changes.  Is the correlation coefficient a resistant measure?  Why?

5. (a) Click the Trash can to delete the data.  Draw a scatter diagram with six points arranged vertically in a straight line.  What is the value of the correlation coefficient?  Why? (b) Add a seventh point to the right side of the scatter diagram and move the point around until the correlation coefficient is approximately 0.75.  Copy the applet and paste it into your document. (c) Click the Trash can to delete the data.  Draw a scatter diagram with approximately seven points in a U-shaped pattern near the lower-left corner of the applet.  Add an eighth point to the scatter diagram and move it around until the correlation coefficient is approximately 0.75.  Copy the applet and paste it into your document. (d) Conclude that the correlation coefficient should not be used exclusively to judge linear association; a scatter diagram should be used as well.
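The properties explored with the applet can also be checked numerically. The following is a minimal sketch, not part of the original activity: the function name `pearson_r` and all data values are made up for illustration. It computes the linear correlation coefficient from its definition and shows that a U-shaped pattern, a single far-away point, and a vertical line of points each behave differently.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient r; returns nan when x or y has no spread."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan  # r is undefined for a vertical or horizontal line of points
    return sxy / math.sqrt(sxx * syy)

# Hypothetical U-shaped data: a strong relation, but not a linear one.
u_x = [-3, -2, -1, 0, 1, 2, 3]
u_y = [x ** 2 for x in u_x]
r_u = pearson_r(u_x, u_y)

# One added point far from the rest changes r sharply.
r_u_plus_one = pearson_r(u_x + [20], u_y + [20])

# Six points stacked vertically: x has no spread, so r cannot be computed.
r_vertical = pearson_r([5] * 6, [1, 2, 3, 4, 5, 6])

print(r_u, r_u_plus_one, r_vertical)
```

Because a single added point can move r from near 0 to a large positive value, the number by itself says little about whether the pattern is linear; the scatter diagram is still needed.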

IV. Influential Observations

This activity will allow you to discover the circumstances under which an observation may be influential. Use the Influential Observation Applet

http://www.statcrunch.com/5.0/viewresult.php?resid=1461163

to answer questions 1 through 5.

1. (a) Click the “Regression” box to show the least-squares regression line for the data. (b) Grab the endpoints of the green “User” line so it lies on top of the red least-squares regression line.

2. Click “Add point”. Add a point at (9, 13). So, the input x value is 9 and the input y value is 13. Click OK. The red line represents the least-squares regression line with the new point included, while the green line represents the least-squares regression line with the new point excluded. Do you believe the slope and/or y-intercept changed significantly? Look at the values of the intercept and slope in the table below the graph.

3. (a) Reset the applet. Grab the endpoints of the green “User” line so it lies on top of the red least-squares regression line. (b) Add a point at (14, 15). The red line represents the least-squares regression line with the new point included, while the green line represents the least-squares regression line with the new point excluded. Do you believe the slope and/or y-intercept changed significantly? Look at the values of the intercept and slope in the table below the graph.

4. (a) Reset the applet. Grab the endpoints of the green “User” line so it lies on top of the red least-squares regression line. (b) Add a point at (19, 14). The red line represents the least-squares regression line with the new point included, while the green line represents the least-squares regression line with the new point excluded. Do you believe the slope and/or y-intercept changed significantly? Look at the values of the intercept and slope in the table below the graph.

5. (a) Reset the applet. Grab the endpoints of the green “User” line so it lies on top of the red least-squares regression line. (b) Add a point at (20, 8). The red line represents the least-squares regression line with the new point included, while the green line represents the least-squares regression line with the new point excluded. Do you believe the slope and/or y-intercept changed significantly? Look at the values of the intercept and slope in the table below the graph.
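The comparison the applet makes (regression line with and without one added point) can be sketched in a few lines. This is an illustration under stated assumptions: the base data below are hypothetical and are not the applet's data, and the helper names `least_squares` and `slope_change` are invented here. Only the four added points come from the questions above.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical base data (NOT the applet's data): points on y = x + 2.
base_x = [2, 4, 6, 8, 10, 12]
base_y = [4, 6, 8, 10, 12, 14]
base_slope, base_intercept = least_squares(base_x, base_y)

def slope_change(new_x, new_y):
    """Change in slope after adding one point to the base data."""
    slope, _ = least_squares(base_x + [new_x], base_y + [new_y])
    return slope - base_slope

# The four points added in questions 2 through 5.
for point in [(9, 13), (14, 15), (19, 14), (20, 8)]:
    print(point, round(slope_change(*point), 3))
```

With these made-up base data, the point that is far from the others in the x direction and far from the existing trend moves the slope the most; the applet's own data may give different magnitudes.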

V. What is Level of Confidence Anyway?

In this activity, you will use the Confidence Interval applet in StatCrunch to generate confidence intervals for a population proportion.  You will use the results to understand what is meant by the level of confidence and the reason why a particular interval may not capture the population proportion.  Copy and paste the following questions into a Word document and submit via the document sharing tool.

In StatCrunch, open the Confidence Intervals for a Proportion applet (open the StatCrunch spreadsheet, then select Applets > Confidence intervals > for a proportion).

1. Exploring the role of level of confidence:

a. Construct 1000 confidence intervals with p = 0.3, level of confidence = 0.95, and sample size = 100.  What proportion of the 95% confidence intervals include the population proportion, 0.3? Include a screen shot of the applet (Copy and Paste).

Construct another 1000 confidence intervals with p = 0.3, level of confidence = 0.95, and sample size = 100.   What proportion of the 95% confidence intervals include the population proportion, 0.3?

Did the same proportion of the intervals include the population proportion each time?  What proportion of the intervals would you expect to include the population proportion? 

b. Construct 1000 confidence intervals with p = 0.3, level of confidence = 0.99, and sample size = 100.  What proportion of the 99% confidence intervals include the population proportion, 0.3? Include a screen shot of the applet (Copy and Paste).

Construct another 1000 confidence intervals with p = 0.3, level of confidence = 0.99, and sample size = 100.   What proportion of the 99% confidence intervals include the population proportion, 0.3?

Did the same proportion of the intervals include the population proportion each time?  What proportion of the intervals would you expect to include the population proportion?  

c. Explain what 95% means in a 95% confidence interval.  Explain what 99% means in a 99% confidence interval. 

2.  a. Construct 100 confidence intervals with p = 0.3, level of confidence = 0.95, and sample size = 100.  Select one of the intervals that does not include the population proportion that is below the population proportion (red line) by double-clicking on the red line.   In the popup window, notice the value of the sample proportion.  Determine the number of standard errors the sample proportion is from the population proportion, 0.3.   Remember, the standard error is $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$.

b. From the same 100 confidence intervals constructed in part a, select one of the intervals that does not include the population proportion that is above the population proportion (red line) by double-clicking on the red line.   In the popup window, notice the value of the sample proportion.  Determine the number of standard errors the sample proportion is from the population proportion, 0.3.  

c. Notice that each of the sample proportions from parts a and b is more than 1.96 standard errors from the population proportion.  Conclude that any sample proportion that is more than 1.96 standard errors from the population proportion will result in an interval that does not include the population proportion.
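The repeated-sampling idea behind the applet can be sketched as a simulation. This is a rough illustration, not the applet's actual procedure: it assumes the usual large-sample interval, sample proportion plus or minus a critical value times the estimated standard error, and the function name `coverage` and the seed are arbitrary choices made here.

```python
import math
import random
from statistics import NormalDist

def coverage(p, n, level, reps, seed):
    """Fraction of simulated intervals p_hat +/- z*SE that contain p."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and count the successes.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        se_hat = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
        if p_hat - z_crit * se_hat <= p <= p_hat + z_crit * se_hat:
            hits += 1
    return hits / reps

# Same seed for both calls, so both levels are judged on the same 1000 samples.
cov95 = coverage(p=0.3, n=100, level=0.95, reps=1000, seed=1)
cov99 = coverage(p=0.3, n=100, level=0.99, reps=1000, seed=1)
print(cov95, cov99)
```

Changing the seed changes the observed proportions slightly from run to run, which mirrors what happens when the applet is reset and run again.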

VI. Activity: z versus t distribution

In this activity, we explore the distributions of $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ and $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$.

(a) Simulate obtaining 1000 simple random samples of size n = 9 from a normal population whose mean is 100 and standard deviation is 15.

(b) Determine the mean and standard deviation of each of the 1000 samples.

(c) Compute $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ and $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ with $\mu = 100$, $\sigma = 15$, and $n = 9$, where $\bar{x}$ is the sample mean of each sample and s is the sample standard deviation of each sample.

(d) Draw a histogram of z and a histogram of t.

(e) Describe each histogram from part (d). What is the shape, center, and spread of each distribution?

(f) Based on your answer to part (e), describe the effect of not knowing the population standard deviation on the distribution of t versus the distribution of z.
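Parts (a) through (c) can be carried out in any software. The following is one possible sketch using only the Python standard library; the seed and variable names are arbitrary, and it assumes z uses the known population standard deviation while t uses each sample's standard deviation. It stops short of drawing the histograms and only compares the spread of the two sets of values.

```python
import math
import random
import statistics

MU, SIGMA, N, REPS = 100, 15, 9, 1000
rng = random.Random(2024)  # arbitrary seed so the run is repeatable

z_values, t_values = [], []
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    z_values.append((x_bar - MU) / (SIGMA / math.sqrt(N)))
    t_values.append((x_bar - MU) / (s / math.sqrt(N)))

# Compare the spread of the two simulated distributions.
sd_z = statistics.stdev(z_values)
sd_t = statistics.stdev(t_values)
print(round(sd_z, 2), round(sd_t, 2))
```

The lists `z_values` and `t_values` can be passed to any plotting tool to draw the histograms asked for in part (d).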

VII. Introduction to Hypothesis Testing for a Population Proportion 

In this problem, you will learn the idea behind hypothesis testing for a proportion.  Write your results directly in a Word file (or save your results as a PDF).  Copy and paste any StatCrunch commands that you use into the document.

A New Math Program   Professors Honey Kirk and Diane Lerma of Palo Alto College developed a “learning community curriculum that blended the developmental mathematics and the reading curriculum with a structured emphasis on study skills.”   In a typical developmental mathematics course at Palo Alto College, 50% of the students complete the course with a letter grade of A, B, or C.  In the experimental course, of the 16 students enrolled, 11 completed the course with a letter grade of A, B, or C.

Source: Kirk, Honey & Lerma, Diane, “Reading Your Way to Success in Mathematics: A Paired Course of Developmental Mathematics and Reading,” MathAMATYC Educator, Vol. 1, No. 2, Feb. 2010.

(a) What proportion of the students enrolled in the experimental course passed with an A, B, or C?

(b) What are the null and alternative hypotheses for this study?

(c) When we test hypotheses, we assume the statement in the null hypothesis is true and look to gather evidence to support the statement in the alternative hypothesis.  Therefore, we can use a coin to simulate the results of this study.  Treat each of 16 randomly selected students as a coin.  If a flip comes up "heads", the randomly selected student passes the course; "tails" means the student fails the course. Rather than physically flipping 16 coins over and over to determine the proportion of times we observe 11 or more heads (passing students) out of 16, we can use technology.  Use the "Coin flipping" applet in StatCrunch to simulate this experiment assuming the probability a randomly selected student passes the course is 0.5.  To access the applet, log into StatCrunch and select StatCrunch > Applets > Simulation > Coin flipping.  Set "Probability of heads" to 0.5 and the number of coins to 16. Remember, each coin represents a randomly selected student who has a 0.5 probability of passing the course.   Simulate teaching this course 1000 times assuming the methodology is not effective (that is, assuming the probability a randomly selected student will pass the course is still 0.5 using this new teaching method). Use your results to gauge the likelihood of 11 or more students passing the course if the true pass rate is 0.5.  Interpret this result.  Note:  This probability is referred to as a P-value.

 

(d)   Use the binomial probability distribution to determine an exact P-value by computing P(X ≥ 11) where there are n = 16 trials of the experiment with probability of success p = 0.5.   [You should use StatCrunch to determine this probability.]

 

(e)    Now suppose that the actual study was conducted on 48 students and 33 passed the course with an A, B, or C.  This would be a study that has three times as many subjects.  What is the proportion of students who passed in this experiment?  How does the result compare with part (a)?

 

(f)    Use the "Coin flipping" applet to simulate 1000 repetitions of the experiment with 48 students assuming the proportion of students who pass is 0.5.  Determine the proportion of repetitions that result in 33 or more passing.  That is, determine the approximate P-value.

 

(g)    Verify that the normal model can be used to determine the P-value. Then find the P-value.  Compare the result with the approximate P-value from part (f).

 

(h)   Write a paragraph explaining the role that sample size plays in inferential statistics.
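The exact and normal-model P-values described in the parts above can be cross-checked outside StatCrunch. What follows is a hedged sketch using only the Python standard library; the helper name `binom_upper_tail` is invented here, and the normal calculation uses no continuity correction, which is an assumption about the intended method.

```python
import math
from statistics import NormalDist

def binom_upper_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_value_16 = binom_upper_tail(16, 0.5, 11)   # 11 or more of 16 pass
p_value_48 = binom_upper_tail(48, 0.5, 33)   # 33 or more of 48 pass

# Normal model for the sample proportion under H0: p = 0.5 with n = 48.
p0, n, x = 0.5, 48, 33
z = (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value_normal = 1 - NormalDist().cdf(z)

print(round(p_value_16, 4))  # 0.1051
print(p_value_48, p_value_normal)
```

The sample proportion is the same in both studies (11/16 = 33/48), yet the P-value is much smaller with 48 students, which is the point of comparing the two sample sizes.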

VIII. Activity on Choosing the Correct Method

In Problems 10–16, for each study, explain which statistical procedure (estimating a single proportion; estimating a single mean; hypothesis test for a single proportion; hypothesis test for a single mean; hypothesis test or estimation of two proportions, dependent or independent; hypothesis test or estimation of two means, dependent or independent) would most likely be used for the research objective given. Assume all model requirements for conducting the appropriate procedure have been satisfied.

10. Is the mean IQ of the students in Professor Dang’s statistics class higher than that of the general population, 100?

11. Do adult males who take a single aspirin daily experience a lower rate of heart attacks than adult males who do not take aspirin daily?

12. Does turmeric (an antioxidant that can be added to foods) help with depression? Researchers randomly assigned 200 adult women who were clinically depressed to two groups. Group 1 had turmeric added to their regular diet for one week; Group 2 had no additives in their diet. At the end of one week, the change in their scores on the Beck Depression Inventory was compared.

13. While exercising by climbing stairs, is it better to take one stair, or two stairs, at a time? Researchers identified 30 volunteers who were asked to climb stairs for two different 15-minute intervals taking both one stair and two stairs at a time. Whether the volunteer did one stair or two stairs first was determined randomly. The goal of the research was to determine if energy expenditure for each exercise routine was different.

14. What is the typical amount of time 20- to 24-year-old males spend brushing their teeth (each time they brush)?

15. What proportion of registered voters is in favor of a tax increase to reduce the federal debt?

16. Do adult women who maintain a regular sleep schedule have a lower adiposity (a measure of body fat) than women whose sleep schedule fluctuates by 90 minutes or more?

IX. Designing Studies

Design a Study Traditional underwriting criteria used to determine the risks associated with lending include credit scores, income, and employment history. The online lender ZestFinance used data analysis to find that people who fill out loan applications using all capital letters default more often than those who use all lowercase letters. In addition, people who fill out the application using uppercase and lowercase letters correctly default at the lowest rate. Explain how to obtain and analyze data in order to determine whether the method used to fill out loan applications results in different default rates.

Insurance and Credit Scores A study by InsuranceQuotes.com found that homeowners with poor credit pay 91 percent more for home insurance than people with excellent credit.

(a) A quote in the article stated, "Insurers have found a direct correlation between a consumer's credit and the likelihood that he or she will make a home (or auto) claim." Explain what is wrong with this quote. (b) Suppose credit scores may be classified as Excellent, Good, Fair, and Poor. Explain how you might go about deciding whether credit scores might be used to determine whether an individual files a claim on his/her homeowner's insurance policy or not. Include an explanation of the type of inferential procedure you would use.

Design a Study David Elkington, the Chief Executive Officer (CEO) of InsideSales.com, wondered whether the cycles of the moon affect sales. He got the idea from his father-in-law, who said that emergency rooms are crowded when the moon is full. There are eight phases of the moon: New Moon, Waning Crescent, Third Quarter, Waning Gibbous, Full, Waxing Gibbous, First Quarter, and Waxing Crescent. Explain how to obtain and analyze data in order to determine whether moon phases affect sales.

Designing a Study Stock fund managers are investment professionals who decide which stocks should be part of a portfolio. In a recent article in the Wall Street Journal (Not a Stock-Picker’s Market, WSJ January 25, 2014), the performance of stock fund managers was considered based on dispersion in the market. In the stock market, risk is measured by the standard deviation of the rate of return of stocks (dispersion). When dispersion is low, the rates of return of the stocks that make up the market are not as spread out. That is, the return on Company X is close to that of Y, which is close to that of Z, and so on. When dispersion is high, the rates of return of stocks are more spread out, meaning some stocks outperform others by a substantial amount. Since 1991, the dispersion of stocks has been about 7.1%. In some years, the dispersion is higher (such as 2001 when dispersion was 10%), and in some years it is lower (such as 2013 when dispersion was 5%). So, in 2001, stock fund managers would argue, one needed to have more investment advice in order to identify the stock market winners, whereas in 2013, since dispersion was low, virtually all stocks ended up with returns near the mean, so investment advice was not as valuable. (a) Suppose you want to design a study to determine whether the proportion of fund managers who outperform the market in low-dispersion years is less than the proportion of fund managers who outperform the market in high-dispersion years. What would the response variable be in this study? What is the explanatory variable in this study? (b) What or who are the individuals in this study? (c) To what population does this study apply? (d) What would be the null and alternative hypotheses? (e) Suppose this study was conducted and the data yielded a P-value of 0.083. Explain what this result suggests.

MythBusters In a MythBusters episode, the question was asked, "Which is better? A four-way stop or a roundabout?" Better was determined based on comparing the number of vehicles that travel through the four-way stop to the number of vehicles that enter/exit the roundabout over a 5-minute interval of time. Four-way stops are popular in the United States, while roundabouts are popular in Europe. Suppose the folks at MythBusters conducted this experiment 15 times for each intersection design. (a) Explain why the variable of interest is quantitative. (b) Explain why the data might be analyzed by comparing two independent means. Include in this explanation what the null and alternative hypotheses would be and what each mean represents. (c) A potential improvement on the experimental design might be to identify 15 groups of different drivers and ask each group to drive through each intersection design for the 5-minute time interval. Explain why this is a better design and explain the role randomization would play. What would be the null and alternative hypotheses here and how would the mean be computed?

Semester-Long Research Project

The goal of this assignment is to expose you to the entire statistical process outlined in Chapter 1. This project is to be completed in stages with each stage having a deadline.

Stage 1: Identify a research objective.  Develop a survey that addresses the question(s) posed in the research objective.  Administer the survey in StatCrunch.

Stage 2: Describe the data through both graphical and numerical summaries.

Stage 3: Conduct the necessary inference to answer your research question(s).

Detailed Description of Stage 1 (15 points)

For this assignment, you are to develop a survey that addresses the question(s) in your research objective.  For interesting questions that have been asked in the past, consider visiting polling organizations such as Gallup, the Harris Poll, or Pew Research.  Ask a variety of demographic questions to start off (gender, level of education, level of income, age, political affiliation/philosophy and so on).  Then ask questions that will allow you to write a report complete with statistical analysis that would answer your research question(s).  You might consider discussing the goals of your research with me prior to developing the survey to be sure it is viable.  You must submit the survey to me prior to posting it in StatCrunch so we are sure the questions are structured appropriately.

          Once your survey is approved, it should be posted in StatCrunch and administered to at least 75 individuals.  Watch the video to see how to create a survey in StatCrunch. 

          It would be a great idea to consider this an article that you are writing for a periodical (such as a newspaper, magazine, or blog).   I don’t want to give too much guidance regarding the type of research you should conduct, but it could be based on topics such as

·          Current political events (such as sequester, taxation, government spending/government debt or deficit)

·          Current pop culture events

·          Medical research

·          Social science/psychology/education

·          Social/moral issues (abortion, birth control, gay marriage, immigration)

Again, if you are struggling with a topic, I suggest you visit the above-referenced polling firms for ideas.  Also, look through newspapers or magazines for current events that might warrant some interesting investigation.   Try to ask questions that would result in both quantitative and qualitative responses.  You may also look at the Sullivan Statistics Survey questions to get some idea as to the type of questions you may ask.   Be creative and be sure to do something that interests you.  Odds are, you may be able to use this analysis and paper in another course sometime down the line.

Detailed Description of Stage 2 (30 points)

In this stage, you need to summarize (describe) the data obtained from your survey.  Essentially, you need to determine the type of descriptive statistics (pie charts, histograms, mean, standard deviation, and so on) that apply to the data.  These are the topics covered in Chapters 2 through 4. It is important to describe data by some of the demographic data collected.  For example, suppose you ask what the highest rate of taxation paid to the federal government should be.  You might consider a boxplot by gender, or by political affiliation.

Detailed Description of Stage 3 (30 points)

This is the inferential portion of the project.  Essentially, you need to determine the type of inferential statistics (estimation or hypothesis testing) that apply to the data.  This is where you use your data to answer the research question(s). For example, if you wanted to know whether the proportion of males who favor tax increases on everyone to pay down the national debt differs from the proportion of females, you would test

$H_0: p_m = p_f$ versus $H_1: p_m \neq p_f$
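For a test of two proportions like this one, the calculation might look like the following sketch. It assumes the pooled two-proportion z-test with a two-sided alternative; the function name `two_proportion_z_test` and the survey counts are made up purely for illustration and do not come from any real survey.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z statistic and two-sided P-value for H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts (made up): 45 of 75 males and 30 of 75 females in favor.
z_stat, p_value = two_proportion_z_test(45, 75, 30, 75)
print(round(z_stat, 2), round(p_value, 4))
```

In the project itself, the counts would come from the StatCrunch survey responses, and the model requirements for the test should be verified before relying on the P-value.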

Remember, each stage will have its own due date.  The project must be submitted in Word using the document sharing tool in MyStatLab (do not share your project with others) so that I can easily make comments to the project.