Chapters 20 and 21 Combined: Testing Hypotheses About Proportions (2013)

1. Hypotheses
In statistics, a hypothesis proposes a model for the world, and then we look at the data. If the data are consistent with that model, we have no reason to disbelieve the hypothesis. Data consistent with the model lend support to the hypothesis, but do not prove it. But if the facts are inconsistent with the model, we need to decide whether they are inconsistent enough to disbelieve the model. If they are inconsistent enough, we can reject the model.

2. Hypothesis Testing
Think about the logic of jury trials: to prove someone is guilty, we start by assuming they are innocent. We retain that hypothesis until the facts make it unlikely beyond a reasonable doubt. Then, and only then, we reject the hypothesis of innocence and declare the person guilty.

3. Hypotheses (cont.)
The statistical twist is that we can quantify our level of doubt. We can use the model proposed by our hypothesis to calculate the probability that the event we've witnessed could happen. That's just the probability we're looking for: it quantifies exactly how surprised we are to see our results. This probability is called a P-value.

4. Our Problem
Suppose we tossed a coin 100 times and obtained 38 heads and 62 tails. Is the coin biased toward tails? There is no way to say yes or no with 100% certainty. But we can evaluate the strength of the evidence for the hypothesis that the coin is biased.

5. Hypotheses (cont.)
Null hypothesis, H0: an established fact, no change in the parameter, a statement that we expect the data to contradict (the status quo).
Alternative hypothesis, HA: a new conjecture, a change in the parameter, your claim, a statement that needs strong support from the data before we can claim it.
Our problem: testing a hypothesis about p = proportion of times the coin turns up tails (in the long run).
H0: coin is fair, p = 0.5 (or p ≤ 0.5)
HA: coin is biased toward tails, p > 0.5

6. Ex: A statistics professor wants to see if more than 80% of her students enjoyed taking her class. At the end of the term, she takes a random sample of students from her large class and asks, in an anonymous survey, if the students enjoyed taking her class. Which set of hypotheses should she test?
A. H0: p < 0.80  HA: p > 0.80
B. H0: p = 0.80  HA: p > 0.80
C. H0: p > 0.80  HA: p = 0.80
D. H0: p = 0.80  HA: p < 0.80

7. Ex: An online catalog company wants on-time delivery for 90% of the orders they ship.
They have been shipping orders via UPS and FedEx but will switch to a new, cheaper delivery service (ShipFast) unless there is evidence that this service cannot meet the 90% on-time goal. As a test, the company sends a random sample of orders via ShipFast and then makes follow-up phone calls to see if these orders arrived on time. Which hypotheses should they test?
A. H0: p < 0.90  HA: p > 0.90
B. H0: p = 0.90  HA: p > 0.90
C. H0: p > 0.90  HA: p = 0.90
D. H0: p = 0.90  HA: p < 0.90

8. Hypotheses (cont.)
When the data are consistent with the model from the null hypothesis, the P-value is high and we are unable to reject the null hypothesis. In that case, we have to retain the null hypothesis we started with. We can't claim to have proved it; instead, we fail to reject the null hypothesis when the data are consistent with the null hypothesis model and in line with what we would expect from natural sampling variability. If the P-value is low enough, we'll reject the null hypothesis, since what we observed would be very unlikely were the null model true. Assume that the null hypothesis H0 is true and uphold it unless the data speak strongly against it.

9. Testing Hypotheses
The null hypothesis, which we denote H0, specifies a population model parameter of interest and proposes a value for that parameter. We want to compare our data to what we would expect given that H0 is true. We can do this by finding out how many standard deviations away from the proposed value we are. We then ask how likely it is to get results like ours if the null hypothesis were true.

10. The Reasoning of Hypothesis Testing
Step 1: Hypotheses
The null hypothesis: To perform a hypothesis test, we must first translate our question of interest into a statement about model parameters. In general, we have H0: parameter = hypothesized value.
The alternative hypothesis: The alternative hypothesis, HA, contains the values of the parameter we accept if we reject the null.

11. The Reasoning of Hypothesis Testing (cont.)
Step 2: Model
The test about proportions is called a one-proportion z-test.

12. One-Proportion z-Test
The conditions for the one-proportion z-test are the same as for the one-proportion z-interval. We test the hypothesis H0: p = p0 using the statistic
$z = \frac{\hat{p} - p_0}{SD(\hat{p})}$, where $SD(\hat{p}) = \sqrt{\frac{p_0 q_0}{n}}$.
When the conditions are met and the null hypothesis is true, this statistic follows the standard Normal model, so we can use that model to obtain a P-value.

13. The Reasoning of Hypothesis Testing (cont.)
Step 3: Mechanics
Under mechanics we place the actual calculation of our test statistic from the data. Different tests will have different formulas and different test statistics. Usually, the mechanics are handled by a statistics program or calculator, but it's good to know the formulas.

14. The Reasoning of Hypothesis Testing (cont.)
Step 3: Mechanics
If the difference between what we have observed and what is expected under the null model (H0) is statistically significant (large enough), then we reject H0 in favor of HA.

15. Our Coin Problem
$z = \frac{\hat{p} - p_0}{SD(\hat{p})} = \frac{0.62 - 0.5}{0.05} = 2.4$, where $SD(\hat{p}) = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{(0.5)(0.5)}{100}} = 0.05$ and p0 is the H0 value of the parameter, in our case p0 = 0.5.

16. The Reasoning of Hypothesis Testing (cont.)
Step 3: Mechanics (continued)
The ultimate goal of the calculation is to obtain a P-value. The P-value is the probability that the observed statistic value (or an even more extreme value) could occur if the null model were correct. If the P-value is small enough, we'll reject the null hypothesis. Note: The P-value is a conditional probability: it's the probability that the observed results could have happened if the null hypothesis is true.

17. The Reasoning of Hypothesis Testing
P-value: the probability that the test statistic takes the observed value or a more extreme one when the null hypothesis H0 is true.
Our problem: P-value = P(z > 2.4) = 0.0082
For a fair coin, the probability of seeing 62 or more tails in 100 tosses is less than 0.01 (1%). The smaller the P-value, the stronger the evidence against H0 (that is, in favor of HA).
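The coin calculation can be checked numerically. What follows is a minimal Python sketch, not part of the original slides: it assumes the one-sided alternative HA: p > p0 used in the coin problem, and the helper names `normal_cdf` and `one_proportion_z_test` are made up for illustration.

```python
import math

def normal_cdf(x):
    """Standard Normal cumulative probability P(Z <= x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def one_proportion_z_test(successes, n, p0):
    """Return (z, upper-tail P-value) for H0: p = p0 against HA: p > p0.

    Hypothetical helper for illustration; it assumes the conditions for
    the one-proportion z-test have already been checked.
    """
    p_hat = successes / n                  # observed sample proportion
    sd = math.sqrt(p0 * (1.0 - p0) / n)    # SD(p-hat) under the null model
    z = (p_hat - p0) / sd                  # standard deviations away from p0
    p_value = 1.0 - normal_cdf(z)          # probability of a result at least this extreme
    return z, p_value

# Coin problem: 62 tails in 100 tosses, H0: p = 0.5
z, p_value = one_proportion_z_test(successes=62, n=100, p0=0.5)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")  # z = 2.40, P-value = 0.0082
```

For a two-sided alternative, the same sketch would need the tail probability doubled, using the absolute value of z.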
So we reject the null hypothesis that this is a fair coin and support the alternative that it is biased toward tails.

18. Just Checking
1. An allergy drug has been tested and found to give relief to 75% of the patients in a large clinical trial. Now the scientists want to see if the new, improved version works even better. What would the null hypothesis and alternative hypothesis be?
2. The new drug is tested and the P-value is 0.0001. What would you conclude about the new drug?

19. P-value info (Ch 21)
We can use an alpha level, or α, to set a threshold on our P-value. The alpha level is also called the significance level. If our P-value is less than our alpha level, we reject the null hypothesis. If our P-value is greater than our alpha level, we fail to reject the null hypothesis.
We can define a rare event arbitrarily by setting a threshold for our P-value. When the P-value falls below that threshold, we say that the results are statistically significant.
Alpha levels are represented using the symbol α.
Typically we use α = 0.1, 0.05, or 0.01.
When in doubt, we use α = 0.05.
The choice partially depends on the importance of the claim being made. The more important the claim or the higher the stakes, the lower the alpha level you would use.

20. Statistically Significant (Ch 21)
When we get a P-value below our alpha level (let's assume 0.05), we can say we reject the null hypothesis at the 5% level of significance. Sometimes, statistical significance doesn't mean the difference is important in the context of the situation. On the other hand, sometimes an important difference may turn out not to be statistically significant. Sometimes a larger sample size can fix this.

21. Statistically Significant (Ch 21)
It may make you uncomfortable to make a flat reject/fail-to-reject decision. If your P-value falls just slightly above your alpha level, you're not allowed to reject the null hypothesis (you fail to reject the null). Yet a P-value just barely below the alpha level leads to rejection.
When you decide to declare a verdict, it is a good idea to report the P-value as an indication of the strength of the evidence.

22. The Reasoning of Hypothesis Testing (cont.)
Step 4: Conclusion/Decision
The conclusion/decision in a hypothesis test is always a statement about the null hypothesis. The conclusion must state either
Reject H0
Fail to reject H0 (uphold H0)
And, as always, the conclusion should be stated in context.

23. The Reasoning of Hypothesis Testing (cont.)
Step 4: Conclusion
Your conclusion about the null hypothesis should never be the end of a testing procedure. Often there are actions to take or policies to change.

24. Alternative Hypotheses
There are three possible alternative hypotheses:
HA: parameter < hypothesized value
HA: parameter ≠ hypothesized value
HA: parameter > hypothesized value

25. Alternative Hypotheses (cont.)
HA: parameter ≠ value is known as a two-sided alternative because we are equally interested in deviations on either side of the null hypothesis value. For two-sided alternatives, the P-value is the probability of deviating in either direction from the null hypothesis value.

26. Alternative Hypotheses (cont.)
The other two alternative hypotheses are called one-sided alternatives. A one-sided alternative focuses on deviations from the null hypothesis value in only one direction. Thus, the P-value for one-sided alternatives is the probability of deviating only in the direction of the alternative away from the null hypothesis value.

27. Alternative Hypotheses (cont.)

28. Critical Values for Hypothesis Testing
Just as we used critical values in confidence intervals, we will use them with alpha levels. If our z-score is more extreme than the critical value, then we will have a P-value smaller than our alpha level.

29. Just Checking cont.
3. A bank is testing a new method for getting delinquent customers to pay their past-due credit card bills. The standard way was to send a letter (costing about $0.40 each) asking the customer to pay.
That worked 30% of the time. They want to test a new method that involves sending a video tape to the customer encouraging them to contact the bank and set up a payment plan. Developing and sending the video costs about $10.00 per customer. What is the parameter of interest? What are the null and alternative hypotheses?

30. Just Checking cont.
4. The bank sets up an experiment to test the effectiveness of the video tape. They mail it out to several randomly selected delinquent customers and keep track of how many actually do contact the bank to arrange payments. The bank's statistician calculates a P-value of 0.003. What does this P-value suggest about the video tape?

31.
5. Some people are concerned that new tougher standards and high-stakes tests may drive up the high school dropout rate. The National Center for Education Statistics reported that the high school dropout rate for the year 2004 was 10.3%. One school district, whose dropout rate has always been very close to the national average, reports that 210 of their 1782 students dropped out last year. Is their experience evidence that the dropout rate is increasing?

32.
6. In a study of 11,000 car crashes, it was found that 5720 of them occurred within 5 miles of home. Is this significant evidence that more than 50% of car crashes occur within 5 miles of home?

33. Confidence Intervals and Hypothesis Tests
Confidence intervals and hypothesis tests are built on the same calculations with the same assumptions and conditions. Our conclusion about the null should be consistent with whether or not the proportion in the claim falls within the confidence interval. A 95% confidence interval corresponds to a two-sided hypothesis test with α = 5%.

34. Confidence Levels and Hypothesis Testing
A confidence interval with a confidence level of C% corresponds to a two-sided hypothesis test with an α level of (100 − C)%.
A confidence interval with a confidence level of C% corresponds to a one-sided hypothesis test with an α level of ½(100 − C)%. Think about it: a one-sided test with α = 5% corresponds to a confidence interval with 5% on each side, giving a 90% confidence level.

35. Example: Is the Euro a fair coin?
Soon after the Euro was introduced as currency in Europe, it was widely reported that someone had spun a Euro 250 times and gotten heads 140 times.
a. Estimate the true proportion of heads using a 95% confidence interval. (Remember to check conditions.)
CI: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.56 \pm 1.96 \sqrt{\frac{(0.56)(0.44)}{250}} = 0.56 \pm 0.062$
CI: (0.498, 0.622)
b. Does your confidence interval provide evidence that the coin is unfair when spun? Explain.
c. What is the significance level?

36. Just Checking
7. An experiment to test the fairness of a roulette wheel gives a z-score of 0.62. What would you conclude?
8. We encountered a bank that wondered if it could get more customers to make payments on delinquent balances by sending them a DVD urging them to set up a payment plan. Well, the bank just got back the results of their test of this strategy. A 90% confidence interval for the success rate is (0.29, 0.45). Their old send-a-letter method had worked 30% of the time. Can you reject the null hypothesis that the proportion is still 30% at α = 0.05? Explain.
9. Given the confidence interval the bank found in their trial of DVDs, what would you recommend that they do? Should they scrap the DVD strategy?

37. Errors in Hypothesis Testing
Even with our careful analysis and lots of evidence, we can make an incorrect decision. There are two ways we can make mistakes with hypothesis testing:
Type I: the null hypothesis is true, but we reject it. (HOT)
Type II: the null hypothesis is false, but we fail to reject it. (HAT)
Which error is more serious depends on the situation.

38. Type I Error- HOT
In medical terms, this would be a false positive.
A healthy person is incorrectly diagnosed with a disease. In jury terms, this would mean an innocent person is convicted.

39. Type II Error- HAT
In medical terms, this would be a false negative. An infected person goes undiagnosed. In jury terms, this would mean a guilty person is not convicted.

40. Type I and II Errors

41. Just Checking continued
10. Remember our bank? It is looking for evidence that the costlier DVD strategy produces a higher success rate than the letters it has been sending. Explain what a Type I error is in this context and what the consequences would be to the bank.
11. What's a Type II error in the bank experiment context, and what would the consequences be?

42. Example: Spam Filter
12. Suppose a spam filter uses a point system to score each email based on sender, subject, and keywords. The higher the point total, the more likely that the message is spam. We can think of the filter's decision as a hypothesis test. The null hypothesis is that the email is a real message. A high point score is evidence that the message is junk, so the filter rejects the null hypothesis and classifies it as spam.
a. When the filter allows spam to slip through into your inbox, which kind of error is this?
b. Which kind of error is it when a real message gets classified as junk?
c. If the filter has a default cutoff score of 50, but you reset it to 60, is that analogous to choosing a higher or lower value of α for a hypothesis test?

43. Probability of Errors
To reject H0, the P-value must fall below α. When H0 is true, that happens exactly with probability α, so when you choose the level α, you are setting the probability of a Type I error to α. When H0 is false and we fail to reject it, we have made a Type II error. We assign the letter β to the probability of this mistake.

44. Reducing Errors
We can reduce α to lower the chance of a Type I error, but that will have the effect of raising β.
The only way to really reduce both Type I and Type II errors simultaneously is to increase our sample size, which will reduce our standard deviations.

45. What Can Go Wrong?
Don't interpret the P-value as the probability that H0 is true.
Don't believe too strongly in arbitrary alpha levels.
Don't confuse practical and statistical significance.
Don't forget that in spite of all your care, you might make a wrong decision.
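The trade-off described under Reducing Errors can be made concrete with a small calculation. The Python sketch below is an illustration only, not part of the original slides. It assumes a one-sided test of H0: p = 0.5 against HA: p > 0.5, the Normal approximation, and a hypothetical true proportion of 0.6. The function name `type_ii_error` is invented for this example.

```python
import math
from statistics import NormalDist

def type_ii_error(n, p0, p_true, alpha):
    """Approximate probability of a Type II error (beta) for the one-sided
    test H0: p = p0 vs HA: p > p0, using the Normal approximation.

    p_true is an assumed (hypothetical) true proportion greater than p0.
    """
    std_normal = NormalDist()
    z_star = std_normal.inv_cdf(1.0 - alpha)                 # critical value for the chosen alpha
    cutoff = p0 + z_star * math.sqrt(p0 * (1.0 - p0) / n)    # reject H0 when p-hat exceeds this
    sd_true = math.sqrt(p_true * (1.0 - p_true) / n)         # SD(p-hat) if p really equals p_true
    # beta = chance that p-hat stays at or below the cutoff even though H0 is false
    return std_normal.cdf((cutoff - p_true) / sd_true)

# Compare beta across sample sizes and alpha levels (p_true = 0.6 is an assumption)
for n in (50, 100, 400):
    for alpha in (0.05, 0.01):
        beta = type_ii_error(n, p0=0.5, p_true=0.6, alpha=alpha)
        print(f"n = {n:3d}, alpha = {alpha:.2f}, beta = {beta:.3f}")
```

Under these assumptions, lowering alpha at a fixed sample size raises beta, while increasing the sample size lowers beta at any fixed alpha.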
