Statistical Reasoning
Is your drinking water safe? Do most people approve of the President’s tax plan? How much is the cost of health care rising? These questions and thousands more like them can be answered only through statistical studies. Indeed, statistical information appears in the news every day, making the ability to understand and reason with statistics crucial to modern life.
Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.
—H. G. Wells
UNIT 5A Fundamentals of Statistics: We discuss how statistical studies are conducted, with emphasis on the importance of sampling.
UNIT 5B Should You Believe a Statistical Study? We develop eight useful guidelines for evaluating statistical claims.
UNIT 5C Statistical Tables and Graphs: We investigate basic tables and graphs, including frequency tables, bar graphs, pie charts, histograms, and line charts.
UNIT 5D Graphics in the Media: News media go well beyond the basics with fancy statistical graphics. We explore common types of media graphics.
UNIT 5E Correlation and Causality: One of the most important uses of statistics is to identify cause-and-effect relationships. We investigate how to interpret correlations and how to decide whether a correlation is the result of causality.
benn.8206.05.pgs 12/15/06 8:22 AM Page 321
322 CHAPTER 5 Statistical Reasoning
By the Way
You’ll sometimes hear the word data used as a singular synonym for information, but technically the word data is plural. One piece of information is called a datum, and two or more pieces are called data.
HISTORICAL NOTE
Statistics originated with the collection of census and tax data, which are affairs of state. That is why the word state is at the root of the word statistics.
TWO DEFINITIONS OF STATISTICS
• Statistics is the science of collecting, organizing, and interpreting data.
• Statistics are the data that describe or summarize something.
UNIT 5A Fundamentals of Statistics
The subject of statistics plays a major role in modern society. It’s used to determine whether a new drug is effective in treating cancer. It’s involved when agricultural inspectors check the safety of the food supply. It’s used in every opinion poll and survey. In business, it’s used for market research. Sports statistics are part of daily conversation for millions of people. Indeed, you’ll be hard-pressed to think of a topic that is not linked in some way to statistics.
But what is (or are) statistics? There are two answers, because the term statistics can be either singular or plural. When it is singular, statistics refers to the science of statistics. The science of statistics helps us collect, organize, and interpret data, which are numbers or other pieces of information about some topic. When it is plural, the word statistics refers to the data themselves, especially those that describe or summarize something. For example, if there are 30 students in your class and they range in age from 17 to 64, the numbers “30 students,” “17 years,” and “64 years” are statistics that describe your class.
How Statistics Works
Statistical studies are conducted in many different ways and for many different purposes, but they all share a few characteristics. To get the basic ideas, consider the Nielsen ratings, which are used to estimate the numbers of people watching various television shows. These ratings are used, for example, to determine the most popular television show of the week.
Suppose the Nielsen ratings tell you that Lost was last week’s most popular show, with 22 million viewers. You probably know that no one actually counted all 22 million people. But you may be surprised to learn that the Nielsen ratings are based on the television-viewing habits of people in only 5000 homes. To understand how Nielsen can draw a conclusion about millions of Americans from 5000 homes, we need to investigate the principles behind statistical research.
Nielsen’s goal is to draw conclusions about the viewing habits of all Americans. In the language of statistics, we say that Nielsen is interested in the population of all Americans. The characteristics of this population that Nielsen seeks to learn, such as the number of people watching each television show, are called population parameters. Note that, although we usually think of a population as a group of people, in statistics a population can be any kind of group: people, animals, or things. For example, in a study of college costs, the population might be all colleges and universities, and the population parameters might include prices for tuition, fees, and housing.
Nielsen seeks to learn about the population of all Americans by studying a much smaller sample of Americans in depth. More specifically, Nielsen has devices (called “people meters”) attached to televisions in 5000 homes, so the people who live in these homes make up the sample of Americans that Nielsen studies. The individual measurements that Nielsen collects from the sample, such as who is watching each show at each time, constitute the raw data. Nielsen then consolidates these raw data into a set of numbers that characterize the sample, such as the percentage of young male viewers watching Lost. These numbers are called sample statistics.
EXAMPLE 1 Population and Sample
For each of the following cases, describe the population, sample, population parameters, and sample statistics.
a. Agricultural inspectors for Jefferson County measure the levels of residue from three common pesticides on 25 ears of corn from each of the 104 corn-producing farms in the county.
b. Anthropologists determine the average brain size of early Neanderthals in Europe by studying skulls found at three sites in southern Europe.
SOLUTION
a. The inspectors seek to learn about the population of all ears of corn grown in the county. They do this by studying a sample that consists of 25 ears from each farm. The population parameters are the average levels of residue from the three pesticides on all corn grown in the county. The sample statistics describe the average levels of residue that are actually measured on the corn in the sample.
b. The anthropologists seek to learn about the population of all early Neanderthals in Europe. Specifically, they seek to determine the average brain size of all Neanderthals, which is the population parameter in this case. The sample consists of the relatively few individual Neanderthals whose skulls are found at the three sites. The sample statistic is the average brain size (skull size) of the individuals in the sample.
Now try Exercises 25–30.
The Process of a Statistical Study
Because Nielsen does not study the entire population of all Americans, it cannot actually measure any population parameters. Instead, the company tries to infer reasonable values for population parameters from the sample statistics (which it did measure).
By the Way
Arthur C. Nielsen founded his company and invented market research in 1923. He began producing ratings for radio programs in 1942 and added television ratings in the 1960s. Nielsen’s people meters, attached to all the televisions in 5000 homes, tell the company when each television is on and what show is being watched. People in the homes are supposed to push buttons that tell Nielsen who is watching each television. Nielsen can thereby determine the breakdown of viewership by age, sex, and ethnicity, as well as total viewing numbers.
DEFINITIONS
The population in a statistical study is the complete set of people or things being studied. The sample is the subset of the population from which the raw data are actually obtained.
Population parameters are specific characteristics of the population that a statistical study is designed to estimate. Sample statistics are numbers or observations that summarize the raw data.
[FIGURE 5.1 Elements of a statistical study. The diagram links POPULATION, SAMPLE, SAMPLE STATISTICS, and POPULATION PARAMETERS in a cycle: START; 1. Identify goals; 2. Draw from population; 3. Collect raw data and summarize; 4. Make inferences about population; 5. Draw conclusions.]
The process of inference is simple in principle, though it must be carried out with great care. For example, suppose Nielsen finds that 7% of the people in its sample watched Lost. If this sample accurately represents the entire population of all Americans, then Nielsen can infer that approximately 7% of all Americans watched the show. In other words, the sample statistic of 7% is used as an estimate for the population parameter. (By using statistical techniques that we’ll discuss in Unit 6D, Nielsen can also estimate the uncertainty in the inferred population parameters.)
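The idea of using a sample statistic as an estimate for a population parameter can be tried out in a small simulation. The sketch below is only illustrative: the population size, the random seed, and the simplification of sampling 5000 individual people (rather than people in 5000 homes) are all assumptions made for the example, and the 7% figure is the one used in the text.

```python
import random

# Hypothetical population: 1 means "watched the show", 0 means "did not".
# The 7% viewing rate is taken from the text; the population size is an
# arbitrary assumption chosen to keep the example fast.
rng = random.Random(0)
POPULATION_SIZE = 200_000
population = [1 if rng.random() < 0.07 else 0 for _ in range(POPULATION_SIZE)]

# Population parameter: in a real study this is unknown. We can compute it
# here only because we built the population ourselves.
population_parameter = sum(population) / len(population)

# Sample statistic: the same proportion, measured on a simple random sample.
sample = rng.sample(population, 5000)
sample_statistic = sum(sample) / len(sample)

print(f"population parameter: {population_parameter:.3f}")
print(f"sample statistic:     {sample_statistic:.3f}")
```

If the sample is representative, the two printed values should be close, which is what justifies treating the sample statistic as an estimate of the population parameter.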
Once Nielsen has estimates of the population parameters, it can draw general conclusions about what Americans were watching. The process used by Nielsen Media Research is similar to that used in many statistical studies. Figure 5.1 summarizes the general relationships among a population, a sample, the sample statistics, and the population parameters.
By the Way
Statisticians often divide their subject into two major branches. Descriptive statistics is the branch that deals with describing data in the form of tables, graphs, or sample statistics. Inferential statistics is the branch that deals with inferring (or estimating) population characteristics from sample data.
BASIC STEPS IN A STATISTICAL STUDY
1. State the goal of your study precisely. That is, determine the population you want to study and exactly what you’d like to learn about it.
2. Choose a representative sample from the population.
3. Collect raw data from the sample and summarize these data by finding sample statistics of interest.
4. Use the sample statistics to infer the population parameters.
5. Draw conclusions: Determine what you learned and whether you achieved your goal.
EXAMPLE 2 Unemployment Survey
Each month, the U.S. Labor Department surveys 60,000 households to determine characteristics of the U.S. work force. One population parameter of interest is the U.S. unemployment rate, defined as the percentage of people who are unemployed among all those who are either employed or actively seeking employment. Describe how the five basic steps of a statistical study apply to this research.
SOLUTION The steps apply as follows.
Step 1. The goal of the research is to learn about the employment (or unemployment) within the population of all Americans who are either employed or actively seeking employment.
Step 2. The Labor Department chooses a sample consisting of people employed or seeking employment in 60,000 households.
Step 3. The Labor Department asks questions of the people in the sample, and their responses constitute the raw data for the research. The Department then consolidates these data into sample statistics, such as the percentage of people in the sample who are unemployed.
Step 4. Based on the sample statistics, the Labor Department makes estimates of the corresponding population parameters, such as the unemployment rate for the entire United States.
Step 5. The Labor Department draws conclusions based on the population parameters and other information. For example, it might use the current and past unemployment rates to draw conclusions about whether jobs have been created or lost.
Now try Exercises 31–36.
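The definition of the unemployment rate in this example is easy to get wrong, because people who are neither employed nor actively seeking work are left out of the calculation entirely. The following sketch shows the arithmetic on made-up survey responses; the status labels and the counts are hypothetical and are not Labor Department data.

```python
from collections import Counter

def unemployment_rate(statuses):
    """Percentage unemployed among those employed or actively seeking work.

    People with any other status (for example, not seeking work) are
    excluded from both the numerator and the denominator.
    """
    counts = Counter(statuses)
    labor_force = counts["employed"] + counts["unemployed"]
    if labor_force == 0:
        raise ValueError("no one in the sample is employed or seeking work")
    return 100 * counts["unemployed"] / labor_force

# Made-up raw data: 90 employed, 10 actively seeking work, 40 not seeking.
responses = ["employed"] * 90 + ["unemployed"] * 10 + ["not_seeking"] * 40
rate = unemployment_rate(responses)
print(f"sample unemployment rate: {rate}%")
```

Here 10 of the 100 people in the labor force are unemployed, so the sample statistic is 10%, even though 50 of the 140 respondents are not working.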
Choosing a Sample
Choosing a sample may be the most important step in any statistical study. If the sample fairly represents the population as a whole, then it’s reasonable to make inferences from the sample to the population. But if the sample is not representative, then there’s little hope of drawing accurate conclusions about the population.
Suppose you want to determine the average height and weight of students at a large university by measuring the heights and weights of a sample of 100 students. A sample consisting only of members of the football and basketball teams would not be reliable, because these athletes tend to be larger than most students. In contrast, suppose you select your sample with a computer program that randomly draws student numbers from the entire university population. In this case, the 100 students in your sample are likely to be representative of the entire student body. You can therefore expect that the average height and weight of students in the sample are reasonable estimates of the averages for all students.
Now try Exercises 37–38.
By the Way
According to the Labor Department, someone who is not working is not necessarily unemployed. For example, stay-at-home moms and dads are not counted among the unemployed unless they are actively trying to find a job, and people who had been trying to find work but gave up in frustration are not counted as unemployed.
DEFINITION
A representative sample is a sample in which the relevant characteristics of the sample members match those of the population.
A sample drawn with a computer program that selects students at random is an example of a simple random sample. More technically, simple random sampling means that every sample of a particular size has the same chance of being selected. In the case of the student sample, every set of 100 students has an equal chance of being selected by the computer program.
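The contrast between a simple random sample and a sample made up only of athletes can be illustrated with a short simulation. Everything numerical here is an assumption made for the sake of the sketch: the size of the student body, the number of athletes, and the height distributions are invented, not measured.

```python
import random
from statistics import mean

rng = random.Random(1)

# Hypothetical student body as (height_cm, is_athlete) pairs.
# The heights are invented; athletes are assumed to be taller on average.
students = [(rng.gauss(170, 10), False) for _ in range(20_000)]
students += [(rng.gauss(190, 8), True) for _ in range(300)]

population_mean = mean(height for height, _ in students)

# Simple random sample: every set of 100 students is equally likely.
random_sample = rng.sample(students, 100)
random_mean = mean(height for height, _ in random_sample)

# Unrepresentative sample: 100 students drawn only from the athletes.
athletes = [student for student in students if student[1]]
athlete_sample = rng.sample(athletes, 100)
athlete_mean = mean(height for height, _ in athlete_sample)

print(f"all students:         {population_mean:.1f} cm")
print(f"simple random sample: {random_mean:.1f} cm")
print(f"athletes only:        {athlete_mean:.1f} cm")
```

Under these assumptions the simple random sample should land close to the average for all students, while the athletes-only sample should overestimate it by a wide margin.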
Simple random sampling is usually the best way to choose a representative sample. However, it is not always practical or necessary, so other sampling techniques are sometimes used. The following box summarizes four of the most common sampling techniques, and Figure 5.2 illustrates the ideas.
COMMON SAMPLING METHODS
Simple random sampling: We choose a sample of items in such a way that every sample of a given size has an equal chance of being selected.
Systematic sampling: We use a simple system to choose the sample, such as selecting every 10th or every 50th member of the population.
Convenience sampling: We use a sample that is convenient to select, such as people who happen to be in the same classroom.
Stratified sampling: We use this method when we are concerned about differences among subgroups, or strata, within a population. We first identify the subgroups and then draw a simple random sample within each subgroup. The total sample consists of all the samples from the individual subgroups.
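The four methods in the box can be sketched as small functions operating on a list of population members. This is a minimal illustration, not a survey-grade implementation: the function names, the numbered population of 100 members, and the two strata are hypothetical, and "convenient" is modeled simply as "first in the list."

```python
import random

def simple_random_sample(population, n, rng):
    """Every set of n members has an equal chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, k, start=0):
    """Select every k-th member, beginning at index `start`."""
    return population[start::k]

def convenience_sample(population, n):
    """Take whoever is easiest to reach; here, simply the first n members."""
    return population[:n]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample within each stratum and combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

rng = random.Random(2)
population = list(range(1, 101))  # hypothetical member IDs 1 through 100
strata = {                        # hypothetical subgroups of that population
    "first_year": list(range(1, 41)),
    "upper_level": list(range(41, 101)),
}

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10)
convenience = convenience_sample(population, 10)
stratified = stratified_sample(strata, 5, rng)

print("simple random:", sorted(srs))
print("systematic:   ", systematic)
print("convenience:  ", convenience)
print("stratified:   ", sorted(stratified))
```

Note that only the simple random and stratified samples involve chance; the systematic and convenience samples are fully determined by how the population list happens to be ordered, which is one reason their representativeness depends on that ordering.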
[FIGURE 5.2 Common sampling techniques. Simple Random Sampling: Every sample of the same size has an equal chance of being selected. Computers are often used to generate random telephone numbers. Systematic Sampling: Select every kth member. Convenience Sampling: Use results that are readily available. Stratified Sampling: Partition the population into at least two strata, then draw a sample from each.]
By the Way
Neanderthals lived between about 100,000 and 30,000 years ago in Eurasia and northern Africa. They were physiologically distinct from modern humans, but scientists are not yet sure whether they represented a separate species or could interbreed with Homo sapiens. Neanderthals developed many aspects of culture, including caring for the sick and burying their dead. Skull measurements suggest that Neanderthals had larger brains than modern humans.
Regardless of what type of sampling is used, always keep the following two key ideas in mind:
• No matter how a sample is chosen, the study can be successful only if the sample is representative of the population.
• Even if a sample is chosen in the best possible way, it is still just a sample (as opposed to the entire population). Thus, we can never be sure that a sample is representative of the population. In general, a larger sample is more likely to be representative of the population, as long as it is chosen well.
EXAMPLE 3 Sampling Methods
Identify the type of sampling used in each of the following cases, and comment on whether the sample is likely to be representative of the population.
a. You are conducting a survey of students in a dormitory. You choose your sample by knocking on the door of every 10th room.
b. To survey opinions on a possible property tax increase, a research firm randomly draws the addresses of 150 homeowners from a public list of all homeowners.
c. Agricultural inspectors for Jefferson County check the levels of residue from three common pesticides on 25 ears of corn from each of the 104 corn-producing farms in the county.
d. Anthropologists determine the average brain size of early Neanderthals in Europe by studying skulls found at three sites in southern Europe.
SOLUTION
a. Choosing every 10th room makes this a systematic sample. The sample may be representative, as long as students were randomly assigned to rooms.
b. The records presumably list all homeowners, so drawing randomly from this list produces a simple random sample. It has a good chance of being representative of the population.
c. Each farm may have different pesticide use, so the inspectors consider corn from each farm as a subgroup (stratum) of the full population. By checking 25 ears of corn from each of the 104 farms, the inspectors are using stratified sampling. If the ears are collected randomly on each farm, each set of 25 is likely to be representative of its farm.
d. By studying skulls found at selected sites, the anthropologists are using a convenience sample. They have little choice, because only a few skulls remain from the many Neanderthals who once lived in Europe. However, it seems reasonable to assume that these skulls are representative of the larger population.
Now try Exercises 39–44.
Watching Out for Bias
Consider a study designed to estimate the average weight of all men at a college. As we discussed earlier, a sample consisting only of football players would not be representative of the population with respect to weight. We say that this sample is biased because the men in the sample differ in a critical way from “typical” men at the college. More generally, the term bias refers to any problem in the design or conduct of a statistical study that tends to favor certain results.
Besides occurring in a poorly chosen sample, bias can arise in many other ways. For example, a researcher may be biased if he or she has a personal stake in the outcome of the study. In that case, the researcher might distort (intentionally or unintentionally) the true meaning of the data. You should always be on the lookout for any type of bias that may affect the results or interpretation of a statistical study. We’ll discuss sources of bias further in Unit 5B.
Types of Statistical Study
Broadly speaking, most statistical studies fall into one of two categories: observational studies and experiments. Nielsen’s studies of television viewing are observational because they are designed to observe the television-viewing behavior of the people in its 5000 sample homes. Note that observational studies may still involve some interaction. For example, an opinion poll is observational, even though researchers may conduct in-depth interviews, because the poll’s goal is to learn (observe) people’s opinions, not to change them. Similarly, a study in which individuals in the sample are weighed is also observational, because the measurement process records (observes) but does not change a person’s weight.
In contrast, consider a medical study designed to test whether large doses of vitamin C can help prevent colds. To conduct this study, the researchers must ask some people in the sample to take large doses of vitamin C. This type of statistical study is called an experiment, because some participants receive a treatment (in this case, vitamin C) that they would not otherwise receive.
DEFINITION
A statistical study suffers from bias if its design or conduct tends to favor certain results.
Time out to think
Thinking about issues of bias, explain why television networks use Nielsen to measure ratings rather than doing it themselves.
TWO BASIC TYPES OF STATISTICAL STUDY
1. In an observational study, researchers observe or measure characteristics of the sample members but do not attempt to influence or modify these characteristics.
2. In an experiment, researchers apply a treatment to some or all of the sample members and then look to see whether the treatment has any effects.
It is difficult to determine whether an experimental treatment works unless you compare groups that receive the treatment to groups that don’t. In the vitamin C study, for example, researchers might create two groups of people: a treatment group that takes large doses of vitamin C and a control group that does not take vitamin C. The researchers can then look for differences in the numbers of colds among people in the two groups. Having a control group is usually crucial to interpreting the results of experiments.
In an experiment, it is very important for the treatment and control groups to be alike in all respects except for the treatment. For example, if the treatment group consisted of active people with good diets and the control group consisted of sedentary people with poor diets, we could not attribute any differences in colds to vitamin C alone. To avoid this type of problem, assignments to the control and treatment groups must be done randomly.
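Random assignment can be as simple as shuffling the list of participants and splitting it in two. The sketch below assumes 20 hypothetical participants and an even split; the function name and the participant labels are made up for illustration.

```python
import random

def assign_groups(participants, rng):
    """Randomly split participants into (treatment, control) groups."""
    shuffled = list(participants)   # copy, so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(3)
participants = [f"participant_{i}" for i in range(1, 21)]
treatment, control = assign_groups(participants, rng)

print("treatment group:", sorted(treatment))
print("control group:  ", sorted(control))
```

Because chance alone decides who lands in each group, traits such as diet or activity level should tend to balance out between the groups rather than line up with the treatment.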
The Placebo Effect and Blinding
For experiments involving people, using a treatment and a control group might not be enough to get reliable results. The problem is that people can be affected by their beliefs as well as by real treatments. For example, stress and other psychological factors have been shown to affect resistance to colds. If people taking vitamin C get fewer colds than people who don’t, we can’t conclude that the vitamin C was responsible. It might be that people stayed healthier because they believed that vitamin C works. Therefore, people in the control group should be given a placebo: in this case, pills that look like vitamin C pills but don’t actually contain vitamin C. As long as the participants don’t know whether they are in the treatment or control group (that is, whether they got the real pills or the placebo), any effect arising from psychological factors, known as a placebo effect, should affect both groups equally. Then, if people in the vitamin C group get fewer colds than people in the control group, we have evidence that vitamin C really works.
With proper treatment, a cold can be cured in a week. Left to itself, it may linger for seven days.
—A MEDICAL FOLK SAYING
By the Way
The placebo effect can be surprisingly powerful. Consider a drug now used to combat balding, which was tested on balding men. The drug maker was pleased to learn that 86% of the men receiving the drug either stopped balding or grew new hair. But remarkably, so did 42% of the men who received the placebo! In other studies, as many as 75% of participants receiving a placebo have actually improved.
TREATMENT AND CONTROL GROUPS
The treatment group in an experiment is the group of sample members who receive the treatment being tested.
The control group in an experiment is the group of sample members who do not receive the treatment being tested.
It is important for the treatment and control groups to be selected randomly and to be alike in all respects except for the treatment.
DEFINITIONS
A placebo lacks the active ingredients of a treatment being tested in a study, but is identical in appearance to the treatment. Thus, study participants cannot distinguish the placebo from the real treatment.
The placebo effect refers to the situation in which patients improve simply because they believe they are receiving a useful treatment.
In statistical terminology, the practice of keeping people in the dark about who is in the treatment group and who is in the control group is called blinding. A single-blind experiment is one in which the participants don’t know which group they belong to, but the experimenters (the people administering the treatment) do know. Using a placebo is one way to create a single-blind experiment. Sometimes, a single-blind experiment can still be unreliable if the experimenters can subtly influence outcomes. For example, in an experiment that involves interviews, the experimenters might speak differently to people who received the real treatment than to those who received the placebo. This type of problem can be avoided by making the experiment double-blind, which means neither the participants nor the experimenters know who belongs to each group. (Of course, someone must keep track of the two groups in order to evaluate the results at the end. In typical double-blind experiments, researchers hire experimenters to make any necessary contact with the participants.)
EXAMPLE 4 What’s Wrong with This Experiment?
For each of the experiments described below, identify any problems and explain how the problems could have been avoided.
a. A chiropractor wants to know if his adjustments relieve back pain. He performs adjustments on 25 patients with back pain. Afterward, 18 of the patients say they feel better. He concludes that the adjustments are an effective treatment.
b. A new drug for attention deficit disorder (ADD) is supposed to make the affected children more polite. Randomly selected children suffering from ADD are divided into treatment and control groups. Those in the control group receive a placebo that looks just like the real drug. The experiment is single-blind. Experimenters interview the children one-on-one to decide whether they became more polite.
SOLUTION
a. The 25 patients who receive adjustments represent a treatment group, but this study lacks a control group. The patients may be feeling better because of a placebo effect rather than any real effect of the adjustments. The chiropractor might have improved his study by hiring an actor to do a fake adjustment (one that feels like a real manipulation, but doesn’t actually conform to chiropractic guidelines) on a control group. Then he could have compared the results in the two groups to see whether a placebo effect was involved.
b. Because the experimenters know which children received the real drug, during the interviews they may inadvertently speak differently or interpret behavior differently with these children. In that case, their conclusions might not be valid. The experiment should have been double-blind, so that the experimenters conducting the interviews would not have known which children received the real drug and which children received the placebo.
Now try Exercises 45–50.
BLINDING IN EXPERIMENTS
An experiment is single-blind if the participants do not know whether they are members of the treatment group or members of the control group, but the experimenters do know.
An experiment is double-blind if neither the participants nor the experimenters (people administering the treatment) know who belongs to the treatment group and who belongs to the control group.
DILBERT reprinted by permission of United Feature Syndicate, Inc.
Case-Control Studies
Sometimes it may be impractical or unethical to conduct an experiment. For example, suppose we want to study how alcohol consumed during pregnancy affects newborn babies. Because it is already known that alcohol can be harmful during pregnancy, it would be unethical to divide a sample of pregnant mothers randomly into two groups and then force the members of one group to consume alcohol. However, we may be able to conduct a case-control study, in which the participants naturally form groups by choice. In this example, the cases consist of mothers who consume alcohol during pregnancy by choice, and the controls consist of mothers who choose not to consume alcohol.
A case-control study is observational because the researchers do not change the behavior of the participants. But it also resembles an experiment because the cases effectively represent a treatment group and the controls represent a control group.
DEFINITIONS
A case-control study is an observational study that resembles an experiment because the sample naturally divides into two (or more) groups. The participants who engage in the behavior under study form the cases, which makes them like a treatment group in an experiment. The participants who do not engage in the behavior are the controls, making them like a control group in an experiment.
EXAMPLE 5 Which Type of Study?
For each of the following questions, what type of statistical study is most likely to lead to an answer? Why?
a. What is the average income of stock brokers?
b. Do seat belts save lives?
c. Can lifting weights improve runners’ times in a 10-kilometer race?
d. Can a new herbal remedy reduce the severity of colds?
SOLUTION
a. An observational study can tell us the average income of stock brokers. We need only survey (observe) the brokers.
b. It would be unethical to do an experiment in which some people were told to wear seat belts and others were told not to wear them. Instead, we can conduct an observational case-control study. Some people choose to wear seat belts (the cases) and others choose not to wear them (the controls). By comparing the death rates in accidents between cases and controls, we can learn whether seat belts save lives. (They do.)
c. We need an experiment to determine whether lifting weights can improve runners’ 10K times. One group of runners will be put on a weight-lifting program, and a control group will be asked to stay away from weights. We must try to ensure that all other aspects of their training are similar. Then we can see whether the runners in the lifting group improve their times more than those in the control group. Note that we cannot use blinding in this experiment because there is no way to prevent participants from knowing whether they are lifting weights.
d. We should use a double-blind experiment, in which some participants get the actual remedy while others get a placebo. We need double-blind conditions because the severity of a cold may be affected by mood or other factors that experimenters might inadvertently influence.
Now try Exercises 51–56.
Surveys and Opinion Polls
Surveys and opinion polls may be the most common types of statistical study, and we must be very careful when we interpret them. Fortunately, survey and poll results usually include something called the margin of error.
Suppose a poll finds that 76% of the public supports the President, with a margin of error of 3 percentage points. The 76% is a sample statistic; that is, 76% of the people in a sample said they support the President. The margin of error helps us understand how well this sample statistic is likely to approximate the true population parameter (in this case, the percentage of all Americans who support the President). By adding and subtracting the margin of error from the sample statistic, we find a range of values, or a confidence interval, likely to contain the population parameter. In this case, we add and subtract 3 percentage points to find a confidence interval from 73% to 79%.
By the Way
Politicians and marketers often pretend they are trying to conduct a true opinion poll or survey when, in fact, they are deliberately trying to get particular results. These types of surveys are called push polls because they try to “push” people’s opinions.
DEFINITION
The margin of error in a statistical study is used to describe a confidence interval that is likely to contain the true population parameter. We find this interval by subtracting the margin of error from, and adding it to, the sample statistic obtained in the study. That is, the confidence interval is
from (sample statistic − margin of error) to (sample statistic + margin of error)
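The definition above can be sketched in a few lines of Python. The function name `confidence_interval` is a hypothetical helper introduced only for illustration; the numbers come from the presidential poll described earlier (76% support, margin of error of 3 percentage points).

```python
def confidence_interval(sample_statistic, margin_of_error):
    """Return (low, high): the sample statistic minus and plus the margin of error."""
    low = sample_statistic - margin_of_error
    high = sample_statistic + margin_of_error
    return low, high


# Poll from the text: 76% support, margin of error of 3 percentage points.
low, high = confidence_interval(76, 3)
print(f"Confidence interval: from {low}% to {high}%")
```

Both arguments are in percentage points, so the result reads directly as the interval from 73% to 79% quoted in the text.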
How confident can we be in a poll result? Unless we are told otherwise, we assume that the margin of error is defined to give us 95% confidence that the confidence interval contains the population parameter. We’ll discuss the precise meaning of “95% confidence” in Unit 6D, but for now you can think of it as follows: If the poll were repeated 20 times with 20 different samples, we would expect about 19 of the 20 polls (that is, 95% of the polls) to have a confidence interval that contains the true population parameter.
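The "repeat the poll many times" idea can be explored with a small simulation. This is only a sketch under stated assumptions: the true level of support (52%), the sample size (1000), and the number of repeated polls are invented for illustration, and the margin of error is computed with the common approximation 1.96 × √(p(1 − p)/n), which this unit does not derive.

```python
import math
import random


def simulate_coverage(true_p=0.52, sample_size=1000, num_polls=2000, seed=1):
    """Repeat a poll many times; return the fraction of polls whose
    confidence interval contains the true population parameter."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(num_polls):
        # Draw one random sample and compute the sample statistic.
        yes = sum(1 for _ in range(sample_size) if rng.random() < true_p)
        p_hat = yes / sample_size
        # Assumed formula for a 95% margin of error (not given in this unit).
        margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / sample_size)
        if p_hat - margin <= true_p <= p_hat + margin:
            covered += 1
    return covered / num_polls


coverage = simulate_coverage()
print(f"Fraction of polls whose interval contains the true value: {coverage:.3f}")
```

The printed fraction should land close to 0.95, though not exactly on it, because each run of simulated polls is itself subject to chance.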
❉EXAMPLE 6 Close Election
An election eve poll finds that 52% of surveyed voters plan to vote for Smith, and she needs a majority (more than 50%) to win without a runoff. The margin of error in the poll is 3 percentage points. Will she win?
SOLUTION We subtract and add the margin of error of 3 percentage points to find a confidence interval
from 52% − 3% = 49% to 52% + 3% = 55%
We can be 95% confident that the actual percentage of people planning to vote for her is between 49% and 55%. Because this confidence interval leaves open the possibility of both a majority and less than a majority, this election is too close to call.
Now try Exercises 57–60.
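The reasoning in Example 6 can be written as a short decision rule. The function name `call_election` and the verdict labels are hypothetical, chosen only for this sketch; "likely" here means "at the stated confidence level," not a guarantee.

```python
def call_election(poll_percent, margin_of_error, threshold=50.0):
    """Compare a poll's confidence interval with the share needed to win."""
    low = poll_percent - margin_of_error
    high = poll_percent + margin_of_error
    if low > threshold:
        verdict = "likely majority"        # whole interval is above the threshold
    elif high <= threshold:
        verdict = "likely no majority"     # whole interval is at or below the threshold
    else:
        verdict = "too close to call"      # interval straddles the threshold
    return low, high, verdict


# Example 6: 52% for Smith, margin of error of 3 percentage points.
print(call_election(52, 3))
```

For Example 6 the interval runs from 49% to 55%, which straddles the 50% threshold, so the rule returns "too close to call."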
Time out to think
In Example 6, suppose the poll found the candidate had 55% of the vote. Should she be confident of a win?
EXERCISES 5A
QUICK QUIZ
Choose the best answer to each of the following questions. Explain your reasoning with one or more complete sentences.
1. You conduct a poll in which you randomly select 1000 registered voters from Texas and ask if they approve of the job their governor is doing. The population for this study is
a. all registered voters in the state of Texas.
b. the 1000 people that you interview.
c. the governor of Texas.
2. Results of the poll described in Exercise 1 would most likely suffer from bias if you chose the participants from
a. all registered voters in Texas.
b. all people with a Texas driver’s license.
c. people who donated money to the governor’s campaign.
3. When we say that a sample is representative of the population, we mean that
a. the results found for the sample are similar to those we would find for the entire population.
b. the sample is very large.
c. the sample was chosen in the best possible way.
4. Consider an experiment designed to see whether cash incentives improve school attendance. The researcher chooses two groups of 100 high school students. She offers one group $10 for every week of perfect attendance. She tells the other group that they are part of an experiment but does not give them any incentive. The students who do not receive an incentive represent
a. the treatment group. b. the control group.
c. the observation group.
5. The experiment described in Exercise 4 is
a. single-blind. b. double-blind. c. not blind.
6. The purpose of a placebo is
a. to prevent participants from knowing whether they belong to the treatment group or the control group.
b. to distinguish between the cases and the controls in a case-control study.
c. to determine whether diseases can be cured without any treatment.
7. If we see a placebo effect in an experiment to test a new treatment designed to cure warts, we know that
a. the experiment was not properly double-blind.
b. the experimental groups were too small.
c. warts were cured among members of the control group.
8. An experiment is single-blind if
a. it lacks a treatment group. b. it lacks a control group.
c. the participants do not know whether they belong to the treatment or control group.
9. Poll X predicts that Powell will receive 49% of the vote, while Poll Y predicts that he will receive 53% of the vote. Both polls have a margin of error of 3 percentage points. What can you conclude?
a. One of the two polls must have been conducted poorly.
b. The two polls are consistent with each other.
c. Powell will receive 51% of the vote.
10. A survey reveals that 12% of Americans believe Elvis is still alive, with a margin of error of 4 percentage points. The confidence interval for this poll is
a. from 10% to 14%. b. from 8% to 16%.
c. from 4% to 20%.
REVIEW QUESTIONS
11. Why do we say that the term statistics has two meanings? Describe both meanings.
12. Define the terms population, sample, population parameter, and sample statistics as they apply to statistical studies.
13. Describe the five basic steps in a statistical study, and give an example of their application.
14. Why is it so important that a statistical study use a representative sample? Briefly describe four common sampling methods.
15. What is bias? How can it affect a statistical study? Give examples of several forms of bias.
16. Describe and contrast observational studies and experiments. What do we mean by the treatment group and control group in an experiment? What do we mean by the cases and controls in an observational case-control study?
17. What is a placebo? Describe the placebo effect and how it can make experiments difficult to interpret. How can making an experiment single-blind or double-blind help?
18. What is meant by the margin of error in a survey or opinion poll? How is it used to identify a confidence interval?
DOES IT MAKE SENSE?
Decide whether each of the following statements makes sense (or is clearly true) or does not make sense (or is clearly false). Explain your reasoning.
19. In my experimental study, I used a sample that was larger than the population.
20. I followed all the guidelines for sample selection carefully, yet my sample still did not reflect the characteristics of the population.
21. I wanted to test the effects of vitamin C on colds, so I gave the treatment group vitamin C and gave the control group vitamin D.
22. I don’t believe the results of the experiment, because the results were based on interviews but the study was not double-blind.
23. The pre-election poll found that Kennedy would get 58% of the vote, with a margin of error of 4 percentage points, but he ended up losing the election.
24. By choosing my sample carefully, I can make a good estimate of the average height of Americans by measuring the heights of only 500 people.
BASIC SKILLS & CONCEPTS
Population and Sample. For the studies described in Exercises 25–30, describe the population, sample, population parameters, and sample statistics.
25. In order to gauge public opinion on how to handle Iran’s growing nuclear program, the Pew Research Center surveyed 1001 Americans by telephone.
26. Astronomers typically determine the distance to a galaxy (a galaxy is a huge collection of billions of stars) by measuring the distances to just a few stars within it and taking the mean (average) of these distance measurements.
27. In a USA Today Internet poll, readers responded voluntarily to the question “Do you consume at least one caffeinated beverage every day?”
28. The Gallup Organization conducted a poll of 1003 Americans in its household panel who plan to take a summer vacation to determine what percentage of people plan to cancel their summer vacation because of the increase in gasoline prices.
29. Harris Interactive surveyed 2435 U.S. adults nationwide and asked them to rate the quality of American public schools.
30. The American Institute of Education conducts an annual study of attitudes of incoming college students by surveying approximately 261,000 first-year students at 462 colleges and universities. There are approximately 1.6 million first-year college students in this country.
Steps in a Study. Describe how you would apply the five basic steps of a statistical study to the issues in Exercises 31–36.
31. You want to determine the average number of hours per day students at a middle school spend listening to iPods.
32. As an airline marketing executive, you want to know if there has been an increase in frustration with air travel among business travelers.
33. You want to know the percentage of male college students in America who do Sudoku puzzles at least once per week.
34. You want to know the typical percentage of the bill that is left as a tip in restaurants.
35. You want to know the average lifetime of windshield wipers on cars made in Japan.
36. You want to know the percentage of high school students who are vegetarians.
37. Representative Sample? You want to determine the mean (average) number of hours spent studying each week by high school girls. Which of the following samples is most likely to be representative, and why? Also explain why each of the other choices is not likely to make a representative sample for this study.
• The girls’ track team
• The girls in an advanced placement calculus course
• The girls in the cast of the current theater production
• The first 50 girls you meet in the school cafeteria
38. Representative Sample? You want to determine the typical dietary habits of students at a college. Which of the following would make the best sample, and why? Also explain why each of the other choices would not make a good sample for this study.
• Students in a single dormitory
• Students majoring in public health
• Students who participate in intercollegiate sports
• Students enrolled in a required mathematics class
Identify the Sampling Method. Exercises 39–44 each describe a sample. Identify the sampling method as simple random sampling, systematic sampling, convenience sampling, or stratified sampling. Briefly explain why you think this sampling method was chosen.
39. An IRS (Internal Revenue Service) auditor randomly selects for audits 30 taxpayers in each of the filing status categories: single, head of household, married filing jointly, and married filing separately.
40. People magazine chooses its “25 most beautiful women” by looking at responses from readers who voluntarily mail in a survey printed in the magazine.
41. A study of the use of antidepressants selects 50 participants whose ages are between 20 and 29, 50 participants whose ages are between 30 and 39, and 50 participants whose ages are between 40 and 49.
42. Every 100th computer chip that is produced is given a reliability test.
43. A computer randomly selects 400 names from a list of all registered voters. Those selected are surveyed to predict who will win the election for Mayor.
44. A taste test for chips and salsa is given at the entrance to a supermarket.
Type of Study. For Exercises 45–50, state whether the study is an observational study or an experiment. If it is an experiment, describe the treatment and control groups and discuss whether single- or double-blinding is needed. If it is observational, state whether it is a case-control study and, if it is, distinguish between the cases and the controls.
45. A study at the University of Southern California separated 108 volunteers into groups, based on psychological tests designed to determine how often they lied and cheated. Those with a tendency to lie had different brain structures than those who did not lie (British Journal of Psychiatry).
46. A National Cancer Institute study of 716 melanoma patients and 1014 cancer-free patients matched by age, sex, and race found that those having a single large mole had twice the risk of melanoma. Having 10 or more moles was associated with a 12 times greater risk of melanoma (Journal of the American Medical Association).
47. In a study done at Boston University, researchers took snapshots of 4000 white adults every four years for 30 years and determined that 9 of 10 men and 7 of 10 women will eventually become overweight (Annals of Internal Medicine).
48. A breast cancer study began by asking 25,624 women questions about how they spent their leisure time. The health of these women was tracked over the next 15 years. Those women who said they exercise regularly were found to have a lower incidence of breast cancer (New England Journal of Medicine).
49. A (hypothetical) study of 45 swimmers found that those who were placed on a weight-training regimen in addition to daily swimming workouts improved their times by 3.5%.
50. A survey of 275,811 first-year college students revealed that 32.4% of these students had an A average in high school (Higher Education Research Institute).
Which Type of Study? For each of the questions in Exercises 51–56, what type of statistical study is most likely to lead to an answer? Why?
51. How many hours per week does the average public school teacher work?
52. What is the percentage of American voters who favor a constitutional amendment banning gay marriages?
53. Do teenagers with a diet high in dairy products have a higher incidence of acne?
54. Do drivers of the same model car get better mileage with high-ethanol fuel?
55. Does a multi-vitamin a day reduce the incidence of strokes?
56. Are the Sunday horoscopes in a local newspaper more accurate than the weekday horoscopes?
Margin of Error. Each of Exercises 57–60 states both a sample statistic and a margin of error. Find the confidence interval in each case, and answer any additional questions asked. Be sure to explain your answers clearly.
57. A poll is conducted the day before a state election for Senator. There are only two candidates running. The poll shows that 53% of the voters surveyed favor the Republican candidate, with a margin of error of 2.5 percentage points. Should the Republican plan a victory party? Why or why not?
58. A poll is conducted the day before an election for U.S. Representative. There are only two candidates running. The poll shows that 48.5% of the voters surveyed favor the Democratic candidate, with a margin of error of 2.0 percentage points. Based on this poll, should the Democratic candidate expect to lose the election? Why or why not?
59. Of 133 adult Americans surveyed in a Gallup poll who said their vacation plans had changed because of high gasoline prices, 58% said they had changed their destination or shortened their trip. With a margin of error of 9.0 percentage points, can you say that a majority of Americans changed their destination or shortened their trip?
60. In a survey of 1002 people, 701 (which is 70%) said that they voted in the most recent presidential election (based on data from ICR Research Group). The margin of error for the survey was 3 percentage points. However, actual voting records show that only 61% of all eligible voters actually did vote. Does this necessarily imply that people lied when they answered the survey?
FURTHER APPLICATIONS
Experiment Results. Consider an experiment designed to determine the effectiveness of a new drug. The drug is given to participants in the treatment group, while participants in the control group receive a placebo. For each set of results described in Exercises 61–64, discuss whether there appears to be evidence that the treatment is effective.
61. 70% of those in the treatment group showed improvement; 30% of those in the placebo group showed improvement.
62. 45% of those in the treatment group showed improvement; 45% of those in the placebo group showed improvement.
63. 90% of those in the treatment group showed improvement; 50% of those in the placebo group showed improvement.
64. 25% of those in the treatment group showed improvement; 50% of those in the placebo group showed improvement.
Interpreting Real Studies. For each of Exercises 65–70, do the following:
a. Identify the population and the population parameter of interest.
b. Briefly describe the sample and sample statistic for the study.
c. Find the confidence interval likely to contain the population parameter of interest.
65. In a TIME/CNN poll, 748 adults were asked whether they believed their children would have a higher standard of living than they have; 63% of those polled said “yes.” The margin of error was 3.7 percentage points.
66. A Gallup poll of 1002 American adults determined that 81% of those surveyed believed that the state of moral values in the country overall was getting worse. The margin of error was 3.2 percentage points.
67. Based on its survey of 60,000 households (see Example 2), the U.S. Labor Department reported an unemployment rate of 6.4% in June 2003. The margin of error for the report was 0.2 percentage point.
68. The Pew Research Center asked 1546 adult Americans whether humans would land on Mars within the next 50 years; 76% of these people said either “definitely yes” or “probably yes.” The margin of error for the poll was 2.5 percentage points.
69. A Fox News opinion poll asked 900 registered voters, “Do you personally think the government is listening to your phone conversations?” Thirty percent of those surveyed responded “yes” and 58% responded “no.” The margin of error was 3.0 percentage points.
70. A Roper Organization survey of 2000 adults revealed that 64% of those surveyed kept money in a regular savings account. The margin of error for the survey was 2.2 percentage points.
WEB PROJECTS
Find useful links for Web Projects on the text Web site: www.aw.com/bennett-briggs
71. Current Nielsen Ratings. Find the Nielsen ratings for the past week. What were the three most popular television shows? Explain both the “rating” and the “share” for each show.
72. Nielsen Sample. Use information available on the Nielsen Media Research Web site to answer each of the following questions.
a. How does Nielsen select the sample of homes to be included in a viewer survey?
b. Describe a few ways by which Nielsen attempts to check that the results from its people meter surveys are accurate.
c. Based on what you have learned, do you think the Nielsen ratings are reliable? If so, why? If not, why not?
73. Attitude Update. The Pew Research Center for the People and the Press studies public attitudes toward the press, politics, and public policy issues. Go to its Web site and find the latest survey about attitudes. Write a one-page summary of what Pew surveyed, how it conducted the survey, and what it found.
74. Labor Statistics. Use the Bureau of Labor Statistics Web page to learn about its monthly survey. Choose one aspect of the survey, such as how the sample is chosen or how it is used to compare unemployment rates over time. Write a short summary of what you learn.
75. Professional Polling. Visit the Web site of a national polling organization and report on a recent poll. Write a short description of the poll and its results, commenting on features such as sampling technique, sample size, and margin of error.
IN THE NEWS
76. Statistics in the News. Select three news stories from the past week that involve statistics in some way. In each case, write one or two paragraphs describing the role of statistics in the story.
77. Statistics in Your Major. Write two to three paragraphs describing the ways in which you think the science of statistics is important in your major field of study. (If you have not chosen a major, answer this question for a major that you are considering.)
78. Statistics in Sports. Choose a sport and describe three different statistics commonly tracked by participants in or spectators of the sport. In each case, briefly describe the importance of the statistic to the sport.
79. Sample and Population. Find a report in today’s news concerning any type of statistical study. What is the population being studied? What is the sample? Why do you think the sample was chosen as it was?
80. Poor Sampling. In a recent newspaper or magazine, find an article about a study that attempts to describe some characteristic of a population, but that you believe involved poor sampling (for example, a sample that was too small or unrepresentative of the population under study). Describe the population, the sample, and what you think was wrong with the sample. Briefly discuss how you think the poor sampling affected the study results.
81. Good Sampling. In a recent newspaper or magazine, find an article that describes a statistical study in which the sample was well chosen. Describe the population, the sample, and why you think the sample was a good one.
82. Margin of Error. Find a report of a recent survey or poll. Interpret the sample statistic and margin of error quoted for the survey or poll.
UNIT 5B Should You Believe a Statistical Study?
Most statistical research is carried out with integrity and care. Nevertheless, statistical research is sufficiently complex that bias can arise in many different ways. We should always examine reports of statistical research carefully, looking for anything that might make us question the results. In this unit, we discuss eight guidelines that can help you answer the question “Should I believe a statistical study?”
Guideline 1: Identify the Goal, Population, and Type of Study
Before evaluating the details of a statistical study, we must know what it is about. Based on what you hear or read, try to answer basic questions such as these:
• What was the goal of the study?
• What was the population under study? Was the population clearly and appropriately defined?
• What type of study was used? Was the type appropriate for the goal?
If you can’t find reasonable answers to these questions, it will be difficult to evaluate other aspects of the study.
❉EXAMPLE 1 Appropriate Type of Study?
A newspaper reports: “Researchers gave each of the 100 participants their astrological horoscopes, and asked them whether the horoscopes appeared to be accurate. Eighty-five percent of the participants reported that the horoscopes were accurate. The researchers concluded that horoscopes are valid most of the time.” Analyze this study according to Guideline 1.
SOLUTION The goal of the study was to determine the validity of horoscopes. Based on the news report, it appears that the study was observational: The researchers simply asked the participants about the accuracy of the horoscopes. However, because the accuracy of a horoscope is somewhat subjective, this study should have been a controlled experiment in which some people were given their actual horoscope and others were given a fake horoscope. Then the researchers could have looked for differences between the two groups. Moreover, because researchers could easily influence the results by how they questioned the participants, the experiment should have been double-blind. In summary, the type of study was inappropriate to the goal and its results are meaningless.
Now try Exercises 19–20.
Guideline 2: Consider the Source
Statistical studies are supposed to be objective, but the people who carry them out and fund them may be biased. Thus, it is important to consider the source of a study and evaluate the potential for biases that might invalidate its conclusions.
❉EXAMPLE 2 Is Smoking Healthy?
By 1963, enough research on the health dangers of smoking had accumulated that the Surgeon General of the United States publicly announced that smoking is bad for health. Research done since that time has built further support for this claim. However, while the vast majority of studies show that smoking is unhealthy, a few studies found no dangers from smoking, and perhaps even health benefits. These studies generally were carried out by the Tobacco Research Institute, funded by the tobacco companies. Analyze the Tobacco Research Institute studies according to Guideline 2.
SOLUTION Tobacco companies had a financial interest in minimizing the dangers of smoking. Because the studies carried out at the Tobacco Research Institute were funded by the tobacco companies, there may have been pressure on the researchers to produce results to the companies’ liking. This potential for bias does not mean their research was biased, but the fact that it contradicts virtually all other research on the subject should be cause for concern.
Now try Exercises 21–22.
By the Way
Surveys show that nearly half of Americans believe their horoscopes. However, in controlled experiments, the predictions of horoscopes come true no more often than would be expected by chance.
Copyright © 1998, 2004 by Sidney Harris.
Guideline 3: Look for Bias in the Sample
Look for bias that may prevent the sample from being representative of the population. There are two particularly common forms of bias that can affect sample selection.
CASE STUDY The 1936 Literary Digest Poll
The Literary Digest, a popular magazine of the 1930s, successfully predicted the outcomes of several elections using large polls. In 1936, editors of the Literary Digest conducted a particularly large poll in advance of the presidential election. They randomly chose a sample of 10 million people from various lists, including names in telephone books and rosters of country clubs. They mailed a postcard “ballot” to each of these 10 million people. About 2.4 million people returned the postcard ballots. Based on the returned ballots, the editors of the Literary Digest predicted that Alf Landon would win the presidency by a margin of 57% to 43% over Franklin Roosevelt. Instead, Roosevelt won with 62% of the popular vote. How did such a large survey go so wrong?
The sample suffered from both selection bias and participation bias. The selection bias arose because the Literary Digest chose its 10 million names in ways that favored affluent people. For example, selecting names from telephone books meant choosing only from those who could afford telephones back in 1936. Similarly, country club members are usually quite wealthy. The selection bias favored Landon because he was the Republican, and affluent voters of the 1930s tended to vote for Republican candidates.
The participation bias arose because return of the postcard ballots was voluntary. People who felt most strongly about the election were more likely to be among those who returned their postcard ballots. This bias also tended to favor Landon because he was the challenger: people who did not like President Roosevelt could express their desire for change by returning the postcards. Together, the two forms of bias made the sample results useless, despite the large number of people surveyed.
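The case study's main lesson, that a huge biased sample can be worse than a small random one, can be illustrated with a toy simulation. Every number below is invented for illustration and is not 1936 data: the share of affluent voters, the support rates in each group, and both sample sizes are assumptions. The sketch models selection bias only (sampling exclusively from affluent voters); participation bias is not modeled.

```python
import random

# Hypothetical population parameters (assumptions, not historical figures).
AFFLUENT_SHARE = 0.30      # share of voters who are affluent
SUPPORT_AFFLUENT = 0.60    # support for the challenger among affluent voters
SUPPORT_OTHERS = 0.35      # support for the challenger among everyone else


def draw_voter(rng, affluent_only=False):
    """Return True if one randomly drawn voter supports the challenger."""
    affluent = affluent_only or rng.random() < AFFLUENT_SHARE
    support_rate = SUPPORT_AFFLUENT if affluent else SUPPORT_OTHERS
    return rng.random() < support_rate


def poll(rng, size, affluent_only=False):
    """Return the fraction of a sample that supports the challenger."""
    supporters = sum(draw_voter(rng, affluent_only) for _ in range(size))
    return supporters / size


rng = random.Random(1936)
true_support = (AFFLUENT_SHARE * SUPPORT_AFFLUENT
                + (1 - AFFLUENT_SHARE) * SUPPORT_OTHERS)
big_biased = poll(rng, 200_000, affluent_only=True)   # large sample, affluent lists only
small_random = poll(rng, 3_000)                       # small simple random sample

print(f"True support in the population: {true_support:.3f}")
print(f"Large biased sample estimate:   {big_biased:.3f}")
print(f"Small random sample estimate:   {small_random:.3f}")
```

In this sketch the large sample settles near the support rate of affluent voters rather than that of the whole population, so adding more respondents from the same lists would not help.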
BIAS IN CHOOSING A SAMPLE
Selection bias occurs whenever researchers select their sample in a way that tends to make it unrepresentative of the population. For example, a pre-election poll that surveys only registered Republicans has selection bias because it is unlikely to reflect the opinions of all voters.
Participation bias occurs primarily with surveys and polls; it arises whenever people choose whether to participate. Because people who feel strongly about an issue are more likely to participate, their opinions may not represent the larger population that is less emotionally attached to the issue. (Surveys or polls in which people choose whether to participate are often called self-selected or voluntary response surveys.)
HISTORICAL NOTE
A young pollster named George Gallup conducted his own survey prior to the 1936 election. Sending postcards to only 3000 randomly selected people, he correctly predicted not only the outcome of the election, but also the outcome of the Literary Digest poll to within 1%. Gallup went on to establish a very successful polling organization.
By the Way
After decades of arguing to the contrary, in 1999 the Philip Morris Company (the world’s largest seller of tobacco products) publicly acknowledged that smoking causes lung cancer, heart disease, emphysema, and other serious diseases. Shortly thereafter, Philip Morris changed its name to Altria.
❉EXAMPLE 3 Self-Selected Poll
The television show Nightline conducted a poll in which viewers were asked whether the United Nations headquarters should be kept in the United States. Viewers could respond to the poll by paying 50 cents to call a “900” phone number with their opinions. The poll drew 186,000 responses, of which 67% favored moving the United Nations out of the United States. Around the same time, a poll using simple random sampling of 500 people found that 72% wanted the United Nations to stay in the United States. Which poll is more likely to be representative of the general opinions of Americans?
SOLUTION The Nightline sample suffered from severe participation bias. Not only did viewers choose whether to call in for the survey, but they had to pay to participate. This cost made it even more likely that respondents would be those who felt a need for change. Thus, despite its large number of respondents, the Nightline survey was too biased to be trusted. In contrast, a simple random sample of 500 people is quite likely to be representative, so the finding of this small survey has a better chance of representing the true opinions of all Americans.
Now try Exercises 23–24.
Guideline 4: Look for Problems in Defining or Measuring the Variables of Interest
Statistical studies usually attempt to measure something, and we call the things being measured the variables of interest in the study. The term variable simply refers to an item or quantity that can vary or take on different values. For example, variables in the Nielsen ratings include show being watched and number of viewers.
By the Way
More than a third of all Americans routinely shut the door or hang up the phone when contacted for a survey, thereby making self-selection a problem for legitimate pollsters. One reason people hang up may be the proliferation of selling under the guise of market research (often called “sugging”), in which a telemarketer pretends you are part of a survey in order to get you to buy something.
DEFINITION
A variable is any item or quantity that can vary or take on different values. The variables of interest in a statistical study are the items or quantities that the study seeks to measure.
Results of a statistical study may be especially difficult to interpret if the variables under study are difficult to define or measure. For example, imagine trying to conduct a study of how exercise affects resting heart rates. The variables of interest would be amount of exercise and resting heart rate. However, both variables are difficult to define and measure. In the case of amount of exercise, it’s not clear what the definition covers: Does it include walking to class? Even if we specify the definition, how can we measure amount of exercise given that some forms of exercise are more vigorous than others? The following two examples describe real cases in which defining or measuring variables caused problems in statistical studies.
Time out to think
How would you measure your resting heart rate? Describe some difficulties in defining and measuring resting heart rate.
❉EXAMPLE 4 Can Money Buy Love?
A Roper poll reported in USA Today involved a survey of the wealthiest 1% of Americans. The survey found that these people would pay an average of $487,000 for “true love,” $407,000 for “great intellect,” $285,000 for “talent,” and $259,000 for “eternal youth.” Analyze this result according to Guideline 4.
SOLUTION The variables in this study are very difficult to define. How, for example, do you define “true love”? And does it mean true love for a day, a lifetime, or something else? Similarly, does the ability to balance a spoon on your nose constitute “talent”? Because the variables are so poorly defined, it’s likely that different people interpreted them differently, making the results very difficult to interpret.
Now try Exercise 25.
❉EXAMPLE 5 Illegal Drug SupplyLaw enforcement authorities try to stop illegal drugs from entering the country. Acommonly quoted statistic is that they succeed in stopping only about 10% to 20% ofthe drugs entering the United States. Should you believe this statistic?
SOLUTION There are essentially two variables in the study: quantity of illegal drugsintercepted and quantity of illegal drugs NOT intercepted. It should be relatively easy tomeasure the quantity of illegal drugs that law enforcement officials intercept. How-ever, because the drugs are illegal, it’s unlikely that anyone is reporting the quantity ofdrugs that are not intercepted. How, then, can anyone know that the intercepteddrugs are 10% to 20% of the total? In a New York Times analysis, a police officer wasquoted as saying that his colleagues refer to this type of statistic as “P.F.A.,” for“pulled from the air.” Now try Exercise 26.
Guideline 5: Watch Out for Confounding Variables
Variables that are not intended to be part of the study can sometimes make it difficult to interpret results properly. Such variables are often called confounding variables, because they confound (confuse) a study’s results.
It’s not always easy to discover confounding variables. Sometimes they are discovered years after a study was completed, and sometimes they are not discovered at all. Fortunately, confounding variables are sometimes more obvious and can be discovered simply by thinking hard about factors that may have influenced a study’s results.
❉EXAMPLE 6 Radon and Lung Cancer
Radon is a radioactive gas produced by natural processes (the decay of uranium) in the ground. The gas can leach into buildings through the foundation and can accumulate in relatively high concentrations if doors and windows are closed. Imagine a study that seeks to determine whether radon gas causes lung cancer by comparing the lung cancer rate in Colorado, where radon gas is fairly common, with the lung cancer rate in Hong Kong, where radon gas is less common. Suppose the study finds that the lung cancer rates are nearly the same. Is it fair to conclude that radon is not a significant cause of lung cancer?
SOLUTION The variables under study are amount of radon and lung cancer rate. However, because smoking can also cause lung cancer, smoking rate may be a confounding variable in this study. In particular, the smoking rate in Hong Kong is much higher than the smoking rate in Colorado, so any conclusions about radon and lung cancer must take the smoking rate into account. In fact, careful studies have shown that radon gas can cause lung cancer, and the U.S. Environmental Protection Agency (EPA) recommends taking steps to prevent radon from building up indoors.
Now try Exercises 27–28.
By the Way
Many hardware stores sell simple kits that you can use to test whether radon gas is accumulating in your home. If it is, the problem can be eliminated by installing an appropriate “radon mitigation” system, which usually consists of a fan that blows the radon out from under the house before it can get in.
Guideline 6: Consider the Setting and Wording in Surveys
Even when a survey is conducted with proper sampling and with clearly defined terms and questions, it’s important to watch out for problems in the setting or wording that might produce inaccurate or dishonest responses. Dishonest responses are particularly likely when the survey concerns sensitive subjects, such as personal habits or income. For example, the question “Do you cheat on your income taxes?” is unlikely to elicit honest answers from those who cheat, especially if the setting does not guarantee complete confidentiality.
In other cases, even honest answers may not be accurate if the wording of questions invites bias. Sometimes just the order of the words in a question can affect the outcome. A poll conducted in Germany asked the following two questions:
• Would you say that traffic contributes more or less to air pollution than industry?
• Would you say that industry contributes more or less to air pollution than traffic?
With the first question, 45% answered traffic and 32% answered industry. With the second question, only 24% answered traffic while 57% answered industry. Thus, simply changing the order of the words traffic and industry dramatically changed the survey results.
❉EXAMPLE 7 Do You Want a Tax Cut?
The Republican National Committee commissioned a poll to find out whether Americans supported a tax-cut proposal. Asked whether they favored the tax cut, 67% of respondents answered yes. Should we conclude that Americans supported the proposal?
SOLUTION A question like “Do you favor a tax cut?” is biased because it does not give other options (much like the fallacy of limited choice discussed in Unit 1A). In fact, an independent poll conducted at the same time gave respondents a list of options for using surplus revenues. This poll found that 31% wanted the money devoted to Social Security, 26% wanted it used to reduce the national debt, and only 18% favored using it for a tax cut. (The remaining 25% of respondents chose a variety of other options.)
Now try Exercises 29–30.
By the Way
People are more likely to choose the item that comes first in a survey because of what psychologists call the availability error, the tendency to make judgments based on what is available in the mind. Professional polling organizations must be very careful to avoid this problem, sometimes by posing the question to some people in one order and to others in the opposite order.
Guideline 7: Check That Results Are Presented Fairly
Even when a statistical study is done well, it may be misrepresented in graphs or concluding statements. Researchers may occasionally misinterpret the results of their own studies or jump to conclusions that are not supported by the results, particularly when they have personal biases toward certain interpretations. In other cases, news reporters or others may misinterpret a survey or jump to unwarranted conclusions that make a story seem more spectacular. Misleading graphs are an especially common problem (see Unit 5D). In general, you should look for inconsistencies between the interpretation of a study (in pictures and words) and any actual data given with it.
❉EXAMPLE 8 Does the School Board Need a Statistics Lesson?
The school board in Boulder, Colorado, created a hubbub when it announced that 28% of Boulder school children were reading “below grade level,” and hence concluded that methods of teaching reading needed to be changed. The announcement was based on reading tests on which 28% of Boulder school children scored below the national average for their grade. Do these data support the board’s conclusion?
SOLUTION The fact that 28% of Boulder children scored below the national average for their grade implies that 72% scored at or above the national average. Thus, the school board’s ominous statement about students reading “below grade level” makes sense only if “grade level” means the national average score for a particular grade. This interpretation of “grade level” is curious because it means that half the students in the nation are always below grade level, no matter how high the scores. The conclusion that teaching methods needed to be changed was not justified by these data.
Now try Exercises 31–32.
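The point about "grade level" can be checked with a short sketch. It assumes that "average" is read as the median (with a mean, the share below average need not be exactly half), and the score list is hypothetical, invented only for illustration. Raising every score by the same amount leaves the share of students below the median unchanged.

```python
from statistics import median

def share_below_median(scores):
    """Fraction of scores that fall strictly below the median score."""
    m = median(scores)
    return sum(s < m for s in scores) / len(scores)

# Hypothetical test scores (not from the text), used only for illustration.
scores = [52, 61, 64, 70, 73, 78, 81, 85, 90, 96]

# The same students after every score rises by 20 points.
higher_scores = [s + 20 for s in scores]

print(share_below_median(scores))         # share below the median originally
print(share_below_median(higher_scores))  # share below the median after the rise
```

Both calls report the same share, which is why defining "grade level" as the national average guarantees that a large fraction of students will always appear to be below it.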
Guideline 8: Stand Back and Consider the Conclusions
Finally, even if a study seems reasonable according to all the previous guidelines, you should stand back and consider the conclusions. Ask yourself questions such as these:
• Did the study achieve its goals?
• Do the conclusions make sense?
• Can you rule out alternative explanations for the results?
• If the conclusions do make sense, do they have any practical significance?
❉EXAMPLE 9 Practical Significance
An experiment is conducted in which the weight losses of people who try a new “Fast Diet Supplement” are compared to the weight losses of a control group of people who try to lose weight in other ways. After eight weeks, the results show that the treatment group lost an average of 1/2 pound more than the control group. Assuming that it has no dangerous side effects, does this study suggest that the Fast Diet Supplement is a good treatment for people wanting to lose weight?
SOLUTION Compared to the average person’s body weight, the difference of 1/2 pound hardly matters at all. Thus, while the statistics in this case may be interesting, they don’t seem to have much practical significance.
Now try Exercises 33–36.
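To put the 1/2-pound difference in Example 9 in perspective, the following sketch expresses it as a percentage of body weight. The three body weights are hypothetical values chosen only for illustration; the text does not give any.

```python
def relative_difference(difference_lb, body_weight_lb):
    """Weight difference expressed as a fraction of body weight."""
    return difference_lb / body_weight_lb

DIFFERENCE_LB = 0.5  # extra average weight loss reported in Example 9

# Hypothetical body weights in pounds (assumptions, not data from the study).
for body_weight_lb in (120, 160, 200):
    share = relative_difference(DIFFERENCE_LB, body_weight_lb)
    print(f"{body_weight_lb} lb: {share:.2%} of body weight")
```

For any of these assumed weights the difference is well under 1% of body weight, which is consistent with the conclusion that the result has little practical significance.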
Extraordinary claims require extraordinary evidence.
—CARL SAGAN (1934–1996)
SUMMARY Eight Guidelines for Evaluating a Statistical Study
1. Identify the goal of the study, the population considered, and the type of study.
2. Consider the source, particularly with regard to whether the researchers may be biased.
3. Look for bias that may prevent the sample from being representative of the population.
4. Look for problems in defining or measuring the variables of interest, which can make it difficult to interpret results.
5. Watch out for confounding variables that can invalidate the conclusions of a study.
6. Consider the setting and the wording of questions in any survey, looking for anything that might tend to produce inaccurate or dishonest responses.
7. Check that results are presented fairly in graphs and concluding statements, since both researchers and media often create misleading graphics or jump to conclusions that the results do not support.
8. Stand back and consider the conclusions. Did the study achieve its goals? Do the conclusions make sense? Do the results have any practical significance?
EXERCISES 5B
QUICK QUIZ
Choose the best answer to each of the following questions. Explain your reasoning with one or more complete sentences.
1. You read about an issue that was subject to an observational study when clearly it should have been studied with a double-blind experiment. The results from the observational study are therefore
a. still valid, but a little less reliable.
b. valid, but only if you first correct for the fact that the wrong type of study was done.
c. essentially meaningless.
2. A study conducted by the oil company Exxon Mobil shows that there was no lasting damage from a large oil spill in Alaska. This conclusion
a. is definitely invalid, because the study was biased.
b. may be correct, but the potential for bias means that you should look very closely at how the conclusion was reached.
c. could be correct if it falls within the confidence interval of the study.
3. Consider a study designed to learn about the social networks of all college freshmen, in which the researchers randomly interviewed students living in on-campus dormitories. The way this sample was chosen means the study will suffer from
a. selection bias.
b. participation bias.
c. confounding variables.
4. The show American Idol selects winners based on votes cast by anyone who wants to vote. This means that the winner
a. is the person most Americans want to win.
b. may or may not be the person most Americans want to win, because the voting is subject to participation bias.
c. may or may not be the person most Americans want to win, because the voting should have been double-blind.
5. Consider an experiment in which you measure the weights of 6-year-olds. The variable of interest in this study is
a. the size of the sample.
b. the weights of 6-year-olds.
c. the ages of the children under study.
6. Consider a survey in which 1000 people are asked “How often do you go to the dentist?” The variable of interest in this study is
a. the number of visits to the dentist.
b. the 1000-person size of the sample.
c. the integers 0 through 5.
7. Imagine a survey of randomly selected people found that people who used sunscreen were more likely to have been sunburned in the past year. Which explanation for this result seems most likely?
a. Sunscreen is useless.
b. The people in the study all used sunscreen that had passed its expiration date.
c. People who use sunscreen are more likely to spend time in the sun.
8. You want to know whether people prefer Smith or Jones for mayor, and you are considering two possible ways to word the question. Wording X is “Do you prefer Smith or Jones for mayor?” Wording Y is “Do you prefer Jones or Smith for mayor?” (That is, the names are reversed in the two wordings.) The best approach is to
a. use Wording X for everyone.
b. use the same wording for everyone; it doesn’t matter whether it is Wording X or Wording Y.
c. use Wording X for half the people and Wording Y for the other half.
9. A self-selected survey is one in which
a. the people being surveyed decide which question to answer.
b. people decide for themselves whether to be part of the survey.
c. the people who design the survey are also the survey participants.
10. If a statistical study is carefully conducted in every possible way, then
a. its results must be correct.
b. we can have confidence in its results, but it is still possible that they are not correct.
c. we say that the study is perfectly biased.
REVIEW QUESTIONS
11. Briefly describe each of the eight guidelines for evaluating statistical studies. Give an example to which each guideline applies.
12. Describe and contrast selection bias and participation bias in sampling. Give an example of each.
13. What do we mean by variables of interest in a study?
14. What are confounding variables, and what problems can they cause?
DOES IT MAKE SENSE?
Decide whether each of the following statements makes sense (or is clearly true) or does not make sense (or is clearly false). Explain your reasoning.
15. The TV survey got more than 1 million phone-in responses, so it is clearly more valid than the survey by the professional pollsters, which involved interviews with only a few hundred people.
16. The survey of religious beliefs suffered from selection bias because the questionnaires were handed out only at Catholic churches.
17. My experiment proved beyond a doubt that vitamin C can reduce the severity of colds, because I controlled the experiment carefully for every possible confounding variable.
18. Everyone who jogs for exercise should try the new training regimen, because careful studies suggest it can increase your speed by 1%.
BASIC SKILLS & CONCEPTS
Would You Believe This Study? Exercises 19–30 each describe some aspect of a statistical study. Based solely on the information given in each case, decide whether you have any reason to doubt the results of the study. Explain your reasoning.
19. Researchers who want to assess the quality of school lunches in American elementary schools visit a school in Topeka, Kansas.
20. An experimental, double-blind study finds that people who eat more fast food are more likely to feel tired throughout the day.
21. The staff at the conservative Heritage Foundation conducted a study to find out what people think of the new Democratic tax plan.
22. A study financed by a major pharmaceutical company finds that its new drug is no more effective against high blood pressure than older, less expensive drugs.
23. A TV talk show host asks the TV audience, “Do you support a national speed limit of 55 mph?” and asks people to vote by telephone at a toll-free number.
24. In trying to determine whether their candidate for governor has a chance of defeating the incumbent Democrat, the Republican Party conducts a survey of 1000 of its members, selected at random.
25. A study claims to have found that Europeans lead more fulfilling lives than Americans.
26. A government study finds, based on people who had their tax returns audited, that 15% of taxpayers understate their income.
27. In a study designed to determine whether people who wear helmets while riding a bicycle have fewer accidents, researchers tracked 500 riders with helmets for one month.
28. A study seeks to learn about obesity among children. The researchers monitor the eating and exercise habits of the children in the study, carefully recording everything they eat and all their activity.
29. A consumer pollster for soft drinks asked customers in a supermarket, “Do you prefer Zinger sodas or some other brand?”
30. To gauge public opinion on whether there should be a constitutional amendment to ban flag burning, a survey asked people, “Do you support the American flag?”
Would You Believe This Claim? Exercises 31–36 each describe a claim based on a statistical study. Based solely on the information given in each case, decide whether you have any reason to doubt the claim. Explain your reasoning.
31. A study involving 200 long-distance runners claimed that a new energy drink is preferable for all athletes.
32. Citing statistical data indicating that half the children in the school district are of above average weight, the School Board claims to have proved that new exercise classes should be mandated for everyone.
33. The U.S. Census Bureau claims that a larger proportion of U.S. residents than ever have earned high school and college diplomas.
34. Based on data showing that a new cold treatment can shorten the average duration of a cold from 7 days to 6.8 days, the company that sells the treatment claims that everyone should use it.
35. A study of 20 nations (in the Canadian Medical Association Journal) discovered that Germany has the most mean annual visits to a doctor (8.5), while Finland has the fewest (3.2).
36. Researchers, monitoring the health of 200 people who take at least two pills per day, claim that people who take pills regularly have better health.
FURTHER APPLICATIONS
Bias. Exercises 37–44 present situations in which bias may be an issue. Describe one potential source of bias in the situation, and briefly discuss whether the bias should affect your view of the situation.
37. People visiting the Web site SaveTheAnimals.com can vote on whether or not euthanasia of prairie dogs is acceptable.
38. Market researchers conduct a survey at a supermarket on a weekday between 10:00 a.m. and noon to determine what fraction of customers use coupons.
39. An exit poll designed to predict the winner of a local election uses interviews with everyone who votes between 7:00 and 7:30 a.m.
40. An exit poll designed to predict the winner of a national election uses interviews with randomly selected voters in New York.
41. In order to determine the opinions of people in the 18- to 24-year age group on controlling illegal immigration, researchers survey a random sample of 1000 National Guard members in this age group.
42. A college mails survey forms to all current seniors, asking for the students’ choice of their all-time best and worst professor. Students are asked to return the survey in the campus mail.
43. Planned Parenthood members are surveyed to determine whether American adults prefer abstinence, counseling and education, or morning-after pills for high school students.
44. Scientists working for Greenpeace (which opposes genetically engineered crops) conduct a study to determine whether Monsanto’s new, genetically engineered soybean poses any threat to the environment.
45. It’s All in the Wording. Princeton Survey Research Associates did a study for Newsweek magazine illustrating the effects of wording in a survey. Two questions were asked:
• Do you personally believe that abortion is wrong?
• Whatever your own personal view of abortion, do you favor or oppose a woman in this country having the choice to have an abortion with the advice of her doctor?
To the first question, 57% of the respondents replied yes, while 36% responded no. In response to the second question, 69% of the respondents favored the choice, while 24% opposed the choice. Discuss why the two questions produced seemingly contradictory results. How could the results of the questions be used selectively by various groups?
46. Tax or Spend? A Gallup poll asked the following two questions:
• Do you favor a tax cut or “increased spending on other government programs”? Result: 75% for tax cut.
• Do you favor a tax cut or “spending to fund new retirement savings accounts, as well as increased spending on education, defense, Medicare and other programs”? Result: 60% for the spending.
Discuss why the two questions produced seemingly contradictory results. How could the results of the questions be used selectively by various groups?
Stat-Bytes. Politicians must make their political statements (often called sound-bytes) very short because the attention span of listeners is so short. A similar effect occurs in reporting statistical news. Major statistical studies are often reduced to one or two sentences. The summaries of statistical reports in Exercises 47–52 are taken from various news sources. Discuss what crucial information is missing and what more you would want to know before you acted on the report.
47. The Atlantic, summarizing a Federal Highway Administration report, says that the worst traffic bottleneck in the United States is the U.S. 101/I-405 interchange, which generates 27,144 hours of delay every year.
48. CNN reports on a Zagat Survey of America’s Top Restaurants which found that “only nine restaurants achieved a rare 29 out of a possible 30 rating and none of those restaurants is in the Big Apple.”
49. USA Today reports that two-thirds of adults say that cell phone use during a dinner for two at a nice restaurant is unacceptable.
50. Only 2% of the estates of Americans who died in the past year paid estate taxes, while 60% of Americans favor repealing estate taxes.
51. Time Magazine reports that 28% of Americans polled believe the Bible is literally true, down from 38% in 1976.
52. Thirty percent of newborns in India would qualify for intensive care if they were born in the United States.
Accurate Headlines? Exercises 53–55 give a headline and a brief description of the statistical news story that accompanied the headline. In each case, discuss whether the headline accurately represents the story.
53. Headline: “Drugs shown in 98 percent of movies”
Story summary: A “government study” claims that drug use, drinking, or smoking was depicted in 98% of the top movie rentals (Associated Press).
54. Headline: “Sex more important than jobs”
Story summary: A survey found that 82% of 500 people interviewed by phone ranked a satisfying sex life as important or very important, while 79% ranked job satisfaction as important or very important (Associated Press).
55. Headline: “Grape juice may fight disease”
Story summary: A study of 15 people, partially funded by Welch Foods, found that grape juice helps to expand blood vessels and increase the levels of HDL cholesterol. Both constricted blood vessels and low HDL levels are risk factors for heart disease (Milwaukee Journal Sentinel).
56. Exercise and Dementia. A recent study in the Annals of Internal Medicine was summarized by the Associated Press, in part, as follows:
The study followed 1740 people aged 65 and older who showed no signs of dementia at the outset. The participants’ health was evaluated every two years for six years. Out of the original pool, 1185 were later found to be free of dementia, 77 percent of whom reported exercising three or more times a week; 158 people showed signs of dementia, only 67 percent of whom said they exercised that much. The rest either died or withdrew from the study.
a. How many people completed the study?
b. Fill in the following two-way table (with numbers of individuals), using the figures given in the above passage:

               Exercise    No exercise    Total
Dementia
No dementia
Total

c. Draw a Venn diagram with two overlapping circles to illustrate the data.
WEB PROJECTS
Find useful links for Web Projects on the text Web site: www.aw.com/bennett-briggs
57. Polling Organization. Go to the Web site for a major professional polling organization. Study results from a recent poll, and evaluate the poll according to the guidelines in this section.
58. Harper’s Index. Go to the Web site for the Harper’s Index and study a few of the recently quoted statistics. Be sure to select the option on the page that allows you to see the sources for the statistics. Choose three statistics that you find particularly interesting, and discuss whether, in accord with the guidelines given in this section, you believe them.
IN THE NEWS
59. Applying the Guidelines. Find a recent newspaper article or television report about a statistical study on a topic that you find interesting. Write a short report applying each of the eight guidelines given in this section. (Some of the guidelines may not apply to the particular study you are analyzing. In that case, explain why the guideline is not applicable.)
60. Believable Results. Find a recent news report about a statistical study whose results you believe are meaningful and important. In one page or less, summarize the study and explain why you find it believable.
61. Unbelievable Results. Find a recent news report about a statistical study whose results you don’t believe are meaningful or important. In one page or less, summarize the study and explain why you don’t believe its claims.
62. Legal Experts. Find a news report concerning a major ongoing trial. Find out whether any of the “expert witnesses” are being paid by either side. Based on what you learn, describe whether you think the experts are giving biased testimony.
63. Biased Questioning? Find a recent news report of responses to a single question in an opinion poll. State the exact words of the question and the results of the poll. Analyze the question and the reported results for potential biases. At the end of your analysis, state whether you believe the results, and defend your opinion.
UNIT 5C Statistical Tables and Graphs
Whether you look at a newspaper, a corporate annual report, or a government study, you are almost sure to see tables and graphs of statistical data. Some of these tables and graphs are simple; others can be quite complex. Some make it easy to understand the data; others may be confusing or even misleading. In this unit, we’ll investigate some of the basic principles behind tables and graphs, preparing for more complex graphics in Unit 5D.
Frequency Tables
A teacher makes the following list of the grades she gave to her 25 students on an essay:
A C C B C D C C F D C C C B B A B D B A A B F C B
This list contains all the raw data, but it isn’t easy to read. A better way to display these data is with a frequency table, a table showing the number of times, or frequency, that each grade appears (Table 5.1). The five possible grades are called the categories for the table.
There are two common variations on the idea of frequency. The relative frequency for a category expresses its frequency as a fraction or percentage of the total. For example, 4 of the 25 students received A grades, so the relative frequency for A grades is 4/25, or 16%. The total relative frequency must always be 1, or 100%. However, because of rounding, you may sometimes find that the relative frequencies in a table or chart add up to slightly more or less than 100%.
Time out to think
Briefly explain why the total relative frequency should always be 1, or 100%.
The cumulative frequency is the number of responses in a particular category and all preceding categories. For example, the cumulative frequency for grades of C and above is 20, because 20 students received grades of either A, B, or C.
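The frequency and relative frequency described above can be computed directly from the teacher's raw list. The following is a minimal sketch using Python's standard `collections.Counter`; the variable names are illustrative choices, not part of the text.

```python
from collections import Counter

# Essay grades for the 25 students, copied from the teacher's list.
grades = "A C C B C D C C F D C C C B B A B D B A A B F C B".split()

# Frequency: the number of times each grade appears.
frequency = Counter(grades)
total = sum(frequency.values())

# Relative frequency: each frequency as a fraction of the total.
relative = {grade: frequency[grade] / total for grade in "ABCDF"}

for grade in "ABCDF":
    print(f"{grade}  {frequency[grade]:2d}  {relative[grade]:.0%}")
```

Summing the values in `relative` gives 1 (up to floating-point rounding), which matches the statement that the total relative frequency is always 100%.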
DEFINITION
A basic frequency table has two columns:
• The first column lists all the categories of data.
• The second column lists the frequency of each category, which is the number of times each category appears in the data set.
Additional columns may include relative frequency (frequency expressed as a fraction or percentage of the total) or cumulative frequency (total of frequencies for the given category and all previous categories).

TABLE 5.1
Grade    Frequency
A        4
B        7
C        9
D        3
F        2
Total    25

❉EXAMPLE 1 Relative and Cumulative Frequency
Add to Table 5.1 columns showing the relative and cumulative frequencies.
SOLUTION Table 5.2 shows the new columns and calculations.

TABLE 5.2
Grade    Frequency    Relative Frequency    Cumulative Frequency
A        4            4/25 = 16%            4
B        7            7/25 = 28%            7 + 4 = 11
C        9            9/25 = 36%            9 + 7 + 4 = 20
D        3            3/25 = 12%            3 + 9 + 7 + 4 = 23
F        2            2/25 = 8%             2 + 3 + 9 + 7 + 4 = 25
Total    25           1 = 100%              25

Now try Exercises 25–26.
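The cumulative column asked for in Example 1 is a running total of the frequency column. As a minimal sketch, the standard `itertools.accumulate` function produces that running total from the Table 5.1 frequencies; the variable names are illustrative.

```python
from itertools import accumulate

# Categories and frequencies from Table 5.1, in table order (A first, F last).
grades = ["A", "B", "C", "D", "F"]
frequency = [4, 7, 9, 3, 2]
total = sum(frequency)

# Running totals: each entry adds its own category to all preceding ones.
cumulative = list(accumulate(frequency))

for grade, f, c in zip(grades, frequency, cumulative):
    print(f"{grade}  {f:2d}  {f}/{total} = {f / total:.0%}  {c:2d}")
```

The last cumulative value always equals the total number of data values, which is a quick check that no category was missed.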
Data Types
Essay grades represent subjective ratings, not actual measurements or counts. We say that the grade categories are qualitative, because they represent qualities such as bad or good. In contrast, scores on a multiple-choice exam are quantitative, because they represent an actual count (or measurement) of the number of correct answers. As we’ll see shortly, distinguishing between qualitative and quantitative data can be useful in creating tables or graphs.
DATA TYPES
Qualitative data describe qualities or nonnumerical categories.
Quantitative data represent counts or measurements.
❉EXAMPLE 2 Data Types
Classify each of the following types of data as either qualitative or quantitative.
a. Brand names of shoes in a consumer survey
b. Heights of students
c. Audience ratings of a film on a scale of 1 to 5, where 5 means excellent
SOLUTION
a. Brand names are nonnumerical categories, so they are qualitative data.
b. Heights are measurements, so they are quantitative data.
c. Although the film rating categories involve numbers, the numbers represent subjective opinions about a film, not counts or measurements. Thus, they are qualitative data, despite being stated as numbers.
Now try Exercises 27–34.
Time out to think
Give another example in which numbers are used to represent qualitative data rather than quantitative data.
Binning Data
When we deal with quantitative data categories, it’s often useful to group, or bin, the data into categories that cover a range of possible values. For example, in a table of income levels, it might be useful to create bins of $0 to $20,000, $20,001 to $40,000, and so on. In this case, the frequency of each bin is simply the number of people with incomes in that bin.
❉EXAMPLE 3 Binned Exam Scores
Consider the following set of 20 scores from a 100-point exam:
76 80 78 76 94 75 98 77 84 88 81 72 91 72 74 86 79 88 72 75
Determine appropriate bins and make a frequency table. Include columns for relative and cumulative frequency, and interpret the cumulative frequency for this case.
SOLUTION The scores range from 72 to 98. One way to group the data is with 5-point bins. The first bin represents scores from 95 to 99, the second bin represents scores from 90 to 94, and so on. Note that there is no overlap between bins. We then count the frequency (the number of scores) in each bin. For example, only 1 score is in bin 95 to 99 (the high score of 98) and 2 scores are in bin 90 to 94 (the scores of 91 and 94). Table 5.3 shows the complete frequency table. In this case, we interpret the cumulative frequency of any bin to be the total number of scores in or above that bin. For example, the cumulative frequency of 6 for the bin 85 to 89 means that 6 scores are either between 85 and 89 or higher than 89.
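The binning in this solution can be sketched in a few lines of Python. The sketch assumes the same 5-point bins (70 to 74, 75 to 79, and so on) and lists the highest bin first; `bin_range` and the other names are illustrative helpers, not anything defined in the text.

```python
# The 20 exam scores from Example 3.
scores = [76, 80, 78, 76, 94, 75, 98, 77, 84, 88,
          81, 72, 91, 72, 74, 86, 79, 88, 72, 75]

BIN_WIDTH = 5  # 5-point bins, as chosen in the solution

def bin_range(score):
    """Return the (low, high) limits of the 5-point bin containing score."""
    low = (score // BIN_WIDTH) * BIN_WIDTH
    return low, low + BIN_WIDTH - 1

# Count how many scores fall in each bin.
counts = {}
for score in scores:
    key = bin_range(score)
    counts[key] = counts.get(key, 0) + 1

# Build the table rows from the highest bin down, keeping a running total.
# Note: a bin containing no scores would simply not appear in this sketch.
rows = []
running_total = 0
for low, high in sorted(counts, reverse=True):
    frequency = counts[(low, high)]
    running_total += frequency
    rows.append((f"{low} to {high}", frequency,
                 frequency / len(scores), running_total))

for label, frequency, relative, cumulative in rows:
    print(f"{label}  {frequency:2d}  {relative:.0%}  {cumulative:2d}")
```

Because the rows run from the highest bin down, each running total counts the scores in or above that bin, which is the interpretation of cumulative frequency used in the solution.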
Bar Graphs and Pie ChartsBar graphs and pie charts are commonly used to show data whenthe categories are qualitative. You are probably familiar with both,but let’s review the basic ideas.
Consider the essay grade data in Table 5.1. A bar graph wouldshow each category with a bar whose length corresponded to itsfrequency. If you make a bar graph by hand (as opposed to witha computer), you should measure the bar lengths carefully tomake sure they correctly correspond to the frequencies. InFigure 5.3, for example, the vertical axis is marked with frequen-cies centimeter apart. Thus, the bar for A grades is 2 centime-ters long, because the frequency of A grades is 4. Note that theleft side of the bar graph in Figure 5.3 is marked with frequency,while the right side is marked with relative frequency. As youcan see, bar graphs make it easy to display both frequenciessimultaneously.
In contrast, pie charts are used primarily for relative frequen-cies, because the total pie must always represent the total relative
12
TABLE 5.3 Frequency Table for Binned Exam Scores
Scores      Frequency   Relative Frequency   Cumulative Frequency
95 to 99    1           0.05 = 5%            1
90 to 94    2           0.10 = 10%           3
85 to 89    3           0.15 = 15%           6
80 to 84    3           0.15 = 15%           9
75 to 79    7           0.35 = 35%           16
70 to 74    4           0.20 = 20%           20
Total       20          1.00 = 100%          20
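The relative and cumulative frequency columns of Table 5.3 follow directly from the frequency column. The short Python sketch below is only an illustration of that arithmetic; the variable names are our own, and the bins are listed highest first so that the running total matches the "in or above that bin" interpretation used in the solution.

```python
from itertools import accumulate

# Bins and frequencies copied from Table 5.3, highest bin first.
bins = ["95 to 99", "90 to 94", "85 to 89", "80 to 84", "75 to 79", "70 to 74"]
frequencies = [1, 2, 3, 3, 7, 4]

total = sum(frequencies)

# Relative frequency: each bin's share of all the scores.
relative = [f / total for f in frequencies]

# Cumulative frequency: running total of scores in or above each bin.
cumulative = list(accumulate(frequencies))

for label, f, r, c in zip(bins, frequencies, relative, cumulative):
    print(f"{label:>8}  {f:>2}  {r:.2f} = {r:.0%}  {c:>2}")
```

Because the running total is taken from the top bin downward, the last cumulative value must equal the total number of scores, which is a quick check on the table.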
[Figure 5.3 image: bar graph titled "Essay Grade Data". Horizontal axis: Grade (A, B, C, D, F). Left vertical axis: Frequency of grade, 0 to 10. Right vertical axis: Relative frequency, marked in steps of 4%.]
FIGURE 5.3 Bar graph for the essay grade data in Table 5.1.
Now try Exercises 35–36. ➽
IMPORTANT LABELS FOR GRAPHS
Title/caption: The graph should have a title or caption (or both) that explains what is being shown and, if applicable, lists the source of the data.
Vertical scale and title: Numbers along the vertical axis should clearly indicate the scale. The numbers should line up with the tick marks, the marks along the axis that precisely locate the numerical values. Include a label that describes the variable shown on the vertical axis.
Horizontal scale and title: The categories should be clearly indicated along the horizontal axis. (Tick marks may not be necessary for qualitative data, but should be included for quantitative data.) Include a label that describes the variable shown on the horizontal axis.
Legend: If multiple data sets are displayed on a single graph, include a legend or key to identify the individual data sets.
frequency of 100%. The size of each wedge is proportional to the relative frequency of the category it represents. Figure 5.4 shows a pie chart for the essay grade data. To make comparisons easier, relative frequencies are often written on pie chart wedges.
[Figure 5.4 image: pie chart with wedges A 16%, B 28%, C 36%, D 12%, F 8%.]
FIGURE 5.4 Pie chart for the essay grade data in Table 5.1.
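Since each wedge is proportional to its relative frequency, the wedge angle is simply that share of the full 360-degree circle. The following Python sketch illustrates the conversion for the percentages shown in Figure 5.4; the dictionary names are hypothetical and chosen only for this example.

```python
# Relative frequencies read from Figure 5.4 (percent of all essay grades).
percentages = {"A": 16, "B": 28, "C": 36, "D": 12, "F": 8}

# The whole pie must represent a total relative frequency of 100%.
assert sum(percentages.values()) == 100

# Each wedge takes the same share of 360 degrees as its share of the data.
angles = {grade: 360 * pct / 100 for grade, pct in percentages.items()}

for grade, angle in angles.items():
    print(f"{grade}: {percentages[grade]}% -> {angle:.1f} degrees")
```

The same check that the percentages total 100% is a useful habit when reading any pie chart.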
Nowadays, most people make graphs with the aid of computers that measure bar lengths or wedge sizes automatically. However, you must still specify any labels or axis marks you want on a graph. This labeling is extremely important: Without proper labels, a graph is meaningless. The following summary lists the important labels for graphs. Of course, not all labels are necessary in all cases. For example, pie charts do not require a vertical or horizontal scale. Notice how these rules were applied in Figure 5.3.
❉EXAMPLE 4 Carbon Dioxide Emissions
Carbon dioxide is released into the atmosphere primarily by the combustion of fossil fuels (oil, coal, natural gas). Table 5.4 lists the eight countries that emit the most carbon dioxide each year. Make bar graphs for the total emissions and the emissions per person. Put the bars in descending order of size.
Time out to think
Note that the two bar graphs in Figure 5.5 do not show the countries in the same order. Why not? What can we learn by comparing the two graphs? Explain.
[Figure 5.5 image: two bar graphs, each with the countries along the horizontal axis and bars in descending order. (a) Total CO2 Emissions; vertical axis: CO2 emissions (millions of metric tons of carbon), ticks from 0 to 1500. (b) Per Person CO2 Emissions; vertical axis: Per capita CO2 emissions (metric tons of carbon), ticks from 0 to 6.]
FIGURE 5.5 Bar graphs for (a) total carbon dioxide emissions by country and (b) per person carbon dioxide emissions by country. Now try Exercises 37–38. ➽
HISTORICAL NOTE
A bar graph with the bars in descending order is often called a Pareto chart, after Italian economist Vilfredo Pareto (1848–1923).
TABLE 5.4 The World’s Eight Leading Emitters of Carbon Dioxide
Country          Total Carbon Dioxide Emissions          Per Person Carbon Dioxide Emissions
                 (millions of metric tons of carbon)     (metric tons of carbon)
United States    1582                                    5.4
China            966                                     0.7
Russia           438                                     3.0
Japan            329                                     2.6
India            280                                     0.3
Germany          230                                     2.8
Canada           164                                     5.2
United Kingdom   154                                     2.6
Source: U.S. Department of Energy, based on 2003 emissions.
SOLUTION The categories are the countries. Because country names are qualitative data, a bar graph is appropriate.
The values for total carbon dioxide emissions go from 154 to 1582 (millions of tons), so a range of 0 to 1600 makes a good choice for the vertical scale. Each bar’s height corresponds to its data value, and we label the category (country) under the bar. Figure 5.5a shows the bar graph for total emissions, with bars in order of decreasing height.
The data values for per person emissions range from 0.3 to 5.4 (tons), so a range of 0 to 6 will work for this vertical scale. Figure 5.5b shows the bar graph, again with bars placed in order of descending height.
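Putting bars in descending order amounts to sorting the table by the column being graphed. The Python sketch below is one possible way to do this with the Table 5.4 values; the tuple layout and variable names are our own assumptions for illustration.

```python
# (country, total emissions in millions of metric tons of carbon,
#  per person emissions in metric tons of carbon), copied from Table 5.4.
emissions = [
    ("United States", 1582, 5.4),
    ("China", 966, 0.7),
    ("Russia", 438, 3.0),
    ("Japan", 329, 2.6),
    ("India", 280, 0.3),
    ("Germany", 230, 2.8),
    ("Canada", 164, 5.2),
    ("United Kingdom", 154, 2.6),
]

# Sort once per graph: largest value first gives the left-to-right bar order.
by_total = sorted(emissions, key=lambda row: row[1], reverse=True)
by_person = sorted(emissions, key=lambda row: row[2], reverse=True)

total_order = [row[0] for row in by_total]
person_order = [row[0] for row in by_person]

print("Total emissions order:     ", total_order)
print("Per person emissions order:", person_order)
```

Japan and the United Kingdom are tied at 2.6 metric tons per person, so either may come first in the second graph. The two orderings differ because a country with a large population can have high total emissions but low emissions per person.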
❉EXAMPLE 5 Simple Pie Chart
Among the registered voters in Rochester County, 25% are Democrats, 25% are Republicans, and 50% are Independents. Make a pie chart showing the breakdown of party affiliations in Rochester County.
SOLUTION The wedge sizes should correspond to the relative frequencies. Thus, the wedges for Republicans and Democrats each occupy one-fourth of the pie, while the wedge for Independents occupies the remaining half of the pie (Figure 5.6). Note the importance of clear labeling.
[Figure 5.6 image: pie chart titled "Registered Voters in Rochester County" with wedges Independent 50%, Democrat 25%, Republican 25%.]
FIGURE 5.6 Party affiliations of registered voters in Rochester County.
❉EXAMPLE 6 Student Majors
Figure 5.7 is a pie chart showing planned major areas for first-year college students. Make a bar graph showing the same data, with the bars in order of decreasing size. What are the three most popular major areas? Comment on the relative ease with which this question can be answered with the pie chart and the bar graph.
SOLUTION Figure 5.8 shows the bar graph for the data. Note that, because we have only relative frequency data from the pie chart, we can show only relative frequencies on the bar graph. This bar graph makes it immediately obvious that the three most popular major areas are business (16.7%), arts and humanities (12.1%), and professional (11.6%). (“Professional” includes fields with professional licensing, such as architecture, nursing, and pharmacy.) In contrast, it takes a fair amount of study of the pie chart before we can easily list the three most popular major areas.
[Figure 5.7 image: pie chart titled "What Students Expect to Major In" with wedges Business 16.7%, Arts and Humanities 12.1%, Professional 11.6%, Education 11.0%, Social Sciences 10.0%, Other Fields 9.9%, Engineering 8.7%, Undecided 8.3%, Biological Sciences 6.6%, Physical Sciences 2.6%, Technical 2.1%.]
FIGURE 5.7 Planned major areas for first-year college students. Source: The Chronicle of Higher Education.
Now try Exercises 39–40. ➽
[Figure 5.8 image: bar graph titled "What Students Expect to Major In". Horizontal axis: major areas, with bars in order of decreasing size. Vertical axis: Percentage of students, 0 to 18.]
FIGURE 5.8 Bar graph for the data in Figure 5.7.
[Figure 5.9 image: two panels, each titled "Exam Scores". Horizontal axis: Scores, 70 to 100 in steps of 5. Vertical axis: Frequency, 0 to 8. Panel (a) is a histogram and panel (b) is a line chart.]
FIGURE 5.9 (a) Histogram for the data in Table 5.3. (b) Line chart for the same data.
Time out to think
Example 6 discussed an advantage of a bar graph over a pie chart for showing the data concerning major areas. Do you think the pie chart has any advantages over the bar graph? If so, what?
Histograms and Line Charts
For quantitative data categories, the two most common types of graphics are histograms and line charts. Figure 5.9a shows a histogram for the binned exam data of Table 5.3. Figure 5.9b shows a line chart for the same data.
Now try Exercises 41–42. ➽
[Figure 5.10 image: line chart titled "U.S. Homicide Rate". Horizontal axis: Year, 1960 to 2004. Vertical axis: Homicides per 100,000 people, with ticks up to 12.]
FIGURE 5.10 U.S. homicide rate per 100,000 people. Source: FBI Uniform Crime Reports.
A histogram is essentially a bar graph in which the data categories are quantitative. Thus, the bars on a histogram must follow the natural order of the numerical categories. In addition, the widths of histogram bars have a specific meaning. For example, the width of each bar in Figure 5.9a represents 5 points on the exam. Because there are no gaps between the categories, the bars on a histogram touch each other.
A line chart serves the same basic purpose as a histogram, but instead of using bars, a line chart connects a series of dots. When data are binned, the dot is placed at the center of each bin. Histograms and line charts are often used to show how some variable changes with time. For example, the line chart in Figure 5.10 shows how the U.S. homicide rate has changed with time. The categories are time intervals. In this case, each bin represents a year in the data. Histograms and line charts with time on the horizontal axis are often called time-series diagrams.
❉EXAMPLE 7 Oscar-Winning Actresses
Table 5.5 shows the ages of 34 recent Academy Award–winning actresses at the time when they won their award. Make a histogram and a line chart to display these data. Discuss the results.
DEFINITIONS
A histogram is a bar graph for quantitative data categories. The bars have a natural order and the bar widths have specific meaning.
A line chart shows the data value for each category as a dot, and the dots are connected with lines. For each dot, the horizontal position is the center of the bin it represents and the vertical position is the data value for the bin.
A time-series diagram is a histogram or line chart in which the horizontal axis represents time.
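The rule for placing line chart dots can be written out for the binned exam scores of Table 5.3. The Python sketch below is only an illustration; it assumes each 5-point bin is drawn as the interval from its lower edge up to the next bin's lower edge (for example, 70 up to 75), as the axis in Figure 5.9 suggests, and the variable names are our own.

```python
# Lower edges of the 5-point bins in Table 5.3, lowest bin first,
# with the matching frequencies from the same table.
bin_width = 5
lower_edges = [70, 75, 80, 85, 90, 95]
frequencies = [4, 7, 3, 3, 2, 1]

# Each dot sits at the horizontal center of its bin,
# at a height equal to the bin's frequency.
dots = [(edge + bin_width / 2, f) for edge, f in zip(lower_edges, frequencies)]

for x, y in dots:
    print(f"dot at score {x}, frequency {y}")
```

Connecting these dots in order from left to right gives the line chart; drawing a bar of width 5 over each bin instead gives the histogram.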
Technical Note
Different books define the terms histogram and bar graph differently. In this book, a bar graph is any graph that uses bars, and histograms are bar graphs used for quantitative data categories.
TABLE 5.5
Age      Number of Actresses
20–29    7
30–39    15
40–49    6
50–59    1
60–69    3
70–79    1
80–89    1
SOLUTION The fact that the categories are 10-year bins makes the data quantitative. Thus, a histogram is appropriate. Figure 5.11a shows the histogram. The bars touch one another because there are no gaps between the categories.
Figure 5.11b shows the same data as a line chart. The histogram is also included to show how it relates to the line chart. In looking at these data, we see that actresses are most likely to win Oscars when they are fairly young.
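The observation that winners tend to be fairly young can be checked against the counts in Table 5.5. The Python sketch below is a rough illustration with variable names of our own choosing; the cutoff of age 40 is an assumption made only for this example.

```python
# Age bins and counts copied from Table 5.5.
ages = ["20–29", "30–39", "40–49", "50–59", "60–69", "70–79", "80–89"]
counts = [7, 15, 6, 1, 3, 1, 1]

total = sum(counts)

# The tallest histogram bar is the bin with the largest count.
most_common = ages[counts.index(max(counts))]

# Share of winners in the two youngest bins (under age 40).
under_40 = counts[0] + counts[1]
share_under_40 = under_40 / total

print(f"Total actresses: {total}")
print(f"Most common age bin: {most_common}")
print(f"Under 40: {under_40} of {total} ({share_under_40:.0%})")
```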
[Figure 5.11 image: two panels, each titled "Ages of 34 Academy Award–Winning Actresses". Horizontal axis: Age at time of award, 10 to 90. Vertical axis: Number of actresses, 0 to 20. Panel (a) is a histogram and panel (b) is a line chart with the histogram overlaid.]
[Figure 5.12 image: line chart titled "MARKET GAUGE: COMPARING INVESTMENTS", with the note "How $100 invested 12 weeks ago in stocks (measured by the S.&P. 500), bonds (Lehman Treasury Bond Index) and gold would have fared through yesterday." Lines: Stocks, Bonds, Gold. Horizontal axis: weekly dates from July 7 to Sept. 29. Vertical axis: value, $95 to $105.]
FIGURE 5.12
HISTORICAL NOTE
Gold was once considered to be a solid investment and an important part of any investment portfolio. However, gold prices have languished in recent decades. In 2006, gold was worth only about $650 per ounce, much less than its inflation-adjusted value of more than $2000 per ounce in 1980.
FIGURE 5.11 (a) Histogram for ages of 34 recent Academy Award–winning actresses. (b) Line chart for the same data, with histogram overlaid for comparison. Now try Exercises 43–44. ➽
❉EXAMPLE 8 Reading a Time-Series Diagram
Figure 5.12 shows a time-series line chart of stock, bond, and gold prices over a 12-week period. Suppose that, on July 7, you invested $100 in a stock fund that tracks the S&P 500, $100 in a bond fund that follows the Lehman Index, and $100 in gold. If you sold all three funds on August 4, how much did you gain or lose?
SOLUTION The graph shows that the $100 in the stock fund would have been worth about $101 on August 4. The $100 bond investment would have declined in value to about $96. The gold investment would have held its initial value of $100. Thus, on August 4, your complete portfolio would have been worth
$101 + $96 + $100 = $297.
You would have lost $3 on your total investment of $300. Now try Exercises 45–46. ➽
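The portfolio arithmetic in this solution can be laid out step by step. The Python sketch below simply repeats the calculation; the August 4 values are the approximate readings from Figure 5.12 quoted in the solution, and the dictionary names are hypothetical.

```python
# Amount invested in each fund on July 7 (dollars).
initial = {"stocks": 100, "bonds": 100, "gold": 100}

# Approximate values on August 4, as read from Figure 5.12 in the solution.
aug4 = {"stocks": 101, "bonds": 96, "gold": 100}

invested = sum(initial.values())
worth = sum(aug4.values())
change = worth - invested  # negative means a loss

print(f"Invested: ${invested}")
print(f"Worth on August 4: ${worth}")
print(f"Gain or loss: ${change}")
```

Because the values are read from a graph, they are estimates, so the final gain or loss is approximate as well.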
EXERCISES 5C
QUICK QUIZ
Choose the best answer to each of the following questions. Explain your reasoning with one or more complete sentences.
1. In a class of 100 students, 25 students received a grade of B. What was the relative frequency of a B grade?
a. 25
b. 0.25
c. It cannot be calculated with the information given.
2. For the class described in Exercise 1, what was the cumulative frequency of a grade of B or above?
a. 25
b. 0.25
c. It cannot be calculated with the information given.
3. Which of the following is an example of qualitative data?
a. waist sizes in inches b. ratings of restaurants
c. meal costs at restaurants
4. The sizes of the wedges in a pie chart tell you
a. the number of categories in the pie chart.
b. the frequencies of the categories in the pie chart.
c. the relative frequencies of the categories in the pie chart.
5. You have a table listing ten tourist attractions and their annual numbers of visitors. Which type of display would be most appropriate for these data?
a. a bar graph b. a pie chart c. a line chart
6. Where should you put the names of the ten tourist attractions when you make your display of the data described in Exercise 5?
a. They should be in the title of the display.
b. They should be in alphabetical order along the vertical axis.
c. They should be listed along the horizontal axis.
7. You have a list of the GPAs of 100 college graduates, precise to the nearest 0.001. You want to make a frequency table for these data. A good first step would be to
a. group all the data into bins 0.2 of a grade point wide.
b. draw a pie chart for the 100 individual GPAs.
c. count how many people have identical GPAs.
8. You have a list of the average gasoline price for each month during the past year. Which type of display would be most appropriate for these data?
a. a bar graph b. a pie chart c. a line chart
9. A histogram is
a. a graph that shows how some quantity has changed through history.
b. a graph that shows cumulative frequencies.
c. a bar chart for quantitative data.
10. You have a histogram and you want to convert it into a line chart. A good first step would be to
a. make a list of all the categories in alphabetical order.
b. place a dot at the top of each bar, in the center of the bar.
c. calculate all the relative frequencies that you can read from the histogram.
REVIEW QUESTIONS
11. What is a frequency table? Explain what we mean by the categories and frequencies. What do we mean by relative frequency? What do we mean by cumulative frequency?
12. What is the distinction between qualitative data and quantitative data? Give a few examples of each.
13. What is the purpose of binning? Give an example in which binning is useful.
14. What two types of graphs are most common when the categories are qualitative data? Describe the construction of each.
15. Describe the importance of labeling on a graph, and briefly discuss the kinds of labels that should be included on graphs.
16. What two types of graphs are most common when the categories are quantitative data? Describe the construction of each.
DOES IT MAKE SENSE?
Decide whether each of the following statements makes sense (or is clearly true) or does not make sense (or is clearly false). Explain your reasoning.
17. I made a frequency table with two columns, one labeled State and one labeled State Capital.
18. The relative frequency of B grades in our class was 0.3.
19. Your bar graph must be wrong, because your bars are wider than the ones shown on the teacher’s answer key.
20. Your bar graph must be wrong, because it shows different frequencies than the ones shown on the teacher’s answer key.
21. Your pie chart must be wrong, because you have the 45% frequency wedge near the upper left and the answer key shows it near the lower right.
22. Your pie chart must be wrong, because when I added the percentages on your wedges, they totaled 124%.
23. I was unable to make a bar chart, because the data categories were qualitative rather than quantitative.
24. I rearranged the bars on my histogram so that the tallest bar would come first.
BASIC SKILLS & CONCEPTS
Frequency Tables. Make a frequency table for the data in each of Exercises 25–26. Include columns for relative frequency and cumulative frequency. Briefly explain the meaning of each column.
25. Final grades of 20 students in a math class:
A A B B B B B C C C C C C C C D D D F F
26. A film section of a local newspaper lists 5 five-star films (the highest rating), 10 four-star films, 20 three-star films, 15 two-star films, and 5 one-star films.
Qualitative vs. Quantitative. In Exercises 27–34, determine whether the variable described is qualitative or quantitative, and explain why.
27. The hair color of individuals
28. The average service time in a bank
29. The responses of people in a sausage taste test where ratings range from 0 = inedible up to 5 = outstanding
30. The lowest high temperature in each month of the year in Sedona, Arizona
31. The responses (yes, no, undecided) to the question “Will you vote for a new water treatment plant?”
32. The total income of each household in America
33. The dessert selections at a restaurant used in a customer preference poll
34. The number of people voting for each dessert selection in a restaurant preference poll
Binned Frequency Tables. In Exercises 35–36, use the indicated bin size to make a frequency table for the following set of exam scores:
89 67 78 75 64 70 83 95 69 84
77 88 98 90 92 68 86 79 60 96
Include columns for relative frequency and cumulative frequency. Briefly explain the meaning of each column.
35. Use 5-point bins (95 to 99, 90 to 94, etc.).
36. Use 10-point bins (90 to 99, 80 to 89, etc.).
37. Largest States. The following table shows the five most populous U.S. states as of 2004. Make a bar graph for these data, with the bars in descending order.
State Population
California 35.9 million
Texas 22.5 million
New York 19.2 million
Florida 17.4 million
Illinois 12.7 million
38. Food Franchises. The table below shows the five food companies with the most franchises. Make a bar graph for these data, with the bars in descending order.
Company Number of franchises
McDonald’s 22,183
Subway 21,444
Kentucky Fried Chicken 10,040
Domino’s Pizza 6953
Dunkin’ Donuts 5759
Rating scale (Exercise 29): 5 = outstanding, 0 = inedible
Constructing Pie Charts. Exercises 39–40 each give a data set. Compute the percentage for each category and construct a pie chart for the data.
39. Six candidates ran for three seats on the City Council. The vote tallies for the candidates are given in the table below.
Candidate Votes
Aniston 2380
Clooney 1030
Cruise 987
Jolie 1753
Pitt 1914
Streep 2208
40. In a pizza preference poll, 92 people voted for theirfavorite toppings as follows.
Topping Votes
Anchovies 8
Cheese 27
Pepperoni 16
Sausage 36
Vegetarian 23
41. Government Income. The pie chart in Figure 4.12 on p. 308 shows the makeup of federal government receipts. Make a bar graph for these data.
42. Government Spending. The pie chart in Figure 4.13 on p. 309 shows the makeup of federal government spending. Make a bar graph for these data.
43. Oscar-Winning Actors. The following data show the ages of 34 recent Academy Award–winning actors at the time they won their award. Make a frequency table for these data, using bins of 20–29, 30–39, and so on. Then draw both a histogram and a line chart to display the binned data.
32 37 36 32 51 53 33 61 35 45 55 39
76 37 42 40 32 60 38 56 48 48 40 43
62 43 42 44 41 56 39 46 31 47 40 43
44. Oscar Winners. In words, contrast the graphs in Example 7 with those you drew in Exercise 43. Do actors appear to be more likely to win Oscars when they are younger, older, or neither? Do you think these graphs indicate any difference in how movie makers treat male and female performers? Defend your opinion.
45. Homicide Rates. Study Figure 5.10. Write one to two paragraphs summarizing how the homicide rate has changed with time since 1960.
46. Death Rates. Figure 5.13 shows overall death rates in the United States during the 20th century. Note that the spike in 1919 was due to a worldwide epidemic of influenza. Write a few sentences summarizing the overall trend, describing how much the death rate changed during the century, and putting the 1919 spike into context in terms of its impact on the population.
[Figure 5.13 image: line chart titled "Death Rates per 1000 Population". Horizontal axis: Year, 1900 to 2000. Vertical axis: Rate, with ticks from 5 to 20.]
Figure 5.13 Source: National Center for Health Statistics.
FURTHER APPLICATIONS
Statistical Graphs. Each of Exercises 47–56 gives a table of data. For each exercise, do the following:
a. Explain whether the data categories are qualitative or quantitative.
b. If the data categories are qualitative, draw either a bar graph or a pie chart for the data. If the data categories are quantitative, draw either a histogram or a line chart for the data.
c. Write a one-paragraph summary of any interesting information revealed by the graphic.
47. The following frequency table gives the ages of the Nobel Prize winners in literature at the time of their award for 1990 through 2005.
Age Number of winners
58–59 2
60–61 1
62–63 3
64–65 0
66–67 1
68–69 2
70–71 1
72–73 2
74–75 2
76–77 2
48. The following table lists the top eight retail companies in the United States, by total sales volume.
Company Sales (billions of dollars)
Albertson’s 36.8
Home Depot 45.7
JC Penney 33.0
Kmart 37.0
Kroger 49.0
Sears 40.9
Target 36.9
Wal-Mart 193.3
Source: Wall Street Journal Almanac.
49. The following table shows the average SAT scores for various ethnic groups in the United States in 2005.
Ethnic group Average SAT score
White 1068
Black 864
Native American 982
Asian/Pacific Islander 1091
Hispanic 917
Source: The College Board.
50. The following table lists the ten musical groups with the most platinum albums in the United States (1,000,000 sales).
Group Number of platinum albums
The Beatles 92
The Eagles 81
Led Zeppelin 80
AC/DC 60
Aerosmith 59
Pink Floyd 54
Van Halen 50
U2 45
Alabama 44
Fleetwood Mac 44
51. The following table lists areas of the world’s major land masses.
Land mass Area (millions of sq. miles)
Asia 17.2
Africa 11.6
North America 9.3
South America 6.9
Australia 3.0
Europe 3.8
Antarctica 5.1
All others 2.1
52. The following table gives the percentages of total energy produced in the United States from various sources.
Energy source Percentage of total energy
Coal 32.2%
Natural gas 31.0%
Crude oil 16.4%
Nuclear power 11.7%
Renewable 8.7%
Source: U.S. Department of Energy.
53. The following table gives the stated religions of first-year college students. (Note: The “other religions” category consists of religions that were stated by less than 1% of the students in the sample.)
Religion Percent of sample
Baptist 11.6
Catholic 30.5
Episcopal 1.7
Jewish 2.8
Lutheran 5.8
Methodist 6.4
Mormon 1.5
Presbyterian 4.0
United Church of Christ 1.5
Other religions 19.3
No religion 14.9
Source: UCLA Higher Education Research Institute.
54. The following table gives the rates of violent crimes (rape, robbery, assault, theft) by age of victim. Rates are units of crimes per 1000 people aged 12 or older.
Age group Crime rate
12–15 51.6
16–19 53.0
20–24 43.3
25–34 26.4
35–49 18.5
50–64 10.3
65 and over 2.0
Source: Bureau of Justice Statistics.
55. The following table gives average family size in the United States since 1940.
Year Family size Year Family size
1940 3.76 1980 3.29
1950 3.54 1985 3.23
1960 3.67 1990 3.17
1965 3.70 1995 3.19
1970 3.58 2000 3.17
1975 3.42 2003 3.19
Source: U.S. Bureau of Census.
56. Drunk Driving Deaths. Figure 5.14 shows the number of automobile fatalities in the United States in which alcohol was involved for each year from 1982 to 2003.
[Figure 5.14 image: line chart titled "Alcohol-Related Fatalities". Horizontal axis: Year, 1982 to 2002. Vertical axis: Fatalities, 0 to 30,000.]
Figure 5.14 Source: National Highway Traffic Safety Administration.
a. How many alcohol-related fatalities were there in 1982? in 2003? Comment on the overall trend over this period.
b. What is the percent change in alcohol-related fatalities over this period?
c. The total numbers of automobile fatalities in 1982 and 2003 were 43,945 and 42,643, respectively. What percentage of all fatalities in these two years involved alcohol?
d. In view of your answer to part c, can you offer explanations for the trend in these data? Explain.
57. Ages of Presidents. The following table gives the order of the presidents of the United States and the ages at which they first took office.
a. Find a creative way to display these data.
b. Which presidents could have said that they were the youngest president (or the same age in years as the youngest) at the time they took office?
c. Which presidents could have said that they were the oldest president (or the same age in years as the oldest) at the time they took office?
d. Write a paragraph describing significant features of the data.
Order 1 2 3 4 5 6 7 8 9 10 11
Age 57 61 57 57 58 57 61 54 68 51 49
Order 12 13 14 15 16 17 18 19 20 21 22
Age 64 50 48 65 52 56 46 54 49 50 47
Order 23 24 25 26 27 28 29 30 31 32 33
Age 55 55 54 42 51 56 55 51 54 51 60
Order 34 35 36 37 38 39 40 41 42 43
Age 62 43 55 56 61 52 69 64 46 54
WEB PROJECTS
Find useful links for Web Projects on the text Web site: www.aw.com/bennett-briggs
58. Emissions. Look for updated data concerning international carbon dioxide emissions at the Web site for the International Energy Annual, published by the U.S. Energy Information Administration (EIA). Create an updated or expanded version of Figure 5.5. Discuss any new features of your updated graphs.
59. Energy Table. Explore some of the many energy tables at the U.S. Energy Information Administration (EIA) Web site. Choose a table that you find interesting, and make a graph of its data. You may choose any of the graph types discussed in this section. Explain how you made your graph, and briefly discuss what can be learned from it.
60. Statistical Abstract. Go to the Web site for the Statistical Abstract of the United States. Explore the selection of “frequently requested tables.” Choose one table of interest to you, and make a graph from its data. You may choose any of the graph types discussed in this section. Explain how
you made your graph, and briefly discuss what can be learned from it.
IN THE NEWS
61. Frequency Tables. Find a recent news article that includes some type of frequency table. Briefly describe the table and how it is useful to the news report. Do you think the table was constructed in the best possible way for the article? If so, why? If not, what would you have done differently?
62. Bar Graph. Find a recent news article that includes a bar graph with qualitative data categories. Briefly explain what the graph shows, and discuss whether it helps make the point of the news article.
63. Pie Chart. Find a recent news article that includes a pie chart. Briefly discuss the effectiveness of the pie chart. For example, would it be better if the data were displayed in a bar graph rather than a pie chart? Could the pie chart be improved in other ways?
64. Histogram. Find a recent news article that includes a histogram. Briefly explain what the histogram shows, and discuss whether it helps make the point of the news article. Are the labels clear? Is the histogram a time-series diagram? Explain.
65. Line Chart. Find a recent news article that includes a line chart. Briefly explain what the line chart shows, and discuss whether it helps make the point of the news article. Are the labels clear? Is the line chart a time-series diagram? Explain.
UNIT 5D Graphics in the Media
Now that we’ve discussed basic types of statistical graphs, we are ready to explore some of the fancier graphics that appear daily in the news. We will also discuss several cautions to keep in mind when interpreting media graphics.
Graphics Beyond the Basics
Many graphical displays of data go beyond the basic types discussed in Unit 5C. Here, we explore a few of the types that are most common in the news media.
Multiple Bar Graphs
A multiple bar graph is a simple extension of a regular bar graph. It has two or more sets of bars that allow comparison between two or more data sets. All the data sets must involve the same categories so that they can be displayed on the same graph. For example, Figure 5.15 is a multiple bar graph showing trends in home computing. The categories are years. The two sets of bars represent two different measures of home computing: ownership of personal computers and connection to the Internet. Note that a legend clearly identifies the two sets of bars.
❉EXAMPLE 1 Computing Trends
Summarize two major trends shown in Figure 5.15.
SOLUTION The most obvious trend is that both data sets show an increase with time. That is, the number of homes with computers and the number of online homes both increased with time. We see a second trend by comparing the bars within each year. In 1995, the number of online homes (about 10 million) was less than one-third the number of homes with computers (about 33 million). By 2003, the number of online homes (about 62 million) was about 90% of the number of homes with computers (about 70 million). This tells us that a higher percentage of computer users are going online. Now try Exercises 23–24. ➽
[Figure 5.15 image: multiple bar graph titled "PC and On-Line Households in the U.S., 1995–2003 (In millions)". Horizontal axis: years 1995, 1997, 1999, 2001, 2003. Vertical axis: 0 to 80. Legend: On-line households; Households with PCs.]
FIGURE 5.15 Trends in home computing. Source: Statistical Abstract of the United States.
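The second trend in Example 1 comes from dividing one bar by the other within each year. The Python sketch below repeats that comparison; the household counts are the approximate readings quoted in the solution, and the dictionary layout is our own assumption.

```python
# Approximate values in millions of households, as quoted in the
# Example 1 solution from Figure 5.15.
households = {
    1995: {"pc": 33, "online": 10},
    2003: {"pc": 70, "online": 62},
}

# Online households as a fraction of households with PCs, year by year.
ratios = {year: d["online"] / d["pc"] for year, d in households.items()}

for year, ratio in ratios.items():
    print(f"{year}: online homes are about {ratio:.0%} of homes with PCs")
```

A rising ratio is what supports the conclusion that a higher percentage of computer users are going online.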
Stack Plots
Another common type of graph, called a stack plot, shows different data sets in a vertical stack. Figure 5.16 uses a stack plot to show trends in death rates (deaths per 100,000 people) for four diseases since 1900. Each disease has its own color-coded region, or wedge; note the importance of the legend. The thickness of a wedge at a particular time tells you its value at that time: When a wedge is thick it has a large value, and when it is thin it has a small value.
[Figure 5.16 image: stack plot titled "Death Rates for Various Diseases: 1900–2004". Horizontal axis: Year, 1900 to 2000. Vertical axis: Deaths per 100,000, 0 to 900. Wedges: Pneumonia, Cardiovascular, Tuberculosis, Cancer. Annotation: In a stack plot, the thickness of a wedge at a particular time tells you its value. For 1980, the top of the cardiovascular wedge is at about 620 along the vertical axis and the bottom is at about 180. So the 1980 death rate for cardiovascular disease was about 620 - 180 = 440 (deaths per 100,000).]
FIGURE 5.16 A stack plot showing trends in death rates from four diseases.
❉EXAMPLE 2 Stack Plot
Based on Figure 5.16, what was the death rate for cardiovascular disease in 1980? Discuss the general trends visible on this graph.
SOLUTION For 1980, the cardiovascular wedge extends from about 180 to 620 on the vertical axis, so its thickness is about 440. Thus, the death rate in 1980 for cardiovascular disease was about 440 deaths per 100,000 people. The graph shows several important trends. First, the downward slope of the top wedge shows that the overall death rate from these four diseases decreased substantially, from nearly 800 deaths per 100,000 in 1900 to about 525 in 2003. The drastic decline in the thickness of the tuberculosis wedge shows that this disease was once a major killer, but has been nearly
wiped out since 1950. Meanwhile, the cancer wedge shows that the death rate fromcancer rose steadily until the mid-1990s, but has dropped somewhat since then.
Now try Exercises 25–28.
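The wedge-reading rule used in Example 2 can be sketched in a few lines of Python. This is only an illustration: the function name `wedge_thickness` is made up for this sketch, and the two readings (620 and 180) are the approximate values read from Figure 5.16 in the solution above.

```python
def wedge_thickness(top, bottom):
    """Value shown by a stack-plot wedge: its top edge minus its bottom edge."""
    return top - bottom

# Approximate readings for the cardiovascular wedge in 1980 (Figure 5.16).
cardio_1980 = wedge_thickness(top=620, bottom=180)
print(cardio_1980)  # 440 deaths per 100,000
```

The same subtraction applies to any wedge that does not sit on the horizontal axis; only the bottom wedge can be read directly from the vertical scale.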
Graphs of Geographical Data
We are often interested in geographical patterns in data. Figure 5.17 shows one common way of displaying geographical data. In this case, the data on per capita (per person) income are shown state by state. The legend explains that different colors represent different income levels. Similar colors are used for similar income levels. Thus, it is easy to see that income levels tend to be highest in the northeast and lowest in the south.
[Figure 5.17 graphic: map of the United States with each state shaded by per capita income. Key (State Per Capita Income): $20,000–$24,999; $25,000–$29,999; $30,000–$34,999; $35,000–$39,999; $40,000–$44,999.]
FIGURE 5.17 Per capita income in the 50 states (2002). Source: U.S. Department of Commerce.
By the Way
Since the mid-1980s, there has been a small but noticeable resurgence of tuberculosis in the United States. Part of the resurgence is due to new strains of the disease that resist most common drug treatments.
The display in Figure 5.17 works well because each state is associated with a unique income level. For data that vary continuously across geographical areas, a contour map is more convenient. Figure 5.18 shows a contour map of temperature over the United States at a particular time. Each of the contours connects locations with the same temperature. For example, the temperature is 50°F everywhere along the contour labeled 50°F and 60°F everywhere along the contour labeled 60°F. Between these two contours, the temperature is between 50°F and 60°F. Note that in regions where contours are tightly spaced, there are greater temperature changes. For example, the closely packed contours in the northeast indicate that the temperature varies substantially over small distances. To make the graph easier to read, the regions between adjacent contours are color-coded.
[Figure 5.18 graphic: map of the United States with temperature contours labeled from 20°F to 80°F in 10°F steps. Annotations: Widely separated contours mean large regions have nearly the same temperature. Closely packed contours mean a large temperature difference over a short distance.]
FIGURE 5.18 A contour map of temperature.
❉EXAMPLE 3 Interpreting Geographical Data
Study Figures 5.17 and 5.18, using them to answer the following questions.
a. Which state(s) had the highest per capita income in 2002?
b. Were there any temperatures above 80°F in the United States on the date shown in Figure 5.18? If so, where?
SOLUTION
a. Connecticut was the only state with a per capita income in the highest category shown on the graph ($40,000–$44,999), so it had the highest per capita income. (The District of Columbia was also in this category, but it is not a state.)
b. The 80°F contour passes through southern Florida, so the parts of Florida south of this contour had temperatures above 80°F.
Now try Exercises 29–30. ➽
The greatest value of a picture is when it forces us to notice what we never expected to see.
—JOHN TUKEY
Time out to think
Look for a weather map in today’s news. How are the temperature contours shown? Interpret the temperature data.
Three-Dimensional Graphics
Today, computer software makes it easy to give almost any graph a three-dimensional appearance. For example, Figure 5.19 shows the bar graph of Figure 5.3, but “dressed up” with a three-dimensional look. It may look nice, but the three-dimensional effects are purely cosmetic. They don’t provide any information that wasn’t already in the two-dimensional graph in Figure 5.3. As this example shows, many “three-dimensional” graphics really only make two-dimensional data look a little fancier.
In contrast, each of the three axes in Figure 5.20 carries distinct information, making it a true three-dimensional graph. Researchers studying migration patterns of a bird species (the Bobolink) counted the number of birds flying over seven New York cities throughout the night. As shown on the inset map, the cities were aligned east-west so that the researchers would learn what parts of the state the birds flew over, and at what times of night, as they headed south for the winter. Thus, the three axes measure number of birds, time of night, and east-west location.
[Figure 5.19 graphic: bar graph titled "Essay Grade Data" drawn with three-dimensional bars. Horizontal axis: Grade (A, B, C, D, F). Vertical axis: Frequency of grade (0 to 9).]
FIGURE 5.19 This graph has a three-dimensional appearance, but shows only two-dimensional data.
[Figure 5.20 graphic: "Sonic Mapping Traces Bird Migration." Axes: number of birds, hours after 8:30 p.m., and city (Cuba, Alfred, Beaver Dams, Ithaca, Richford, Oneonta, Jefferson), with an inset map of New York. Text in graphic: Sensors across New York State counted each occurrence of the nocturnal flight call of the bobolink to trace the fall migration on the night of Aug. 28–29, 1993. The data showed the heaviest swath passing over the eastern part of the state. Source: Bill Evans/Cornell Laboratory of Ornithology.]
FIGURE 5.20 This graph shows true three-dimensional data. Source: New York Times.
❉EXAMPLE 4 Three-Dimensional Bird Migration
Based on Figure 5.20, at about what time was the largest number of birds flying over the east-west line marked by the seven cities? Over what part of New York did most of the birds fly? Approximately how many birds passed over Oneonta around 12:00 midnight?
SOLUTION The number of birds detected in all the cities peaked between 3 and 5 hours after 8:30 p.m., or between about 11:30 p.m. and 1:30 a.m. More birds flew over the two easternmost cities of Oneonta and Jefferson than over cities farther west. Thus, most of the birds were flying over the eastern part of the state. To answer the specific question about Oneonta, note that 12:00 midnight is the midpoint of time category 4. On the graph, this time aligns with the dip between peaks on the line at Oneonta. Looking across to the number of birds axis, we see that about 30 birds were flying over Oneonta at that time. Now try Exercises 31–39.
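The time conversions in Example 4 can be checked with a short Python sketch. It assumes, as the solution implies, that the time axis counts hours after 8:30 p.m. on Aug. 28, 1993, and that time category 4 covers the interval from 3 to 4 hours after that start; the name `clock_time` is invented for this illustration.

```python
from datetime import datetime, timedelta

# Start of the count: 8:30 p.m. on Aug. 28, 1993 (from the Figure 5.20 text).
START = datetime(1993, 8, 28, 20, 30)

def clock_time(hours_after_start):
    """Return the 24-hour clock time a given number of hours after START."""
    return (START + timedelta(hours=hours_after_start)).strftime("%H:%M")

print(clock_time(3))    # 23:30, that is, 11:30 p.m.
print(clock_time(5))    # 01:30, that is, 1:30 a.m.
print(clock_time(3.5))  # 00:00, midnight (assumed midpoint of category 4)
```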
Combination Graphics
All of the graphic types we have studied so far are common and fairly easy to create. But the media today are often filled with many varieties of even more complex graphics. For example, Figure 5.21 shows a graphic concerning the participation of women in the summer Olympics. This single graphic combines a line chart, many pie charts, and numerical data. It is certainly a case of a picture being worth far more than a thousand words.
[Figure 5.21 graphic: "The Ever-Growing Presence of Women in Summer Olympics." A line chart of the total number of women participating (vertical axis 0 to 5,000) for the games from 1900 to 2004, with pie charts giving the percentage of women participants and a row of numbers giving the number of events for women; two dates are marked "no games." Source: International Olympic Committee.]
FIGURE 5.21 Source: Adapted from The New York Times.
A Few Cautions about Graphics
As we have seen, graphics can offer clear and meaningful summaries of statistical data. However, even well-made graphics can be misleading if we are not careful in interpreting them, and poorly made graphics are almost always misleading. Moreover, some people use graphics in deliberately misleading ways. Here, we discuss a few of the more common ways in which graphics can lead us astray.
Perceptual Distortions
Many graphics are drawn in a way that distorts our perception of them. Figure 5.22 shows one of the most common types of distortion. The dollar-shaped bars are used to represent the declining value of the dollar over time. The lengths of the bars represent the data, but our eyes tend to focus on the areas of the bars. For example, the bottom bar is supposed to show that a dollar in 2005 was worth only 42% as much as a dollar in 1980. Its length is indeed 42% that of the top bar, but its area is much smaller in comparison (about 18% of the area of the top bar). This gives the perception that the value of the dollar shrank even more than it really did.
Now try Exercises 42–43.
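The 18% figure can be reproduced with a short calculation. The sketch below assumes that each dollar-shaped bar is shrunk by the same factor in both length and height, so its area scales as the square of its length; the function name is illustrative only.

```python
def perceived_area_ratio(length_ratio):
    """Area ratio of a picture scaled by length_ratio in both dimensions."""
    return length_ratio ** 2

# The 2005 bar is 42% as long as the 1980 bar (Figure 5.22).
print(round(perceived_area_ratio(0.42), 2))  # 0.18, about 18% of the area
```

Under the same assumption, the 1990 bar (63% of the length) would have about 40% of the area of the 1980 bar.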
Watch the Scales
Figure 5.23a shows the percentage of college students between 1910 and 2005 who were women. At first glance, it appears that this percentage grew by a huge margin after about 1950. But the vertical axis scale does not begin at zero and does not end at 100%. The increase is still substantial but looks far less dramatic if we redraw the graph with the vertical axis covering the full range of 0 to 100% (Figure 5.23b). From a mathematical point of view, leaving out the zero point on a scale is perfectly honest and can make it easier to see small-scale trends in the data. Nevertheless, as this example shows, it can be visually deceptive if you don’t study the scale carefully.
Now try Exercises 44–45. ➽
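A small calculation shows why a vertical axis that does not start at zero exaggerates differences. The sketch below compares how tall two values appear relative to each other when the axis starts at 30 (as in Figure 5.23a) and when it starts at 0 (as in Figure 5.23b). The two percentages, 40 and 57, are hypothetical values chosen for illustration, not readings from the figure, and the function name is made up.

```python
def visual_ratio(larger, smaller, axis_min):
    """Ratio of the drawn heights of two values above the axis minimum."""
    return (larger - axis_min) / (smaller - axis_min)

# Hypothetical percentages, not taken from Figure 5.23.
print(round(visual_ratio(57, 40, axis_min=30), 3))  # 2.7   (axis starts at 30)
print(round(visual_ratio(57, 40, axis_min=0), 3))   # 1.425 (axis starts at 0)
```

With the truncated axis, the larger value is drawn 2.7 times as high as the smaller one, even though it is only about 1.4 times as large.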
❉EXAMPLE 5 Olympic Women
Describe three trends shown in Figure 5.21.
SOLUTION The line chart shows that the total number of women competing in the summer Olympics has risen fairly steadily, especially since the 1960s, reaching nearly 5000 in the 2004 games. The pie charts show that the percentage of women among all competitors has also increased, reaching 44% in the 2004 games. The bold red numbers at the bottom show that the number of events for women has also increased dramatically, reaching 135 in the 2004 games.
Now try Exercises 40–41. ➽
Time out to think
Do you think the upward trend of the pie charts in Figure 5.21 will continue over the next few Olympic games? Why or why not?
[Figure 5.22 graphic: three dollar-bill-shaped bars labeled 1980 = $1.00, 1990 = $0.63, and 2005 = $0.42.]
FIGURE 5.22 The lengths of the dollars are proportional to their spending power, but our eyes are drawn to the areas, which decline more than the lengths.
[Figure 5.23 graphic: "Women as a Percentage of All College Students." Two line graphs of percent women versus year (year labels 1920 to 2000). In (a) the vertical axis runs from 30 to 60; in (b) it runs from 0 to 100.]
FIGURE 5.23 Both graphs show the same data, but they look very different because their vertical scales have different ranges. Source: National Center for Education Statistics and Bureau of Labor Statistics.
[Figure 5.24 graphic: "Computer Speed." Two graphs of calculations per second versus year (1950 to 2000). In (a) the vertical axis is labeled 10^2, 10^5, 10^8, and 10^11 calculations per second; in (b) the vertical axis shows billions of calculations per second on an ordinary scale, with labels at 0, 50, and 100.]
FIGURE 5.24 Both graphs show the same data, but the one on the left uses an exponential scale.
Sometimes the scale may not be deceptive, but still requires care to avoid misinterpretation. Consider Figure 5.24a, which shows how the speeds of the fastest computers have increased with time. At first glance, it appears that speeds have been increasing linearly. For example, it might look as if the speed increased by the same amount from 1990 to 2000 as it did from 1950 to 1960. However, if we look closely, we see that each tick mark on the vertical scale represents a tenfold increase in speed. Now we see that computer speed grew from about 1 to 100 calculations per second between 1950 and 1960, and from about 100 million to 10 billion calculations per second between 1990 and 2005. This type of scale is called an exponential scale (or logarithmic scale), because each unit corresponds to a power of 10. In general, exponential scales are useful for displaying data that vary over a huge range of values. You can see this usefulness by looking at Figure 5.24b, where the computer data have been recast with an ordinary scale. Because the speeds have grown so rapidly, the ordinary scale makes it impossible to see any detail in the early years shown on the graph.
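To see why equal distances on an exponential scale hide very different actual increases, it can help to compute both. The sketch below uses the approximate speeds quoted in the paragraph above and assumes that a value's position on the scale is its base-10 logarithm; the function name is invented for this illustration.

```python
import math

def rise_on_exponential_scale(start, end):
    """Vertical distance, in powers of 10, between two values on a log scale."""
    return math.log10(end) - math.log10(start)

# Approximate speeds (calculations per second) quoted in the text.
periods = {
    "1950 to 1960": (1, 100),
    "1990 to 2005": (100e6, 10e9),
}

for label, (start, end) in periods.items():
    rise = rise_on_exponential_scale(start, end)
    increase = end - start
    # Each period rises by the same two powers of 10, but the actual
    # increases differ enormously (99 versus 9.9 billion).
    print(f"{label}: rise of {rise:.0f} powers of 10, actual increase {increase:,.0f}")
```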
By the Way
In 1965, Intel founder Gordon E. Moore predicted that advances in technology would allow computer chips to double in power roughly every two years. This idea is now called Moore’s law, and it has held fairly true ever since Moore first stated it.
Now try Exercise 46. ➽
Percentage Change Graphs
Is college getting more or less expensive? A quick look at Figure 5.25 might give the impression that the cost for private colleges has been holding fairly steady while the cost for public colleges fell steeply in 2006 after rising in prior years.
But look more closely and you’ll see that this is not the case at all. The vertical axis in Figure 5.25 represents the percentage increase in costs. A flat graph means only that costs increased by the same percentage each year, not that costs held steady. Similarly, the drop in 2006 for public colleges means only that the cost rose by less in that year than in the preceding years.
In fact, actual costs (not adjusted for inflation) for both public and private colleges have risen substantially with time, as shown in Figure 5.26. Moreover, because the rate of inflation (as measured by the Consumer Price Index; see Unit 3D) has been less than the rate of increase in college costs, the real cost of public colleges has steadily risen. Graphs that show percentage change are very common, particularly with economic data. Although they are perfectly honest, you can be misled unless you interpret them with great care.
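The relationship between actual costs and a percentage change graph can be sketched with a few lines of Python. The costs below are hypothetical numbers chosen to keep the arithmetic simple; they are not College Board data, and the function name is made up for this illustration.

```python
def percent_changes(costs):
    """Percentage change from each year's cost to the next year's cost."""
    return [100 * (new - old) / old for old, new in zip(costs, costs[1:])]

# Hypothetical costs for three consecutive academic years.
costs = [10_000, 11_000, 11_550]
changes = percent_changes(costs)
print(changes)  # [10.0, 5.0]
```

The percentage change falls from 10% to 5%, so a percentage change graph would slope downward, yet the cost itself rises every year.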
[Figure 5.25 graphic: "Changes in College Costs." Percentage change from previous academic year (vertical axis 0 to 16%) for public and private colleges, academic years ’95–’96 through ’05–’06.]
FIGURE 5.25 This graph shows the rate of increase with time in tuition and fees at four-year public and private colleges. Source: The College Board.
[Figure 5.26 graphic: "Actual College Costs." Cost (vertical axis 0 to $24,000) for public and private colleges, academic years ’95–’96 through ’05–’06.]
FIGURE 5.26 This graph shows the change with time in the actual cost (not adjusted for inflation) of tuition and fees at four-year public and private colleges. You can use the rise in these costs to calculate the percentage increases shown in Figure 5.25. Source: The College Board.
Time out to think
Based on Figure 5.24a, can you predict the speed of the fastest computers in 2015? Could you make the same prediction with Figure 5.24b? Explain.
Now try Exercise 47.
Pictographs
Pictographs are graphs embellished with additional artwork. The artwork may make the graph more appealing, but it can also distract or mislead. Figure 5.27 is a pictograph showing the rise in world population from 1804 to 2054 (numbers for future years are based on United Nations projections). The lengths of the bars correspond correctly to world population for the different years listed. However, the artistic embellishments of this graph are deceptive in several ways. For example, your eye may be drawn to the figures of people lining the globe. Because this line of people rises from the left side of the pictograph to the center and then falls, it might give the impression that future world population will be declining. In fact, the line of people is purely decorative and carries no information.
Perhaps the most serious problem with this pictograph is that it makes it appear that world population has been rising linearly. However, notice that the time intervals on the horizontal axis are not uniform in size. For example, the interval between the bars for 1 billion and 2 billion people is 123 years (from 1804 to 1927), but the interval between the bars for 5 billion and 6 billion people is only 12 years (from 1987 to 1999).
Pictographs are very common, but as this example shows, you have to study them carefully to extract the essential information and not be distracted by the cosmetic effects. Now try Exercise 48.
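The uneven spacing of the horizontal axis is easy to confirm by computing the gaps between the years listed in Figure 5.27. The sketch below assumes that the nine years correspond, in order, to populations of 1 through 9 billion, which matches the four pairings stated in the text (1804, 1927, 1987, and 1999).

```python
# Years listed along the horizontal axis of Figure 5.27; the n-th year is
# assumed to be the date at which world population reaches n billion.
years = [1804, 1927, 1960, 1974, 1987, 1999, 2013, 2028, 2054]

# Number of years needed to add each successive billion people.
intervals = [later - earlier for earlier, later in zip(years, years[1:])]
print(intervals)  # [123, 33, 14, 13, 12, 14, 15, 26]
```

Bars drawn at equal spacing treat these very different intervals as if they were the same length of time.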
[Figure 5.27 graphic: pictograph titled "World Population (in billions of people)" with bars for 1 through 9 billion people at the years 1804, 1927, 1960, 1974, 1987, 1999, 2013, 2028, and 2054, decorated with a line of human figures around a globe.]
FIGURE 5.27 Source: Data from United Nations Population Division, World Population Prospects.
By the Way
If world population continues to double at the same rate as in the late 20th century, it will reach 34 billion by 2100 and 192 billion by 2200. By about 2650, human population would be so large that it would not fit on the Earth, even if everyone stood elbow-to-elbow everywhere.
EXERCISES 5D
QUICK QUIZ
Choose the best answer to each of the following questions. Explain your reasoning with one or more complete sentences.
1. Consider Figure 5.15. Suppose you were given data for the number of households with high-speed Internet access in each of the years shown. How would you add these data to the graphic?
a. Add a third bar for each year.
b. Stack the high-speed data on top of the on-line bars.
c. Put a small pie chart on top of each pair of bars.
2. Consider Figure 5.16. According to this graph, the approximate death rate from tuberculosis in 1950 was
a. 2 per 100,000.
b. 20 per 100,000.
c. 200 per 100,000.
3. Consider Figure 5.17. According to this graph, what is per capita income in Oregon (OR)?
a. between $25,000 and $30,000
b. exactly $25,000
c. It cannot be determined from the graph.
4. Consider Figure 5.18. According to this map, the temperature in Iowa (IA) was
a. 30°F. b. 40°F. c. between 30°F and 40°F.
5. Consider Figure 5.18. Notice the small loop labeled 40°F near the southeast corner of Idaho (ID). What can you say about temperatures within that small region?
a. They were 40°F.
b. They were higher than 40°F but lower than 50°F.
c. They could have been anything above 40°F.
6. Suppose you are given a contour map showing elevation (altitude) for the state of Vermont. The region with the most closely spaced contours represents
a. the highest altitude.
b. the lowest altitude.
c. the steepest terrain.
7. Consider Figure 5.21. Approximately how many women participated in the 1948 Olympics?
a. 19 b. 9.4 c. 450
8. Consider Figure 5.23a. The way the graph is drawn
a. makes the graph completely invalid.
b. makes the changes from one decade to the next appear larger than they really were.
c. makes it more difficult to see the upward and downward trends that have occurred over time.
9. Consider Figure 5.24a. Moving one tick mark up the vertical axis represents an increase in computer speed of
a. 1 billion calculations per second.
b. a factor of 2.
c. a factor of 10.
10. Consider Figure 5.25. In years where the graph slopes downward with time,
a. college costs decreased.
b. the cost of college rose, but by a lower percentage than in previous years.
c. the cost of college rose, but the new cost represented a lower proportion of the average person’s income.
REVIEW QUESTIONS
11. Briefly describe the construction and use of multiple bar graphs and stack plots.
12. What are geographical data? Briefly describe at least two ways to display geographical data. Be sure to explain the meaning of contours on a contour map.
13. What are three-dimensional graphics? Explain the difference between graphics that only appear three-dimensional and those that show truly three-dimensional data.
14. Describe how perceptual distortions can arise in graphics and how they can be misleading.
15. How can graphics be misleading when the scales do not go all the way to zero? Why are such graphics sometimes useful?
16. What is an exponential scale? When is an exponential scale useful?
17. Explain how a graph that shows percentage change can show descending bars (or a descending line) even when the variable of interest is increasing.
18. What is a pictograph? How can a pictograph enhance a graph? How can it make a graph misleading?
DOES IT MAKE SENSE?
Decide whether each of the following statements makes sense (or is clearly true) or does not make sense (or is clearly false). Explain your reasoning.
19. My bar chart contains more information than yours, because I made my bars three-dimensional.
20. I used an exponential scale because the data values for my categories ranged from 7 to 450,000.
21. There’s been only a very slight rise in our stock price over the past few months, but I wanted to make it look dramatic so I started the vertical scale from the lowest price rather than from zero.
22. A graph showing the yearly rate of increase in the number of computer users has a slight downward trend, even though the actual number of users is rising.
BASIC SKILLS & CONCEPTS
23. Net Grain Production. Net grain production is the difference between the amount of grain a country produces and the amount of grain its citizens consume. It is positive if the country produces more than it consumes, and negative if the country consumes more than it produces. Figure 5.28 shows the net grain production of four countries in 1990 and projected for 2030.
a. Which of the four countries had to import grain to meet its needs in 1990?
b. Which of the four countries are expected to need to import grain to meet needs in 2030?
c. Given that India and China are the world’s two most populous countries, what does this graph tell you about how world agriculture will have to change between now and 2030?
[Figure 5.29 graphic: bar graph with a three-dimensional appearance titled "Median Earnings of Workers 21 Years and Over by Educational Attainment, 1985 to 2000." Bars for 1985, 1995, and 2000 in each category: Overall; Not high school graduate; High school; Some college/Associate degree; Bachelor’s degree; Advanced degree. Vertical axis: 0 to $80,000.]
FIGURE 5.29 Source: TIME Almanac, 1999, p. 886 and U.S. Census Bureau.
[Figure 5.28 graphic: bar graph titled "Net Grain Production, 1990 and 2030 (projected)." Bars for 1990 and 2030 for the U.S., China, India, and Russia. Vertical axis: millions of tons, –250 to 100.]
FIGURE 5.28
[Figure 5.30 graphic: stack plot titled "College Degrees Awarded." Wedges for women and men. Horizontal axis: year (1900 to 2000). Vertical axis: college graduates (thousands), 0 to 1400.]
FIGURE 5.30
24. Education and Earnings. Figure 5.29 shows median earnings in three different years according to level of education.
a. Briefly explain the meaning of each of the three sets of bars on the graph.
b. Compare in words the change in earnings between 1985 and 2000 for people with bachelor’s degrees to the change for people who did not graduate from high school. What do these data say about the value of a college education?
c. The graph has a three-dimensional appearance. Is it showing true three-dimensional data, or is the appearance purely cosmetic? Do you think the three-dimensional appearance helps or hinders the display?
25. Stack Plot. Answer the following based on Figure 5.16.
a. State whether the death rate for each of the four diseases individually decreased or increased between 1900 and 2003.
b. When was the death rate due to cardiovascular diseases the greatest, and what was it?
c. What was the death rate due to cancer in 2000?
d. Based on the trends in the graph, speculate on which of these four diseases will be responsible for the most deaths in 2050. Explain.
26. College Degrees. Figure 5.30 shows the numbers of college degrees awarded to men and women over time.
a. Estimate the numbers of college degrees awarded to men and to women (separately) in 1930 and in 2005.
b. Did men or women earn more degrees in 1980? Did men or women earn more degrees in 2005?
c. During what decade did the total number of degrees awarded increase the most?
d. Compare the total numbers of degrees awarded in 1950 and 2005.
e. Do you think the stack plot is an effective way to display these data? Briefly discuss other ways that might have been used instead.
[Figure 5.32 graphic: county-by-county map of the United States titled "Probability That a Black Student Would Have White Classmates." Key: Less than 10%; 20%–40%; 40%–60%; 60%–80%; More than 80%; Counties with no data or no black students.]
FIGURE 5.32 Source: New York Times, April 2, 2000.
27. Federal Spending. Figure 5.31 shows the changes in major spending categories of the federal budget. (Payments to individuals includes Social Security and Medicare; net interest represents interest payments on the national debt; all other represents non-defense discretionary spending.) Interpret the stack plot and discuss some of the trends it reveals.
a. Find the percentage of the budget that went to net interest in 1990, 1995, and 2005.
b. Find the percentage of the budget that went to defense in 1960, 1980, and 2005.
c. Find the percentage of the budget that went to payments to individuals in 1980, 2000, and 2005.
28. Federal Trends. Consider Figure 5.31. Summarize at least three trends shown in the figure.
29. School Segregation. One way of measuring segregation is to determine the likelihood that a black student will have white classmates. A New York Times study found that, by this measure, segregation increased significantly in the 1990s. Figure 5.32 shows the probability that a black student had white classmates, by county, during the 1997–1998 academic year. Do there appear to be any significant regional differences? Can you pick out any differences between urban and rural areas? Discuss possible explanations for a few of the trends that you see in the figure.
[Figure 5.31 graphic: stack plot titled "Percentage Composition of Federal Government Outlays." Wedges: Payments to individuals; National defense; Net interest; All other. Horizontal axis: year (’60 to ’05). Vertical axis: percent, up to 100.]
FIGURE 5.31 Source: Office of Management and Budget.
[Figure 5.33 graphic: contour map of elevation around Boulder, Colorado, with a compass rose (N, S, E, W).]
FIGURE 5.33
[Figure 5.35 graphic: "Homes with Cable TV." Two television sets labeled 1980 (18 million homes) and 2005 (73 million homes).]
FIGURE 5.35
[Figure 5.34 graphic: "U.S. Age Distribution." Two bar graphs, (a) and (b), showing percent of population (0 to 35) by year (1960, 1970, 1980, 1990, 2000, 2010, 2050) for the age categories <5, 5–17, 18–24, 25–44, 45–65, and >65; the categories appear in opposite order in the two parts.]
FIGURE 5.34
30. Contour Elevations. Contour maps are often used to show geographical elevations. Figure 5.33 shows elevation contours around Boulder, Colorado. Discuss a few key features shown on the map.
U.S. Age Distribution. Parts (a) and (b) of Figure 5.34 display the age distribution of the U.S. population from 1960 to 2050 (projected) in two different ways; the age categories are in opposite order so that all of the data can be viewed. Use these graphs to answer the questions in Exercises 31–39.
31. Briefly describe the meaning of each bar.
32. Do these graphs display true three-dimensional data, or is the three-dimensional look cosmetic?
33. How has the percentage of the youngest Americans changed since 1960?
34. Estimate the percentage of 5- to 17-year-olds in 1960 and in 2000.
35. Estimate the percentage of 45- to 65-year-olds in 1960 and in 2010.
36. In which year did (will) the 25- to 44-year-old group comprise the largest percentage of the population?
37. In which year did (will) the 45- to 65-year-old group comprise the largest percentage of the population?
38. Which age group is expected to see the greatest increase between 2000 and 2050?
39. Describe the most significant changes that you see in the U.S. population between 1960 and 2050.
40. Extending the Olympic Graph. Make a list of all the data you would need in order to extend the graph in Figure 5.21 to the 2008 Olympics and beyond.
41. Data for 2008 Olympics. Use the Web to find the data you need to extend Figure 5.21 (see Exercise 40) through the 2008 Olympics (assuming they have occurred by the time you read this problem). Then photocopy the graph and add the new data on the same graph.
42. Volume Distortion. Figure 5.35 uses television sets to represent the numbers of homes with cable in 1980 and 2005. Note that the heights of the TVs represent the numbers of homes. Briefly explain how the graph creates a perceptual distortion that exaggerates the true change in the number of homes with cable.
43. Three-Dimensional Pies. The pie charts in Figure 5.36 represent the percentage of Americans in three age categories in 1990 and 2050 (projected). Briefly explain how the three-dimensional effects create a perceptual distortion in this case. Why would flat pies (without the three-dimensional effects) give a more accurate representation of the data?
46. Cellular Phone Users. The following table shows the number of cell phone subscribers in the United States for selected years between 1990 and 2003. Display the data using both an ordinary vertical scale and an exponential vertical scale. (Hint: For the exponential scale, use tick marks at 1 million, 10 million, and 100 million.) Which graph is more useful? Why?
Year Subscribers (millions)
1990 5.3
1995 33.8
1997 55.3
1998 69.2
1999 86.0
2000 109.5
2001 128.3
2002 140.8
2003 158.7
47. Rising College Costs. Refer to Figures 5.25 and 5.26 to answer the following questions.
a. In what academic year did public college costs rise by the largest percentage? What was the percentage increase?
b. In the same year (as part a), what was the percentage increase in private college costs?
c. In the same year, which had the larger increase in actual cost (in dollars): public or private colleges? Explain.
48. World Population. Recast Figure 5.27 with a proper horizontal axis. What trends are clear in your new graph that are not clear in the original? Explain.
[Figure 5.36 graphic: two pie charts with three-dimensional effects, "1990 Age Distribution" and "2050 Age Distribution," each with slices for 65–84, 85+, and Others.]
FIGURE 5.36 Source: U.S. Census Bureau.
[Figure 5.37 graphic: bar graph of average weekly earnings for men and women. Vertical axis: 500 to 800.]
FIGURE 5.37 Source: U.S. Census Bureau.
[Figure 5.38 graphic: bar graph of braking distance (feet) for Oldsmobile, Lexus, Saab, and Lincoln. Distance axis: 170 to 210.]
FIGURE 5.38 Source: Car and Driver.
44. Comparing Earnings. Figure 5.37 compares the average weekly earnings of men and women. Identify any misleading aspects of the display. Draw the display in a fairer way.
45. Braking Distances. Figure 5.38 shows the braking distance for four different cars. Discuss the ways in which it might be deceptive. How much greater is the braking distance of Lincolns than the braking distance of Oldsmobiles? Draw the display in a fairer way.
FURTHER APPLICATIONS
Creating Graphics. Exercises 49–52 give tables of real data. For each table, make a graphical display of the data. You may choose any graphic type that you feel is appropriate to the data set. In addition to making the display, write a few sentences explaining why you chose this type of display and a few sentences describing interesting patterns in the data.
49. Percent Never Married. The following table shows the percentages, for 1970 and 2003, of men and women in various age categories who were never married.
Women 1970 2003 Men 1970 2003
20–24 35.8 75.4 20–24 54.7 86.0
25–29 10.5 40.3 25–29 19.1 54.6
30–34 6.2 22.7 30–34 9.4 33.1
35–39 5.4 14.3 35–39 7.2 21.8
40–44 4.9 12.2 40–44 6.3 17.4
Source: U.S. Census Bureau.
50. Alcohol on the Road. The following table gives the total number of automobile fatalities and the number of fatalities in which alcohol was involved for 1982 to 2004. All figures are numbers of deaths.
Year Total Alcohol
1982 43,945 26,173
1984 44,257 24,762
1986 46,087 25,017
1988 47,087 23,833
1990 44,599 22,587
1992 39,250 18,290
1994 40,716 17,308
1996 42,065 17,749
1998 41,501 16,673
2000 41,945 17,380
2002 42,815 17,419
2004 42,643 17,013
Source: National Highway Traffic Safety Administration.
51. Daily Newspapers. The following table gives the number of daily newspapers and their total circulation (in millions) for selected years since 1920.
Year Number of daily newspapers Circulation (millions)
1920 2042 27.8
1930 1942 39.6
1940 1878 41.1
1950 1772 53.9
1960 1763 58.8
1970 1748 62.1
1980 1747 62.2
1990 1611 62.3
2000 1485 56.1
2003 1456 55.2
Source: Editor & Publisher.
52. Firearm Fatalities. The following table summarizes deaths due to firearms in different nations in a recent year.
Country Total Homicides Suicides Fatal accidents
U.S. 35,563 15,835 18,503 1225
Germany 1197 168 1004 25
Canada 1189 176 975 38
Australia 536 96 420 20
Spain 396 76 219 101
U.K. 277 72 193 12
Sweden 200 27 169 4
Vietnam 131 85 16 30
Japan 93 34 49 10
Source: Coalition to Stop Gun Violence.
53. Seasonal Effects on Schizophrenia? The graph in Figure 5.39 shows data regarding the relative risk of schizophrenia among people born in different months.
a. Note that the scale of the vertical axis does not include zero. Sketch the same risk curve using an axis that includes zero. Comment on the effect of this change.
b. Each value of the relative risk is shown with a dot at its most likely value and with an “error bar” indicating the range in which the data value probably lies. The study concludes that “the risk was also significantly associated with the season of birth.” Given the size of the error bars, does this claim appear justified? (Is it possible to draw a flat line that passes through all of the error bars?)
WEB PROJECTS
Find useful links for Web Projects on the text Web site: www.aw.com/bennett-briggs
55. Weather Maps. Many Web sites offer contour maps with current weather data. For example, you can use the Yahoo Weather site to generate many different contour weather maps. Generate at least two contour weather maps and discuss what they show.
56. Cancer Cure. As shown in Figure 5.16, cancer is one of the leading causes of death today. Nevertheless, scientists have made great progress in treating many forms of cancer. Go to the American Cancer Society Web site and investigate research into cancer cures. Read about one or two recent studies, and write a short report on what you learn. Be sure to include graphics in your report.
57. USA Snapshot. The USA Today Web site offers a daily pictograph for its “USA Snapshot.” Study today’s snapshot. Briefly discuss its purpose and effectiveness.
IN THE NEWS
58. News Graphics. Find a recent news report that shows a multiple bar graph or stack plot. Comment on the effectiveness of the display. Could another display have been used to depict the same data?
59. Geographical Data. Find an example of a graph of geographical data in a recent news report. Comment on the effectiveness of the display. Could another display have been used to depict the same data?
60. Three-Dimensional Effects. Find an example of a three-dimensional display in a recent news report. Are the data three-dimensional or are the three-dimensional effects cosmetic? Comment on the effectiveness of the display. Could another display have been used to depict the same data?
61. Graphic Confusion. Find an example in a recent news report of a graph that is misleading in one of the ways discussed in this unit. Explain what makes the graph misleading, and describe how it could have been drawn more honestly.
62. Outstanding News Graph. Find a graph from a recent news report that, in your opinion, is truly outstanding in displaying data visually. Discuss what the graph shows, and explain why you think it is so outstanding.
FIGURE 5.39 (Graph: relative risk, on a vertical axis running from 0.6 to 1.4, plotted against month of birth, January through December.) Source: New England Journal of Medicine.
54. Starting Salaries for Men and Women. Consider the data in the table below showing the average starting salaries for men and women with various levels of education. Construct a graphical display and write two paragraphs that demonstrate as clearly as possible the evident disparity in the salaries of men and women.
Male Female
Overall $44,726 $28,367
Not a HS graduate 21,447 14,214
HS graduate only 33,266 21,659
Some college 36,419 22,615
Associate’s degree 43,462 29,537
Bachelor’s degree 63,084 38,447
Master’s degree 76,896 48,205
Professional 136,128 72,445
Doctorate 95,894 73,516
Source: U.S. Census Bureau, 2003.
UNIT 5E Correlation and Causality
A major goal of many statistical studies is to determine whether one factor causes another. For example, does smoking cause lung cancer? In this unit, we will discuss how statistics can be used to search for correlations that might suggest a cause-and-effect relationship. Then we’ll explore the more difficult task of establishing causality.
Seeking Correlation
What does it mean when we say that smoking causes lung cancer? It certainly does not mean that you’ll get lung cancer if you smoke a single cigarette. It does not even mean that you’ll definitely get lung cancer if you smoke heavily for many years, since some heavy smokers do not get lung cancer. Rather, it is a statistical statement meaning that you are much more likely to get lung cancer if you smoke than if you don’t smoke.
Let’s try to understand how researchers learned that smoking causes lung cancer. Before they could investigate cause, researchers first needed to establish correlations between smoking and cancer. The process of establishing correlations began with observations. The early observations were informal. Doctors noticed that smokers made up a surprisingly high proportion of their patients with lung cancer. This suggestion of a linkage led to carefully conducted studies in which researchers compared lung cancer rates among smokers and nonsmokers. These studies showed clearly that heavier smokers were more likely to get lung cancer. In more formal terms, we say that there is a correlation between the variables amount of smoking and incidence of lung cancer. A correlation is a special type of relationship between variables, in which a rise or fall in one goes along with a corresponding rise or fall in the other.
Smoking is one of the leading causes of statistics.
—FLETCHER KNEBEL
DEFINITION
A correlation exists between two variables when higher values of one variable consistently go with higher values of another or when higher values of one variable consistently go with lower values of another.
Here are a few other examples of correlations:
• There is a correlation between the variables height and weight for people. That is,taller people tend to weigh more than shorter people.
• There is a correlation between the variables demand for apples and price of apples.That is, demand tends to decrease as prices increase.
• There is a correlation between practice time and skill among piano players. That is,those who practice more tend to be more skilled.
Establishing a correlation between two variables does not mean that a change in one variable causes a change in the other. Thus, finding the correlation between smoking and lung cancer did not by itself prove that smoking causes lung cancer. We could imagine, for example, that some gene predisposes a person both to smoking and to lung cancer. Nevertheless, identifying the correlation was the crucial first step in learning that smoking causes lung cancer.
By the Way
Smoking is linked to many serious diseases besides lung cancer, including heart disease and emphysema. Smoking is also linked with less lethal health conditions such as premature skin wrinkling and sexual impotence.
DEFINITION
A scatter diagram is a graph in which each point represents the values of two variables.
Time out to think
Suppose there really were a gene that made people prone to both smoking and lung cancer. Explain why we would still find a strong correlation between smoking and lung cancer in that case, but would not be able to say that smoking caused lung cancer.
Scatter Diagrams
Table 5.6 shows the production cost and gross receipts (total revenue from ticket sales) for the 15 biggest-budget science fiction and fantasy movies of all time (through mid-2006). Movie executives presumably hope there is a favorable correlation between the production budget and the receipts. That is, they hope that spending more to produce a movie will result in higher box office receipts. But is there such a correlation? We can look for a correlation by making a scatter diagram showing the relationship between the variables production cost and gross receipts.
TABLE 5.6 Biggest-Budget Science Fiction and Fantasy Movies
Movie Production Cost (millions of dollars) Gross Receipts (millions of dollars)
King Kong (2005) 207 218
Spider-Man 2 (2004) 200 373
Chronicles of Narnia (2005) 180 292
Waterworld (1995) 175 88
Van Helsing (2004) 170 120
Polar Express (2004) 170 172
Terminator 3 (2003) 170 150
Poseidon (2006) 160 52
Batman Begins (2005) 150 205
Harry Potter/Goblet of Fire (2005) 150 290
Armageddon (1998) 140 201
Men in Black 2 (2002) 140 190
Spider-Man (2002) 139 403
Final Fantasy: The Spirits Within (2001) 137 32
Hulk (2003) 137 132
Note: Gross receipts are for United States only; worldwide receipts are often substantially higher. These figures are not adjusted for inflation.
FIGURE 5.40 Scatter diagram for the data in Table 5.6. (Graph: gross receipts, 0 to 450 million dollars, on the vertical axis against production cost, 50 to 250 million dollars, on the horizontal axis. Labeled points: Spider-Man, Spider-Man 2, Chronicles of Narnia, Harry Potter/Goblet of Fire, King Kong, Batman Begins, Terminator 3, Van Helsing, Hulk, Waterworld, and Poseidon.)
The following procedure describes how we make the scatter diagram, which is shown in Figure 5.40:
1. We assign one variable to each axis, and we label each axis with values that comfortably fit the data. Here, we assign production cost to the horizontal axis and gross receipts to the vertical axis. We choose a range of $50 to $250 million for the production cost axis and $0 to $450 million for the gross receipts axis.
2. For each movie in Table 5.6, we plot a single point at the horizontal position corresponding to its production cost and the vertical position corresponding to its gross receipts. For example, the point for the movie Waterworld goes at a position of $175 million on the horizontal axis and $88 million on the vertical axis. The dashed lines on Figure 5.40 show how we locate this point.
3. (Optional) If we wish, we can label data points, as is done for selected points in Figure 5.40.
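The three steps above can be sketched in a few lines of Python that draw a rough text-only version of the scatter diagram. This is only an illustration: the grid size, the function names grid_cell and text_scatter, and the choice of marking Waterworld with "#" are our own assumptions; the data and axis ranges come from Table 5.6 and step 1.

```python
# Rows of Table 5.6: (movie, production cost, gross receipts), in millions of dollars.
MOVIES = [
    ("King Kong", 207, 218),
    ("Spider-Man 2", 200, 373),
    ("Chronicles of Narnia", 180, 292),
    ("Waterworld", 175, 88),
    ("Van Helsing", 170, 120),
    ("Polar Express", 170, 172),
    ("Terminator 3", 170, 150),
    ("Poseidon", 160, 52),
    ("Batman Begins", 150, 205),
    ("Harry Potter/Goblet of Fire", 150, 290),
    ("Armageddon", 140, 201),
    ("Men in Black 2", 140, 190),
    ("Spider-Man", 139, 403),
    ("Final Fantasy: The Spirits Within", 137, 32),
    ("Hulk", 137, 132),
]

# Step 1: axis ranges that comfortably fit the data (the ranges used in the text).
X_MIN, X_MAX = 50, 250   # production cost axis
Y_MIN, Y_MAX = 0, 450    # gross receipts axis
COLS, ROWS = 41, 19      # assumed grid: 5 million per column, 25 million per row


def grid_cell(cost, receipts):
    """Step 2: turn one movie's two values into a (column, row) position."""
    col = round((cost - X_MIN) / (X_MAX - X_MIN) * (COLS - 1))
    row = round((receipts - Y_MIN) / (Y_MAX - Y_MIN) * (ROWS - 1))
    return col, row


def text_scatter(movies, highlight="Waterworld"):
    """Plot each movie as '*'; step 3 (optional) marks one chosen movie with '#'."""
    grid = [[" "] * COLS for _ in range(ROWS)]
    for name, cost, receipts in movies:
        col, row = grid_cell(cost, receipts)
        # The first printed line is the top of the graph, so flip the row.
        # Movies that round to the same cell simply overlap.
        grid[ROWS - 1 - row][col] = "#" if name == highlight else "*"
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * COLS)
    return "\n".join(lines)


print(text_scatter(MOVIES))
```

The coarse grid loses detail compared with Figure 5.40, but it shows the same idea: each movie becomes one point whose position is set by its two values.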
Types of Correlation
Look carefully at the scatter diagram for movies in Figure 5.40. The dots seem to be scattered about with no apparent pattern. In other words, at least for these big-budget movies, there appears to be little or no correlation between the amount of money spent producing the movie and the amount of money it earned in gross receipts.
Now consider the scatter diagram in Figure 5.41, which shows the weights (in carats) and retail prices of 23 diamonds. Here, the dots show a clear upward trend, indicating that larger diamonds generally cost more. The correlation is not perfect. For example, the heaviest diamond is not the most expensive. But the overall trend seems fairly clear. Because the prices tend to increase with the weights, we say that Figure 5.41 shows a positive correlation.
Time out to think
By studying Table 5.6, associate each of the unlabeled data points in Figure 5.40 with a particular movie.
Technical Note
We often have some reason to think that one variable depends at least in part on the other. In the case of Figure 5.40, we might guess that gross receipts should depend on the production cost. We therefore call production cost the explanatory variable and gross receipts the response variable, because the production cost might help explain the gross receipts. The explanatory variable is usually plotted on the horizontal axis and the response variable on the vertical axis.
FIGURE 5.42 A scatter diagram for life expectancy and infant mortality. (Graph: infant mortality, 0 to 120 deaths per 1000 live births, on the vertical axis against life expectancy, 50 to 80 years, on the horizontal axis. Labeled countries: Bangladesh, Pakistan, Egypt, India, Kenya, Brazil, Peru, Guatemala, Russia, Mexico, South Korea, Israel, Czech Republic, Canada, Australia, and Greece. Annotation: Higher life expectancy generally goes with lower infant mortality, so this is a negative correlation.)
In contrast, Figure 5.42 shows a scatter diagram for the variables life expectancy and infant mortality in 16 countries. We again see a clear trend, but this time it is a negative correlation: Countries with higher life expectancy tend to have lower infant mortality.
Besides stating whether a correlation exists, we can also discuss its strength. The more closely the data follow the general trend, the stronger is the correlation.
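As a rough numerical companion to the idea of strength, the sketch below computes the correlation coefficient mentioned in this unit’s margin note. It assumes the standard Pearson formula, which the text itself does not spell out. The function name correlation_coefficient and the two small perfect-line data sets are made up for illustration; the movie figures are copied from Table 5.6.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


# Made-up points lying exactly on a rising line and on a falling line.
rising = correlation_coefficient([1, 2, 3, 4], [10, 20, 30, 40])
falling = correlation_coefficient([1, 2, 3, 4], [40, 30, 20, 10])

# Table 5.6, in the order listed (millions of dollars).
cost = [207, 200, 180, 175, 170, 170, 170, 160, 150, 150, 140, 140, 139, 137, 137]
receipts = [218, 373, 292, 88, 120, 172, 150, 52, 205, 290, 201, 190, 403, 32, 132]
movies_r = correlation_coefficient(cost, receipts)

print("perfect rising line: ", round(rising, 3))
print("perfect falling line:", round(falling, 3))
print("Table 5.6 movies:    ", round(movies_r, 3))
```

A value close to 0 for the movies would be consistent with the scattered appearance of Figure 5.40, while values close to 1 or −1 indicate points lying nearly on a straight line.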
❉EXAMPLE 1 Inflation and Unemployment
Prior to the 1990s, most economists assumed that the unemployment rate and the inflation rate were negatively correlated. That is, when unemployment goes down, inflation goes up, and vice versa. Table 5.7 shows unemployment and inflation data for the period 1990–2006. Make a scatter diagram for these data. Based on your diagram, does it appear that the data support the historical claim of a link between the unemployment and inflation rates?
By the Way
In statistics, the correlation coefficient provides a quantitative measure of the strength of a correlation. It is defined to be 1 for a perfect positive correlation (meaning all data points lie on a single straight line), −1 for a perfect negative correlation, and 0 for no correlation.
RELATIONSHIPS BETWEEN TWO DATA VARIABLES
No correlation: There is no apparent relationship between the two variables.
Positive correlation: Both variables tend to increase (or decrease) together.
Negative correlation: The two variables tend to change in opposite directions,with one increasing while the other decreases.
Strength of a correlation: The more closely two variables follow the general trend, the stronger the correlation (which may be either positive or negative). In a perfect correlation, all data points lie on a straight line.
FIGURE 5.41 A scatter diagram for diamond weights and prices. (Graph: price, 0 to 18,000 dollars, on the vertical axis against weight, 0 to 2.5 carats, on the horizontal axis. Annotation: Higher weight generally goes with higher price, so this is a positive correlation.)
FIGURE 5.43 Scatter diagram for the data in Table 5.7. (Graph: inflation rate (%) on the vertical axis against unemployment rate (%) on the horizontal axis.)
SOLUTION We make the scatter diagram by plotting the variable unemployment rate on the horizontal axis and the variable inflation rate on the vertical axis. To make the graph easy to read, we use values ranging from 3.5% to 8% for the unemployment rate and from 0 to 6% for the inflation rate. Figure 5.43 shows the result. To the eye, there does not appear to be any obvious correlation between the two variables. (A calculation confirms that there is no appreciable correlation.) Thus, these data do not support the historical claim of a negative correlation between the unemployment and inflation rates.
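The solution notes that a calculation confirms the visual impression. The text does not say which calculation was used; one plausible check, assuming it is the correlation coefficient computed with the standard Pearson formula, is sketched below. The rows are transcribed from Table 5.7, and the names TABLE_5_7 and correlation_coefficient are our own.

```python
from math import sqrt

# Rows transcribed from Table 5.7: (year, unemployment rate %, inflation rate %).
TABLE_5_7 = [
    (1990, 5.6, 5.4), (1991, 6.8, 4.2), (1992, 7.5, 3.0), (1993, 6.9, 3.0),
    (1994, 6.1, 2.6), (1995, 5.6, 2.8), (1996, 5.4, 3.0), (1997, 4.9, 2.3),
    (1998, 4.6, 2.3), (1999, 4.3, 2.2), (2000, 4.0, 3.4), (2001, 4.2, 1.8),
    (2002, 5.8, 1.6), (2003, 6.0, 2.3), (2004, 5.5, 2.7), (2005, 5.1, 3.4),
    (2006, 4.6, 3.4),
]


def correlation_coefficient(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# Keep only the two rates; the year is not part of the correlation.
r_unemployment_inflation = correlation_coefficient(
    [(unemployment, inflation) for _, unemployment, inflation in TABLE_5_7]
)
print(round(r_unemployment_inflation, 2))
```

A result far from −1 would agree with the solution’s conclusion that these data do not show the strong negative correlation that economists once assumed.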
❉EXAMPLE 2 Accuracy of Weather Forecasts
The scatter diagrams in Figure 5.44 show two weeks of data comparing the actual high temperature for the day with the same-day forecast (left diagram) and the three-day forecast (right diagram). Discuss the types of correlation on each diagram.
TABLE 5.7 U.S. Inflation and Unemployment
Year Unemployment Rate (%) Inflation Rate (%) Year Unemployment Rate (%) Inflation Rate (%)
1990 5.6 5.4 1999 4.3 2.2
1991 6.8 4.2 2000 4.0 3.4
1992 7.5 3.0 2001 4.2 1.8
1993 6.9 3.0 2002 5.8 1.6
1994 6.1 2.6 2003 6.0 2.3
1995 5.6 2.8 2004 5.5 2.7
1996 5.4 3.0 2005 5.1 3.4
1997 4.9 2.3 2006 4.6 3.4
1998 4.6 2.3
Source: U.S. Bureau of Labor Statistics; 2006 data through May of that year.
Now try Exercises 23–24. ➽
FIGURE 5.44 Comparison of actual high temperatures with same-day and three-day forecasts. (Two scatter diagrams: the left plots actual temperature against the same-day forecast, and the right plots actual temperature against the three-day forecast, all in °F.)
SOLUTION Both scatter diagrams show a general trend in which higher predicted temperatures mean higher actual temperatures. Thus, both show positive correlations. However, the points in the left diagram lie more nearly on a straight line, indicating a stronger correlation than in the right diagram. This makes sense, because we expect weather forecasts to be more accurate on the same day than three days in advance. Now try Exercises 25–26. ➽
Possible Explanations for a Correlation
We began by stating that correlations can help us search for cause-and-effect relationships. But we’ve already seen that causality is not the only possible explanation for a correlation. For example, the predicted temperatures on the horizontal axis of Figure 5.44 certainly do not cause the actual temperatures on the vertical axis. The following box summarizes three possible explanations for a correlation.
❉EXAMPLE 3 Explanation for a Correlation
Consider the correlation between infant mortality and life expectancy in Figure 5.42. Which of the three possible explanations for a correlation applies? Explain.
SOLUTION The negative correlation between infant mortality and life expectancy is probably an example of common underlying cause. Both variables respond to an underlying variable that we might call quality of health care. In countries where health care is better in general, infant mortality is lower and life expectancy is higher.
Now try Exercises 27–28. ➽
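To see how a common underlying cause can produce a correlation between two variables that do not influence each other, consider the small simulation below. Everything in it is invented for illustration: the simulated “countries,” the hypothetical health-care score, the noise levels, and the function names. It is a sketch of the idea, not a model of real health data.

```python
import random
from math import sqrt


def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(countries=5000, shared_cause=True, seed=1):
    """Correlation between two made-up variables across simulated countries."""
    rng = random.Random(seed)
    mortality, expectancy = [], []
    for _ in range(countries):
        # Hypothetical quality-of-health-care score; set to 0 to remove the shared cause.
        health_care = rng.gauss(0, 1) if shared_cause else 0.0
        # Neither variable appears in the other's formula: each depends only on
        # health care plus its own independent random noise.
        mortality.append(-health_care + rng.gauss(0, 1))
        expectancy.append(health_care + rng.gauss(0, 1))
    return correlation_coefficient(mortality, expectancy)


with_cause = simulate(shared_cause=True)
without_cause = simulate(shared_cause=False)
print("with a common underlying cause:", round(with_cause, 2))
print("without it:                    ", round(without_cause, 2))
```

With the shared cause present, the two simulated variables are clearly negatively correlated even though neither affects the other. Without it, the correlation essentially vanishes.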
POSSIBLE EXPLANATIONS FOR A CORRELATION
1. The correlation may be a coincidence.
2. Both variables might be directly influenced by some common underlying cause.
3. One of the correlated variables may actually be a cause of the other. Note that, even in this case, we may have identified only one of several causes.
❉EXAMPLE 4 How to Get Rich in the Stock Market (Maybe)
Every financial advisor has a strategy for predicting the direction of the stock market. Most focus on fundamental economic data, such as interest rates and corporate profits. But an alternative strategy relies on a remarkable correlation between the Super Bowl winner in January and the direction of the stock market for the rest of the year: The stock market tends to rise when a team from the old, pre-1970 NFL wins the Super Bowl, and tends to fall otherwise. This correlation successfully matched 28 of the first 32 Super Bowls to the stock market. Suppose that the Super Bowl just ended and the winner was the Detroit Lions, an old NFL team. Should you invest all your spare cash (and maybe even some that you borrow) in the stock market?
SOLUTION Based on the reported correlation, you might be tempted to invest, since the old-NFL winner suggests a rising stock market over the rest of the year. However, this investment would make sense only if you believed that the Super Bowl result actually causes the stock market to move in a particular direction. This belief is clearly preposterous, and the correlation is undoubtedly a coincidence. If you are going to invest, don’t base your investment on this correlation. Now try Exercises 29–34. ➽
Establishing Causality
Suppose you have discovered a correlation and suspect causality. How can you test your suspicion? Let’s return to the issue of smoking and lung cancer. The strong correlation between smoking and lung cancer did not by itself prove that smoking causes lung cancer. In principle, we could have looked for proof with a controlled experiment. But such an experiment would be unethical, since it would require forcing a group of randomly selected people to smoke cigarettes. So how was smoking established as a cause of lung cancer?
The answer involves several lines of evidence. First, researchers found correlations between smoking and lung cancer among many groups of people: women, men, and people of different races and cultures. Second, among groups of people that seemed otherwise identical, lung cancer was found to be rarer in nonsmokers. Third, people who smoked more and for longer periods of time were found to have higher rates of lung cancer. Fourth, when researchers accounted for other potential causes of lung cancer (such as exposure to radon gas or asbestos), they found that almost all the remaining lung cancer cases occurred among smokers.
These four lines of evidence made a strong case, but still did not rule out the possibility that some other factor, such as genetics, predisposes people both to smoking and to lung cancer. However, two additional lines of evidence made this possibility highly unlikely. One line of evidence came from animal experiments. In controlled experiments, animals were divided into randomly chosen treatment and control groups. The experiments still found a correlation between inhalation of cigarette smoke and lung cancer, which seems to rule out a genetic factor, at least in the animals. The final line of evidence came from biologists studying cell cultures (that is, small samples of human lung tissue). The biologists discovered the basic process by which ingredients in cigarette smoke can create cancer-causing mutations. This process does not appear to depend in any way on specific genetic factors, making it all but certain that lung cancer is caused by smoking and not by any preexisting genetic factor.
By the Way
The Super Bowl Indicator went into a slump after Super Bowl 32, correctly predicting the stock market’s direction in only one of the next seven years.
The truth is rarely pure and never simple.
—OSCAR WILDE
CASE STUDY Air Bags and Children
By the mid-1990s, passenger-side air bags had become commonplace in cars. Statistical studies showed that the air bags saved many lives in moderate- to high-speed collisions. But a disturbing pattern also appeared. In at least some cases, young children, especially infants and toddlers in child car seats, were killed by air bags in low-speed collisions.
At first, many safety advocates found it difficult to believe that air bags could be the cause of the deaths. But the observational evidence became stronger, meeting the first four guidelines for establishing causality. For example, the greater risk to infants in child car seats fit Guideline 3, because it indicated that being closer to the air bags increased the risk of death. (A child car seat sits on top of the built-in seat, thereby putting a child closer to the air bags than the child would be otherwise.)
To seal the case, safety experts undertook experiments using dummies. They found that children, because of their small size, often sit where they could be easily hurt by the explosive opening of an air bag. The experiments also showed that an air bag could impact a child car seat hard enough to cause death, thereby revealing the physical mechanism by which the deaths occurred.
By the Way
Based on these studies, the government now recommends that child car seats never be used on the front seat, and that children under age 12 sit in the back seat if possible.
GUIDELINES FOR ESTABLISHING CAUSALITY
To investigate whether a suspected cause actually causes an effect:
1. Look for situations in which the effect is correlated with the suspected cause even while other factors vary.
2. Among groups that differ only in the presence or absence of the suspected cause, check that the effect is similarly present or absent.
3. Look for evidence that larger amounts of the suspected cause produce larger amounts of the effect.
4. If the effect might be produced by other potential causes (besides the suspected cause), make sure that the effect still remains after accounting for these other potential causes.
5. If possible, test the suspected cause with an experiment. If the experiment cannot be performed with humans for ethical reasons, consider doing the experiment with animals, cell cultures, or computer models.
6. Try to determine the physical mechanism by which the suspected cause produces the effect.
Time out to think
There’s a great deal of controversy concerning whether animal experiments are ethical. What is your opinion of animal experiments? Defend your opinion.
The box of guidelines for establishing causality summarizes these ideas. Generally speaking, the case for causality is stronger when more of these guidelines are met.
By the Way
The first four guidelines for establishing causality are called Mill’s methods, after the English philosopher and economist John Stuart Mill (1806–1873). Mill was a leading scholar of his time and an early advocate of women’s right to vote.
CASE STUDY What Is Causing Global Warming?
Statistical measurements show that the global average temperature (the average temperature everywhere on Earth’s surface) has risen about 1.5°F in the past century, with more than half of this warming occurring in just the past 30 years. But what is causing this so-called global warming?
Scientists have for decades suspected that the temperature rise is tied to an increase in the atmospheric concentration of carbon dioxide and other greenhouse gases. Comparative studies of Earth and other planets, particularly Venus and Mars, show that the greenhouse gas concentration is the single most important factor in determining a planet’s average temperature. It is even more important than distance from the Sun. For example, Venus, which is about 30% closer than Earth to the Sun, would be only about 45°F warmer than Earth if it had an Earth-like atmosphere. But because Venus has a thick atmosphere made almost entirely of carbon dioxide, its actual surface temperature is about 880°F, hot enough to melt lead. The reason greenhouse gases cause warming is that they slow the escape of heat from a planet’s surface, thereby raising the surface temperature.
In other words, the physical mechanism by which greenhouse gases cause warming is well understood (satisfying Guideline 6 on our list), and there is no doubt that a large rise in carbon dioxide concentration would eventually cause Earth to become much warmer. Nevertheless, as you’ve surely heard, many people have questioned whether the current period of global warming really is due to humans or whether it might be due to natural variations in the carbon dioxide concentration or other natural factors.
In an attempt to answer these questions, the United States and other nations have devoted billions of dollars over the past two decades to an unprecedented effort to understand Earth’s climate. We still have much more to learn, but the research to date makes a strong case for human input of greenhouse gases as the cause of global warming. Two lines of evidence make the case particularly strong.
The first line of evidence comes from careful measurements of past and present carbon dioxide concentrations in Earth’s atmosphere. Figure 5.45 shows the data. Notice that past changes in the carbon dioxide concentration correlate clearly with temperature changes, confirming that we should expect a rising greenhouse gas concentration to cause rising temperatures. Moreover, while the past data show that the carbon dioxide concentration does indeed vary naturally, they also show that the recent rise is much greater than any natural increase during the past several hundred thousand years. Human activity is the only viable explanation for the huge recent increase in carbon dioxide concentration.
By the Way
Carbon dioxide and other greenhouse gases are present naturally in Earth’s atmosphere, which is a good thing. Without them, Earth’s average temperature would be a frigid −10°F; with them, the global average temperature is about 59°F. From this perspective, the problem with global warming is that human input of carbon dioxide and other greenhouse gases into our atmosphere is rapidly causing our planet to have too much of a good thing.
By the Way
Global warming is a major issue because computer models suggest it will have severe consequences. Among the predicted consequences are an increase in the strength and frequency of hurricanes and other severe storms, a rise in sea level due to both heating of the oceans and melting of glacial ice, and major changes to local weather patterns around the world.
The second line of evidence comes from experiments. We cannot perform controlled experiments with our entire planet, but we can run experiments with computer models that simulate the way Earth’s climate works. Earth’s climate is incredibly complex, and many uncertainties remain in attempts to model the climate on computers. However, today’s models are the result of decades of work and refinement. Each time a model of the past failed to match real data, scientists sought to understand the missing (or incorrect) ingredients in the model and then tried again with improved models. Today’s models are not perfect, but they match real climate data quite well, giving scientists confidence that the models have predictive value. Figure 5.46 compares model data and real data, showing good agreement and clearly suggesting that human activity is the cause of global warming. If you include the effects of the greenhouse gases put into the atmosphere by humans, the models agree with the data, but if you leave out these effects, the models fail.
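The case study quotes temperatures in °F, while Figure 5.46 plots temperature changes in °C. Converting a change is not the same as converting a reading, because the 32-degree offset cancels when two readings are subtracted. The short sketch below illustrates the difference; the function names are our own.

```python
def fahrenheit_to_celsius(reading_f):
    """Convert a temperature reading, such as a global average temperature."""
    return (reading_f - 32) * 5 / 9


def fahrenheit_change_to_celsius(change_f):
    """Convert a temperature change (a difference between two readings)."""
    return change_f * 5 / 9


# The global average temperature of about 59 degrees F quoted in the margin note.
average_c = fahrenheit_to_celsius(59)
# The rise of about 1.5 degrees F over the past century quoted in the case study.
warming_c = fahrenheit_change_to_celsius(1.5)

print("global average:", round(average_c, 1), "degrees C")
print("century warming:", round(warming_c, 2), "degrees C")
```

Using the reading formula on the 1.5°F rise would give a negative number of degrees Celsius, which is why the two conversions need to be kept apart when comparing the text with the graph.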
FIGURE 5.46 This graph compares the predictions of various climate models (green swath) with observed temperature changes (red line) since about 1860. The agreement is not perfect, which tells us we still have much to learn, but it is good enough to give us confidence that greenhouse gases are indeed causing global warming. (Graph: change in °C, from −1.0 to 1.0, compared to past average global temperature, on the vertical axis against year, 1850 to 2000, on the horizontal axis. Annotations: Observations show a clear rise in average global temperatures (red line), agreeing with models (green swath) that include effects of greenhouse gases released by humans.)
FIGURE 5.45 The atmospheric concentration of carbon dioxide and global average temperature over the past 400,000 years. The recent data (right) represent direct measurements (at Mauna Loa, Hawaii); the past data come from studies of air bubbles trapped in Antarctic ice. The concentration is measured in parts per million (ppm). (Graph annotations: Periods of higher CO2 concentration coincide with times of higher global average temperature. Human use of fossil fuels has raised CO2 levels above all peaks occurring in the past 400,000 years.)
Time out to think
Check the idea that human activity causes global warming against each of the six guidelines for establishing causality.
Confidence in Causality
If human activity is causing global warming, we’d be wise to change our activities so as to stop it. But while we have good reason to think that this is the case, not everyone is yet convinced. Moreover, the changes needed to slow global warming might be very expensive. How do we decide when we’ve reached the point where something like global warming requires steps to address it?
In an ideal world, we would continue to study the issue until we could establish for certain that human activity is the cause of global warming. However, we have seen that it is difficult to establish causality and often impossible to prove causality beyond all doubt. We are therefore forced to make decisions about global warming, and many other important issues, despite remaining uncertainty about cause and effect.
In other areas of mathematics, accepted techniques help us deal with uncertainty by allowing us to calculate numerical measures of possible errors. But there are no accepted ways to assign such numbers to the uncertainty that comes with questions of causality. Fortunately, another area of study has dealt with practical problems of causality for hundreds of years: our legal system. You may be familiar with the following three broad ways of expressing a legal level of confidence.
By the Way
For criminal trials, the Supreme Court endorsed this guidance from Justice Ginsburg: “Proof beyond a reasonable doubt is proof that leaves you firmly convinced of the defendant’s guilt. There are very few things in this world that we know with absolute certainty, and in criminal cases the law does not require proof that overcomes every possible doubt. If, based on your consideration of the evidence, you are firmly convinced that the defendant is guilty of the crime charged, you must find him guilty. If on the other hand, you think there is a real possibility that he is not guilty, you must give him the benefit of the doubt and find him not guilty.”
BROAD LEVELS OF CONFIDENCE IN CAUSALITY
Possible cause: We have discovered a correlation, but cannot yet determinewhether the correlation implies causality. In the legal system, possible cause (suchas thinking that a particular suspect possibly caused a particular crime) is often thereason for starting an investigation.
Probable cause: We have good reason to suspect that the correlation involvescause, perhaps because some of the guidelines for establishing causality are satis-fied. In the legal system, probable cause is the general standard for getting a judgeto grant a warrant for a search or wiretap.
Cause beyond reasonable doubt: We have found a physical model that is so suc-cessful in explaining how one thing causes another that it seems unreasonable todoubt the causality. In the legal system, cause beyond reasonable doubt is the usualstandard for conviction. It generally demands that the prosecution show how andwhy (essentially the physical model) the suspect committed the crime. Note thatbeyond reasonable doubt does not mean beyond all doubt.
EXERCISES 5E
QUICK QUIZ
Choose the best answer to each of the following questions. Explain your reasoning with one or more complete sentences.
1. If X is correlated with Y,
a. X causes Y.
b. increasing values of X go with increasing values of Y.
c. increasing values of X go with either increasing or decreasing values of Y.
2. Consider Figure 5.42. According to this diagram, life expectancy in Russia is about
a. 22 years. b. 63 years. c. 58 years.
3. If the points on a scatter diagram fall on a nearly straight line sloping upward, the two variables have
a. a strong positive correlation.
b. a weak negative correlation.
c. no correlation.
4. If the points on a scatter diagram fall into a broad swath that slopes downward, the two variables have
a. a strong positive correlation.
b. a weak negative correlation.
c. no correlation.
5. When can you rule out the possibility that changes to variable X cause changes to variable Y?
a. when there is no correlation between X and Y
b. when there is a negative correlation between X and Y
c. when a scatter diagram of the two variables shows points lying in a straight line
6. What type of correlation would you expect between wages and the unemployment rate?
a. none
b. positive: higher wages would go with higher unemployment
c. negative: higher wages would go with lower unemployment
7. You have found a higher rate of birth defects among babies born to women exposed to second-hand smoke. To support a claim that the second-hand smoke caused the birth defects, what else should you expect to find?
a. evidence that higher rates of defects are correlated with exposure to greater amounts of smoke
b. evidence that these types of birth defects occur only in babies whose mothers were exposed to smoke, and never to any other babies
c. evidence that the types of birth defects in these babies are more debilitating than other types of birth defects
8. Consider Figure 5.45. According to this graph, how does the CO2 concentration today compare to the highest CO2 concentrations during the 400,000 years before humans began industry?
a. The values are about the same.
b. Today’s value is about 10% higher.
c. Today’s value is about 30% higher.
9. Based on the trend shown in Figure 5.45, predict the CO2 concentration in the year 2040.
a. 390 ppm b. 420 ppm c. 600 ppm
While these broad levels remain fairly vague, they give us at least some common language for discussing confidence in causality. If you study law, you will learn much more about the subtleties of interpreting these terms. However, because statistics has little to say about them, we will not discuss them much further in this book.
Time out to think
Given what you know about global warming, do you think that human activity is a possible cause, probable cause, or cause beyond reasonable doubt? Defend your opinion. Based on your level of confidence in the causality, how would you recommend setting policies with regard to global warming?
10. A jury finding that a person is guilty “beyond reasonable doubt” is supposed to mean that
a. the person is definitely guilty.
b. the 12 members of the jury each felt that there was more than a 50% chance that the person was guilty.
c. any reasonable person would conclude that the evidence was sufficient to establish guilt.
REVIEW QUESTIONS
11. What is a correlation? Give three examples of pairs of variables that are correlated.
12. What is a scatter diagram, and how do you make one? How can we use a scatter diagram to look for a correlation?
13. Define and distinguish among positive correlation, negative correlation, and no correlation. How do we determine the strength of a correlation?
14. Describe the three general categories of explanation for a correlation. Give an example of each.
15. Briefly describe each of the six guidelines presented in this unit for establishing causality. Give an example of the application of each guideline.
16. Briefly describe three levels of confidence in causality and how they can be useful when we do not have absolute proof of causality.
DOES IT MAKE SENSE?
Decide whether each of the following statements makes sense (or is clearly true) or does not make sense (or is clearly false). Explain your reasoning.
17. There is a strong negative correlation between the price of tickets and the number of tickets sold. This suggests that if we want to sell a lot of tickets, we should lower the price.
18. There is a strong positive correlation between the amount of time spent studying and grades in mathematics classes. This suggests that if you want to get a good grade, you should spend more time studying.
19. I found a nearly perfect positive correlation between variable A and variable B, and therefore was able to conclude that an increase in variable A causes an increase in variable B.
20. I found a nearly perfect negative correlation between variable C and variable D, and therefore was able to conclude that an increase in variable C causes a decrease in variable D.
21. I had originally suspected that an increase in variable E would cause a decrease in variable F, but I no longer believe this because I found no correlation between the two variables.
22. I agree that we should require kids to wear helmets if helmets really lower injury rates, but it makes no sense to start this requirement until we have absolute proof that helmets cause the lower injury rate.
BASIC SKILLS & CONCEPTS
Interpreting Scatter Diagrams. Exercises 23–26 each show a scatter diagram with its axes labeled. For each exercise, do the following:
a. Indicate the variables for which we can seek a correlation with this diagram.
b. State whether the diagram shows a positive correlation, a negative correlation, or no correlation. If there is a positive or negative correlation, state whether it is strong or weak.
c. In words, summarize any conclusions you can draw from the diagram.
23. [Scatter diagram: 2004 Model Cars. Horizontal axis: weight of cars (pounds), 1500 to 4500. Vertical axis: city gas mileage (mi/gal), 10 to 35.]
24. [Scatter diagram: U.S. Presidential Elections, 1964–2004. Horizontal axis: voter turnout (%), 50 to 70. Vertical axis: unemployment (%), 0 to 8.]
25. [Scatter diagram: Employees of Big Co. Horizontal axis: salary level (dollars per year), $30,000 to $270,000. Vertical axis: percent of income given to charity, 0 to 10.]
26. [Scatter diagram: U.S. Farms 1950–2000. Horizontal axis: number of farms (millions), 0 to 6. Vertical axis: average size (acres), 0 to 500.]
Types of Correlation. Exercises 27–34 list pairs of variables. State the units you would use to measure each of the two variables (for example, pounds, years, or miles per hour). Then state whether you believe the two variables are correlated. If you believe they are correlated, state whether the correlation is positive or negative and strong or weak. Explain your reasoning.
27. Latitude north of the equator and average high temperature in June
28. Height of individual and amount of pocket change
29. Age and time spent daily on cell phone
30. Altitude on a mountain hike and air pressure
31. Population of a state and average salary of public school teachers
32. Population of a state and percentage of foreign-born residents
33. Fertility rate of women and life expectancy in the country
34. Family income of public school students and experience of teacher
FURTHER APPLICATIONS
Making Scatter Diagrams. Exercises 35–40 each give a table of data. In each case, do the following:
a. Make a scatter diagram for the data.
b. State whether the two variables appear to be correlated and, if so, whether the correlation is positive or negative and strong or weak.
c. Suggest a reason for the correlation (or lack of correlation). If you suspect causality, briefly discuss what further evidence you would need to establish it.
35. Defense and Economy. The table below gives the per capita gross national product and the per capita expenditure on defense for eight developed countries. Gross national product (GNP) is a measure of the total economic output of a country in monetary terms. Per capita GNP is the GNP averaged over every person in the country.
Country   Per capita GNP ($)   Per capita defense ($)
Australia 26,900 350
France 31,000 553
Germany 30,120 328
Israel 17,380 1673
Japan 37,180 310
Norway 52,000 659
United Kingdom 33,940 583
United States 41,400 1128
36. The following table gives number of home runs and batting average for baseball’s Most Valuable Players, 1996–2005.
Player   Home runs   Batting average
Ken Caminiti (1996 NL) 40 .326
Juan Gonzalez (1996 AL) 47 .314
Larry Walker (1997 NL) 49 .366
Ken Griffey Jr. (1997 AL) 56 .304
Sammy Sosa (1998 NL) 66 .308
Juan Gonzalez (1998 AL) 45 .318
Chipper Jones (1999 NL) 45 .319
Ivan Rodriguez (1999 AL) 35 .332
Jeff Kent (2000 NL) 33 .334
Jason Giambi (2000 AL) 43 .333
Barry Bonds (2001 NL) 73 .328
Ichiro Suzuki (2001 AL) 8 .350
Barry Bonds (2002 NL) 46 .370
Miguel Tejada (2002 AL) 34 .308
Barry Bonds (2003 NL) 45 .341
Alex Rodriguez (2003 AL) 47 .298
Barry Bonds (2004 NL) 45 .362
Vladimir Guerrero (2004 AL) 39 .337
Albert Pujols (2005 NL) 41 .330
Alex Rodriguez (2005 AL) 48 .321
AL = American League and NL = National League.
37. The following table gives per capita personal income and percent of the population below the poverty level for ten states in 2004.
State   Per capita personal income (dollars)   Percent of population below poverty level
California 35,019 13.1
Colorado 36,063 9.7
Illinois 34,351 12.6
Iowa 30,560 8.9
Minnesota 35,861 7.4
Montana 26,857 15.1
Nevada 33,405 10.9
New Hampshire 37,040 5.8
Utah 26,606 9.1
West Virginia 25,872 17.4
Source: U.S. Census Bureau; U.S. Bureau of Economic Analysis.
38. The following table gives the average hours of television watched in households in five categories of annual income. (Hint: For the first and last categories of the household income data, place the dot at the position corresponding to $25,000 and $65,000, respectively. For other categories, place the dot at the center of each bin.)
Household income Weekly TV hours
Less than $30,000 56.3
$30,000–$40,000 51.0
$40,000–$50,000 50.5
$50,000–$60,000 49.7
More than $60,000 48.7
Source: Nielsen Media Research.
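The hint in Exercise 38 can be sketched in a few lines of Python. This is only an illustration of where the dots go on the horizontal axis, not part of the exercise itself; the function name bin_position and its parameters are our own, hypothetical choices.

```python
# Sketch: turn income categories into x-positions for a scatter diagram.
# Closed bins use their midpoint; the open-ended first and last bins use
# the positions suggested in the exercise hint ($25,000 and $65,000).

def bin_position(low, high, open_low_at=25_000, open_high_at=65_000):
    """Return the x-position for one income bin.

    low is None for a "Less than ..." bin; high is None for a "More than ..." bin.
    """
    if low is None:
        return open_low_at
    if high is None:
        return open_high_at
    return (low + high) / 2  # center of a closed bin

# (low, high, weekly TV hours), copied from the table in Exercise 38
bins = [
    (None, 30_000, 56.3),
    (30_000, 40_000, 51.0),
    (40_000, 50_000, 50.5),
    (50_000, 60_000, 49.7),
    (60_000, None, 48.7),
]

# Each point is (household income position, weekly TV hours)
points = [(bin_position(low, high), hours) for low, high, hours in bins]

for x, y in points:
    print(f"${x:,.0f}  {y} hours")
```

Plotting these five points, by hand or with any graphing tool, gives the scatter diagram that part (a) asks for.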
39. The following table gives the average teacher salary and the expenditure on public education per pupil for ten states in 2004.
State   Average teacher salary (dollars)   Per pupil expenditure (dollars)
Alabama 38,325 6701
Alaska 51,736 9808
Arizona 41,843 5474
Connecticut 57,337 11,774
Massachusetts 53,181 10,772
North Dakota 35,441 6683
Oregon 49,169 7587
Texas 40,476 7168
Utah 38,976 5245
Wyoming 39,532 9673
Source: National Education Association.
40. The following table gives mean daily Caloric intake (all residents) and infant mortality rate (per 1000 births) for ten countries.
Country   Mean daily Calories   Infant mortality rate (per 1000 births)
Afghanistan 1523 154
Austria 3495 6
Burundi 1941 114
Colombia 2678 24
Ethiopia 1610 107
Germany 3443 6
Liberia 1640 153
New Zealand 3362 7
Turkey 3429 44
United States 3671 7
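In this unit we judge the direction and strength of a correlation by eye from a scatter diagram. If you want a numerical check on your reading of a diagram, one common tool is the Pearson correlation coefficient, which this unit does not define; the sketch below is an addition under that assumption. The function name pearson_r and the data are hypothetical and are not taken from any of the exercise tables.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data.

    Values near +1 suggest a strong positive linear correlation,
    values near -1 a strong negative one, and values near 0 little
    or no linear correlation.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 71, 80, 88]

r = pearson_r(hours_studied, exam_score)
print(f"r = {r:.3f}")
```

A value of r close to 1 or -1 only describes how tightly the points follow a straight line. It says nothing about whether one variable causes the other, so part (c) of each exercise still requires the reasoning developed in this unit.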
Correlation and Causality. Exercises 41–46 make statements about a correlation. In each case, state the correlation clearly (for example, there is a positive correlation between variable A and variable B). Then state whether the correlation is most likely due to coincidence, a common underlying cause, or a direct cause. Explain your answer.
41. In a large resort city, the crime rate increased at the same time that the number of tourists increased.
42. Over the past three decades, the number of miles of freeways in Los Angeles has grown, and traffic congestion has worsened.
43. When gasoline prices rise, sales of sport utility vehicles decline.
44. Sales of ice cream in a local restaurant are positively correlated with sales of swimming suits at a local store.
45. Automobile gas mileage decreases with tire pressure.
46. Over a period of twenty years, the number of ministers and priests in a city increased, as did attendance at movies.
47. Identifying Causes: Headaches. You are trying to identify the cause of late-afternoon headaches that plague you several days each week. For each of the following tests and observations, explain which of the six guidelines for establishing causality you used and what you concluded.
• The headaches occur only on days that you go to work.
• If you stop drinking Coke at lunch on days you go to work, the headaches persist.
• In the summer, the headaches occur less frequently if you open the windows of your office slightly. They occur even less often if you open the windows of your office fully.
Having made all these observations, what reasonable conclusion can you reach about the cause of the headaches?
48. Smoking and Lung Cancer. There is a strong correlation between tobacco smoking and incidence of lung cancer, and most physicians believe that tobacco smoking causes lung cancer. Yet, not everyone who smokes gets lung cancer. Briefly describe how smoking could cause cancer when not all smokers get cancer.
49. Longevity of Orchestra Conductors. A famous study in Forum on Medicine (1978) concluded that the mean lifetime of conductors of major orchestras was 73.4 years, about 5 years longer than that of all American males at the time. The author claimed that a life of music causes a longer life. Evaluate the claim of causality and propose other explanations for the longer life expectancy of conductors.
50. High-Voltage Power Lines. Suppose that people living near a particular high-voltage power line have a higher incidence of cancer than people living farther from the power line. Can you conclude that the high-voltage power line is the cause of the elevated cancer rate? If not, what other explanations might there be for it? What other types of research would you like to see before you conclude that high-voltage power lines cause cancer?
51. Soccer and Birthdays. A recent study revealed that the best soccer players in the world tend to have birthdays in the earlier months of the year. Is this a coincidence or can you find a plausible explanation?
WEB PROJECTS
Find useful links for Web Projects on the text Web site: www.aw.com/bennett-briggs
52. Success in the NFL. Use the Web to find last season’s NFL team statistics. Make a table showing the following for each team: number of wins, average yards gained on offense per game, and average yards allowed on defense per game. Make scatter diagrams to explore the correlations between offense and wins and between defense and wins. Discuss your findings. Do you think that there are other team statistics that would yield stronger correlations with the number of wins?
53. Statistical Abstract. Explore the “frequently requested tables” at the Web site for the Statistical Abstract of the United States. Choose data that are of interest to you and explore at least two correlations. Briefly discuss what you learn from the correlations.
54. Air Bags and Children. Starting from the Web site of the National Highway Traffic Safety Administration, research the latest studies on the safety of air bags, especially with regard to children. Write a short report summarizing your findings and offering recommendations for improving child safety in cars.
55. Global Warming. Use the Web to find recent information about global warming and its potential consequences. Discuss the evidence linking human activity to global warming. In light of your findings, suggest how we should deal with the issue of global warming.
56. Tobacco Lawsuits. Tobacco companies have been the subject of many lawsuits relating to the dangers of smoking. Research one recent lawsuit. What were the plaintiffs trying to prove? What statistical evidence did they use? How well do you think they established causality? Did they win? Summarize your findings in one to two pages.
IN THE NEWS
57. Correlations in the News. Find a recent news report that describes some type of correlation. Describe the correlation. Does the article give any sense of the strength of the correlation? Does it suggest that the correlation reflects any underlying causality? Briefly discuss whether you believe the implications the article makes with respect to the correlation.
58. Causation in the News. Find a recent news report in which a statistical study has led to a conclusion of causation. Describe the study and the claimed causation. Do you think the claim of causation is legitimate? Explain.
59. Legal Causation. Find a news report concerning an ongoing legal case, either civil or criminal, in which establishing causality is important to the outcome. Briefly describe the issue of causation in the case and how the ability to establish or refute causality will influence the outcome of the case.
CHAPTER 5 SUMMARY
UNIT 5A
KEY TERMS
statistics (is a science; are data); population, sample; population parameters, sample statistics; bias; observational study; case-control study; experiment; placebo, placebo effect; blinding (single-blind, double-blind); margin of error; confidence interval
KEY IDEAS AND SKILLS
• Understand and interpret the five basic steps in a statistical study.
• Understand the importance of a representative sample.
• Be familiar with four common sampling methods: simple random sampling, systematic sampling, convenience sampling, stratified sampling.
• Distinguish between observational studies and experiments; also recognize observational case-control studies.
• Understand the placebo effect and the importance of blinding in experiments.
• Find a confidence interval from a margin of error: from (sample statistic − margin of error) to (sample statistic + margin of error).
UNIT 5B
KEY TERMS
selection bias; participation bias; variable (in a statistical study)
KEY IDEAS AND SKILLS
• Understand and apply eight guidelines for evaluating a statistical study.
UNIT 5C
KEY TERMS
frequency table (categories, frequency, relative frequency, cumulative frequency); data types (qualitative, quantitative); bar chart; pie chart; histogram; line chart; time-series diagram
KEY IDEAS AND SKILLS
• Interpret and create frequency tables.
• Interpret and create bar graphs and pie charts.
• Interpret and create histograms and line charts.
UNIT 5D
KEY TERMS
multiple bar graph; stack plot; geographical data; contour map
KEY IDEAS AND SKILLS
• Interpret multiple bar graphs, stack plots, contour maps, and other media graphs.
• Distinguish between true three-dimensional data and graphs that have a three-dimensional look for cosmetic reasons only.
• Be aware of common cautions about graphs.
UNIT 5E
KEY TERMS
correlation; cause; scatter diagram
KEY IDEAS AND SKILLS
• Distinguish between correlation and causality.
• Create and interpret scatter diagrams and use them to identify correlations: positive, negative, or no correlation; strength of correlation.
• Know three possible explanations for a correlation: coincidence, common underlying cause, true cause.
• Understand and apply six guidelines for establishing causality.
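As a quick refresher on the confidence-interval rule listed under Unit 5A above, here is a minimal Python sketch. The poll numbers are hypothetical, and the function name confidence_interval is our own choice for illustration.

```python
def confidence_interval(sample_statistic, margin_of_error):
    """Return (low, high): the sample statistic minus and plus the margin of error."""
    if margin_of_error < 0:
        raise ValueError("margin of error cannot be negative")
    return (sample_statistic - margin_of_error,
            sample_statistic + margin_of_error)

# Hypothetical poll: 52% approval with a margin of error of 3 percentage points
low, high = confidence_interval(52.0, 3.0)
print(f"from {low}% to {high}%")
```

The interval simply runs from the sample statistic minus the margin of error to the sample statistic plus the margin of error, exactly as stated in the summary.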