
Chapter 5


Statistical Reasoning

Is your drinking water safe? Do most people approve of the President’s tax plan? How much is the cost of health care rising? These questions and thousands more like them can be answered only through statistical studies. Indeed, statistical information appears in the news every day, making the ability to understand and reason with statistics crucial to modern life.

Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.

—H. G. Wells


UNIT 5A Fundamentals of Statistics: We discuss how statistical studies are conducted, with emphasis on the importance of sampling.

UNIT 5B Should You Believe a Statistical Study? We develop eight useful guidelines for evaluating statistical claims.

UNIT 5C Statistical Tables and Graphs: We investigate basic tables and graphs, including frequency tables, bar graphs, pie charts, histograms, and line charts.

UNIT 5D Graphics in the Media: News media go well beyond the basics with fancy statistical graphics. We explore common types of media graphics.

UNIT 5E Correlation and Causality: One of the most important uses of statistics is to identify cause-and-effect relationships. We investigate how to interpret correlations and how to decide whether a correlation is the result of causality.

benn.8206.05.pgs 12/15/06 8:22 AM Page 321


By the Way

You’ll sometimes hear the word data used as a singular synonym for information, but technically the word data is plural. One piece of information is called a datum, and two or more pieces are called data.

HISTORICAL NOTE

Statistics originated with the collection of census and tax data, which are affairs of state. That is why the word state is at the root of the word statistics.

TWO DEFINITIONS OF STATISTICS

• Statistics is the science of collecting, organizing, and interpreting data.
• Statistics are the data that describe or summarize something.

UNIT 5A Fundamentals of Statistics

The subject of statistics plays a major role in modern society. It’s used to determine whether a new drug is effective in treating cancer. It’s involved when agricultural inspectors check the safety of the food supply. It’s used in every opinion poll and survey. In business, it’s used for market research. Sports statistics are part of daily conversation for millions of people. Indeed, you’ll be hard-pressed to think of a topic that is not linked in some way to statistics.

But what is (or are) statistics? There are two answers, because the term statistics can be either singular or plural. When it is singular, statistics refers to the science of statistics. The science of statistics helps us collect, organize, and interpret data, which are numbers or other pieces of information about some topic. When it is plural, the word statistics refers to the data themselves, especially those that describe or summarize something. For example, if there are 30 students in your class and they range in age from 17 to 64, the numbers “30 students,” “17 years,” and “64 years” are statistics that describe your class.

How Statistics Works

Statistical studies are conducted in many different ways and for many different purposes, but they all share a few characteristics. To get the basic ideas, consider the Nielsen ratings, which are used to estimate the numbers of people watching various television shows. These ratings are used, for example, to determine the most popular television show of the week.

Suppose the Nielsen ratings tell you that Lost was last week’s most popular show, with 22 million viewers. You probably know that no one actually counted all 22 million people. But you may be surprised to learn that the Nielsen ratings are based on the television-viewing habits of people in only 5000 homes. To understand how Nielsen can draw a conclusion about millions of Americans from 5000 homes, we need to investigate the principles behind statistical research.

Nielsen’s goal is to draw conclusions about the viewing habits of all Americans. In the language of statistics, we say that Nielsen is interested in the population of all Americans. The characteristics of this population that Nielsen seeks to learn, such as the number of people watching each television show, are called population parameters. Note that, although we usually think of a population as a group of people, in statistics a population can be any kind of group: people, animals, or things. For example, in a study of college costs, the population might be all colleges and universities, and the population parameters might include prices for tuition, fees, and housing.


Nielsen seeks to learn about the population of all Americans by studying a much smaller sample of Americans in depth. More specifically, Nielsen has devices (called “people meters”) attached to televisions in 5000 homes, so the people who live in these homes make up the sample of Americans that Nielsen studies. The individual measurements that Nielsen collects from the sample, such as who is watching each show at each time, constitute the raw data. Nielsen then consolidates these raw data into a set of numbers that characterize the sample, such as the percentage of young male viewers watching Lost. These numbers are called sample statistics.

EXAMPLE 1 Population and Sample

For each of the following cases, describe the population, sample, population parameters, and sample statistics.

a. Agricultural inspectors for Jefferson County measure the levels of residue from three common pesticides on 25 ears of corn from each of the 104 corn-producing farms in the county.

b. Anthropologists determine the average brain size of early Neanderthals in Europe by studying skulls found at three sites in southern Europe.

SOLUTION

a. The inspectors seek to learn about the population of all ears of corn grown in the county. They do this by studying a sample that consists of 25 ears from each farm. The population parameters are the average levels of residue from the three pesticides on all corn grown in the county. The sample statistics describe the average levels of residue that are actually measured on the corn in the sample.

b. The anthropologists seek to learn about the population of all early Neanderthals in Europe. Specifically, they seek to determine the average brain size of all Neanderthals, which is the population parameter in this case. The sample consists of the relatively few individual Neanderthals whose skulls are found at the three sites. The sample statistic is the average brain size (skull size) of the individuals in the sample.

Now try Exercises 25–30.

The Process of a Statistical Study

Because Nielsen does not study the entire population of all Americans, it cannot actually measure any population parameters. Instead, the company tries to infer reasonable values for population parameters from the sample statistics (which it did measure).

By the Way

Arthur C. Nielsen founded his company and invented market research in 1923. He began producing ratings for radio programs in 1942 and added television ratings in the 1960s. Nielsen’s people meters, attached to all the televisions in 5000 homes, tell the company when each television is on and what show is being watched. People in the homes are supposed to push buttons that tell Nielsen who is watching each television. Nielsen can thereby determine the breakdown of viewership by age, sex, and ethnicity, as well as total viewing numbers.

DEFINITIONS

The population in a statistical study is the complete set of people or things being studied. The sample is the subset of the population from which the raw data are actually obtained.

Population parameters are specific characteristics of the population that a statistical study is designed to estimate. Sample statistics are numbers or observations that summarize the raw data.
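To make the distinction between a population parameter and a sample statistic concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the population is a simulated list of 5,000 student ages between 17 and 64, and the names `sample_statistic`, `population_ages`, and `sample_mean_age` are hypothetical. In a real study the population parameter would be unknown; it can be computed here only because the population is simulated.

```python
import random
from statistics import mean


def sample_statistic(population, sample_size, rng):
    """Draw a simple random sample and return (sample, mean of the sample)."""
    sample = rng.sample(population, sample_size)
    return sample, mean(sample)


# Hypothetical population: ages of 5,000 students (simulated, not real data).
rng = random.Random(0)
population_ages = [rng.randint(17, 64) for _ in range(5000)]

# Population parameter: normally unknown; computable here only because
# the whole population is simulated.
population_mean_age = mean(population_ages)

# Sample statistic: what a study would actually measure from 100 students.
sample, sample_mean_age = sample_statistic(population_ages, 100, rng)

print(f"population parameter (mean age): {population_mean_age:.2f}")
print(f"sample statistic (mean age of 100 students): {sample_mean_age:.2f}")
```

The two printed numbers will usually be close but not identical, which is the gap that statistical inference has to bridge.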


FIGURE 5.1 Elements of a statistical study. [Diagram linking POPULATION, SAMPLE, SAMPLE STATISTICS, and POPULATION PARAMETERS: START, 1. Identify goals. 2. Draw from population. 3. Collect raw data and summarize. 4. Make inferences about population. 5. Draw conclusions.]

The process of inference is simple in principle, though it must be carried out with great care. For example, suppose Nielsen finds that 7% of the people in its sample watched Lost. If this sample accurately represents the entire population of all Americans, then Nielsen can infer that approximately 7% of all Americans watched the show. In other words, the sample statistic of 7% is used as an estimate for the population parameter. (By using statistical techniques that we’ll discuss in Unit 6D, Nielsen can also estimate the uncertainty in the inferred population parameters.)

Once Nielsen has estimates of the population parameters, it can draw general conclusions about what Americans were watching. The process used by Nielsen Media Research is similar to that used in many statistical studies. Figure 5.1 summarizes the general relationships among a population, a sample, the sample statistics, and the population parameters.

By the Way

Statisticians often divide their subject into two major branches. Descriptive statistics is the branch that deals with describing data in the form of tables, graphs, or sample statistics. Inferential statistics is the branch that deals with inferring (or estimating) population characteristics from sample data.

BASIC STEPS IN A STATISTICAL STUDY

1. State the goal of your study precisely. That is, determine the population you want to study and exactly what you’d like to learn about it.

2. Choose a representative sample from the population.

3. Collect raw data from the sample and summarize these data by finding sample statistics of interest.

4. Use the sample statistics to infer the population parameters.

5. Draw conclusions: Determine what you learned and whether you achieved your goal.


EXAMPLE 2 Unemployment Survey

Each month, the U.S. Labor Department surveys 60,000 households to determine characteristics of the U.S. work force. One population parameter of interest is the U.S. unemployment rate, defined as the percentage of people who are unemployed among all those who are either employed or actively seeking employment. Describe how the five basic steps of a statistical study apply to this research.

SOLUTION The steps apply as follows.

Step 1. The goal of the research is to learn about the employment (or unemployment) within the population of all Americans who are either employed or actively seeking employment.

Step 2. The Labor Department chooses a sample consisting of people employed or seeking employment in 60,000 households.

Step 3. The Labor Department asks questions of the people in the sample, and their responses constitute the raw data for the research. The Department then consolidates these data into sample statistics, such as the percentage of people in the sample who are unemployed.

Step 4. Based on the sample statistics, the Labor Department makes estimates of the corresponding population parameters, such as the unemployment rate for the entire United States.

Step 5. The Labor Department draws conclusions based on the population parameters and other information. For example, it might use the current and past unemployment rates to draw conclusions about whether jobs have been created or lost.

Now try Exercises 31–36.
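Step 3 of this example, turning raw survey responses into a sample statistic, can be sketched in a few lines of Python. This is only an illustration of the definition of the unemployment rate given in the example, not the Labor Department’s actual procedure. The function name `unemployment_rate`, the three status labels, and the tiny 20-person data set are all made up for the sketch; `"not_in_labor_force"` is a hypothetical label for people who are neither employed nor actively seeking employment.

```python
from collections import Counter


def unemployment_rate(statuses):
    """Percentage unemployed among those employed or actively seeking work.

    `statuses` is an iterable of labels: "employed", "unemployed"
    (actively seeking work), or "not_in_labor_force".
    """
    counts = Counter(statuses)
    labor_force = counts["employed"] + counts["unemployed"]
    if labor_force == 0:
        raise ValueError("no one in the sample is employed or seeking work")
    return 100 * counts["unemployed"] / labor_force


# Made-up raw data for a tiny sample of 20 people.
raw_data = ["employed"] * 14 + ["unemployed"] * 2 + ["not_in_labor_force"] * 4

# 2 unemployed out of 16 people in the labor force.
print(f"sample unemployment rate: {unemployment_rate(raw_data):.1f}%")  # 12.5%
```

Note that the four people outside the labor force do not enter the denominator, so the rate is 2 out of 16 rather than 2 out of 20.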

Choosing a Sample

Choosing a sample may be the most important step in any statistical study. If the sample fairly represents the population as a whole, then it’s reasonable to make inferences from the sample to the population. But if the sample is not representative, then there’s little hope of drawing accurate conclusions about the population.

Suppose you want to determine the average height and weight of students at a large university by measuring the heights and weights of a sample of 100 students. A sample consisting only of members of the football and basketball teams would not be reliable, because these athletes tend to be larger than most students. In contrast, suppose you select your sample with a computer program that randomly draws student numbers from the entire university population. In this case, the 100 students in your sample are likely to be representative of the entire student body. You can therefore expect that the average height and weight of students in the sample are reasonable estimates of the averages for all students.

Now try Exercises 37–38.

By the Way

According to the Labor Department, someone who is not working is not necessarily unemployed. For example, stay-at-home moms and dads are not counted among the unemployed unless they are actively trying to find a job, and people who had been trying to find work but gave up in frustration are not counted as unemployed.

DEFINITION

A representative sample is a sample in which the relevant characteristics of thesample members match those of the population.


A sample drawn with a computer program that selects students at random is an example of a simple random sample. More technically, simple random sampling means that every sample of a particular size has the same chance of being selected. In the case of the student sample, every set of 100 students has an equal chance of being selected by the computer program.

Simple random sampling is usually the best way to choose a representative sample. However, it is not always practical or necessary, so other sampling techniques are sometimes used. The following box summarizes four of the most common sampling techniques, and Figure 5.2 illustrates the ideas.

COMMON SAMPLING METHODS

Simple random sampling: We choose a sample of items in such a way that every sample of a given size has an equal chance of being selected.

Systematic sampling: We use a simple system to choose the sample, such as selecting every 10th or every 50th member of the population.

Convenience sampling: We use a sample that is convenient to select, such as people who happen to be in the same classroom.

Stratified sampling: We use this method when we are concerned about differences among subgroups, or strata, within a population. We first identify the subgroups and then draw a simple random sample within each subgroup. The total sample consists of all the samples from the individual subgroups.
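The four methods in the box can be sketched in Python as follows. This is a rough illustration under stated assumptions, not a prescribed implementation: the population is a hypothetical list of student ID numbers 1 through 200, the two strata ("first-year" and "senior") are invented, and the function names are made up for this sketch. Convenience sampling is modeled simply as taking the first members on the list, standing in for "whoever is easiest to reach."

```python
import random


def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)


def systematic_sample(population, k, start=0):
    """Every k-th member, beginning at index `start`."""
    return population[start::k]


def convenience_sample(population, n):
    """Whatever is easiest to reach: here, simply the first n members."""
    return population[:n]


def stratified_sample(strata, n_per_stratum, rng):
    """A simple random sample of size n_per_stratum from each stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


rng = random.Random(1)
students = list(range(1, 201))  # hypothetical student ID numbers 1..200
strata = {"first-year": students[:120], "senior": students[120:]}

print(simple_random_sample(students, 5, rng))
print(systematic_sample(students, 50))   # [1, 51, 101, 151]
print(convenience_sample(students, 5))   # [1, 2, 3, 4, 5]
print(stratified_sample(strata, 3, rng))
```

Only the first and last functions involve randomness; the systematic and convenience samples are fully determined by the order of the list, which is one reason their representativeness depends on how that list was built.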

FIGURE 5.2 Common sampling techniques. [Four panels. Simple Random Sampling: Every sample of the same size has an equal chance of being selected. Computers are often used to generate random telephone numbers. Systematic Sampling: Select every kth member. Convenience Sampling: Use results that are readily available. Stratified Sampling: Partition the population into at least two strata, then draw a sample from each.]


By the Way

Neanderthals lived between about 100,000 and 30,000 years ago in Eurasia and northern Africa. They were physiologically distinct from modern humans, but scientists are not yet sure whether they represented a separate species or could interbreed with Homo sapiens. Neanderthals developed many aspects of culture, including caring for the sick and burying their dead. Skull measurements suggest that Neanderthals had larger brains than modern humans.

Regardless of what type of sampling is used, always keep the following two key ideas in mind:

• No matter how a sample is chosen, the study can be successful only if the sample is representative of the population.

• Even if a sample is chosen in the best possible way, it is still just a sample (as opposed to the entire population). Thus, we can never be sure that a sample is representative of the population. In general, a larger sample is more likely to be representative of the population, as long as it is chosen well.

EXAMPLE 3 Sampling Methods

Identify the type of sampling used in each of the following cases, and comment on whether the sample is likely to be representative of the population.

a. You are conducting a survey of students in a dormitory. You choose your sample by knocking on the door of every 10th room.

b. To survey opinions on a possible property tax increase, a research firm randomly draws the addresses of 150 homeowners from a public list of all homeowners.

c. Agricultural inspectors for Jefferson County check the levels of residue from three common pesticides on 25 ears of corn from each of the 104 corn-producing farms in the county.

d. Anthropologists determine the average brain size of early Neanderthals in Europe by studying skulls found at three sites in southern Europe.

SOLUTION

a. Choosing every 10th room makes this a systematic sample. The sample may be representative, as long as students were randomly assigned to rooms.

b. The records presumably list all homeowners, so drawing randomly from this list produces a simple random sample. It has a good chance of being representative of the population.

c. Each farm may have different pesticide use, so the inspectors consider corn from each farm as a subgroup (stratum) of the full population. By checking 25 ears of corn from each of the 104 farms, the inspectors are using stratified sampling. If the ears are collected randomly on each farm, each set of 25 is likely to be representative of its farm.

d. By studying skulls found at selected sites, the anthropologists are using a convenience sample. They have little choice, because only a few skulls remain from the many Neanderthals who once lived in Europe. However, it seems reasonable to assume that these skulls are representative of the larger population.

Now try Exercises 39–44.

Watching Out for Bias

Consider a study designed to estimate the average weight of all men at a college. As we discussed earlier, a sample consisting only of football players would not be representative of the population with respect to weight. We say that this sample is biased because the men in the sample differ in a critical way from “typical” men at the college. More generally, the term bias refers to any problem in the design or conduct of a statistical study that tends to favor certain results.
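The effect of a biased sample can be illustrated with a small simulation. The sketch below is built entirely on assumptions: it invents a college of 2,000 men, 100 of whom are football players, and draws their weights (in kilograms) from made-up normal distributions. None of these numbers come from real data; the point is only to compare a football-only sample with a simple random sample of the same size.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical population of 2,000 men: 100 football players and 1,900 others.
# All weights (in kg) are simulated from assumed distributions, not real data.
football = [rng.gauss(105, 8) for _ in range(100)]
others = [rng.gauss(78, 10) for _ in range(1900)]
population = football + others

biased_sample = rng.sample(football, 50)    # football players only
random_sample = rng.sample(population, 50)  # simple random sample

print(f"population mean:    {mean(population):.1f} kg")
print(f"biased sample mean: {mean(biased_sample):.1f} kg")
print(f"random sample mean: {mean(random_sample):.1f} kg")
```

Under these assumptions the football-only sample overestimates the average weight by a wide margin, while the random sample typically lands within a few kilograms of the population mean.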


Besides occurring in a poorly chosen sample, bias can arise in many other ways. For example, a researcher may be biased if he or she has a personal stake in the outcome of the study. In that case, the researcher might distort (intentionally or unintentionally) the true meaning of the data. You should always be on the lookout for any type of bias that may affect the results or interpretation of a statistical study. We’ll discuss sources of bias further in Unit 5B.

DEFINITION

A statistical study suffers from bias if its design or conduct tends to favor certain results.

Time out to think

Thinking about issues of bias, explain why television networks use Nielsen to measure ratings rather than doing it themselves.

Types of Statistical Study

Broadly speaking, most statistical studies fall into one of two categories: observational studies and experiments. Nielsen’s studies of television viewing are observational because they are designed to observe the television-viewing behavior of the people in its 5000 sample homes. Note that observational studies may still involve some interaction. For example, an opinion poll is observational, even though researchers may conduct in-depth interviews, because the poll’s goal is to learn (observe) people’s opinions, not to change them. Similarly, a study in which individuals in the sample are weighed is also observational, because the measurement process records (observes) but does not change a person’s weight.

In contrast, consider a medical study designed to test whether large doses of vitamin C can help prevent colds. To conduct this study, the researchers must ask some people in the sample to take large doses of vitamin C. This type of statistical study is called an experiment, because some participants receive a treatment (in this case, vitamin C) that they would not otherwise receive.

TWO BASIC TYPES OF STATISTICAL STUDY

1. In an observational study, researchers observe or measure characteristics of the sample members but do not attempt to influence or modify these characteristics.

2. In an experiment, researchers apply a treatment to some or all of the sample members and then look to see whether the treatment has any effects.

It is difficult to determine whether an experimental treatment works unless you compare groups that receive the treatment to groups that don’t. In the vitamin C study, for example, researchers might create two groups of people: a treatment group that takes large doses of vitamin C and a control group that does not take vitamin C. The researchers can then look for differences in the numbers of colds among people in the two groups. Having a control group is usually crucial to interpreting the results of experiments.

In an experiment, it is very important for the treatment and control groups to be alike in all respects except for the treatment. For example, if the treatment group consisted of active people with good diets and the control group consisted of sedentary people with poor diets, we could not attribute any differences in colds to vitamin C alone. To avoid this type of problem, assignments to the control and treatment groups must be done randomly.
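Random assignment is easy to carry out with a computer. The following is a minimal sketch of one way to do it, assuming 20 participants identified by hypothetical labels P01 through P20; the function name `assign_groups` is also made up. It shuffles the list and splits it in half, so every participant is equally likely to land in either group.

```python
import random


def assign_groups(participants, rng):
    """Randomly split participants into (treatment, control) groups."""
    shuffled = list(participants)  # copy, so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


rng = random.Random(7)
participants = [f"P{i:02d}" for i in range(1, 21)]  # hypothetical IDs
treatment, control = assign_groups(participants, rng)

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because chance alone decides the groups, traits such as diet or activity level should be spread roughly evenly between them, especially when the groups are large.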

The Placebo Effect and Blinding

For experiments involving people, using a treatment and a control group might not be enough to get reliable results. The problem is that people can be affected by their beliefs as well as by real treatments. For example, stress and other psychological factors have been shown to affect resistance to colds. If people taking vitamin C get fewer colds than people who don’t, we can’t conclude that the vitamin C was responsible. It might be that people stayed healthier because they believed that vitamin C works. Therefore, people in the control group should be given a placebo: in this case, pills that look like vitamin C pills but don’t actually contain vitamin C. As long as the participants don’t know whether they are in the treatment or control group (that is, whether they got the real pills or the placebo), any effect arising from psychological factors, known as a placebo effect, should affect both groups equally. Then, if people in the vitamin C group get fewer colds than people in the control group, we have evidence that vitamin C really works.

With proper treatment, a cold can be cured in a week. Left to itself, it may linger for seven days.

—A MEDICAL FOLK SAYING

By the Way

The placebo effect can be surprisingly powerful. Consider a drug now used to combat balding, which was tested on balding men. The drug maker was pleased to learn that 86% of the men receiving the drug either stopped balding or grew new hair. But remarkably, so did 42% of the men who received the placebo! In other studies, as many as 75% of participants receiving a placebo have actually improved.

TREATMENT AND CONTROL GROUPS

The treatment group in an experiment is the group of sample members who receive the treatment being tested.

The control group in an experiment is the group of sample members who do not receive the treatment being tested.

It is important for the treatment and control groups to be selected randomly and to be alike in all respects except for the treatment.

DEFINITIONS

A placebo lacks the active ingredients of a treatment being tested in a study, but is identical in appearance to the treatment. Thus, study participants cannot distinguish the placebo from the real treatment.

The placebo effect refers to the situation in which patients improve simply because they believe they are receiving a useful treatment.


In statistical terminology, the practice of keeping people in the dark about who is in the treatment group and who is in the control group is called blinding. A single-blind experiment is one in which the participants don’t know which group they belong to, but the experimenters (the people administering the treatment) do know. Using a placebo is one way to create a single-blind experiment. Sometimes, a single-blind experiment can still be unreliable if the experimenters can subtly influence outcomes. For example, in an experiment that involves interviews, the experimenters might speak differently to people who received the real treatment than to those who received the placebo. This type of problem can be avoided by making the experiment double-blind, which means neither the participants nor the experimenters know who belongs to each group. (Of course, someone must keep track of the two groups in order to evaluate the results at the end. In typical double-blind experiments, researchers hire experimenters to make any necessary contact with the participants.)

BLINDING IN EXPERIMENTS

An experiment is single-blind if the participants do not know whether they are members of the treatment group or members of the control group, but the experimenters do know.

An experiment is double-blind if neither the participants nor the experimenters (people administering the treatment) know who belongs to the treatment group and who belongs to the control group.

EXAMPLE 4 What’s Wrong with This Experiment?

For each of the experiments described below, identify any problems and explain how the problems could have been avoided.

a. A chiropractor wants to know if his adjustments relieve back pain. He performs adjustments on 25 patients with back pain. Afterward, 18 of the patients say they feel better. He concludes that the adjustments are an effective treatment.

b. A new drug for attention deficit disorder (ADD) is supposed to make the affected children more polite. Randomly selected children suffering from ADD are divided into treatment and control groups. Those in the control group receive a placebo that looks just like the real drug. The experiment is single-blind. Experimenters interview the children one-on-one to decide whether they became more polite.

SOLUTION

a. The 25 patients who receive adjustments represent a treatment group, but this study lacks a control group. The patients may be feeling better because of a placebo effect rather than any real effect of the adjustments. The chiropractor might have improved his study by hiring an actor to do a fake adjustment (one that feels like a real manipulation, but doesn’t actually conform to chiropractic guidelines) on a control group. Then he could have compared the results in the two groups to see whether a placebo effect was involved.

b. Because the experimenters know which children received the real drug, during the interviews they may inadvertently speak differently or interpret behavior differently with these children. In that case, their conclusions might not be valid. The experiment should have been double-blind, so that the experimenters conducting the interviews would not have known which children received the real drug and which children received the placebo.

Now try Exercises 45–50.

DILBERT reprinted by permission of United Feature Syndicate, Inc.

Case-Control Studies

Sometimes it may be impractical or unethical to conduct an experiment. For example, suppose we want to study how alcohol consumed during pregnancy affects newborn babies. Because it is already known that alcohol can be harmful during pregnancy, it would be unethical to divide a sample of pregnant mothers randomly into two groups and then force the members of one group to consume alcohol. However, we may be able to conduct a case-control study, in which the participants naturally form groups by choice. In this example, the cases consist of mothers who consume alcohol during pregnancy by choice, and the controls consist of mothers who choose not to consume alcohol.

A case-control study is observational because the researchers do not change the behavior of the participants. But it also resembles an experiment because the cases effectively represent a treatment group and the controls represent a control group.

DEFINITIONS

A case-control study is an observational study that resembles an experiment because the sample naturally divides into two (or more) groups. The participants who engage in the behavior under study form the cases, which makes them like a treatment group in an experiment. The participants who do not engage in the behavior are the controls, making them like a control group in an experiment.


EXAMPLE 5 Which Type of Study?

For each of the following questions, what type of statistical study is most likely to lead to an answer? Why?

a. What is the average income of stock brokers?

b. Do seat belts save lives?

c. Can lifting weights improve runners’ times in a 10-kilometer race?

d. Can a new herbal remedy reduce the severity of colds?

SOLUTION

a. An observational study can tell us the average income of stock brokers. We need only survey (observe) the brokers.

b. It would be unethical to do an experiment in which some people were told to wear seat belts and others were told not to wear them. Instead, we can conduct an observational case-control study. Some people choose to wear seat belts (the cases) and others choose not to wear them (the controls). By comparing the death rates in accidents between cases and controls, we can learn whether seat belts save lives. (They do.)

c. We need an experiment to determine whether lifting weights can improve runners’ 10K times. One group of runners will be put on a weight-lifting program, and a control group will be asked to stay away from weights. We must try to ensure that all other aspects of their training are similar. Then we can see whether the runners in the lifting group improve their times more than those in the control group. Note that we cannot use blinding in this experiment because there is no way to prevent participants from knowing whether they are lifting weights.

d. We should use a double-blind experiment, in which some participants get the actual remedy while others get a placebo. We need double-blind conditions because the severity of a cold may be affected by mood or other factors that experimenters might inadvertently influence.

Now try Exercises 51–56.

Surveys and Opinion Polls

Surveys and opinion polls may be the most common types of statistical study, and we must be very careful when we interpret them. Fortunately, survey and poll results usually include something called the margin of error.

Suppose a poll finds that 76% of the public supports the President, with a margin of error of 3 percentage points. The 76% is a sample statistic; that is, 76% of the people in a sample said they support the President. The margin of error helps us understand how well this sample statistic is likely to approximate the true population parameter (in this case, the percentage of all Americans who support the President). By adding and subtracting the margin of error from the sample statistic, we find a range of values, or a confidence interval, likely to contain the population parameter. In this case, we add and subtract 3 percentage points to find a confidence interval from 73% to 79%.

By the Way

Politicians and marketers often pretend they are trying to conduct a true opinion poll or survey when, in fact, they are deliberately trying to get particular results. These types of surveys are called push polls because they try to “push” people’s opinions.

DEFINITION

The margin of error in a statistical study is used to describe a confidence interval that is likely to contain the true population parameter. We find this interval by subtracting and adding the margin of error from the sample statistic obtained in the study. That is, the confidence interval is

from (sample statistic − margin of error) to (sample statistic + margin of error)
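The definition can also be written as a short computation. The following Python sketch is an illustration only and is not part of the original text; the function name `confidence_interval` is a hypothetical choice, and all values are in percentage points.

```python
def confidence_interval(sample_statistic, margin_of_error):
    """Return (low, high) by subtracting and adding the margin of error."""
    low = sample_statistic - margin_of_error
    high = sample_statistic + margin_of_error
    return low, high


# The approval poll described earlier: 76% with a 3-point margin of error.
low, high = confidence_interval(76, 3)
print(f"from {low}% to {high}%")  # prints "from 73% to 79%"
```

The same function applies to any poll result that is reported with a margin of error.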

How confident can we be in a poll result? Unless we are told otherwise, we assume that the margin of error is defined to give us 95% confidence that the confidence interval contains the population parameter. We’ll discuss the precise meaning of “95% confidence” in Unit 6D, but for now you can think of it as follows: If the poll were repeated 20 times with 20 different samples, we would expect about 19 of the 20 polls (that is, 95% of the polls) to have a confidence interval that contains the true population parameter.
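One way to build intuition for “95% confidence” is to simulate many polls of a population whose true parameter we choose ourselves. The sketch below is a rough illustration under stated assumptions: the true support level (60%), the sample size (1000), and the number of repeated polls are invented for the example, and the margin of error is computed with the common normal-approximation formula 1.96 × √(p̂(1 − p̂)/n), which this unit does not derive.

```python
import math
import random


def simulate_coverage(true_pct=60.0, sample_size=1000, num_polls=2000, seed=1):
    """Fraction of simulated polls whose interval contains the true value.

    All default values are hypothetical choices made for illustration.
    """
    rng = random.Random(seed)
    p = true_pct / 100
    hits = 0
    for _ in range(num_polls):
        # Draw one random sample and count the "yes" responses.
        yes = sum(1 for _ in range(sample_size) if rng.random() < p)
        p_hat = yes / sample_size
        # Assumed formula: normal-approximation margin of error at 95%.
        moe = 1.96 * math.sqrt(p_hat * (1 - p_hat) / sample_size)
        if p_hat - moe <= p <= p_hat + moe:
            hits += 1
    return hits / num_polls


coverage = simulate_coverage()
print(f"{coverage:.1%} of simulated polls captured the true value")
```

The fraction printed should land near 95%, though not exactly on it. That matches the description above: roughly 19 of every 20 polls produce an interval containing the true parameter, with some variation from one batch of polls to the next.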

❉EXAMPLE 6 Close Election

An election eve poll finds that 52% of surveyed voters plan to vote for Smith, and she needs a majority (more than 50%) to win without a runoff. The margin of error in the poll is 3 percentage points. Will she win?

SOLUTION We subtract and add the margin of error of 3 percentage points to find a confidence interval

from 52% − 3% = 49% to 52% + 3% = 55%

We can be 95% confident that the actual percentage of people planning to vote for her is between 49% and 55%. Because this confidence interval leaves open the possibility of both a majority and less than a majority, this election is too close to call.

Now try Exercises 57–60. ➽
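The reasoning in Example 6 comes down to checking where the confidence interval sits relative to the 50% threshold. The Python sketch below is one way to express that check; the function name `majority_call` and its three labels are hypothetical and chosen only for illustration.

```python
def majority_call(sample_pct, margin_of_error, threshold=50.0):
    """Classify a poll result against a majority threshold (in percent)."""
    low = sample_pct - margin_of_error
    high = sample_pct + margin_of_error
    if low > threshold:
        # The whole interval lies above the threshold.
        return "majority likely"
    if high <= threshold:
        # No value in the interval exceeds the threshold.
        return "majority unlikely"
    # The interval straddles the threshold.
    return "too close to call"


# Example 6: 52% support with a margin of error of 3 percentage points.
print(majority_call(52, 3))  # prints "too close to call"
```

Keep in mind that “likely” here carries only the 95% confidence attached to the interval, not certainty.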

Time out to think

In Example 6, suppose the poll found the candidate had 55% of the vote. Should she be confident of a win?

EXERCISES 5A

QUICK QUIZ

Choose the best answer to each of the following questions. Explain your reasoning with one or more complete sentences.

1. You conduct a poll in which you randomly select 1000 registered voters from Texas and ask if they approve of the job their governor is doing. The population for this study is

a. all registered voters in the state of Texas.

b. the 1000 people that you interview.

c. the governor of Texas.

2. Results of the poll described in Exercise 1 would most likely suffer from bias if you chose the participants from

a. all registered voters in Texas.

b. all people with a Texas driver’s license.

c. people who donated money to the governor’s campaign.

3. When we say that a sample is representative of the population, we mean that

a. the results found for the sample are similar to those we would find for the entire population.

b. the sample is very large.

c. the sample was chosen in the best possible way.

4. Consider an experiment designed to see whether cash incentives improve school attendance. The researcher chooses two groups of 100 high school students. She offers one group $10 for every week of perfect attendance. She tells the other group that they are part of an experiment but does not give them any incentive. The students who do not receive an incentive represent

a. the treatment group.

b. the control group.

c. the observation group.

5. The experiment described in Exercise 4 is

a. single-blind.

b. double-blind.

c. not blind.

6. The purpose of a placebo is

a. to prevent participants from knowing whether they belong to the treatment group or the control group.

b. to distinguish between the cases and the controls in a case-control study.

c. to determine whether diseases can be cured without any treatment.

7. If we see a placebo effect in an experiment to test a new treatment designed to cure warts, we know that

a. the experiment was not properly double-blind.

b. the experimental groups were too small.

c. warts were cured among members of the control group.

8. An experiment is single-blind if

a. it lacks a treatment group.

b. it lacks a control group.

c. the participants do not know whether they belong to the treatment or control group.

9. Poll X predicts that Powell will receive 49% of the vote, while Poll Y predicts that he will receive 53% of the vote. Both polls have a margin of error of 3 percentage points. What can you conclude?

a. One of the two polls must have been conducted poorly.

b. The two polls are consistent with each other.

c. Powell will receive 51% of the vote.

10. A survey reveals that 12% of Americans believe Elvis is still alive, with a margin of error of 4 percentage points. The confidence interval for this poll is

a. from 10% to 14%.

b. from 8% to 16%.

c. from 4% to 20%.

REVIEW QUESTIONS

11. Why do we say that the term statistics has two meanings? Describe both meanings.

12. Define the terms population, sample, population parameter, and sample statistics as they apply to statistical studies.

13. Describe the five basic steps in a statistical study, and give an example of their application.

14. Why is it so important that a statistical study use a representative sample? Briefly describe four common sampling methods.

15. What is bias? How can it affect a statistical study? Give examples of several forms of bias.

16. Describe and contrast observational studies and experiments. What do we mean by the treatment group and control group in an experiment? What do we mean by the cases and controls in an observational case-control study?

17. What is a placebo? Describe the placebo effect and how it can make experiments difficult to interpret. How can making an experiment single-blind or double-blind help?

18. What is meant by the margin of error in a survey or opinion poll? How is it used to identify a confidence interval?

DOES IT MAKE SENSE?

Decide whether each of the following statements makes sense (or is clearly true) or does not make sense (or is clearly false). Explain your reasoning.

19. In my experimental study, I used a sample that was larger than the population.

20. I followed all the guidelines for sample selection carefully, yet my sample still did not reflect the characteristics of the population.

21. I wanted to test the effects of vitamin C on colds, so I gave the treatment group vitamin C and gave the control group vitamin D.

22. I don’t believe the results of the experiment, because the results were based on interviews but the study was not double-blind.

23. The pre-election poll found that Kennedy would get 58% of the vote, with a margin of error of 4%, but he ended up losing the election.

24. By choosing my sample carefully, I can make a good estimate of the average height of Americans by measuring the heights of only 500 people.

BASIC SKILLS & CONCEPTS

Population and Sample. For the studies described in Exercises 25–30, describe the population, sample, population parameters, and sample statistics.

25. In order to gauge public opinion on how to handle Iran’s growing nuclear program, the Pew Research Center surveyed 1001 Americans by telephone.

26. Astronomers typically determine the distance to a galaxy (a galaxy is a huge collection of billions of stars) by measuring the distances to just a few stars within it and taking the mean (average) of these distance measurements.

27. In a USA Today Internet poll, readers responded voluntarily to the question “Do you consume at least one caffeinated beverage every day?”

28. The Gallup Organization conducted a poll of 1003 Americans in its household panel who plan to take a summer vacation to determine what percentage of people plan to cancel their summer vacation because of the increase in gasoline prices.

29. Harris Interactive surveyed 2435 U.S. adults nationwide and asked them to rate the quality of American public schools.

30. The American Institute of Education conducts an annual study of attitudes of incoming college students by surveying approximately 261,000 first-year students at 462 colleges and universities. There are approximately 1.6 million first-year college students in this country.

Steps in a Study. Describe how you would apply the five basic steps of a statistical study to the issues in Exercises 31–36.

31. You want to determine the average number of hours per day students at a middle school spend listening to iPods.

32. As an airline marketing executive, you want to know if there has been an increase in frustration with air travel among business travelers.

33. You want to know the percentage of male college students in America who do Sudoku puzzles at least once per week.

34. You want to know the typical percentage of the bill that is left as a tip in restaurants.

35. You want to know the average lifetime of windshield wipers on cars made in Japan.

36. You want to know the percentage of high school students who are vegetarians.

37. Representative Sample? You want to determine the mean (average) number of hours spent studying each week by high school girls. Which of the following samples is most likely to be representative, and why? Also explain why each of the other choices is not likely to make a representative sample for this study.

• The girls’ track team

• The girls in an advanced placement calculus course

• The girls in the cast of the current theater production

• The first 50 girls you meet in the school cafeteria

38. Representative Sample? You want to determine the typical dietary habits of students at a college. Which of the following would make the best sample, and why? Also explain why each of the other choices would not make a good sample for this study.

• Students in a single dormitory

• Students majoring in public health

• Students who participate in intercollegiate sports

• Students enrolled in a required mathematics class

Identify the Sampling Method. Exercises 39–44 each describe a sample. Identify the sampling method as simple random sampling, systematic sampling, convenience sampling, or stratified sampling. Briefly explain why you think this sampling method was chosen.

39. An IRS (Internal Revenue Service) auditor randomly selects for audits 30 taxpayers in each of the filing status categories: single, head of household, married filing jointly, and married filing separately.

40. People magazine chooses its “25 most beautiful women” by looking at responses from readers who voluntarily mail in a survey printed in the magazine.

41. A study of the use of antidepressants selects 50 participants whose ages are between 20 and 29, 50 participants whose ages are between 30 and 39, and 50 participants whose ages are between 40 and 49.

42. Every 100th computer chip that is produced is given a reliability test.

43. A computer randomly selects 400 names from a list of all registered voters. Those selected are surveyed to predict who will win the election for Mayor.

44. A taste test for chips and salsa is given at the entrance to a supermarket.

Type of Study. For Exercises 45–50, state whether the study is an observational study or an experiment. If it is an experiment, describe the treatment and control groups and discuss whether single- or double-blinding is needed. If it is observational, state whether it is a case-control study and, if it is, distinguish between the cases and the controls.

45. A study at the University of Southern California separated 108 volunteers into groups, based on psychological tests designed to determine how often they lied and cheated. Those with a tendency to lie had different brain structures than those who did not lie (British Journal of Psychiatry).

46. A National Cancer Institute study of 716 melanoma patients and 1014 cancer-free patients matched by age, sex, and race found that those having a single large mole had twice the risk of melanoma. Having 10 or more moles was associated with a 12 times greater risk of melanoma (Journal of the American Medical Association).

47. In a study done at Boston University, researchers took snapshots of 4000 white adults every four years for 30 years and determined that 9 of 10 men and 7 of 10 women will eventually become overweight (Annals of Internal Medicine).

48. A breast cancer study began by asking 25,624 women questions about how they spent their leisure time. The health of these women was tracked over the next 15 years. Those women who said they exercise regularly were found to have a lower incidence of breast cancer (New England Journal of Medicine).

49. A (hypothetical) study of 45 swimmers found that those who were placed on a weight-training regimen in addition to daily swimming workouts improved their times by 3.5%.

50. A survey of 275,811 first-year college students revealed that 32.4% of these students had an A average in high school (Higher Education Research Institute).

Which Type of Study? For each of the questions in Exercises 51–56, what type of statistical study is most likely to lead to an answer? Why?

51. How many hours per week does the average public school teacher work?

52. What is the percentage of American voters who favor a constitutional amendment banning gay marriages?

53. Do teenagers with a diet high in dairy products have a higher incidence of acne?

54. Do drivers of the same model car get better mileage with high-ethanol fuel?

55. Does a multi-vitamin a day reduce the incidence of strokes?

56. Are the Sunday horoscopes in a local newspaper more accurate than the weekday horoscopes?

Margin of Error. Each of Exercises 57–60 states both a sample statistic and a margin of error. Find the confidence interval in each case, and answer any additional questions asked. Be sure to explain your answers clearly.

57. A poll is conducted the day before a state election for Senator. There are only two candidates running. The poll shows that 53% of the voters surveyed favor the Republican candidate, with a margin of error of 2.5 percentage points. Should the Republican plan a victory party? Why or why not?

58. A poll is conducted the day before an election for U.S. Representative. There are only two candidates running. The poll shows that 48.5% of the voters surveyed favor the Democratic candidate, with a margin of error of 2.0 percentage points. Based on this poll, should the Democratic candidate expect to lose the election? Why or why not?

59. Of 133 adult Americans surveyed in a Gallup poll who said their vacation plans had changed because of high gasoline prices, 58% said they had changed their destination or shortened their trip. With a margin of error of 9.0 percentage points, can you say that a majority of Americans changed their destination or shortened their trip?

60. In a survey of 1002 people, 701 (which is 70%) said that they voted in the most recent presidential election (based on data from ICR Research Group). The margin of error for the survey was 3 percentage points. However, actual voting records show that only 61% of all eligible voters actually did vote. Does this necessarily imply that people lied when they answered the survey?

FURTHER APPLICATIONS

Experiment Results. Consider an experiment designed to determine the effectiveness of a new drug. The drug is given to participants in the treatment group, while participants in the control group receive a placebo. For each set of results described in Exercises 61–64, discuss whether there appears to be evidence that the treatment is effective.

61. 70% of those in the treatment group showed improvement; 30% of those in the placebo group showed improvement.

62. 45% of those in the treatment group showed improvement; 45% of those in the placebo group showed improvement.

63. 90% of those in the treatment group showed improvement; 50% of those in the placebo group showed improvement.

64. 25% of those in the treatment group showed improvement; 50% of those in the placebo group showed improvement.

Interpreting Real Studies. For each of Exercises 65–70, do the following:

a. Identify the population and the population parameter of interest.

b. Briefly describe the sample and sample statistic for the study.

c. Find the confidence interval likely to contain the population parameter of interest.

65. In a TIME/CNN poll, 748 adults were asked whether they believed their children would have a higher standard of living than they have; 63% of those polled said “yes.” The margin of error was 3.7 percentage points.

66. A Gallup poll of 1002 American adults determined that 81% of those surveyed believed that the state of moral values in the country overall was getting worse. The margin of error was 3.2 percentage points.

67. Based on its survey of 60,000 households (see Example 2), the U.S. Labor Department reported an unemployment rate of 6.4% in June 2003. The margin of error for the report was 0.2 percentage point.

68. The Pew Research Center asked 1546 adult Americans whether humans would land on Mars within the next 50 years; 76% of these people said either “definitely yes” or “probably yes.” The margin of error for the poll was 2.5 percentage points.

69. A Fox News opinion poll asked 900 registered voters, “Do you personally think the government is listening to your phone conversations?” Thirty percent of those surveyed responded “yes” and 58% responded “no.” The margin of error was 3.0 percentage points.

70. A Roper Organization survey of 2000 adults revealed that 64% of those surveyed kept money in a regular savings account. The margin of error for the survey was 2.2 percentage points.

WEB PROJECTS

Find useful links for Web Projects on the text Web site: www.aw.com/bennett-briggs

71. Current Nielsen Ratings. Find the Nielsen ratings for the past week. What were the three most popular television shows? Explain both the “rating” and the “share” for each show.

72. Nielsen Sample. Use information available on the Nielsen Media Research Web site to answer each of the following questions.

a. How does Nielsen select the sample of homes to be included in a viewer survey?

b. Describe a few ways by which Nielsen attempts to check that the results from its people meter surveys are accurate.

c. Based on what you have learned, do you think the Nielsen ratings are reliable? If so, why? If not, why not?

73. Attitude Update. The Pew Research Center for the People and the Press studies public attitudes toward the press, politics, and public policy issues. Go to its Web site and find the latest survey about attitudes. Write a one-page summary of what Pew surveyed, how it conducted the survey, and what it found.

74. Labor Statistics. Use the Bureau of Labor Statistics Web page to learn about its monthly survey. Choose one aspect of the survey, such as how the sample is chosen or how it is used to compare unemployment rates over time. Write a short summary of what you learn.

75. Professional Polling. Visit the Web site of a national polling organization and report on a recent poll. Write a short description of the poll and its results, commenting on features such as sampling technique, sample size, and margin of error.

IN THE NEWS

76. Statistics in the News. Select three news stories from the past week that involve statistics in some way. In each case, write one or two paragraphs describing the role of statistics in the story.

77. Statistics in Your Major. Write two to three paragraphs describing the ways in which you think the science of statistics is important in your major field of study. (If you have not chosen a major, answer this question for a major that you are considering.)

78. Statistics in Sports. Choose a sport and describe three different statistics commonly tracked by participants in or spectators of the sport. In each case, briefly describe the importance of the statistic to the sport.

79. Sample and Population. Find a report in today’s news concerning any type of statistical study. What is the population being studied? What is the sample? Why do you think the sample was chosen as it was?

80. Poor Sampling. In a recent newspaper or magazine, find an article about a study that attempts to describe some characteristic of a population, but that you believe involved poor sampling (for example, a sample that was too small or unrepresentative of the population under study). Describe the population, the sample, and what you think was wrong with the sample. Briefly discuss how you think the poor sampling affected the study results.

81. Good Sampling. In a recent newspaper or magazine, find an article that describes a statistical study in which the sample was well chosen. Describe the population, the sample, and why you think the sample was a good one.

82. Margin of Error. Find a report of a recent survey or poll. Interpret the sample statistic and margin of error quoted for the survey or poll.

UNIT 5B Should You Believe a Statistical Study?

Most statistical research is carried out with integrity and care. Nevertheless, statistical research is sufficiently complex that bias can arise in many different ways. We should always examine reports of statistical research carefully, looking for anything that might make us question the results. In this unit, we discuss eight guidelines that can help you answer the question “Should I believe a statistical study?”

Guideline 1: Identify the Goal, Population, and Type of Study

Before evaluating the details of a statistical study, we must know what it is about. Based on what you hear or read, try to answer basic questions such as these:

• What was the goal of the study?

• What was the population under study? Was the population clearly and appropriately defined?

• What type of study was used? Was the type appropriate for the goal?

If you can’t find reasonable answers to these questions, it will be difficult to evaluate other aspects of the study.

❉EXAMPLE 1 Appropriate Type of Study?

A newspaper reports: “Researchers gave each of the 100 participants their astrological horoscopes, and asked them whether the horoscopes appeared to be accurate. Eighty-five percent of the participants reported that the horoscopes were accurate. The researchers concluded that horoscopes are valid most of the time.” Analyze this study according to Guideline 1.

SOLUTION The goal of the study was to determine the validity of horoscopes. Based on the news report, it appears that the study was observational: The researchers simply asked the participants about the accuracy of the horoscopes. However, because the accuracy of a horoscope is somewhat subjective, this study should have been a controlled experiment in which some people were given their actual horoscope and others were given a fake horoscope. Then the researchers could have looked for differences between the two groups. Moreover, because researchers could easily influence the results by how they questioned the participants, the experiment should have been double-blind. In summary, the type of study was inappropriate to the goal and its results are meaningless.

Now try Exercises 19–20.

Guideline 2: Consider the Source

Statistical studies are supposed to be objective, but the people who carry them out and fund them may be biased. Thus, it is important to consider the source of a study and evaluate the potential for biases that might invalidate its conclusions.

❉EXAMPLE 2 Is Smoking Healthy?

By 1963, enough research on the health dangers of smoking had accumulated that the Surgeon General of the United States publicly announced that smoking is bad for health. Research done since that time has built further support for this claim. However, while the vast majority of studies show that smoking is unhealthy, a few studies found no dangers from smoking, and perhaps even health benefits. These studies generally were carried out by the Tobacco Research Institute, funded by the tobacco companies. Analyze the Tobacco Research Institute studies according to Guideline 2.

SOLUTION Tobacco companies had a financial interest in minimizing the dangers of smoking. Because the studies carried out at the Tobacco Research Institute were funded by the tobacco companies, there may have been pressure on the researchers to produce results to the companies’ liking. This potential for bias does not mean their research was biased, but the fact that it contradicts virtually all other research on the subject should be cause for concern.

Now try Exercises 21–22. ➽

By the Way

Surveys show that nearly half of Americans believe their horoscopes. However, in controlled experiments, the predictions of horoscopes come true no more often than would be expected by chance.

Guideline 3: Look for Bias in the Sample

Look for bias that may prevent the sample from being representative of the population. There are two particularly common forms of bias that can affect sample selection.

CASE STUDY The 1936 Literary Digest Poll

The Literary Digest, a popular magazine of the 1930s, successfully predicted the outcomes of several elections using large polls. In 1936, editors of the Literary Digest conducted a particularly large poll in advance of the presidential election. They randomly chose a sample of 10 million people from various lists, including names in telephone books and rosters of country clubs. They mailed a postcard “ballot” to each of these 10 million people. About 2.4 million people returned the postcard ballots. Based on the returned ballots, the editors of the Literary Digest predicted that Alf Landon would win the presidency by a margin of 57% to 43% over Franklin Roosevelt. Instead, Roosevelt won with 62% of the popular vote. How did such a large survey go so wrong?

The sample suffered from both selection bias and participation bias. The selection bias arose because the Literary Digest chose its 10 million names in ways that favored affluent people. For example, selecting names from telephone books meant choosing only from those who could afford telephones back in 1936. Similarly, country club members are usually quite wealthy. The selection bias favored Landon because he was the Republican, and affluent voters of the 1930s tended to vote for Republican candidates.

The participation bias arose because return of the postcard ballots was voluntary. People who felt most strongly about the election were more likely to be among those who returned their postcard ballots. This bias also tended to favor Landon because he was the challenger: people who did not like President Roosevelt could express their desire for change by returning the postcards. Together, the two forms of bias made the sample results useless, despite the large number of people surveyed.

BIAS IN CHOOSING A SAMPLE

Selection bias occurs whenever researchers select their sample in a way that tends to make it unrepresentative of the population. For example, a pre-election poll that surveys only registered Republicans has selection bias because it is unlikely to reflect the opinions of all voters.

Participation bias occurs primarily with surveys and polls; it arises whenever people choose whether to participate. Because people who feel strongly about an issue are more likely to participate, their opinions may not represent the larger population that is less emotionally attached to the issue. (Surveys or polls in which people choose whether to participate are often called self-selected or voluntary response surveys.)
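A small simulation can show why a huge sample does not cure selection bias. The Python sketch below is a toy model under invented assumptions, not a reconstruction of any real poll: the share of affluent voters (25%) and the support levels within each group (30% and 72%) are hypothetical numbers chosen only to make the effect visible, and the function name `estimate_support` is likewise hypothetical.

```python
import random


def estimate_support(sample_size, affluent_only, rng,
                     affluent_share=0.25, support_affluent=0.30,
                     support_other=0.72):
    """Estimate support for a candidate from a simulated sample.

    All shares and support levels are hypothetical illustration values.
    """
    supporters = 0
    for _ in range(sample_size):
        # A biased sampler draws only affluent voters; a fair one draws
        # affluent voters in proportion to their share of the population.
        is_affluent = affluent_only or rng.random() < affluent_share
        p = support_affluent if is_affluent else support_other
        if rng.random() < p:
            supporters += 1
    return supporters / sample_size


rng = random.Random(0)
true_support = 0.25 * 0.30 + 0.75 * 0.72  # population-wide support
biased = estimate_support(200_000, affluent_only=True, rng=rng)
fair = estimate_support(3_000, affluent_only=False, rng=rng)

print(f"true support:               {true_support:.1%}")
print(f"biased sample of 200,000:   {biased:.1%}")
print(f"random sample of 3,000:     {fair:.1%}")
```

In this toy model the small random sample lands close to the true value, while the much larger biased sample stays far from it no matter how many people are added. Participation bias could be modeled in a similar way by letting the chance of responding depend on a person’s opinion.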

HISTORICAL NOTE

A young pollster named George Gallup conducted his own survey prior to the 1936 election. Sending postcards to only 3000 randomly selected people, he correctly predicted not only the outcome of the election, but also the outcome of the Literary Digest poll to within 1%. Gallup went on to establish a very successful polling organization.

By the Way

After decades of arguing to the contrary, in 1999 the Philip Morris Company, the world’s largest seller of tobacco products, publicly acknowledged that smoking causes lung cancer, heart disease, emphysema, and other serious diseases. Shortly thereafter, Philip Morris changed its name to Altria.

❉EXAMPLE 3 Self-Selected Poll

The television show Nightline conducted a poll in which viewers were asked whether the United Nations headquarters should be kept in the United States. Viewers could respond to the poll by paying 50 cents to call a “900” phone number with their opinions. The poll drew 186,000 responses, of which 67% favored moving the United Nations out of the United States. Around the same time, a poll using simple random sampling of 500 people found that 72% wanted the United Nations to stay in the United States. Which poll is more likely to be representative of the general opinions of Americans?

SOLUTION The Nightline sample suffered from severe participation bias. Not only did viewers choose whether to call in for the survey, but they had to pay to participate. This cost made it even more likely that respondents would be those who felt a need for change. Thus, despite its large number of respondents, the Nightline survey was too biased to be trusted. In contrast, a simple random sample of 500 people is quite likely to be representative, so the finding of this small survey has a better chance of representing the true opinions of all Americans.

Now try Exercises 23–24.

Guideline 4: Look for Problems in Defining or Measuring the Variables of Interest

Statistical studies usually attempt to measure something, and we call the things being measured the variables of interest in the study. The term variable simply refers to an item or quantity that can vary or take on different values. For example, variables in the Nielsen ratings include show being watched and number of viewers.

By the Way

More than a third of all Americans routinely shut the door or hang up the phone when contacted for a survey, thereby making self-selection a problem for legitimate pollsters. One reason people hang up may be the proliferation of selling under the guise of market research (often called “sugging”), in which a telemarketer pretends you are part of a survey in order to get you to buy something.

DEFINITION

A variable is any item or quantity that can vary or take on different values. The variables of interest in a statistical study are the items or quantities that the study seeks to measure.

Results of a statistical study may be especially difficult to interpret if the variables under study are difficult to define or measure. For example, imagine trying to conduct a study of how exercise affects resting heart rates. The variables of interest would be amount of exercise and resting heart rate. However, both variables are difficult to define and measure. In the case of amount of exercise, it’s not clear what the definition covers: Does it include walking to class? Even if we specify the definition, how can we measure amount of exercise given that some forms of exercise are more vigorous than others? The following two examples describe real cases in which defining or measuring variables caused problems in statistical studies.

Time out to think

How would you measure your resting heart rate? Describe some difficulties in defining and measuring resting heart rate.


❉EXAMPLE 4 Can Money Buy Love?
A Roper poll reported in USA Today involved a survey of the wealthiest 1% of Americans. The survey found that these people would pay an average of $487,000 for “true love,” $407,000 for “great intellect,” $285,000 for “talent,” and $259,000 for “eternal youth.” Analyze this result according to Guideline 4.

SOLUTION The variables in this study are very difficult to define. How, for example, do you define “true love”? And does it mean true love for a day, a lifetime, or something else? Similarly, does the ability to balance a spoon on your nose constitute “talent”? Because the variables are so poorly defined, it’s likely that different people interpreted them differently, making the results very difficult to interpret.

Now try Exercise 25.

❉EXAMPLE 5 Illegal Drug Supply
Law enforcement authorities try to stop illegal drugs from entering the country. A commonly quoted statistic is that they succeed in stopping only about 10% to 20% of the drugs entering the United States. Should you believe this statistic?

SOLUTION There are essentially two variables in the study: quantity of illegal drugs intercepted and quantity of illegal drugs NOT intercepted. It should be relatively easy to measure the quantity of illegal drugs that law enforcement officials intercept. However, because the drugs are illegal, it’s unlikely that anyone is reporting the quantity of drugs that are not intercepted. How, then, can anyone know that the intercepted drugs are 10% to 20% of the total? In a New York Times analysis, a police officer was quoted as saying that his colleagues refer to this type of statistic as “P.F.A.,” for “pulled from the air.” Now try Exercise 26.

Guideline 5: Watch Out for Confounding Variables
Variables that are not intended to be part of the study can sometimes make it difficult to interpret results properly. Such variables are often called confounding variables, because they confound (confuse) a study’s results.

It’s not always easy to discover confounding variables. Sometimes they are discovered years after a study was completed, and sometimes they are not discovered at all. Fortunately, confounding variables are sometimes more obvious and can be discovered simply by thinking hard about factors that may have influenced a study’s results.

❉EXAMPLE 6 Radon and Lung Cancer
Radon is a radioactive gas produced by natural processes (the decay of uranium) in the ground. The gas can leach into buildings through the foundation and can accumulate in relatively high concentrations if doors and windows are closed. Imagine a study that seeks to determine whether radon gas causes lung cancer by comparing the lung cancer rate in Colorado, where radon gas is fairly common, with the lung cancer rate in Hong Kong, where radon gas is less common. Suppose the study finds that the lung cancer rates are nearly the same. Is it fair to conclude that radon is not a significant cause of lung cancer?

By the Way
Many hardware stores sell simple kits that you can use to test whether radon gas is accumulating in your home. If it is, the problem can be eliminated by installing an appropriate “radon mitigation” system, which usually consists of a fan that blows the radon out from under the house before it can get in.

SOLUTION The variables under study are amount of radon and lung cancer rate. However, because smoking can also cause lung cancer, smoking rate may be a confounding variable in this study. In particular, the smoking rate in Hong Kong is much higher than the smoking rate in Colorado, so any conclusions about radon and lung cancer must take the smoking rate into account. In fact, careful studies have shown that radon gas can cause lung cancer, and the U.S. Environmental Protection Agency (EPA) recommends taking steps to prevent radon from building up indoors.

Now try Exercises 27–28.
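To see how a confounding variable such as smoking rate can hide a real effect, it may help to work through some numbers. The short Python sketch below uses entirely hypothetical counts (they are not data from Colorado, Hong Kong, or any actual study), and the names regions, rate_per_100k, and overall_rate are made up for this illustration. It compares two imaginary regions of 100,000 people each, first overall and then separately for smokers and nonsmokers.

```python
# Hypothetical counts, invented only to illustrate confounding; they are
# not data from Colorado, Hong Kong, or any real study.
regions = {
    "more radon, less smoking": {
        "smokers": {"people": 20_000, "cases": 24},
        "nonsmokers": {"people": 80_000, "cases": 16},
    },
    "less radon, more smoking": {
        "smokers": {"people": 30_000, "cases": 33},
        "nonsmokers": {"people": 70_000, "cases": 7},
    },
}


def rate_per_100k(people, cases):
    """Lung cancer cases per 100,000 people."""
    return 100_000 * cases / people


def overall_rate(groups):
    """Rate for a whole region, ignoring who smokes."""
    people = sum(g["people"] for g in groups.values())
    cases = sum(g["cases"] for g in groups.values())
    return rate_per_100k(people, cases)


for name, groups in regions.items():
    print(f"{name}: overall {overall_rate(groups):.0f} per 100,000")
    for group, g in groups.items():
        rate = rate_per_100k(g["people"], g["cases"])
        print(f"  {group}: {rate:.0f} per 100,000")
```

With these made-up numbers, both regions have the same overall rate (40 cases per 100,000), yet the region with more radon has the higher rate among smokers (120 versus 110) and among nonsmokers (20 versus 10). Comparing only the overall rates would miss the difference, because the two regions do not have the same share of smokers.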

Guideline 6: Consider the Setting and Wording in Surveys

Even when a survey is conducted with proper sampling and with clearly defined terms and questions, it’s important to watch out for problems in the setting or wording that might produce inaccurate or dishonest responses. Dishonest responses are particularly likely when the survey concerns sensitive subjects, such as personal habits or income. For example, the question “Do you cheat on your income taxes?” is unlikely to elicit honest answers from those who cheat, especially if the setting does not guarantee complete confidentiality.

In other cases, even honest answers may not be accurate if the wording of questions invites bias. Sometimes just the order of the words in a question can affect the outcome. A poll conducted in Germany asked the following two questions:

• Would you say that traffic contributes more or less to air pollution than industry?

• Would you say that industry contributes more or less to air pollution than traffic?

With the first question, 45% answered traffic and 32% answered industry. With the second question, only 24% answered traffic while 57% answered industry. Thus, simply changing the order of the words traffic and industry dramatically changed the survey results.

❉EXAMPLE 7 Do You Want a Tax Cut?
The Republican National Committee commissioned a poll to find out whether Americans supported a tax-cut proposal. Asked whether they favored the tax cut, 67% of respondents answered yes. Should we conclude that Americans supported the proposal?

SOLUTION A question like “Do you favor a tax cut?” is biased because it does not give other options (much like the fallacy of limited choice discussed in Unit 1A). In fact, an independent poll conducted at the same time gave respondents a list of options for using surplus revenues. This poll found that 31% wanted the money devoted to Social Security, 26% wanted it used to reduce the national debt, and only 18% favored using it for a tax cut. (The remaining 25% of respondents chose a variety of other options.)

Now try Exercises 29–30. ➽

By the Way
People are more likely to choose the item that comes first in a survey because of what psychologists call the availability error: the tendency to make judgments based on what is available in the mind. Professional polling organizations must be very careful to avoid this problem, sometimes by posing the question to some people in one order and to others in the opposite order.


Guideline 7: Check That Results Are Presented Fairly
Even when a statistical study is done well, it may be misrepresented in graphs or concluding statements. Researchers may occasionally misinterpret the results of their own studies or jump to conclusions that are not supported by the results, particularly when they have personal biases toward certain interpretations. In other cases, news reporters or others may misinterpret a survey or jump to unwarranted conclusions that make a story seem more spectacular. Misleading graphs are an especially common problem (see Unit 5D). In general, you should look for inconsistencies between the interpretation of a study (in pictures and words) and any actual data given with it.

❉EXAMPLE 8 Does the School Board Need a Statistics Lesson?
The school board in Boulder, Colorado, created a hubbub when it announced that 28% of Boulder school children were reading “below grade level,” and hence concluded that methods of teaching reading needed to be changed. The announcement was based on reading tests on which 28% of Boulder school children scored below the national average for their grade. Do these data support the board’s conclusion?

SOLUTION The fact that 28% of Boulder children scored below the national average for their grade implies that 72% scored at or above the national average. Thus, the school board’s ominous statement about students reading “below grade level” makes sense only if “grade level” means the national average score for a particular grade. This interpretation of “grade level” is curious because it means that half the students in the nation are always below grade level, no matter how high the scores. The conclusion that teaching methods needed to be changed was not justified by these data. Now try Exercises 31–32.
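The point that about half of all students fall below the national average "no matter how high the scores" can be checked with a small computation. The Python sketch below is only an illustration under two assumptions: the scores are hypothetical, and "average" is taken to mean the median (the middle score). The function name share_below_median is likewise made up for this example.

```python
from statistics import median


def share_below_median(scores):
    """Fraction of scores that fall strictly below the median score."""
    cutoff = median(scores)
    below = sum(1 for s in scores if s < cutoff)
    return below / len(scores)


# Hypothetical national scores, invented purely for illustration.
national_scores = [52, 58, 61, 64, 67, 70, 73, 77, 81, 88]

# The same students after every score rises by 20 points.
improved_scores = [s + 20 for s in national_scores]

low_share = share_below_median(national_scores)
high_share = share_below_median(improved_scores)
print(low_share, high_share)
```

Both shares come out the same (one-half of the students), even though every score in the second list is 20 points higher. Defining "grade level" as the national average therefore says nothing about how well students actually read.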

Guideline 8: Stand Back and Consider the Conclusions
Finally, even if a study seems reasonable according to all the previous guidelines, you should stand back and consider the conclusions. Ask yourself questions such as these:

• Did the study achieve its goals?

• Do the conclusions make sense?

• Can you rule out alternative explanations for the results?

• If the conclusions do make sense, do they have any practical significance?

❉EXAMPLE 9 Practical Significance
An experiment is conducted in which the weight losses of people who try a new “Fast Diet Supplement” are compared to the weight losses of a control group of people who try to lose weight in other ways. After eight weeks, the results show that the treatment group lost an average of 1/2 pound more than the control group. Assuming that it has no dangerous side effects, does this study suggest that the Fast Diet Supplement is a good treatment for people wanting to lose weight?

SOLUTION Compared to the average person’s body weight, the difference of 1/2 pound hardly matters at all. Thus, while the statistics in this case may be interesting, they don’t seem to have much practical significance. Now try Exercises 33–36. ➽

Extraordinary claims require extraordinary evidence.

—CARL SAGAN (1934–1996)


SUMMARY Eight Guidelines for Evaluating a Statistical Study

1. Identify the goal of the study, the population considered, and the type of study.
2. Consider the source, particularly with regard to whether the researchers may be biased.
3. Look for bias that may prevent the sample from being representative of the population.
4. Look for problems in defining or measuring the variables of interest, which can make it difficult to interpret results.
5. Watch out for confounding variables that can invalidate the conclusions of a study.
6. Consider the setting and the wording of questions in any survey, looking for anything that might tend to produce inaccurate or dishonest responses.
7. Check that results are presented fairly in graphs and concluding statements, since both researchers and media often create misleading graphics or jump to conclusions that the results do not support.
8. Stand back and consider the conclusions. Did the study achieve its goals? Do the conclusions make sense? Do the results have any practical significance?

EXERCISES 5B

QUICK QUIZ
Choose the best answer to each of the following questions. Explain your reasoning with one or more complete sentences.

1. You read about an issue that was subject to an observational study when clearly it should have been studied with a double-blind experiment. The results from the observational study are therefore

a. still valid, but a little less reliable.

b. valid, but only if you first correct for the fact that the wrong type of study was done.

c. essentially meaningless.

2. A study conducted by the oil company Exxon Mobil shows that there was no lasting damage from a large oil spill in Alaska. This conclusion

a. is definitely invalid, because the study was biased.

b. may be correct, but the potential for bias means that you should look very closely at how the conclusion was reached.

c. could be correct if it falls within the confidence interval of the study.

3. Consider a study designed to learn about the social networks of all college freshmen, in which the researchers randomly interviewed students living in on-campus dormitories. The way this sample was chosen means the study will suffer from

a. selection bias.

b. participation bias.

c. confounding variables.

4. The show American Idol selects winners based on votes cast by anyone who wants to vote. This means that the winner

a. is the person most Americans want to win.

b. may or may not be the person most Americans want to win, because the voting is subject to participation bias.

c. may or may not be the person most Americans want to win, because the voting should have been double-blind.

5. Consider an experiment in which you measure the weights of 6-year-olds. The variable of interest in this study is

a. the size of the sample.

b. the weights of 6-year-olds.

c. the ages of the children under study.


6. Consider a survey in which 1000 people are asked “How often do you go to the dentist?” The variable of interest in this study is

a. the number of visits to the dentist.

b. the 1000-person size of the sample.

c. the integers 0 through 5.

7. Imagine a survey of randomly selected people found that people who used sunscreen were more likely to have been sunburned in the past year. Which explanation for this result seems most likely?

a. Sunscreen is useless.

b. The people in the study all used sunscreen that had passed its expiration date.

c. People who use sunscreen are more likely to spend time in the sun.

8. You want to know whether people prefer Smith or Jones for mayor, and you are considering two possible ways to word the question. Wording X is “Do you prefer Smith or Jones for mayor?” Wording Y is “Do you prefer Jones or Smith for mayor?” (That is, the names are reversed in the two wordings.) The best approach is to

a. use Wording X for everyone.

b. use the same wording for everyone; it doesn’t matter whether it is Wording X or Wording Y.

c. use Wording X for half the people and Wording Y for the other half.

9. A self-selected survey is one in which

a. the people being surveyed decide which question to answer.

b. people decide for themselves whether to be part of the survey.

c. the people who design the survey are also the survey participants.

10. If a statistical study is carefully conducted in every possible way, then

a. its results must be correct.

b. we can have confidence in its results, but it is still possible that they are not correct.

c. we say that the study is perfectly biased.

REVIEW QUESTIONS

11. Briefly describe each of the eight guidelines for evaluating statistical studies. Give an example to which each guideline applies.

12. Describe and contrast selection bias and participation bias in sampling. Give an example of each.

13. What do we mean by variables of interest in a study?

14. What are confounding variables, and what problems can they cause?

DOES IT MAKE SENSE?
Decide whether each of the following statements makes sense (or is clearly true) or does not make sense (or is clearly false). Explain your reasoning.

15. The TV survey got more than 1 million phone-in responses, so it is clearly more valid than the survey by the professional pollsters, which involved interviews with only a few hundred people.

16. The survey of religious beliefs suffered from selection bias because the questionnaires were handed out only at Catholic churches.

17. My experiment proved beyond a doubt that vitamin C can reduce the severity of colds, because I controlled the experiment carefully for every possible confounding variable.

18. Everyone who jogs for exercise should try the new training regimen, because careful studies suggest it can increase your speed by 1%.

BASIC SKILLS & CONCEPTS
Would You Believe This Study? Exercises 19–30 each describe some aspect of a statistical study. Based solely on the information given in each case, decide whether you have any reason to doubt the results of the study. Explain your reasoning.

19. Researchers who want to assess the quality of school lunches in American elementary schools visit a school in Topeka, Kansas.

20. An experimental, double-blind study finds that people who eat more fast food are more likely to feel tired throughout the day.

21. The staff at the conservative Heritage Foundation conducted a study to find out what people think of the new Democratic tax plan.

22. A study financed by a major pharmaceutical company finds that its new drug is no more effective against high blood pressure than older, less expensive drugs.

23. A TV talk show host asks the TV audience, “Do you support a national speed limit of 55 mph?” and asks people to vote by telephone at a toll-free number.


24. In trying to determine whether their candidate for governor has a chance of defeating the incumbent Democrat, the Republican Party conducts a survey of 1000 of its members, selected at random.

25. A study claims to have found that Europeans lead more fulfilling lives than Americans.

26. A government study finds, based on people who had their tax returns audited, that 15% of taxpayers understate their income.

27. In a study designed to determine whether people who wear helmets while riding a bicycle have fewer accidents, researchers tracked 500 riders with helmets for one month.

28. A study seeks to learn about obesity among children. The researchers monitor the eating and exercise habits of the children in the study, carefully recording everything they eat and all their activity.

29. A consumer pollster for soft drinks asked customers in a supermarket, “Do you prefer Zinger sodas or some other brand?”

30. To gauge public opinion on whether there should be a constitutional amendment to ban flag burning, a survey asked people, “Do you support the American flag?”

Would You Believe This Claim? Exercises 31–36 each describe a claim based on a statistical study. Based solely on the information given in each case, decide whether you have any reason to doubt the claim. Explain your reasoning.

31. A study involving 200 long-distance runners claimed that a new energy drink is preferable for all athletes.

32. Citing statistical data indicating that half the children in the school district are of above average weight, the School Board claims to have proved that new exercise classes should be mandated for everyone.

33. The U.S. Census Bureau claims that a larger proportion of U.S. residents than ever have earned high school and college diplomas.

34. Based on data showing that a new cold treatment can shorten the average duration of a cold from 7 days to 6.8 days, the company that sells the treatment claims that everyone should use it.

35. A study of 20 nations (in the Canadian Medical Association Journal) discovered that Germany has the most mean annual visits to a doctor (8.5), while Finland has the fewest (3.2).

36. Researchers, monitoring the health of 200 people who take at least two pills per day, claim that people who take pills regularly have better health.

FURTHER APPLICATIONS
Bias. Exercises 37–44 present situations in which bias may be an issue. Describe one potential source of bias in the situation, and briefly discuss whether the bias should affect your view of the situation.

37. People visiting the Web site SaveTheAnimals.com can vote on whether or not euthanasia of prairie dogs is acceptable.

38. Market researchers conduct a survey at a supermarket on a weekday between 10:00 a.m. and noon to determine what fraction of customers use coupons.

39. An exit poll designed to predict the winner of a local election uses interviews with everyone who votes between 7:00 and 7:30 a.m.

40. An exit poll designed to predict the winner of a national election uses interviews with randomly selected voters in New York.

41. In order to determine the opinions of people in the 18- to 24-year age group on controlling illegal immigration, researchers survey a random sample of 1000 National Guard members in this age group.

42. A college mails survey forms to all current seniors, asking for the students’ choice of their all-time best and worst professor. Students are asked to return the survey in the campus mail.


43. Planned Parenthood members are surveyed to determine whether American adults prefer abstinence, counseling and education, or morning-after pills for high school students.

44. Scientists working for Greenpeace (which opposes genetically engineered crops) conduct a study to determine whether Monsanto’s new, genetically engineered soybean poses any threat to the environment.

45. It’s All in the Wording. Princeton Survey Research Associates did a study for Newsweek magazine illustrating the effects of wording in a survey. Two questions were asked:

• Do you personally believe that abortion is wrong?

• Whatever your own personal view of abortion, do you favor or oppose a woman in this country having the choice to have an abortion with the advice of her doctor?

To the first question, 57% of the respondents replied yes, while 36% responded no. In response to the second question, 69% of the respondents favored the choice, while 24% opposed the choice. Discuss why the two questions produced seemingly contradictory results. How could the results of the questions be used selectively by various groups?

46. Tax or Spend? A Gallup poll asked the following two questions:

• Do you favor a tax cut or “increased spending on other government programs”? Result: 75% for tax cut.

• Do you favor a tax cut or “spending to fund new retirement savings accounts, as well as increased spending on education, defense, Medicare and other programs”? Result: 60% for the spending.

Discuss why the two questions produced seemingly contradictory results. How could the results of the questions be used selectively by various groups?

Stat-Bytes. Politicians must make their political statements (often called sound-bytes) very short because the attention span of listeners is so short. A similar effect occurs in reporting statistical news. Major statistical studies are often reduced to one or two sentences. The summaries of statistical reports in Exercises 47–52 are taken from various news sources. Discuss what crucial information is missing and what more you would want to know before you acted on the report.

47. The Atlantic, summarizing a Federal Highway Administration report, says that the worst traffic bottleneck in the United States is the U.S. 101/I-405 interchange, which generates 27,144 hours of delay every year.

48. CNN reports on a Zagat Survey of America’s Top Restaurants which found that “only nine restaurants achieved a rare 29 out of a possible 30 rating and none of those restaurants is in the Big Apple.”

49. USA Today reports that two-thirds of adults say that cell phone use during a dinner for two at a nice restaurant is unacceptable.

50. Only 2% of the estates of Americans who died in the past year paid estate taxes, while 60% of Americans favor repealing estate taxes.

51. Time Magazine reports that 28% of Americans polled believe the Bible is literally true, down from 38% in 1976.

52. Thirty percent of newborns in India would qualify for intensive care if they were born in the United States.

Accurate Headlines? Exercises 53–55 give a headline and a brief description of the statistical news story that accompanied the headline. In each case, discuss whether the headline accurately represents the story.

53. Headline: “Drugs shown in 98 percent of movies”

Story summary: A “government study” claims that drug use, drinking, or smoking was depicted in 98% of the top movie rentals (Associated Press).

54. Headline: “Sex more important than jobs”

Story summary: A survey found that 82% of 500 people interviewed by phone ranked a satisfying sex life as important or very important, while 79% ranked job satisfaction as important or very important (Associated Press).

55. Headline: “Grape juice may fight disease”

Story summary: A study of 15 people, partially funded by Welch Foods, found that grape juice helps to expand blood vessels and increase the levels of HDL cholesterol. Both constricted blood vessels and low HDL levels are risk factors for heart disease (Milwaukee Journal Sentinel).

56. Exercise and Dementia. A recent study in the Annals of Internal Medicine was summarized by the Associated Press, in part, as follows:

The study followed 1740 people aged 65 and older who showed no signs of dementia at the outset. The participants’ health was evaluated every two years for six years. Out of the original pool, 1185 were later found to be free of dementia, 77 percent of whom reported exercising three or more times a week; 158 people showed signs of dementia, only 67 percent of whom said they exercised that much. The rest either died or withdrew from the study.

a. How many people completed the study?

b. Fill in the following two-way table (with numbers of individuals), using the figures given in the above passage:

               Exercise    No exercise    Total
Dementia
No dementia
Total

c. Draw a Venn diagram with two overlapping circles to illustrate the data.

WEB PROJECTS
Find useful links for Web Projects on the text Web site: www.aw.com/bennett-briggs

57. Polling Organization. Go to the Web site for a major professional polling organization. Study results from a recent poll, and evaluate the poll according to the guidelines in this section.

58. Harper’s Index. Go to the Web site for the Harper’s Index and study a few of the recently quoted statistics. Be sure to select the option on the page that allows you to see the sources for the statistics. Choose three statistics that you find particularly interesting, and discuss whether, in accord with the guidelines given in this section, you believe them.

IN THE NEWS

59. Applying the Guidelines. Find a recent newspaper article or television report about a statistical study on a topic that you find interesting. Write a short report applying each of the eight guidelines given in this section. (Some of the guidelines may not apply to the particular study you are analyzing. In that case, explain why the guideline is not applicable.)

60. Believable Results. Find a recent news report about a statistical study whose results you believe are meaningful and important. In one page or less, summarize the study and explain why you find it believable.

61. Unbelievable Results. Find a recent news report about a statistical study whose results you don’t believe are meaningful or important. In one page or less, summarize the study and why you don’t believe its claims.

62. Legal Experts. Find a news report concerning a major ongoing trial. Find out whether any of the “expert witnesses” are being paid by either side. Based on what you learn, describe whether you think the experts are giving biased testimony.

63. Biased Questioning? Find a recent news report of responses to a single question in an opinion poll. State the exact words of the question and the results of the poll. Analyze the question and the reported results for potential biases. At the end of your analysis, state whether you believe the results, and defend your opinion.

UNIT 5C Statistical Tables and Graphs

Whether you look at a newspaper, a corporate annual report, or a government study, you are almost sure to see tables and graphs of statistical data. Some of these tables and graphs are simple; others can be quite complex. Some make it easy to understand the data; others may be confusing or even misleading. In this unit, we’ll investigate some of the basic principles behind tables and graphs, preparing for more complex graphics in Unit 5D.

Frequency Tables
A teacher makes the following list of the grades she gave to her 25 students on an essay:

A C C B C D C C F D C C C B B A B D B A A B F C B


This list contains all the raw data, but it isn’t easy to read. A better way to display these data is with a frequency table, a table showing the number of times, or frequency, that each grade appears (Table 5.1). The five possible grades are called the categories for the table.

There are two common variations on the idea of frequency. The relative frequency for a category expresses its frequency as a fraction or percentage of the total. For example, 4 of the 25 students received A grades, so the relative frequency for A grades is 4/25, or 16%. The total relative frequency must always be 1, or 100%. However, because of rounding, you may sometimes find that the relative frequencies in a table or chart add up to slightly more or less than 100%.

Time out to think
Briefly explain why the total relative frequency should always be 1, or 100%.

The cumulative frequency is the number of responses in a particular category and all preceding categories. For example, the cumulative frequency for grades of C and above is 20, because 20 students received grades of either A, B, or C.

❉EXAMPLE 1 Relative and Cumulative Frequency
Add to Table 5.1 columns showing the relative and cumulative frequencies.

SOLUTION Table 5.2 shows the new columns and calculations.

DEFINITION

A basic frequency table has two columns:

• The first column lists all the categories of data.
• The second column lists the frequency of each category, which is the number of times each category appears in the data set.

Additional columns may include relative frequency (frequency expressed as a fraction or percentage of the total) or cumulative frequency (total of frequencies for the given category and all previous categories).

TABLE 5.2

Grade    Frequency    Relative Frequency    Cumulative Frequency
A        4            4/25 = 16%            4
B        7            7/25 = 28%            7 + 4 = 11
C        9            9/25 = 36%            9 + 7 + 4 = 20
D        3            3/25 = 12%            3 + 9 + 7 + 4 = 23
F        2            2/25 = 8%             2 + 3 + 9 + 7 + 4 = 25
Total    25           1 = 100%              25

TABLE 5.1

Grade    Frequency
A        4
B        7
C        9
D        3
F        2
Total    25

Now try Exercises 25–26. ➽
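If you keep data on a computer, the same frequency table can be built automatically. The following Python sketch is one possible way to do it, not a prescribed method; the function name frequency_table and the variable names are made up for this illustration. It counts each essay grade from the teacher's list, then computes the relative frequency (frequency divided by the total) and the cumulative frequency (a running total from A down to F).

```python
from collections import Counter

# The teacher's 25 essay grades, in the order listed in the text.
grades = "A C C B C D C C F D C C C B B A B D B A A B F C B".split()
categories = ["A", "B", "C", "D", "F"]


def frequency_table(data, categories):
    """Return rows of (category, frequency, relative, cumulative)."""
    counts = Counter(data)
    total = len(data)
    rows = []
    running = 0  # cumulative frequency so far
    for category in categories:
        freq = counts[category]
        running += freq
        rows.append((category, freq, freq / total, running))
    return rows


table = frequency_table(grades, categories)

print(f"{'Grade':<6}{'Freq':>5}{'Rel':>7}{'Cum':>6}")
for category, freq, rel, cum in table:
    print(f"{category:<6}{freq:>5}{rel:>7.0%}{cum:>6}")
```

The rows match the frequencies, percentages, and running totals worked out by hand above. Listing the categories explicitly (rather than taking them from the data) keeps the grades in order from A to F, which matters for the cumulative column.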


Data Types
Essay grades represent subjective ratings, not actual measurements or counts. We say that the grade categories are qualitative, because they represent qualities such as bad or good. In contrast, scores on a multiple-choice exam are quantitative, because they represent an actual count (or measurement) of the number of correct answers. As we’ll see shortly, distinguishing between qualitative and quantitative data can be useful in creating tables or graphs.

DATA TYPES

Qualitative data describe qualities or nonnumerical categories.

Quantitative data represent counts or measurements.

❉EXAMPLE 2 Data Types
Classify each of the following types of data as either qualitative or quantitative.

a. Brand names of shoes in a consumer survey
b. Heights of students
c. Audience ratings of a film on a scale of 1 to 5, where 5 means excellent

SOLUTION

a. Brand names are nonnumerical categories, so they are qualitative data.
b. Heights are measurements, so they are quantitative data.
c. Although the film rating categories involve numbers, the numbers represent subjective opinions about a film, not counts or measurements. Thus, they are qualitative data, despite being stated as numbers.

Now try Exercises 27–34. ➽

Time out to think
Give another example in which numbers are used to represent qualitative data rather than quantitative data.

Binning DataWhen we deal with quantitative data categories, it’s often useful to group, or bin, thedata into categories that cover a range of possible values. For example, in a table ofincome levels, it might be useful to create bins of $0 to $20,000, $20,001 to $40,000,and so on. In this case, the frequency of each bin is simply the number of people withincomes in that bin.

❉EXAMPLE 3 Binned Exam Scores

Consider the following set of 20 scores from a 100-point exam:

76 80 78 76 94 75 98 77 84 88 81 72 91 72 74 86 79 88 72 75

Determine appropriate bins and make a frequency table. Include columns for relative and cumulative frequency, and interpret the cumulative frequency for this case.

SOLUTION The scores range from 72 to 98. One way to group the data is with 5-point bins. The first bin represents scores from 95 to 99, the second bin represents scores from 90 to 94, and so on. Note that there is no overlap between bins. We then count the frequency (the number of scores) in each bin. For example, only 1 score is in bin 95 to 99 (the high score of 98) and 2 scores are in bin 90 to 94 (the scores of 91 and 94). Table 5.3 shows the complete frequency table. In this case, we interpret the cumulative frequency of any bin to be the total number of scores in or above that bin. For example, the cumulative frequency of 6 for the bin 85 to 89 means that 6 scores are either between 85 and 89 or higher than 89.
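The counting in this solution can also be done by computer. The following is a minimal Python sketch, not part of the original example: the function name `binned_frequency_table` and the constant `BIN_WIDTH` are hypothetical names chosen for illustration. It builds the 5-point bins from the highest score downward, so the cumulative frequency counts the scores in or above each bin, as described above.

```python
# Exam scores from Example 3.
scores = [76, 80, 78, 76, 94, 75, 98, 77, 84, 88,
          81, 72, 91, 72, 74, 86, 79, 88, 72, 75]

BIN_WIDTH = 5  # 5-point bins: 70 to 74, 75 to 79, ..., 95 to 99


def binned_frequency_table(values, width):
    """Return rows (low, high, frequency, relative, cumulative), highest bin first."""
    lowest_bin = (min(values) // width) * width
    highest_bin = (max(values) // width) * width
    rows = []
    cumulative = 0
    for low in range(highest_bin, lowest_bin - 1, -width):
        high = low + width - 1
        # Count the scores that fall inside this bin (both ends included).
        frequency = sum(1 for v in values if low <= v <= high)
        cumulative += frequency  # scores in or above this bin
        rows.append((low, high, frequency, frequency / len(values), cumulative))
    return rows


table = binned_frequency_table(scores, BIN_WIDTH)
for low, high, freq, rel, cum in table:
    print(f"{low} to {high}: frequency {freq}, relative {rel:.2f}, cumulative {cum}")
```

Changing `BIN_WIDTH` to 10 gives a coarser table for the same scores, which is a quick way to see how the choice of bins affects the picture.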

TABLE 5.3 Frequency Table for Binned Exam Scores

Scores      Frequency    Relative Frequency    Cumulative Frequency
95 to 99    1            0.05 = 5%             1
90 to 94    2            0.10 = 10%            3
85 to 89    3            0.15 = 15%            6
80 to 84    3            0.15 = 15%            9
75 to 79    7            0.35 = 35%            16
70 to 74    4            0.20 = 20%            20
Total       20           1.00 = 100%           20

Now try Exercises 35–36. ➽

Bar Graphs and Pie Charts

Bar graphs and pie charts are commonly used to show data when the categories are qualitative. You are probably familiar with both, but let’s review the basic ideas.

Consider the essay grade data in Table 5.1. A bar graph would show each category with a bar whose length corresponded to its frequency. If you make a bar graph by hand (as opposed to with a computer), you should measure the bar lengths carefully to make sure they correctly correspond to the frequencies. In Figure 5.3, for example, the vertical axis is marked with frequencies 1/2 centimeter apart. Thus, the bar for A grades is 2 centimeters long, because the frequency of A grades is 4. Note that the left side of the bar graph in Figure 5.3 is marked with frequency, while the right side is marked with relative frequency. As you can see, bar graphs make it easy to display both frequencies simultaneously.

[Figure 5.3: bar graph titled "Essay Grade Data". Horizontal axis: Grade (A, B, C, D, F). Left vertical axis: Frequency of grade. Right vertical axis: Relative frequency.]

FIGURE 5.3 Bar graph for the essay grade data in Table 5.1.

In contrast, pie charts are used primarily for relative frequencies, because the total pie must always represent the total relative frequency of 100%. The size of each wedge is proportional to the relative frequency of the category it represents. Figure 5.4 shows a pie chart for the essay grade data. To make comparisons easier, relative frequencies are often written on pie chart wedges.

[Figure 5.4: pie chart with wedges A 16%, B 28%, C 36%, D 12%, F 8%.]

FIGURE 5.4 Pie chart for the essay grade data in Table 5.1.

Nowadays, most people make graphs with the aid of computers that measure bar lengths or wedge sizes automatically. However, you must still specify any labels or axis marks you want on a graph. This labeling is extremely important: Without proper labels, a graph is meaningless. The following summary lists the important labels for graphs. Of course, not all labels are necessary in all cases. For example, pie charts do not require a vertical or horizontal scale. Notice how these rules were applied in Figure 5.3.

IMPORTANT LABELS FOR GRAPHS

Title/caption: The graph should have a title or caption (or both) that explains what is being shown and, if applicable, lists the source of the data.

Vertical scale and title: Numbers along the vertical axis should clearly indicate the scale. The numbers should line up with the tick marks, the marks along the axis that precisely locate the numerical values. Include a label that describes the variable shown on the vertical axis.

Horizontal scale and title: The categories should be clearly indicated along the horizontal axis. (Tick marks may not be necessary for qualitative data, but should be included for quantitative data.) Include a label that describes the variable shown on the horizontal axis.

Legend: If multiple data sets are displayed on a single graph, include a legend or key to identify the individual data sets.
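To see what a computer works out when it draws these graphs, here is a short Python sketch for the essay grade data of Table 5.1. It is only an illustration: the names `CM_PER_UNIT` and `chart_rows` are hypothetical, the half-centimeter scale is the hand-drawing scale described for Figure 5.3, and the wedge angle uses the standard conversion of 360 degrees for the whole pie, which the text does not state explicitly.

```python
# Essay grade frequencies from Table 5.1.
grade_frequencies = {"A": 4, "B": 7, "C": 9, "D": 3, "F": 2}

CM_PER_UNIT = 0.5  # assumed scale: 1/2 centimeter of bar per unit of frequency

total = sum(grade_frequencies.values())

chart_rows = {}
for grade, freq in grade_frequencies.items():
    relative = freq / total  # relative frequency as a fraction of all grades
    chart_rows[grade] = {
        "relative_percent": round(100 * relative),
        "bar_length_cm": freq * CM_PER_UNIT,           # bar graph
        "wedge_degrees": round(360 * relative, 1),     # pie chart
    }

for grade, row in chart_rows.items():
    print(grade, row)
```

The relative percentages should match the wedge labels in Figure 5.4, and the bar length for A grades should match the 2 centimeters quoted for Figure 5.3.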

❉EXAMPLE 4 Carbon Dioxide Emissions

Carbon dioxide is released into the atmosphere primarily by the combustion of fossil fuels (oil, coal, natural gas). Table 5.4 lists the eight countries that emit the most carbon dioxide each year. Make bar graphs for the total emissions and the emissions per person. Put the bars in descending order of size.


Time out to think
Note that the two bar graphs in Figure 5.5 do not show the countries in the same order. Why not? What can we learn by comparing the two graphs? Explain.

[Figure 5.5: two bar graphs with countries along the horizontal axis. (a) "Total CO2 Emissions"; vertical axis: CO2 emissions (millions of metric tons of carbon). (b) "Per Person CO2 Emissions"; vertical axis: per capita CO2 emissions (metric tons of carbon).]

FIGURE 5.5 Bar graphs for (a) total carbon dioxide emissions by country and (b) per person carbon dioxide emissions by country.

Now try Exercises 37–38. ➽

HISTORICAL NOTE

A bar graph with the bars in descending order is often called a Pareto chart, after Italian economist Vilfredo Pareto (1848–1923).

TABLE 5.4 The World’s Eight Leading Emitters of Carbon Dioxide

Country           Total Carbon Dioxide Emissions          Per Person Carbon Dioxide Emissions
                  (millions of metric tons of carbon)     (metric tons of carbon)
United States     1582                                    5.4
China             966                                     0.7
Russia            438                                     3.0
Japan             329                                     2.6
India             280                                     0.3
Germany           230                                     2.8
Canada            164                                     5.2
United Kingdom    154                                     2.6

Source: U.S. Department of Energy, based on 2003 emissions.

SOLUTION The categories are the countries. Because country names are qualitative data, a bar graph is appropriate.

The values for total carbon dioxide emissions go from 154 to 1582 (millions of tons), so a range of 0 to 1600 makes a good choice for the vertical scale. Each bar’s height corresponds to its data value, and we label the category (country) under the bar. Figure 5.5a shows the bar graph for total emissions, with bars in order of decreasing height.

The data values for per person emissions range from 0.3 to 5.4 (tons), so a range of 0 to 6 will work for this vertical scale. Figure 5.5b shows the bar graph, again with bars placed in order of descending height.
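Putting bars in descending order is just a sorting step. The sketch below, with illustrative variable names that are not from the text, sorts the Table 5.4 data once by total emissions and once by per person emissions. Countries with equal values (Japan and the United Kingdom at 2.6) stay in their table order here; the text does not say how ties should be drawn.

```python
# (country, total emissions in millions of metric tons of carbon,
#  per person emissions in metric tons of carbon), from Table 5.4.
emissions = [
    ("United States", 1582, 5.4),
    ("China", 966, 0.7),
    ("Russia", 438, 3.0),
    ("Japan", 329, 2.6),
    ("India", 280, 0.3),
    ("Germany", 230, 2.8),
    ("Canada", 164, 5.2),
    ("United Kingdom", 154, 2.6),
]

# Descending order for each bar graph.
by_total = sorted(emissions, key=lambda row: row[1], reverse=True)
by_per_person = sorted(emissions, key=lambda row: row[2], reverse=True)

total_order = [row[0] for row in by_total]
per_person_order = [row[0] for row in by_per_person]

print("Total:     ", total_order)
print("Per person:", per_person_order)
```

Comparing the two lists shows why the graphs in Figure 5.5 order the countries differently: China is second in total emissions but near the end of the per person list.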


❉EXAMPLE 5 Simple Pie Chart

Among the registered voters in Rochester County, 25% are Democrats, 25% are Republicans, and 50% are Independents. Make a pie chart showing the breakdown of party affiliations in Rochester County.

SOLUTION The wedge sizes should correspond to the relative frequencies. Thus, the wedges for Republicans and Democrats each occupy one-fourth of the pie, while the wedge for Independents occupies the remaining half of the pie (Figure 5.6). Note the importance of clear labeling.

[Figure 5.6: pie chart titled "Registered Voters in Rochester County". Wedges: Independent 50%, Democrat 25%, Republican 25%.]

FIGURE 5.6 Party affiliations of registered voters in Rochester County.

❉EXAMPLE 6 Student Majors

Figure 5.7 is a pie chart showing planned major areas for first-year college students. Make a bar graph showing the same data, with the bars in order of decreasing size. What are the three most popular major areas? Comment on the relative ease with which this question can be answered with the pie chart and the bar graph.

SOLUTION Figure 5.8 shows the bar graph for the data. Note that, because we have only relative frequency data from the pie chart, we can show only relative frequencies on the bar graph. This bar graph makes it immediately obvious that the three most popular major areas are business (16.7%), arts and humanities (12.1%), and professional (11.6%). (“Professional” includes fields with professional licensing, such as architecture, nursing, and pharmacy.) In contrast, it takes a fair amount of study of the pie chart before we can easily list the three most popular major areas.

[Figure 5.7: pie chart titled "What Students Expect to Major In". Wedges: Arts and Humanities 12.1%, Biological Sciences 6.6%, Business 16.7%, Education 11.0%, Engineering 8.7%, Physical Sciences 2.6%, Professional 11.6%, Social Sciences 10.0%, Technical 2.1%, Other Fields 9.9%, Undecided 8.3%.]

FIGURE 5.7 Planned major areas for first-year college students. Source: The Chronicle of Higher Education.

Now try Exercises 39–40. ➽


[Figure 5.8: bar graph titled "What Students Expect to Major In". Horizontal axis: major areas. Vertical axis: Percentage of students (0 to 18).]

FIGURE 5.8 Bar graph for the data in Figure 5.7.

[Figure 5.9: two panels, each titled "Exam Scores". Horizontal axis: Scores (70 to 100). Vertical axis: Frequency (0 to 8). Panel (a) is a histogram; panel (b) is a line chart.]

FIGURE 5.9 (a) Histogram for the data in Table 5.3. (b) Line chart for the same data.

Time out to think
Example 6 discussed an advantage of a bar graph over a pie chart for showing the data concerning major areas. Do you think the pie chart has any advantages over the bar graph? If so, what?

Histograms and Line Charts

For quantitative data categories, the two most common types of graphics are histograms and line charts. Figure 5.9a shows a histogram for the binned exam data of Table 5.3. Figure 5.9b shows a line chart for the same data.

Now try Exercises 41–42. ➽


[Figure 5.10: line chart titled "U.S. Homicide Rate". Horizontal axis: Year (1960 to 2004). Vertical axis: Homicides per 100,000 people (0 to 12).]

FIGURE 5.10 U.S. homicide rate per 100,000 people. Source: FBI Uniform Crime Reports.

A histogram is essentially a bar graph in which the data categories are quantitative. Thus, the bars on a histogram must follow the natural order of the numerical categories. In addition, the widths of histogram bars have a specific meaning. For example, the width of each bar in Figure 5.9a represents 5 points on the exam. Because there are no gaps between the categories, the bars on a histogram touch each other.

A line chart serves the same basic purpose as a histogram, but instead of using bars, a line chart connects a series of dots. When data are binned, the dot is placed at the center of each bin. Histograms and line charts are often used to show how some variable changes with time. For example, the line chart in Figure 5.10 shows how the U.S. homicide rate has changed with time. The categories are time intervals. In this case, each bin represents a year in the data. Histograms and line charts with time on the horizontal axis are often called time-series diagrams.
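The rule for placing dots at bin centers can be sketched in a few lines of Python. This is an illustration under one stated assumption: each 5-point bin of Table 5.3 is treated as covering the axis from its lower score up to the start of the next bin (70 to 75, 75 to 80, and so on), which is how touching histogram bars are usually drawn. The names `bins` and `points` are hypothetical.

```python
# (lowest score in bin, frequency) for the bins of Table 5.3.
bins = [(70, 4), (75, 7), (80, 3), (85, 3), (90, 2), (95, 1)]
BIN_WIDTH = 5  # each bar is 5 points wide on the horizontal axis

# Each dot sits at the horizontal center of its bin, at the height of the bin's frequency.
points = [(low + BIN_WIDTH / 2, freq) for low, freq in bins]

for x, y in points:
    print(f"dot at score {x}, frequency {y}")
```

Connecting these dots in order from left to right gives the line chart that corresponds to the histogram.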

❉EXAMPLE 7 Oscar-Winning Actresses

Table 5.5 shows the ages of 34 recent Academy Award–winning actresses at the time when they won their award. Make a histogram and a line chart to display these data. Discuss the results.

DEFINITIONS

A histogram is a bar graph for quantitative data categories. The bars have a natural order and the bar widths have specific meaning.

A line chart shows the data value for each category as a dot, and the dots are connected with lines. For each dot, the horizontal position is the center of the bin it represents and the vertical position is the data value for the bin.

A time-series diagram is a histogram or line chart in which the horizontal axis represents time.

Technical Note
Different books define the terms histogram and bar graph differently. In this book, a bar graph is any graph that uses bars, and histograms are bar graphs used for quantitative data categories.

TABLE 5.5

Age      Number of Actresses
20–29    7
30–39    15
40–49    6
50–59    1
60–69    3
70–79    1
80–89    1


SOLUTION The fact that the categories are 10-year bins makes the data quantitative. Thus, a histogram is appropriate. Figure 5.11a shows the histogram. The bars touch one another because there are no gaps between the categories.

Figure 5.11b shows the same data as a line chart. The histogram is also included to show how it relates to the line chart. In looking at these data, we see that actresses are most likely to win Oscars when they are fairly young.
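One way to put a number on "fairly young" is to add up the Table 5.5 counts from the youngest bin upward. The Python sketch below does this; the variable names are illustrative only. Note that the running total here starts at the youngest bin, whereas Example 3 accumulated from the highest bin downward.

```python
# (age bin, number of actresses) from Table 5.5.
age_bins = [("20-29", 7), ("30-39", 15), ("40-49", 6), ("50-59", 1),
            ("60-69", 3), ("70-79", 1), ("80-89", 1)]

total = sum(count for _, count in age_bins)

# Share of winners in each bin or any younger bin.
running = 0
cumulative_share = {}
for label, count in age_bins:
    running += count
    cumulative_share[label] = running / total

for label, share in cumulative_share.items():
    print(f"age {label} or younger: {share:.0%}")
```

With these counts, 22 of the 34 winners (about 65%) were under 40, which supports the conclusion drawn from the histogram.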

[Figure 5.11: two panels, each titled "Ages of 34 Academy Award–Winning Actresses". Horizontal axis: Age at time of award (10 to 90). Vertical axis: Number of actresses (0 to 20). Panel (a) is a histogram; panel (b) is a line chart with the histogram overlaid.]

FIGURE 5.11 (a) Histogram for ages of 34 recent Academy Award–winning actresses. (b) Line chart for the same data, with histogram overlaid for comparison.

Now try Exercises 43–44.

HISTORICAL NOTE

Gold was once considered to be a solid investment and an important part of any investment portfolio. However, gold prices have languished in recent decades. In 2006, gold was worth only about $650 per ounce, much less than its inflation-adjusted value of more than $2000 per ounce in 1980.

[Figure 5.12: time-series line chart titled "MARKET GAUGE: COMPARING INVESTMENTS", with one line each for Stocks, Bonds, and Gold. Subtitle: How $100 invested 12 weeks ago in stocks (measured by the S.&P. 500), bonds (Lehman Treasury Bond Index) and gold would have fared through yesterday. Horizontal axis: weekly dates, July 7 through Sept. 29. Vertical axis: $95 to $105.]

FIGURE 5.12

❉EXAMPLE 8 Reading a Time-Series Diagram

Figure 5.12 shows a time-series line chart of stock, bond, and gold prices over a 12-week period. Suppose that, on July 7, you invested $100 in a stock fund that tracks the S&P 500, $100 in a bond fund that follows the Lehman Index, and $100 in gold. If you sold all three funds on August 4, how much did you gain or lose?


SOLUTION The graph shows that the $100 in the stock fund would have been worth about $101 on August 4. The $100 bond investment would have declined in value to about $96. The gold investment would have held its initial value of $100. Thus, on August 4, your complete portfolio would have been worth

$101 + $96 + $100 = $297.

You would have lost $3 on your total investment of $300.

Now try Exercises 45–46. ➽
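The same arithmetic can be laid out per investment, which also gives the loss as a percentage. This is a small Python sketch with hypothetical variable names; the August 4 values are the approximate readings from Figure 5.12 used in the solution, not exact prices.

```python
# Amount invested on July 7 and approximate value on August 4 (read from Figure 5.12).
initial = {"stocks": 100, "bonds": 100, "gold": 100}
aug4_value = {"stocks": 101, "bonds": 96, "gold": 100}

# Gain (positive) or loss (negative) for each investment.
changes = {name: aug4_value[name] - initial[name] for name in initial}

net_change = sum(changes.values())
percent_change = 100 * net_change / sum(initial.values())

print(changes)
print(f"net change: ${net_change}, or {percent_change:.1f}% of the amount invested")
```

Under these readings the portfolio loses $3, which is 1% of the $300 invested.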

EXERCISES 5C

QUICK QUIZ
Choose the best answer to each of the following questions. Explain your reasoning with one or more complete sentences.

1. In a class of 100 students, 25 students received a grade of B. What was the relative frequency of a B grade?
a. 25
b. 0.25
c. It cannot be calculated with the information given.

2. For the class described in Exercise 1, what was the cumulative frequency of a grade of B or above?
a. 25
b. 0.25
c. It cannot be calculated with the information given.

3. Which of the following is an example of qualitative data?
a. waist sizes in inches
b. ratings of restaurants
c. meal costs at restaurants

4. The sizes of the wedges in a pie chart tell you
a. the number of categories in the pie chart.
b. the frequencies of the categories in the pie chart.
c. the relative frequencies of the categories in the pie chart.

5. You have a table listing ten tourist attractions and their annual numbers of visitors. Which type of display would be most appropriate for these data?
a. a bar graph
b. a pie chart
c. a line chart

6. Where should you put the names of the ten tourist attractions when you make your display of the data described in Exercise 5?
a. They should be in the title of the display.
b. They should be in alphabetical order along the vertical axis.
c. They should be listed along the horizontal axis.

7. You have a list of the GPAs of 100 college graduates, precise to the nearest 0.001. You want to make a frequency table for these data. A good first step would be to
a. group all the data into bins 0.2 of a grade point wide.
b. draw a pie chart for the 100 individual GPAs.
c. count how many people have identical GPAs.

8. You have a list of the average gasoline price for each month during the past year. Which type of display would be most appropriate for these data?
a. a bar graph
b. a pie chart
c. a line chart

9. A histogram is
a. a graph that shows how some quantity has changed through history.
b. a graph that shows cumulative frequencies.
c. a bar chart for quantitative data.

10. You have a histogram and you want to convert it into a line chart. A good first step would be to
a. make a list of all the categories in alphabetical order.
b. place a dot at the top of each bar, in the center of the bar.
c. calculate all the relative frequencies that you can read from the histogram.

REVIEW QUESTIONS

11. What is a frequency table? Explain what we mean by the categories and frequencies. What do we mean by relative frequency? What do we mean by cumulative frequency?

12. What is the distinction between qualitative data and quantitative data? Give a few examples of each.

13. What is the purpose of binning? Give an example in which binning is useful.

14. What two types of graphs are most common when the categories are qualitative data? Describe the construction of each.


15. Describe the importance of labeling on a graph, and briefly discuss the kinds of labels that should be included on graphs.

16. What two types of graphs are most common when the categories are quantitative data? Describe the construction of each.

DOES IT MAKE SENSE?
Decide whether each of the following statements makes sense (or is clearly true) or does not make sense (or is clearly false). Explain your reasoning.

17. I made a frequency table with two columns, one labeled State and one labeled State Capitol.

18. The relative frequency of B grades in our class was 0.3.

19. Your bar graph must be wrong, because your bars are wider than the ones shown on the teacher’s answer key.

20. Your bar graph must be wrong, because it shows different frequencies than the ones shown on the teacher’s answer key.

21. Your pie chart must be wrong, because you have the 45% frequency wedge near the upper left and the answer key shows it near the lower right.

22. Your pie chart must be wrong, because when I added the percentages on your wedges, they totaled 124%.

23. I was unable to make a bar chart, because the data categories were qualitative rather than quantitative.

24. I rearranged the bars on my histogram so that the tallest bar would come first.

BASIC SKILLS & CONCEPTS

Frequency Tables. Make a frequency table for the data in each of Exercises 25–26. Include columns for relative frequency and cumulative frequency. Briefly explain the meaning of each column.

25. Final grades of 20 students in a math class:

A A B B B B B C C C C C C C C D D D F F

26. A film section of a local newspaper lists 5 five-star films (the highest rating), 10 four-star films, 20 three-star films, 15 two-star films, and 5 one-star films.

Qualitative vs. Quantitative. In Exercises 27–34, determine whether the variable described is qualitative or quantitative, and explain why.

27. The hair color of individuals

28. The average service time in a bank

29. The responses of people in a sausage taste test, on a scale from 0 = inedible up to 5 = outstanding

30. The lowest high temperature in each month of the year in Sedona, Arizona

31. The responses (yes, no, undecided) to the question “Will you vote for a new water treatment plant?”

32. The total income of each household in America

33. The dessert selections at a restaurant used in a customer preference poll

34. The number of people voting for each dessert selection in a restaurant preference poll

Binned Frequency Tables. In Exercises 35–36, use the indicated bin size to make a frequency table for the following set of exam scores:

89 67 78 75 64 70 83 95 69 84
77 88 98 90 92 68 86 79 60 96

Include columns for relative frequency and cumulative frequency. Briefly explain the meaning of each column.

35. Use 5-point bins (95 to 99, 90 to 94, etc.).

36. Use 10-point bins (90 to 99, 80 to 89, etc.).

37. Largest States. The following table shows the five most populous U.S. states as of 2004. Make a bar graph for these data, with the bars in descending order.

State         Population
California    35.9 million
Texas         22.5 million
New York      19.2 million
Florida       17.4 million
Illinois      12.7 million

38. Food Franchises. The table below shows the five food companies with the most franchises. Make a bar graph for these data, with the bars in descending order.

Company                   Number of franchises
McDonald’s                22,183
Subway                    21,444
Kentucky Fried Chicken    10,040
Domino’s Pizza            6953
Dunkin’ Donuts            5759


Constructing Pie Charts. Exercises 39–40 each give a data set. Compute the percentage for each category and construct a pie chart for the data.

39. Six candidates ran for three seats on the City Council. The vote tallies for the candidates are given in the table below.

Candidate    Votes
Aniston      2380
Clooney      1030
Cruise       987
Jolie        1753
Pitt         1914
Streep       2208

40. In a pizza preference poll, 92 people voted for their favorite toppings as follows.

Topping       Votes
Anchovies     8
Cheese        27
Pepperoni     16
Sausage       36
Vegetarian    23

41. Government Income. The pie chart in Figure 4.12 on p. 308 shows the makeup of federal government receipts. Make a bar graph for these data.

42. Government Spending. The pie chart in Figure 4.13 on p. 309 shows the makeup of federal government spending. Make a bar graph for these data.

43. Oscar-Winning Actors. The following data show the ages of 34 recent Academy Award–winning actors at the time they won their award. Make a frequency table for these data, using bins of 20–29, 30–39, and so on. Then draw both a histogram and a line chart to display the binned data.

32 37 36 32 51 53 33 61 35 45 55 39
76 37 42 40 32 60 38 56 48 48 40 43
62 43 42 44 41 56 39 46 31 47 40 43

44. Oscar Winners. In words, contrast the graphs in Example 7 with those you drew in Exercise 43. Do actors appear to be more likely to win Oscars when they are younger, older, or neither? Do you think these graphs indicate any difference in how movie makers treat male and female performers? Defend your opinion.

45. Homicide Rates. Study Figure 5.10. Write one to two paragraphs summarizing how the homicide rate has changed with time since 1960.

46. Death Rates. Figure 5.13 shows overall death rates in the United States during the 20th century. Note that the spike in 1919 was due to a worldwide epidemic of influenza. Write a few sentences summarizing the overall trend, describing how much the death rate changed during the century, and putting the 1919 spike into context in terms of its impact on the population.

[Figure 5.13: line chart titled "Death Rates per 1000 Population". Horizontal axis: Year (1900 to 2000). Vertical axis: Rate (5 to 20).]

Figure 5.13 Source: National Center for Health Statistics.

FURTHER APPLICATIONS

Statistical Graphs. Each of Exercises 47–56 gives a table of data. For each exercise, do the following:

a. Explain whether the data categories are qualitative or quantitative.

b. If the data categories are qualitative, draw either a bar graph or a pie chart for the data. If the data categories are quantitative, draw either a histogram or a line chart for the data.

c. Write a one-paragraph summary of any interesting information revealed by the graphic.

47. The following frequency table gives the ages of the Nobel Prize winners in literature at the time of their award for 1990 through 2005.

Age      Number of winners
58–59    2
60–61    1
62–63    3
64–65    0
66–67    1
68–69    2
70–71    1
72–73    2
74–75    2
76–77    2


48. The following table lists the top eight retail companies in the United States, by total sales volume.

Company        Sales (billions of dollars)
Albertson’s    36.8
Home Depot     45.7
JC Penney      33.0
Kmart          37.0
Kroger         49.0
Sears          40.9
Target         36.9
Wal-Mart       193.3

Source: Wall Street Journal Almanac.

49. The following table shows the average SAT scores for various ethnic groups in the United States in 2005.

Ethnic group              Average SAT score
White                     1068
Black                     864
Native American           982
Asian/Pacific Islander    1091
Hispanic                  917

Source: The College Board.

50. The following table lists the ten musical groups with the most platinum albums in the United States (1,000,000 sales).

Group            Number of platinum albums
The Beatles      92
The Eagles       81
Led Zeppelin     80
AC/DC            60
Aerosmith        59
Pink Floyd       54
Van Halen        50
U2               45
Alabama          44
Fleetwood Mac    44

51. The following table lists areas of the world’s major land masses.

Land mass        Area (millions of sq. miles)
Asia             17.2
Africa           11.6
North America    9.3
South America    6.9
Australia        3.0
Europe           3.8
Antarctica       5.1
All others       2.1

52. The following table gives the percentages of total energy produced in the United States from various sources.

Energy source    Percentage of total energy
Coal             32.2%
Natural gas      31.0%
Crude oil        16.4%
Nuclear power    11.7%
Renewable        8.7%

Source: U.S. Department of Energy.

53. The following table gives the stated religions of first-year college students. (Note: The “other religions” category consists of religions that were stated by less than 1% of the students in the sample.)

Religion                   Percent of sample
Baptist                    11.6
Catholic                   30.5
Episcopal                  1.7
Jewish                     2.8
Lutheran                   5.8
Methodist                  6.4
Mormon                     1.5
Presbyterian               4.0
United Church of Christ    1.5
Other religions            19.3
No religion                14.9

Source: UCLA Higher Education Research Institute.


54. The following table gives the rates of violent crimes (rape, robbery, assault, theft) by age of victim. Rates are in units of crimes per 1000 people aged 12 or older.

Age group      Crime rate
12–15          51.6
16–19          53.0
20–24          43.3
25–34          26.4
35–49          18.5
50–64          10.3
65 and over    2.0

Source: Bureau of Justice Statistics.

55. The following table gives average family size in the United States since 1940.

Year    Family size
1940    3.76
1950    3.54
1960    3.67
1965    3.70
1970    3.58
1975    3.42
1980    3.29
1985    3.23
1990    3.17
1995    3.19
2000    3.17
2003    3.19

Source: U.S. Bureau of Census.

56. Drunk Driving Deaths. Figure 5.14 shows the number of automobile fatalities in the United States in which alcohol was involved for each year from 1982 to 2003.

[Figure 5.14: chart titled "Alcohol-Related Fatalities". Horizontal axis: Year (1982 to 2002). Vertical axis: Fatalities (0 to 30,000).]

Figure 5.14 Source: National Highway Traffic Safety Administration.

a. How many alcohol-related fatalities were there in 1982? in 2003? Comment on the overall trend over this period.

b. What is the percent change in alcohol-related fatalities over this period?

c. The total numbers of automobile fatalities in 1982 and 2003 were 43,945 and 42,643, respectively. What percentage of all fatalities in these two years involved alcohol?

d. In view of your answer to part c, can you offer explanations for the trend in these data? Explain.

57. Ages of Presidents. The following table gives the order of the presidents of the United States and the ages at which they first took office.

a. Find a creative way to display these data.

b. Which presidents could have said that they were the youngest president (or the same age in years as the youngest) at the time they took office?

c. Which presidents could have said that they were the oldest president (or the same age in years as the oldest) at the time they took office?

d. Write a paragraph describing significant features of the data.

Order   1  2  3  4  5  6  7  8  9 10 11
Age    57 61 57 57 58 57 61 54 68 51 49

Order  12 13 14 15 16 17 18 19 20 21 22
Age    64 50 48 65 52 56 46 54 49 50 47

Order  23 24 25 26 27 28 29 30 31 32 33
Age    55 55 54 42 51 56 55 51 54 51 60

Order  34 35 36 37 38 39 40 41 42 43
Age    62 43 55 56 61 52 69 64 46 54

WEB PROJECTS
Find useful links for Web Projects on the text Web site: www.aw.com/bennett-briggs

58. Emissions. Look for updated data concerning international carbon dioxide emissions at the Web site for the International Energy Annual, published by the U.S. Energy Information Administration (EIA). Create an updated or expanded version of Figure 5.5. Discuss any new features of your updated graphs.

59. Energy Table. Explore some of the many energy tables at the U.S. Energy Information Administration (EIA) Web site. Choose a table that you find interesting, and make a graph of its data. You may choose any of the graph types discussed in this section. Explain how you made your graph, and briefly discuss what can be learned from it.

60. Statistical Abstract. Go to the Web site for the Statistical Abstract of the United States. Explore the selection of “frequently requested tables.” Choose one table of interest to you, and make a graph from its data. You may choose any of the graph types discussed in this section. Explain how you made your graph, and briefly discuss what can be learned from it.

IN THE NEWS

61. Frequency Tables. Find a recent news article that includes some type of frequency table. Briefly describe the table and how it is useful to the news report. Do you think the table was constructed in the best possible way for the article? If so, why? If not, what would you have done differently?

62. Bar Graph. Find a recent news article that includes a bar graph with qualitative data categories. Briefly explain what the graph shows, and discuss whether it helps make the point of the news article.

63. Pie Chart. Find a recent news article that includes a pie chart. Briefly discuss the effectiveness of the pie chart. For example, would it be better if the data were displayed in a bar graph rather than a pie chart? Could the pie chart be improved in other ways?

64. Histogram. Find a recent news article that includes a histogram. Briefly explain what the histogram shows, and discuss whether it helps make the point of the news article. Are the labels clear? Is the histogram a time-series diagram? Explain.

65. Line Chart. Find a recent news article that includes a line chart. Briefly explain what the line chart shows, and discuss whether it helps make the point of the news article. Are the labels clear? Is the line chart a time-series diagram? Explain.

UNIT 5D Graphics in the Media

Now that we’ve discussed basic types of statistical graphs, we are ready to explore some of the fancier graphics that appear daily in the news. We will also discuss several cautions to keep in mind when interpreting media graphics.

Graphics Beyond the Basics

Many graphical displays of data go beyond the basic types discussed in Unit 5C. Here, we explore a few of the types that are most common in the news media.

Multiple Bar Graphs

A multiple bar graph is a simple extension of a regular bar graph. It has two or more sets of bars that allow comparison between two or more data sets. All the data sets must involve the same categories so that they can be displayed on the same graph. For example, Figure 5.15 is a multiple bar graph showing trends in home computing. The categories are years. The two sets of bars represent two different measures of home computing: ownership of personal computers and connection to the Internet. Note that a legend clearly identifies the two sets of bars.

FIGURE 5.15 Trends in home computing. [Multiple bar graph titled "PC and On-Line Households in the U.S., 1995–2003 (in millions)," with one bar for on-line households and one for households with PCs in each of the years 1995, 1997, 1999, 2001, and 2003; the vertical axis runs from 0 to 80.] Source: Statistical Abstract of the United States.

❉ EXAMPLE 1 Computing Trends

Summarize two major trends shown in Figure 5.15.

SOLUTION The most obvious trend is that both data sets show an increase with time. That is, the number of homes with computers and the number of online homes both increased with time. We see a second trend by comparing the bars within each year. In 1995, the number of online homes (about 10 million) was less than one-third the number of homes with computers (about 33 million). By 2003, the number of online homes (about 62 million) was about 90% of the number of homes with computers (about 70 million). This tells us that a higher percentage of computer users are going online. Now try Exercises 23–24.
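The second trend in Example 1 is simply a ratio taken within each year. The short Python sketch below repeats that arithmetic using the approximate bar heights quoted in the solution; the dictionary layout and the function name `online_share` are illustrative choices, not part of the original example.

```python
# Approximate bar heights read from Figure 5.15 (millions of households),
# as quoted in the Example 1 solution.
households = {
    1995: {"pc": 33, "online": 10},
    2003: {"pc": 70, "online": 62},
}

def online_share(year):
    """Online households as a fraction of households with PCs."""
    row = households[year]
    return row["online"] / row["pc"]

for year in sorted(households):
    print(f"{year}: {online_share(year):.0%} of PC households were online")
```

The 1995 share comes out just under one-third and the 2003 share just under 90%, matching the comparison made in the solution.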

Stack Plots

Another common type of graph, called a stack plot, shows different data sets in a vertical stack. Figure 5.16 uses a stack plot to show trends in death rates (deaths per 100,000 people) for four diseases since 1900. Each disease has its own color-coded region, or wedge; note the importance of the legend. The thickness of a wedge at a particular time tells you its value at that time: When a wedge is thick it has a large value, and when it is thin it has a small value.

FIGURE 5.16 A stack plot showing trends in death rates from four diseases. [Stack plot titled "Death Rates for Various Diseases: 1900–2004," with wedges for pneumonia, cardiovascular disease, tuberculosis, and cancer; horizontal axis: year, 1900 to 2000; vertical axis: deaths per 100,000, 0 to 900. Annotations on the figure: In a stack plot, the thickness of a wedge at a particular time tells you its value. For 1980, the top of the cardiovascular wedge is at about 620 along the vertical axis and the bottom is at about 180, so the 1980 death rate for cardiovascular disease was about 620 - 180 = 440 (deaths per 100,000).]

❉ EXAMPLE 2 Stack Plot

Based on Figure 5.16, what was the death rate for cardiovascular disease in 1980? Discuss the general trends visible on this graph.

SOLUTION For 1980, the cardiovascular wedge extends from about 180 to 620 on the vertical axis, so its thickness is about 440. Thus, the death rate in 1980 for cardiovascular disease was about 440 deaths per 100,000 people. The graph shows several important trends. First, the downward slope of the top wedge shows that the overall death rate from these four diseases decreased substantially, from nearly 800 deaths per 100,000 in 1900 to about 525 in 2003. The drastic decline in the thickness of the tuberculosis wedge shows that this disease was once a major killer, but has been nearly wiped out since 1950. Meanwhile, the cancer wedge shows that the death rate from cancer rose steadily until the mid-1990s, but has dropped somewhat since then.

Now try Exercises 25–28.
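Reading a value from a stack plot always comes down to one subtraction: the top boundary of the wedge minus its bottom boundary. A minimal Python sketch of that step, using the 1980 readings from Example 2 (the function name `wedge_thickness` is made up for illustration):

```python
def wedge_thickness(bottom, top):
    """Value of one series in a stack plot: top boundary minus bottom boundary."""
    if top < bottom:
        raise ValueError("top boundary must not be below bottom boundary")
    return top - bottom

# Readings for the cardiovascular wedge in 1980 (deaths per 100,000).
cardio_1980 = wedge_thickness(bottom=180, top=620)
print(cardio_1980)  # 440
```

Reading the top boundary alone (620) would overstate the cardiovascular death rate, because it includes every wedge stacked beneath it.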

Graphs of Geographical Data

We are often interested in geographical patterns in data. Figure 5.17 shows one common way of displaying geographical data. In this case, the data on per capita (per person) income are shown state by state. The legend explains that different colors represent different income levels. Similar colors are used for similar income levels. Thus, it is easy to see that income levels tend to be highest in the northeast and lowest in the south.

FIGURE 5.17 Per capita income in the 50 states (2002). [Map of the United States with each state, and the District of Columbia, colored by income category. Key, State Per Capita Income: $20,000–$24,999; $25,000–$29,999; $30,000–$34,999; $35,000–$39,999; $40,000–$44,999.] Source: U.S. Department of Commerce.

By the Way

Since the mid-1980s, there has been a small but noticeable resurgence of tuberculosis in the United States. Part of the resurgence is due to new strains of the disease that resist most common drug treatments.

The display in Figure 5.17 works well because each state is associated with a unique income level. For data that vary continuously across geographical areas, a contour map is more convenient. Figure 5.18 shows a contour map of temperature over the United States at a particular time. Each of the contours connects locations with the same temperature. For example, the temperature is 50°F everywhere along the contour labeled 50°F and 60°F everywhere along the contour labeled 60°F. Between these two contours, the temperature is between 50°F and 60°F. Note that in regions where contours are tightly spaced, there are greater temperature changes. For example, the closely packed contours in the northeast indicate that the temperature varies substantially over small distances. To make the graph easier to read, the regions between adjacent contours are color-coded.

FIGURE 5.18 A contour map of temperature. [Map of the contiguous United States with temperature contours labeled from 20°F to 80°F in steps of 10°F. Annotations on the figure: Widely separated contours mean large regions have nearly the same temperature. Closely packed contours mean a large temperature difference over a short distance.]

❉ EXAMPLE 3 Interpreting Geographical Data

Study Figures 5.17 and 5.18, using them to answer the following questions.

a. Which state(s) had the highest per capita income in 2002?

b. Were there any temperatures above 80°F in the United States on the date shown in Figure 5.18? If so, where?

SOLUTION

a. Connecticut was the only state with a per capita income in the highest category shown on the graph ($40,000–$44,999), so it had the highest per capita income. (The District of Columbia was also in this category, but it is not a state.)

b. The 80°F contour passes through southern Florida, so the parts of Florida south of this contour had temperatures above 80°F.

Now try Exercises 29–30. ➽

The greatest value of a picture is when it forces us to notice what we never expected to see.

—JOHN TUKEY

Time out to think

Look for a weather map in today’s news. How are the temperature contours shown? Interpret the temperature data.


Three-Dimensional Graphics

Today, computer software makes it easy to give almost any graph a three-dimensional appearance. For example, Figure 5.19 shows the bar graph of Figure 5.3, but “dressed up” with a three-dimensional look. It may look nice, but the three-dimensional effects are purely cosmetic. They don’t provide any information that wasn’t already in the two-dimensional graph in Figure 5.3. As this example shows, many “three-dimensional” graphics really only make two-dimensional data look a little fancier.

FIGURE 5.19 This graph has a three-dimensional appearance, but shows only two-dimensional data. [Bar graph titled "Essay Grade Data"; horizontal axis: grade (A, B, C, D, F); vertical axis: frequency of grade, 0 to 9.]

In contrast, each of the three axes in Figure 5.20 carries distinct information, making it a true three-dimensional graph. Researchers studying migration patterns of a bird species (the Bobolink) counted the number of birds flying over seven New York cities throughout the night. As shown on the inset map, the cities were aligned east-west so that the researchers would learn what parts of the state the birds flew over, and at what times of night, as they headed south for the winter. Thus, the three axes measure number of birds, time of night, and east-west location.

FIGURE 5.20 This graph shows true three-dimensional data. [Three-dimensional graph titled "Sonic Mapping Traces Bird Migration"; axes: number of birds, hours after 8:30 p.m. (1 to 8), and city (Jefferson, Oneonta, Richford, Ithaca, Beaver Dams, Alfred, and Cuba), with an inset map of New York showing the cities. Text on the figure: Sensors across New York State counted each occurrence of the nocturnal flight call of the bobolink to trace the fall migration on the night of Aug. 28–29, 1993. The data showed the heaviest swath passing over the eastern part of the state. Figure credit: Bill Evans/Cornell Laboratory of Ornithology.] Source: New York Times.

❉ EXAMPLE 4 Three-Dimensional Bird Migration

Based on Figure 5.20, at about what time was the largest number of birds flying over the east-west line marked by the seven cities? Over what part of New York did most of the birds fly? Approximately how many birds passed over Oneonta around 12:00 midnight?

SOLUTION The number of birds detected in all the cities peaked between 3 and 5 hours after 8:30 p.m., or between about 11:30 p.m. and 1:30 a.m. More birds flew over the two easternmost cities of Oneonta and Jefferson than over cities farther west. Thus, most of the birds were flying over the eastern part of the state. To answer the specific question about Oneonta, note that 12:00 midnight is the midpoint of time category 4. On the graph, this time aligns with the dip between peaks on the line at Oneonta. Looking across to the number of birds axis, we see that about 30 birds were flying over Oneonta at that time. Now try Exercises 31–39.
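The time axis in Figure 5.20 is labeled in hours after 8:30 p.m., so answering Example 4 requires converting those offsets to clock times. The following Python sketch shows one way to do the conversion; the start date is taken from the figure's caption, and the helper name `clock_time` is hypothetical.

```python
from datetime import datetime, timedelta

# Counting began at 8:30 p.m. on Aug. 28, 1993 (date from the figure caption).
START = datetime(1993, 8, 28, 20, 30)

def clock_time(hours_after_start):
    """Clock time corresponding to a number of hours after 8:30 p.m."""
    return START + timedelta(hours=hours_after_start)

# 3 and 5 hours bracket the peak; 3.5 hours is the midpoint of time category 4.
for hours in (3, 3.5, 5):
    print(hours, clock_time(hours).strftime("%H:%M"))
```

On a 24-hour clock the three offsets give 23:30, 00:00, and 01:30, which are the 11:30 p.m., midnight, and 1:30 a.m. used in the solution.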

Combination Graphics

All of the graphic types we have studied so far are common and fairly easy to create. But the media today are often filled with many varieties of even more complex graphics. For example, Figure 5.21 shows a graphic concerning the participation of women in the summer Olympics. This single graphic combines a line chart, many pie charts, and numerical data. It is certainly a case of a picture being worth far more than a thousand words.

FIGURE 5.21 [Combination graphic titled "The Ever-Growing Presence of Women in Summer Olympics," covering the Games from 1900 through 2004 (with "no games" marked for canceled years). It combines a line chart of the total number of women participating (vertical axis from 0 to 5,000), a pie chart for each Games showing the percentage of women participants, and a row of numbers giving the number of events for women. Data source shown on the figure: International Olympic Committee.] Source: Adapted from The New York Times.

A Few Cautions about Graphics

As we have seen, graphics can offer clear and meaningful summaries of statistical data. However, even well-made graphics can be misleading if we are not careful in interpreting them, and poorly made graphics are almost always misleading. Moreover, some people use graphics in deliberately misleading ways. Here, we discuss a few of the more common ways in which graphics can lead us astray.

Perceptual Distortions

Many graphics are drawn in a way that distorts our perception of them. Figure 5.22 shows one of the most common types of distortion. The dollar-shaped bars are used to represent the declining value of the dollar over time. The lengths of the bars represent the data, but our eyes tend to focus on the areas of the bars. For example, the bottom bar is supposed to show that a dollar in 2005 was worth only 42% as much as a dollar in 1980. Its length is indeed 42% that of the top bar, but its area is much smaller in comparison (about 18% of the area of the top bar). This gives the perception that the value of the dollar shrank even more than it really did.

Now try Exercises 42–43.
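The 18% figure quoted for Figure 5.22 follows from how area scales. Assuming each dollar bill is drawn with the same proportions (an assumption about the artwork, not something stated in the text), shrinking the length by some factor shrinks the height by the same factor, so the area shrinks by the square of that factor. A small Python sketch, with the hypothetical helper `area_ratio`:

```python
def area_ratio(length_ratio):
    """Area ratio of two similar shapes whose lengths differ by length_ratio.

    Assumes width and height are scaled by the same factor.
    """
    return length_ratio ** 2

# Spending power of a dollar relative to 1980, from Figure 5.22.
for year, value in ((1990, 0.63), (2005, 0.42)):
    print(year, f"length {value:.0%}", f"area {area_ratio(value):.0%}")
```

For 2005 the length ratio of 42% becomes an area ratio of about 18%, which is why the drawing exaggerates the decline.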

Watch the Scales

Figure 5.23a shows the percentage of college students between 1910 and 2005 who were women. At first glance, it appears that this percentage grew by a huge margin after about 1950. But the vertical axis scale does not begin at zero and does not end at 100%. The increase is still substantial but looks far less dramatic if we redraw the graph with the vertical axis covering the full range of 0 to 100% (Figure 5.23b). From a mathematical point of view, leaving out the zero point on a scale is perfectly honest and can make it easier to see small-scale trends in the data. Nevertheless, as this example shows, it can be visually deceptive if you don’t study the scale carefully.

Now try Exercises 44–45. ➽


❉ EXAMPLE 5 Olympic Women

Describe three trends shown in Figure 5.21.

SOLUTION The line chart shows that the total number of women competing in the summer Olympics has risen fairly steadily, especially since the 1960s, reaching nearly 5000 in the 2004 games. The pie charts show that the percentage of women among all competitors has also increased, reaching 44% in the 2004 games. The bold red numbers at the bottom show that the number of events for women has also increased dramatically, reaching 135 in the 2004 games.

Now try Exercises 40–41. ➽

Time out to think

Do you think the upward trend of the pie charts in Figure 5.21 will continue over the next few Olympic games? Why or why not?

FIGURE 5.22 The lengths of the dollars are proportional to their spending power, but our eyes are drawn to the areas, which decline more than the lengths. [Three dollar-shaped bars labeled 1980 = $1.00, 1990 = $0.63, and 2005 = $0.42.]

FIGURE 5.23 Both graphs show the same data, but they look very different because their vertical scales have different ranges. [Two line charts titled "Women as a Percentage of All College Students"; horizontal axis: year, 1920 to 2000; vertical axis: percent women, running from 30 to 60 in graph (a) and from 0 to 100 in graph (b).] Source: National Center for Education Statistics and Bureau of Labor Statistics.

FIGURE 5.24 Both graphs show the same data, but the one on the left uses an exponential scale. [Two graphs titled "Computer Speed"; horizontal axis: year, 1950 to 2000. Graph (a) has a vertical axis of calculations per second with labels at 10^2, 10^5, 10^8, and 10^11. Graph (b) has a vertical axis of billions of calculations per second with labels at 50 and 100.]

Sometimes the scale may not be deceptive, but still requires care to avoid misinterpretation. Consider Figure 5.24a, which shows how the speeds of the fastest computers have increased with time. At first glance, it appears that speeds have been increasing linearly. For example, it might look as if the speed increased by the same amount from 1990 to 2000 as it did from 1950 to 1960. However, if we look closely, we see that each tick mark on the vertical scale represents a tenfold increase in speed. Now we see that computer speed grew from about 1 to 100 calculations per second between 1950 and 1960, and from about 100 million to 10 billion calculations per second between 1990 and 2005. This type of scale is called an exponential scale (or logarithmic scale), because each unit corresponds to a power of 10. In general, exponential scales are useful for displaying data that vary over a huge range of values. You can see this usefulness by looking at Figure 5.24b, where the computer data have been recast with an ordinary scale. Because the speeds have grown so rapidly, the ordinary scale makes it impossible to see any detail in the early years shown on the graph.
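To see why equal vertical steps on an exponential scale hide very different absolute changes, it helps to compute both for the speeds quoted above. The Python sketch below is only an illustration; it uses the approximate values from the text, and the function name `compare` is made up.

```python
import math

# Approximate speeds (calculations per second) quoted in the text.
speeds = {
    1950: 1,
    1960: 100,
    1990: 100_000_000,     # about 100 million
    2005: 10_000_000_000,  # about 10 billion
}

def compare(start, end):
    """Return (absolute increase, growth factor, distance on a log10 axis)."""
    a, b = speeds[start], speeds[end]
    return b - a, b / a, math.log10(b) - math.log10(a)

early = compare(1950, 1960)
late = compare(1990, 2005)
print("1950-1960:", early)
print("1990-2005:", late)
```

Both intervals cover the same distance on the exponential scale (two powers of 10, a factor of 100), yet the absolute increase is 99 calculations per second in the first case and nearly 10 billion in the second.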

By the Way

In 1965, Intel co-founder Gordon E. Moore predicted that advances in technology would allow computer chips to double in power roughly every two years. This idea is now called Moore’s law, and it has held fairly true ever since Moore first stated it.

Now try Exercise 46. ➽


Percentage Change Graphs

Is college getting more or less expensive? A quick look at Figure 5.25 might give the impression that the cost for private colleges has been holding fairly steady while the cost for public colleges fell steeply in 2006 after rising in prior years.

But look more closely and you’ll see that this is not the case at all. The vertical axis in Figure 5.25 represents the percentage increase in costs. A flat graph means only that costs increased by the same percentage each year, not that costs held steady. Similarly, the drop in 2006 for public colleges means only that the cost rose by less in that year than in the preceding years.

In fact, actual costs (not adjusted for inflation) for both public and private colleges have risen substantially with time, as shown in Figure 5.26. Moreover, because the rate of inflation (as measured by the Consumer Price Index; see Unit 3D) has been less than the rate of increase in college costs, the real cost of public colleges has steadily risen. Graphs that show percentage change are very common, particularly with economic data. Although they are perfectly honest, you can be misled unless you interpret them with great care.
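The distinction between a falling percentage change and a falling cost can be checked with a few lines of arithmetic. The Python sketch below uses made-up yearly costs (they are hypothetical and are not read from Figure 5.26) to show a cost that rises every year while its percentage change declines.

```python
# Hypothetical yearly costs in dollars; NOT read from Figure 5.26.
costs = [10_000, 11_000, 11_880, 12_474]

def percent_changes(values):
    """Percentage change of each value relative to the one before it."""
    return [100 * (new - old) / old for old, new in zip(values, values[1:])]

changes = percent_changes(costs)
print([round(c, 1) for c in changes])  # [10.0, 8.0, 5.0]
```

A graph of `changes` would slope downward, as the public-college line in Figure 5.25 does in 2006, even though every entry in `costs` is larger than the one before it.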

FIGURE 5.25 This graph shows the rate of increase with time in tuition and fees at four-year public and private colleges. [Line chart titled "Changes in College Costs," with one line for public and one for private colleges; horizontal axis: academic years ’95–’96 through ’05–’06; vertical axis: percentage change from previous academic year, 0 to 16%.] Source: The College Board.

FIGURE 5.26 This graph shows the change with time in the actual cost (not adjusted for inflation) of tuition and fees at four-year public and private colleges. You can use the rise in these costs to calculate the percentage increases shown in Figure 5.25. [Graph titled "Actual College Costs," showing public and private colleges; horizontal axis: academic years ’95–’96 through ’05–’06; vertical axis: 0 to $24,000.] Source: The College Board.

Time out to think

Based on Figure 5.24a, can you predict the speed of the fastest computers in 2015? Could you make the same prediction with Figure 5.24b? Explain.

Now try Exercise 47. ➽

Pictographs

Pictographs are graphs embellished with additional artwork. The artwork may make the graph more appealing, but it can also distract or mislead. Figure 5.27 is a pictograph showing the rise in world population from 1804 to 2054 (numbers for future years are based on United Nations projections). The lengths of the bars correspond correctly to world population for the different years listed. However, the artistic embellishments of this graph are deceptive in several ways. For example, your eye may be drawn to the figures of people lining the globe. Because this line of people rises from the left side of the pictograph to the center and then falls, it might give the impression that future world population will be declining. In fact, the line of people is purely decorative and carries no information.

Perhaps the most serious problem with this pictograph is that it makes it appear that world population has been rising linearly. However, notice that the time intervals on the horizontal axis are not uniform in size. For example, the interval between the bars for 1 billion and 2 billion people is 123 years (from 1804 to 1927), but the interval between the bars for 5 billion and 6 billion people is only 12 years (from 1987 to 1999).

Pictographs are very common, but as this example shows, you have to study them carefully to extract the essential information and not be distracted by the cosmetic effects. Now try Exercise 48. ➽
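The uneven spacing of the horizontal axis is easy to verify from the year labels on Figure 5.27, which mark when world population reached (or is projected to reach) each whole number of billions from 1 to 9. A short Python sketch, with an illustrative function name:

```python
# Years labeled on Figure 5.27 for populations of 1, 2, ..., 9 billion.
milestone_years = [1804, 1927, 1960, 1974, 1987, 1999, 2013, 2028, 2054]

def years_per_billion(years):
    """Number of years taken to add each successive billion people."""
    return [later - earlier for earlier, later in zip(years, years[1:])]

intervals = years_per_billion(milestone_years)
for billions, gap in enumerate(intervals, start=1):
    print(f"{billions} to {billions + 1} billion: {gap} years")
```

The gaps range from 123 years for the first added billion down to 12 years for the step from 5 to 6 billion, so bars drawn at equal spacing do not represent equal spans of time.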

FIGURE 5.27 [Pictograph titled "World Population (in billions of people)," with bars for 1 through 9 billion people labeled with the years 1804, 1927, 1960, 1974, 1987, 1999, 2013, 2028, and 2054, decorated with figures of people lining a globe.] Source: Data from United Nations Population Division, World Population Prospects.

By the Way

If world population continues to double at the same rate as in the late 20th century, it will reach 34 billion by 2100 and 192 billion by 2200. By about 2650, human population would be so large that it would not fit on the Earth, even if everyone stood elbow-to-elbow everywhere.
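The projections in this note can be reproduced with a constant-doubling-time model. The text does not state the starting population or the doubling time it used; the Python sketch below assumes about 6 billion people in 2000 and a doubling time of 40 years, values chosen here only because they reproduce the quoted figures.

```python
# Assumptions (not stated in the text): starting point and doubling time.
BASE_YEAR = 2000
BASE_POPULATION = 6.0  # billions of people
DOUBLING_TIME = 40     # years

def projected_population(year):
    """Population in billions if doubling continues at a constant rate."""
    doublings = (year - BASE_YEAR) / DOUBLING_TIME
    return BASE_POPULATION * 2 ** doublings

for year in (2100, 2200):
    print(year, round(projected_population(year)), "billion")
```

Under these assumptions the model gives about 34 billion for 2100 and 192 billion for 2200; a different starting value or doubling time would change the results.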

EXERCISES 5D

QUICK QUIZ

Choose the best answer to each of the following questions. Explain your reasoning with one or more complete sentences.

1. Consider Figure 5.15. Suppose you were given data for the number of households with high-speed Internet access in each of the years shown. How would you add these data to the graphic?

a. Add a third bar for each year.

b. Stack the high-speed data on top of the on-line bars.

c. Put a small pie chart on top of each pair of bars.

2. Consider Figure 5.16. According to this graph, the approximate death rate from tuberculosis in 1950 was

a. 2 per 100,000.

b. 20 per 100,000.

c. 200 per 100,000.

3. Consider Figure 5.17. According to this graph, what is per capita income in Oregon (OR)?

a. between $25,000 and $30,000

b. exactly $25,000

c. It cannot be determined from the graph.

4. Consider Figure 5.18. According to this map, the temperature in Iowa (IA) was

a. 30°F. b. 40°F. c. between 30°F and 40°F.


5. Consider Figure 5.18. Notice the small loop labeled 40°F near the southeast corner of Idaho (ID). What can you say about temperatures within that small region?

a. They were 40°F.

b. They were higher than 40°F but lower than 50°F.

c. They could have been anything above 40°F.

6. Suppose you are given a contour map showing elevation (altitude) for the state of Vermont. The region with the most closely spaced contours represents

a. the highest altitude.

b. the lowest altitude.

c. the steepest terrain.

7. Consider Figure 5.21. Approximately how many women participated in the 1948 Olympics?

a. 19 b. 9.4 c. 450

8. Consider Figure 5.23a. The way the graph is drawn

a. makes the graph completely invalid.

b. makes the changes from one decade to the next appear larger than they really were.

c. makes it more difficult to see the upward and downward trends that have occurred over time.

9. Consider Figure 5.24a. Moving one tick mark up the vertical axis represents an increase in computer speed of

a. 1 billion calculations per second.

b. a factor of 2.

c. a factor of 10.

10. Consider Figure 5.25. In years where the graph slopes downward with time,

a. college costs decreased.

b. the cost of college rose, but by a lower percentage than in previous years.

c. the cost of college rose, but the new cost represented a lower proportion of the average person’s income.

REVIEW QUESTIONS

11. Briefly describe the construction and use of multiple bar graphs and stack plots.

12. What are geographical data? Briefly describe at least two ways to display geographical data. Be sure to explain the meaning of contours on a contour map.

13. What are three-dimensional graphics? Explain the difference between graphics that only appear three-dimensional and those that show truly three-dimensional data.

14. Describe how perceptual distortions can arise in graphics and how they can be misleading.

15. How can graphics be misleading when the scales do not go all the way to zero? Why are such graphics sometimes useful?

16. What is an exponential scale? When is an exponential scale useful?

17. Explain how a graph that shows percentage change can show descending bars (or a descending line) even when the variable of interest is increasing.

18. What is a pictograph? How can a pictograph enhance a graph? How can it make a graph misleading?

DOES IT MAKE SENSE?

Decide whether each of the following statements makes sense (or is clearly true) or does not make sense (or is clearly false). Explain your reasoning.

19. My bar chart contains more information than yours, because I made my bars three-dimensional.

20. I used an exponential scale because the data values for my categories ranged from 7 to 450,000.

21. There’s been only a very slight rise in our stock price over the past few months, but I wanted to make it look dramatic so I started the vertical scale from the lowest price rather than from zero.

22. A graph showing the yearly rate of increase in the number of computer users has a slight downward trend, even though the actual number of users is rising.

BASIC SKILLS & CONCEPTS

23. Net Grain Production. Net grain production is the difference between the amount of grain a country produces and the amount of grain its citizens consume. It is positive if the country produces more than it consumes, and negative if the country consumes more than it produces. Figure 5.28 shows the net grain production of four countries in 1990 and projected for 2030.

a. Which of the four countries had to import grain to meet its needs in 1990?

b. Which of the four countries are expected to need to import grain to meet needs in 2030?

c. Given that India and China are the world’s two most populous countries, what does this graph tell you about how world agriculture will have to change between now and 2030?

FIGURE 5.28 [Multiple bar graph titled "Net Grain Production, 1990 and 2030 (projected)," with bars for 1990 and 2030 for the U.S., China, India, and Russia; vertical axis: millions of tons, from –250 to 100.]

FIGURE 5.29 [Multiple bar graph with a three-dimensional appearance, titled "Median Earnings of Workers 21 Years and Over by Educational Attainment, 1985 to 2000," with bars for 1985, 1995, and 2000 in each category: overall, not high school graduate, high school, some college/associate degree, bachelor’s degree, and advanced degree; vertical axis from 0 to $80,000.] Source: TIME Almanac, 1999, p. 886 and U.S. Census Bureau.

FIGURE 5.30 [Stack plot titled "College Degrees Awarded," with wedges for women and men; horizontal axis: year, 1900 to 2000; vertical axis: college graduates (thousands), 0 to 1400.]

24. Education and Earnings. Figure 5.29 shows median earnings in three different years according to level of education.

a. Briefly explain the meaning of each of the three sets of bars on the graph.

b. Compare in words the change in earnings between 1985 and 2000 for people with bachelor’s degrees to the change for people who did not graduate from high school. What do these data say about the value of a college education?

c. The graph has a three-dimensional appearance. Is it showing true three-dimensional data, or is the appearance purely cosmetic? Do you think the three-dimensional appearance helps or hinders the display?

25. Stack Plot. Answer the following based on Figure 5.16.

a. State whether the death rate for each of the four diseases individually decreased or increased between 1900 and 2003.

b. When was the death rate due to cardiovascular diseases the greatest, and what was it?

c. What was the death rate due to cancer in 2000?

d. Based on the trends in the graph, speculate on which of these four diseases will be responsible for the most deaths in 2050. Explain.

26. College Degrees. Figure 5.30 shows the numbers of college degrees awarded to men and women over time.

a. Estimate the numbers of college degrees awarded to men and to women (separately) in 1930 and in 2005.

b. Did men or women earn more degrees in 1980? Did men or women earn more degrees in 2005?

c. During what decade did the total number of degrees awarded increase the most?

d. Compare the total numbers of degrees awarded in 1950 and 2005.

e. Do you think the stack plot is an effective way to display these data? Briefly discuss other ways that might have been used instead.

FIGURE 5.32 [County-by-county map of the United States titled "Probability That a Black Student Would Have White Classmates." Legend: Less than 10%; 20% – 40%; 40% – 60%; 60% – 80%; More than 80%; Counties with no data or no black students.] Source: New York Times, April 2, 2000.

27. Federal Spending. Figure 5.31 shows the changes in major spending categories of the federal budget. (Payments to individuals includes Social Security and Medicare; net interest represents interest payments on the national debt; all other represents non-defense discretionary spending.) Interpret the stack plot and discuss some of the trends it reveals.

a. Find the percentage of the budget that went to net interest in 1990, 1995, and 2005.

b. Find the percentage of the budget that went to defense in 1960, 1980, and 2005.

c. Find the percentage of the budget that went to payments to individuals in 1980, 2000, and 2005.

28. Federal Trends. Consider Figure 5.31. Summarize at least three trends shown in the figure.

29. School Segregation. One way of measuring segregation is to determine the likelihood that a black student will have white classmates. A New York Times study found that, by this measure, segregation increased significantly in the 1990s. Figure 5.32 shows the probability that a black student had white classmates, by county, during the 1997–1998 academic year. Do there appear to be any significant regional differences? Can you pick out any differences between urban and rural areas? Discuss possible explanations for a few of the trends that you see in the figure.

FIGURE 5.31 [Stack plot titled "Percentage Composition of Federal Government Outlays," with wedges for payments to individuals, national defense, net interest, and all other; horizontal axis: year, ’60 to ’05; vertical axis: percent, up to 100.] Source: Office of Management and Budget.

30. Contour Elevations. Contour maps are often used to show geographical elevations. Figure 5.33 shows elevation contours around Boulder, Colorado. Discuss a few key features shown on the map.

FIGURE 5.33 [Contour map of elevation with a compass marking N, S, E, and W.]

FIGURE 5.35 [Pictograph titled "Homes with Cable TV," showing two television sets: 1980, 18 million homes; 2005, 73 million homes.]

U.S. Age Distribution. Parts (a) and (b) of Figure 5.34 display the age distribution of the U.S. population from 1960 to 2050 (projected) in two different ways; the age categories are in opposite order so that all of the data can be viewed. Use these graphs to answer the questions in Exercises 31–39.

FIGURE 5.34 [Two bar graphs, (a) and (b), with a three-dimensional appearance, titled "U.S. Age Distribution"; horizontal axis: year (1960, 1970, 1980, 1990, 2000, 2010, 2050); vertical axis: percent of population, 0 to 35; age categories: under 5, 5–17, 18–24, 25–44, 45–65, and over 65, shown in opposite order in the two graphs.]

31. Briefly describe the meaning of each bar.

32. Do these graphs display true three-dimensional data, or isthe three-dimensional look cosmetic?

33. How has the percentage of the youngest Americanschanged since 1960?

34. Estimate the percentage of 5- to 17-year-olds in 1960 andin 2000.

35. Estimate the percentage of 45- to 65-year-olds in 1960 andin 2010.

36. In which year did (will) the 25- to 44-year-old group com-prise the largest percentage of the population?

37. In which year did (will) the 45- to 65-year-old group com-prise the largest percentage of the population?

38. Which age group is expected to see the greatest increasebetween 2000 and 2050?

39. Describe the most significant changes that you see in theU.S. population between 1960 and 2050.

40. Extending the Olympic Graph. Make a list of all thedata you would need in order to extend the graph inFigure 5.21 to the 2008 Olympics and beyond.

41. Data for 2008 Olympics. Use the Web to find the datayou need to extend Figure 5.21 (see Exercise 40) throughthe 2008 Olympics (assuming they have occurred by thetime you read this problem). Then photocopy the graphand add the new data on the same graph.

42. Volume Distortion. Figure 5.35 uses television sets to represent the numbers of homes with cable in 1980 and 2005. Note that the heights of the TVs represent the numbers of homes. Briefly explain how the graph creates a perceptual distortion that exaggerates the true change in the number of homes with cable.

43. Three-Dimensional Pies. The pie charts in Figure 5.36 represent the percentage of Americans in three age categories in 1990 and 2050 (projected). Briefly explain how the three-dimensional effects create a perceptual distortion in this case. Why would flat pies (without the three-dimensional effects) give a more accurate representation of the data?

46. Cellular Phone Users. The following table shows the number of cell phone subscribers in the United States for selected years between 1990 and 2003. Display the data using both an ordinary vertical scale and an exponential vertical scale. (Hint: For the exponential scale, use tick marks at 1 million, 10 million, and 100 million.) Which graph is more useful? Why?

Year Subscribers (millions)

1990 5.3

1995 33.8

1997 55.3

1998 69.2

1999 86.0

2000 109.5

2001 128.3

2002 140.8

2003 158.7
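
One way to see what the hint about an exponential scale means is to compute where each value would sit on such an axis. The sketch below is only an illustration and not part of the exercise: it assumes the axis is laid out so that each power of 10 is one unit apart, which makes a value's position its base-10 logarithm (in millions). The names `subscribers` and `log_position` are made up for this example.

```python
import math

# Cell phone subscribers (millions), copied from the table above.
subscribers = {
    1990: 5.3, 1995: 33.8, 1997: 55.3, 1998: 69.2, 1999: 86.0,
    2000: 109.5, 2001: 128.3, 2002: 140.8, 2003: 158.7,
}

def log_position(millions):
    """Position on an axis whose tick marks at 1, 10, and 100 million
    sit at 0, 1, and 2 (an assumed layout for this sketch)."""
    return math.log10(millions)

for year, value in subscribers.items():
    print(f"{year}: {value:6.1f} million -> position {log_position(value):.2f}")
```

On an ordinary scale the 1990 value is only a few percent of the 2003 value, so the early years are squeezed near the bottom of the graph. On the exponential scale, equal ratios take up equal vertical distances, which spreads the early years out.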

47. Rising College Costs. Refer to Figures 5.25 and 5.26 to answer the following questions.

a. In what academic year did public college costs rise by the largest percentage? What was the percentage increase?

b. In the same year (as part a), what was the percentage increase in private college costs?

c. In the same year, which had the larger increase in actual cost (in dollars): public or private colleges? Explain.

48. World Population. Recast Figure 5.27 with a proper horizontal axis. What trends are clear in your new graph that are not clear in the original? Explain.

FIGURE 5.36 Three-dimensional pie charts of the 1990 Age Distribution and the 2050 Age Distribution, each with the categories 65–84, 85+, and Others. Source: U.S. Census Bureau.

FIGURE 5.37 Average weekly earnings of men and women (bar graph; the earnings axis runs from 500 to 800). Source: U.S. Census Bureau.

FIGURE 5.38 Braking distance (feet) for Oldsmobile, Lexus, Saab, and Lincoln (bar graph; the distance axis runs from 170 to 210). Source: Car and Driver.

44. Comparing Earnings. Figure 5.37 compares the average weekly earnings of men and women. Identify any misleading aspects of the display. Draw the display in a fairer way.

45. Braking Distances. Figure 5.38 shows the braking distance for four different cars. Discuss the ways in which it might be deceptive. How much greater is the braking distance of Lincolns than the braking distance of Oldsmobiles? Draw the display in a fairer way.


FURTHER APPLICATIONS

Creating Graphics. Exercises 49–52 give tables of real data. For each table, make a graphical display of the data. You may choose any graphic type that you feel is appropriate to the data set. In addition to making the display, write a few sentences explaining why you chose this type of display and a few sentences describing interesting patterns in the data.

49. Percent Never Married. The following table shows the percentages, for 1970 and 2003, of men and women in various age categories who were never married.

Women 1970 2003 Men 1970 2003

20–24 35.8 75.4 20–24 54.7 86.0

25–29 10.5 40.3 25–29 19.1 54.6

30–34 6.2 22.7 30–34 9.4 33.1

35–39 5.4 14.3 35–39 7.2 21.8

40–44 4.9 12.2 40–44 6.3 17.4

Source: U.S. Census Bureau.

50. Alcohol on the Road. The following table gives the total number of automobile fatalities and the number of fatalities in which alcohol was involved for 1982 to 2004. All figures are numbers of deaths.

Year Total Alcohol

1982 43,945 26,173

1984 44,257 24,762

1986 46,087 25,017

1988 47,087 23,833

1990 44,599 22,587

1992 39,250 18,290

1994 40,716 17,308

1996 42,065 17,749

1998 41,501 16,673

2000 41,945 17,380

2002 42,815 17,419

2004 42,643 17,013

Source: National Highway Traffic Safety Administration.

51. Daily Newspapers. The following table gives the number of daily newspapers and their total circulation (in millions) for selected years since 1920.

Year  Number of daily newspapers  Circulation (millions)

1920 2042 27.8

1930 1942 39.6

1940 1878 41.1

1950 1772 53.9

1960 1763 58.8

1970 1748 62.1

1980 1747 62.2

1990 1611 62.3

2000 1485 56.1

2003 1456 55.2

Source: Editor & Publisher.

52. Firearm Fatalities. The following table summarizes deaths due to firearms in different nations in a recent year.

Country  Total  Homicides  Suicides  Fatal accidents

U.S. 35,563 15,835 18,503 1225

Germany 1197 168 1004 25

Canada 1189 176 975 38

Australia 536 96 420 20

Spain 396 76 219 101

U.K. 277 72 193 12

Sweden 200 27 169 4

Vietnam 131 85 16 30

Japan 93 34 49 10

Source: Coalition to Stop Gun Violence.

53. Seasonal Effects on Schizophrenia? The graph in Figure 5.39 shows data regarding the relative risk of schizophrenia among people born in different months.

a. Note that the scale of the vertical axis does not include zero. Sketch the same risk curve using an axis that includes zero. Comment on the effect of this change.

b. Each value of the relative risk is shown with a dot at its most likely value and with an “error bar” indicating the range in which the data value probably lies. The study concludes that “the risk was also significantly associated with the season of birth.” Given the size of the error bars, does this claim appear justified? (Is it possible to draw a flat line that passes through all of the error bars?)

WEB PROJECTS

Find useful links for Web Projects on the text Web site: www.aw.com/bennett-briggs

55. Weather Maps. Many Web sites offer contour maps with current weather data. For example, you can use the Yahoo Weather site to generate many different contour weather maps. Generate at least two contour weather maps and discuss what they show.

56. Cancer Cure. As shown in Figure 5.16, cancer is one of the leading causes of death today. Nevertheless, scientists have made great progress in treating many forms of cancer. Go to the American Cancer Society Web site and investigate research into cancer cures. Read about one or two recent studies, and write a short report on what you learn. Be sure to include graphics in your report.

57. USA Snapshot. The USA Today Web site offers a daily pictograph for its “USA Snapshot.” Study today’s snapshot. Briefly discuss its purpose and effectiveness.

IN THE NEWS

58. News Graphics. Find a recent news report that shows a multiple bar graph or stack plot. Comment on the effectiveness of the display. Could another display have been used to depict the same data?

59. Geographical Data. Find an example of a graph of geographical data in a recent news report. Comment on the effectiveness of the display. Could another display have been used to depict the same data?

60. Three-Dimensional Effects. Find an example of a three-dimensional display in a recent news report. Are the data three-dimensional or are the three-dimensional effects cosmetic? Comment on the effectiveness of the display. Could another display have been used to depict the same data?

61. Graphic Confusion. Find an example in a recent news report of a graph that is misleading in one of the ways discussed in this unit. Explain what makes the graph misleading, and describe how it could have been drawn more honestly.

62. Outstanding News Graph. Find a graph from a recent news report that, in your opinion, is truly outstanding in displaying data visually. Discuss what the graph shows, and explain why you think it is so outstanding.

FIGURE 5.39 Relative risk of schizophrenia by month of birth (horizontal axis: month of birth, January through December; vertical axis: relative risk, 0.6 to 1.4). Source: New England Journal of Medicine.

54. Starting Salaries for Men and Women. Consider the data in the table below showing the average starting salaries for men and women with various levels of education. Construct a graphical display and write two paragraphs that demonstrate as clearly as possible the evident disparity in the salaries of men and women.

Male Female

Overall $44,726 $28,367

Not a HS graduate 21,447 14,214

HS graduate only 33,266 21,659

Some college 36,419 22,615

Associate’s degree 43,462 29,537

Bachelor’s degree 63,084 38,447

Master’s degree 76,896 48,205

Professional 136,128 72,445

Doctorate 95,894 73,516

Source: U.S. Census Bureau, 2003.


UNIT 5E Correlation and Causality

A major goal of many statistical studies is to determine whether one factor causes another. For example, does smoking cause lung cancer? In this unit, we will discuss how statistics can be used to search for correlations that might suggest a cause-and-effect relationship. Then we’ll explore the more difficult task of establishing causality.

Seeking Correlation

What does it mean when we say that smoking causes lung cancer? It certainly does not mean that you’ll get lung cancer if you smoke a single cigarette. It does not even mean that you’ll definitely get lung cancer if you smoke heavily for many years, since some heavy smokers do not get lung cancer. Rather, it is a statistical statement meaning that you are much more likely to get lung cancer if you smoke than if you don’t smoke.

Let’s try to understand how researchers learned that smoking causes lung cancer. Before they could investigate cause, researchers first needed to establish correlations between smoking and cancer. The process of establishing correlations began with observations. The early observations were informal. Doctors noticed that smokers made up a surprisingly high proportion of their patients with lung cancer. This suggestion of a linkage led to carefully conducted studies in which researchers compared lung cancer rates among smokers and nonsmokers. These studies showed clearly that heavier smokers were more likely to get lung cancer. In more formal terms, we say that there is a correlation between the variables amount of smoking and incidence of lung cancer. A correlation is a special type of relationship between variables, in which a rise or fall in one goes along with a corresponding rise or fall in the other.

Smoking is one of the leading causes of statistics.

—FLETCHER KNEBEL

DEFINITION

A correlation exists between two variables when higher values of one variable consistently go with higher values of another or when higher values of one variable consistently go with lower values of another.

Here are a few other examples of correlations:

• There is a correlation between the variables height and weight for people. That is, taller people tend to weigh more than shorter people.

• There is a correlation between the variables demand for apples and price of apples. That is, demand tends to decrease as prices increase.

• There is a correlation between practice time and skill among piano players. That is, those who practice more tend to be more skilled.

Establishing a correlation between two variables does not mean that a change in one variable causes a change in the other. Thus, finding the correlation between smoking and lung cancer did not by itself prove that smoking causes lung cancer. We could imagine, for example, that some gene predisposes a person both to smoking and to lung cancer. Nevertheless, identifying the correlation was the crucial first step in learning that smoking causes lung cancer.

By the Way

Smoking is linked to many serious diseases besides lung cancer, including heart disease and emphysema. Smoking is also linked with less lethal health conditions such as premature skin wrinkling and sexual impotence.


DEFINITION

A scatter diagram is a graph in which each point represents the values of two variables.

Time out to think

Suppose there really were a gene that made people prone to both smoking and lung cancer. Explain why we would still find a strong correlation between smoking and lung cancer in that case, but would not be able to say that smoking caused lung cancer.

Scatter Diagrams

Table 5.6 shows the production cost and gross receipts (total revenue from ticket sales) for the 15 biggest-budget science fiction and fantasy movies of all time (through mid-2006). Movie executives presumably hope there is a favorable correlation between the production budget and the receipts. That is, they hope that spending more to produce a movie will result in higher box office receipts. But is there such a correlation? We can look for a correlation by making a scatter diagram showing the relationship between the variables production cost and gross receipts.

TABLE 5.6 Biggest-Budget Science Fiction and Fantasy Movies

Movie  Production Cost (millions of dollars)  Gross Receipts (millions of dollars)

King Kong (2005) 207 218

Spider-Man 2 (2004) 200 373

Chronicles of Narnia (2005) 180 292

Waterworld (1995) 175 88

Van Helsing (2004) 170 120

Polar Express (2004) 170 172

Terminator 3 (2003) 170 150

Poseidon (2006) 160 52

Batman Begins (2005) 150 205

Harry Potter/Goblet of Fire (2005) 150 290

Armageddon (1998) 140 201

Men in Black 2 (2002) 140 190

Spider-Man (2002) 139 403

Final Fantasy: The Spirits Within (2001) 137 32

Hulk (2003) 137 132

Note: Gross receipts are for United States only; worldwide receipts are often substantially higher. These figures are not adjusted for inflation.


FIGURE 5.40 Scatter diagram for the data in Table 5.6. (Horizontal axis: production cost, in millions of dollars, 50 to 250; vertical axis: gross receipts, in millions of dollars, 0 to 450. Labeled points: Spider-Man 2, Waterworld, Hulk, Van Helsing, Terminator 3, King Kong, Batman Begins, Harry Potter/Goblet of Fire, Chronicles of Narnia, Spider-Man, and Poseidon.)

The following procedure describes how we make the scatter diagram, which is shown in Figure 5.40:

1. We assign one variable to each axis, and we label each axis with values that comfortably fit the data. Here, we assign production cost to the horizontal axis and gross receipts to the vertical axis. We choose a range of $50 to $250 million for the production cost axis and $0 to $450 million for the gross receipts axis.

2. For each movie in Table 5.6, we plot a single point at the horizontal position corresponding to its production cost and the vertical position corresponding to its gross receipts. For example, the point for the movie Waterworld goes at a position of $175 million on the horizontal axis and $88 million on the vertical axis. The dashed lines on Figure 5.40 show how we locate this point.

3. (Optional) If we wish, we can label data points, as is done for selected points in Figure 5.40.
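
The first two steps of this procedure can be sketched in a few lines of code. The example below is a rough illustration rather than the method used to draw Figure 5.40: it assumes a coarse grid of text characters in place of graph paper, and the names `MOVIES`, `grid_cell`, and `text_scatter`, as well as the grid size, are invented for the sketch. The axis ranges are the ones chosen in step 1.

```python
# Table 5.6: (production cost, gross receipts), both in millions of dollars.
MOVIES = {
    "King Kong": (207, 218),
    "Spider-Man 2": (200, 373),
    "Chronicles of Narnia": (180, 292),
    "Waterworld": (175, 88),
    "Van Helsing": (170, 120),
    "Polar Express": (170, 172),
    "Terminator 3": (170, 150),
    "Poseidon": (160, 52),
    "Batman Begins": (150, 205),
    "Harry Potter/Goblet of Fire": (150, 290),
    "Armageddon": (140, 201),
    "Men in Black 2": (140, 190),
    "Spider-Man": (139, 403),
    "Final Fantasy: The Spirits Within": (137, 32),
    "Hulk": (137, 132),
}

COLS, ROWS = 41, 19  # size of the character grid (an arbitrary choice)

def grid_cell(x, y, x_range=(50, 250), y_range=(0, 450)):
    """Step 2: convert a data point to a (row, column) cell in the grid.
    Row 0 is the top of the plot, so larger y values get smaller rows."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    col = round((x - x_min) / (x_max - x_min) * (COLS - 1))
    up = round((y - y_min) / (y_max - y_min) * (ROWS - 1))
    return ROWS - 1 - up, col

def text_scatter(points):
    """Return the plot as a list of strings, one '*' per occupied cell."""
    grid = [[" "] * COLS for _ in range(ROWS)]
    for x, y in points:
        row, col = grid_cell(x, y)
        grid[row][col] = "*"
    return ["".join(cells) for cells in grid]

plot = text_scatter(MOVIES.values())
for line in plot:
    print("|" + line)
print("+" + "-" * COLS)
```

Calling `grid_cell(175, 88)` locates the Waterworld point in the same way the dashed lines do in the figure: the production cost fixes the column and the gross receipts fix the row. Two movies that land in the same cell share one `*`, which is a limitation of such a coarse grid.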

Types of Correlation

Look carefully at the scatter diagram for movies in Figure 5.40. The dots seem to be scattered about with no apparent pattern. In other words, at least for these big-budget movies, there appears to be little or no correlation between the amount of money spent producing the movie and the amount of money it earned in gross receipts.

Now consider the scatter diagram in Figure 5.41, which shows the weights (in carats) and retail prices of 23 diamonds. Here, the dots show a clear upward trend, indicating that larger diamonds generally cost more. The correlation is not perfect. For example, the heaviest diamond is not the most expensive. But the overall trend seems fairly clear. Because the prices tend to increase with the weights, we say that Figure 5.41 shows a positive correlation.

Time out to think

By studying Table 5.6, associate each of the unlabeled data points in Figure 5.40 with a particular movie.

Technical Note

We often have some reason to think that one variable depends at least in part on the other. In the case of Figure 5.40, we might guess that gross receipts should depend on the production cost. We therefore call production cost the explanatory variable and gross receipts the response variable, because the production cost might help explain the gross receipts. The explanatory variable is usually plotted on the horizontal axis and the response variable on the vertical axis.


FIGURE 5.42 A scatter diagram for life expectancy and infant mortality. (Horizontal axis: life expectancy, in years, 50 to 80; vertical axis: infant mortality, in deaths per 1000 live births, 0 to 120. Labeled points: Bangladesh, Pakistan, Egypt, India, Kenya, Brazil, Peru, Guatemala, Russia, Mexico, South Korea, Israel, Czech Republic, Canada, Australia, and Greece.) Higher life expectancy generally goes with lower infant mortality, so this is a negative correlation.

In contrast, Figure 5.42 shows a scatter diagram for the variables life expectancy and infant mortality in 16 countries. We again see a clear trend, but this time it is a negative correlation: Countries with higher life expectancy tend to have lower infant mortality.

Besides stating whether a correlation exists, we can also discuss its strength. The more closely the data follow the general trend, the stronger is the correlation.

❉EXAMPLE 1 Inflation and Unemployment

Prior to the 1990s, most economists assumed that the unemployment rate and the inflation rate were negatively correlated. That is, when unemployment goes down, inflation goes up, and vice versa. Table 5.7 shows unemployment and inflation data for the period 1990–2006. Make a scatter diagram for these data. Based on your diagram, does it appear that the data support the historical claim of a link between the unemployment and inflation rates?

By the Way

In statistics, the correlation coefficient provides a quantitative measure of the strength of a correlation. It is defined to be 1 for a perfect positive correlation (perfect meaning that all data points lie on a single straight line), −1 for a perfect negative correlation, and 0 for no correlation.

RELATIONSHIPS BETWEEN TWO DATA VARIABLES

No correlation: There is no apparent relationship between the two variables.

Positive correlation: Both variables tend to increase (or decrease) together.

Negative correlation: The two variables tend to change in opposite directions, with one increasing while the other decreases.

Strength of a correlation: The more closely two variables follow the general trend, the stronger the correlation (which may be either positive or negative). In a perfect correlation, all data points lie on a straight line.
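
The correlation coefficient mentioned in the By the Way note is named in this unit without a formula. For reference, the most widely used version is the Pearson correlation coefficient. The text does not say which definition it has in mind, so the following should be read as the standard formula rather than as the authors' own. For n data points (x_i, y_i) with means x̄ and ȳ:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

The numerator is positive when the two variables tend to lie above or below their means together and negative when they tend to lie on opposite sides. The denominator rescales the result so that it always falls between −1 and 1, with the extreme values reached only when all points lie on a single straight line.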

FIGURE 5.41 A scatter diagram for diamond weights and prices. (Horizontal axis: weight, in carats, 0 to 2.5; vertical axis: price, in dollars, 0 to 18,000.) Higher weight generally goes with higher price, so this is a positive correlation.


FIGURE 5.43 Scatter diagram for the data in Table 5.7. (Horizontal axis: unemployment rate (%), 4 to 8; vertical axis: inflation rate (%), 0 to 6.)

SOLUTION We make the scatter diagram by plotting the variable unemployment rate on the horizontal axis and the variable inflation rate on the vertical axis. To make the graph easy to read, we use values ranging from 3.5% to 8% for the unemployment rate and from 0 to 6% for the inflation rate. Figure 5.43 shows the result. To the eye, there does not appear to be any obvious correlation between the two variables. (A calculation confirms that there is no appreciable correlation.) Thus, these data do not support the historical claim of a negative correlation between the unemployment and inflation rates.

❉EXAMPLE 2 Accuracy of Weather Forecasts

The scatter diagrams in Figure 5.44 show two weeks of data comparing the actual high temperature for the day with the same-day forecast (left diagram) and the three-day forecast (right diagram). Discuss the types of correlation on each diagram.

TABLE 5.7 U.S. Inflation and Unemployment

Year  Unemployment Rate (%)  Inflation Rate (%)  Year  Unemployment Rate (%)  Inflation Rate (%)

1990 5.6 5.4 1999 4.3 2.2

1991 6.8 4.2 2000 4.0 3.4

1992 7.5 3.0 2001 4.2 1.8

1993 6.9 3.0 2002 5.8 1.6

1994 6.1 2.6 2003 6.0 2.3

1995 5.6 2.8 2004 5.5 2.7

1996 5.4 3.0 2005 5.1 3.4

1997 4.9 2.3 2006 4.6 3.4

1998 4.6 2.3

Source: U.S. Bureau of Labor Statistics; 2006 data through May of that year.

Now try Exercises 23–24. ➽
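
The solution to Example 1 mentions that a calculation confirms the visual impression. One plausible version of that calculation is sketched below, assuming the calculation in question is the standard Pearson correlation coefficient (the text does not say). The data are copied from Table 5.7; the names `TABLE_5_7` and `pearson_r` are invented for this sketch.

```python
import math

# Table 5.7: year -> (unemployment rate %, inflation rate %).
TABLE_5_7 = {
    1990: (5.6, 5.4), 1991: (6.8, 4.2), 1992: (7.5, 3.0), 1993: (6.9, 3.0),
    1994: (6.1, 2.6), 1995: (5.6, 2.8), 1996: (5.4, 3.0), 1997: (4.9, 2.3),
    1998: (4.6, 2.3), 1999: (4.3, 2.2), 2000: (4.0, 3.4), 2001: (4.2, 1.8),
    2002: (5.8, 1.6), 2003: (6.0, 2.3), 2004: (5.5, 2.7), 2005: (5.1, 3.4),
    2006: (4.6, 3.4),
}

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

unemployment = [u for u, _ in TABLE_5_7.values()]
inflation = [i for _, i in TABLE_5_7.values()]
r = pearson_r(unemployment, inflation)
print(f"r = {r:.2f}")
```

A strong negative correlation would give a value of `r` close to −1. A value much closer to 0 than to −1 is consistent with the solution's statement that these data do not support the historical claim.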


FIGURE 5.44 Comparison of actual high temperatures with same-day and three-day forecasts. (Two scatter diagrams: the horizontal axes show the same-day forecast (°F) and the three-day forecast (°F), each from 20 to 60; the vertical axes show the actual temperature (°F), from 20 to 70.)

SOLUTION Both scatter diagrams show a general trend in which higher predicted temperatures mean higher actual temperatures. Thus, both show positive correlations. However, the points in the left diagram lie more nearly on a straight line, indicating a stronger correlation than in the right diagram. This makes sense, because we expect weather forecasts to be more accurate on the same day than three days in advance. Now try Exercises 25–26.

Possible Explanations for a Correlation

We began by stating that correlations can help us search for cause-and-effect relationships. But we’ve already seen that causality is not the only possible explanation for a correlation. For example, the predicted temperatures on the horizontal axis of Figure 5.44 certainly do not cause the actual temperatures on the vertical axis. The following box summarizes three possible explanations for a correlation.

❉EXAMPLE 3 Explanation for a Correlation

Consider the correlation between infant mortality and life expectancy in Figure 5.42. Which of the three possible explanations for a correlation applies? Explain.

SOLUTION The negative correlation between infant mortality and life expectancy is probably an example of common underlying cause. Both variables respond to an underlying variable that we might call quality of health care. In countries where health care is better in general, infant mortality is lower and life expectancy is higher.

Now try Exercises 27–28. ➽

POSSIBLE EXPLANATIONS FOR A CORRELATION

1. The correlation may be a coincidence.

2. Both variables might be directly influenced by some common underlying cause.

3. One of the correlated variables may actually be a cause of the other. Note that, even in this case, we may have identified only one of several causes.
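
The second explanation, a common underlying cause, can be made concrete with a small simulation of the hypothetical gene discussed earlier in this unit. The sketch below is a toy model, not real data: every probability in it is invented, and cancer is made to depend on the gene alone, so smoking has no effect at all. The names `P_GENE`, `P_SMOKE`, `P_CANCER`, `tallies`, and `cancer_rate` are made up for the example.

```python
import random

random.seed(0)  # fixed seed so repeated runs give the same numbers

# All probabilities below are invented for illustration only.
P_GENE = 0.30
P_SMOKE = {True: 0.60, False: 0.20}   # chance of smoking, by gene status
P_CANCER = {True: 0.10, False: 0.01}  # chance of cancer, by gene status ONLY

# tallies[(has_gene, smokes)] = [number of people, number with cancer]
tallies = {(g, s): [0, 0] for g in (True, False) for s in (True, False)}
for _ in range(100_000):
    has_gene = random.random() < P_GENE
    smokes = random.random() < P_SMOKE[has_gene]
    cancer = random.random() < P_CANCER[has_gene]  # smoking plays no role
    tallies[(has_gene, smokes)][0] += 1
    tallies[(has_gene, smokes)][1] += cancer

def cancer_rate(groups):
    """Cancer rate pooled over the given (has_gene, smokes) groups."""
    people = sum(tallies[g][0] for g in groups)
    cases = sum(tallies[g][1] for g in groups)
    return cases / people

smokers = cancer_rate([(True, True), (False, True)])
nonsmokers = cancer_rate([(True, False), (False, False)])
print(f"All people:   smokers {smokers:.3f}, nonsmokers {nonsmokers:.3f}")
print(f"Gene only:    smokers {cancer_rate([(True, True)]):.3f}, "
      f"nonsmokers {cancer_rate([(True, False)]):.3f}")
print(f"No gene only: smokers {cancer_rate([(False, True)]):.3f}, "
      f"nonsmokers {cancer_rate([(False, False)]):.3f}")
```

In this toy population, smokers as a whole show a clearly higher cancer rate than nonsmokers, because smokers are more likely to carry the gene. Once people with the same gene status are compared, the rates for smokers and nonsmokers come out about the same, which is what the absence of a causal link looks like.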


❉EXAMPLE 4 How to Get Rich in the Stock Market (Maybe)

Every financial advisor has a strategy for predicting the direction of the stock market. Most focus on fundamental economic data, such as interest rates and corporate profits. But an alternative strategy relies on a remarkable correlation between the Super Bowl winner in January and the direction of the stock market for the rest of the year: The stock market tends to rise when a team from the old, pre-1970 NFL wins the Super Bowl, and tends to fall otherwise. This correlation successfully matched 28 of the first 32 Super Bowls to the stock market. Suppose that the Super Bowl just ended and the winner was the Detroit Lions, an old NFL team. Should you invest all your spare cash (and maybe even some that you borrow) in the stock market?

SOLUTION Based on the reported correlation, you might be tempted to invest, since the old-NFL winner suggests a rising stock market over the rest of the year. However, this investment would make sense only if you believed that the Super Bowl result actually causes the stock market to move in a particular direction. This belief is clearly preposterous, and the correlation is undoubtedly a coincidence. If you are going to invest, don’t base your investment on this correlation. Now try Exercises 29–34.

Establishing Causality

Suppose you have discovered a correlation and suspect causality. How can you test your suspicion? Let’s return to the issue of smoking and lung cancer. The strong correlation between smoking and lung cancer did not by itself prove that smoking causes lung cancer. In principle, we could have looked for proof with a controlled experiment. But such an experiment would be unethical, since it would require forcing a group of randomly selected people to smoke cigarettes. So how was smoking established as a cause of lung cancer?

The answer involves several lines of evidence. First, researchers found correlations between smoking and lung cancer among many groups of people: women, men, and people of different races and cultures. Second, among groups of people that seemed otherwise identical, lung cancer was found to be rarer in nonsmokers. Third, people who smoked more and for longer periods of time were found to have higher rates of lung cancer. Fourth, when researchers accounted for other potential causes of lung cancer (such as exposure to radon gas or asbestos), they found that almost all the remaining lung cancer cases occurred among smokers.

These four lines of evidence made a strong case, but still did not rule out the possibility that some other factor, such as genetics, predisposes people both to smoking and to lung cancer. However, two additional lines of evidence made this possibility highly unlikely. One line of evidence came from animal experiments. In controlled experiments, animals were divided into randomly chosen treatment and control groups. The experiments still found a correlation between inhalation of cigarette smoke and lung cancer, which seems to rule out a genetic factor, at least in the animals. The final line of evidence came from biologists studying cell cultures (that is, small samples of human lung tissue). The biologists discovered the basic process by which ingredients in cigarette smoke can create cancer-causing mutations. This process does not appear to depend in any way on specific genetic factors, making it all but certain that lung cancer is caused by smoking and not by any preexisting genetic factor.

By the Way

The Super Bowl Indicator went into a slump after Super Bowl 32, correctly predicting the stock market’s direction in only one of the next seven years.
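
The counts quoted in Example 4 and in the note above are easy to turn into success rates. The short sketch below does only that arithmetic; the variable names are invented, and the counts are the ones stated in the text.

```python
from fractions import Fraction

# Counts taken from Example 4 and the By the Way note.
first_hits, first_years = 28, 32   # first 32 Super Bowls
later_hits, later_years = 1, 7     # the next seven years

first_rate = Fraction(first_hits, first_years)
later_rate = Fraction(later_hits, later_years)
overall_rate = Fraction(first_hits + later_hits, first_years + later_years)

for label, rate in [("first 32", first_rate),
                    ("next 7", later_rate),
                    ("all 39", overall_rate)]:
    print(f"{label}: {float(rate):.1%}")
```

A genuine cause-and-effect relationship would be expected to keep working in later years. A pattern that arose by coincidence has no reason to, which fits the sharp drop in the success rate after the first 32 games.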

The truth is rarely pure and never simple.

—OSCAR WILDE


CASE STUDY Air Bags and Children

By the mid-1990s, passenger-side air bags had become commonplace in cars. Statistical studies showed that the air bags saved many lives in moderate- to high-speed collisions. But a disturbing pattern also appeared. In at least some cases, young children, especially infants and toddlers in child car seats, were killed by air bags in low-speed collisions.

At first, many safety advocates found it difficult to believe that air bags could be the cause of the deaths. But the observational evidence became stronger, meeting the first four guidelines for establishing causality. For example, the greater risk to infants in child car seats fit Guideline 3, because it indicated that being closer to the air bags increased the risk of death. (A child car seat sits on top of the built-in seat, thereby putting a child closer to the air bags than the child would be otherwise.)

To seal the case, safety experts undertook experiments using dummies. They found that children, because of their small size, often sit where they could be easily hurt by the explosive opening of an air bag. The experiments also showed that an air bag could impact a child car seat hard enough to cause death, thereby revealing the physical mechanism by which the deaths occurred.

By the Way

Based on these studies, the government now recommends that child car seats never be used on the front seat, and that children under age 12 sit in the back seat if possible.

The following box summarizes these ideas about establishing causality. Generally speaking, the case for causality is stronger when more of these guidelines are met.

GUIDELINES FOR ESTABLISHING CAUSALITY

To investigate whether a suspected cause actually causes an effect:

1. Look for situations in which the effect is correlated with the suspected cause even while other factors vary.

2. Among groups that differ only in the presence or absence of the suspected cause, check that the effect is similarly present or absent.

3. Look for evidence that larger amounts of the suspected cause produce larger amounts of the effect.

4. If the effect might be produced by other potential causes (besides the suspected cause), make sure that the effect still remains after accounting for these other potential causes.

5. If possible, test the suspected cause with an experiment. If the experiment cannot be performed with humans for ethical reasons, consider doing the experiment with animals, cell cultures, or computer models.

6. Try to determine the physical mechanism by which the suspected cause produces the effect.

Time out to think

There’s a great deal of controversy concerning whether animal experiments are ethical. What is your opinion of animal experiments? Defend your opinion.

By the Way

The first four guidelines for establishing causality are called Mill’s methods, after the English philosopher and economist John Stuart Mill (1806–1873). Mill was a leading scholar of his time and an early advocate of women’s right to vote.


CASE STUDY What Is Causing Global Warming?

Statistical measurements show that the global average temperature (the average temperature everywhere on Earth’s surface) has risen about 1.5°F in the past century, with more than half of this warming occurring in just the past 30 years. But what is causing this so-called global warming?

Scientists have for decades suspected that the temperature rise is tied to an increase in the atmospheric concentration of carbon dioxide and other greenhouse gases. Comparative studies of Earth and other planets, particularly Venus and Mars, show that the greenhouse gas concentration is the single most important factor in determining a planet’s average temperature. It is even more important than distance from the Sun. For example, Venus, which is about 30% closer than Earth to the Sun, would be only about 45°F warmer than Earth if it had an Earth-like atmosphere. But because Venus has a thick atmosphere made almost entirely of carbon dioxide, its actual surface temperature is about 880°F, hot enough to melt lead. The reason greenhouse gases cause warming is that they slow the escape of heat from a planet’s surface, thereby raising the surface temperature.

In other words, the physical mechanism by which greenhouse gases cause warming is well understood (satisfying Guideline 6 on our list), and there is no doubt that a large rise in carbon dioxide concentration would eventually cause Earth to become much warmer. Nevertheless, as you’ve surely heard, many people have questioned whether the current period of global warming really is due to humans or whether it might be due to natural variations in the carbon dioxide concentration or other natural factors.

In an attempt to answer these questions, the United States and other nations have devoted billions of dollars over the past two decades to an unprecedented effort to understand Earth’s climate. We still have much more to learn, but the research to date makes a strong case for human input of greenhouse gases as the cause of global warming. Two lines of evidence make the case particularly strong.

The first line of evidence comes from careful measurements of past and present carbon dioxide concentrations in Earth’s atmosphere. Figure 5.45 shows the data. Notice that past changes in the carbon dioxide concentration correlate clearly with temperature changes, confirming that we should expect a rising greenhouse gas concentration to cause rising temperatures. Moreover, while the past data show that the carbon dioxide concentration does indeed vary naturally, they also show that the recent rise is much greater than any natural increase during the past several hundred thousand years. Human activity is the only viable explanation for the huge recent increase in carbon dioxide concentration.

The second line of evidence comes from experiments. We cannot perform controlled experiments with our entire planet, but we can run experiments with computer models that simulate the way Earth’s climate works. Earth’s climate is incredibly complex, and many uncertainties remain in attempts to model the climate on computers. However, today’s models are the result of decades of work and refinement. Each time a model of the past failed to match real data, scientists sought to understand the missing (or incorrect) ingredients in the model and then tried again with improved models. Today’s models are not perfect, but they match real climate data quite well, giving scientists confidence that the models have predictive value. Figure 5.46 compares model data and real data, showing good agreement and clearly suggesting that human activity is the cause of global warming. If you include the effects of the greenhouse gases put into the atmosphere by humans, the models agree with the data, but if you leave out these effects, the models fail.

By the Way
Carbon dioxide and other greenhouse gases are present naturally in Earth’s atmosphere, which is a good thing. Without them, Earth’s average temperature would be a frigid −10°F; with them, the global average temperature is about 59°F. From this perspective, the problem with global warming is that human input of carbon dioxide and other greenhouse gases into our atmosphere is rapidly causing our planet to have too much of a good thing.

By the Way
Global warming is a major issue because computer models suggest it will have severe consequences. Among the predicted consequences are an increase in the strength and frequency of hurricanes and other severe storms, a rise in sea level due to both heating of the oceans and melting of glacial ice, and major changes to local weather patterns around the world.

[Figure 5.46 graph not reproduced. Horizontal axis: Year, 1850 to 2000. Vertical axis: Change (compared to past average global temperature) (°C), −1.0 to 1.0. Annotations: “Observations show a clear rise in average global temperatures (red line) . . . agreeing with models (green swath) that include effects of greenhouse gases released by humans.”]

FIGURE 5.46 This graph compares the predictions of various climate models (green swath) with observed temperature changes (red line) since about 1860. The agreement is not perfect, telling us we still have much to learn, but it is good enough to give us confidence that greenhouse gases are indeed causing global warming.

[Figure 5.45 graphs not reproduced. Main panels: horizontal axis: Years ago, 400,000 to 0 (with 1750 and today marked); vertical axes: CO2 (ppm), 150 to 400, and Temperature change (°C) (relative to past millennium), −10 to 6. Right panel: horizontal axis: Year, 1960 to 2010; vertical axis: CO2 (ppm), 300 to 380. Annotations: “Periods of higher CO2 concentration coincide with times of higher global average temperature.” “Human use of fossil fuels has raised CO2 levels above all peaks occurring in the past 400,000 years.”]

FIGURE 5.45 The atmospheric concentration of carbon dioxide and global average temperature over the past 400,000 years. The recent data (right) represent direct measurements (at Mauna Loa, Hawaii); the past data come from studies of air bubbles trapped in Antarctic ice. The CO2 concentration is measured in parts per million (ppm).


Time out to think
Check the idea that human activity causes global warming against each of the six guidelines for establishing causality.

Confidence in Causality

If human activity is causing global warming, we’d be wise to change our activities so as to stop it. But while we have good reason to think that this is the case, not everyone is yet convinced. Moreover, the changes needed to slow global warming might be very expensive. How do we decide when we’ve reached the point where something like global warming requires steps to address it?

In an ideal world, we would continue to study the issue until we could establish for certain that human activity is the cause of global warming. However, we have seen that it is difficult to establish causality and often impossible to prove causality beyond all doubt. We are therefore forced to make decisions about global warming, and many other important issues, despite remaining uncertainty about cause and effect.

In other areas of mathematics, accepted techniques help us deal with uncertainty by allowing us to calculate numerical measures of possible errors. But there are no accepted ways to assign such numbers to the uncertainty that comes with questions of causality. Fortunately, another area of study has dealt with practical problems of causality for hundreds of years: our legal system. You may be familiar with the following three broad ways of expressing a legal level of confidence.

By the Way
For criminal trials, the Supreme Court endorsed this guidance from Justice Ginsburg: “Proof beyond a reasonable doubt is proof that leaves you firmly convinced of the defendant’s guilt. There are very few things in this world that we know with absolute certainty, and in criminal cases the law does not require proof that overcomes every possible doubt. If, based on your consideration of the evidence, you are firmly convinced that the defendant is guilty of the crime charged, you must find him guilty. If on the other hand, you think there is a real possibility that he is not guilty, you must give him the benefit of the doubt and find him not guilty.”

BROAD LEVELS OF CONFIDENCE IN CAUSALITY

Possible cause: We have discovered a correlation, but cannot yet determine whether the correlation implies causality. In the legal system, possible cause (such as thinking that a particular suspect possibly caused a particular crime) is often the reason for starting an investigation.

Probable cause: We have good reason to suspect that the correlation involves cause, perhaps because some of the guidelines for establishing causality are satisfied. In the legal system, probable cause is the general standard for getting a judge to grant a warrant for a search or wiretap.

Cause beyond reasonable doubt: We have found a physical model that is so successful in explaining how one thing causes another that it seems unreasonable to doubt the causality. In the legal system, cause beyond reasonable doubt is the usual standard for conviction. It generally demands that the prosecution show how and why (essentially the physical model) the suspect committed the crime. Note that beyond reasonable doubt does not mean beyond all doubt.


While these broad levels remain fairly vague, they give us at least some common language for discussing confidence in causality. If you study law, you will learn much more about the subtleties of interpreting these terms. However, because statistics has little to say about them, we will not discuss them much further in this book.

Time out to think
Given what you know about global warming, do you think that human activity is a possible cause, probable cause, or cause beyond reasonable doubt? Defend your opinion. Based on your level of confidence in the causality, how would you recommend setting policies with regard to global warming?

EXERCISES 5E

QUICK QUIZ
Choose the best answer to each of the following questions. Explain your reasoning with one or more complete sentences.

1. If X is correlated with Y,

a. X causes Y.

b. increasing values of X go with increasing values of Y.

c. increasing values of X go with either increasing or decreasing values of Y.

2. Consider Figure 5.42. According to this diagram, life expectancy in Russia is about

a. 22 years. b. 63 years. c. 58 years.

3. If the points on a scatter diagram fall on a nearly straight line sloping upward, the two variables have

a. a strong positive correlation.

b. a weak negative correlation.

c. no correlation.

4. If the points on a scatter diagram fall into a broad swath that slopes downward, the two variables have

a. a strong positive correlation.

b. a weak negative correlation.

c. no correlation.

5. When can you rule out the possibility that changes to variable X cause changes to variable Y?

a. when there is no correlation between X and Y

b. when there is a negative correlation between X and Y

c. when a scatter diagram of the two variables shows points lying in a straight line

6. What type of correlation would you expect between wages and the unemployment rate?

a. none

b. positive: higher wages would go with higher unemployment

c. negative: higher wages would go with lower unemployment

7. You have found a higher rate of birth defects among babies born to women exposed to second-hand smoke. To support a claim that the second-hand smoke caused the birth defects, what else should you expect to find?

a. evidence that higher rates of defects are correlated with exposure to greater amounts of smoke

b. evidence that these types of birth defects occur only in babies whose mothers were exposed to smoke, and never to any other babies

c. evidence that the types of birth defects in these babies are more debilitating than other types of birth defects

8. Consider Figure 5.45. According to this graph, how does the CO2 concentration today compare to the highest CO2 concentrations during the 400,000 years before humans began industry?

a. The values are about the same.

b. Today’s value is about 10% higher.

c. Today’s value is about 30% higher.

9. Based on the trend shown in Figure 5.45, predict the CO2 concentration in the year 2040.

a. 390 ppm b. 420 ppm c. 600 ppm

10. A jury finding that a person is guilty “beyond reasonable doubt” is supposed to mean that

a. the person is definitely guilty.

b. the 12 members of the jury each felt that there was more than a 50% chance that the person was guilty.

c. any reasonable person would conclude that the evidence was sufficient to establish guilt.

REVIEW QUESTIONS

11. What is a correlation? Give three examples of pairs of variables that are correlated.

12. What is a scatter diagram, and how do you make one? How can we use a scatter diagram to look for a correlation?

13. Define and distinguish among positive correlation, negative correlation, and no correlation. How do we determine the strength of a correlation?

14. Describe the three general categories of explanation for a correlation. Give an example of each.

15. Briefly describe each of the six guidelines presented in this unit for establishing causality. Give an example of the application of each guideline.

16. Briefly describe three levels of confidence in causality and how they can be useful when we do not have absolute proof of causality.

DOES IT MAKE SENSE?
Decide whether each of the following statements makes sense (or is clearly true) or does not make sense (or is clearly false). Explain your reasoning.

17. There is a strong negative correlation between the price of tickets and the number of tickets sold. This suggests that if we want to sell a lot of tickets, we should lower the price.

18. There is a strong positive correlation between the amount of time spent studying and grades in mathematics classes. This suggests that if you want to get a good grade, you should spend more time studying.

19. I found a nearly perfect positive correlation between variable A and variable B, and therefore was able to conclude that an increase in variable A causes an increase in variable B.

20. I found a nearly perfect negative correlation between variable C and variable D, and therefore was able to conclude that an increase in variable C causes a decrease in variable D.

21. I had originally suspected that an increase in variable E would cause a decrease in variable F, but I no longer believe this because I found no correlation between the two variables.

22. I agree that we should require kids to wear helmets if helmets really lower injury rates, but it makes no sense to start this requirement until we have absolute proof that helmets cause the lower injury rate.

BASIC SKILLS & CONCEPTS
Interpreting Scatter Diagrams. Exercises 23–26 each show a scatter diagram with its axes labeled. For each exercise, do the following:

a. Indicate the variables for which we can seek a correlation with this diagram.

b. State whether the diagram shows a positive correlation, a negative correlation, or no correlation. If there is a positive or negative correlation, state whether it is strong or weak.

c. In words, summarize any conclusions you can draw from the diagram.

23. [Scatter diagram not reproduced. Title: 2004 Model Cars. Horizontal axis: Weight of cars (pounds), 1500 to 4500. Vertical axis: City gas mileage (mi/gal), 10 to 35.]

24. [Scatter diagram not reproduced. Title: U.S. Presidential Elections, 1964–2004. Horizontal axis: Voter turnout (%), 50 to 70. Vertical axis: Unemployment (%), 0 to 8.]

25. [Scatter diagram not reproduced. Title: Employees of Big Co. Horizontal axis: Salary level (dollars per year), $30,000 to $270,000. Vertical axis: Percent of income given to charity, 0 to 10.]

26. [Scatter diagram not reproduced. Title: U.S. Farms 1950–2000. Horizontal axis: Number of farms (millions), 0 to 6. Vertical axis: Average size (acres), 0 to 500.]

Types of Correlation. Exercises 27–34 list pairs of variables. State the units you would use to measure each of the two variables (for example, pounds, years, or miles per hour). Then state whether you believe the two variables are correlated. If you believe they are correlated, state whether the correlation is positive or negative and strong or weak. Explain your reasoning.

27. Latitude north of the equator and average high temperature in June

28. Height of individual and amount of pocket change

29. Age and time spent daily on cell phone

30. Altitude on a mountain hike and air pressure

31. Population of a state and average salary of public school teachers

32. Population of a state and percentage of foreign-born residents

33. Fertility rate of women and life expectancy in the country

34. Family income of public school students and experience of teacher

FURTHER APPLICATIONS
Making Scatter Diagrams. Exercises 35–40 each give a table of data. In each case, do the following:

a. Make a scatter diagram for the data.

b. State whether the two variables appear to be correlated and, if so, whether the correlation is positive or negative and strong or weak.

c. Suggest a reason for the correlation (or lack of correlation). If you suspect causality, briefly discuss what further evidence you would need to establish it.

35. Defense and Economy. The table below gives the per capita gross national product and the per capita expenditure on defense for eight developed countries. Gross national product (GNP) is a measure of the total economic output of a country in monetary terms. Per capita GNP is the GNP averaged over every person in the country.

Country           Per capita GNP ($)   Per capita defense ($)
Australia         26,900               350
France            31,000               553
Germany           30,120               328
Israel            17,380               1673
Japan             37,180               310
Norway            52,000               659
United Kingdom    33,940               583
United States     41,400               1128

36. The following table gives number of home runs and batting average for baseball’s Most Valuable Players, 1996–2005 (AL = American League and NL = National League).

Player                          Home runs   Batting average
Ken Caminiti (1996 NL)          40          .326
Juan Gonzalez (1996 AL)         47          .314
Larry Walker (1997 NL)          49          .366
Ken Griffey Jr. (1997 AL)       56          .304
Sammy Sosa (1998 NL)            66          .308
Juan Gonzalez (1998 AL)         45          .318
Chipper Jones (1999 NL)         45          .319
Ivan Rodriguez (1999 AL)        35          .332
Jeff Kent (2000 NL)             33          .334
Jason Giambi (2000 AL)          43          .333
Barry Bonds (2001 NL)           73          .328
Ichiro Suzuki (2001 AL)         8           .350
Barry Bonds (2002 NL)           46          .370
Miguel Tejada (2002 AL)         34          .308
Barry Bonds (2003 NL)           45          .341
Alex Rodriguez (2003 AL)        47          .298
Barry Bonds (2004 NL)           45          .362
Vladimir Guerrero (2004 AL)     39          .337
Albert Pujols (2005 NL)         41          .330
Alex Rodriguez (2005 AL)        48          .321

37. The following table gives per capita personal income and percent of the population below the poverty level for ten states in 2004.

State             Per capita personal income (dollars)   Percent of population below poverty level
California        35,019                                 13.1
Colorado          36,063                                 9.7
Illinois          34,351                                 12.6
Iowa              30,560                                 8.9
Minnesota         35,861                                 7.4
Montana           26,857                                 15.1
Nevada            33,405                                 10.9
New Hampshire     37,040                                 5.8
Utah              26,606                                 9.1
West Virginia     25,872                                 17.4

Source: U.S. Census Bureau; U.S. Bureau of Economic Analysis.

38. The following table gives the average hours of television watched in households in five categories of annual income. (Hint: For the first and last categories of the household income data, place the dot at the position corresponding to $25,000 and $65,000, respectively. For other categories, place the dot at the center of each bin.)

Household income      Weekly TV hours
Less than $30,000     56.3
$30,000–$40,000       51.0
$40,000–$50,000       50.5
$50,000–$60,000       49.7
More than $60,000     48.7

Source: Nielsen Media Research.

39. The following table gives the average teacher salary and the expenditure on public education per pupil for ten states in 2004.

State             Average teacher salary (dollars)   Per pupil expenditure (dollars)
Alabama           38,325                             6701
Alaska            51,736                             9808
Arizona           41,843                             5474
Connecticut       57,337                             11,774
Massachusetts     53,181                             10,772
North Dakota      35,441                             6683
Oregon            49,169                             7587
Texas             40,476                             7168
Utah              38,976                             5245
Wyoming           39,532                             9673

Source: National Education Association.

40. The following table gives mean daily Caloric intake (all residents) and infant mortality rate (per 1000 births) for ten countries.

Country           Mean daily Calories   Infant mortality rate (per 1000 births)
Afghanistan       1523                  154
Austria           3495                  6
Burundi           1941                  114
Colombia          2678                  24
Ethiopia          1610                  107
Germany           3443                  6
Liberia           1640                  153
New Zealand       3362                  7
Turkey            3429                  44
United States     3671                  7
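This unit judges the strength of a correlation by eye from a scatter diagram. As a supplementary sketch that goes beyond the unit, the data tables in Exercises 35–40 can also be summarized with one common numerical measure, the Pearson correlation coefficient r, which ranges from −1 to 1. The function name `pearson_r` is illustrative, and the income positions for Exercise 38 follow that exercise's hint.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return sxy / sqrt(sxx * syy)

# Exercise 38 data, with each income bin positioned as the hint suggests
income = [25_000, 35_000, 45_000, 55_000, 65_000]  # dollars
tv_hours = [56.3, 51.0, 50.5, 49.7, 48.7]          # weekly TV hours

r = pearson_r(income, tv_hours)
print(f"r = {r:.2f}")  # negative: TV hours fall as income rises
```

Values of r near −1 or 1 mean the points lie close to a straight line, and a value near 0 means little straight-line correlation. As this unit stresses, even a strong correlation does not by itself establish causality.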


Correlation and Causality. Exercises 41–46 make statements about a correlation. In each case, state the correlation clearly (for example, there is a positive correlation between variable A and variable B). Then state whether the correlation is most likely due to coincidence, a common underlying cause, or a direct cause. Explain your answer.

41. In a large resort city, the crime rate increased at the same time that the number of tourists increased.

42. Over the past three decades, the number of miles of freeways in Los Angeles has grown, and traffic congestion has worsened.

43. When gasoline prices rise, sales of sport utility vehicles decline.

44. Sales of ice cream in a local restaurant are positively correlated with sales of swimming suits at a local store.

45. Automobile gas mileage decreases with tire pressure.

46. Over a period of twenty years, the number of ministers and priests in a city increased, as did attendance at movies.

47. Identifying Causes: Headaches. You are trying to identify the cause of late-afternoon headaches that plague you several days each week. For each of the following tests and observations, explain which of the six guidelines for establishing causality you used and what you concluded.

• The headaches occur only on days that you go to work.

• If you stop drinking Coke at lunch on days you go to work, the headaches persist.

• In the summer, the headaches occur less frequently if you open the windows of your office slightly. They occur even less often if you open the windows of your office fully.

Having made all these observations, what reasonable conclusion can you reach about the cause of the headaches?

48. Smoking and Lung Cancer. There is a strong correlation between tobacco smoking and incidence of lung cancer, and most physicians believe that tobacco smoking causes lung cancer. Yet, not everyone who smokes gets lung cancer. Briefly describe how smoking could cause cancer when not all smokers get cancer.

49. Longevity of Orchestra Conductors. A famous study in Forum on Medicine (1978) concluded that the mean lifetime of conductors of major orchestras was 73.4 years, about 5 years longer than that of all American males at the time. The author claimed that a life of music causes a longer life. Evaluate the claim of causality and propose other explanations for the longer life expectancy of conductors.

50. High-Voltage Power Lines. Suppose that people living near a particular high-voltage power line have a higher incidence of cancer than people living farther from the power line. Can you conclude that the high-voltage power line is the cause of the elevated cancer rate? If not, what other explanations might there be for it? What other types of research would you like to see before you conclude that high-voltage power lines cause cancer?

51. Soccer and Birthdays. A recent study revealed that the best soccer players in the world tend to have birthdays in the earlier months of the year. Is this a coincidence or can you find a plausible explanation?

WEB PROJECTS
Find useful links for Web Projects on the text Web site: www.aw.com/bennett-briggs

52. Success in the NFL. Use the Web to find last season’s NFL team statistics. Make a table showing the following for each team: number of wins, average yards gained on offense per game, and average yards allowed on defense per game. Make scatter diagrams to explore the correlations between offense and wins and between defense and wins. Discuss your findings. Do you think that there are other team statistics that would yield stronger correlations with the number of wins?

53. Statistical Abstract. Explore the “frequently requested tables” at the Web site for the Statistical Abstract of the United States. Choose data that are of interest to you and explore at least two correlations. Briefly discuss what you learn from the correlations.

54. Air Bags and Children. Starting from the Web site of the National Highway Traffic Safety Administration, research the latest studies on the safety of air bags, especially with regard to children. Write a short report summarizing your findings and offering recommendations for improving child safety in cars.

55. Global Warming. Use the Web to find recent information about global warming and its potential consequences. Discuss the evidence linking human activity to global warming. In light of your findings, suggest how we should deal with the issue of global warming.

56. Tobacco Lawsuits. Tobacco companies have been the subject of many lawsuits relating to the dangers of smoking. Research one recent lawsuit. What were the plaintiffs trying to prove? What statistical evidence did they use? How well do you think they established causality? Did they win? Summarize your findings in one to two pages.


IN THE NEWS

57. Correlations in the News. Find a recent news report that describes some type of correlation. Describe the correlation. Does the article give any sense of the strength of the correlation? Does it suggest that the correlation reflects any underlying causality? Briefly discuss whether you believe the implications the article makes with respect to the correlation.

58. Causation in the News. Find a recent news report in which a statistical study has led to a conclusion of causation. Describe the study and the claimed causation. Do you think the claim of causation is legitimate? Explain.

59. Legal Causation. Find a news report concerning an ongoing legal case, either civil or criminal, in which establishing causality is important to the outcome. Briefly describe the issue of causation in the case and how the ability to establish or refute causality will influence the outcome of the case.

CHAPTER 5 SUMMARY

UNIT 5A

KEY TERMS
statistics (is a science; are data)
population, sample
population parameters, sample statistics
bias
observational study
case-control study
experiment
placebo, placebo effect
blinding (single-blind, double-blind)
margin of error
confidence interval

KEY IDEAS AND SKILLS
Understand and interpret the five basic steps in a statistical study.
Understand the importance of a representative sample.
Be familiar with four common sampling methods: simple random sampling, systematic sampling, convenience sampling, stratified sampling.
Distinguish between observational studies and experiments; also recognize observational case-control studies.
Understand the placebo effect and the importance of blinding in experiments.
Find a confidence interval from a margin of error: from (sample statistic − margin of error) to (sample statistic + margin of error).

UNIT 5B

KEY TERMS
selection bias
participation bias
variable (in a statistical study)

KEY IDEAS AND SKILLS
Understand and apply eight guidelines for evaluating a statistical study.

UNIT 5C

KEY TERMS
frequency table (categories, frequency, relative frequency, cumulative frequency)
data types (qualitative, quantitative)
bar chart
pie chart
histogram
line chart
time-series diagram

KEY IDEAS AND SKILLS
Interpret and create frequency tables.
Interpret and create bar graphs and pie charts.
Interpret and create histograms and line charts.

UNIT 5D

KEY TERMS
multiple bar graph
stack plot
geographical data
contour map

KEY IDEAS AND SKILLS
Interpret multiple bar graphs, stack plots, contour maps, and other media graphs.
Distinguish between true three-dimensional data and graphs that have a three-dimensional look for cosmetic reasons only.
Be aware of common cautions about graphs.

UNIT 5E

KEY TERMS
correlation
cause
scatter diagram

KEY IDEAS AND SKILLS
Distinguish between correlation and causality.
Create and interpret scatter diagrams and use them to identify correlations: positive, negative, or no correlation; strength of correlation.
Know three possible explanations for a correlation: coincidence, common underlying cause, true cause.
Understand and apply six guidelines for establishing causality.
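The confidence-interval rule listed in the summary for Unit 5A is simple arithmetic, and a minimal sketch may make it concrete. The function name `confidence_interval` and the poll figures (52% approval, margin of error of 3 percentage points) are hypothetical values chosen only for illustration.

```python
def confidence_interval(sample_statistic, margin_of_error):
    """Return (low, high): the sample statistic minus and plus the margin of error."""
    if margin_of_error < 0:
        raise ValueError("margin of error cannot be negative")
    low = sample_statistic - margin_of_error
    high = sample_statistic + margin_of_error
    return low, high

# Hypothetical poll: 52% approval, margin of error of 3 percentage points
low, high = confidence_interval(52, 3)
print(f"from {low}% to {high}%")  # prints "from 49% to 55%"
```

In this hypothetical poll the interval includes values below 50%, so the result alone would not show that a majority approves.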
