36 CHAPTER 3 Stories Categorical Data Tell

CATEGORICAL DATA IN YOUR WORLD

In 2008, Senator Barack Obama was the Democratic candidate for U.S. President and Alaska Governor Sarah Palin was the Republican candidate for Vice President. Before this election, neither a woman nor an African-American had ever served in these offices. It is no surprise then that the interest in political polls was quite high in the days leading up to the election.

[Figure: Gallup news clipping. The headline begins "Even McCain supporters say Obama, rather than McCain, will ..."; the rest of the clipping is illegible. Poll question: "Regardless of whom you support, and trying to be as objective as possible, who do you think will win the election in November?"]

WHO  1010 adults, age 18 and older

WHAT  Opinions of who will win election.

WHERE/WHEN/HOW  Telephone interviews conducted on October 23-26, 2008

WHY  To determine if voters think their chosen candidate will win.

http://www.gallup.com

WHAT HAVE WE LEARNED?

We've learned that we can summarize categorical data by counting the number of cases in each category, sometimes expressing the resulting distribution as percents. We can display the distribution in a bar chart or a pie chart. When we want to see how two categorical variables are related, we put the counts (and/or percentages) in a two-way table called a contingency table.

• We look at the marginal distribution of each variable (found in the margins of the table).

• We also look at the conditional distribution of a variable within each category of the other variable.

• We can display these conditional and marginal distributions by using bar charts or pie charts.

• If the conditional distributions of one variable are (roughly) the same for every category of the other, the variables are independent. If not, we say there's an association.

Terms

Frequency table (Relative frequency table)
A frequency table lists the categories in a categorical variable and gives the count (or percentage) of observations for each category.

Distribution
The distribution of a variable gives
• the possible values of the variable and
• the relative frequency of each value.

Area principle
In a statistical display, each data value should be represented by the same amount of area.

Bar chart (Relative frequency bar chart)
Bar charts show a bar whose area represents the count (or percentage) of observations for each category of a categorical variable.

Pie chart
Pie charts show how a "whole" divides into categories by showing a wedge of a circle whose area corresponds to the proportion in each category.

Categorical data condition
The methods in this chapter are appropriate for displaying and describing categorical data. Be careful not to use them with quantitative data.

Contingency table
A contingency table displays counts and, sometimes, percentages of individuals falling into categories on two or more variables. The table categorizes the individuals on all variables at once to reveal possible patterns in one variable that may depend on the category of the other.

Marginal distribution
In a contingency table, the distribution of either variable alone is called the marginal distribution. The counts or percentages are the totals found in the margins (last row or column) of the table.

Conditional distribution
A conditional distribution restricts the Who to consider only a smaller group of individuals.

Independence
Variables are said to be independent if the conditional distribution of one variable is roughly the same for each category of the other.

Association
When we see evidence that one variable depends on another, we say there's an association between them.

Skills

• Be able to recognize when a variable is categorical and choose an appropriate display for it.

• Understand how to examine the association between categorical variables by comparing conditional and marginal percentages.

• Be able to summarize the distribution of a categorical variable with a frequency table.

• Be able to display the distribution of a categorical variable with a bar chart or pie chart.

• Know how to make and interpret a contingency table.

• Know how to make and interpret displays of the conditional distributions of one variable for two or more groups.

• Be able to describe the distribution of a categorical variable in terms of its possible values and relative frequencies.

• Be able to describe and discuss patterns and associations found in a contingency table and displays of conditional distributions.
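The counting and percentage steps summarized in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the responses below are hypothetical (they are not data from this chapter), and the helper names `frequency_table`, `relative_frequencies`, `contingency_table`, `marginal_distribution`, and `conditional_distribution` are invented for this sketch rather than taken from any library.

```python
from collections import Counter

# Hypothetical data (NOT from the chapter): each case is (group, answer).
cases = (
    [("Student", "Yes")] * 30 + [("Student", "No")] * 20
    + [("Staff", "Yes")] * 15 + [("Staff", "No")] * 35
)

def frequency_table(values):
    """Count the cases in each category."""
    return dict(Counter(values))

def relative_frequencies(counts):
    """Turn counts into percentages of the total."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}

def contingency_table(pairs):
    """Two-way table of counts: table[row_category][column_category]."""
    table = {}
    for row, col in pairs:
        table.setdefault(row, Counter())[col] += 1
    return table

def marginal_distribution(table):
    """Percentages for the column variable alone (the column totals)."""
    totals = Counter()
    for row_counts in table.values():
        totals.update(row_counts)
    return relative_frequencies(totals)

def conditional_distribution(table, row):
    """Percentages for the column variable within one row category."""
    return relative_frequencies(table[row])

table = contingency_table(cases)
answer_counts = frequency_table(answer for _, answer in cases)
marginal = marginal_distribution(table)
students = conditional_distribution(table, "Student")
staff = conditional_distribution(table, "Staff")

print("Counts:", answer_counts)
print("Marginal %:", marginal)
print("Students %:", students)
print("Staff %:", staff)
```

In this made-up example the conditional distributions differ (60% "Yes" among students and 30% among staff, against 45% overall), so the two variables would be described as associated rather than independent.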

EXERCISES

16. Movies by genre and rating. Here's a table that classifies movies released in 2005 by genre and MPAA rating:

         Action/Adventure   Comedy   Drama   Thriller/Horror   Total
G              66.7          33.3      0            0          100%
PG             25.0          60.0    15.0           0          100%
PG-13          30.4          35.7    14.3         19.6         100%
R              23.7          10.5    44.7         21.1         100%
Total          29.2          31.7    23.3         15.8         100%

Is there an association between types of movies and their ratings? Explain.

17. Global warming. The Pew Research Center for the People and the Press (http://people-press.org) has asked a representative sample of U.S. adults about global warming, repeating the question over time. In January 2007, the responses reflected an increased belief that global warming is real and due to human activity. Here's a display of the percentages of respondents choosing each of the major alternatives offered:

[Display not reproduced. Labels: Due to Human Activity 41%; Due to Natural Patterns 21%; No Solid Evidence 20%; Don't Know 10%]

List the errors in this display.

18. Teens and technology. The Gallup organization surveyed 744 teenagers, asking them what technologies they use every day. Here's a graph summarizing the responses.

[Graph not reproduced. Axis label: Use Every Day]

a) How is this graph misleading?
b) Explain why we could not use a pie chart for these data.

19. Cars. A survey of autos parked in student and staff lots at a large university classified the brands by country of origin, as seen in the table.

                 Driver
            Student   Staff
American      107      105
European       33       12
Asian          55       47

a) What percent of all the cars surveyed were American?
b) What percent of the American cars were owned by students?
c) What percent of the students owned American cars?
d) What is the marginal distribution of the drivers?
e) What is the conditional distribution of driver for American cars?

20. Politics revisited. Here again is the table summarizing the political views of males and females in an Intro Stats class.

                  Politics
Sex         L     M     C    Total
Female     35    36     6      77
Male       50    44    21     115
Total      85    80    27     192

a) What percent of the class called themselves "Liberal"?
b) What percent of the females were liberals?
c) What percent of the liberals were females?
d) What's the marginal distribution of Sex?
e) Write a sentence describing the conditional distribution of Sex among the liberals.

21. Cars again. Look again at the table about the drivers and origins of cars in Exercise 19.

a) What's the marginal distribution of Origin?
b) What's the conditional distribution of Origin for the students' cars?
c) What's the conditional distribution of Origin for the cars driven by staff?
d) Do you think the origin of the car is independent of the type of driver? Explain.

22. Politics again. Have another look at the table showing Stats students' political views in Exercise 20.

a) What's the marginal distribution of Politics?
b) What's the conditional distribution of Politics among the males?


might be at greater risk of bone fractures. Their observations are summarized in the table:

                        Taking SSRI   No SSRI   Total
Experienced fractures        14          244      258
No fractures                123         4627     4750
Total                       137         4871     5008

Do these results suggest there's an association between taking SSRI antidepressants and experiencing bone fractures? Explain.

29. Magnet schools revisited. The Chance magazine article described in Exercise 5 further examined the impact of an applicant's ethnicity on the likelihood of admission to the Houston Independent School District's magnet schools programs. Those data are summarized in the following table:

                          Admission Decision
Ethnicity        Accepted   Wait-listed   Turned away   Total
Black/Hispanic     485           0             32         517
Asian              110          49            133         292
White              336         251            359         946
Total              931         300            524        1755

Does it appear that the admissions decisions are made independent of the applicant's ethnicity? Explain.

30. Tattoos. A study by the University of Texas Southwestern Medical Center examined 626 people to see if an increased risk of contracting hepatitis C was associated with having a tattoo. If the subject had a tattoo, researchers asked whether it had been done in a commercial tattoo parlor or elsewhere. Write a brief description of the association between tattooing and hepatitis C, including an appropriate graphical display.

                   Tattoo done in      Tattoo done
                   commercial parlor   elsewhere     No tattoo
Has hepatitis C          17                 8            18
No hepatitis C           35                53           495

31. Graphs in the news. Find a bar graph of categorical data from a newspaper, a magazine, or the Internet.

a) Is the graph clearly labeled?
b) Does it violate the area principle?
c) Does the accompanying article tell the W's of the variable?
d) Do you think the article correctly interprets the data? Explain.

32. Graphs in the news II. Find a pie chart of categorical data from a newspaper, a magazine, or the Internet.

a) Is the graph clearly labeled?
b) Does it violate the area principle?
c) Does the accompanying article tell the W's of the variable?
d) Do you think the article correctly interprets the data? Explain.

33. Tables in the news. Find a frequency table of categorical data from a newspaper, a magazine, or the Internet.

a) Is it clearly labeled?
b) Does it display percentages or counts?
c) Does the accompanying article tell the W's of the variable?
d) Do you think the article correctly interprets the data? Explain.

34. Tables in the news II. Find a contingency table of categorical data from a newspaper, a magazine, or the Internet.

a) Is it clearly labeled?
b) Does it display percentages or counts?
c) Does the accompanying article tell the W's of the variables?
d) Do you think the article correctly interprets the data? Explain.