
Relations and Categorical Data

Target Goal: I can describe relationships among categorical data using two-way tables.

1.1 cont.
Hw: pg 24: 20, 21, 23, 26, 27 - 32

Now we will look at describing relationships between two or more categorical variables.

• Ex. Gender, race, occupation

• To analyze categorical data, use counts or percents of individuals that fall into various categories.

Example: Education and Age

• Table presents Census Bureau data on the years of school completed by Americans at different ages.

Two-Way Table: Describes Two Categorical Variables

• Row and column variable: least to most
• Marginal distributions: totals that appear at the right and bottom margins for each individual variable.

• Round-off error: There is round-off error depending on groupings.

Percents

• To describe relationships among categorical variables, calculate the appropriate percents from the counts given.

• Percents: are often more informative than counts.

The percent of people 25 years of age or older that have at least 4 years of college is:

(total with four years of college) / (table total) = 44,845/175,230 = 25.6%

• Tip for deciding on what fraction gives the percent you want:

• Ask, “What group represents the total that I want a percent of?”

• Can be tricky!
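The tip above can be made concrete with a short sketch. The helper name `percent` is an assumption made for illustration; the counts are the ones quoted in these slides (44,845 people with at least four years of college, 4,474 people aged 25 to 34 who did not complete high school, 37,786 people aged 25 to 34, and a table total of 175,230). It shows how the same count gives a different percent depending on which group is treated as the total.

```python
def percent(part, whole):
    """Return part as a percent of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)

TABLE_TOTAL = 175_230      # everyone aged 25 or older
AGE_25_34_TOTAL = 37_786   # total for the 25 to 34 age group

# Percent of all people 25+ with at least four years of college.
college_pct = percent(44_845, TABLE_TOTAL)

# Same count, two different "totals": the answer depends on the question asked.
no_hs_within_age_group = percent(4_474, AGE_25_34_TOTAL)  # among 25 to 34 year olds
no_hs_within_everyone = percent(4_474, TABLE_TOTAL)       # among all people 25+

print(college_pct, no_hs_within_age_group, no_hs_within_everyone)
```

Asking "what group represents the total?" before dividing is what separates the second result from the third.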

Exercise: Percents

• Give the marginal distribution of age among people 25 years or older in percents, starting from the counts in table 4.6.

• Which totals do we use?

Find each one and the total:

• Age 25 to 34: 37,786/175,230 = 21.6%
• Age 35 to 54: 46.5%
• Age 55+: 32.0%
• Total = 100.1% due to rounding
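A quick check of this arithmetic, as a sketch only: the slides quote just the 25 to 34 count (37,786) and the table total (175,230), so the other two percents are taken as the rounded values given above rather than recomputed from counts.

```python
# Count and total quoted on the slide for the 25 to 34 age group.
pct_25_34 = round(100 * 37_786 / 175_230, 1)

# Rounded percents as reported on the slide. The counts behind the last two
# groups are not shown here, so those values are not recomputed.
age_percents = {"25 to 34": pct_25_34, "35 to 54": 46.5, "55+": 32.0}

# Each percent was rounded separately, so the sum need not be exactly 100.
total = round(sum(age_percents.values()), 1)
print(age_percents, total)
```

The extra 0.1 is round-off error from rounding each group separately, not a mistake in the counts.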

Exercise: Using Percents to Make Bar Graph

• Using the counts in table 4.6, find the percent of people in each age group who did not complete high school.

Percent of people in each age group who did not complete high school.

• age 25 to 34: 4,474/37,786 = 11.8%

• age 35 to 54: 11.2%

• age 55+: 25.4%

• Draw a bar graph that compares these percents. State briefly what the data show. (3 min)

• Conclusion:
• The percentage of people who did not finish high school is about the same for the 25 - 34 and the 35 - 54 age groups: 11.8% and 11.2%, respectively.

• But the percentage more than doubles, to 25.4%, for the 55 and over age group.
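For checking a hand-drawn graph, a rough text version of the bar graph can be sketched as follows. The helper `text_bar` and the one-symbol-per-percentage-point scale are assumptions made for illustration; the three percents are the ones computed in the exercise.

```python
# Percents of each age group who did not complete high school (from the exercise).
no_high_school = {"25 to 34": 11.8, "35 to 54": 11.2, "55+": 25.4}

def text_bar(value, scale=1.0, symbol="#"):
    """Return a bar whose length is value / scale, rounded to a whole symbol."""
    return symbol * round(value / scale)

for group, pct in no_high_school.items():
    print(f"{group:>8} | {text_bar(pct)} {pct}%")

# How many times larger is the 55+ percent than the 25 to 34 percent?
ratio = round(no_high_school["55+"] / no_high_school["25 to 34"], 2)
print(ratio)
```

The first two bars come out nearly the same length and the third is roughly twice as long, which is the comparison the conclusion describes.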

Marginal distribution: compare each variable separately. (Denominator is the grand total.)

Conditional distribution: refers to only “people” who satisfy a certain condition (age 25-34).

• Look only at column (or row).
• Column (or row) total is the denominator.

Result: comparing conditional distributions of “education” in different “age groups” describes the association between age and education.

• Bar graphs to compare the education levels of three age groups.

• Each graph compares the percents of three groups who fall in one of the four education levels.

Young adults by gender and chance of getting rich

                                 Female   Male   Total
Almost no chance                     96     98     194
Some chance, but probably not       426    286     712
A 50-50 chance                      696    720    1416
A good chance                       663    758    1421
Almost certain                      486    597    1083
Total                              2367   2459    4826


• Two-Way Tables and Marginal Distributions

Response            Percent
Almost no chance    194/4826 = 4.0%
Some chance         712/4826 = 14.8%
A 50-50 chance      1416/4826 = 29.3%
A good chance       1421/4826 = 29.4%
Almost certain      1083/4826 = 22.4%

Example, p. 13

Examine the marginal distribution of chance of getting rich.

[Bar graph: Chance of being wealthy by age 30. Horizontal axis: Survey Response (Almost none, Some chance, 50-50 chance, Good chance, Almost certain); vertical axis: Percent (0 to 35).]
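As a sketch of how the marginal distribution is obtained from the two-way table, the counts from the "Young adults by gender and chance of getting rich" table can be summed across each row and divided by the grand total. The variable names are assumptions made for illustration.

```python
# Two-way table of counts from the slide: response -> (female, male).
counts = {
    "Almost no chance": (96, 98),
    "Some chance, but probably not": (426, 286),
    "A 50-50 chance": (696, 720),
    "A good chance": (663, 758),
    "Almost certain": (486, 597),
}

# Row totals are the marginal counts of the response variable.
row_totals = {resp: f + m for resp, (f, m) in counts.items()}
grand_total = sum(row_totals.values())

# Marginal distribution: each row total divided by the grand total.
marginal = {resp: round(100 * n / grand_total, 1) for resp, n in row_totals.items()}

for resp, pct in marginal.items():
    print(f"{resp:<30} {row_totals[resp]:>5}/{grand_total} = {pct}%")
```

Gender plays no role here: the marginal distribution looks at the response variable alone, so the denominator is always the grand total.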


• Two-Way Tables and Conditional Distributions

Response            Male
Almost no chance    98/2459 = 4.0%
Some chance         286/2459 = 11.6%
A 50-50 chance      720/2459 = 29.3%
A good chance       758/2459 = 30.8%
Almost certain      597/2459 = 24.3%

Example, p. 15

Calculate the conditional distribution of opinion among males. Examine the relationship between gender and opinion.

[Bar graph: Chance of being wealthy by age 30, males only. Horizontal axis: Opinion (Almost no chance, Some chance, 50-50 chance, Good chance, Almost certain); vertical axis: Percent (0 to 35).]

Response            Female
Almost no chance    96/2367 = 4.1%
Some chance         426/2367 = 18.0%
A 50-50 chance      696/2367 = 29.4%
A good chance       663/2367 = 28.0%
Almost certain      486/2367 = 20.5%
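The conditional distributions for both genders can be computed the same way, as a sketch: within each gender, divide every count by that gender's column total. The function name `conditional_distribution` is an assumption made for illustration; the counts come from the two-way table in these slides.

```python
# Counts from the two-way table, listed in the order of the responses below.
responses = ["Almost no chance", "Some chance", "50-50 chance",
             "Good chance", "Almost certain"]
counts = {
    "Female": [96, 426, 696, 663, 486],
    "Male": [98, 286, 720, 758, 597],
}

def conditional_distribution(column):
    """Percent of each response within one gender (column total is the denominator)."""
    total = sum(column)
    return [round(100 * n / total, 1) for n in column]

conditional = {gender: conditional_distribution(col) for gender, col in counts.items()}

print(f"{'Response':<18}{'Female':>8}{'Male':>8}")
for i, resp in enumerate(responses):
    print(f"{resp:<18}{conditional['Female'][i]:>8}{conditional['Male'][i]:>8}")
```

Putting the two columns side by side is what allows the relationship between gender and opinion to be examined, for example 18.0% of females versus 11.6% of males answering "Some chance".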

[Bar graph: Chance of being wealthy by age 30, males and females side by side. Horizontal axis: Opinion (Almost no chance, Some chance, 50-50 chance, Good chance, Almost certain); vertical axis: Percent (0 to 35).]

[Segmented bar graph: Chance of being wealthy by age 30. One bar each for Males and Females; vertical axis: Percent (0% to 100%); segments by Opinion: Almost certain, Good chance, 50-50 chance, Some chance, Almost no chance.]

A conditional distribution describes the values of one variable among individuals who have a specific value of another variable.

The conditional dist of ______________ among _____________.

hw. 29) What percent of females thought they were going to be married in the next ten years?
The conditional dist of _________________ among ___________________________.

You will be asked to express the conditional distribution. “What % of” identifies the variable: the conditional dist of opinion on marriage among adolescents of a given gender.


Organizing a Statistical Problem

• As you learn more about statistics, you will be asked to solve more complex problems.

• Here is a four-step process you can follow.

State: What’s the question that you’re trying to answer?

Plan: How will you go about answering the question? What statistical techniques does this problem call for?

Do: Make graphs and carry out needed calculations.

Conclude: Give your practical conclusion in the setting of the real-world problem.

See pg. 18 for an example. Hw question on 4 step process.

How to Organize a Statistical Problem: A Four-Step Process

Looking Ahead…

In the next Section…

We’ll learn how to display quantitative data.
• Dotplots
• Stemplots
• Histograms

We’ll also learn how to describe and compare distributions of quantitative data.