A Mathematical View of Our World

A Mathematical View of Our World, 1st ed. Parks, Musser, Trimpe, Maurer, and Maurer. Chapter 9: Collecting and Interpreting Data. Section 9.1: Populations, Samples, and Data. Goals: study populations and samples; study data (quantitative data, qualitative data); study bias.

A Mathematical View of Our World

1st ed.

Parks, Musser, Trimpe, Maurer, and Maurer

Chapter 9

Collecting and Interpreting Data

Section 9.1

Populations, Samples, and Data

• Goals
• Study populations and samples
• Study data
  • Quantitative data
  • Qualitative data
• Study bias
• Study simple random sampling

9.1 Initial Problem

• How can a professor choose 5 students from among 25 volunteers in a fair way?
  • The solution will be given at the end of the section.

Populations and Samples

• The entire set of objects being studied is called the population.
• A population can consist of:
  • People or animals
  • Plants
  • Inanimate objects
  • Events
• The members of a population are called elements.

Populations and Samples, cont’d

• Any characteristic of elements of the population is called a variable.
  • When we collect information from a population element, we say that we measure the variable being studied.
• A variable that is naturally numerical is called quantitative.
• A variable that is not numerical is called qualitative.

Populations and Samples, cont’d

• A census measures the variable for every element of the population.
  • A census is time-consuming and expensive, unless the population is very small.
• Instead of dealing with the entire population, a subset, called a sample, is usually selected for study.

Example 1

• Suppose you want to determine voter opinion on a ballot measure. You survey potential voters among pedestrians on Main Street during lunch.

a) What is the population?
b) What is the sample?
c) What is the variable being measured?

Example 1, cont’d

a) Solution: The population consists of all the people who intend to vote on the ballot measure.
b) Solution: The sample consists of all the people you interviewed on Main Street who intend to vote on the ballot measure.
c) Solution: The variable being measured is the voter’s intent to vote “yes” or “no” on the ballot measure.

Data

• The measurement information recorded from a sample is called data.
  • Quantitative data consists of measurements of a quantitative variable.
  • Qualitative data consists of measurements of a qualitative variable.

Data, cont’d

• Qualitative data with a natural ordering is called ordinal.
  • For example, a rating of a pizza on a scale of “Excellent” to “Poor” is ordinal.
• Qualitative data without a natural ordering is called nominal.
  • For example, eye color is nominal.

Data, cont’d

• The types of data are illustrated below.

[Figure not reproduced: data are either quantitative or qualitative; qualitative data are either ordinal or nominal.]

Example 2

• Suppose you survey potential voters among the people on Main Street during lunch to determine their political affiliation and age, as well as their opinion on the ballot measure.
• Classify the variables as quantitative or qualitative.

Example 2, cont’d

• Solution:
  • Political affiliation is a qualitative variable.
  • Age is a quantitative variable.
  • Opinion on the ballot measure is a qualitative variable.

Question:

Suppose you survey potential voters among the people on Main Street during lunch to determine their political affiliation and age, as well as their opinion on the ballot measure. Classify the qualitative variables political affiliation and opinion on the ballot measure as ordinal or nominal.

a. Both are ordinal.
b. Both are nominal.
c. Political affiliation is ordinal and opinion is nominal.
d. Political affiliation is nominal and opinion is ordinal.

Samples, cont’d

• Statistical inference is used to make an estimation or prediction for the entire population based on data collected from the sample.
• If a sample has characteristics that are typical of the population as a whole, we say it is a representative sample.
  • A bias is a flaw in the sampling that makes it more likely the sample will not be representative.

Common Sources of Bias

• Faulty sampling: The sample is not representative.
• Faulty questions: The questions are worded to influence the answers.
• Faulty interviewing: Interviewers fail to survey the entire sample, misread questions, and/or misinterpret answers.

Common Sources of Bias, cont’d

• Lack of understanding or knowledge: The person being interviewed does not understand the question or needs more information.
• False answers: The person being interviewed intentionally gives incorrect information.

Example 3

• Suppose you wish to determine voter opinion regarding eliminating the capital gains tax. You survey potential voters on a street corner near Wall Street in New York City.
• Identify a source of bias in this poll.

Example 3, cont’d

• Solution: One source of bias in choosing the sample is that people who work on Wall Street would benefit from the elimination of the tax and so are more likely than the average voter to favor it.
  • This is faulty sampling.

Example 4

• Suppose a car manufacturer wants to test the reliability of 1000 alternators. They will test the first 30 from the lot for defects.
• Identify any potential sources of bias.

Example 4, cont’d

• Solution: One source of bias could be that the first 30 alternators are chosen for the sample. It may be that defects are either much more likely at the beginning of a production run or much less likely at the beginning. In either case, the sample would not be representative.
  • This is potentially faulty sampling.

Simple Random Samples

• Representative samples are usually chosen randomly.
• Given a population and a desired sample size, a simple random sample is any sample chosen in such a way that all samples of the same size are equally likely to be chosen.
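The definition above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name simple_random_sample, the volunteer labels, and the seed are assumptions made for the example. It relies on the standard library's random.sample, which draws without replacement so that every subset of the requested size is equally likely, and on math.comb (Python 3.8 or later) to count how many such subsets exist for the section's initial problem (5 students from 25 volunteers).

```python
import math
import random

def simple_random_sample(population, size, seed=None):
    """Return `size` distinct elements; every subset of that size is equally likely."""
    rng = random.Random(seed)  # a seed is used only to make the sketch repeatable
    return rng.sample(population, size)

# Hypothetical labels for the 25 volunteers in the 9.1 Initial Problem.
volunteers = [f"volunteer {i:02d}" for i in range(25)]

chosen = simple_random_sample(volunteers, 5, seed=1)
print(chosen)

# Number of different samples of size 5 that could have been drawn.
n_possible = math.comb(len(volunteers), 5)
print(n_possible)  # 53130
```

Each of the 53,130 possible groups of 5 has the same chance of being the one selected, which is what makes the choice fair.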

Simple Random Samples, cont’d

• One way to choose a simple random sample is to use a random number generator or table.
  • A random number generator is a computer or calculator program designed to produce numbers with no apparent pattern.
  • A random number table is a table produced with a random number generator.
• An example of the first few rows of a random number table is shown on the next slide.

Random Number Table

Example 5

• Choose a simple random sample of size 5 from 12 semifinalists: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex.

Example 5, cont’d

• Solution: Assign numerical labels to the population elements, in any order, as shown below:

Example 5, cont’d

• Solution, cont’d: Choose a random spot in the table to begin.
  • In this case, we could choose to start at the top of the third column and to read down, looking at the last 2 digits in each number. This choice is arbitrary.
• Numbers that correspond to population labels are recorded, ignoring duplicates, until 5 such numbers have been found.

Example 5, cont’d

• Solution, cont’d: The numbers located are 01, 06, 10, 11, and 07.
  • The simple random sample consists of Beatrix, Gaston, Heidi, Kirsten, and Lex.

Question:

Choose a different simple random sample of size 5 from the 12 semifinalists: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex. Use the first 2 digits of each number, reading across the row starting in row 128 of the random number table.

a. Delila, Beatrix, Lex, Kirsten, Jose
b. Frank, Jose, Elsie, Delila, Ian
c. Charles, Ian, Frank, Beatrix, Gaston
d. Jose, Beatrix, Ian, Heidi, Lex

Example 6

• Choose a simple random sample of size 8 from the states of the United States of America.

Example 6, cont’d

• Solution: Numerical labels can be assigned to the population elements in any order.
• In this example we choose to order the states by area.
  • The labels are shown on the next slide.

Example 6, cont’d

• Solution, cont’d: We randomly choose to start at the top row, left column of the number table and read the last 2 digits of each entry across the row.
  • The entries are 03918 77195 47772 21870 87122 99445 10041 31795 63857 64569 34893 20429 43537 25368 95237 17707 34280 04755 64301 66836 12201…

Example 6, cont’d

• Solution, cont’d:
  • The numbers obtained from the table are 18, 22, 45, 41, 29, 37, 07, 01.
• The states selected for the sample are Washington, Florida, Vermont, West Virginia, Arkansas, Kentucky, Nevada, and Alaska.
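The table-reading procedure used in this example can be sketched as a short Python function. The function name read_labels is hypothetical, and the sketch assumes the 50 states carry the labels 01 through 50 (the label slide itself is not reproduced in this text). The entries are the table values quoted in the solution.

```python
def read_labels(entries, valid_labels, sample_size):
    """Scan table entries in order, keeping the last 2 digits of each entry
    when they match a population label that has not been chosen yet."""
    chosen = []
    for entry in entries:
        label = entry[-2:]              # last 2 digits of the entry
        if label in valid_labels and label not in chosen:
            chosen.append(label)        # valid label, not a duplicate
            if len(chosen) == sample_size:
                break                   # stop once the sample is full
    return chosen

# First entries of the random number table, read across the top row.
entries = ("03918 77195 47772 21870 87122 99445 10041 31795 63857 64569 "
           "34893 20429 43537 25368 95237 17707 34280 04755 64301 66836 "
           "12201").split()

# Assumed labeling: 01 through 50, one label per state.
state_labels = {f"{i:02d}" for i in range(1, 51)}

print(read_labels(entries, state_labels, 8))
# ['18', '22', '45', '41', '29', '37', '07', '01']
```

Entries such as 77195 (label 95) are skipped because no state has that label, and the second 37 (from 95237) is skipped as a duplicate.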

9.1 Initial Problem Solution

• To fairly select 5 students from 25 volunteers, a professor could choose a simple random sample.
• Solution: Assign the students labels of 00 through 24 according to some ordering.
  • Pick a starting place in a random number table and read until 5 students have been selected.

Initial Problem Solution, cont’d

• Suppose the first 2 digits of each entry in the last column are used.
  • The first 5 numbers that are 24 or less are 20, 04, 16, 07, and 06.
• The students that were assigned these labels are fairly chosen from the 25 volunteers.

Section 9.2

Survey Sampling Methods

• Goals
• Study sampling methods
  • Independent sampling
  • Systematic sampling
  • Quota sampling
  • Stratified sampling
  • Cluster sampling

9.2 Initial Problem

• You need to interview at least 800 people nationwide.
• You need a different interviewer for each county.
• Each interviewer costs $50 plus $10 per interview.
• Your budget is $15,000.
• Which is better, a simple random sample of all adults in the U.S. or a simple random sample of adults in randomly selected counties?
  • The solution will be given at the end of the section.

Sample Survey Design

• Simple random sampling can be expensive and time-consuming in practice.
• Statisticians have developed sample survey design to provide less expensive alternatives to simple random sampling.

Independent Sampling

• In independent sampling, each member of the population has the same fixed chance of being selected for the sample.
  • The size of the sample is not fixed ahead of time.
• For example, in a 50% independent sample, each element of the population has a 50% chance of being selected.

Example 1

• Find a 50% independent sample of the 12 semifinalists: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex.

Example 1, cont’d

• Solution: Because each position in a random number table is equally likely to hold any of the 10 digits 0 through 9, there is a 50% chance that a given digit is one of the five digits 0, 1, 2, 3, or 4.
• Let the digits 0, 1, 2, 3, or 4 represent “select this contestant” and let the remaining digits represent “do not select this contestant”.

Example 1, cont’d

• Solution, cont’d: We randomly choose column 6 in the random number table and look at the first 12 digits: 99445 20429 04.
  • The first 9 indicates that Astoria is not selected.
  • The second 9 indicates that Beatrix is not selected.
  • The 4 indicates that Charles is selected, and so on…
• The 50% independent sample is Charles, Delila, Frank, Gaston, Heidi, Ian, Kirsten, and Lex.
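The digit-by-digit rule in this solution can be checked with a few lines of Python. This is an illustrative sketch; the function name independent_sample_from_digits is hypothetical. It pairs each semifinalist, in the listed order, with one digit from the table and keeps those whose digit is 0, 1, 2, 3, or 4.

```python
def independent_sample_from_digits(population, digits, select_digits):
    """Pair each element with one table digit; keep it if the digit means 'select'."""
    return [element for element, digit in zip(population, digits)
            if digit in select_digits]

semifinalists = ["Astoria", "Beatrix", "Charles", "Delila", "Elsie", "Frank",
                 "Gaston", "Heidi", "Ian", "Jose", "Kirsten", "Lex"]

# The 12 digits read from the table in the example, with spaces removed.
digits = "99445 20429 04".replace(" ", "")

# Digits 0-4 mean "select" (a 50% chance for each contestant).
sample = independent_sample_from_digits(semifinalists, digits, set("01234"))
print(sample)
# ['Charles', 'Delila', 'Frank', 'Gaston', 'Heidi', 'Ian', 'Kirsten', 'Lex']
```

Eight of the 12 contestants are selected here rather than exactly 6, which illustrates that the size of an independent sample is not fixed ahead of time.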

Question:

Choose a 40% independent sample from the 12 semifinalists: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex. Use the first 12 digits of row 145 of the random number table and use digits 0, 1, 2, 3 for selection.

a. Astoria, Beatrix, Charles, Delila
b. Charles, Elsie, Frank, Gaston
c. Charles, Elsie, Frank, Gaston, Heidi, Jose, Kirsten, Lex
d. Beatrix, Charles, Delila, Frank, Heidi, Ian, Lex

Example 2

• Find a 10% independent sample of the 100 automobiles produced in one day at a factory.

Example 2, cont’d

• Solution: Choose some ordering for the 100 automobiles.
• There is a 10% chance that the digit 0 will occur, so let the digit 0 represent “select this automobile” and let the other 9 digits represent “do not select this automobile”.

Example 2, cont’d

• Solution, cont’d: We randomly start in the first column, first row of the random number table and read from left to right.
• In the first 100 digits we read in the table, a 0 occurs in the positions 1, 7, 8, 19, 33, 39, 62, 70, 73, 81, 88, 93, 95, 98, and 100.

Example 2, cont’d

• Solution, cont’d: The automobiles that are selected are highlighted.
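With a random number generator instead of a table, the same idea can be sketched as follows. The function name independent_sample and the seeds are assumptions for the illustration; each automobile is kept independently with probability 0.10, so the number selected varies from run to run (the table-based solution above selected 15 automobiles, not exactly 10).

```python
import random

def independent_sample(population, probability, seed=None):
    """Keep each element independently with the given fixed probability."""
    rng = random.Random(seed)
    # rng.random() returns a float in [0, 1), so the test below succeeds
    # with the stated probability for each element.
    return [element for element in population if rng.random() < probability]

automobiles = list(range(1, 101))  # labels 1 through 100, in production order

# Draw a few 10% independent samples; the sample sizes need not be 10.
sizes = [len(independent_sample(automobiles, 0.10, seed=s)) for s in range(3)]
print(sizes)
```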

Systematic Sampling

• In systematic sampling, we decide ahead of time what proportion of the population we wish to sample.
• For a 1-in-k systematic sample:
  • List the population elements in some order.
  • Randomly choose a number, r, from 1 to k.
  • The elements selected are those labeled r, r + k, r + 2k, r + 3k, …

Example 3

• Use systematic sampling to select a 1-in-10 systematic sample of the 100 automobiles produced in one day at a factory.

Example 3, cont’d

• Solution: List the automobiles in some order.
• Suppose we randomly choose r = 5.
  • Since r = 5 and k = 10, the automobiles selected for the sample are those labeled 5, 15, 25, 35, 45, 55, 65, 75, 85, and 95.

Example 3, cont’d

• Solution, cont’d: The automobiles that are selected are highlighted.

Example 3, cont’d

• A systematic sample is easier to choose than an independent sample.
• However, the regularity in the selection of a systematic sample can sometimes be a source of bias.
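The 1-in-k rule (select the elements labeled r, r + k, r + 2k, …) maps directly onto list slicing. The sketch below is illustrative; the function name systematic_sample is hypothetical, and the list is assumed to be in the chosen order with labels starting at 1.

```python
import random

def systematic_sample(ordered_population, k, r=None, seed=None):
    """Select elements r, r + k, r + 2k, ... (1-based) from an ordered list."""
    if r is None:
        r = random.Random(seed).randint(1, k)  # random start from 1 to k, inclusive
    if not 1 <= r <= k:
        raise ValueError("r must be between 1 and k")
    return ordered_population[r - 1::k]        # convert the 1-based label to an index

automobiles = list(range(1, 101))  # labels 1 through 100

print(systematic_sample(automobiles, k=10, r=5))
# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
```

Only one random choice (r) is made, which is why this method is quick, and also why any pattern in the ordering that repeats every k elements can bias the sample.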

Question:

Choose a 1-in-3 systematic sample from the 12 semifinalists: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex. Use the randomly chosen value of r = 2.

a. Beatrix, Elsie, Heidi, Kirsten
b. Astoria, Delila, Gaston, Jose
c. Charles, Frank, Ian, Lex
d. Astoria, Charles, Elsie, Gaston, Ian, Kirsten

Quota Sampling

• In quota sampling, the sample is chosen to be representative for known important variables.
  • Quotas may be set for age groups, genders, ethnicities, occupations, and so on.
• There is no way to know ahead of time which variables are important enough to require quotas.
• Quota sampling is not always reliable.

Stratified Sampling

• In stratified sampling, the population is subdivided into 2 or more nonoverlapping subsets, each of which is called a stratum.

Stratified Sampling, cont’d

• A stratified random sample is obtained by selecting a simple random sample from each stratum.
  • A stratified sample can be less costly because the strata allow a smaller sample to be used.

Example 4

• Select a stratified random sample of 10 men and 10 women from a population of 200.
  • Suppose there are equal numbers of men and women in the population.
• Use the first 2 digits of the 2nd and 3rd columns of the random number table for selecting men and women, respectively.

Example 4, cont’d

• Solution: The 2 strata are men and women.
  • Choose a simple random sample from the men.
    • Number the 100 men with labels 00 through 99.
    • The 10 men chosen from the random number table are those with labels 77, 31, 25, 66, 49, 38, 00, 95, 24, and 57.

Example 4, cont’d

• Solution, cont’d: Choose a simple random sample from the women.
  • Number the 100 women with labels 00 through 99.
  • The 10 women chosen from the random number table are those with labels 47, 63, 95, 12, 49, 37, 48, 94, 35, and 78.

Example 4, cont’d

• Solution, cont’d: The stratified random sample is represented below.
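As a sketch of the same design using a random number generator rather than the table, the code below draws a simple random sample from each stratum separately. The function name stratified_sample, the "M"/"W" label prefixes, and the seed are assumptions for the illustration, so the labels it picks will differ from those in the example.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of the same size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Two nonoverlapping strata of 100 people each, labeled 00 through 99.
strata = {
    "men": [f"M{i:02d}" for i in range(100)],
    "women": [f"W{i:02d}" for i in range(100)],
}

sample = stratified_sample(strata, per_stratum=10, seed=0)
for name, members in sample.items():
    print(name, members)
```

Because each stratum is sampled on its own, the sample is guaranteed to contain exactly 10 men and 10 women, which a simple random sample of 20 from all 200 people would not guarantee.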

Question:

Suppose the 12 semifinalists can be divided into 2 strata as follows.

Junior division: Astoria, Charles, Delila, Gaston, Heidi, Lex
Senior division: Beatrix, Elsie, Frank, Ian, Jose, Kirsten

Choose a stratified random sample so the sample contains 2 members of each stratum. Label the members of each stratum 01 through 06. For the junior division use the first 2 digits of column 3, starting at the top and reading down. For the senior division use the last 2 digits of column 3, starting at the top and reading down.

a. Frank, Beatrix, Astoria, Charles
b. Gaston, Lex, Ian, Elsie
c. Heidi, Astoria, Jose, Beatrix
d. Lex, Charles, Beatrix, Frank

Cluster Sampling

• In cluster sampling, the population is divided into nonoverlapping subsets called sampling units or clusters.
  • Clusters may vary in size.
• A frame is a complete list of the sampling units.
• A sample is a collection of sampling units selected from the frame.

Cluster Sampling, cont’d

• In cluster sampling, a simple random sample determines the clusters to be included in the sample.

Example 5

• Select a cluster sample of 12 individuals from a population of 96 people who all live in four-person suites.
  • Use the first 2 digits of the 4th column of the random number table.

Example 5, cont’d

• Solution: The clusters will be the 24 suites.
  • Label the suites 01 through 24.
• We need a simple random sample of 3 of these suites to obtain a cluster sample of 12 people.

Example 5, cont’dExample 5, cont’d• Solution, cont’d: The people in suites 21, 17, Solution, cont’d: The people in suites 21, 17,

and 10 are selected.and 10 are selected.
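The same selection can be sketched in Python. This is only an illustration: it swaps the printed random number table for Python's pseudo-random generator, so it will generally pick different suites than 21, 17, and 10. The function name `cluster_sample` and the `seed` parameter are hypothetical names introduced here.

```python
import random

def cluster_sample(num_clusters, clusters_to_pick, seed=None):
    """Choose whole clusters by taking a simple random sample of their labels."""
    rng = random.Random(seed)              # a seed makes the draw repeatable
    labels = range(1, num_clusters + 1)    # suites labeled 1 through 24
    return rng.sample(labels, clusters_to_pick)

# 96 people in four-person suites gives 24 clusters; 3 suites give 12 people.
suites = cluster_sample(num_clusters=24, clusters_to_pick=3, seed=0)
people_selected = len(suites) * 4
print("Selected suites:", suites)
print("People in the sample:", people_selected)
```

Every person in a selected suite joins the sample, which is why only 3 random draws are needed for 12 people.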

Sampling Summary

9.2 Initial Problem Solution

• You need to interview at least 800 people nationwide.

• You need a different interviewer for each county.

• Each interviewer costs $50 plus $10 per interview.

• Your budget is $15,000.

• Which is better, a simple random sample of all adults in the U.S. or a simple random sample of adults in randomly-selected counties?

Initial Problem Solution, cont’d

• A simple random sample is unbiased, so this might seem to be the best choice.

• However, there are 3130 counties in the U.S.
  • If, for example, you get people in your sample from only 400 of the counties, it would cost you 400($50) + 800($10) = $28,000.

• You cannot afford to choose a simple random sample.

Initial Problem Solution, cont’d

• The second type of sample is a much less expensive choice.
  • You must pay 800($10) = $8000 for the interviews, which leaves $7000 for hiring interviewers.
  • At $50 per interviewer, you can select a simple random sample of up to $7000 ÷ $50 = 140 counties.

• Then select a simple random sample of people from each selected county, for a total of 800 people.
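The budget arithmetic in this solution can be checked with a short Python sketch. The constant and function names are illustrative choices, not part of the original problem.

```python
INTERVIEWER_FEE = 50       # dollars per interviewer (one per county)
COST_PER_INTERVIEW = 10    # dollars per interview
BUDGET = 15_000            # total dollars available
INTERVIEWS = 800           # minimum number of interviews

def survey_cost(counties, interviews=INTERVIEWS):
    """Total cost when the sample is spread over the given number of counties."""
    return counties * INTERVIEWER_FEE + interviews * COST_PER_INTERVIEW

def max_counties(budget=BUDGET, interviews=INTERVIEWS):
    """Largest number of counties affordable after paying for the interviews."""
    remaining = budget - interviews * COST_PER_INTERVIEW
    return remaining // INTERVIEWER_FEE

print(survey_cost(400))   # 28000, well over the $15,000 budget
print(max_counties())     # 140
```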

Section 9.3
Central Tendency and Variability

• Goals

• Study measures of central tendency
  • Mean
  • Median
  • Mode

• Study measures of dispersion
  • Range
  • Quartiles
  • Standard deviation

9.3 Initial Problem

• Which stockbroker should you choose if you want to minimize risk while maintaining a steady rate of growth?
  • One stockbroker’s recommendations had percentage gains of 21%, -3%, 16%, 27%, 9%, 11%, 13%, 6%, and 17%.
  • The other’s recommendations had percentage gains of 11%, 13%, 16%, 8%, 5%, 14%, 15%, 17%, and 18%.
  • The solution will be given at the end of the section.

Measures of Central Tendency

• Statistics that tell us about the location of values in a data set are called measures of location.

• The most important measures of location, called measures of central tendency, tell us where the center of the data set lies.
  • The most important measures of central tendency are mean, median, and mode.

The Mean

• The mean is the most common type of average.
  • This is an arithmetic mean.

• If there are N numbers in a data set, the mean is:

  \(\frac{x_1 + x_2 + \cdots + x_N}{N}\)

The Mean, cont’d

• The mean of a sample is denoted by \(\bar{x}\), which is read “x-bar”.

• The mean of a population is denoted by μ, the Greek letter pronounced “mew”.

Example 1

• Find the mean of each data set.

a) 1, 1, 2, 2, 3

b) 1, 1, 2, 2, 11

c) 1, 1, 2, 2, 47

• Solution:

a) The mean is \(\frac{1 + 1 + 2 + 2 + 3}{5} = \frac{9}{5} = 1\frac{4}{5}\)

Example 1, cont’d

• Solution, cont’d:

b) The mean is \(\frac{1 + 1 + 2 + 2 + 11}{5} = \frac{17}{5} = 3\frac{2}{5}\)

c) The mean is \(\frac{1 + 1 + 2 + 2 + 47}{5} = \frac{53}{5} = 10\frac{3}{5}\)
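As a quick check of Example 1, the sketch below computes each mean as an exact fraction. The helper name `mean` is defined here for illustration; it simply applies the formula (sum of the values) ÷ N.

```python
from fractions import Fraction

def mean(values):
    """Arithmetic mean as an exact fraction: (x_1 + ... + x_N) / N."""
    return Fraction(sum(values), len(values))

for data in ([1, 1, 2, 2, 3], [1, 1, 2, 2, 11], [1, 1, 2, 2, 47]):
    m = mean(data)
    print(data, "mean =", m, "=", float(m))
```

Only the last value differs between the three data sets, yet the mean moves from 9/5 to 53/5.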

Example 2

• A college graduate reads that a company with 5 employees has a mean salary of $48,000.

• How might this be misleading?

Example 2, cont’d

• Solution: One possibility is that every employee earns a salary of $48,000.

  \(\frac{48000 + 48000 + 48000 + 48000 + 48000}{5} = \frac{240000}{5} = \$48{,}000\)

• Another possibility is that the owner makes $120,000, while the other 4 employees each earn $30,000.

  \(\frac{120000 + 30000 + 30000 + 30000 + 30000}{5} = \frac{240000}{5} = \$48{,}000\)

Example 2, cont’d

• There are also other possible situations, but these two are enough to show that the salary the graduate could expect to earn can vary widely based only on knowing the mean salary.

Question:

Find the mean of the data set: 19, 27, 83, 94. Round to 2 decimal places.

a. 54.33

b. 55.75

c. 44.60

d. 56.50

The Median

• The median is the “middle number” of a data set when the values are arranged from smallest to largest.
  • If there are an odd number of data points, the data point exactly in the middle of the list is the median.
  • If there are an even number of data points, the mean of the two data points in the middle of the list is the median.

Example 3

• Find the mean and median of each data set.

a) 0, 2, 4

b) 0, 2, 4, 10

c) 0, 2, 4, 10, 1000

Example 3, cont’d

a) Solution for 0, 2, 4

• The median is 2.

• The mean is: \(\frac{0 + 2 + 4}{3} = \frac{6}{3} = 2\)

Example 3, cont’d

b) Solution for 0, 2, 4, 10

• The median is: \(\frac{2 + 4}{2} = \frac{6}{2} = 3\)

• The mean is: \(\frac{0 + 2 + 4 + 10}{4} = \frac{16}{4} = 4\)

Example 3, cont’d

c) Solution for 0, 2, 4, 10, 1000

• The median is 4.

• The mean is: \(\frac{0 + 2 + 4 + 10 + 1000}{5} = \frac{1016}{5} = 203.2\)

Example 3, cont’d

• One very large or very small data value can change the mean dramatically.

• Large or small data values do not have much of an effect on the median.
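The contrast in Example 3 can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the dictionary name `datasets` is just a convenient label for the three lists.

```python
from statistics import mean, median

datasets = {
    "a": [0, 2, 4],
    "b": [0, 2, 4, 10],
    "c": [0, 2, 4, 10, 1000],
}

for label, data in datasets.items():
    # median() sorts the data and averages the two middle values when N is even
    print(label, "mean =", mean(data), "median =", median(data))
```

Adding the single value 1000 moves the mean from 4 to 203.2, while the median only moves from 3 to 4.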

Example 4

• Find the median salary for the 2 situations.

a) Five employees each earn $48,000.

b) Four employees earn $30,000 and one earns $120,000.

Example 4, cont’d

• Solution:

a) The median salary is $48,000.
  • The median is the same as the mean.

b) The median salary is $30,000.
  • In this case the median more accurately shows the typical salary than does the mean of $48,000.

Question:

Find the median of the data set: 19, 27, 83, 94. Round to 2 decimal places.

a. 27.00

b. 83.00

c. 56.50

d. 55.00

Symmetric Distributions

• If the mean and median of a data set are equal, the data distribution is called symmetric.
  • An example of a symmetric data set is shown below.

Skewed Distributions

• A distribution is skewed left if the mean is less than the median.

• A distribution is skewed right if the mean is greater than the median.

The Mode

• The mode is the most commonly-occurring value in a data set.

• A data set may have:
  • No mode.
  • One mode.
  • Multiple modes.

Example 5

• Find the mode(s) of the following set of test scores: 26, 32, 54, 62, 67, 70, 71, 71, 74, 76, 80, 81, 84, 87, 87, 87, 89, 93, 95, 96.

• Solution: The value 87 occurs more times than any other score. The mode is 87.

Example 5, cont’d
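A small Python sketch can find the mode(s) of the test scores. The helper `modes` is a hypothetical name, and it assumes the convention that a data set in which no value repeats has no mode, which is one common reading of "No mode" above.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s); an empty list means no mode."""
    if not data:
        return []
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:          # assumption: no repeated value means no mode
        return []
    return sorted(value for value, count in counts.items() if count == highest)

scores = [26, 32, 54, 62, 67, 70, 71, 71, 74, 76,
          80, 81, 84, 87, 87, 87, 89, 93, 95, 96]

print(modes(scores))           # one mode
print(modes([1, 1, 2, 2, 3]))  # multiple modes
print(modes([1, 2, 3]))        # no mode
```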

The Weighted Mean

• A weighted mean is calculated when different data points have different levels of importance, called weights.

• If the numbers in a data set, \(x_1, x_2, \ldots, x_N\), have weights \(w_1, w_2, \ldots, w_N\), then the weighted mean is:

  \(\frac{w_1 x_1 + w_2 x_2 + \cdots + w_N x_N}{w_1 + w_2 + \cdots + w_N}\)

Example 6

• Suppose your grades one semester are:
  • An A in a 5-credit course
  • A B in a 4-credit course
  • A C in two 3-credit courses

• What is your GPA that semester?

Example 6, cont’d

• Solution: A grade of A is worth 4 points, a B 3 points, and a C 2 points.

• The weights are the number of credits.
  • Your GPA is the weighted mean of your grades:

  \(\frac{4(5) + 3(4) + 2(3) + 2(3)}{5 + 4 + 3 + 3} = \frac{44}{15} \approx 2.93\)
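The GPA calculation is a direct application of the weighted mean formula, and can be sketched as follows. The function name `weighted_mean` and the variable names are illustrative.

```python
def weighted_mean(values, weights):
    """(w_1*x_1 + ... + w_N*x_N) / (w_1 + ... + w_N)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grade_points = [4, 3, 2, 2]   # A, B, C, C
credits = [5, 4, 3, 3]        # the weights

gpa = weighted_mean(grade_points, credits)
print(round(gpa, 2))          # 2.93
```

An unweighted mean of the same grade points would be 11/4 = 2.75; the 5-credit A pulls the weighted result higher.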

Example 7

• Determine the per capita income for the group of nations listed in the table.

Example 7, cont’d

• Solution: The populations of the countries are the weights.

• The per capita income of the entire group is the weighted mean: 24.2 (in thousands of dollars).
  • The per capita income for the group of countries in 2002 was about $24,200.

Measures of Variability

• The measures of central tendency describe only part of the behavior of a data set.
  • Statistics that tell us how the data varies from its center are called measures of variability or measures of spread.

• The measures of variability studied here are:
  • Range
  • Quartiles
  • Standard deviation

The Range

• The range of a data set is the difference between the largest data value and the smallest data value.

Example 8

• Compute the mean and the range for each data set.

a) 3, 4, 5, 6, 7, 8

b) 0, 2, 5, 7, 8, 11

Example 8, cont’d

• Solution:

a) 3, 4, 5, 6, 7, 8
  • The mean is 5.5.
  • The range is 8 – 3 = 5.

b) 0, 2, 5, 7, 8, 11
  • The mean is 5.5.
  • The range is 11 – 0 = 11.

• The two data sets have the same mean, but the difference in ranges shows that the second data set is more spread out.
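Example 8 can be verified with a few lines of Python. The helper name `data_range` is chosen here to avoid shadowing Python's built-in `range`.

```python
from statistics import mean

def data_range(values):
    """Largest data value minus smallest data value."""
    return max(values) - min(values)

set_a = [3, 4, 5, 6, 7, 8]
set_b = [0, 2, 5, 7, 8, 11]

for name, data in (("a", set_a), ("b", set_b)):
    print(name, "mean =", mean(data), "range =", data_range(data))
```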

Example 9

• Compute the range for each data set.

a) 0, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 9

b) 0, 8, 9, 6, 1, 4, 6, 0, 1, 5, 3, 0, 9, 8, 0, 5, 6, 9, 5, 0

Example 9, cont’d

• Solution:

a) 0, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 9

• The range is 9 – 0 = 9.

b) 0, 8, 9, 6, 1, 4, 6, 0, 1, 5, 3, 0, 9, 8, 0, 5, 6, 9, 5, 0

• The range is 9 – 0 = 9.

Example 9, cont’d

• Solution, cont’d: The two data sets have the same range, but the graphs show that one data set varies more than the other.

Quartiles

• Quartiles are measures of location that divide a data set approximately into fourths.

• The quartiles are labeled as the
  • first quartile, \(q_1\)
  • second quartile, \(q_2\)
    • The second quartile is the same as the median.
  • third quartile, \(q_3\)

Quartiles, cont’d

• To find the quartiles, arrange the data values in order from smallest to largest.

1) Find the median. This is also the second quartile.

2) If the number of data points is even, go to Step 3. If the number of data points is odd, remove the median from the list before going to Step 3.

Quartiles, cont’d

3) Divide the remaining data points into a lower half and an upper half.

4) The first quartile, \(q_1\), is the median of the lower half of the data.

5) The third quartile, \(q_3\), is the median of the upper half of the data.

Quartiles, cont’d

• The interquartile range, IQR, is the difference between the first and third quartiles.
  • IQR = \(q_3 - q_1\)

• The IQR is a measure of variability.
  • About half of the data points lie within the IQR.

Example 10

• Find the median, the first and third quartiles, and the interquartile range for the test scores: 26, 32, 54, 62, 67, 70, 71, 71, 74, 76, 80, 81, 84, 87, 87, 87, 89, 93, 95, 96.

Example 10, cont’d

• Solution:

• The median is \(m = \frac{76 + 80}{2} = 78\)
  • Since there is an even number of data points, we do not remove the median from the list.

• The first quartile is the median of the lower half of the list: 26, 32, 54, 62, 67, 70, 71, 71, 74, 76.
  • The first quartile is \(q_1 = \frac{67 + 70}{2} = 68.5\)

Example 10, cont’d

• Solution, cont’d:

• The third quartile is the median of the upper half of the list: 80, 81, 84, 87, 87, 87, 89, 93, 95, 96.
  • The third quartile is \(q_3 = \frac{87 + 87}{2} = 87\)

• The IQR is 87 – 68.5 = 18.5
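The five-step quartile procedure can be sketched in Python as shown below. The function name `quartiles` is illustrative. Note that this follows the median-of-halves method described in this section; library routines such as `statistics.quantiles` use different interpolation rules and may give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (q1, median, q3) using the median-of-halves method.

    Assumes at least 2 data points.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, skip the middle value (the median) before splitting.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

scores = [26, 32, 54, 62, 67, 70, 71, 71, 74, 76,
          80, 81, 84, 87, 87, 87, 89, 93, 95, 96]

q1, m, q3 = quartiles(scores)
print("q1 =", q1, "median =", m, "q3 =", q3, "IQR =", q3 - q1)
```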

The Five-Number Summary

• The five-number summary of a data set is a list of 5 informative numbers related to that set:
  • The smallest value, s
  • The first quartile, \(q_1\)
  • The median, m
  • The third quartile, \(q_3\)
  • The largest value, L

• The numbers are always written in this order.

Example 11

• Consider the set of test scores from the previous example: 26, 32, 54, 62, 67, 70, 71, 71, 74, 76, 80, 81, 84, 87, 87, 87, 89, 93, 95, 96.

• The five-number summary for this data set is 26, 68.5, 78, 87, 96.

Question:

Find the 5 number summary of the data set: 19, 27, 83, 94.

a. 19, 27, 55, 83, 94

b. 19, 23, 55, 85.5, 94

c. 19, 23, 55, 88.5, 94

d. 19, 27, 55, 85.5, 94

Box-and-Whisker Plot

• The box-and-whisker plot, also called a box plot, is a graphical representation of the five-number summary of a data set.
  • The box (rectangle) represents the IQR.
  • The location of the median is marked within the box.
  • The whiskers (lines) represent the lower and upper 25% of the data.

Box-and-Whisker Plot, cont’d

Example 12

• The list of test scores from the previous example had a five-number summary of 26, 68.5, 78, 87, 96.

• The box-and-whisker plot for this data set is shown below.

Example 13

• The monthly rainfall for 2 cities is shown below.
  • Use box-and-whisker plots to compare the rainfall amounts.

Example 13, cont’d

• Solution: In St. Louis, MO, the rainfalls were: 2.21, 2.23, 2.31, 2.64, 2.96, 3.20, 3.26, 3.29, 3.74, 4.10, 4.12.
  • The median is 3.08.
  • The first quartile is 2.475.
  • The third quartile is 3.515.

• The five-number summary for St. Louis is 2.21, 2.475, 3.08, 3.515, 4.12.

Example 13, cont’d

• Solution, cont’d: In Portland, OR, the rainfalls were: 0.46, 1.13, 1.47, 1.61, 2.08, 2.31, 3.05, 3.61, 3.93, 5.17, 6.14, 6.16.
  • The median is 2.68.
  • The first quartile is 1.54.
  • The third quartile is 4.55.

• The five-number summary for Portland is 0.46, 1.54, 2.68, 4.55, 6.16.
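The Portland five-number summary can be reproduced with a short Python sketch using the twelve values listed above. The function name `five_number_summary` is illustrative, and the quartiles use the median-of-halves method from this section rather than a library default.

```python
from statistics import median

def five_number_summary(data):
    """Return (smallest, q1, median, q3, largest); assumes 2 or more points."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]                    # values below the middle
    upper = ordered[len(ordered) - half:]     # values above the middle
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])

portland = [0.46, 1.13, 1.47, 1.61, 2.08, 2.31,
            3.05, 3.61, 3.93, 5.17, 6.14, 6.16]

summary = five_number_summary(portland)
# Round only for display, since decimal fractions are stored approximately.
print([round(value, 3) for value in summary])
```

These five numbers are exactly what a box-and-whisker plot draws: the whisker ends, the box edges, and the median mark.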

Example 13, cont’d

• Solution, cont’d: The 2 box-and-whisker plots are shown above.

• Note that the amount of rainfall in Portland, OR, varies much more from month-to-month than it does in St. Louis, MO.

Standard Deviation

• The standard deviation is a widely-used measure of variability.
  • Calculating the standard deviation requires several intermediate steps, which will be illustrated using the data set of incomes shown below.

Deviation From The Mean

• The difference between a data point and the mean of the data set is called the deviation from the mean of that data point.

Deviation From The Mean, cont’d

• The mean income is $35,800.

Variance

• The variance is a type of average of all the squared deviations from the mean.
  • Variance is calculated differently for data from a sample or from the entire population.

• Sample variance, \(s^2\): Divide the sum of all the squared deviations from the mean by n – 1.

• Population variance, \(\sigma^2\): Divide the sum of all the squared deviations from the mean by n.
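The two definitions differ only in the divisor, which the sketch below makes explicit. The function names are illustrative, and the data set 3, 4, 5, 6, 7, 8 is borrowed from Example 8 purely as sample input.

```python
def sum_squared_deviations(data):
    """Sum of (x - mean)^2 over all data points."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data)

def sample_variance(data):
    """s^2: divide by n - 1 (needs at least 2 data points)."""
    return sum_squared_deviations(data) / (len(data) - 1)

def population_variance(data):
    """sigma^2: divide by n."""
    return sum_squared_deviations(data) / len(data)

data = [3, 4, 5, 6, 7, 8]
print(sum_squared_deviations(data))   # 17.5
print(sample_variance(data))          # 17.5 / 5 = 3.5
print(population_variance(data))      # 17.5 / 6, about 2.92
```

Python's standard library provides the same two quantities as `statistics.variance` (sample) and `statistics.pvariance` (population).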

Sample Variance

• The variance of the incomes is calculated by first squaring all the deviations.

Sample Variance, cont’d

• The squared deviations are added and then divided by n – 1 = 9 – 1 = 8.

• \(s^2 = \frac{2{,}465{,}560{,}000}{8} = 308{,}195{,}000\)

Standard Deviation

• Standard deviation is the square root of the variance.
  • Taking the square root allows the standard deviation to have the same units as the original data values.

• Because it is related to variance, the standard deviation formula also distinguishes between samples and the population.

Standard Deviation, cont’d

• Sample standard deviation is \(s = \sqrt{s^2}\)

• Population standard deviation is \(\sigma = \sqrt{\sigma^2}\)

• The standard deviation of the incomes is: \(s = \sqrt{s^2} = \sqrt{308{,}195{,}000} \approx \$17{,}555\)

Example 14

• Find the sample standard deviation of the weights (in pounds) in the 2 data sets.
  • Turkeys: 17, 18, 19, 20, 21
  • Dogs: 13, 16, 19, 22, 25

Example 14, cont’d

• Solution:

• The sample mean for the turkeys is 19 pounds.

• The sample mean for the dogs is also 19 pounds.
  • We note that although the means are the same, the standard deviations should reflect the amount of variability in the data values.

Example 14, cont’d

• Solution, cont’d: The deviations from the mean for the turkey weights are found.

Example 14, cont’d

• Solution, cont’d:

• The sample variance of the turkey weights is 2.5 square pounds: \(s^2 = \frac{10}{5 - 1} = \frac{10}{4} = 2.5\)

• The sample standard deviation of the turkey weights is 1.58 pounds: \(s = \sqrt{s^2} = \sqrt{2.5} \approx 1.58\)

Example 14, cont’d

• Solution, cont’d: The deviations from the mean for the dog weights are found.

Example 14, cont’d

• Solution, cont’d:

• The sample variance of the dog weights is 22.5 square pounds: \(s^2 = \frac{90}{5 - 1} = \frac{90}{4} = 22.5\)

• The sample standard deviation of the dog weights is 4.74 pounds: \(s = \sqrt{s^2} = \sqrt{22.5} \approx 4.74\)

Example 14, cont’d

• Solution, cont’d: The standard deviation of the sample of dog weights is larger than the standard deviation of the sample of turkey weights because there was a much wider spread among the dog weights.
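Example 14 can be checked with the standard `statistics` module, whose `variance` and `stdev` functions use the sample (n – 1) formulas. This is a minimal sketch; the list names are illustrative.

```python
from statistics import mean, stdev, variance

turkeys = [17, 18, 19, 20, 21]
dogs = [13, 16, 19, 22, 25]

for name, weights in (("turkeys", turkeys), ("dogs", dogs)):
    print(name,
          "mean =", mean(weights),
          "sample variance =", variance(weights),
          "sample std dev =", round(stdev(weights), 2))
```

Both samples have mean 19, but the dog weights give a standard deviation three times as large, matching the wider spread in that data set.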

Question:

Find the sample standard deviation of the data set: 19, 27, 83, 94. Round to the nearest hundredth.

a. 4382.75

b. 38.22

c. 66.20

d. 1460.92

9.3 Initial Problem Solution

• Which stockbroker should you choose if you want to minimize risk while maintaining a steady rate of growth?
  • One stockbroker’s recommendations had percentage gains of 21%, -3%, 16%, 27%, 9%, 11%, 13%, 6%, and 17%.
  • The other’s recommendations had percentage gains of 11%, 13%, 16%, 8%, 5%, 14%, 15%, 17%, and 18%.

Initial Problem Solution, cont’d

• First you could calculate the mean rate of return for each stockbroker.

• Both stockbrokers have a mean rate of return of 13%.

• Since the average growth rates are the same, you can measure the variability to determine which stockbroker’s recommendations have the least variability.

Initial Problem Solution, cont’d

• First stockbroker:

Initial Problem Solution, cont’d

• Second stockbroker:

Initial Problem Solution, cont’d

• The standard deviation of the second portfolio is much smaller than the standard deviation of the first stock portfolio.

• Since the growth rates were the same, the second stockbroker should be chosen in order to minimize risk.
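The comparison can be reproduced with a short Python sketch. It treats each list of gains as a sample, so `stdev` divides by n – 1; that choice is an assumption made here for illustration. The variable names are also illustrative.

```python
from statistics import mean, stdev

first = [21, -3, 16, 27, 9, 11, 13, 6, 17]      # percentage gains
second = [11, 13, 16, 8, 5, 14, 15, 17, 18]

for name, gains in (("first", first), ("second", second)):
    print(name,
          "mean =", mean(gains),
          "sample std dev =", round(stdev(gains), 2))
```

Both means come out to 13, while the second list has roughly half the standard deviation of the first, which supports choosing the second stockbroker.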
