
11.1 Find Measures of Central Tendency and Dispersion

STATISTICS

Numerical values used to summarize and compare sets of data

MEASURE OF CENTRAL TENDENCY

A number used to represent the center or middle of a set of data values. This is represented by the mean, median, and mode.

MEASURE OF DISPERSION

A statistic that tells you how dispersed, or spread out, data values are

STANDARD DEVIATION

A measure that describes the typical difference (or deviation) between a data value and the mean. The standard deviation is a statistic that tells you how tightly all the various examples are clustered around the mean in a set of data.

OUTLIER

A value that is much greater than or much less than most of the other values in a data set

RANGE

The spread of the data set, found by subtracting the smallest number in the set from the largest number. This tells you how wide an interval your set of data covers.

MEASURES OF CENTRAL TENDENCY

• The mean, or ______________, of n numbers is the _________ of the numbers ______________ by n.

The mean is denoted by x̄, which is read as "x-bar." For the data set x1, x2, …, xn, the mean is $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$.

• The median of n numbers is the _____________________ number when the numbers are written in order. (If n is even, the median is the _______ of the two middle numbers.)

• The mode of n numbers is the number or numbers that occur ____________. There may be _____ mode, ____ mode, or __________________ modes.

Example 1: Find measures of central tendency

Quiz Scores: The data sets give quiz scores for two different biology classes. Find the mean, median, and mode of each data set.

Class A: 15, 17, 17, 17, 18, 19, 21, 22, 25

Class B: 16, 18, 19, 21, 22, 22, 22, 24, 25

Class A: Mean: x̄ = _____ Median: ____ Mode: ____

Class B: Mean: x̄ = _____ Median: ____ Mode: ____
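One way to check your hand calculations for the quiz scores is a short script. This is only a sketch using Python's standard `statistics` module; the names `class_a`, `class_b`, and `central_tendency` are made up for this illustration and are not part of the lesson.

```python
from statistics import mean, median, multimode

# Quiz scores from Example 1
class_a = [15, 17, 17, 17, 18, 19, 21, 22, 25]
class_b = [16, 18, 19, 21, 22, 22, 22, 24, 25]

def central_tendency(scores):
    """Return the mean, median, and list of modes for a data set."""
    return {
        "mean": mean(scores),         # sum of the scores divided by how many there are
        "median": median(scores),     # middle value once the scores are in order
        "modes": multimode(scores),   # every value tied for most frequent
    }

for name, scores in (("Class A", class_a), ("Class B", class_b)):
    print(name, central_tendency(scores))
```

`multimode` returns a list because a data set may have one mode or several.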


WHAT IS IT ASKING YOU TO DO?

Example 2: Find the range and standard deviation

Find the range and standard deviation for the quiz scores in each data set from Example 1.

Class A: Range = ________ − ________ = ______

σ ≈ _____

Class B: Range = ________ − ________ = ______

σ ≈ _____
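The range and standard deviation can be checked the same way. This sketch assumes σ means the population standard deviation (dividing by n), which is what `statistics.pstdev` computes; the helper name `spread` is hypothetical.

```python
from statistics import pstdev

# Quiz scores from Example 1
class_a = [15, 17, 17, 17, 18, 19, 21, 22, 25]
class_b = [16, 18, 19, 21, 22, 22, 22, 24, 25]

def spread(scores):
    """Return (range, population standard deviation) for a data set."""
    data_range = max(scores) - min(scores)   # largest value minus smallest value
    sigma = pstdev(scores)                   # square root of the mean squared deviation
    return data_range, sigma

for name, scores in (("Class A", class_a), ("Class B", class_b)):
    data_range, sigma = spread(scores)
    print(f"{name}: range = {data_range}, sigma = {sigma:.1f}")
```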

Because the range and standard deviation for Class ____ are greater, its quiz scores are _________ spread out.

Example 3: Examine the effect of an outlier

Soccer: The winning scores for the first 9 games of the soccer season are: 3, 4, 2, 5, 3, 1, 4, 3, 2.

a. Find the mean, median, mode, range, and standard deviation of the data set.

b. The winning score in the next game is an outlier, 9. Find the new mean, median, mode, range, and standard deviation.

c. Which measure of central tendency does the outlier affect the most? the least?

d. What effect does the outlier have on the range and standard deviation?

a. Mean: _______

Median: _______ Mode: _______

Range: _______

Std. Dev.: σ ≈ _______

b. Mean: _______

Median: _______ Mode: _______

Range: _______

Std. Dev.: σ ≈ _______

c. The ______________ is most affected by the outlier. The ______________ and ___________ are not affected by the outlier.

d. The outlier caused both the range and standard deviation to ___________________.
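To see the effect of the outlier side by side, the statistics can be computed before and after the tenth game is added. This is a sketch, again assuming the population standard deviation; `summarize`, `before`, and `after` are names invented for the illustration.

```python
from statistics import mean, median, multimode, pstdev

first_nine = [3, 4, 2, 5, 3, 1, 4, 3, 2]   # winning scores from Example 3
with_outlier = first_nine + [9]            # the next game's score is the outlier

def summarize(data):
    """Collect the five statistics asked for in Example 3."""
    return {
        "mean": mean(data),
        "median": median(data),
        "modes": multimode(data),
        "range": max(data) - min(data),
        "std_dev": pstdev(data),           # population standard deviation
    }

before = summarize(first_nine)
after = summarize(with_outlier)
for key in before:
    print(f"{key:8} before: {before[key]}   after: {after[key]}")
```

Comparing the two columns shows which measures move when the 9 is included and which stay put.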


Clearing the list memory

Press . The word EDIT should be highlighted (if not, arrow over to it). You should see five choices;

the fourth is 4:ClrList. Press . The screen will now say ClrList. Specify lists one and two, by pressing

(you should see L1 above the key), then (you should see L2 above the key).

The screen will now say ClrList L1, L2. Press . Calculator will say Done signifying a clear memory.

Entering data for 1-Variable statistics

Press . Press (you should see 1:Edit on the screen). You should see 3 columns: L1, L2, L3.

The cursor should be at L1 (if not, arrow over to it). Type in the first number, then . Type in the

second number, then . When finished, press (you should see the word QUIT above the key).

Calculating 1-Variable statistics

Press . Use the blue to move the highlighted bar over the CALC menu. Choose the 1-Var stats

option (that is, press ). You'll see the words 1-Var Stats on the screen. Press (you should

see L1 above the key). You'll see the words 1-Var Stats L1 on the screen. Press . The mean is the top value on the screen.

Clearing the list memory

To clear the entire memory, press (says EDIT on the screen above the key), then (says CLRxy on the screen above the key).

Entering data for 1-Variable statistics

Press to enter statistics editing mode. We will use the name xStat for our list. So press

. Enter the first number, then press (twice-to get past the y-values). Enter the next

number, then . Continue until all the data has been entered. As your final step, press

to signal the end of the data set.

Calculating 1-Variable statistics

Press . Hit (twice). Press the 1-VAR option ( ).The population standard deviation is listed as σx.


Clearing the memory

To clear the entire memory, press (the word MEM is above the key) (for RESET)

(for stored memory; it says MEM). A new screen will ask you Are you sure? Press (for Yes).

Entering data for 1-Variable statistics

Press to enter statistics editing mode. Type in the first number. Press . Type in

the second number. Press . Continue until all the data has been entered. Then press .

Calculating 1-Variable statistics

Press and/or until the screen is empty. Press (CALC on screen)

(OneVa on screen) (LIST) (NAMES on screen) (xStat on screen) ,

and lastly . On the screen you will see a list of statistics. x is the first thing on the list.


11.3 USE NORMAL DISTRIBUTIONS

The standard deviation can help you find the story behind the data. To understand this concept, it helps to learn about what statisticians call a normal distribution of data. A normal distribution means that most of the examples in a set of data are close to the "average," or mean, while relatively few examples tend to one extreme or the other.

Let's say you collected data about a person's daily calorie intake from a sample of people. You use your data to complete a bar graph and notice that the tops of the bars could be smoothed into a bit of a bell shape. You realize that the numbers for people's typical calorie consumption will probably turn out to be normally distributed: most people take in around an average amount of calories, very few people take in a lot less than that, and very few people take in a lot more than that.

When you think about it, that's just common sense. Not that many people are getting by on a single serving of kelp and rice, or on eight meals of steak and milkshakes. Most people consume somewhere in between.


Your normally distributed data looks something like this. Your mean (what the average person consumes), represented by the y-axis, is 1800 calories, and your standard deviation is 300.

You'll notice that the curve is separated into 3 standard deviations on each side of the mean. The ones on the right are for values (daily calories) above, or greater than, the mean (1800), and the ones on the left are for values below, or less than, the mean (1800).

Understanding one standard deviation: One standard deviation away from the mean is found by adding 300 to 1800 and by subtracting 300 from 1800. In a normal distribution, one standard deviation in either direction accounts for about 68% of the information surveyed. In this case, it means that about 68% of the people consume between 1500 and 2100 calories per day. Specifically, 34% consume between 1800 and 2100 calories, and 34% consume between 1500 and 1800 calories.

Understanding two standard deviations: Two standard deviations from the mean are found by adding 300 + another 300 to 1800 and by subtracting 300 + another 300 from 1800. Two standard deviations in either direction represent more information and, thus, a higher percentage. In a normal distribution, two s.d.s account for about 95% of the information, meaning that about 95% of the people surveyed consume between 1200 and 2400 calories per day. Specifically, 47.5% consume between 1800 and 2400 calories, and 47.5% consume between 1200 and 1800 calories.

Understanding three standard deviations: Three standard deviations from the mean are found by adding 300 + 300 + another 300 to 1800 and by subtracting 300 + 300 + another 300 from 1800. Three standard deviations in either direction represent even more information and, thus, a higher percentage. In a normal distribution, three s.d.s account for about 99.7% of the information, meaning that about 99.7% of the people surveyed consume between 900 and 2700 calories per day. Specifically, 49.85% consume between 1800 and 2700 calories, and 49.85% consume between 900 and 1800 calories.

Labels from the normal curve figure:

Mean = 1800 calories

1 s.d. above the mean = 1800 + 300 = 2100 calories

1 s.d. below the mean = 1800 − 300 = 1500 calories

2 s.d. above the mean = 1800 + 300 + 300 = 2400 calories

2 s.d. below the mean = 1800 − 300 − 300 = 1200 calories

3 s.d. above the mean = 1800 + 300 + 300 + 300 = 2700 calories

3 s.d. below the mean = 1800 − 300 − 300 − 300 = 900 calories
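The 68%, 95%, and 99.7% figures are rounded values for a normal curve. As a rough check, the sketch below uses `statistics.NormalDist` (Python 3.8 or later) with the calorie example's mean and standard deviation; the names `calories` and `share_within` are only for illustration.

```python
from statistics import NormalDist

mean, sd = 1800, 300                       # calorie example: mean 1800, s.d. 300
calories = NormalDist(mu=mean, sigma=sd)

def share_within(k):
    """Return (low, high, share of data) for k standard deviations around the mean."""
    low, high = mean - k * sd, mean + k * sd
    share = calories.cdf(high) - calories.cdf(low)   # area under the curve between low and high
    return low, high, share

for k in (1, 2, 3):
    low, high, share = share_within(k)
    print(f"within {k} s.d.: {low} to {high} calories, about {share:.1%} of people")
```

The computed shares come out slightly more precise than the rounded 68-95-99.7 values used in the notes.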


Example 1 (Math Scores): The math scores of the 2004 SAT exam are normally distributed with a mean of 518 and a standard deviation of 114.

a. About what percent of the test-takers have scores between 518 and 746?

b. About what percent of the test-takers have scores less than 404?

a. The scores of 518 and 746 represent __________ standard deviations to the __________ of the mean. So, the percent of test-takers with scores between 518 and 746 is __________% + __________% = __________%.

b. A score of 404 is one standard deviation to the left of the mean. So, the percent of scores less than 404 is __________% + __________% + __________% = __________%.
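The "add up the pieces of the curve" method used in this example can be sketched in code. The band percentages below are the usual rounded values implied by the 68-95-99.7 rule (34%, 13.5%, 2.35%, and 0.15% in each tail); the names `BANDS`, `TAIL`, `percent_between`, `percent_below`, and `z` are hypothetical.

```python
# Percent of data in each one-standard-deviation band of a normal curve.
# Keys are (lower, upper) edges measured in standard deviations from the mean.
BANDS = {
    (-3, -2): 2.35, (-2, -1): 13.5, (-1, 0): 34.0,
    (0, 1): 34.0, (1, 2): 13.5, (2, 3): 2.35,
}
TAIL = 0.15   # percent lying more than 3 s.d. from the mean, on each side

def percent_between(low_sd, high_sd):
    """Add up the bands that lie between two whole-number s.d. marks."""
    return sum(p for (a, b), p in BANDS.items() if low_sd <= a and b <= high_sd)

def percent_below(sd_mark):
    """Add the left tail to every band to the left of the given s.d. mark."""
    return TAIL + percent_between(-3, sd_mark)

mean, sd = 518, 114                       # SAT math example

def z(score):
    """How many standard deviations a score is from the mean."""
    return (score - mean) / sd

part_a = percent_between(z(518), z(746))  # scores between 518 and 746
part_b = percent_below(z(404))            # scores less than 404
print(f"a. about {part_a:.1f}%   b. about {part_b:.1f}%")
```

This only works when the scores land exactly on whole standard deviations, as they do in this example.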

Example 2: A normal distribution has a mean of 63.7 and a standard deviation of 2.9. Find the probability that a randomly selected X-value from the distribution is in the given interval.

a. Between 57.9 and 66.6

b. At least 66.6

a.

b.

Example 3: A normal distribution has mean x̄ and standard deviation σ. For a randomly selected x-value from the data, find

and

a. The probability that a randomly selected x-value lies between __________ and _______________ is the shaded area under the

normal curve. Therefore:

= ______________ + ______________ +______________=


b. The probability that a randomly selected x-value is less _______________ is the shaded area under the normal curve.

Therefore:

= ______________ + ______________ +______________ =

Z-score: Given a data value (x), we find its z-value by $z = \frac{x - \bar{x}}{\sigma}$. We then use this z-value and the provided chart to find the probability that the value is less than or equal to that amount.

Height: A survey of a group of women found that the height of the women is normally distributed with a mean height of 64.5 inches and a standard deviation of 2.5 inches. Find the probability that a woman is at most 58 inches tall.
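The z-score calculation for the height problem can be sketched as follows. Since the provided chart is not reproduced here, this sketch assumes `NormalDist().cdf` from Python's standard library as a stand-in for the standard normal table; the function name `z_score` is made up for the illustration.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """z = (x - mean) / sd: how many standard deviations x is from the mean."""
    return (x - mean) / sd

# Height example: mean 64.5 inches, standard deviation 2.5 inches
z = z_score(58, mean=64.5, sd=2.5)

# The standard normal curve (mean 0, s.d. 1) plays the role of the z-table:
# cdf(z) is the probability that a value is less than or equal to z.
probability = NormalDist().cdf(z)

print(f"z = {z:.1f}, P(height <= 58) is about {probability:.4f}")
```

A printed z-table may round slightly differently from the computed value.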


11.4 SELECT AND DRAW CONCLUSIONS FROM SAMPLES

POPULATION

A group of people or objects that you want information about

SAMPLE

When it is too difficult, time-consuming, or expensive to survey everyone in a population, information is gathered from a sample, or subset of the population being studied.

UNBIASED SAMPLE

In order to draw accurate conclusions about a population, you should select an unbiased sample. An unbiased sample is representative of the entire population you want information about.

Although there are many ways to sample a population, a random sample is preferred because it is most likely to represent the population in an unbiased way.

BIASED SAMPLE

A biased sample is one that over-represents or under-represents certain parts or groups of the population

Example: A teacher wants to survey everyone at her school about the quality of the school lunches. Identify the type of sample described and tell if the sample is biased.

1. The teacher surveys every 7th student that goes through the lunch line.

2. From a random name lottery that includes every student's and teacher's name in the school, the teacher randomly selects 150 students and teachers to survey.

3. The teacher walks into the lunchroom and surveys the first 25 people that she sees.

1. Type of Sample: ______________________________ Biased or Unbiased: _______________________________ Why?

2. Type of Sample: ______________________________ Biased or Unbiased: _______________________________ Why?

3. Type of Sample: ______________________________ Biased or Unbiased: _______________________________ Why?
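The mechanics of the three selection methods in the lunch survey example can be imitated in a few lines. This is only an illustration: the roster of 700 made-up names, the fixed random seed, and the variable names are all assumptions, and the code does not decide whether a sample is biased.

```python
import random

# Hypothetical roster, listed in the order people arrive in the lunch line
roster = [f"person_{i:03d}" for i in range(1, 701)]

# 1. Every 7th person in line: start at the 7th name and step by 7
every_seventh = roster[6::7]

# 2. Random name lottery: 150 names drawn so each name is equally likely
rng = random.Random(0)                  # fixed seed so the draw can be repeated
lottery = rng.sample(roster, 150)

# 3. The first 25 people seen: simply the front of the list
first_seen = roster[:25]

print(len(every_seventh), every_seventh[:3])
print(len(lottery), lottery[:3])
print(len(first_seen), first_seen[:3])
```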


Example: A local politician wants to survey all of his constituents.

4. He calls the constituents that are members of his political party and asks if they will complete the survey. He then mails them the survey, which they mail back to him for use in his study.

4. Type of Sample: ______________________________ Biased or Unbiased: __________________________ Why?

SAMPLE SIZE

When conducting a survey, you need to make sure the size of your sample is large enough so that it accurately represents the population. As the sample size increases, your margin of error decreases.

MARGIN OF ERROR

The number that gives a limit on how much the responses of the sample would differ from the responses of the population. For example, if 30% of the people in a poll prefer vanilla ice cream over chocolate and the margin of error is ±2.6%, then it is likely that between 27.4% and 32.6% of the actual population prefers vanilla ice cream.

When a random sample of size n is taken from a large population, the margin of error is approximated by:

Margin of error = $\pm \frac{1}{\sqrt{n}}$

Example: In a survey of 1432 people, 26% said that they read the newspaper every day. (a) What is the margin of error for the survey? (b) Give an interval that is likely to contain the exact percent of all people who read the newspaper every day.

a.

b.

Example: In a poll about which movie channel its customers prefer to watch, a cable company wants a margin of error to be ±3%. How many people would they need to survey?

Example: A group of students survey the local community about their favorite beverage. How many people did they survey if the margin of error is ±7%?
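The margin of error examples above all use the same approximation, ±1/√n, either forward (from n to the margin) or backward (from the margin to n). A minimal sketch, with function names chosen just for this illustration:

```python
import math

def margin_of_error(n):
    """Approximate margin of error (as a decimal) for a random sample of size n."""
    return 1 / math.sqrt(n)

def sample_size_for(margin):
    """Solve margin = 1 / sqrt(n) for n, giving n = 1 / margin**2."""
    return 1 / margin ** 2

# Newspaper survey: 1432 people, 26% read the newspaper every day
moe = margin_of_error(1432)
low, high = 0.26 - moe, 0.26 + moe
print(f"margin of error: about +/-{moe:.1%}")
print(f"likely interval: {low:.1%} to {high:.1%}")

# Working backward from a desired margin of error to a sample size
print(f"for +/-3%: n is about {sample_size_for(0.03):.0f}")
print(f"for +/-7%: n is about {sample_size_for(0.07):.0f}")
```

Because the formula is itself an approximation, the sample sizes are best read as "about" values.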


11.5 Choose the Best Model for Two-Variable Data

Types of Models

Linear

$y = ax + b$

• when the points appear to be increasing or decreasing at a constant rate (a), or following the same pattern over and over again

• a would be your slope and b would be the y-intercept

Quadratic

$y = ax^2 + bx + c$

• when the points appear to make either a U shape ∪ or a horseshoe shape ∩

• remember that quadratics are symmetric about the axis of symmetry, so look for the points to mirror one another after the graph hits its maximum or minimum (vertex)

Cubic

$y = ax^3 + bx^2 + cx + d$

• cubic functions can have up to two turning points, even though sometimes those turning points will not be as well defined as in the graphs shown

Exponential Growth or Decay

$y = ab^x$

• when the points decrease rapidly and then appear to level off and get closer together (decay; RIGHT graph), OR

• when the points start off close together and then begin to gain value very rapidly (growth; LEFT graph)


THE GOAL: Given a table of information, determine whether it can be best represented with a linear, quadratic, cubic, exponential, or power equation. Then, find that equation and verify that it was the best model for the data. To figure this out, we will:

1. Make a scatter plot on the calculator.
2. Assess the plot. Using our above descriptions, we will look at the data points and determine which of the above patterns they follow.
3. Use the regression applications on the calculator to determine the equation for our data.
4. Graph the equation to make sure that it is appropriate for our model.

Example 1: The table shows the secretaries’ salaries y (in dollars) for a certain bank, where x is the number of years of experience and y is the salary. Use a graphing calculator to find a model for the data.

x 1 2 3 4 5 6 7

y 30,624 32,436 34,167 35,989 37,684 39,311 41,098

1. Make a scatter plot. The points lie approximately _________________________________.
2. Use the _____________________ regression feature to find an equation of the model.
3. Graph the model along with the data to verify that the model fits the data well.

A model for the data is y = _________________________
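If you want to check a calculator result for the salary data, a least-squares line can be computed by hand or with a short script. The sketch below assumes the points are roughly linear and fits y = ax + b using the standard least-squares formulas; `linear_fit` is a hypothetical helper, not a calculator command.

```python
from statistics import mean

years = [1, 2, 3, 4, 5, 6, 7]
salary = [30624, 32436, 34167, 35989, 37684, 39311, 41098]

def linear_fit(xs, ys):
    """Least-squares line y = a*x + b; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: how x and y vary together, divided by how much x varies
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx
    b = y_bar - a * x_bar        # the line passes through the point (x_bar, y_bar)
    return a, b

a, b = linear_fit(years, salary)
print(f"y = {a:.1f}x + {b:.1f}")
```

Here a is the slope (the typical raise per year of experience) and b is the y-intercept.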

Example 2: An environmental group observes a deer population in a park where hunting has been banned. The table shows the population y counted x years after the ban began. Use a graphing calculator to find a model for the data.

x 0 5 10 15 20

y 500 729 1271 2206 3765

1. Make a scatter plot. The points are level at first and then begin to __________ rapidly.
2. Use the _____________________ regression feature to find an equation of the model.
3. Graph the model along with the data to verify that the model fits the data well.

A model for the data is y = _________________________
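For the deer population data, one common way to fit y = ab^x is to fit a straight line to the natural logs of the y-values and then convert back. The sketch below takes that approach as an assumption; a calculator's regression feature may report slightly different values depending on its method. The name `exponential_fit` is made up for this illustration.

```python
import math

years = [0, 5, 10, 15, 20]
deer = [500, 729, 1271, 2206, 3765]

def exponential_fit(xs, ys):
    """Fit y = a * b**x by a least-squares line through (x, ln y); returns (a, b)."""
    logs = [math.log(y) for y in ys]       # ln y = ln a + x * ln b is a line in x
    n = len(xs)
    x_bar = sum(xs) / n
    l_bar = sum(logs) / n
    s_xl = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xl / s_xx                    # estimate of ln b
    intercept = l_bar - slope * x_bar      # estimate of ln a
    return math.exp(intercept), math.exp(slope)

a, b = exponential_fit(years, deer)
print(f"y = {a:.0f} * ({b:.3f})^x")
```

A value of b greater than 1 indicates growth; a value between 0 and 1 would indicate decay.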


Example 3: A manager at a local amusement park kept a record of the number of people to ride the most popular roller coaster at the park. The table shows the number of people y that rode the roller coaster x hours after the park had opened. Use a graphing calculator to find a model for the data.

x 0 2 4 6 8 10 12

y 85 163 282 341 398 381 304

1. Make a scatter plot.

2. Use the _____________________ regression feature to find an equation of the model.

3. Graph the model along with the data to verify that the model fits the data well.

A model for the data is y = _________________________

Example 4:

x -5 -4 -3 -2 -1 1 2

y -20 0 3 0 -4 0 18

1. Make a scatter plot. The points appear to __________________________________________.

2. Use the _____________________ regression feature to find an equation of the model.

3. Graph the model along with the data to verify that the model fits the data well.

A model for the data is y = _________________________
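Quadratic and cubic regression can both be sketched with one least-squares polynomial fit. The code below assumes a quadratic model for the roller coaster data (Example 3) and a cubic model for Example 4; that choice is a judgment from the shape of the points, not something the worksheet states. The function `polyfit` is written from scratch here (normal equations solved by Gaussian elimination) and is not a library call.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial of the given degree; coefficients, highest power first."""
    m = degree + 1
    # Normal equations: A[r][c] = sum of x^(r+c), rhs[r] = sum of y * x^r
    A = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    rhs = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution, lowest-power coefficient stored first
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        known = sum(A[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (rhs[r] - known) / A[r][r]
    return coeffs[::-1]

# Example 3: riders y, x hours after the park opened (quadratic assumed)
hours = [0, 2, 4, 6, 8, 10, 12]
riders = [85, 163, 282, 341, 398, 381, 304]
quad = polyfit(hours, riders, 2)
print("quadratic a, b, c:", [round(c, 3) for c in quad])

# Example 4 (cubic assumed)
x4 = [-5, -4, -3, -2, -1, 1, 2]
y4 = [-20, 0, 3, 0, -4, 0, 18]
cubic = polyfit(x4, y4, 3)
print("cubic a, b, c, d:", [round(c, 3) for c in cubic])
```

The sign of the leading coefficient is a quick sanity check: a graph that opens downward should give a negative a in the quadratic model.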


TI 83-84

Linear Regression

1. Press STAT.
2. Arrow over to CALC.
3. Choose #4 LinReg(ax+b).
4. You should see LinReg(ax+b) on the screen.
5. Press ENTER. The screen will show you the coefficients and constant of the linear equation.

Quadratic Regression

1. Press STAT.
2. Arrow over to CALC.
3. Choose #5 QuadReg.
4. You should see QuadReg on the screen.
5. Press ENTER. The screen will show you the coefficients and constant of the quadratic equation.

Cubic Regression

1. Press STAT.
2. Arrow over to CALC.
3. Choose #6 CubicReg.
4. You should see CubicReg on the screen.
5. Press ENTER. The screen will show you the coefficients and constant of the cubic equation.

Exponential Regression

1. Press STAT.
2. Arrow over to CALC.
3. Choose #0 ExpReg.
4. You should see ExpReg on the screen.
5. Press ENTER. The screen will show you the coefficients and constant of the exponential equation.