§ 14.3 Numerical Summaries of Data

§ 14.3 Numerical Summaries of Data. Numerical Summaries of a Data Set In the last section we looked at ways to graphically represent a data set-- today



Numerical Summaries of a Data Set

In the last section we looked at ways to graphically represent a data set--today we will look at numerical ways to summarize similar information.

There are two major types of numerical summary:
1. Measures of location.
2. Measures of spread.


Measures of location: average/mean, median, quartiles.
Measures of spread: range, interquartile range, standard deviation.

The Average / Mean

The average or mean of a data set of size N is found by adding the numbers and dividing by N.

Or more formally, if the data set is $\{x_1, x_2, x_3, \ldots, x_N\}$, then the mean is given by:

$$\frac{x_1 + x_2 + x_3 + \cdots + x_N}{N}$$
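As a quick check of the formula, here is a minimal Python sketch. The function name `mean` and the sample values are illustrative assumptions, not part of the original slides.

```python
def mean(data):
    """Add the numbers and divide by N, the size of the data set."""
    if not data:
        raise ValueError("the data set must contain at least one number")
    return sum(data) / len(data)


# Illustrative data set (an assumption, chosen for easy arithmetic).
print(mean([2, 4, 6, 8]))  # (2 + 4 + 6 + 8) / 4 = 5.0
```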

The Average / Mean

What about when we are given a frequency table?

Let’s look at the test scores from yesterday:

Score      4  24  28  32  36  40  44  48  56  60  64  72  76  96
Frequency  1   1   2   6  10  16  13   9   1   2   1   8   4   1

The Average / Mean From a Frequency Table

Data       x1  x2  . . .  xk
Frequency  f1  f2  . . .  fk

Step 1: Calculate the total of the data.
Total = x1 * f1 + x2 * f2 + x3 * f3 + . . . + xk * fk

Step 2: Calculate N.
N = f1 + f2 + f3 + . . . + fk

Step 3: Calculate the average.
Average = Total / N

Entering Data and Finding the Mean on the TI-83:

1. Hit [Stat].
2. Select “1: Edit…”.
3. Enter data into L1. If you are working from a frequency table, enter the corresponding frequencies into L2.
4. Go to the “List” menu ([2nd], [Stat]) and move right to the “MATH” submenu.
5. Select “3: mean(”.
6. You should now be on the ‘main’ screen. Proceed as follows:
   (a) If you are working from just a list of data, type “L1” ([2nd], [1]), close the parentheses and hit [Enter].
   (b) If you are working from a frequency table, type “L1” followed by “,” and “L2” ([2nd], [2]). Then close the parentheses and hit [Enter].
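The three steps for a frequency table can also be sketched in Python. This is only an illustration: the function name is hypothetical, and the score/frequency pairs are the test-score table as best it can be read from this transcript (they give N = 75 and agree with the mean of 46.61 used later in these slides).

```python
def mean_from_frequency_table(values, freqs):
    """Mean of data given as (value, frequency) pairs."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    total = sum(x * f for x, f in zip(values, freqs))  # Step 1: total of the data
    n = sum(freqs)                                     # Step 2: N
    if n == 0:
        raise ValueError("the frequencies must not all be zero")
    return total / n                                   # Step 3: Total / N


# Test-score table as read from the transcript (an assumption about the pairing).
scores = [4, 24, 28, 32, 36, 40, 44, 48, 56, 60, 64, 72, 76, 96]
freqs = [1, 1, 2, 6, 10, 16, 13, 9, 1, 2, 1, 8, 4, 1]

print(round(mean_from_frequency_table(scores, freqs), 2))  # 46.61
```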

Example: Average Salary

The average salary at a local computer manufacturer with 50 employees is $42,000.

The owner draws a yearly salary of $800,000.

What is the average salary of the other 49 employees?
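One way to approach this is to recover the total payroll from the average, remove the owner's salary, and average what is left over the remaining 49 employees. A short sketch of that arithmetic (variable names are illustrative):

```python
employees = 50
average_salary = 42_000
owner_salary = 800_000

total_payroll = employees * average_salary       # 50 * 42,000 = 2,100,000
others_total = total_payroll - owner_salary      # 2,100,000 - 800,000 = 1,300,000
others_average = others_total / (employees - 1)  # spread over the other 49

print(round(others_average, 2))  # 26530.61
```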

Example: 105 Exam Scores

Suppose you have averaged a 132 out of 150 on the first 3 exams in Math 105. What score would you need on the fourth exam to have an average of 135?
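The same idea works in reverse: an average fixes the total number of points, so the missing score is the difference between the total needed and the total earned so far. A sketch of the arithmetic (variable names are illustrative):

```python
exams_so_far = 3
current_average = 132
target_average = 135

points_so_far = exams_so_far * current_average       # 3 * 132 = 396
points_needed = (exams_so_far + 1) * target_average  # 4 * 135 = 540
fourth_exam = points_needed - points_so_far

print(fourth_exam)  # 144
```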

Percentiles

The pth percentile of a data set is the value such that p percent of the numbers fall at or below the value.

The rest of the data falls at or above the value.

We will call p percent of N the locator, and write it as L.

Example: Height

Sorting Data on the TI-83:
1. Enter data into L1 as before.
2. Hit [Stat].
3. Select “2: SortA(”.
4. You should now be on the ‘main’ screen. Hit L1 ([2nd], [1]).
5. Close the parentheses and hit [Enter].

Finding the pth Percentile

Step 1: Sort the original data set by size. (Suppose {d1, d2, d3, . . . , dN} is the sorted set.)

Step 2: Compute the value of the locator.
L = (p / 100)(N)

Step 3: The pth percentile is:
(a) the average of d_L and d_(L+1) if L is a whole number;
(b) d_(L+) if L is not a whole number, where L+ is L rounded up.
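The locator rule above can be sketched in Python as follows. The function name and the sample data are illustrative assumptions; restricting p to values strictly between 0 and 100 is also an assumption, made so that d_(L+1) always exists. `Fraction` is used so the whole-number test on L is exact.

```python
import math
from fractions import Fraction


def percentile(data, p):
    """pth percentile using the locator L = (p / 100) * N."""
    if not data:
        raise ValueError("the data set must not be empty")
    if not 0 < p < 100:
        raise ValueError("this sketch assumes 0 < p < 100")
    d = sorted(data)                   # Step 1: sort by size
    L = Fraction(p) * len(d) / 100     # Step 2: the locator, kept exact
    if L.denominator == 1:             # Step 3(a): L is a whole number
        k = int(L)
        return (d[k - 1] + d[k]) / 2   # average of d_L and d_(L+1), 1-based
    return d[math.ceil(L) - 1]         # Step 3(b): d_(L+), L rounded up


sample = [3, 7, 8, 12, 15, 20, 21, 30]  # illustrative data, N = 8
print(percentile(sample, 25))  # L = 2 is whole: (7 + 8) / 2 = 7.5
print(percentile(sample, 30))  # L = 2.4, round up to 3: third value = 8
```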


Percentiles: The Median and Quartiles

The 50th percentile, called the median, is the percentile that is most commonly used. The median will be written M.

The other two commonly used percentiles are the quartiles:

The first quartile, written as Q1, is the 25th percentile.

The third quartile, denoted Q3, is the 75th percentile.


Example: Let’s examine the test scores again. . .

Find the quartiles and the median.
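With a frequency table there is no need to write out all N values: running totals of the frequencies show which score sits at a given position. The sketch below applies the locator rule that way. The helper names are hypothetical, and the score/frequency pairs are the test-score table as best it can be read from this transcript.

```python
import math
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate


def value_at(position, values, cumulative):
    """Data value at a 1-based position in the sorted data."""
    return values[bisect_left(cumulative, position)]


def percentile_from_table(values, freqs, p):
    """pth percentile (0 < p < 100) from values sorted in increasing order."""
    cumulative = list(accumulate(freqs))   # running totals of the frequencies
    n = cumulative[-1]
    L = Fraction(p) * n / 100              # the locator
    if L.denominator == 1:                 # whole number: average two positions
        k = int(L)
        return (value_at(k, values, cumulative)
                + value_at(k + 1, values, cumulative)) / 2
    return value_at(math.ceil(L), values, cumulative)


scores = [4, 24, 28, 32, 36, 40, 44, 48, 56, 60, 64, 72, 76, 96]
freqs = [1, 1, 2, 6, 10, 16, 13, 9, 1, 2, 1, 8, 4, 1]

q1 = percentile_from_table(scores, freqs, 25)  # L = 18.75, so the 19th score
m = percentile_from_table(scores, freqs, 50)   # L = 37.5, so the 38th score
q3 = percentile_from_table(scores, freqs, 75)  # L = 56.25, so the 57th score
print(q1, m, q3)  # 36 44 48
```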

The Five-Number Summary

One way to give a nice profile of a data set is the “five-number summary,” which consists of:
1. The lowest value, called the Min.
2. The first quartile, Q1.
3. The median, M.
4. The third quartile, Q3.
5. The highest value, called the Max.
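A five-number summary can be assembled from the sorted data and the percentile rule given earlier. The sketch below is one possible arrangement; the function names are hypothetical, and the data is the test-score table as read from this transcript, expanded into a full list of 75 scores.

```python
import math
from fractions import Fraction


def percentile(sorted_data, p):
    """pth percentile (0 < p < 100) of already-sorted data, locator rule."""
    L = Fraction(p) * len(sorted_data) / 100
    if L.denominator == 1:
        k = int(L)
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    return sorted_data[math.ceil(L) - 1]


def five_number_summary(data):
    d = sorted(data)
    return {
        "Min": d[0],
        "Q1": percentile(d, 25),
        "M": percentile(d, 50),
        "Q3": percentile(d, 75),
        "Max": d[-1],
    }


scores = [4, 24, 28, 32, 36, 40, 44, 48, 56, 60, 64, 72, 76, 96]
freqs = [1, 1, 2, 6, 10, 16, 13, 9, 1, 2, 1, 8, 4, 1]
all_scores = [x for x, f in zip(scores, freqs) for _ in range(f)]

summary = five_number_summary(all_scores)
print(summary)  # {'Min': 4, 'Q1': 36, 'M': 44, 'Q3': 48, 'Max': 96}
```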


Example: The Five-Number Summary for our test score example would look like this:

Min = 4, Q1 = 36, M = 44, Q3 = 48, Max = 96

The Five-Number Summary: Box Plots

We can also represent the Five-Number Summary graphically in what is called a box plot or a box-and-whiskers plot.

[Figure: box plot with Min, Q1, M, Q3, and Max labeled]


Example: Here is the box plot for our test score example:

[Figure: box plot with Min = 4, Q1 = 36, M = 44, Q3 = 48, Max = 96]

§ 14.4 Measures of Spread


Example - Find the average and median of the following data sets:

• Set 1 = {45, 46, 47, 48, 49, 51, 52, 53, 54, 55}

• Set 2 = {1, 12, 20, 31, 41, 59, 70, 78, 89, 99}
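A quick way to check this example is with Python's standard `statistics` module (for an even number of values, `statistics.median` averages the two middle values, which agrees with the percentile rule from § 14.3 when L is a whole number). Both sets come out with a mean of 50 and a median of 50, even though Set 2 is far more spread out.

```python
from statistics import mean, median

set_1 = [45, 46, 47, 48, 49, 51, 52, 53, 54, 55]
set_2 = [1, 12, 20, 31, 41, 59, 70, 78, 89, 99]

for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    # Same center for both sets; only the spread differs.
    print(name, "mean =", mean(data), "median =", median(data))
```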

The Range

One way to measure the spread of data is to examine the range, given by

R = Max - Min

The problem with using the range is that outliers can severely affect it.


Example: Looking again at our ‘test score’ example. . .

We see that the range with the outliers (4 and 96) would be R = 96 - 4 = 92.

However, without those pieces of data we would have R = 76 - 24 = 52.

The Interquartile Range

In order to eliminate the problems caused by outliers, we could make use of the interquartile range--the difference between the third and first quartile:

IQR = Q3 - Q1

This measure tells us how spread out the middle 50% of the data is.


Example: Your instructor didn’t feel like making a different example. . .

The IQR for this set of data is:

IQR = Q3 - Q1 = 48 - 36 = 12
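To see how differently the two measures react to outliers, the sketch below computes R and the IQR for the test scores with and without the scores 4 and 96. The helper names are hypothetical, the quartiles use the locator rule from § 14.3, and the data is the test-score table as read from this transcript.

```python
import math
from fractions import Fraction


def percentile(sorted_data, p):
    """pth percentile (0 < p < 100) of already-sorted data, locator rule."""
    L = Fraction(p) * len(sorted_data) / 100
    if L.denominator == 1:
        k = int(L)
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    return sorted_data[math.ceil(L) - 1]


def range_and_iqr(data):
    d = sorted(data)
    r = d[-1] - d[0]                            # R = Max - Min
    iqr = percentile(d, 75) - percentile(d, 25)  # IQR = Q3 - Q1
    return r, iqr


scores = [4, 24, 28, 32, 36, 40, 44, 48, 56, 60, 64, 72, 76, 96]
freqs = [1, 1, 2, 6, 10, 16, 13, 9, 1, 2, 1, 8, 4, 1]
all_scores = [x for x, f in zip(scores, freqs) for _ in range(f)]
without_outliers = [x for x in all_scores if x not in (4, 96)]

print(range_and_iqr(all_scores))        # (92, 12)
print(range_and_iqr(without_outliers))  # (52, 12)
```

Dropping the two outliers cuts the range from 92 to 52, while the IQR stays at 12.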


The Standard Deviation

The idea: Measure how spread out your data set is by examining how far each piece of information is from some fixed reference point.

The reference point we will use is the mean (average).


We could try to average the Deviations from the Mean:

(Data value - Mean)

Example: Once again, the test score data. . .

Score (x)   Frequency   (x - 46.61)
    4           1         -42.61
   24           1         -22.61
   28           2         -18.61
   32           6         -14.61
   36          10         -10.61
   40          16          -6.61
   44          13          -2.61
   48           9           1.39
   56           1           9.39
   60           2          13.39
   64           1          17.39
   72           8          25.39
   76           4          29.39
   96           1          49.39

However, negative deviations and positive deviations will cancel each other out. In fact (assuming we don’t round off any of our figures), the average of the deviations from the mean will always be 0!
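The cancellation can be checked with exact arithmetic. In the sketch below, `Fraction` keeps the mean unrounded, so the deviations (each counted as many times as its frequency) sum to exactly 0; using the rounded mean 46.61 instead leaves a small leftover. The variable names are illustrative, and the data is the test-score table as read from this transcript.

```python
from fractions import Fraction

scores = [4, 24, 28, 32, 36, 40, 44, 48, 56, 60, 64, 72, 76, 96]
freqs = [1, 1, 2, 6, 10, 16, 13, 9, 1, 2, 1, 8, 4, 1]

n = sum(freqs)
exact_mean = Fraction(sum(x * f for x, f in zip(scores, freqs)), n)  # 3496/75

# Sum of (data value - mean), weighting each deviation by its frequency.
exact_total = sum((x - exact_mean) * f for x, f in zip(scores, freqs))
print(exact_total)  # 0

# With the mean rounded to 46.61 the deviations no longer cancel exactly.
rounded_mean = Fraction("46.61")
rounded_total = sum((x - rounded_mean) * f for x, f in zip(scores, freqs))
print(rounded_total)  # 1/4
```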


What would happen if we squared the deviations from the mean?

The squared deviations are always non-negative, so there would be no canceling.

The average of these squared deviations is called the variance, V.


Unfortunately, there is a problem with using the variance as well: the units of measure. For instance, if we were studying people’s height in inches (in), the variance would appear in units of in².

The solution to our dilemma is simple: we will just take the square root of the variance to get what is called the standard deviation, σ.


Finding The Standard Deviation

Step 1: Find the average/mean of the data set. Call it A.

Step 2: For each number x in the data set find the deviation from the mean, x - A.

Step 3: Square each of the deviations found in Step 2.

Step 4: Find the average of the squared deviations found in step 3. This is the variance, V.

Step 5: Take the square root of the variance. This is the standard deviation, σ.
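The five steps translate almost line for line into Python. This is a minimal sketch with a hypothetical function name and illustrative data. Note that it averages the squared deviations over N, as the steps describe (the standard library's `statistics.pstdev` uses the same convention), rather than dividing by N - 1.

```python
import math


def standard_deviation(data):
    """Standard deviation following Steps 1-5 (divides by N)."""
    n = len(data)
    if n == 0:
        raise ValueError("the data set must not be empty")
    a = sum(data) / n                     # Step 1: the mean, A
    deviations = [x - a for x in data]    # Step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]  # Step 3: square each deviation
    variance = sum(squared) / n           # Step 4: the variance, V
    return math.sqrt(variance)            # Step 5: the square root of V


# Illustrative data: mean 5, squared deviations sum to 32, V = 32 / 8 = 4.
print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```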


Another way to find the Standard Deviation by hand is to use the following formula:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - A)^2}{N}}$$

Finding all of the information from 14.2-14.3 on the TI-83:

1. Enter data as shown previously. Quit to the main screen.
2. Hit [Stat].
3. Move right to the “CALC” menu.
4. Select “1-Var Stats”.
5. Now on the main screen, type “L1”. (If you are using data from a frequency table also type “,” and “L2”.) Hit [Enter].
6. Interpret the information as follows:
   x̄ is the mean/average, A;
   σx is the standard deviation;
   n is the size of your data set.
   If you arrow down, the Min, Max, Median and quartiles should all be listed.


Example:

Find the standard deviation for the following data set.

{1, 6, 14, 19}
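One way to work this example by hand is to follow the five steps above, with A for the mean, V for the variance, and σ for the standard deviation. The final value is rounded to two decimal places.

```latex
\begin{aligned}
A &= \frac{1 + 6 + 14 + 19}{4} = \frac{40}{4} = 10 \\
x - A &: \quad -9,\ -4,\ 4,\ 9 \\
(x - A)^2 &: \quad 81,\ 16,\ 16,\ 81 \\
V &= \frac{81 + 16 + 16 + 81}{4} = \frac{194}{4} = 48.5 \\
\sigma &= \sqrt{48.5} \approx 6.96
\end{aligned}
```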
