Measures of Centrality and Variability

STA 220 - Lecture #4

Methods to take large amounts of data and present it in a concise form
• Want to present the height of females and males in STA 220
• Could measure everyone and graph the results
• More interested in a single value that describes the most likely representation of the height of the students in the class. This is called a measure of centrality.

Once you have your measure of centrality, you may want or need to know how much the data varies
• Is the data repeatable?
• This would be a measure of variability

3 common measures of centrality:
• Mean
• Median
• Mode

Mean
• Mathematical average of all the data

$$\text{Mean} = \frac{\text{sum of all data}}{\text{sample size}}$$

Example
• Suppose Suzy is taking Chemistry. There is a lab quiz every other week. Near the end of the semester, Suzy wants to determine her quiz average. Her quiz scores are: 78, 92, 83, 95, 98, 87 and 93.

$$\text{Mean} = \frac{78 + 92 + 83 + 95 + 98 + 87 + 93}{7} = \frac{626}{7} \approx 89.43$$

Mathematical shorthand:
• Data points are often referred to as $x_i$, where $i$ runs from 1 to $n$, $n$ being the sample size.
• For Suzy's quiz scores, $n = 7$ and $x_1 = 78$, $x_2 = 92$, $x_3 = 83$, $x_4 = 95$, $x_5 = 98$, $x_6 = 87$, and $x_7 = 93$.
• The mean is denoted by $\bar{x}$, called x-bar:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

For Suzy's quizzes, $\bar{x} \approx 89.43$.
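The mean formula can be checked with a short Python sketch. The helper name `mean` is our own illustration (Python's standard library also offers `statistics.mean`); it simply divides the sum of the data by the sample size n.

```python
# Suzy's seven lab quiz scores, x_1 through x_7
scores = [78, 92, 83, 95, 98, 87, 93]


def mean(data):
    """Hypothetical helper: sum of all data points divided by the sample size n."""
    if not data:
        raise ValueError("mean requires at least one data point")
    return sum(data) / len(data)


x_bar = mean(scores)
print(f"n = {len(scores)}, sum = {sum(scores)}, x-bar = {x_bar:.2f}")
# prints: n = 7, sum = 626, x-bar = 89.43
```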

The median is the middle value of the dataset, such that half of all data points are greater than or equal to that value AND half of all data points are less than or equal to that value.

To find the median:
1. Rearrange the data from smallest to largest.
2. If n is odd, calculate (n + 1)/2.
3. If n is even, calculate n/2.
4. Count through the sorted data set until you get to the data point in the position you calculated in step 2 or 3.
5. If the number of data points, n, was odd, then you are done. If n is even, then compute the mean of the data point in the (n/2)th position and the data point in the (n/2 + 1)th position.

Example
• Given the following salary information from a group of engineers, determine the median salary: $75,400; $83,600; $45,700; $43,900; $62,100; $90,500; $55,800.
• First reorder the data in increasing order: 43,900; 45,700; 55,800; 62,100; 75,400; 83,600; 90,500
• Since n = 7 is odd, compute (n + 1)/2 = (7 + 1)/2 = 4
• The data point in the 4th position of the sorted list is 62,100, so the median salary is $62,100.

Example
• A group of students is taking the following numbers of credit hours: 12, 17, 15, 14, 9, 16, 18, 16, 14, 12. Find the median number of credit hours being taken by this group of students.
• Put the data in increasing order: 9, 12, 12, 14, 14, 15, 16, 16, 17, 18
• Since n = 10 is even, compute n/2 = 10/2 = 5
• Next, identify the data points in the fifth and sixth positions: 14 and 15
• Compute the mean of the fifth and sixth data points: (14 + 15)/2 = 14.5
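The five median steps translate almost line for line into Python. This is a sketch for illustration only: `median` is a hypothetical helper name, positions are counted from 1 as in the steps and then converted to Python's 0-based indexes, and the salary and credit-hour examples serve as checks.

```python
def median(data):
    """Hypothetical helper following the five median steps (1-based positions)."""
    if not data:
        raise ValueError("median requires at least one data point")
    ordered = sorted(data)                # step 1: smallest to largest
    n = len(ordered)
    if n % 2 == 1:
        pos = (n + 1) // 2                # step 2: position (n + 1)/2
        return ordered[pos - 1]           # step 4: 1-based position -> 0-based index
    pos = n // 2                          # step 3: position n/2
    # step 5: mean of the (n/2)th and (n/2 + 1)th data points
    return (ordered[pos - 1] + ordered[pos]) / 2


salaries = [75_400, 83_600, 45_700, 43_900, 62_100, 90_500, 55_800]
credit_hours = [12, 17, 15, 14, 9, 16, 18, 16, 14, 12]
print(median(salaries))      # 62100
print(median(credit_hours))  # 14.5
```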

The mode is the value that appears most often in the data set.

Example: Here are the numbers of cavities found in a class of 1st graders:
• 0, 1, 0, 1, 0, 5, 5, 3, 4, 0, 0, 2, 0, 1, 0, 3, 2, 4, 7, 1

Find the mode.
• 0 occurs 7 times, while 1 occurs 4 times, 2, 3, 4, and 5 each occur twice, and 7 occurs once. As 0 occurs the most often, it is the mode.
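Counting occurrences is the heart of the mode. A minimal sketch using Python's `collections.Counter`; the helper `modes` is a hypothetical name, and it returns a list because a data set can have more than one mode.

```python
from collections import Counter

# Number of cavities found in the class of 1st graders
cavities = [0, 1, 0, 1, 0, 5, 5, 3, 4, 0, 0, 2, 0, 1, 0, 3, 2, 4, 7, 1]


def modes(data):
    """Hypothetical helper: every value that occurs most often (may be several)."""
    if not data:
        return []
    counts = Counter(data)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


print(Counter(cavities).most_common(2))  # [(0, 7), (1, 4)]
print(modes(cavities))                   # [0]
```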

Comparing Mean, Median and Mode
• Mean
  Strong points: Uses all of the data.
  Weak points: Sensitive to extremes. The test scores 34, 92, 95, 94, 89 have a mean of 80.8. If the professor dropped the lowest test score, 34, then the mean would be 92.5.
  May not be an actual, observable value. For example, the average family has 1.6 children. What does it mean to have 0.6 of a child?

Comparing Mean, Median and Mode
• Median and Mode
  Strong points: Not sensitive to extremes. In the test score example from before, the sorted scores are 34, 89, 92, 94, 95, so the median is 92.
  The mode is always an observable value; the median is an observable value when n is odd.
  Weak points: The value may not be unique. In the case of the mode, it is possible to have several values that appear the most. Neither uses the actual values of all the data points: the mode keys in on frequency, while the median just looks at the middle of the data set.
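The sensitivity of the mean to extremes shows up when both measures are recomputed with and without the low score of 34. This sketch uses Python's standard `statistics` module rather than the step-by-step procedures; the variable names are our own.

```python
from statistics import mean, median

scores = [34, 92, 95, 94, 89]
without_low = [s for s in scores if s != 34]  # drop the extreme low score, 34

# The mean jumps by almost 12 points; the median moves by only 1.
print(mean(scores), median(scores))            # 80.8 92
print(mean(without_low), median(without_low))  # 92.5 93.0
```

Removing the 34 raises the mean from 80.8 to 92.5, while the median only moves from 92 to 93.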

In 1995, the mean salary of an MLB player was $1,080,000, while the median salary of an MLB player was $275,000.
• Recall that the median is the point where half of the data points are above and half are below. Thus, at least half of the players in the MLB earned $275,000 or less.
• A mean of $1,080,000 tells you that there are players earning millions of dollars, but this value does not represent the typical salary of players in the MLB.

The Corps of Engineers wants to dredge a harbor in Hackensack, NJ. The EPA has these guidelines for harbor dredging:
• The sediment is tested for the presence of PCBs.
• If PCBs < 25 parts per billion (ppb), then it's OK to dredge and dump.
• If 25 ppb ≤ PCBs < 50 ppb, then it's OK to dredge and dump, but a cap must be placed on the dump pile.
• If PCBs ≥ 50 ppb, then the harbor cannot be dredged and dumped.

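Six samples are taken, and the average PCB level is 46.5 ppb. The Corps of Engineers should be allowed to dredge and dump the harbor, then cap the dump site…or should they?

• The actual samples were: 66, 74, 81, 55, 1, 2.
• The average is (66 + 74 + 81 + 55 + 1 + 2)/6 = 279/6 = 46.5 ppb.
• The median is (55 + 66)/2 = 60.5 ppb (sorted: 1, 2, 55, 66, 74, 81), which is above the 50 ppb limit.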
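To see why the two summaries lead to different decisions, the EPA rules above can be encoded as a small function and applied to both the mean and the median. `dredging_decision` and its return strings are hypothetical names for illustration, and the sketch assumes a level of exactly 50 ppb falls under the last rule (no dredging).

```python
from statistics import mean, median


def dredging_decision(pcb_ppb):
    """Hypothetical helper: map a PCB level in ppb to the EPA guideline outcome."""
    if pcb_ppb < 25:
        return "dredge and dump"
    if pcb_ppb < 50:
        return "dredge and dump, then cap"
    return "do not dredge"  # assumes exactly 50 ppb is not allowed


samples = [66, 74, 81, 55, 1, 2]
for label, value in [("mean", mean(samples)), ("median", median(samples))]:
    print(f"{label}: {value} ppb -> {dredging_decision(value)}")
# mean: 46.5 ppb -> dredge and dump, then cap
# median: 60.5 ppb -> do not dredge
```

The two very low samples (1 and 2 ppb) pull the mean below 50 ppb even though four of the six samples exceed the limit; the median is far less affected by them.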

Measures of variability describe the spread of the data
• All measures of variability are greater than or equal to 0
• Measures close to 0 indicate that the data is highly consistent and repeatable
• 4 measures of variability: Range, Average deviation, Variance, Standard deviation

Range
• Difference between the largest data point in the dataset and the smallest data point in the dataset

$$\text{Range} = x_{\max} - x_{\min}$$

Example
• Suppose the daily low temperatures for the past week have been -3, -7, -2, 0, 2, 4. What is the range?
• Range = 4 - (-7) = 11
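The range is a one-line computation. In this sketch the helper is named `data_range` (a hypothetical name) so that it does not shadow Python's built-in `range`.

```python
def data_range(data):
    """Hypothetical helper: largest data point minus smallest data point."""
    if not data:
        raise ValueError("range requires at least one data point")
    return max(data) - min(data)


lows = [-3, -7, -2, 0, 2, 4]  # daily low temperatures from the example
print(data_range(lows))       # 11
```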

Average Deviation
• The average deviation of the data from its mean value.
• There are 4 steps:
1. Compute the mean of the data set, x-bar.
2. Calculate the absolute value of the difference between each data point, $x_i$, and the mean value, x-bar.
3. Add up all of the values calculated in step 2.
4. Divide the sum from step 3 by n, the number of data points.

Average Deviation, Example
• Suppose you have the following four data points in your dataset: 1, 2, 4, 5. Find the average deviation.
1. x-bar = (1 + 2 + 4 + 5)/4 = 12/4 = 3
2. |1 - 3| = 2; |2 - 3| = 1; |4 - 3| = 1; |5 - 3| = 2
3. 2 + 1 + 1 + 2 = 6
4. 6/4 = 1.5

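Average Deviation
• In mathematical shorthand, the average deviation can be expressed as:

$$\text{Average Deviation} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$

• A good method is to make a table:

x_i    |x_i - x-bar|    Result
1      |1 - 3|          2
2      |2 - 3|          1
4      |4 - 3|          1
5      |5 - 3|          2

Sum of x_i = 12, so x-bar = 12/4 = 3
Sum of results = 6, so average deviation = 6/4 = 1.5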
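The four average-deviation steps map directly onto a short Python function. `average_deviation` is a hypothetical helper name; like the steps, it divides by n, and it reproduces the table's result of 1.5.

```python
def average_deviation(data):
    """Hypothetical helper: mean absolute deviation of the data from x-bar."""
    n = len(data)
    if n == 0:
        raise ValueError("average deviation requires at least one data point")
    x_bar = sum(data) / n                      # step 1: the mean
    abs_devs = [abs(x - x_bar) for x in data]  # step 2: |x_i - x-bar|
    return sum(abs_devs) / n                   # steps 3 and 4: add up, divide by n


print(average_deviation([1, 2, 4, 5]))  # 1.5
```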

Variance
• Similar to average deviation:
1. Compute the mean of the dataset, x-bar.
2. Calculate the difference between each data point, $x_i$, and the mean value, x-bar.
3. Square all of the values in step 2.
4. Add up all the values in step 3.
5. Divide the sum in step 4 by the total number of data points.

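Variance, Example
• A good idea is to make a table similar to the one we used for average deviation:

x_i    x_i - x-bar    (x_i - x-bar)^2
1      1 - 3 = -2     4
2      2 - 3 = -1     1
4      4 - 3 = 1      1
5      5 - 3 = 2      4

Sum of x_i = 12, so x-bar = 12/4 = 3
Sum of squared differences = 10, so variance = 10/4 = 2.5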

Variance
• Mathematical shorthand:

$$\text{Variance} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$
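The variance steps can be sketched the same way. `variance` is a hypothetical helper that divides by n, as this lecture does; that matches Python's `statistics.pvariance` (population variance), whereas `statistics.variance` divides by n - 1 (sample variance) and so gives a larger value for the same data.

```python
import statistics


def variance(data):
    """Hypothetical helper: mean of the squared deviations from x-bar (divides by n)."""
    n = len(data)
    if n == 0:
        raise ValueError("variance requires at least one data point")
    x_bar = sum(data) / n                       # step 1: the mean
    squared = [(x - x_bar) ** 2 for x in data]  # steps 2 and 3: difference, then square
    return sum(squared) / n                     # steps 4 and 5: add up, divide by n


data = [1, 2, 4, 5]
print(variance(data))              # 2.5
print(statistics.pvariance(data))  # 2.5, the standard library's population variance
print(statistics.variance(data))   # sample variance, 10/3, because it divides by n - 1
```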

Standard Deviation
• The standard deviation is just the square root of the variance:

$$\text{Standard Deviation} = \sqrt{\text{Variance}}$$

• By taking the square root, the units of the standard deviation are the same as the original units of the data.
• In the previous example: Standard Deviation = √2.5 ≈ 1.58 inches
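Taking the square root of the variance gives the standard deviation. A minimal sketch with the same four data points, checked against `statistics.pstdev`, the standard library's population standard deviation (which, like this lecture, divides by n).

```python
import math
import statistics

data = [1, 2, 4, 5]
x_bar = sum(data) / len(data)
var = sum((x - x_bar) ** 2 for x in data) / len(data)  # divide by n, as in the lecture
std_dev = math.sqrt(var)                                # standard deviation = sqrt(variance)

print(round(std_dev, 2))                               # 1.58
print(math.isclose(std_dev, statistics.pstdev(data)))  # True
```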
