Measures of Centrality and Variability
STA 220 - Lecture #4
Centrality and Variability
› Methods to take large amounts of data and present them in a concise form
› Want to present the heights of females and males in STA 220
› Could measure everyone and graph the results
› More interested in a single value that describes the most likely representation of the height of the students in the class. This is called a measure of centrality.
› Once you have your measure of centrality, you may want or need to know: is the data repeatable?
› This would be a measure of variability.
3 common measures of centrality
› Mean
› Median
› Mode
Mean
› Mathematical average of all the data

$$\text{Mean} = \frac{\text{sum of all data}}{\text{sample size}}$$
Example
› Suppose Suzy is taking Chemistry. There is a lab quiz every other week. Near the end of the semester, Suzy wants to determine her quiz average. Her quiz scores are: 78, 92, 83, 95, 98, 87 and 93.

$$\text{Mean} = \frac{78 + 92 + 83 + 95 + 98 + 87 + 93}{7} = \frac{626}{7} \approx 89.43$$
Mathematical shorthand:
› Data points are often referred to as $x_i$, where i is 1…n, n being the sample size.
› For Suzy’s quiz scores, n = 7 and x1 = 78, x2 = 92, x3 = 83, x4 = 95, x5 = 98, x6 = 87, and x7 = 93.
› The mean is denoted by $\bar{x}$, called x-bar. For Suzy’s quizzes, $\bar{x} \approx 89.43$.
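The definition of the mean can be checked with a few lines of Python. This is a minimal sketch; the function name `mean` and the variable `quiz_scores` are illustrative choices, not part of the lecture.

```python
def mean(data):
    """Arithmetic mean: the sum of the data divided by the sample size n."""
    n = len(data)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(data) / n


# Suzy's quiz scores from the example above
quiz_scores = [78, 92, 83, 95, 98, 87, 93]
x_bar = mean(quiz_scores)
print(sum(quiz_scores))  # 626
print(round(x_bar, 2))   # 89.43
```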
Median
› The median is the middle value of the dataset, such that half of all data points are less than or equal to that value AND half of all data points are greater than or equal to that value.
To find the median:
1. Rearrange the data from smallest to largest.
2. If n is odd, calculate (n+1)/2.
3. If n is even, calculate n/2.
4. Count through the sorted data set until you get to the data point in the position you calculated in step 2 or 3.
5. If the number of data points, n, was odd, then you are done. If n is even, then compute the mean of the data points in the n/2 position and the (n/2)+1 position.
Example
› Given the following salary information from a group of engineers, determine the median salary: $75,400; $83,600; $45,700; $43,900; $62,100; $90,500; $55,800.
› First reorder the data in increasing order: 43,900; 45,700; 55,800; 62,100; 75,400; 83,600; 90,500
› Since n = 7 is odd, compute (n+1)/2 = (7+1)/2 = 4
› The fourth data point is the median: 43,900; 45,700; 55,800; [62,100]; 75,400; 83,600; 90,500
Example
› A group of students is taking the following numbers of credit hours: 12, 17, 15, 14, 9, 16, 18, 16, 14, 12. Find the median number of credit hours being taken by this group of students.
› Put the data in increasing order: 9, 12, 12, 14, 14, 15, 16, 16, 17, 18
› Since n = 10 is even, compute n/2 = 10/2 = 5
› Next, identify the data points in the fifth and sixth positions: 9, 12, 12, 14, [14], [15], 16, 16, 17, 18
› Compute the mean of the fifth and sixth data points: (14 + 15)/2 = 14.5
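The five-step median procedure can be sketched in Python as follows. The function name `median` is an illustrative choice; positions are counted from 1 as on the slides, so each lookup subtracts 1 to get Python's 0-based index.

```python
def median(data):
    """Median by the slide procedure: sort, find the middle position,
    and average the two middle values when n is even."""
    ordered = sorted(data)              # step 1: smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        position = (n + 1) // 2         # step 2: n is odd
        return ordered[position - 1]    # steps 4-5: done
    position = n // 2                   # step 3: n is even
    # step 5: mean of the values in positions n/2 and n/2 + 1
    return (ordered[position - 1] + ordered[position]) / 2


salaries = [75400, 83600, 45700, 43900, 62100, 90500, 55800]
credit_hours = [12, 17, 15, 14, 9, 16, 18, 16, 14, 12]
print(median(salaries))      # 62100
print(median(credit_hours))  # 14.5
```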
Mode
› The mode is the number that appears the most often in the data set.
› Example: Here are the numbers of cavities found in a class of 1st graders: 0,1,0,1,0,5,5,3,4,0,0,2,0,1,0,3,2,4,7,1. Find the mode.
› 0 occurs 7 times, while 1 occurs 4 times, 2, 3, 4, and 5 each occur twice, and 7 occurs once. As 0 occurs the most often, it is the mode.
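The counting in the cavity example can be reproduced with `collections.Counter` from the Python standard library. This is a sketch; the helper name `modes` is hypothetical, and it returns a list because a data set can have more than one value tied for most frequent.

```python
from collections import Counter


def modes(data):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(data)
    if not counts:
        raise ValueError("the mode of an empty data set is undefined")
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


cavities = [0, 1, 0, 1, 0, 5, 5, 3, 4, 0, 0, 2, 0, 1, 0, 3, 2, 4, 7, 1]
counts = Counter(cavities)
print(counts[0], counts[1], counts[7])  # 7 4 1
print(modes(cavities))                  # [0]
```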
Comparing Mean, Median, Mode: Mean
› Strong points
  › Uses all of the data
› Weak points
  › Sensitive to extremes. Test scores of 34, 92, 95, 94, 89 have a mean of 80.8. If the professor dropped the lowest test score, 34, then the mean would be 92.5.
  › May not be an actual, observable value. For example, the average family has 1.6 children. What does it mean to have 0.6 of a child?
Comparing Mean, Median and Mode: Median and Mode
› Strong points
  › Not sensitive to extremes. In the test score example from before, the sorted scores are 34, 89, 92, 94, 95, so the median would be 92.
  › The mode is an observable value; the median is an observable value when n is odd.
› Weak points
  › The value may not be unique. In the case of the mode, it is possible to have several values that appear the most.
  › Neither uses all of the data values. The mode keys in on frequency, while the median just looks at the middle of the data set.
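The sensitivity of the mean to extremes, and the relative stability of the median, can be seen by recomputing both for the test scores with and without the low score of 34. The sketch below uses `mean` and `median` from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

scores = [34, 92, 95, 94, 89]
without_lowest = sorted(scores)[1:]   # drop the lowest score, 34

# The mean moves by almost 12 points when one extreme value is removed.
print(mean(scores))            # 80.8
print(mean(without_lowest))    # 92.5

# The median moves by only one point.
print(median(scores))          # 92
print(median(without_lowest))  # 93.0
```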
› In 1995, the mean salary of an MLB player was $1,080,000 while the median salary of an MLB player was $275,000.
› Recall that the median is the point where half of the data points are above and half are below. Thus, at least half of the players in the MLB earned $275,000 or less.
› A mean of $1,080,000 tells you that there are players earning millions of dollars, but this figure may not be representative of all players in the MLB.
› The Corps of Engineers wants to dredge a harbor in Hackensack, NJ. The EPA has these guidelines for harbor dredging:
› The sediment is tested for the presence of PCBs.
› If PCBs < 25 parts per billion, then it's OK to dredge and dump.
› If 25 ppb ≤ PCBs ≤ 50 ppb, then it's OK to dredge and dump, but then a cap must be placed on the dump pile.
› If PCBs ≥ 50 ppb, then the harbor cannot be dredged and dumped.
› 6 samples are taken, and the average PCB level was 46.5 ppb. The Corps of Engineers should be allowed to dredge and dump the harbor, then cap the dump site…or should they?
› The actual samples were: 66, 74, 81, 55, 1, 2.
› The average is (66 + 74 + 81 + 55 + 1 + 2)/6 = 279/6 = 46.5 ppb
› The median is (55 + 66)/2 = 60.5 ppb
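The harbor figures can be verified directly. This is a small sketch using the standard `statistics` module; the variable names and the count of samples at or above 50 ppb are additions for illustration and are not on the slide.

```python
from statistics import mean, median

pcb_ppb = [66, 74, 81, 55, 1, 2]   # the six sediment samples, in ppb

print(mean(pcb_ppb))    # 46.5
print(median(pcb_ppb))  # 60.5

# Two very low readings pull the mean under 50 ppb,
# even though most individual samples are at or above it.
at_or_above_50 = [x for x in pcb_ppb if x >= 50]
print(len(at_or_above_50))  # 4
```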
Measures of variability
› Measures of variability describe the spread of the data
› All measures of variability are greater than or equal to 0
› Measures close to 0 indicate that the data is highly consistent and repeatable
› 4 measures of variability: Range, Average Deviation, Variance, Standard Deviation
Range
› Difference between the largest data point in the dataset and the smallest data point in the dataset
› or Range = max - min
Example
› Suppose the daily low temperatures for the past week have been -3, -7, -2, 0, 2, 4. What is the range?
› Range = 4 - (-7) = 11
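As a quick check of the temperature example, the range is one subtraction in Python. The function name `data_range` is an illustrative choice (the built-in name `range` is avoided on purpose).

```python
def data_range(data):
    """Range: largest data point minus smallest data point."""
    return max(data) - min(data)


daily_lows = [-3, -7, -2, 0, 2, 4]
print(data_range(daily_lows))  # 11
```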
Average Deviation
› The average deviation of the data from its mean value.
› There are 4 steps:
1. Compute the mean of the data set, x-bar
2. Calculate the absolute value of the difference between each data point, xi, and the mean value, x-bar
3. Add up all of the values calculated in step 2
4. Divide the sum from step 3 by the number of data points, n
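Average Deviation, Example
› Suppose you have the following four data points in your dataset: 1, 2, 4, 5. Find the average deviation.
1. $\bar{x} = \frac{1 + 2 + 4 + 5}{4} = 3$
2. $|1 - 3| = 2$; $|2 - 3| = 1$; $|4 - 3| = 1$; $|5 - 3| = 2$
3. $2 + 1 + 1 + 2 = 6$
4. $\frac{6}{4} = 1.5$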
Average Deviation
› In mathematical shorthand, the average deviation can be expressed as:

$$\text{Average Deviation} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$

› A good method is to make a table:

Xi      |Xi - (x-bar)|      Result
1       |1-3|               2
2       |2-3|               1
4       |4-3|               1
5       |5-3|               2

Mean: 12/4 = 3
Average deviation: 6/4 = 1.5
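The four steps for the average deviation translate directly into code. This is a minimal sketch; the function name `average_deviation` is an illustrative choice.

```python
def average_deviation(data):
    """Mean of the absolute differences between each data point and x-bar."""
    n = len(data)
    if n == 0:
        raise ValueError("the average deviation of an empty data set is undefined")
    x_bar = sum(data) / n                           # step 1: the mean
    abs_diffs = [abs(x - x_bar) for x in data]      # step 2: |xi - x-bar|
    total = sum(abs_diffs)                          # step 3: add them up
    return total / n                                # step 4: divide by n


print(average_deviation([1, 2, 4, 5]))  # 1.5
```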
Variance
› Similar to average deviation
1. Compute the mean of the dataset, x-bar
2. Calculate the difference between each data point, xi, and the mean value, x-bar
3. Square all of the values in step 2
4. Add up all the values in step 3
5. Divide the sum in step 4 by the total number of data points
Variance, Example
› Good idea to make a table similar to the one we used for average deviation

Xi      Xi - (x-bar)      Result      (Xi - (x-bar))^2
1       1-3               -2          4
2       2-3               -1          1
4       4-3               1           1
5       5-3               2           4

Mean: 12/4 = 3
Variance: 10/4 = 2.5
Variance
› Mathematical shorthand:

$$\text{Variance} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
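The variance steps can be sketched the same way. Note that the lecture divides by the total number of data points n, which corresponds to `statistics.pvariance` in the Python standard library; `statistics.variance` divides by n - 1 instead and would give a different answer. The function name `variance` below is an illustrative choice.

```python
from statistics import pvariance


def variance(data):
    """Variance as defined on the slides: mean of the squared differences
    from x-bar, dividing by n (not n - 1)."""
    n = len(data)
    if n == 0:
        raise ValueError("the variance of an empty data set is undefined")
    x_bar = sum(data) / n                            # step 1
    squared = [(x - x_bar) ** 2 for x in data]       # steps 2-3
    return sum(squared) / n                          # steps 4-5


data = [1, 2, 4, 5]
print(variance(data))   # 2.5
print(pvariance(data))  # 2.5
```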
Standard Deviation
› The standard deviation is just the square root of the variance
› By taking the square root, the units of the standard deviation are the same as the original units of the data
› In the previous example:

$$\text{Standard Deviation} = \sqrt{\text{Variance}} = \sqrt{2.5} \approx 1.58 \text{ inches}$$
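The last step, taking the square root of the variance, can be checked as below. This sketch assumes the divide-by-n variance used in the lecture, which matches `statistics.pstdev`; the function name `standard_deviation` is an illustrative choice.

```python
import math
from statistics import pstdev


def standard_deviation(data):
    """Square root of the divide-by-n variance."""
    n = len(data)
    if n == 0:
        raise ValueError("the standard deviation of an empty data set is undefined")
    x_bar = sum(data) / n
    var = sum((x - x_bar) ** 2 for x in data) / n
    return math.sqrt(var)


data = [1, 2, 4, 5]
print(round(standard_deviation(data), 2))  # 1.58
print(round(pstdev(data), 2))              # 1.58
```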