diana-diana
DESCRIPTION
second year
MEASURES OF LOCATION and VARIATION for 1 variable
Lectures 3+4+5 Topics
•Measures of Central Tendency for numerical and categorical data: Mean, Median, Mode + other means, Fractiles
•Measures of Variation for numerical and binary data: The Range, Variance and Standard Deviation
•Shape: Symmetric, Skewed, Skewness, Kurtosis
Summary Measures
Central Tendency (part of Location): Mean, Median, Mode, Fractiles
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Central Tendency
Central Tendency: Mean, Median, Mode
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
The Mean (Arithmetic mean, Average)
•It is the Arithmetic Average of data values:
Sample Mean: \[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
Mean = Sum of the observations / Number of observations
•The Most Common Measure of Central Tendency
•Affected by Extreme Values (Outliers)
[Figure: two number lines, one from 0 to 10 with Mean = 5 and one from 0 to 14 with Mean = 6]
THE ARITHMETIC MEAN
This is the most popular and useful measure of central location
Sample mean: \[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \] where n is the sample size
Population mean: \[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \] where N is the population size
THE ARITHMETIC MEAN
• Example 1: The reported times spent on the Internet by 10 adults are 0, 7, 12, 5, 33, 14, 8, 0, 9, 22 hours. Find the mean time spent on the Internet.
\[ \bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{x_1 + x_2 + \cdots + x_{10}}{10} = \frac{0 + 7 + \cdots + 22}{10} = 11.0 \text{ hours} \]
• Example 2: Suppose the telephone bills represent the population of measurements (N = 200). The population mean is
\[ \mu = \frac{\sum_{i=1}^{200} x_i}{200} = \frac{x_1 + x_2 + \cdots + x_{200}}{200} = \frac{42.19 + 38.45 + \cdots + 45.77}{200} = 43.59 \]
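The arithmetic in Example 1 can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the lecture.

```python
# Hours spent on the Internet by the 10 adults in Example 1
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

# Sample mean: sum of the observations divided by the number of observations
mean_hours = sum(hours) / len(hours)

print(mean_hours)  # 11.0
```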
WEIGHTED MEAN FOR DATA GROUPED BY CATEGORIES OR VARIANTS
\[ \bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} \]
When many of the measurements have the same value, the measurements can be summarized in a frequency table. Suppose the number of children in a sample of 16 families was recorded as follows:
NUMBER OF CHILDREN   0  1  2  3
NUMBER OF FAMILIES   3  4  7  2   (16 families)
\[ \bar{x} = \frac{\sum_{i=1}^{4} x_i f_i}{16} = \frac{x_1 f_1 + x_2 f_2 + x_3 f_3 + x_4 f_4}{16} = \frac{3(0) + 4(1) + 7(2) + 2(3)}{16} = 1.5 \]
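The weighted mean for the families example can be reproduced as follows. This is a sketch only: the dictionary layout and the names are assumptions made for illustration.

```python
# Frequency table: number of children (x_i) -> number of families (f_i)
children_freq = {0: 3, 1: 4, 2: 7, 3: 2}

total_families = sum(children_freq.values())                 # sum of f_i
weighted_sum = sum(x * f for x, f in children_freq.items())  # sum of x_i * f_i

# Weighted mean: each value counts as many times as it occurs
weighted_mean = weighted_sum / total_families

print(total_families, weighted_sum, weighted_mean)  # 16 24 1.5
```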
MEAN
FOR TABULATED DATA BY CLASSES
APPROXIMATING DESCRIPTIVE MEASURES FOR GROUPED DATA BY CLASSES
Approximating descriptive measures for grouped data may be needed in two cases: when approximate values suffice, and when only secondary grouped data are available.
\[ \bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} \]
where \(x_i\) is the class midpoint and \(f_i\) is the class frequency.
Class i   Class limits   Frequency fi   Midpoint xi   xi fi
1         2-5            3              3.5           10.5
2         5-8            6              6.5           39.0
3         8-11           8              9.5           76.0
….        ….             …              ….            ….
6         17-20          2              18.5          37.0
n = sample size = 30 = f1+…+fk; sum of xi fi = 312.0
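Example 3: Approximate the mean of the telephone call durations as represented by the frequency distribution.
[Figure: histogram of call durations; class limits 2, 5, 8, 11, 14, 17, 20, More on the horizontal axis, frequency from 0 to 10 on the vertical axis]
Approximated value: \[ \bar{x} \approx \frac{312.0}{30} = 10.4 \]
Real value: \[ \bar{x} = 10.26 \]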
The Median
[Figure: two number lines, one from 0 to 10 and one from 0 to 14; Median = 5 in both]
•Important Measure of Central Tendency
•In an ordered array, the median is the “middle” number.
•If n is odd, the median is the middle number.
•If n is even, the median is the average of the 2 middle numbers.
•Not Affected by Extreme Values
THE MEDIAN
The Median of a set of observations is the value that falls in the middle when the observations are arranged in order of magnitude (ranked in increasing order).
Example: Find the median of the time spent on the Internet for the adults of Example 1.
Even number of observations: 0, 0, 5, 7, 8, 9, 12, 14, 22, 33. The two middle values are 8 and 9, so the median is 8.5.
Comment: Suppose only 9 adults were sampled (exclude, say, the longest time (33)).
Odd number of observations: 0, 0, 5, 7, 8, 9, 12, 14, 22. The median is the middle value, 8.
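The odd/even rule for the median can be written out directly. The sketch below applies it to the Example 1 data; the function name `median` is illustrative, not taken from the lecture.

```python
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]  # data from Example 1

def median(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                      # odd n: single middle value
    return (ranked[mid - 1] + ranked[mid]) / 2  # even n: average of the two middle values

median_10 = median(hours)                        # all 10 adults
median_9 = median([h for h in hours if h != 33])  # longest time (33) excluded

print(median_10, median_9)  # 8.5 8
```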
MEDIAN
Data Tabulated discretely – as ungrouped
Data Tabulated by classes - estimation
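MEDIAN AND MODE
Median
\[ Me = x_0 + K \cdot \frac{\frac{1}{2}\left(\sum n_i + 1\right) - \sum_{i=1}^{Me-1} n_i}{n_{Me}} \]
where \(x_0\) is the lower limit of the median class, \(K\) is the class width, \(n_i\) are the class frequencies, and \(n_{Me}\) is the frequency of the median class.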
The Mode
•A Measure of Central Tendency
•Value that Occurs Most Often
•Not Affected by Extreme Values
•There May Not be a Mode
•There May be Several Modes
•Used for Either Numerical or Categorical Data
[Figure: number line from 0 to 14 with Mode = 9; number line from 0 to 6 with No Mode]
THE MODE
The Mode of a set of observations is the variable value that occurs most frequently.
A set of data may have one mode (or modal class), or two or more modes.
The modal class: for large data sets, the modal class is much more relevant than a single-value mode.
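The points above (a data set may have one mode, several modes, or none, and the mode also works for categorical data) can be illustrated with a small counting sketch. The function `modes` and the second and third data sets are hypothetical; treating "every value occurs once" as "no mode" is a convention assumed here, and the input is assumed to be non-empty.

```python
from collections import Counter

def modes(values):
    """Return all most frequent values; an empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

one_mode = modes([0, 7, 12, 5, 33, 14, 8, 0, 9, 22])          # Example 1 data
no_mode = modes([1, 2, 3, 4, 5, 6])                            # hypothetical data
two_modes = modes(["red", "blue", "red", "blue", "green"])     # hypothetical categorical data

print(one_mode)   # [0]
print(no_mode)    # []
print(two_modes)  # ['blue', 'red']
```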
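MEDIAN AND MODE
Mode
\[ Mo = x_0 + K \cdot \frac{\Delta_1}{\Delta_1 + \Delta_2} \]
where \(x_0\) is the lower limit of the modal class, \(K\) is the class width, \(\Delta_1\) is the difference between the frequency of the modal class and that of the preceding class, and \(\Delta_2\) is the difference between the frequency of the modal class and that of the following class.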
RELATIONSHIP AMONG MEAN, MEDIAN, AND MODE
If a distribution is symmetrical, the mean, median and mode coincide
• If a distribution is not symmetrical, and is skewed to the left or to the right, the three measures differ.
A positively skewed distribution (“skewed to the right”): typically Mode < Median < Mean
A negatively skewed distribution (“skewed to the left”): typically Mean < Median < Mode
OTHER MEANS
Harmonic
Geometric
Square
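The slide only names these means. As a sketch, assuming the standard definitions and reading "Square" as the quadratic (root mean square) mean, they can be computed as below. The data are hypothetical; `statistics.geometric_mean` requires Python 3.8 or later.

```python
import math
import statistics

values = [1, 2, 4]  # hypothetical positive data, for illustration only

arithmetic = statistics.mean(values)            # sum(x_i) / n
harmonic = statistics.harmonic_mean(values)     # n / sum(1 / x_i)
geometric = statistics.geometric_mean(values)   # (x_1 * ... * x_n) ** (1 / n)
quadratic = math.sqrt(sum(x * x for x in values) / len(values))  # sqrt(sum(x_i^2) / n)

print(harmonic, geometric, arithmetic, quadratic)
```

For positive data that are not all equal, these means are ordered harmonic < geometric < arithmetic < quadratic.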
FRACTILES
Quartiles: 3
Percentiles: 99
Summary Measures
Central Tendency: Mean, Median, Mode
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]
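Measures of Variation
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Variance: Population Variance, Sample Variance
Standard Deviation: Population Standard Deviation, Sample Standard Deviation
Coefficient of Variation: \[ CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\% \]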
The Range
• Measure of Variation
• Difference Between Largest & Smallest Observations:
Absolute Range = \( x_{Largest} - x_{Smallest} \)
• Relative Range = \( (x_{Largest} - x_{Smallest}) / \text{mean} \)
•Ignores How Data Are Distributed:
[Figure: two differently distributed data sets on the axis 7 to 12, both with Range = 12 - 7 = 5]
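Both range measures are simple to compute. The sketch below reuses the Internet-hours data from Example 1; the variable names are illustrative.

```python
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]  # data from Example 1

absolute_range = max(hours) - min(hours)      # largest minus smallest: 33 - 0
mean_hours = sum(hours) / len(hours)          # 11.0
relative_range = absolute_range / mean_hours  # range expressed relative to the mean

print(absolute_range, relative_range)  # 33 3.0
```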
INTERQUARTILE RANGE
Can eliminate some outlier problems by using the interquartile range
Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data
Interquartile range = 3rd quartile – 1st quartile
IQR = Q3 – Q1
Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70 (25% of the observations fall in each of the four segments)
Interquartile range = 57 – 30 = 27
QUARTILES
Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% in each, separated by Q1, Q2 and Q3)
• The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger
• Q2 is the same as the median (50% are smaller, 50% are larger)
• Only 25% of the observations are greater than the third quartile, Q3
QUARTILE FORMULAS
Find a quartile by determining the value in the appropriate position in the ranked data, where
First quartile position: Q1 = 0.25(n+1)
Second quartile position: Q2 = 0.50(n+1) (the median position)
Third quartile position: Q3 = 0.75(n+1)
where n is the number of observed values
QUARTILES
• Example: Find the first quartile
Sample Ranked Data: 11 12 13 16 16 17 18 21 22 (n = 9)
Q1 is in the 0.25(9+1) = 2.5 position of the ranked data,
so use the value halfway between the 2nd and 3rd values,
so Q1 = 12.5
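The position rule p(n+1) can be sketched in Python as follows. The function `quartile` is hypothetical, and using linear interpolation for any fractional position is an assumption that generalizes the "halfway between" step of the example.

```python
import math

def quartile(ranked, p):
    """Value at position p * (n + 1) in ranked data (positions start at 1)."""
    n = len(ranked)
    pos = p * (n + 1)
    lower = math.floor(pos)          # position of the value just below
    frac = pos - lower               # fractional part of the position
    lower = min(max(lower, 1), n)    # keep positions inside 1..n
    upper = min(lower + 1, n)
    return ranked[lower - 1] + frac * (ranked[upper - 1] - ranked[lower - 1])

ranked_data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # n = 9, already ranked

q1 = quartile(ranked_data, 0.25)  # position 2.5
q2 = quartile(ranked_data, 0.50)  # position 5 (the median)
q3 = quartile(ranked_data, 0.75)  # position 7.5
iqr = q3 - q1                     # interquartile range

print(q1, q2, q3, iqr)  # 12.5 16.0 19.5 7.0
```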
DEVIATION
Individual deviation from the mean = \( x_i - \text{mean} \)
Overall deviation = 0, because \( \sum (X_i - \bar{X}) = 0 \)
Summing squared deviations: \( \sum (X_i - \bar{X})^2 \)
or
absolute values of the deviations: \( \sum |x_i - \bar{x}| \)
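A short numerical check shows why the raw deviations cannot be summed directly and what the two alternatives give. This sketch uses the eight data values from the standard deviation example in these lectures; the variable names are illustrative.

```python
data = [10, 12, 14, 15, 17, 18, 18, 24]

mean = sum(data) / len(data)           # 16.0
deviations = [x - mean for x in data]  # individual deviations from the mean

sum_dev = sum(deviations)                  # positive and negative deviations cancel
sum_sq = sum(d ** 2 for d in deviations)   # sum of squared deviations
sum_abs = sum(abs(d) for d in deviations)  # sum of absolute deviations

print(sum_dev, sum_sq, sum_abs)  # 0.0 130.0 26.0
```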
Variance
•Important Measure of Variation
•Shows Variation About the Mean
• Computed as the arithmetic mean of the squared deviations (its square root, the standard deviation, is the square mean of the individual deviations)
•For the Population: \[ \sigma^2 = \frac{\sum (X_i - \mu)^2}{N} \]
•For the Sample: \[ s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} \]
For the Population: use N in the denominator.
For the Sample: use n - 1 in the denominator.
Standard Deviation
•Most Important Measure of Variation
•Shows Variation About the Mean:
•For the Population: \[ \sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} \]
•For the Sample: \[ s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} \]
For the Population: use N in the denominator.
For the Sample: use n - 1 in the denominator.
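Sample Standard Deviation
Data \(X_i\): 10 12 14 15 17 18 18 24
n = 8 Mean = 16
\[ s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095 \]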
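Comparing Standard Deviations
Data \(X_i\): 10 12 14 15 17 18 18 24
N = 8 Mean = 16
\[ s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} = 4.3095 \]
\[ \sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} = 4.0311 \]
Value for the Standard Deviation is larger for data considered as a Sample.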
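The sample and population standard deviations of these eight values (both 18s included) can be compared with Python's standard `statistics` module. A minimal sketch; the variable names are illustrative.

```python
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]

s = statistics.stdev(data)       # sample standard deviation: divides by n - 1
sigma = statistics.pstdev(data)  # population standard deviation: divides by N

print(round(s, 4), round(sigma, 4))  # 4.3095 4.0311
```

Because n - 1 is smaller than N, the sample value is always the larger of the two for the same data.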
Comparing Standard Deviations
[Figure: three dot plots of AGE on the axis 11 to 21]
Data A - AGE: Mean = 15.5 s = 3.338
Data B - AGE: Mean = 15.5 s = .9258
Data C - AGE: Mean = 15.5 s = 4.57
COEFFICIENT OF VARIATION
Measure of Relative Variation
Always a % or coefficient
Shows Variation Relative to Mean
Used to Compare 2 or More Groups
Formula (for Sample): \[ CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\% \]
COMPARING COEFFICIENT OF VARIATION
Stock A: Average Price last year = $50, Standard Deviation (sd) = $5
Stock B: Average Price last year = $100, Standard Deviation (sd) = $5
Coefficient of Variation: \[ CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\% \]
Stock A: CV = (5 / 50) · 100% = 10%
Stock B: CV = (5 / 100) · 100% = 5%
Both average prices are representative.
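The stock comparison can be reproduced with a small helper. The function name `coefficient_of_variation` is hypothetical and the sketch simply applies the CV formula from the slide.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (S / mean) * 100, expressed in percent."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(5, 50)   # Stock A: sd = $5, mean = $50
cv_b = coefficient_of_variation(5, 100)  # Stock B: sd = $5, mean = $100

print(cv_a, cv_b)  # 10.0 5.0
```

The two stocks have the same standard deviation, but relative to its average price Stock A varies twice as much as Stock B.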
SHAPE
Describes How Data Are Distributed between smallest and largest values
Measures of Shape: Symmetric or skewed
Left-Skewed or Negatively Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed or Positively Skewed: Mode < Median < Mean
BOX PLOT – GRAPHICAL PRESENTATION OF CTM
CENTRAL TENDENCY MEASURES SUMMARY FOR 1 VARIABLE
Discussed Measures of Central Tendency: Mean, Median, Mode
Addressed Measures of Variation: The Range, Variance, Standard Deviation, Coefficient of Variation
Determined Shape of Distributions: Symmetric or Skewed, Coefficient of skewness
Left-Skewed: Mean < Median < Mode; Symmetric: Mean = Median = Mode; Right-Skewed: Mode < Median < Mean