Upload
adityo-yudi
View
239
Download
0
Embed Size (px)
Citation preview
7/23/2019 MODULE 09 Frequency Distributions
1/40
Module 9: Frequency Distributions
This module includes descriptions of frequencydistributions, frequency tables, histograms and
frequency polygons. Also included are stem and
leaf plots.
Reviewed ! "ay ! #"$%&'( 9
7/23/2019 MODULE 09 Frequency Distributions
2/40
Frequency Distribution
A frequency distribution is the organi*ation of a data
set into contiguous, mutually e+clusive intervals so
that the number or proportion of observations fallingin each interval is apparent.
7/23/2019 MODULE 09 Frequency Distributions
3/40
Frequency Distribution Example
An e+ample of a frequency distribution is the age distributionof one of the classes for an in-class version of this course.
The class included n 1)1 students, with the age distribution
as shown on a following slide.
Two things are notable about this distribution. $ne has to do
with the way we measure age, specified as age at last
birthday, not age at nearest birthday.
The other is the ages listed as ! years, with )1 studentshaving this age. This is an interval with indefinite length and
it will require special handling.
7/23/2019 MODULE 09 Frequency Distributions
4/40
Age 0o.
)1
)) 1
) 11)/ 9
)! 12
) 1
)2
)3 !
)9 /
1 1
) /
/ )
! )1
Total 1)1
7/23/2019 MODULE 09 Frequency Distributions
5/40
Intervals for a Frequency Distribution
4 &se not less than intervals, generally 3-1!
4 &se equal width intervals if feasible and appropriate
4 Avoid intervals with indefinite length, if possible4 5elect interval width by dividing difference between
smallest and largest observation by 1
4 Ta6e into account the manner measurements were
made when setting up intervals
7/23/2019 MODULE 09 Frequency Distributions
6/40
7or the age distribution e+ample, the ages range from )1
years to ! years and older8 to construct at least intervals
for this age range, each interval has to be at least ) years
wide. To create the intervals, we need to consider themidpoint of the intervals8 The midpoint for the age interval
for persons )1 years old is )1.! years since they will not
become )) until their ))nd birthday. Thus, the age interval
for persons )1 and )) years old goes from )1 to ), with ))
as the midpoint.
)1 )) )
The intervals will be written as )1 ), ) )!, )! )2
etc. :e will deal with the ! interval in a subsequent
slide.
Midpoint
7/23/2019 MODULE 09 Frequency Distributions
7/40
Age 0o. ;nterval"idpoint
)1 ))
)) 1
) 11)/
)/ 9
)! 12)
) 1
)2 )3)3 !
)9 /
1 1)
) /
/
/ )
! )1 um =
)1 )) )) 13 13
)) 1
) 11)/ ) 12 !
)/ 9
)! 12) )!
) 1
)2 )3 11 9 9)3 !
)9 / 2 2!
1 1
) ! / 29) /
/ ! / 3
/ )
! )1 / )1 12 1
Total 1)1
7/23/2019 MODULE 09 Frequency Distributions
13/40
4 &se Braph of a frequency distribution4 7or >ontinuous variables
Rules4 0o space between bars
4 (qual areas must represent equal percentages ornumbers
4 Cercentages often preferred to numbers for verticala+is
istogram
7/23/2019 MODULE 09 Frequency Distributions
14/40
From a Frequency !able to a istogram
Histocomes from the Bree6 word for cell. ;n this case, the
reference is to a cell li6e those in bee honeycombs, all of whichhave an equal area when viewed from the top. That is, a
histogram is an equal cell or equal area graph, with each
percentage represented by the same area. To prepare a histogram
from a frequency table, we must first decide the shape and areafor the cell that represents a single =. :e can then stac6 the
appropriate number of these cells on top of each other to
represent the percentage of the frequency distribution located
within the interval of interest.
7or this e+ample, we will ma6e each cell be two years wide and
one unit high so that each percentage point for a specific interval
will loo6 something li6e
7/23/2019 MODULE 09 Frequency Distributions
15/40
The )1 to ) years interval of the age frequency distribution includes 13= of
the observations, with a midpoint of ))8 so for this interval, we need to draw
13 cells, as shown below.
)
))
1)
1/
1
13
3
1
/
)
)/
)
/))3))/))) /3
"e
rcentageofstudents
Interval Midpoint
Each yellow bloc# represents $ cell% thus the
&$ to &' years interval requires $( bloc#s
13
7/23/2019 MODULE 09 Frequency Distributions
16/40
)
))
1)
1/
1
13
3
1
/
)
)/
)
/))3))/))) /3
"ercentageof)tudents
Interval Midpoint
13
12
)!
9
/ /
7/23/2019 MODULE 09 Frequency Distributions
17/40
Dealing with our Interval with Indefinite *ength
4 The areashown for an indefinite interval should be on
same basis as it is for other intervals. $ur indefinite
interval contains 12= of the distribution so it needs to
include the area equivalent to that of 12 cells, where each
cell is ) years wide by 1= tall.
4 The widthof the indefinite interval is 1 years so that it is
3 cells wide
4 The height of the bar for this interval thus needs to be12#3 or two and 1#3=.
7/23/2019 MODULE 09 Frequency Distributions
18/40
) )) )/ ) )3 ) / 3 / /) // / /3 ! !) !/
)
))
1)
1/
1
13
3
1
/
)
)/
)
"ercentageofstudents
+ge Interval Midpoint
Indefinite Interval
13
12
)!
9
/ /
1)3
1)3
1)
3
1)3
1)3
1)3
1)3
1)3
7/23/2019 MODULE 09 Frequency Distributions
19/40
7/23/2019 MODULE 09 Frequency Distributions
20/40
Source: Am J Public Health, April 2004;94:559
7/23/2019 MODULE 09 Frequency Distributions
21/40
7/23/2019 MODULE 09 Frequency Distributions
22/40
)ource: Am J Public Health, June 2004;94:559
7/23/2019 MODULE 09 Frequency Distributions
23/40
Data )ource: Am J Public Health, June 2004;94:559
+ revised version of the histogram on the previous slide%to deal
with the indefinite interval for patients who are D 2/ years old, we
made an arbitrary decision that the oldest patient was 3! years old .
"idpoint of interval 19 30 40 50 60 70 80
Age, years
0
20
40
60
Numb
erofPatients
3!
'argest value
7/23/2019 MODULE 09 Frequency Distributions
24/40
"idpoint of
interval
19 30 40 50 60 70 87
+ge, years
0
20
40
60
-umberof"atients
'argest value
100
Data )ource: Am J Public Health, June 2004;94:559
A histogram of data fromAJPH,Eune )/8 9/!!9 article with oldest
patient assumed to be 1 years old.
7/23/2019 MODULE 09 Frequency Distributions
25/40
+ge Distribution of the ./)/ "opulation, by )ex: $901
Millions
15 10 5 0 5 10 15
)ource: .) 2E-).) 3.4E+.
!-9
1-1/
1!-19
)-)/
)!-)9
-/
!-9
/-//
/!-/9
!-!/
!!-!9
-/
!-9
2-2/
2!-29
3-3/
F!
+ge
Interval
FemaleMale3!
7/23/2019 MODULE 09 Frequency Distributions
26/40
10 5 0 5 1015
+ge Distribution of the ./)/ "opulation, by )ex: &111
Millions
Male Female
15
!-9
1-1/
1!-19)-)/
)!-)9
-/
!-9
/-//
/!-/9
!-!/
!!-!9
-/
!-9
2-2/
2!-29
3-3/
F!
+geI
nterval
)ource: .) 2E-).) 3.4E+.
Year 1950
Year 2000
3!
7/23/2019 MODULE 09 Frequency Distributions
27/40
15 10 5 0 5 10 15
Male Female
Millions
!-9
1-1/
1!-19
)-)/
)!-)9
-/
!-9
/-//
/!-/9
!-!/
!!-!9
-/
!-9
2-2/
2!-29
3-3/
F!
+geInterval
+ge Distribution of the ./)/ "opulation, by )ex: &101
)ource: .) 2E-).) 3.4E+.
Year 2000
Year 2050
3!
7/23/2019 MODULE 09 Frequency Distributions
28/40
Frequency "olygon
7/23/2019 MODULE 09 Frequency Distributions
29/40
A frequency polygon is prepared by connecting themidpoints of the tops of the histogram bars with
straight lines in such a manner that the area covered
by the resulting figure includes 1= of the
histogram area.
Frequency "olygon
F " l f + E l
7/23/2019 MODULE 09 Frequency Distributions
30/40
Frequency "olygon for +ge Example
)
))
1)
1
13
3
1
/
)
"ercentageo
fstudents
) )) )/ ) )3 ) / 3 / /) // / /3 ! !) !/
1/
)/
)
+ge Interval Midpoint
7/23/2019 MODULE 09 Frequency Distributions
31/40
)ource:Am J Public Health, Aug 200!;9!:!209
7/23/2019 MODULE 09 Frequency Distributions
32/40
)ource:Am J Public Health, Sept !9"#;##:$"4
7/23/2019 MODULE 09 Frequency Distributions
33/40
)ource:Am J Public Health, Sept !9"#;##:$"4
7/23/2019 MODULE 09 Frequency Distributions
34/40
)ource:Am J Public Health, Sept !9"#;##:$"4
) d * f "l
7/23/2019 MODULE 09 Frequency Distributions
35/40
)tem and *eaf "lot
5tem# 'eaf >ount
)3 1) )) 1!
)! 111) /
)/ )! !
) /23
)) 1!!!39 2
)1 !!!!!!!! 3
) 12222 919 1)))//!239 1)
13 1)/!!233999 1/
12 )))//!22333 1!
1 111))))////9 1/
1! 111)//9999 1)
1/ 11)!299 1
1 1//2333 3
1) /229 2
11 1)/ !
1 1)22 /
9 )/
2 1) )
The stem contains the
leading digits of the
observations8 because we
have digit numbers from
11 to )3) for this data set
the stem includes the firsttwo digits of the numbers.
;n general the stem consists
of n-1 digits, where n is the
number of digits in the
number.
7or the last two sets of
numbers the stem
contains only 1 digit
because the observations
are ) digit numbers.
The leaf always contains
the last digit of eachobservation with the
same leading digits8
arranged in increasing
order of magnitude from
9 to form the leaf.
Bives the number
of digits in the
leaf of each stem
7/23/2019 MODULE 09 Frequency Distributions
36/40
4 A chart that displays a frequency distribution
similar to a histogram.
4 A stem and leaf plot shows
4 The spread of the data 4 The mode
4 :hether the distribution is s6ewed
4 :hether there are gaps in the data4 :hether there are any unusual data points
)tem and *eaf "lot
7/23/2019 MODULE 09 Frequency Distributions
37/40
)tem and *eaf "lot
A stem and leaf plot typically consists of threecolumns, the first two being separated by a single
blan6 space. The first column is the stem or leading
digits. The second is the leaf which represents the
values in the interval following the stem digits. The
third column indicates the number of data values or
count in the interval.
7/23/2019 MODULE 09 Frequency Distributions
38/40
)tem and *eaf "lot Example
5uestion %isplay the information in the table below in a 5tem and 'eaf Clot.
7/23/2019 MODULE 09 Frequency Distributions
39/40
)tem *eaf "lot for 3lood 2holesterol Data
5tem#'eaf >ount
)2 1
) 1 1
)!
)/ 2 1
) ! 1
)) 13 )1 2 1
) 1 !
19 1)))//!239 1)
13 1)/!!23399 1
12 )))//!239 1
1 111))))////!!!2339999 )/
1! 111))/////!!!3339999 )!1/ 111)!299 11
1 1/2333 1
1) /29 !
11 1/ )
1
9 ! 1
)+) 6utput
7/23/2019 MODULE 09 Frequency Distributions
40/40
Y
30
20
10
0
Std. Dev = 28.66
Mean = 167.6
N = 129.00
)
3
)
2
)
1
)
!
)
/
2
)
,
!
)
)
13
)
1
2
)
1
1
9
1))),//!239
1
3
1)/!!23399
1
2
))),,//!239
1
111))))////!!!2339999
1
!
111)),,/////!!!3339999
1
/
111),!299
1
,
1,/2333
1
)
,/29
1
1
1/
1
9
!
0ote The stem G 'eaf plot is plotted in decreasing order of
magnitude, while the @istogram is plotted in increasing order, from
ll h hi h id i
istogram and )tem 7 *eaf plot for the 2holesterol Data