Chapter 2: Frequency Distributions and Graphs Objectives: Organizing data using frequency distributions. Illustrating data using graphs. Interpreting graphs. © FAROUQ MOHAMMAD ALAM, DEPT OF STAT, KAU, 2019


  • Chapter 2: Frequency Distributions and Graphs

    Objectives:

    ❑ Organizing data using frequency distributions.

    ❑ Illustrating data using graphs.

    ❑ Interpreting graphs.

    © FAROUQ MOHAMMAD ALAM, DEPT OF STAT, KAU, 2019

  • Overview of Chapter 2

    Sec. #   Title                                        Page(s)
    2 - 1    Organizing data                              42 - 56
    2 - 2    Histograms, frequency polygons, and ogives   57 - 74
    2 - 3    Other types of graphs                        74 - 108

  • 2 – 1: Organizing data

    Inductee   Blood type
    1          A
    2          B
    3          B
    4          AB
    ⋮          ⋮
    25         A

    These data are called raw data (i.e., they are in their original form).

  • 2 – 1: Organizing data (cont.)

    By using descriptive statistical methods, we transform raw data into …

  • 1. Frequency Distributions

    Descriptive statistics summarize the raw data (inductees 1 to 25 with their blood types) as a frequency distribution:

    Blood type   Number of inductees
    A            5
    B            7
    O            9
    AB           4

  • 2. Graphs

    [Figure: pie chart of the blood types, obtained from the raw data through descriptive statistics. A: 20%, B: 28%, O: 36%, AB: 16%.]

  • 2 – 1: Organizing data (cont.)

    A frequency distribution is the organization of raw data in table form, using classes and frequencies.

    A class is a quantitative or qualitative category into which the data values are placed.

    The frequency of a class is the number of data values placed in that class.

  • 2 – 1: Organizing data (cont.)

    Types of frequency distributions:

    A categorical frequency distribution is used for nominal-level or ordinal-level data.

    A grouped frequency distribution is used for quantitative data with a large range.

    An ungrouped frequency distribution is used for quantitative data with a relatively small range, or when the data are discrete.

  • Example 2 – 1: Distribution of Blood Types

    (Categorical frequency distribution, page 43)

    Twenty-five army inductees were given a blood test to determine their blood type. The number of inductees (25) is called the sample size.

    The data set is shown on page 43.

  • Example 2 – 1 (cont.)

    Step 1. Determine the classes.

    We have four blood types; therefore, there are four classes: A, B, O, and AB.

    Step 2. Create a table with three columns: the first column is for the blood types (classes), the second column is for counting (tallies), and the third column is for the number (#) of inductees (frequencies).

    Step 3. Tally the data and then delete column 2.

  • Example 2 – 1 (cont.)

    The raw data (page 43) are tallied into the frequency distribution:

    Class   Tally          Frequency
    A       |||||          5
    B       ||||| ||       7
    O       ||||| ||||     9
    AB      ||||           4
    Total                  25

  • Example 2 – 1 (cont.)

    Blood type (Class)   # of inductees (Frequency)
    A                    5 = $f_1$
    B                    7 = $f_2$
    O                    9 = $f_3$
    AB                   4 = $f_4$
    Total                $\sum f = f_1 + f_2 + f_3 + f_4 = 25$
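The tallying in Steps 1 to 3 can be sketched in Python. This is only an illustration under stated assumptions: the list `blood_types` is a stand-in rebuilt from the class counts of Example 2 – 1 (the 25 observations on page 43 are not reproduced here, so their order is made up), and `collections.Counter` plays the role of the tally column.

```python
from collections import Counter

# Stand-in raw data, rebuilt from the class counts (5 A, 7 B, 9 O, 4 AB).
# The order of the values is an assumption; only the counts come from the example.
blood_types = ["A"] * 5 + ["B"] * 7 + ["O"] * 9 + ["AB"] * 4

# Tally each class: Counter maps class -> frequency.
frequency = Counter(blood_types)

# Print the distribution in the class order used in the example.
for blood_type in ["A", "B", "O", "AB"]:
    print(f"{blood_type:<5} {frequency[blood_type]}")

# The frequencies must add up to the sample size n.
n = len(blood_types)
print("Total", sum(frequency.values()))
```

The final line checks the rule that the frequencies sum to the sample size.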

  • Important Rules!

    Regardless of the type of the frequency distribution, if the sample size is represented by $n$, then

    $\sum f = n$

    Regardless of the type of the frequency distribution, the (cumulative) percentage of class number $i$ is defined as

    $P_i = \frac{f_i}{n} \times 100\%$

  • Important Rules! (cont.)

    When manually drawing the pie chart, we need to calculate the degree of class number $i$, which is defined as

    $D_i = \frac{f_i}{n} \times 360°$

  • Example 2 – 1 (Revisited)

    Blood type (Class)   # of inductees (Frequency)   Percentage of inductees (%)   Degree of inductees
    A                    5                            20%                           72°
    B                    7                            28%                           100.8°
    O                    9                            36%                           129.6°
    AB                   4                            16%                           57.6°
    Total                25                           100%                          360°
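As a quick check of the percentage and degree rules, the columns of the revisited Example 2 – 1 can be recomputed from the frequencies alone. The sketch below is illustrative; the variable names are assumptions, and values are rounded to one decimal place only to hide floating-point noise.

```python
# Frequencies from Example 2-1.
frequency = {"A": 5, "B": 7, "O": 9, "AB": 4}
n = sum(frequency.values())  # sample size

# P_i = f_i / n * 100 and D_i = f_i / n * 360, rounded to one decimal place.
percentage = {c: round(f / n * 100, 1) for c, f in frequency.items()}
degree = {c: round(f / n * 360, 1) for c, f in frequency.items()}

for c in frequency:
    print(f"{c:<3} {frequency[c]:>2} {percentage[c]:>5}% {degree[c]:>6}°")
```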

  • Example 2 – 2: Record High Temperatures

    (Grouped frequency distribution, page 47)

    The data on page 47 represent the record high temperatures in degrees Fahrenheit (°F) for each of the 50 (= $n$) states. Construct a grouped frequency distribution for the data using 7 classes.

  • Example 2 – 2 (cont.)

    Step 1. Determine the classes as follows:

    Calculate the range ($R$), which is the difference between the highest value ($H$) and the lowest value ($L$), i.e.,

    $R = H - L = 134 - 100 = 34$

    Given the number of classes ($k$), the class width ($h$) is defined as:

    $h = \frac{R}{k} = \frac{34}{7} \approx 4.9 \approx 5$

  • Example 2 – 2 (cont.)

    Step 1. (cont.)

    A class of a grouped frequency distribution is defined by its class limits: a lower limit, which is the smallest data value that can be included in the class, and an upper limit, which is the largest data value that can be included in the class.

  • Example 2 – 2 (cont.)

    Step 1. (cont.) (textbook)

    Lower limits: Take the lowest value as the starting point, i.e., use 100 as the lower limit of the first class, then repeatedly add the class width to get the lower limits of the next six classes: 105, 110, 115, 120, 125, 130.

    Upper limits: Subtract one unit from the lower limits of the second through seventh classes to get the upper limits of the first through sixth classes. Finally, use the largest value as the upper limit of the final class.

  • Example 2 – 2 (cont.)

    Step 1. (cont.)

    The limits of the classes are:

    100 – 104   (lower limit 100 = $L$)
    105 – 109
    110 – 114   ($l_3 = 110$, $u_3 = 114$)
    115 – 119
    120 – 124
    125 – 129
    130 – 134   (upper limit 134 = $H$)
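Step 1 of Example 2 – 2 can be sketched in a few lines of Python. The sketch assumes that the class width is rounded up to the next whole number (the slide only shows 4.9 ≈ 5), and all variable names are illustrative.

```python
import math

highest, lowest = 134, 100   # H and L from the example
k = 7                        # number of classes

data_range = highest - lowest        # R = 34
width = math.ceil(data_range / k)    # 34 / 7 is about 4.86; rounding up is an assumption

# Lower limits start at the lowest value and step by the class width.
lower_limits = [lowest + i * width for i in range(k)]

# Each upper limit is one unit below the next lower limit; the last class
# ends at the highest value, as in the example.
upper_limits = [limit - 1 for limit in lower_limits[1:]] + [highest]

classes = list(zip(lower_limits, upper_limits))
print(classes)
```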

  • Example 2 – 2 (cont.)

    Step 2. Create a table with three columns, the first column is for temperature (classes), the second column for counting, and the third column is for the number (#) of states (frequencies).

    Step 3. Tally data and then delete column 2.

  • Example 2 – 2 (cont.)

    Temperature (Class)   # of states (Frequency)
    100 – 104             2
    105 – 109             8
    110 – 114             18
    115 – 119             13
    120 – 124             7
    125 – 129             1
    130 – 134             1
    Total                 50

  • Important Rules!

    Sometimes, we need to calculate the midpoint of each class, which is given by

    $\text{class } i \text{ midpoint} = \frac{l_i + u_i}{2}$

  • Important Rules! (cont.)

    Grouped (and also ungrouped) frequency distributions use class boundaries so that there are no gaps in the frequency distribution. They are given by

    $l_i - 0.5, \quad u_i + 0.5$

    in the case of grouped frequency distributions.

  • Important Rules! (cont.)

    Usually, grouped frequency distributions consist of equal class widths. The class width based on the limits of any class $i$ is given by

    $\text{class width} = l_{i+1} - l_i$

    or

    $\text{class width} = u_{i+1} - u_i$

  • Example 2 – 2 (Revisited)

    Temperature (Class)   Class boundaries   Class midpoints   # of states (Frequency)
    100 – 104             99.5 – 104.5       102               2
    105 – 109             104.5 – 109.5      107               8
    110 – 114             109.5 – 114.5      112               18
    115 – 119             114.5 – 119.5      117               13
    120 – 124             119.5 – 124.5      122               7
    125 – 129             124.5 – 129.5      127               1
    130 – 134             129.5 – 134.5      132               1
    Total                                                      50
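The boundary, midpoint, and class-width rules can be checked against the revisited Example 2 – 2 with a short Python sketch. It is illustrative only; the data structures and names are assumptions, while the limits and frequencies come from the example.

```python
# Class limits and frequencies from Example 2-2.
classes = [(100, 104), (105, 109), (110, 114), (115, 119),
           (120, 124), (125, 129), (130, 134)]
frequencies = [2, 8, 18, 13, 7, 1, 1]

rows = []
for (lower, upper), f in zip(classes, frequencies):
    boundaries = (lower - 0.5, upper + 0.5)   # l_i - 0.5 and u_i + 0.5
    midpoint = (lower + upper) / 2            # (l_i + u_i) / 2
    rows.append((lower, upper, boundaries, midpoint, f))
    print(f"{lower} - {upper}  {boundaries[0]} - {boundaries[1]}  {midpoint}  {f}")

# Class width from consecutive lower limits: l_(i+1) - l_i.
# A single value in the set means all classes have the same width.
widths = {b[0] - a[0] for a, b in zip(classes, classes[1:])}
print("class widths:", widths)
```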

  • Example 2 – 3: MPGs for SUVs

    (Ungrouped frequency distribution, page 49)

    The data shown on page 49 represent the number of miles per gallon (mpg) that 30 (= $n$) selected four-wheel-drive sports utility vehicles obtained in city driving. Construct a frequency distribution, and analyze the distribution.

  • Example 2 – 3 (cont.)

    Step 1. Determine the classes.

    Notice that the range of the data is small ($R = 19 - 12 = 7$). The classes are 12, 13, 14, 15, 16, 17, 18, and 19.

    Step 2. Create a table with three columns: the first column is for MPG (classes), the second column is for counting (tallies), and the third column is for the # of SUVs (frequencies).

    Step 3. Tally the data and then delete column 2.

  • Example 2 – 3 (cont.)

    MPG (Class)   # of SUVs (Frequency)
    12            6
    13            1
    14            3
    15            6
    16            8
    17            2
    18            3
    19            1
    Total         30

  • Remark!

    Notice that class boundaries can be calculated for this ungrouped frequency distribution because MPG is an example of a continuous variable.

    Only in the case of continuous data can we obtain class boundaries, by subtracting 0.5 from each class value to get the lower class boundary and adding 0.5 to each class value to get the upper class boundary.

  • Example 2 – 3 (Revisited)

    MPG (Class)   Class boundaries   # of SUVs (Frequency)
    12            11.5 – 12.5        6
    13            12.5 – 13.5        1
    14            13.5 – 14.5        3
    15            14.5 – 15.5        6
    16            15.5 – 16.5        8
    17            16.5 – 17.5        2
    18            17.5 – 18.5        3
    19            18.5 – 19.5        1
    Total                            30
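For the ungrouped case, each class is a single value, so the boundaries are simply the value minus and plus 0.5. A minimal Python sketch, with illustrative names and the frequencies of Example 2 – 3:

```python
# Ungrouped classes (MPG values) and frequencies from Example 2-3.
frequency = {12: 6, 13: 1, 14: 3, 15: 6, 16: 8, 17: 2, 18: 3, 19: 1}

# Each class is a single value, so its boundaries are value - 0.5 and value + 0.5.
boundaries = {mpg: (mpg - 0.5, mpg + 0.5) for mpg in frequency}

n = sum(frequency.values())                       # sample size
most_common = max(frequency, key=frequency.get)   # class with the highest frequency

for mpg, f in frequency.items():
    low, high = boundaries[mpg]
    print(f"{mpg}  {low} - {high}  {f}")
print("n =", n, "| most frequent MPG:", most_common)
```

Picking out the class with the highest frequency is one simple way to start analyzing the distribution.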

  • 2 – 2: Histograms, frequency polygons, and ogives

    Histogram: The definition is on page 57, and the corresponding illustrative Example 2 – 4 is on pages 57-58.

    Frequency polygon: The definition is on page 58, and the corresponding illustrative Example 2 – 5 is on pages 58-59.

    Ogive: The definition is on page 59, and the corresponding illustrative Example 2 – 6 is on pages 59-61.

  • Histogram


  • Frequency Polygon


  • Cumulative Frequency Distribution

    A cumulative frequency distribution is a distribution that shows the number of data values less than or equal to each upper boundary.

    The values are found by adding the frequencies of the classes less than or equal to the upper class boundary of a specific class. This gives an ascending cumulative frequency. The last cumulative frequency must be equal to $n$.

  • Example 2 – 2 (Revisited)

    Temp. (Class)   Class boundaries   Frequency   Cumulative frequency
                                                   Less than 99.5:   0
    100 – 104       99.5 – 104.5       2           Less than 104.5:
    105 – 109       104.5 – 109.5      8           Less than 109.5:
    110 – 114       109.5 – 114.5      18          Less than 114.5:
    115 – 119       114.5 – 119.5      13          Less than 119.5:
    120 – 124       119.5 – 124.5      7           Less than 124.5:
    125 – 129       124.5 – 129.5      1           Less than 129.5:
    130 – 134       129.5 – 134.5      1           Less than 134.5:

  • Example 2 – 2 (Revisited)

    Temp. (Class)   Class boundaries   Frequency   Cumulative frequency
                                                   Less than 99.5:   0
    100 – 104       99.5 – 104.5       2           Less than 104.5:  2
    105 – 109       104.5 – 109.5      8           Less than 109.5:  10
    110 – 114       109.5 – 114.5      18          Less than 114.5:  28
    115 – 119       114.5 – 119.5      13          Less than 119.5:  41
    120 – 124       119.5 – 124.5      7           Less than 124.5:  48
    125 – 129       124.5 – 129.5      1           Less than 129.5:  49
    130 – 134       129.5 – 134.5      1           Less than 134.5:  50 = $n$

    Keep adding!
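The "keep adding" step is a running total, which Python's standard `itertools.accumulate` computes directly. The sketch below uses the frequencies of Example 2 – 2; the variable names are illustrative.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies from Example 2-2.
upper_boundaries = [104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Running total of the frequencies: each entry counts the values
# below the matching upper class boundary.
cumulative = list(accumulate(frequencies))

for boundary, cf in zip(upper_boundaries, cumulative):
    print(f"Less than {boundary}: {cf}")

# The last cumulative frequency must equal the sample size n.
n = sum(frequencies)
print("last cumulative frequency equals n:", cumulative[-1] == n)
```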

  • Ogive


  • Why use ogives?

    The ogive is mainly used to read off, approximately, the cumulative frequency (or cumulative percentage) below a certain upper class boundary, and vice versa.

  • Case 1: Getting the cumulative frequency based on an upper class boundary.

    Upper (limit) boundary = 122

    Cumulative frequency ≈ 45

  • Case 2: Getting the upper class boundary based on a cumulative frequency.

    Cumulative frequency = 35

    Upper (limit) boundary ≈ 117.5

  • Case 3: Getting the frequency between two upper class boundaries.

    Cumulative frequency at the higher boundary = 45

    Cumulative frequency at the lower boundary = 35

    Required frequency between the two boundaries = 45 − 35 = 10
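Reading an ogive by eye amounts to linear interpolation between its plotted points. The sketch below illustrates the three cases with the cumulative frequencies of Example 2 – 2. The helper `interpolate` is a hypothetical name written for this illustration, and its results differ slightly from the slide values (45, 117.5, and 10), which are approximate readings from the drawn graph.

```python
# Points of the ogive: (upper class boundary, cumulative frequency).
boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
cumulative = [0, 2, 10, 28, 41, 48, 49, 50]

def interpolate(x, xs, ys):
    """Linear interpolation of y at x along the points (xs[i], ys[i])."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("x is outside the range of the ogive")

# Case 1: boundary -> cumulative frequency.
cf_at_122 = interpolate(122, boundaries, cumulative)

# Case 2: cumulative frequency -> boundary (swap the roles of the two axes).
boundary_at_35 = interpolate(35, cumulative, boundaries)

# Case 3: frequency between two boundaries = difference of the two readings.
between = cf_at_122 - interpolate(boundary_at_35, boundaries, cumulative)

print(cf_at_122, round(boundary_at_35, 1), round(between, 1))
```

Swapping the axes in Case 2 works here because the cumulative frequencies are strictly increasing.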

  • Remarks!

    If the sample size and the (cumulative) percentage of a specific class are given, then the corresponding (cumulative) frequency of this class can be calculated by

    $f_i = \frac{P_i}{100\%} \times n$

  • Remarks! (cont.)

    If both the (cumulative) frequency and the (cumulative) percentage of a specific class are given, then the sample size can be calculated by

    $n = \frac{f_i}{P_i} \times 100\%$
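Both remarks are rearrangements of the percentage rule, and they can be checked with the blood type O class of Example 2 – 1 (frequency 9, percentage 36%, sample size 25). The function names below are hypothetical, chosen only for this illustration, and percentages are passed as plain numbers such as 36.

```python
def frequency_from_percentage(percentage, n):
    """f_i = P_i / 100 * n, with the percentage given as a number such as 36."""
    return percentage * n / 100

def sample_size_from(frequency, percentage):
    """n = f_i / P_i * 100, with the percentage given as a number such as 36."""
    return frequency * 100 / percentage

f_O = frequency_from_percentage(36, 25)   # frequency of class O
n = sample_size_from(9, 36)               # sample size recovered from class O

print(f_O, n)
```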

  • 2 – 3: Other types of graphs

    Bar graph: The definition is on page 75, and the corresponding illustrative Example 2 – 8 is on page 77.

    Time series chart: The definition is on page 78, and the corresponding illustrative Example 2 – 10 is on pages 78-79.

  • 2 – 3: Other types of graphs (cont.)

    Pie chart: The definition is on page 80, with Example 2 – 11 on pages 80-81 and Example 2 – 12 on page 82.

    Stem-and-leaf plot: The definition is on page 84, with Example 2 – 14 on page 84 and Example 2 – 15 on page 85.

  • Summary of graphs

    A bar graph is obtained from a categorical frequency distribution, i.e., it is typically used with qualitative data. Note that this graph is often preferred when the data are ordinal-level qualitative. Note also that this graph can be used when the data are …

    Time series charts are used when quantitative data are observed over a period of time (e.g., minutes, hours, etc.). Here, the independent variable is time, and the variable observed over time is the dependent variable.

  • Summary of graphs (cont.)

    Like the bar graph, the pie chart is obtained from a categorical frequency distribution, and it is highly recommended for nominal-level qualitative data.

    A stem-and-leaf plot is used only when the data are quantitative.

  • Bar graph

    [Figure: bar graph of frequencies (vertical axis from 0 to 18) by grade (A+, A, B+, B, C+, C, D+, D, F, DN).]

  • Time series chart

    [Figure: time series chart of the number of homicides (vertical axis from 0 to 800) by year, 2003 to 2007.]

  • Pie Chart

    Remember: the degree of a class in a pie chart is defined as

    $\frac{f}{n} \times 360°$

    [Figure: pie chart of blood types. A: 20%, B: 28%, O: 36%, AB: 16%.]

  • Stem and leaf plot

    The stem and leaf plot is similar to a histogram turned on its side!

  • Stem and leaf plot

    Consider the following numbers: 1403 and 102.

    The stem of 1403 is 140 and the leaf is 3.

    The stem of 102 is 10 and the leaf is 2.
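For whole numbers, the leaf is the last digit and the stem is everything before it, which is exactly what integer division by 10 gives. The sketch below is illustrative: `stem_and_leaf` is a hypothetical helper, and the `sample` list is made-up data used only to show how leaves are grouped under their stems.

```python
from collections import defaultdict

def stem_and_leaf(value):
    """Split a non-negative whole number into (stem, leaf)."""
    return divmod(value, 10)   # stem = value // 10, leaf = value % 10

print(stem_and_leaf(1403))   # stem 140, leaf 3
print(stem_and_leaf(102))    # stem 10, leaf 2

# Hypothetical sample, used only to show how a plot is assembled.
sample = [2, 13, 14, 25, 31, 32, 57]

plot = defaultdict(list)
for value in sorted(sample):
    stem, leaf = stem_and_leaf(value)
    plot[stem].append(leaf)

for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```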

  • Stem and leaf plot

    By joining the stems and the leaves, we notice that the minimum is 02, while the maximum is 57.

  • Summary

    Type of distribution                 Suitable data
    Categorical frequency distribution   Nominal ✓, Ordinal ✓
    Grouped frequency distribution       Quantitative (large range)
    Ungrouped frequency distribution     Discrete ✓, Continuous ✓ (small range)

  • Summary (cont.)

    Graph                Suitable data
    Bar chart            Nominal ✓, Ordinal ✓, Discrete ✓
    Time series          Independent variable: time; dependent variable: discrete or continuous
    Pie chart            Nominal ✓, Ordinal ✓
    Stem-and-leaf plot   Discrete ✓, Continuous ✓