STAT 3090 Lecture Notes Chapter 3
Chapter 3 – Organizing, Displaying, and Interpreting Data
From information about baseball to information about income, the most basic task when working with data is to summarize a great deal of information. Graphical images are universally regarded as a powerful method of communicating data.
We will use the following to graphically display data:
Graphical Displays for Qualitative Data
• Frequency/Relative Frequency Distribution
• Bar Chart
• Stacked Bar Chart
• Pie Chart
Graphical Displays for Quantitative Data
• Frequency/Relative Frequency Distribution
• Histogram
• Stem and Leaf Display
• Dot Plots
• Time Series Data
Qualitative Data - Frequency Distributions
Data organization using a table:
• A frequency distribution summarizes data by category (class) and the number of observations of each category.
• A relative frequency distribution summarizes data by category and the relative frequency or the proportion of observations in each category.
You should be able to create frequency distributions with and without the use of technology.
Example 1: Olympic Hockey
The table shows the gold medal winners in hockey in the Winter Olympics since 1920. Complete the table below to create a frequency and relative frequency distribution. Source: www.olympic.org
Year  Winner          Year  Winner          Year  Winner
1920  Canada          1960  U.S.A.          1992  Unified Team
1924  Canada          1964  Soviet Union    1994  Sweden
1928  Canada          1968  Soviet Union    1998  Czech Republic
1932  Canada          1972  Soviet Union    2002  Canada
1936  Great Britain   1976  Soviet Union    2006  Sweden
1948  Canada          1980  U.S.A.          2010  Canada
1952  Canada          1984  Soviet Union    2014  Canada
1956  Soviet Union    1988  Soviet Union
Frequency Distribution and Relative Frequency Distribution
Winner           Frequency   Relative Frequency
Canada           9           9/23
Soviet Union     7           7/23
U.S.A.           2           2/23
Great Britain    1           1/23
Sweden           2           2/23
Czech Republic   1           1/23
Unified Team     1           1/23
TOTAL:           23          23/23
With large data sets, it is useful to be able to create a frequency and relative frequency distribution using technology. Instructions for creating these using Excel and Minitab may be found in the Chapter 3 Technology Guide.
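As one illustration of doing this with technology, the Olympic hockey table from Example 1 can be tallied in a few lines of Python. This is a minimal sketch, not the method in the Technology Guide; the names `winners` and `frequency_table` are made up for this example.

```python
from collections import Counter
from fractions import Fraction

# Gold medal winners from the Example 1 table, in year order (1920-2014).
winners = (
    ["Canada"] * 4 + ["Great Britain"] + ["Canada"] * 2 + ["Soviet Union"]
    + ["U.S.A."] + ["Soviet Union"] * 4 + ["U.S.A."] + ["Soviet Union"] * 2
    + ["Unified Team", "Sweden", "Czech Republic", "Canada", "Sweden",
       "Canada", "Canada"]
)


def frequency_table(observations):
    """Return {category: (frequency, relative frequency)}."""
    counts = Counter(observations)
    n = len(observations)
    return {cat: (k, Fraction(k, n)) for cat, k in counts.most_common()}


table = frequency_table(winners)
for category, (freq, rel) in table.items():
    # Show the relative frequency both as a fraction and as a decimal.
    print(f"{category:<15} {freq:>2}   {freq}/{len(winners)} = {float(rel):.3f}")
```

The frequencies add up to the number of observations (23), and the relative frequencies add up to 1, which is a quick check on any table built by hand.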
Qualitative Data – Bar Charts and Pie Charts
A bar chart displays the categories and a bar representing the frequency or relative frequency of each category.
Bar Chart with Frequencies
Bar Chart with Relative Frequencies
Pie Chart
A pie chart displays the relative frequency of each category as a slice of a circle; each slice takes up the same proportion of the circle as the category's relative frequency.
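Because a full circle is 360 degrees, the central angle of each slice is the relative frequency times 360. The sketch below applies this to the Olympic hockey frequencies from Example 1; the variable names are chosen only for illustration.

```python
# Frequencies from the Olympic hockey table in Example 1.
frequencies = {
    "Canada": 9,
    "Soviet Union": 7,
    "U.S.A.": 2,
    "Great Britain": 1,
    "Sweden": 2,
    "Czech Republic": 1,
    "Unified Team": 1,
}

total = sum(frequencies.values())

# Central angle of each slice = relative frequency * 360 degrees.
angles = {cat: 360 * freq / total for cat, freq in frequencies.items()}

for cat, angle in angles.items():
    print(f"{cat:<15} {frequencies[cat]}/{total} of the circle = {angle:.1f} degrees")
```

Canada's slice, for example, spans 9/23 of the circle, or about 140.9 degrees.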
Qualitative Data - Stacked Bar Charts
While a bar chart represents the frequencies of categories of one variable, a stacked (segmented) bar chart represents the frequencies of combinations of categories of two variables.
Example 2
The following stacked bar graph represents the medal count results for the top ten countries (by total medal count) in the 2014 Winter Olympics in Sochi.
Example 2: Interpreting a Stacked Bar Graph
Use the graph above to answer the following.
a. How many bronze medals did Canada earn? 5
b. What proportion of medals earned by Canada were bronze? 5/25
c. Which country earned the most bronze medals? U.S.A.
Example 3:
Consider the sales performance of the following salespeople.
Salesperson   Total Sales (Thousands of Dollars)
Susan         187
William       201
Beth          207
Rob           193
The first bar chart more accurately depicts the roughly 10% difference in sales performance.
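The "roughly 10%" figure can be checked with a short calculation. The sketch below also shows how the apparent difference between bars depends on where the vertical axis starts. The baseline of 180 is a hypothetical value chosen for illustration; it is an assumption and is not taken from the charts in these notes.

```python
# Total sales in thousands of dollars, from the table in Example 3.
sales = {"Susan": 187, "William": 201, "Beth": 207, "Rob": 193}

low, high = min(sales.values()), max(sales.values())

# Percent difference between the highest and lowest totals.
pct_diff = (high - low) / low * 100


def bar_height_ratio(baseline):
    """Ratio of tallest to shortest bar when the axis starts at `baseline`."""
    return (high - baseline) / (low - baseline)


print(f"Percent difference: {pct_diff:.1f}%")
print(f"Axis starting at 0:   tallest bar is {bar_height_ratio(0):.2f} times the shortest")
# Hypothetical truncated axis (assumed value, for illustration only).
print(f"Axis starting at 180: tallest bar is {bar_height_ratio(180):.2f} times the shortest")
```

With the axis starting at zero, the tallest bar is about 1.11 times the shortest, which matches the true difference. With the assumed baseline of 180, the same data produce a tallest bar nearly four times the height of the shortest.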
Using Excel, create a stacked bar graph with the following data representing the overall medal counts from the 2016 Rio Olympics.
COUNTRY GOLD SILVER BRONZE TOTAL
USA 46 37 38 121
GBR 27 23 17 67
CHN 26 18 26 70
RUS 19 18 19 56
GER 17 10 15 42
JPN 12 8 21 41
FRA 10 18 14 42
KOR 9 3 9 21
ITA 8 12 8 28
AUS 8 11 10 29
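The exercise calls for Excel, but the layout of a stacked bar can also be sketched in Python to check the result. In a stacked bar, each segment starts where the previous one ends, so the top of the last segment equals the TOTAL column. The function name `stacked_segments` is made up for this example, and the text bars are only a rough stand-in for a real chart.

```python
# Medal counts (gold, silver, bronze) from the table above.
medals = {
    "USA": (46, 37, 38),
    "GBR": (27, 23, 17),
    "CHN": (26, 18, 26),
    "RUS": (19, 18, 19),
    "GER": (17, 10, 15),
    "JPN": (12, 8, 21),
    "FRA": (10, 18, 14),
    "KOR": (9, 3, 9),
    "ITA": (8, 12, 8),
    "AUS": (8, 11, 10),
}


def stacked_segments(counts):
    """Return the (bottom, top) position of each segment in one stacked bar."""
    segments, bottom = [], 0
    for count in counts:
        segments.append((bottom, bottom + count))
        bottom += count
    return segments


for country, counts in medals.items():
    gold, silver, bronze = counts
    # One character per medal gives a rough text version of the stacked bar.
    bar = "G" * gold + "S" * silver + "B" * bronze
    print(f"{country} {bar} {sum(counts)}")
```

For the USA, the segments run from 0 to 46 (gold), 46 to 83 (silver), and 83 to 121 (bronze).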
[Figure: stacked bar chart titled "2016 Rio Olympics" showing GOLD, SILVER, and BRONZE medal counts for USA, GBR, CHN, RUS, GER, JPN, FRA, KOR, ITA, and AUS; vertical axis from 0 to 140.]
Quantitative Data - Frequency and Relative Frequency Distributions
When data are qualitative, selecting the categories for display is relatively easy. When the data are quantitative, the choice is less obvious. One should consider the range of values in the data set along with the number of categories (bins) desired when constructing a frequency or relative frequency distribution. Be sure that the number of categories and the bin width appropriately represent the data set.
Example 1: Poverty Level by State
The following table gives the percent of people living below the poverty level for each state and the District of Columbia in the year 2015.
Percent of People Living Below the Poverty Level by State
18.5 10.5 15.1 19.6 22.0 10.8 16.1 16.7 17.9
10.3 12.4 13.6 13.4 14.8 20.4 15.4 15.9 12.1
17.4 17.3 14.5 9.7 14.6 15.4 13.2 11.3 11.1
19.1 15.7 12.2 11.5 12.6 16.4 13.9 10.2
15.3 17.0 13.0 15.8 14.7 11.0 16.6 11.2
11.5 10.6 18.5 10.2 8.2 14.8 13.7 12.2
Source: 2015 American Community Survey
The data range from roughly 8% to roughly 22%, so a bin width of 2% seems reasonable.
We may summarize these data using a frequency distribution, relative frequency distribution and cumulative relative frequency distribution.
Percent of People Living    Frequency             Relative     Cumulative Relative
Below the Poverty Level     (Number of States)    Frequency    Frequency
[8.2 , 10.2]                4                     4/51         4/51
(10.2 , 12.2]               13                    13/51        17/51
(12.2 , 14.2]               8                     8/51         25/51
(14.2 , 16.2]               13                    13/51        38/51
(16.2 , 18.2]               7                     7/51         45/51
(18.2 , 20.2]               4                     4/51         49/51
(20.2 , 22.2]               2                     2/51         51/51
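The binning in this table can be reproduced with a short Python sketch. It assumes the same bins as the table: the first bin includes both endpoints, and every later bin excludes its lower endpoint and includes its upper endpoint. Values are converted to whole tenths of a percent so that boundary values such as 10.2 and 12.2 are compared exactly; that conversion and the names below are choices made for this illustration.

```python
# Percent below the poverty level for the 50 states and D.C. (2015).
poverty = [
    18.5, 10.5, 15.1, 19.6, 22.0, 10.8, 16.1, 16.7, 17.9,
    10.3, 12.4, 13.6, 13.4, 14.8, 20.4, 15.4, 15.9, 12.1,
    17.4, 17.3, 14.5, 9.7, 14.6, 15.4, 13.2, 11.3, 11.1,
    19.1, 15.7, 12.2, 11.5, 12.6, 16.4, 13.9, 10.2,
    15.3, 17.0, 13.0, 15.8, 14.7, 11.0, 16.6, 11.2,
    11.5, 10.6, 18.5, 10.2, 8.2, 14.8, 13.7, 12.2,
]

# Bin layout in tenths of a percent: start at 8.2, width 2.0, seven bins.
START, WIDTH, NBINS = 82, 20, 7


def bin_index(x):
    """Index of the bin containing x; upper endpoints belong to the lower bin."""
    tenths = round(x * 10)
    return max(0, (tenths - START - 1) // WIDTH)


counts = [0] * NBINS
for x in poverty:
    counts[bin_index(x)] += 1

n = len(poverty)
running = 0
for i, count in enumerate(counts):
    running += count
    lower = (START + i * WIDTH) / 10
    upper = (START + (i + 1) * WIDTH) / 10
    left = "[" if i == 0 else "("
    print(f"{left}{lower:.1f} , {upper:.1f}]  {count:>2}  {count}/{n}  {running}/{n}")
```

The printed frequencies and cumulative counts match the table above.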
Example 2: Using the Cumulative Relative Frequency Distribution
Use the table that you created in Example 1 to answer the following.
A. What proportion of states have less than or equal to 16.2% of the population living below the poverty level? 38/51 ≈ 0.745
B. What proportion of states have more than 14.2% of the population living below the poverty level? 1-.49=.51
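Both answers come straight from the cumulative relative frequency column: part A reads off the cumulative value at 16.2, and part B subtracts the cumulative value at 14.2 from 1. A minimal sketch of that arithmetic, using the frequencies from the Example 1 table (the variable names are illustrative):

```python
from fractions import Fraction
from itertools import accumulate

# Upper bin edges and frequencies from the Example 1 table.
upper_edges = [10.2, 12.2, 14.2, 16.2, 18.2, 20.2, 22.2]
counts = [4, 13, 8, 13, 7, 4, 2]

n = sum(counts)
# Cumulative relative frequency at each upper bin edge.
cumulative = [Fraction(c, n) for c in accumulate(counts)]

# A. Proportion with at most 16.2% below the poverty level.
at_most_16_2 = cumulative[upper_edges.index(16.2)]

# B. Proportion with more than 14.2% below the poverty level (complement).
more_than_14_2 = 1 - cumulative[upper_edges.index(14.2)]

print(f"A: {float(at_most_16_2):.3f}")
print(f"B: {float(more_than_14_2):.3f}")
```

Part A is 38/51 and part B is 26/51, which is about 0.51.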
The frequency tables that you created in the previous example may be displayed graphically using a histogram.
A histogram is a graph that uses bars to portray the frequencies or relative frequencies of the possible outcomes of the variable. It is a useful graphical tool for large quantitative data sets.
Quantitative Data – Stem and Leaf Diagrams
Another useful tool for displaying data is a stem and leaf diagram. The stem and leaf diagram is similar to a histogram; however, unlike a histogram, it keeps the values of the original data set visible in the display.
A stem and leaf diagram may be constructed using the following steps:
Step 1: Sort the data from low to high
Step 2: Split the values into stem and leaf:
leaf = units digit; stem = all digits to the left of the units digit
For example, for the value 112, the stem is 11 and the leaf is 2.
Step 3: List the stems from lowest to highest
Step 4: Order the leaves from lowest to highest and place next to each stem.
Example 3: Days to Payment
The following data represent the number of days required to collect insurance payments for a random sample of customers of a local dentist.
Number of Days to Collect Payment
34 55 36 39 36
32 35 30 47 31
60 66 48 43 33
24 37 38 65 35
22 45 33 29 41
38 35 28 56 56
Create a stem and leaf display for these data.
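The four construction steps can also be carried out in Python as a check on a display made by hand. This is a sketch that assumes non-negative whole-number data; the function name `stem_and_leaf` is made up for this example.

```python
# Number of days to collect payment, from the table above.
days = [
    34, 55, 36, 39, 36,
    32, 35, 30, 47, 31,
    60, 66, 48, 43, 33,
    24, 37, 38, 65, 35,
    22, 45, 33, 29, 41,
    38, 35, 28, 56, 56,
]


def stem_and_leaf(values):
    """Map each stem to its leaves (units digits), both in increasing order."""
    # Step 3: list every stem from lowest to highest, even if it has no leaves.
    stems = range(min(values) // 10, max(values) // 10 + 1)
    plot = {stem: [] for stem in stems}
    # Steps 1, 2, and 4: sort, split into stem and leaf, and attach each leaf.
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return plot


plot = stem_and_leaf(days)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each row lists one stem followed by its leaves, so the row for stem 2 holds the values 22, 24, 28, and 29.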
Quantitative Data – Dot Plots
A dot plot shows a dot for each observation, placed above that observation's value on the number line. Like the stem and leaf plot, it portrays each observation (you can recreate the data set from the plot). The following shows a dot plot for a distribution of test scores.
[Figure: Dotplot of Test Scores; horizontal axis "Test Scores" marked from 65 to 100 in steps of 5.]
The Shape of a Distribution
•The shape of a distribution is described by mentioning any symmetry or skewness, the number of peaks, any clusters or gaps, and any unusually high or low observations, called outliers.
•In describing the shape of a distribution, concentrate on the main features.
─Look for rough symmetry or clear skewness.
─Look for major peaks or gaps, not just for minor ups and downs in the bars of the histogram.
─Look for clear outliers, not just for the smallest and largest observations.
•In determining the shape of a distribution, it is often helpful to outline a graph with a smooth curve.
•Symmetry vs. skewness:
─A distribution is symmetric if the right and left sides of the graph are approximate mirror images.
─A distribution is skewed left if the left side of the graph extends much further out than the right.
─A distribution is skewed right if the right side of the graph extends much further out than the left.
Describing The Shape of a Distribution
A way to remember how to describe a distribution
•Center (mean, median)
•Unusual (any outliers)
•Shape (symmetric- be careful!, skewed, uniform, bimodal)
•Spread (range, interquartile range, standard deviation)
Example 4
Describe the shape of the fat content in McDonald’s breakfast menu items, based on the stem-and-leaf plot. Key 1|3 = 13 grams.
Skewed right with a gap in the 40’s and one unusually high observation of 56 g.
Comparing Two Distributions
•When you are asked to compare two distributions, you need to discuss the similarities and/or differences in their shapes, their centers, and their spreads.
─Provide a measure of center for each distribution, and discuss which one is larger/smaller. (Chapter 4)
─Provide a measure of spread for each distribution, and discuss which is larger/smaller. (Chapter 4)
Example 5
These dot plots represent pet ownership in two different city blocks. Write a sentence to compare their shape.
It appears that block A is right skewed while block B is more symmetric and mound shaped.
Chapter 3 Practice
1. Obesity, high blood pressure, high cholesterol, and heart disease are partially caused by a poor diet. As such, the FDA requires nutrition labels on most packaged foods. Below is a dot plot of the amount of sugar contained in a single serving of 12 popular breakfast cereals.
Describe the shape of the distribution.
The distribution is approximately symmetric and bimodal with a gap from 5 to 10 grams of sugar.
2. The following histogram represents data for the percentage of people without health insurance for the 50 states in 2013. How would you describe the shape of the distribution of uninsured rates for the 50 states from the histogram below?
Approximately symmetric & unimodal
3. Suppose we survey everyone in STAT 3090 and ask them what month they were born in (January, February, etc.). Which of the following graphs would be most appropriate for displaying the results?
Bar Graph Histogram
Bar Graph
4. What is the difference between a frequency distribution and a relative frequency distribution?
A frequency distribution summarizes data by category (class) and the number of observations of each category whereas a relative frequency distribution summarizes data by category using the proportion of observations in each category.
5. A survey asked students to report the web browser they primarily use. The relative frequency bar graph below summarizes the 639 student responses. How many more respondents primarily use Chrome than primarily use Firefox?
About 345