
STAT 3090 Lecture Notes Chapter 3

Chapter 3 – Organizing, Displaying, and Interpreting Data

From information about baseball to information about income, the most basic task when working with data is to summarize a great deal of information. Graphical images are universally regarded as a powerful method of communicating data.

We will use the following to graphically display data:

Graphical Displays for Qualitative Data

• Frequency/Relative Frequency Distribution
• Bar Chart
• Stacked Bar Chart
• Pie Chart

Graphical Displays for Quantitative Data

• Frequency/Relative Frequency Distribution
• Histogram
• Stem and Leaf Display
• Dot Plots
• Time Series Data

Qualitative Data - Frequency Distributions

Data organization using a table:

• A frequency distribution summarizes data by category (class) and the number of observations of each category.

• A relative frequency distribution summarizes data by category and the relative frequency or the proportion of observations in each category.

You should be able to create frequency distributions with and without the use of technology.


Example 1: Olympic Hockey

The table shows the men's ice hockey gold medal winners at the Olympic Games from 1920 (when hockey was part of the Summer Games) through 2014. Complete the table below to create a frequency and relative frequency distribution. Source: www.olympic.org

Year  Winner          Year  Winner          Year  Winner
1920  Canada          1960  U.S.A.          1992  Unified Team
1924  Canada          1964  Soviet Union    1994  Sweden
1928  Canada          1968  Soviet Union    1998  Czech Republic
1932  Canada          1972  Soviet Union    2002  Canada
1936  Great Britain   1976  Soviet Union    2006  Sweden
1948  Canada          1980  U.S.A.          2010  Canada
1952  Canada          1984  Soviet Union    2014  Canada
1956  Soviet Union    1988  Soviet Union

Frequency Distribution and Relative Frequency Distribution

Winner          Frequency   Relative Frequency
Canada          9           9/23
Soviet Union    7           7/23
U.S.A.          2           2/23
Great Britain   1           1/23
Sweden          2           2/23
Czech Republic  1           1/23
Unified Team    1           1/23
TOTAL           23          23/23

With large data sets, it is useful to be able to create a frequency and relative frequency distribution using technology. Instructions for creating these using Excel and Minitab may be found in the Chapter 3 Technology Guide.
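As one option alongside Excel and Minitab, here is a minimal Python sketch (standard library only) that builds the frequency and relative frequency distribution for the hockey winners in Example 1. The list `winners` is typed in from the table above, and the function name `frequency_table` is hypothetical, not part of any course software.

```python
from collections import Counter
from fractions import Fraction

# Gold medal winners from the Example 1 table, in year order (1920-2014).
winners = [
    "Canada", "Canada", "Canada", "Canada", "Great Britain",         # 1920-1936
    "Canada", "Canada", "Soviet Union", "U.S.A.",                    # 1948-1960
    "Soviet Union", "Soviet Union", "Soviet Union", "Soviet Union",  # 1964-1976
    "U.S.A.", "Soviet Union", "Soviet Union", "Unified Team",        # 1980-1992
    "Sweden", "Czech Republic", "Canada", "Sweden",                  # 1994-2006
    "Canada", "Canada",                                              # 2010-2014
]


def frequency_table(values):
    """Return (category, frequency, relative frequency) rows, most frequent first."""
    n = len(values)
    counts = Counter(values)
    # Fraction keeps relative frequencies exact, e.g. 9/23 instead of 0.391...
    return [(category, f, Fraction(f, n)) for category, f in counts.most_common()]


for category, f, rel in frequency_table(winners):
    print(f"{category:<15} {f:>3}   {rel}")
print(f"{'TOTAL':<15} {len(winners):>3}   {len(winners)}/{len(winners)}")
```

Categories with equal counts (for example U.S.A. and Sweden, 2 each) are listed in the order they first appear in the data, so the row order can differ from the table in these notes while the counts agree.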

Qualitative Data – Bar Charts and Pie Charts

A bar chart displays the categories and a bar representing the frequency or relative frequency of each category.

Bar Chart with Frequencies

Bar Chart with Relative Frequencies


Pie Chart

A pie chart displays the relative frequency of each category as a slice of a circle; the share of the circle taken up by each slice equals that category's relative frequency.
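Since each slice's share of the circle equals its relative frequency, the slice's central angle is its relative frequency times 360 degrees. A short Python sketch using the hockey frequencies from Example 1 (the helper name `pie_angles` is hypothetical):

```python
from fractions import Fraction

# Frequencies from the Olympic hockey table in Example 1 (n = 23).
frequencies = {
    "Canada": 9, "Soviet Union": 7, "U.S.A.": 2, "Great Britain": 1,
    "Sweden": 2, "Czech Republic": 1, "Unified Team": 1,
}


def pie_angles(freqs):
    """Central angle (degrees) of each slice = relative frequency x 360."""
    n = sum(freqs.values())
    return {cat: Fraction(f, n) * 360 for cat, f in freqs.items()}


angles = pie_angles(frequencies)
for cat, angle in angles.items():
    print(f"{cat:<15} {float(angle):6.1f} degrees")
print("total:", sum(angles.values()))  # the slices always fill the whole circle
```

For example, Canada's slice is 9/23 of 360, about 140.9 degrees, and the angles for all seven categories add up to exactly 360.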


Qualitative Data - Stacked Bar Charts

While a bar chart represents the frequencies of categories of one variable, a stacked (segmented) bar chart represents the frequencies of combinations of categories of two variables.

Example 2: Interpreting a Stacked Bar Graph

The following stacked bar graph represents the medal count results for the top ten countries (by total medal count) in the 2014 Winter Olympics in Sochi.

Use the graph above to answer the following.

a. How many bronze medals did Canada earn? 5

b. What proportion of medals earned by Canada were bronze? 5/25

c. Which country earned the most bronze medals? U.S.A.


Example 3:

Consider the sales performance of the following salespersons.

Salesperson   Total Sales (Thousands of Dollars)
Susan         187
William       201
Beth          207
Rob           193

The first bar chart more accurately depicts the roughly 10% difference in sales between the top salesperson (Beth, 207) and the bottom one (Susan, 187).
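The "roughly 10%" figure, and the reason a vertical axis that does not start at zero can mislead, can both be checked with a little arithmetic. In this Python sketch the axis baseline of 180 is a hypothetical value chosen for illustration; the charts in these notes may use a different one, and `drawn_ratio` is a hypothetical helper name.

```python
# Total sales in thousands of dollars, from the Example 3 table.
sales = {"Susan": 187, "William": 201, "Beth": 207, "Rob": 193}

high, low = max(sales.values()), min(sales.values())
pct_diff = (high - low) / low * 100
print(f"Beth vs. Susan: {pct_diff:.1f}% more sales")  # the actual relative difference


def drawn_ratio(high, low, baseline):
    """How many times taller the high bar looks when the axis starts at `baseline`."""
    return (high - baseline) / (low - baseline)


print(f"axis from 0:   {drawn_ratio(high, low, 0):.2f}x taller")
print(f"axis from 180: {drawn_ratio(high, low, 180):.2f}x taller")  # hypothetical baseline
```

With the axis starting at 0, Beth's bar is about 1.11 times as tall as Susan's, matching the roughly 10% difference; starting the axis at 180 makes it look almost 4 times as tall.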


Using Excel, create a stacked bar graph with the following data representing the overall medal counts from the 2016 Rio Olympics.

COUNTRY  GOLD  SILVER  BRONZE  TOTAL
USA        46      37      38    121
GBR        27      23      17     67
CHN        26      18      26     70
RUS        19      18      19     56
GER        17      10      15     42
JPN        12       8      21     41
FRA        10      18      14     42
KOR         9       3       9     21
ITA         8      12       8     28
AUS         8      11      10     29


[Figure: stacked bar chart "2016 Rio Olympics" with one bar per country (USA, GBR, CHN, RUS, GER, JPN, FRA, KOR, ITA, AUS), segments for GOLD, SILVER, and BRONZE, and a vertical axis from 0 to 140]
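Each bar in a stacked bar chart is built by placing one segment on top of the previous one: silver starts where gold ends, and bronze starts where silver ends, so the top of the bar equals the TOTAL column. This Python sketch computes those segment boundaries for the Rio data and checks them against the totals; Excel does this stacking automatically. The function name `stack_segments` is hypothetical.

```python
# 2016 Rio medal counts from the table above: (gold, silver, bronze, total).
medals = {
    "USA": (46, 37, 38, 121), "GBR": (27, 23, 17, 67), "CHN": (26, 18, 26, 70),
    "RUS": (19, 18, 19, 56), "GER": (17, 10, 15, 42), "JPN": (12, 8, 21, 41),
    "FRA": (10, 18, 14, 42), "KOR": (9, 3, 9, 21), "ITA": (8, 12, 8, 28),
    "AUS": (8, 11, 10, 29),
}


def stack_segments(counts):
    """Return (bottom, top) for each segment, stacking the counts in order."""
    segments, bottom = [], 0
    for c in counts:
        segments.append((bottom, bottom + c))
        bottom += c
    return segments


for country, (gold, silver, bronze, total) in medals.items():
    segs = stack_segments([gold, silver, bronze])
    # The top of the last segment must reach the country's TOTAL.
    assert segs[-1][1] == total, f"{country}: segments do not reach TOTAL"
    print(country, dict(zip(["GOLD", "SILVER", "BRONZE"], segs)))
```

For the USA, for example, the segments run from 0 to 46 (gold), 46 to 83 (silver), and 83 to 121 (bronze).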


Quantitative Data - Frequency and Relative Frequency Distributions

When data are qualitative, selecting the categories for display is relatively easy. When the data are quantitative, the choice is less obvious. When constructing a frequency or relative frequency distribution, consider the range of values in the data set along with the number of categories (bins) desired, and make sure that the number of bins and the bin width appropriately represent the data set.

Example 1: Poverty Level by State

The following table gives the percent of people living below the poverty level for each state and the District of Columbia in the year 2015.

Percent of People Living Below the Poverty Level by State

18.5  10.5  15.1  19.6  22.0  10.8  16.1  16.7  17.9
10.3  12.4  13.6  13.4  14.8  20.4  15.4  15.9  12.1
17.4  17.3  14.5   9.7  14.6  15.4  13.2  11.3  11.1
19.1  15.7  12.2  11.5  12.6  16.4  13.9  10.2
15.3  17.0  13.0  15.8  14.7  11.0  16.6  11.2
11.5  10.6  18.5  10.2   8.2  14.8  13.7  12.2

Source: 2015 American Community Survey

The data range from 8.2% to 22.0%, so a bin width of 2 percentage points seems reasonable.

We may summarize these data using a frequency distribution, a relative frequency distribution, and a cumulative relative frequency distribution.

Percent Below    Frequency            Relative    Cumulative Relative
Poverty Level    (Number of States)   Frequency   Frequency
[8.2, 10.2]      4                    4/51        4/51
(10.2, 12.2]     13                   13/51       17/51
(12.2, 14.2]     8                    8/51        25/51
(14.2, 16.2]     13                   13/51       38/51
(16.2, 18.2]     7                    7/51        45/51
(18.2, 20.2]     4                    4/51        49/51
(20.2, 22.2]     2                    2/51        51/51
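The counts in this table can be reproduced with a short Python sketch. It assumes the bins exactly as shown: the first bin [8.2, 10.2] includes both endpoints, and every later bin includes only its right endpoint. The bin edges are written out as literals so that data values such as 10.2 compare exactly equal to an edge; the helper `bin_index` is hypothetical.

```python
from itertools import accumulate

# 2015 poverty rates (%) for the 50 states and DC, from the table above.
rates = [
    18.5, 10.5, 15.1, 19.6, 22.0, 10.8, 16.1, 16.7, 17.9,
    10.3, 12.4, 13.6, 13.4, 14.8, 20.4, 15.4, 15.9, 12.1,
    17.4, 17.3, 14.5, 9.7, 14.6, 15.4, 13.2, 11.3, 11.1,
    19.1, 15.7, 12.2, 11.5, 12.6, 16.4, 13.9, 10.2,
    15.3, 17.0, 13.0, 15.8, 14.7, 11.0, 16.6, 11.2,
    11.5, 10.6, 18.5, 10.2, 8.2, 14.8, 13.7, 12.2,
]
edges = [8.2, 10.2, 12.2, 14.2, 16.2, 18.2, 20.2, 22.2]  # bin width 2


def bin_index(x, edges):
    """Index of the bin holding x: [e0, e1], then (e1, e2], (e2, e3], ..."""
    if x == edges[0]:
        return 0
    for i in range(1, len(edges)):
        if edges[i - 1] < x <= edges[i]:
            return i - 1
    raise ValueError(f"{x} lies outside [{edges[0]}, {edges[-1]}]")


n = len(rates)
freq = [0] * (len(edges) - 1)
for x in rates:
    freq[bin_index(x, edges)] += 1
cum = list(accumulate(freq))  # running totals of the frequencies

for i, (f, c) in enumerate(zip(freq, cum)):
    left = "[" if i == 0 else "("
    label = f"{left}{edges[i]}, {edges[i + 1]}]"
    print(f"{label:<13} {f:>2}   {f}/{n}   {c}/{n}")
```

Software packages do not all use this convention; many histogram tools include the left endpoint of each bin instead, which would move values like 10.2 and 12.2 into a different bin.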

Example 2: Using the Cumulative Relative Frequency Distribution

Use the table that you created in Example 1 to answer the following.

A. What proportion of states have less than or equal to 16.2% of the population living below the poverty level? 38/51 ≈ 0.745

B. What proportion of states have more than 14.2% of the population living below the poverty level? 1 - 25/51 = 26/51 ≈ 0.51
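Both answers come straight from the cumulative column: "less than or equal to 16.2%" is the cumulative relative frequency through the bin ending at 16.2, and "more than 14.2%" is the complement of the cumulative relative frequency through the bin ending at 14.2. A Python sketch using only the bin frequencies from the Example 1 table (the helper `cumulative_at` is hypothetical):

```python
from fractions import Fraction
from itertools import accumulate

# Right endpoints and frequencies of the bins in the Example 1 table (n = 51).
right_edges = [10.2, 12.2, 14.2, 16.2, 18.2, 20.2, 22.2]
freq = [4, 13, 8, 13, 7, 4, 2]
n = sum(freq)
cum_rel = dict(zip(right_edges, (Fraction(c, n) for c in accumulate(freq))))


def cumulative_at(edge):
    """Proportion of observations less than or equal to a bin's right endpoint."""
    return cum_rel[edge]


a = cumulative_at(16.2)       # A: at most 16.2%
b = 1 - cumulative_at(14.2)   # B: more than 14.2% (complement rule)
print(f"A: {a} = {float(a):.3f}")
print(f"B: {b} = {float(b):.3f}")
```

Note that 38/51 ≈ 0.745, which rounds to 0.75 at two decimal places.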

The frequency tables that you created in the previous example may be displayed graphically using a histogram.

A histogram is a graph that uses adjacent bars to portray the frequencies or relative frequencies of the classes (bins) of a quantitative variable. It is a useful graphical tool for large quantitative data sets.
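To see how a frequency table becomes a histogram, here is a plain-text Python sketch that draws the poverty-rate frequency distribution from Example 1 as a sideways histogram, one `#` per state. The function name `text_histogram` is hypothetical; a plotting program such as Excel or Minitab would draw vertical bars instead.

```python
# Bins and frequencies from the poverty-level frequency table (Example 1).
bins = ["[8.2, 10.2]", "(10.2, 12.2]", "(12.2, 14.2]", "(14.2, 16.2]",
        "(16.2, 18.2]", "(18.2, 20.2]", "(20.2, 22.2]"]
freq = [4, 13, 8, 13, 7, 4, 2]


def text_histogram(labels, counts, symbol="#"):
    """One row per bin; the bar length equals the bin's frequency."""
    width = max(len(label) for label in labels)
    return [f"{label:>{width}} | {symbol * c} ({c})" for label, c in zip(labels, counts)]


for row in text_histogram(bins, freq):
    print(row)
```

Using relative frequencies instead would change only the scale of the bars, not their shape.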


Quantitative Data – Stem and Leaf Diagrams

Another useful tool for displaying data is a stem and leaf diagram. A stem and leaf diagram is similar to a histogram; however, unlike a histogram, it keeps the values of the original data set visible in the display.

A stem and leaf diagram may be constructed using the following steps:

Step 1: Sort the data from low to high

Step 2: Split the values into stem and leaf:

leaf = units digit; stem = all digits to the left of the units digit

For example, for the value 112, the stem is 11 and the leaf is 2.

Step 3: List the stems from lowest to highest

Step 4: Order the leaves from lowest to highest and place next to each stem.

Example 3: Days to Payment


The following data represent the number of days required to collect insurance payments for a random sample of customers of a local dentist.

Number of Days to Collect Payment

34 55 36 39 36

32 35 30 47 31

60 66 48 43 33

24 37 38 65 35

22 45 33 29 41

38 35 28 56 56

Create a stem and leaf display for these data.
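The four steps above translate almost line for line into code. This Python sketch (the function name `stem_and_leaf` is hypothetical) handles non-negative whole numbers, uses the units digit as the leaf as in Step 2, and, as is common practice, also lists any stem that has no leaves so that gaps stay visible. Applied to the days-to-payment data:

```python
from collections import defaultdict

# Days to collect payment, from the Example 3 table.
days = [34, 55, 36, 39, 36, 32, 35, 30, 47, 31, 60, 66, 48, 43, 33,
        24, 37, 38, 65, 35, 22, 45, 33, 29, 41, 38, 35, 28, 56, 56]


def stem_and_leaf(values):
    """Stem-and-leaf lines for non-negative whole numbers (leaf = units digit)."""
    leaves = defaultdict(list)
    # Steps 1, 2 and 4: sort the data, split each value, and collect the leaves;
    # because the data are sorted first, each stem's leaves arrive in order.
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 112 -> stem 11, leaf 2
        leaves[stem].append(leaf)
    width = len(str(max(leaves)))
    lines = []
    # Step 3: list every stem from lowest to highest (empty stems keep gaps visible).
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>{width}} | {row}".rstrip())
    return lines


for line in stem_and_leaf(days):
    print(line)
```

Running it prints the following display (Key: 2 | 2 = 22 days):

2 | 2 4 8 9
3 | 0 1 2 3 3 4 5 5 5 6 6 7 8 8 9
4 | 1 3 5 7 8
5 | 5 6 6
6 | 0 5 6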


Quantitative Data – Dot Plots

A dot plot shows one dot for each observation, placed above that observation's value on a number line. Like the stem and leaf plot, it portrays each individual observation (you can recreate the data set from the plot). The following shows a dot plot for a distribution of test scores.

[Figure: Dotplot of Test Scores, horizontal axis "Test Scores" from 65 to 100]
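Because a dot plot keeps one dot per observation, the original data can be read back off the plot. The Python sketch below draws a sideways text dot plot (one row per value on the number line, one `o` per observation) and then rebuilds the data set using only the plot rows. The test scores behind the dot plot above are not listed in these notes, so the sketch reuses the days-to-payment data from Example 3; the function names are hypothetical.

```python
from collections import Counter

# Days to collect payment, reused from Example 3 (the test scores are not listed).
days = [34, 55, 36, 39, 36, 32, 35, 30, 47, 31, 60, 66, 48, 43, 33,
        24, 37, 38, 65, 35, 22, 45, 33, 29, 41, 38, 35, 28, 56, 56]


def text_dot_plot(values):
    """Sideways dot plot: one row per value on the number line, one 'o' per observation."""
    counts = Counter(values)
    return [f"{x:>3} | {'o' * counts[x]}".rstrip()
            for x in range(min(values), max(values) + 1)]


def read_back(rows):
    """Rebuild the data set from the plot alone by counting the dots on each row."""
    data = []
    for row in rows:
        value, dots = row.split("|")
        data.extend([int(value)] * dots.count("o"))
    return data


rows = text_dot_plot(days)
print("\n".join(rows))
print(read_back(rows) == sorted(days))  # True: every observation survives in the plot
```

A histogram could not pass this check, because it keeps only the count in each bin, not the individual values.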

The Shape of a Distribution

•The shape of a distribution is described by mentioning any symmetry or skewness, the number of peaks, any clusters or gaps, and any unusually high or low observations, called outliers.

•In describing the shape of a distribution, concentrate on the main features.

─Look for rough symmetry or clear skewness.

─Look for major peaks or gaps, not just for minor ups and downs in the bars of the histogram.

─Look for clear outliers, not just the smallest and largest observations.

•In determining the shape of a distribution, it is often helpful to outline a graph with a smooth curve.


•Symmetry vs. skewness:

─A distribution is symmetric if the right and left sides of the graph are approximate mirror images.

─A distribution is skewed left if the left side of the graph extends much further out than the right.

─A distribution is skewed right if the right side of the graph extends much further out than the left.


Describing The Shape of a Distribution

A way to remember how to describe a distribution

•Center (mean, median)

•Unusual (any outliers)

•Shape (symmetric, skewed, uniform, bimodal; be careful before calling a distribution symmetric!)

•Spread (range, interquartile range, standard deviation)


Example 4

Describe the shape of the distribution of fat content in McDonald’s breakfast menu items, based on the stem-and-leaf plot. Key: 1|3 = 13 grams.

Skewed right with a gap in the 40s and one unusually high observation of 56 g.

Comparing Two Distributions

•When you are asked to compare two distributions, you need to discuss the similarities and/or differences in their shapes, their centers, and their spreads.

─Provide a measure of center for each distribution, and discuss which one is larger/smaller. (Chapter 4)

─Provide a measure of spread for each distribution, and discuss which is larger/smaller. (Chapter 4)

Example 5

These dot plots represent pet ownership in two different city blocks. Write a sentence to compare their shapes.

It appears that block A is right-skewed while block B is more symmetric and mound-shaped.


Chapter 3 Practice

1. Obesity, high blood pressure, high cholesterol, and heart disease are partially caused by a poor diet. As such, the FDA requires nutrition labels on most packaged foods. Below is a dot plot of the amount of sugar contained in a single serving of 12 popular breakfast cereals.

Describe the shape of the distribution.

The distribution is approximately symmetric and bimodal with a gap from 5 to 10 grams of sugar.

2. The following histogram represents data for the percentage of people without health insurance for the 50 states in 2013.  How would you describe the shape of the distribution of uninsured rates for the 50 states from the histogram below?


Approximately symmetric & unimodal

3. Suppose we survey everyone in STAT 3090 and ask them what month they were born in (January, February, etc.). Which of the following graphs would be most appropriate for displaying the results?

Bar Graph or Histogram

Bar Graph (birth month is a qualitative variable)

4. What is the difference between a frequency distribution and a relative frequency distribution?

A frequency distribution summarizes data by category (class) and the number of observations of each category whereas a relative frequency distribution summarizes data by category using the proportion of observations in each category.

5. A survey asked students to report the web browser they primarily use. The relative frequency bar graph below summarizes the 639 student responses.  How many more respondents primarily use Chrome than primarily use Firefox?


About 345
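The answer comes from converting relative frequencies back to counts: a category's count equals the number of respondents times its relative frequency (RF). With n = 639 and the two RF values read from the bar graph, which is not reproduced in these notes, the calculation is:

```latex
\text{Chrome count} - \text{Firefox count}
  = 639\,\mathrm{RF}_{\text{Chrome}} - 639\,\mathrm{RF}_{\text{Firefox}}
  = 639\left(\mathrm{RF}_{\text{Chrome}} - \mathrm{RF}_{\text{Firefox}}\right)
  \approx 345
```

So the two bars differ in height by about 345/639 ≈ 0.54.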
