Chapter 2

Describing Data Graphically

2.1 Describing Qualitative Data Graphically

One of the things we are going to want to do is to summarize a data set. We begin with describing qualitative data.

Consider the following situation: you have asked several registered voters whether or not they support the reelection of the incumbent congressional representative. You have coded the data as: F for in Favor, O for Opposed, U for Undecided, and N for No reply.

F O U F N F F O U U N N O O F F F O U O F O O F U
O O F U N O O F F O U U N O F F F F O F O O U U N
F O O U U N N O O N U U N N O F F O O O N U U O F

At a glance it is impossible to know how the incumbent stands: does the data support a reelection or not? If we summarize the data the answer should be obvious. We will start with a Frequency Distribution.

A Frequency Distribution is a listing of all the different outcomes along with the frequencies (counts).

Before we proceed with construction of our frequency distribution, we need to remember that our goal here is to summarize the data and make the data usable for the reader. Since this is the case, we want to make sure that any display we produce can stand on its own. In other words, the reader shouldn’t have to sort through the reading to figure out what the display is all about. We proceed.

For the data above, we have only four different possibilities: F, O, U, and N. We need to count the number of each reply. There are 21 F’s, 26 O’s, 16 U’s, and 12 N’s.

How are you Planning on Voting in the Congressional Race?

Response                 Number of Voters
For the Incumbent        21
Against the Incumbent    26
Undecided                16
Non-responsive           12
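The tally in the table can also be produced mechanically. The following is a minimal Python sketch, assuming the coded responses are typed in as a whitespace-separated string; the names `responses`, `labels`, and `counts` are illustrative only and do not come from the text.

```python
from collections import Counter

# Coded survey responses from the text: F = in Favor, O = Opposed,
# U = Undecided, N = No reply.
responses = """
F O U F N F F O U U N N O O F F F O U O F O O F U
O O F U N O O F F O U U N O F F F F O F O O U U N
F O O U U N N O O N U U N N O F F O O O N U U O F
""".split()

# Hypothetical mapping from each code to the row label used in the table.
labels = {
    "F": "For the Incumbent",
    "O": "Against the Incumbent",
    "U": "Undecided",
    "N": "Non-responsive",
}

# Counter tallies how many times each code appears.
counts = Counter(responses)

print("How are you Planning on Voting in the Congressional Race?")
for code, label in labels.items():
    print(f"{label:<24}{counts[code]:>3}")
print(f"n = {len(responses)}")
```

Counting by machine is mostly a guard against miscounting by hand; the display still needs its title and labels to stand on its own.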

We can now see that the representative doesn’t have the majority required to ensure reelection if this sample is representative of the population.¹

We have two variations on the frequency distribution: the percentage distribution and the relative frequency distribution. (The relative frequency is the decimal form of the percent.)

These two are simply the frequency distribution with the frequencies replaced with the percentages or the relative frequencies, whichever is appropriate.

¹ It probably isn’t. The sample size is very small.

Example 2.1.1.

Using the information from above, construct a percentage distribution and a relative frequency distribution.

Solution.

If we add the frequencies from the table above we will see that we have a total of 75 voters. To get the percentages we divide the values by the total and then multiply by 100%.

For the first category we have

$$\frac{21}{75} \times 100\% = 28.0\%$$

For the other three we have 34.7%, 21.3%, and 16.0%. Our percentage distribution is

How are you Planning on Voting in the Congressional Race?

Response                 Percent of Voters
For the Incumbent        28.0%
Against the Incumbent    34.7%
Undecided                21.3%
Non-responsive           16.0%

n = 75

Our relative frequency distribution is

How are you Planning on Voting in the Congressional Race?

Response                 Relative Frequency of Voters
For the Incumbent        .280
Against the Incumbent    .347
Undecided                .213
Non-responsive           .160

n = 75
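As a check on the arithmetic, both distributions can be computed from the frequencies. This is a small Python sketch rather than part of the original solution; the dictionary `counts` and the derived names are assumptions made for illustration.

```python
# Frequencies from the frequency distribution of voter responses.
counts = {
    "For the Incumbent": 21,
    "Against the Incumbent": 26,
    "Undecided": 16,
    "Non-responsive": 12,
}

n = sum(counts.values())  # sample size

# Relative frequency = count / n; percentage = relative frequency * 100.
relative = {response: count / n for response, count in counts.items()}
percentages = {response: round(100 * rel, 1) for response, rel in relative.items()}

for response in counts:
    print(f"{response:<24}{relative[response]:>6.3f}{percentages[response]:>7.1f}%")
print(f"n = {n}")
```

Keeping `n` next to the output mirrors the advice in the text: a percentage means little without the sample size behind it.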

We have also added the total. In looking at a distribution such as this, we should include the sample size, n. This gives us some sense of how significant the percentages are. If we noted that 75% of people asked were in favor we might think ‘landslide’, but if this is based on only four people asked, we are not as impressed as if a thousand² were asked. Instead of including n, some polls include the margin of error. That will be covered in a later chapter. Further note that the two distributions here are essentially the same. You can go back and forth from one to the other by moving the decimal point.

We can see that the incumbent should either get to work on getting some votes or start looking for a new job.

Once we have our data summarized in a distribution, we would like to get a ‘picture’ of the data. We will discuss two here: the pie chart and the bar chart.

² Pretty close to what a lot of polls are.

2.1.1 Pie Charts

In a pie chart, we take a ‘pie’ and slice it into pieces that correspond to the different categories, with the size of each wedge proportional to that category’s share of the sample (or population). In the case above, the F category had 28% of the sample so it gets 28% of the pie, etc.

Example 2.1.2.

Draw a pie chart using the information above.

Solution.

We need to determine how many degrees each piece gets. For F, we need 28% of 360°.

28% of 360° = 100.8°

34.7% of 360° = 124.9°

21.3% of 360° = 76.7°

16% of 360° = 57.6°
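The same conversion from percentages to degrees can be sketched in Python. The names `percentages` and `angles` are illustrative, and the rounding to one decimal place is an assumption chosen to match the values shown.

```python
# Percentages from the percentage distribution of voter responses.
percentages = {
    "For the Incumbent": 28.0,
    "Against the Incumbent": 34.7,
    "Undecided": 21.3,
    "Non-responsive": 16.0,
}

# Each wedge gets the same share of the 360 degrees in a circle
# that its category has of the sample.
angles = {
    response: round(percent / 100 * 360, 1)
    for response, percent in percentages.items()
}

for response, angle in angles.items():
    print(f"{response:<24}{angle:>6.1f} degrees")
print(f"Total: {sum(angles.values()):.1f} degrees")
```

Summing the angles is a quick sanity check: they should come out to 360° up to rounding.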

Where we begin with our slices doesn’t matter. By measuring from the right side of a circle we get the following.

[Pie chart: How are you Planning on Voting in the Congressional Race? Wedges: For the Incumbent 28%, Against the Incumbent 34.7%, Undecided 21.3%, Non-responsive 16%.]

Notice in the graph we have included labels, a title, and the percentages. We want the graph to be understandable without looking at the original information given.

2.1.2 Bar Charts

An alternative graphical display would be a bar chart. In a bar chart we draw bars whose height is determined by the frequency (or percentage, or relative frequency).

Example 2.1.3.

Use the voter data to draw a bar chart using the frequencies.

Solution.

The graph is included here. Note that the axes are clearly labeled, the graph has a title, and the scale on the vertical axis is linear. The bars are labeled and the vertical axis starts at 0. Also note that the widths of the bars are all the same. If the widths are not the same, your brain is confused about whether it is looking at the heights of the bars or the areas. If the widths are equal, your brain doesn’t need to worry about that.

Further, if we had used the relative frequencies or the percentages, our graph would still have the same shape. The only difference would be the scale on the vertical axis.

[Bar chart: Voters in Favor of Reelecting the Incumbent. Horizontal axis: Voter Preference (For, Against, Undecided, No response). Vertical axis: Number of Voters, 0 to 30.]

A variation of the bar chart is the Pareto chart. It is a bar chart with the bars going from tallest to shortest. One advantage of the Pareto chart is that it removes the arbitrary nature of picking which categories go first, second, etc.
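The ordering step for a Pareto chart amounts to sorting the categories by frequency, largest first. The sketch below is illustrative only: the names are assumptions, and the row of `#` characters is a rough text stand-in for the bars, not a replacement for a properly labeled graph.

```python
# Frequencies from the voter data, keyed by the bar labels.
counts = {"For": 21, "Against": 26, "Undecided": 16, "No response": 12}

# Sort categories from the largest frequency to the smallest.
pareto_order = sorted(counts.items(), key=lambda item: item[1], reverse=True)

print("Voters in Favor of Reelecting the Incumbent")
for label, count in pareto_order:
    # One '#' per voter gives a crude horizontal bar.
    print(f"{label:<12}{'#' * count} {count}")
```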

[Pareto chart: Voters in Favor of Reelecting the Incumbent. Horizontal axis: Voter Preference (Against, For, Undecided, No response). Vertical axis: Number of Voters, 0 to 30.]

2.1.3 Exercises

1. According to Beer Marketer’s Insights, 2019, in 2018 the United States beer market had Anheuser-Busch owning 40.8% of the market, MillerCoors had 23.5%, Constellation had 9.9%, Heineken USA had 3.5%, Pabst Brewing had 2.1%, and the remaining 20% went to other domestics and imports.

(a) Summarize the information in a percentage distribution.

(b) Draw a pie chart.

(c) Draw a Pareto chart.

2. Madison Bumgarner, an MLB pitcher, has four types of pitches (with percent thrown): a changeup (7.7%), curveball (22.8%), two-seamer (34.3%), and cutter (35.2%).

(a) Summarize the information in a percentage distribution.

(b) Draw a pie chart.

(c) Draw a bar chart.

3. On the game show Family Feud, 99 adults were asked to ‘Name something people try to kill by using poison’. 68 said Rats/Mice, 13 said Bugs/Ants, 11 said Weeds, and 7 said a Cheating Mate.

(a) Summarize the information in a percentage distribution.

(b) Draw a pie chart.

(c) Draw a bar chart using frequencies.

4. According to cbpp.org, 15% of your tax dollars go to Defense, 24% to Social Security, 16% to Medicare/Medicaid, 9% to Safety net programs, and 7% to Interest on debt. The rest goes to Others.

(a) Summarize the information in a percentage distribution.

(b) Draw a pie chart.

(c) Draw a Pareto chart.

5. According to the International Bartenders Association, a Mai Tai is a drink which contains 40 ml light rum, 20 ml dark rum, 15 ml orange curacao, 15 ml orgeat syrup, and 10 ml lime juice.

(a) Summarize the information in a percentage distribution.

(b) Draw a pie chart.

(c) Draw a Pareto chart.

6. According to the UCS Satellite Database, ucsusa.org, there are 2062 operating satellites. Of those, 901 are from the United States, 153 are from Russia, 299 are from China, and 709 are from other countries.

(a) Summarize the information in a percentage distribution.

(b) Draw a pie chart.

(c) Draw a bar chart using frequencies.

7. Of the 901 United States satellites orbiting the Earth, 38 are civil, 523 are commercial, 164 are government, and 176 are military.

(a) Summarize the information in a percentage distribution.

(b) Draw a pie chart.

(c) Draw a bar chart using frequencies.

8. A May 2019 Pew poll asked teens what their favorite online platform was. 32% said YouTube, 15% said Instagram, 35% said Snapchat, and 10% said Facebook. Group the remaining as ‘Other’.

(a) Summarize the information in a percentage distribution.

(b) Draw a pie chart.

(c) Draw a Pareto chart.

9. In the 2017-18 school year, Cabrillo College transferred 252 students to the University of California. 124 of those transfers went to the Santa Cruz campus, 28 to Santa Barbara, 17 to San Diego, 16 to Los Angeles, 29 to Davis, 30 to Berkeley, and the remainder to the other campuses.

(a) Summarize the information in a percentage distribution.

(b) Draw a pie chart.

(c) Draw a Pareto chart using frequencies.

10. The US military has 5 branches: the Army, Marine Corps, Navy, Air Force, and Coast Guard. In February 2018 the breakdown of the military was 471,513 in the Army, 184,427 in the Marine Corps, 325,802 in the Navy, 323,222 in the Air Force, and 42,042 in the Coast Guard.

(a) Summarize the information in a percentage distribution.

(b) Draw a pie chart.

(c) Draw a bar chart using percentages.

11. According to the 2017 Annual Report of the California DUI Management Information System, in 2015 there were 130468 driver’s licenses suspended or revoked in California due to alcohol use. There were 9074 .01 Zero-tolerance suspensions, 86933 First-offender suspensions, 31093 Repeat-offender suspensions, and 3368 Repeat-offender revocations.

(a) Summarize the information in a frequency distribution.

(b) Draw a pie chart.

(c) Draw a bar chart using the percentages.

12. According to Wikipedia, in the Gulf War (August 1990 to February 1991) there were 146 US troops killed, 92 Senegal troops killed, 47 United Kingdom troops killed, 24 Saudi Arabia troops killed, and 32 troops from other countries killed.

(a) Summarize the information in a frequency distribution.

(b) Draw a pie chart.

(c) Draw a bar chart using frequencies.

13. nps.gov reports that in the American Civil War, there were 110100 Union soldiers killed in battle, 224580 died due to diseases, 275174 were wounded in action, and 30192 died while prisoners of war.

(a) Summarize the information in a percentage distribution.

(b) Draw a pie chart.

(c) Draw a Pareto chart using frequencies.

14. nps.gov reports that in the American Civil War, there were 94000 Confederate soldiers killed in battle, 164000 died due to diseases, 194026 were wounded in action, and 31000 died while prisoners of war.

(a) Summarize the information in a percentage distribution.

(b) Draw a pie chart.

(c) Draw a Pareto chart using frequencies.

2.2 Describing Quantitative Data Graphically

Just like with qualitative data, we need to be able to present the data in a way that is useful to the reader. It needs to be organized. We will organize the data as we did before with a few more details that need to be ironed out.

On June 8th and June 13th, 2002, volunteers of the Pelagic Shark Research Foundation³ collected, among other fish, Myliobatis californica, or bat ray, and made measurements. One of the measurements was the total length. Several lengths, in cm, are given:

27 36 22 26 26 26 33 36 22 22 34 29 39 28 31 25 35 22 20 44

27 34 30 26 34 27 28 28 28 24 29 22 24 36 33 33 34 37 34 35

We would like to organize this. We would like to create a frequency distribution like we did with the election data when we discussed qualitative data. We need to make classes. With the qualitative data, the classes were obvious (count the ‘For’s’, etc.). For quantitative data, we need to create the classes. We will proceed by first deciding how many different classes to have. Generally speaking, we want 6 to 12 classes. The more data we have, the more classes we can have.

By scanning the data we notice that the minimum value is 20 and the maximum value is 39. If we think about putting the data on a number line, the data is 19 cm across (= 39 − 20). This is also called the range; more on the range later. Since we only have 40 data values, not a lot, we will shoot for about 6 classes.

$$\text{Class Width} = \frac{\text{range}}{\text{number of classes}} = \frac{19}{6} \approx 3.17$$

If we look at the data we will notice that the data values are all measured to the nearest whole number. We would like our classes to also be whole numbers. We will choose to round this to 3. Since 20 is the minimum value, we will start with 20. The first class will be 20 to 22. Since the data values are rounded off, this is really getting bat rays with a length of 19.5 to 22.5 cm. This is the width of 3 we want.
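The class construction just described can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the text (minimum 20, maximum 39, a target of about 6 classes, whole-number widths); the variable names are hypothetical.

```python
# Values quoted in the text for the bat ray lengths, in cm.
minimum, maximum = 20, 39
target_classes = 6  # "we will shoot for about 6 classes"

data_range = maximum - minimum            # 19
raw_width = data_range / target_classes   # about 3.17
width = round(raw_width)                  # rounded to a whole number

# Build class limits starting at the minimum until the maximum is covered.
classes = []
lower = minimum
while lower <= maximum:
    classes.append((lower, lower + width - 1))
    lower += width

print(f"range = {data_range}, raw width = {raw_width:.2f}, width used = {width}")
for low, high in classes:
    print(f"{low}-{high}")
```

Because the width was rounded down from 3.17 to 3, covering the whole range takes one more class than the target of 6. That is fine: the target is a guideline, not a requirement.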

Let us construct our table and count the data values.

Total Length of Bat Rays Collected at Elkhorn Slough, June 8 and 13, 2002

Total Length (cm)   Number of Bat Rays
20-22               6
23-25               3
26-28               11
29-31               4
32-34               8
35-37               6
38-40               2

³ pelagic.org/slough/index.html

2.2.1 Histograms and Polygons

Our next goal is to get a graph of the data. To do this we will draw a histogram. A histogram is similar to a bar chart. One big difference with histograms is that they represent quantitative data. As such, the horizontal axis will be numeric, whereas in a bar chart it is non-numeric. Since the lengths of the bat rays are continuous, we would like to see that in our graphs. To realize this, we will not have any gaps between the bars. Gaps would suggest discrete data.

Since the lengths for the bat rays were measured to the nearest cm, we have some apparent gaps in the table. For example we have 22 to 23 missing, 25 to 26, etc. They aren’t really missing. The lengths need to be measured to some digit. In this case the data is measured to the nearest cm, so if a bat ray is 25.8 cm long, it is classified as being 26 cm. With this in mind, we will use the values halfway across each gap (22.5, 25.5, etc.) as the edges of the bars when we create a graph.

[Histogram: Bat Ray Lengths, Elkhorn Slough, June 8, 13, 2002. Horizontal axis: Total Length, cm (20 to 42). Vertical axis: Number of Bat Rays (0 to 14).]

Notice that, like the qualitative graphs, we can see what information the graph is trying to convey without having to read the problem. It has a title, the axes are labeled, and the scales are linear. Further, the bars are touching and they start and end on the class boundaries, not the class limits. Along the horizontal axis, you will notice 2 small diagonal lines. These lines indicate a break in the scale. If we required the axis to start at 0, we would have a lot of ‘dead’ space.

So what can we get from the graph? It appears that there is a lot of up-down. If you look at the scale, the big jumps are only a few rays. To make a statement about the distribution of the lengths of the rays would not be very reliable. The graph is based on a sample of only 40 rays. If you are the one catching, tagging, measuring, and releasing the rays, 40 may seem like a lot. From a statistical standpoint, it isn’t very large. Finally, we used the frequencies along the vertical axis. We could just as easily have used percentages or relative frequencies.

Before we proceed with our next graphical display, some terminology is in order. For quantitative data we have class limits, class boundaries, class midpoints, and class widths. Additionally, we have frequencies, relative frequencies, and percents. The frequencies, relative frequencies, and percents are done exactly as is done with qualitative data.

Class limits are the minimum and maximum possible measured data values. Class boundaries are the minimum and maximum possible values the data can actually be. Midpoints are the middle possible value of the class. The class width is the distance between consecutive class midpoints.

In the frequency distribution above, the last class was 38-40. These are the class limits. The actual possible lengths of the fish for this class are between 37.5 and 40.5. These are the class boundaries. The class midpoint is the average of the class limits or boundaries. So the class midpoint is 39. The class width is the difference of the class boundaries. In this case it is 40.5 − 37.5 = 3, as we started with.

We can add all of this to our frequency distribution, if we like.

Class    Class        Class                  Relative
Limits   Boundaries   Midpoint   Frequency   Frequency   Percent
20-22    19.5-22.5    21         6           .150        15
23-25    22.5-25.5    24         3           .075        7.5
26-28    25.5-28.5    27         11          .275        27.5
29-31    28.5-31.5    30         4           .100        10
32-34    31.5-34.5    33         8           .200        20
35-37    34.5-37.5    36         6           .150        15
38-40    37.5-40.5    39         2           .050        5

This is awfully busy. We include it here for the sake of completeness. We use the boundaries on our histograms. In the histograms, and other graphical displays, we can use either the frequency, relative frequency, or percent. Furthermore, we will use the midpoints in the next graphical display, the polygon.
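For readers who want to verify the expanded table, the columns can be derived from the class limits and frequencies alone. The following Python sketch assumes, as the text does, that lengths are rounded to the nearest whole cm, so each boundary lies 0.5 beyond its limit; the names `classes` and `rows` are illustrative.

```python
# (lower limit, upper limit, frequency) for each class in the bat ray table.
classes = [(20, 22, 6), (23, 25, 3), (26, 28, 11), (29, 31, 4),
           (32, 34, 8), (35, 37, 6), (38, 40, 2)]

n = sum(freq for _, _, freq in classes)  # total number of bat rays

rows = []
for lower, upper, freq in classes:
    # Assumption: data rounded to the nearest cm, so boundaries sit
    # half a unit outside the class limits.
    lower_boundary = lower - 0.5
    upper_boundary = upper + 0.5
    midpoint = (lower + upper) / 2             # average of the class limits
    width = upper_boundary - lower_boundary    # difference of the boundaries
    relative_frequency = freq / n
    rows.append((lower_boundary, upper_boundary, midpoint, width,
                 relative_frequency))
    print(f"{lower}-{upper}  {lower_boundary}-{upper_boundary}  "
          f"{midpoint:g}  {freq:>2}  {relative_frequency:.3f}  "
          f"{100 * relative_frequency:g}")
```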

2.2.2 Frequency Polygons

Another graphical display we will discuss is the polygon. The polygon is made up of linear segments connecting the top center of each bar in a histogram, if it were there. We can either construct a histogram first and then ‘connect the dots’ or we can find the points, plot them, and then draw the line segments. The points we are looking for are of the form

(Midpoint of class, Number in the class)

In the following table, the midpoints have been added. Also, the classes before and after the classes with data have been included. We will need them for the graph.

Total Length   Midpoint   Number of Bat Rays
17-19          18         0
20-22          21         6
23-25          24         3
26-28          27         11
29-31          30         4
32-34          33         8
35-37          36         6
38-40          39         2
41-43          42         0

[Frequency polygon: Bat Ray Lengths, Elkhorn Slough, June 8, 13, 2002. Horizontal axis: Total Length, cm (18 to 42). Vertical axis: Number of Bat Rays (0 to 14).]

In the table above, we included the classes 17-19 and 41-43 even though they have no data values in the class. If we don’t include those classes, our polygon will be ‘floating’ above the horizontal axis. We didn’t have this issue with the histograms. If there were no data values in a class, there was no bar there.
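The list of points for the polygon, including the two empty classes that anchor it to the horizontal axis, can be assembled as follows. This is a sketch with illustrative names; it assumes equal class widths of 3, as in the table.

```python
# Midpoints and frequencies of the classes that contain data.
midpoints = [21, 24, 27, 30, 33, 36, 39]
frequencies = [6, 3, 11, 4, 8, 6, 2]
width = 3  # class width, assumed equal for every class

# Add an empty class one width before the first midpoint and one width
# after the last, so the polygon starts and ends on the horizontal axis.
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + width, 0)]
)

for midpoint, frequency in points:
    print(f"({midpoint}, {frequency})")
```

Plotting these points in order and joining consecutive ones with line segments gives the frequency polygon.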

2.2.3 Describing Distributions

In the graphs we looked at before (histograms and polygons), the histograms had a ‘stair-step’ look to them and the polygons had the ‘connect-the-dots’ look. Imagine that we had many more data values, so that we could use more classes. This would reduce the effects we note in the graphs. If we continue, we will get what looks like a curve.

Not only do we want to summarize distributions using a graph, we would also like to describe them verbally. The adage ‘a picture is worth a thousand words’ is true when we look at graphs.

We are not going to try to describe it in all detail. We have a few standard descriptors that we will employ.

In the bat ray example, we are noticing the up-down effect of having not many data values. The bat ray data has been extended to include all data from 2001 to 2007. The graph is given.

[Histogram: Bat Ray Lengths, Elkhorn Slough, 2001-2007. Horizontal axis: Total Length, cm (0 to 140). Vertical axis: Number of Bat Rays (0 to 120).]

The break on the horizontal axis is gone: the dead space isn’t as much of an issue, so we chose to remove it. Also, the scales have changed. We have many more bat rays (n = 564), so the vertical axis needs to accommodate the larger frequencies and the horizontal axis goes much further for some larger specimens that were captured. They are easy to overlook on the graph: it has a few small rectangles for the now non-empty classes.

Even though it would have been more convenient to not include the larger data values, this is poor statistical practice. Hence, they are included.

When we look at the graph, the graph seems to rise to a peak and then trail off more gradually than it rose. This is what is called ‘skewed right’. We also have a few bat rays that are long for a bat ray in the slough. These are what are called outliers. An outlier is a data value which is very large or very small for the data set. More on those later. The graphs below show the basic types of distributions. Not all distributions follow one of these: some defy description.

[Figure: sketches of the four basic shapes. Panels: Uniform Distribution, Bell Shaped Distribution, Skewed Right Distribution, Skewed Left Distribution.]

When we are describing our distributions, we cannot expect them to look exactly like the graphs above. The larger bat ray data set would be described as skewed right. The smaller data set we can’t really classify using one of the terms above.

2.2.4 Exercises

1. The speeds of vehicles on Highway 1 are measured and the speeds in mph are summarized below.

Speed (mph)   Number of vehicles
55-57         3
58-60         15
61-63         26
64-66         43
67-69         32
70-72         21
73-75         15
76-78         9
79-81         5
82-84         2

(a) For each class find the class boundaries, class midpoints, percentages, and relative fre-quencies.

(b) Using the percentages, draw a histogram.

(c) Using the frequencies, draw a polygon.

(d) Describe the histogram as skewed left, skewed right, bell shaped, or none of these.

2. At Mountain Middle School, the PE teacher has students run as part of their exercise regimen. The times to run a sprint are in the table.

Time (seconds)   Number of Students
8-11             8
12-15            46
16-19            34
20-23            12
24-27            2

(a) For each class find the class boundaries, class midpoints, percentages, and relative fre-quencies.

(b) Using the percentages, draw a histogram.

(c) Using the frequencies, draw a polygon.

(d) Describe the histogram as skewed left, skewed right, bell shaped, or none of these.

3. The 2018 Big Sur International Marathon race times for women are summarized below.⁴

Times (hr:min)   Number of Runners
3:00 - 3:30      45
3:30 - 4:00      184
4:00 - 4:30      330
4:30 - 5:00      457
5:00 - 5:30      316
5:30 - 6:00      282
6:00 - 6:30      67

(a) For each class find the class midpoints, percentages, and relative frequencies.

(b) Using the relative frequencies, draw a histogram.

(c) Using the frequencies, draw a polygon.

(d) Describe the histogram as skewed left, skewed right, bell shaped, or none of these.

4. The 2018 Big Sur International Marathon race times for men are summarized below.⁵

Times (hr:min)   Number of Runners
2:00 - 2:30      1
2:30 - 3:00      26
3:00 - 3:30      165
3:30 - 4:00      329
4:00 - 4:30      357
4:30 - 5:00      338
5:00 - 5:30      225
5:30 - 6:00      186
6:00 - 6:30      34

(a) For each class find the class midpoints, percentages, and relative frequencies.

(b) Using the relative frequencies, draw a histogram.

(c) Using the frequencies, draw a polygon.

(d) Describe the histogram as skewed left, skewed right, bell shaped, or none of these.

⁴ Several finished after 6:00 without official times and were included in the last class.
⁵ Several finished after 6:00 without official times and were included in the last class.

5. The population of each of the 50 states of the US is summarized in the table below. The data is from the 2000 Census. Draw a histogram and comment on the shape of the distribution.

Population                  Number of States
0 - 3,000,000               22
3,000,000 - 6,000,000       15
6,000,000 - 9,000,000       6
9,000,000 - 12,000,000      2
12,000,000 - 15,000,000     2
15,000,000 - 18,000,000     1
18,000,000 - 21,000,000     2
21,000,000 - 24,000,000     0
24,000,000 - 27,000,000     0
27,000,000 - 30,000,000     0
30,000,000 - 33,000,000     0
33,000,000 - 36,000,000     1

6. The area of each state in the Continental United States, in square miles, is summarized in the table.

Area (mi²)          Number of States
0 - 50,000          19
50,000 - 100,000    23
100,000 - 150,000   4
150,000 - 200,000   1
200,000 - 250,000   0
250,000 - 300,000   1

(a) Draw a frequency histogram. Using a ruler, make each bar in the histogram between half an inch and an inch wide.

(b) Describe the histogram as skewed left, skewed right, bell shaped, or none of these.

(c) Alaska has an area of 663,000 mi². How far off your paper would a bar need to be to accommodate Alaska if it were part of the data set?

7. According to the Roller Coaster Database, rcdb.com, in November 2019, the heights of operating roller coasters with reported heights in North America are given in the following table. FYI: The tallest is Kingda Ka in New Jersey at 456 feet. By comparison, the Giant Dipper in Santa Cruz is 70 feet tall.

Height (feet)   Number of Roller Coasters
1-50            191
51-100          174
101-150         127
151-200         34
201-250         25
251-300         0
301-350         4
351-400         0
401-450         2
451-500         1

(a) Draw a frequency histogram.

(b) Draw the frequency polygon for the data.

(c) Describe the histogram as skewed left, skewed right, bell shaped, or none of these.

8. According to an online Beatles blog, the lengths of Beatles songs are listed in the table below.

Length (seconds)   Number of Songs
1-60               4
61-120             24
121-180            134
181-240            34
241-300            9
301-360            2
361-420            1
421-480            2
481-540            1

(a) Draw a frequency histogram.

(b) Draw the frequency polygon for the data.

(c) Describe the histogram as skewed left, skewed right, bell shaped, or none of these.

9. The total annual rainfall in Watsonville for the years 1916 through 2015 is given in inches per year (a few years have missing reports).

20.29  9.77 35.74 20.01 25.14 28.25 22.94  8.71 23.76 18.37
24.28 14.92 14.46 16.45 11.02 24.50 15.12 12.46 19.87 21.04
26.38 25.97 14.71 24.14 36.59 24.80 20.82 18.55 21.63 18.66
13.28 17.84 15.76 19.30 23.62 27.78 19.38 14.26 17.96 26.63
14.40 29.82 15.51 17.07 10.53 18.52 25.41 15.31 21.31 14.38
31.93 14.09 28.22 20.78 18.48 13.00 31.79 32.14 17.10  8.49
11.07 29.40 19.01 26.65 16.92 39.02 36.78 19.82 19.16 28.25
14.39 13.36 16.58 14.81 18.34 19.05 29.24  5.14 22.66 21.32
19.68 21.52 22.27 15.44 29.30 24.91  8.90 10.14 11.73 18.39
25.48 16.35 13.09  9.28 17.11 24.80

(a) Construct a frequency distribution for the annual rainfall in Watsonville.

(b) Draw the frequency histogram for the data.

(c) Draw the frequency polygon for the data.

(d) Describe the histogram as skewed left, skewed right, bell shaped, or none of these.

10. The following are the heights, in cm, of the first 45 Presidents. (There are only 44 data values because one President was elected to two nonconsecutive terms.)

188 170 189 163 183 171 185 168 173
183 173 173 175 178 183 193 178 173
174 183 188 180 168 170 178 182 180
183 178 182 188 175 179 185 192 182
183 177 185 188 188 182 185 191

(a) Construct a frequency distribution for the heights of US presidents.

(b) Draw the frequency histogram for the data.

(c) Draw the frequency polygon for the data.

(d) Describe the histogram as skewed left, skewed right, bell shaped, or none of these.

2.3 Dotplots and Stem and Leaf Displays

There are several different ways to display data graphically. We are only looking at a few of them. The next one we will look at is the dotplot. As the name suggests, we will look at the data individually as dots. Unlike the histograms and polygons where we looked at graphs of summary data, in dotplots we will be able to see the individual data values. Since it is a graphical display, we would like the graph to stand alone, with a title, labels, etc.

2.3.1 Dotplots

Example 2.3.1.

The 2019 Oscar nominations for Best Picture are: “Black Panther”, “BlacKkKlansman”, “Bohemian Rhapsody”, “The Favourite”, “Green Book”, “Roma”, “A Star Is Born”, and “Vice”. The running times for these movies are, respectively, 2 h 14 min, 2 h 15 min, 2 h 14 min, 1 h 59 min, 2 h 10 min, 2 h 15 min, 2 h 16 min, and 2 h 12 min. Construct a dotplot of the running times of the Best Picture nominations.

Solution.

We start with a linear scale. Notice that in the plot the scale is in hours and minutes. This is a more natural way to display the time than converting to minutes only. Also note that each movie gets its own dot. When two or more data values are the same, we simply stack them as shown in the plot. Notice that one of the data values is much smaller than the others. This is what is called an outlier. Right now, we have no way to objectively determine whether a data value is an outlier or not.

[Dotplot: Running Times of 2019 Best Picture Oscar Nominees. Horizontal axis: Running time, hours:minutes, from 1:58 to 2:18.]

An outlier is a data value which is very large or very small relative to the rest of the data set.
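The stacking described in the solution can be sketched in a few lines of Python. This is only an illustration, not the method used to draw the plot above: the function names (`to_minutes`, `as_label`, `dotplot_rows`) are made up for this sketch, and the times are converted to minutes internally only to place the dots, while the labels stay in hours:minutes.

```python
from collections import Counter

# Running times from Example 2.3.1 as (hours, minutes) pairs, in the order listed.
running_times = [(2, 14), (2, 15), (2, 14), (1, 59),
                 (2, 10), (2, 15), (2, 16), (2, 12)]

def to_minutes(hours, minutes):
    """Convert a running time to total minutes (used only to place the dots)."""
    return 60 * hours + minutes

def as_label(total_minutes):
    """Format total minutes back into an hours:minutes axis label."""
    return f"{total_minutes // 60}:{total_minutes % 60:02d}"

def dotplot_rows(values):
    """Build the rows of a text dotplot, top row first.

    Each column is one minute on the scale; equal values are stacked.
    """
    counts = Counter(values)
    low, high = min(values), max(values)
    tallest = max(counts.values())
    return [
        "".join("*" if counts[v] >= level else " " for v in range(low, high + 1))
        for level in range(tallest, 0, -1)
    ]

minutes = [to_minutes(h, m) for h, m in running_times]
rows = dotplot_rows(minutes)
for row in rows:
    print(row)
print(f"{as_label(min(minutes))} to {as_label(max(minutes))}, one column per minute")
```

In the printed rows, the lone dot in the first column is the 1 h 59 min movie, separated by a wide gap from the cluster of the other seven.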

Let us look at two dotplots together.

Example 2.3.2.

The scores for the Super Bowls of the 2009-2018 seasons are: 31-17, 31-25, 21-17, 34-31, 43-8, 28-24, 24-10, 34-28, 41-33, and 13-3.

Construct a dot plot with the winning and losing scores. What do you observe?

Construct a dotplot for the winning point differential (winning score - losing score).

Solution.

The winning scores are: 31, 31, 21, 34, 43, 28, 24, 34, 41, and 13.
The losing scores are: 17, 25, 17, 31, 8, 24, 10, 28, 33, and 3.

[Dotplots: Super Bowl XLIV thru LIII Scores. Two rows of dots, Winning and Losing, on a common horizontal axis: Score, from 0 to 45.]

Notice that the winning scores tend to be greater than the losing scores. We can see that because the ‘dots’ are further to the right for the winning scores. There are no outliers in either group, and neither distribution shows any great deal of skewness.

The differential scores are 14, 6, 4, 3, 35, 4, 14, 6, 8, and 10.
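The subtraction behind these differentials can be checked with a short Python sketch. The variable names are chosen here for illustration only; the scores are the ones listed in the solution, paired game by game.

```python
# Winning and losing scores from Example 2.3.2, in the same game order.
winning = [31, 31, 21, 34, 43, 28, 24, 34, 41, 13]
losing = [17, 25, 17, 31, 8, 24, 10, 28, 33, 3]

# Differential for each game: winning score minus losing score.
differentials = [w - l for w, l in zip(winning, losing)]

print(differentials)       # [14, 6, 4, 3, 35, 4, 14, 6, 8, 10]
print(max(differentials))  # 35
```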

[Dotplot: Super Bowl XLIV thru LIII Final Score Differential. Horizontal axis: Winning Score - Losing Score, from 0 to 40.]

For this, we have an outlier of 35 (a sad day for Broncos fans). Furthermore, there are no data values which are negative. Why?

When we are looking at a dotplot, we are looking for several things. We can see if the data values are clustered in any particular place, we can assess skewness, etc. If you look at the dotplot of running times in Example 2.3.1, you will notice the data values are all clustered to the right with the exception of one on the left. It is ‘far away’ from the rest of the data. This is what is called an outlier.

2.3.2 Stem and Leaf Displays

One of the disadvantages of looking at a frequency distribution of a data set is that we lose the actual data values. When we have a class of, say, 10-12 and we say there are 4 data values in that class, what are the values? Although we lose the individual data values, we gain an overall idea of the distribution as a whole. One way to get the best of both worlds, when possible, is to construct a stem and leaf plot.

Consider the data set

25 30 32 67 40 55 53 76 48 31
32 58 42 66 60 51 42 39 61 48
38 27 43 34 52 60 44 31 20 38

For each data value we will split it into two parts: a stem and a leaf. In this case we will split the data values between the tens place and the ones place. For the first data value, 25, we get

Stem | Leaf
   2 | 5

If we look at the data values, we will have the stems 2 through 7. We list them vertically, in order, and then to the right of the line we list all the leaves. We get:

2 | 5 7 0
3 | 0 2 1 2 9 8 4 1 8
4 | 0 8 2 2 8 3 4
5 | 5 3 8 1 2
6 | 7 6 0 1 0
7 | 6

We can sort the leaves and we get the following display:

2 | 0 5 7
3 | 0 1 1 2 2 4 8 8 9
4 | 0 2 2 3 4 8 8
5 | 1 2 3 5 8
6 | 0 0 1 6 7
7 | 6

To see the shape of the distribution, think of this as a sideways histogram. We can see that the distribution could be described as skewed right. This is a fairly efficient way to sort data as well.
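The splitting and sorting steps above can be sketched in Python. This is a minimal illustration that assumes two-digit whole-number data split between the tens place and the ones place; the function name `stem_and_leaf` and its `ordered` parameter are made up for this sketch.

```python
from collections import defaultdict

# The 30 data values from the text, in the order given.
data = [25, 30, 32, 67, 40, 55, 53, 76, 48, 31,
        32, 58, 42, 66, 60, 51, 42, 39, 61, 48,
        38, 27, 43, 34, 52, 60, 44, 31, 20, 38]

def stem_and_leaf(values, ordered=True):
    """Map each stem (tens digit) to its list of leaves (ones digits)."""
    leaves = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)  # e.g. 25 -> stem 2, leaf 5
        leaves[stem].append(leaf)
    # List every stem from the smallest to the largest, even one with no leaves.
    display = {}
    for stem in range(min(leaves), max(leaves) + 1):
        stem_leaves = leaves.get(stem, [])
        display[stem] = sorted(stem_leaves) if ordered else list(stem_leaves)
    return display

display = stem_and_leaf(data)
for stem, stem_leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in stem_leaves)}")
```

Calling `stem_and_leaf(data, ordered=False)` keeps the leaves in data order, which corresponds to the first (unsorted) display.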

There are times when the options for classes are not sufficient. We need to modify the display slightly, as we will see in the next example.

Example 2.3.3.

Recall the data for the bat rays from before: On June 8th and June 13th, 2002, volunteers of the Pelagic Shark Research Foundation collected Myliobatis californica, or bat ray, and made measurements. One of the measurements was the total length. Several lengths, in cm, are given:

27 36 22 26 26 26 33 36 22 22 34 29 39 28 31 25 35 22 20 44
27 34 30 26 34 27 28 28 28 24 29 22 24 36 33 33 34 37 34 35

Solution.

If we look at the data, almost all the data have stems of 2 or 3. One way to deal with this is to split the classes up. We will have two stems with 2’s. The first will take leaves that are 0-4, and the other 5-9. We do the same for the 3’s. We proceed.

The leaves here are in the order given in the data.

Bat Ray Lengths, cm, Collected June 8 and 13, 2002

2 | 2 2 2 2 0 4 2 4
2 | 7 6 6 6 9 8 5 7 6 7 8 8 8 9
3 | 3 4 1 4 0 4 3 3 4 4
3 | 6 6 9 5 6 7 5
4 | 4

Here is the display with the leaves ordered.

Bat Ray Lengths, cm, Collected June 8 and 13, 2002

2 | 0 2 2 2 2 2 4 4
2 | 5 6 6 6 6 7 7 7 8 8 8 8 9 9
3 | 0 1 3 3 3 4 4 4 4 4
3 | 5 5 6 6 6 7 9
4 | 4
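The split-stem idea can also be sketched in Python. As before, this is only an illustration under the assumption of two-digit whole-number data; the function name `split_stem_and_leaf` and the (stem, half) keys are made up for this sketch, with half 0 holding leaves 0-4 and half 1 holding leaves 5-9.

```python
# Bat ray lengths, in cm, from Example 2.3.3, in the order given.
bat_ray_lengths = [27, 36, 22, 26, 26, 26, 33, 36, 22, 22,
                   34, 29, 39, 28, 31, 25, 35, 22, 20, 44,
                   27, 34, 30, 26, 34, 27, 28, 28, 28, 24,
                   29, 22, 24, 36, 33, 33, 34, 37, 34, 35]

def split_stem_and_leaf(values):
    """Map (stem, half) to sorted leaves; half 0 is leaves 0-4, half 1 is 5-9."""
    rows = {}
    for value in values:
        stem, leaf = divmod(value, 10)
        half = 0 if leaf <= 4 else 1
        rows.setdefault((stem, half), []).append(leaf)
    # Order the rows by stem, lower half first, and sort the leaves in each row.
    return {key: sorted(rows[key]) for key in sorted(rows)}

rows = split_stem_and_leaf(bat_ray_lengths)
for (stem, half), leaves in rows.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Rows that would have no leaves, such as an upper 4 stem, are simply not created in this sketch, which matches the display above.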

2.3.3 Exercises

1. While on vacation, a vacationer spent the following amounts per day on food, in dollars: 26.31, 18.56, 43.65, 19.22, 25.63, 31.20, 14.59. Draw a dotplot using the data and comment on the data.

2. A basketball player records the number of points scored in the last several games. The scores were: 16, 13, 2, 18, 20, 15, 19, 15. Draw a dotplot using the data and comment on the data.

3. At an assembly line, workers assemble computers. At one stage of the process, the time to complete the task is too long, so employees are getting retrained. The times, in seconds, to complete the stage before and after the training are recorded:

Before: 125, 136, 129, 133, 141, 155, 135

After: 118, 123, 122, 126, 126, 138, 150, 124

(a) Draw two dotplots using the same axis and scale.

(b) Does it appear as if the training has helped? Explain.

4. A restaurant owner owns two pizza shops: UberThick and UltraThin. The numbers of pizzas sold for several days at each restaurant are below.

UberThick: 56, 84, 99, 67, 77, 66, 81, 103

UltraThin: 46, 55, 90, 74, 66, 50, 71

(a) Draw two dotplots using the same axis and scale.

(b) Based on the dotplots, which store seems busier?

5. Following are the numbers of non-fatal shark attacks in the United States for the years 1987 to 2016. Use the data to draw a stem and leaf display. Put the leaves in order and describe the distribution, if possible.

53 56 55 52 57 38 33 31 52 53

45 39 30 45 43 51 45 29 21 27

23 39 17 18 17 16 15 23 12 13

6. The numbers of US fatalities from hurricanes for the years 1990 through 2019 are given.

42, 10, 35, 18, 36, 74, 55, 69, 553, 45,
22, 126, 81, 67, 38, 35, 54, 55, 40, 41,
94, 130, 68, 26, 30, 69, 33, 39, 39, 53

(a) One of the data values is clearly an outlier. Which data value?

(b) Remove the data value and construct an ordered stem and leaf display for the remaining data. (See footnote 6.)

7. The numbers of tornadoes in the US for the years 1990 through 2019 are given. Construct a stem and leaf plot.

1,482 1,123 1,418 976 1,178 928 903 939 1,703 1,282
1,159 1,692 1,092 1,103 1,265 1,817 1,374 934 1,215 1,075
1,339 1,424 1,148 1,173 1,235 1,082 1,173 1,297 1,132 1,133

8. At The Spinning Wheel, a bicycle shop, the owner checks the tire pressure and records the pressures, in kilopascals (kPa). The pressures follow. Construct a stem and leaf plot. Determine if there are any outliers and describe the distribution.

331 263 214 85 609 407 252 289 70 122
580 147 489 593 807 390 135 379 278 241

Footnote 6: Usually we don’t remove data values because they are outliers.
