
A frequency table provides information on the distribution ...homepage.stat.uiowa.edu/~rdecook/stat1010/notes/Section_3.2_gra… · A frequency table provides information on the distribution


3.2 Visualizing Distributions of data

• A frequency table provides information on the distribution of data.
  ◦ When we discuss the distribution of a variable, we are referring to the possible values, and which of the values occur more (or less) frequently than the others.

Political affiliation   Frequency
Democrat                517
Republican              371
Independent             112

(Callouts on the table: the affiliations are the possible values; Democrat occurred a lot; Independent occurred less frequently.)
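As a minimal sketch of how such a frequency table could be tallied, the snippet below rebuilds a list of 1000 raw responses from the counts in the table (the raw list itself is an assumption; the notes only give the totals) and counts how often each possible value occurs.

```python
from collections import Counter

# Assumed raw data: one entry per respondent, rebuilt from the table's counts.
responses = ["Democrat"] * 517 + ["Republican"] * 371 + ["Independent"] * 112

# Counter tallies how often each possible value occurs.
frequency = Counter(responses)

# most_common() lists the values from most to least frequent.
for affiliation, count in frequency.most_common():
    print(f"{affiliation:<12} {count}")
```

The keys of `frequency` are the possible values, and the counts show which values occurred more or less often.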


The distribution of the data

• The distribution of data is the way the data values are spread over all possible values.
  ◦ What values occur frequently?
  ◦ If the variable is numeric, what is the maximum value? What is the minimum value?
  ◦ What is the “shape” of the distribution?

[Figure: histogram “Weight of Contents of Cans of Cola”. Horizontal axis: Weight (grams), 330 to 390. Vertical axis: Frequency, 0 to 15.]


Graphical displays of distributions

• As the saying goes, “a picture is worth 1000 words”, and distributions are often better conveyed with graphics than with tables.

Political affiliation   Frequency
Democrat                517
Republican              371
Independent             112

[Figure: bar graph “Political affiliation in a 1000 person survey”. Horizontal axis: political affiliation (Democrat, Republican, Independent). Vertical axis: frequency of affiliation, 0 to 600.]


Pioneer in Statistical Graphics

• Florence Nightingale
  ◦ See video clip from “Joy of Statistics”


Bar graph

• Used to represent frequencies (or relative frequencies) for qualitative or categorical variables.

[Figure: bar graph “Political affiliation in a 1000 person survey”. Horizontal axis: political affiliation (Democrat, Republican, Independent). Vertical axis: frequency of affiliation, 0 to 600.]


Bar graph - labels

• Always provide useful labels.

[Figure: bar graph “Political affiliation in a 1000 person survey”, annotated to point out the main title, the vertical axis label, the horizontal axis label, the categories, and the tick marks.]


Bar graph - formatting

• Some things to remember:
  ◦ Space between bars (specifically when this is a categorical variable plot)
  ◦ Uniform (arbitrary) bar widths
  ◦ Some white space at top

[Figure: bar graph “Political affiliation in a 1000 person survey”, annotated with the three points above.]
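The labelling and spacing advice for bar graphs can be sketched with a plain-text rendering. This is only an illustration, not the plotting tool used for the slides: the function name `text_bar_graph`, the horizontal layout, and the scale of one `#` per 10 respondents are all assumptions made here.

```python
frequencies = {"Democrat": 517, "Republican": 371, "Independent": 112}

def text_bar_graph(freqs, title, category_label, value_label, scale=10):
    """Return a horizontal text bar graph as a list of lines.

    Each '#' stands for `scale` observations (rounded). The blank line after
    each bar mimics the space left between bars for a categorical variable.
    """
    lines = [title, f"({category_label} vs. {value_label}; one # = {scale})", ""]
    width = max(len(name) for name in freqs)  # align the category labels
    for name, count in freqs.items():
        bar = "#" * round(count / scale)
        lines.append(f"{name:<{width}} | {bar} {count}")
        lines.append("")
    return lines

graph = text_bar_graph(
    frequencies,
    "Political affiliation in a 1000 person survey",
    "political affiliation",
    "frequency of affiliation",
)
print("\n".join(graph))
```

Every bar uses the same character height, so only its length carries information, which is the point of uniform (arbitrary) bar widths.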


Bar graph – Pareto chart

• A bar graph in which the bars are arranged in frequency order is called a Pareto chart.

[Figure: bar graph “Political affiliation in a 1000 person survey” with bars ordered Democrat, Republican, Independent. A Pareto chart (descending order).]
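A small sketch of the Pareto ordering, using the survey counts from these notes. The helper `is_pareto` is a hypothetical name introduced here, and it checks for descending order, as in the slide's example.

```python
# Counts from the survey, deliberately listed out of frequency order.
frequencies = {"Democrat": 517, "Independent": 112, "Republican": 371}

# Sort categories by frequency, largest first, to get the Pareto ordering.
pareto_order = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

def is_pareto(counts):
    """True when the bar heights never increase from left to right."""
    return all(a >= b for a, b in zip(counts, counts[1:]))

print([name for name, _ in pareto_order])
print(is_pareto([517, 371, 112]), is_pareto([517, 112, 371]))
```

Reordering is only meaningful because the categories have no natural order of their own.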


Bar graph – Pareto chart

• A bar graph in which the bars are arranged in frequency order is called a Pareto chart.

[Figure: bar graph “Political affiliation in a 1000 person survey” with bars ordered Democrat, Independent, Republican. Not a Pareto chart.]


Bar graph – Pareto chart

[Figure: two bar graphs. One is not a Pareto chart (but it is a bar chart); the other is a Pareto chart (and also a bar chart).]


Example: Deflategate

• In 2014, there was a National Football League (NFL) scandal called ‘Deflategate’.

• The Patriots were accused of underinflating their game footballs, which would allow for fewer fumbles (an unfair advantage).

• Did it look like the Patriots had fewer fumbles? If so, how many fewer? We will actually look at the data as the number of plays per fumble (a higher value means fewer fumbles).


Example: Deflategate (offensive plays)

http://www.sharpfootballanalysis.com/blog/2015/the-new-england-patriots-prevention-of-fumbles-is-nearly-impossible

[Figure: bar graph from the article linked above. The categories are teams; the frequency of fumbles is presented as plays per fumble.]


Dot plot – similar to a bar graph

• If there are only a small number of observations (or counts), a dot plot can be used.

• One dot per observation.

• Sometimes seen as a quick and easy plot in the engineering field.
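A minimal text sketch of a dot plot. The data here are hypothetical (invented for illustration, not taken from these notes): the number of defects found in each of 12 inspected parts.

```python
from collections import Counter

# Hypothetical small sample, invented for illustration only.
observations = [0, 2, 1, 0, 3, 1, 1, 0, 2, 1, 0, 1]

def dot_plot(values):
    """Return one text row per value, with one 'o' per observation."""
    counts = Counter(values)
    return [
        f"{v:>2} | " + "o" * counts[v]
        for v in range(min(values), max(values) + 1)
    ]

print("\n".join(dot_plot(observations)))
```

Because each dot is one observation, the total number of dots equals the sample size.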


Pie Charts

• Also used to plot qualitative variables.
• A pie chart is a circle divided so that each wedge represents the relative frequency of a particular category.

Political affiliation   Frequency   Relative frequency
Democrat                517         0.517
Republican              371         0.371
Independent             112         0.112

[Figure: pie chart “Political affiliation in a 1000 person survey” with wedges Democrat 51.7%, Republican 37.1%, Independent 11.2%.]
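The relative frequencies, and the wedge sizes they imply, can be computed as in the sketch below. The wedge angle of 360 degrees times the relative frequency follows from the definition of a pie chart; the variable names are chosen here for illustration.

```python
frequencies = {"Democrat": 517, "Republican": 371, "Independent": 112}
total = sum(frequencies.values())

# For each category: (relative frequency, wedge angle in degrees).
wedges = {}
for name, count in frequencies.items():
    relative = count / total
    wedges[name] = (relative, 360 * relative)

for name, (relative, angle) in wedges.items():
    print(f"{name:<12} {relative:.3f}  {relative:.1%}  {angle:.1f} degrees")
```

The relative frequencies sum to 1, so the wedges together fill the whole circle.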


Pie Charts

• As I may have mentioned earlier, research has shown that our brains do not interpret pie charts very well.

• Consider other options first before presenting a pie chart.

[Figure: two charts, captioned “Our brains comprehend this one better than this one.”]


Histograms

• A histogram is like a bar graph, but it shows a distribution for a quantitative variable.

• The bars have a natural order (thus, the classes must be quantitative in nature) and the bar widths have specific meaning.

• The bars in a histogram touch each other because there are no gaps between the categories.


Histogram

[Figure: a generic histogram. Vertical axis: Frequency (how ‘often’ a value falls into a given bin). Horizontal axis: Measurement (quantitative values grouped into bins).]


Histogram Example

• 24 cola cans were sampled and weighed.
• A frequency table and histogram were created:

Class (range of values)   Frequency
[340,350)                 1
[350,360)                 11
[360,370)                 8
[370,380)                 4

[Figure: histogram “Weight of Contents of Cans of Cola”. Horizontal axis: Weight (grams), 330 to 390. Vertical axis: Frequency, 0 to 15.]
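The class notation [340,350) means the lower limit is included and the upper limit is excluded, so a can weighing exactly 350 grams belongs to [350,360). A minimal sketch of that assignment follows. The individual weights are hypothetical, since the notes give only the class counts, and `class_of` is a name introduced here.

```python
from collections import Counter

def class_of(value, start=340, width=10):
    """Return the half-open class (low, high), meaning [low, high), for value."""
    low = start + ((value - start) // width) * width
    return (low, low + width)

# Hypothetical weights in grams, chosen to sit on or near class boundaries.
weights = [349.9, 350.0, 359.9, 360.0, 372.5]

table = Counter(class_of(w) for w in weights)
for (low, high), count in sorted(table.items()):
    print(f"[{low:g},{high:g})  {count}")
```

Floor division places boundary values such as 350.0 in the class that starts there, which matches the bracket notation in the table.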


Histogram Example

[Figure: histogram “Weight of Contents of Cans of Cola”. Horizontal axis: Weight (grams), 330 to 390. Vertical axis: Frequency, 0 to 15. Annotated with the points below.]

• No space between bars (specifically when this is a quantitative variable plot).
• Rearranging these bars (as we did in a Pareto chart for qualitative data) would not make sense here. The classes are in order from smallest to largest.
• Some white space at top.
• Axes and labels are still important.


Histogram Example

• Same data, more classes (narrower bins)… histogram looks a bit different.

Class (range of values)   Frequency
[345,350)                 1
[350,355)                 6
[355,360)                 5
[360,365)                 1
[365,370)                 7
[370,375)                 3
[375,380)                 1

[Figure: histogram “Weight of Contents of Cans of Cola”. Horizontal axis: Weight (grams), 330 to 390. Vertical axis: Frequency, 0 to 10.]
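Since both tables describe the same 24 cans, merging pairs of the 5-gram classes should reproduce the earlier 10-gram table ([340,350): 1, [350,360): 11, [360,370): 8, [370,380): 4). The sketch below checks this; `rebin` is a helper name introduced here.

```python
# Frequencies for the narrower 5-gram classes, keyed by (low, high) = [low, high).
narrow = {
    (345, 350): 1, (350, 355): 6, (355, 360): 5, (360, 365): 1,
    (365, 370): 7, (370, 375): 3, (375, 380): 1,
}

def rebin(freqs, start, width):
    """Merge narrow classes into wider classes of the given width."""
    wide = {}
    for (low, _high), count in freqs.items():
        new_low = start + ((low - start) // width) * width
        key = (new_low, new_low + width)
        wide[key] = wide.get(key, 0) + count
    return wide

wide = rebin(narrow, start=340, width=10)
print(wide)
print(sum(wide.values()))
```

The counts are unchanged; only the picture changes with the choice of bins.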


Example: Deflategate (all plays)

http://www.sharpfootballanalysis.com/blog/2015/the-new-england-patriots-prevention-of-fumbles-is-nearly-impossible

[Figure: histogram from the article linked above. Horizontal axis: a numeric variable. Vertical axis: number of teams falling into each bin. The Patriots and their 187 plays/fumble are marked.]

NOTE: This author should have the bars touching each other for a correct histogram presentation. Don’t put space between bars in a histogram.


Displaying Quantitative Data

• Histogram
  ◦ Provides a picture or shape of the distribution of the data.
  ◦ Collects values into bins.
  ◦ Bins should be of equal width and they should touch each other.
  ◦ Different bin choices can yield different pictures.
  ◦ Can show frequencies or relative frequencies.


Stem-and-leaf plots

• We can’t see individual data points in a histogram due to the binning and the use of the bars for frequencies.

• A stem-and-leaf plot is similar to a histogram, but individual data points are identified.

• As with dot plots, this type of plot probably makes the most sense when the number of observations is relatively small.


Stem-and-leaf plots

• One leaf is associated with one data point.

• Example data: 5.4, 0.7, 3.0, 2.6, 0.3, 2.8, 5.2, 2.6

Here, a ‘leaf’ is the value one place to the right of the decimal place.
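A minimal sketch of how the example data could be split into stems (the whole-number part) and leaves (the tenths digit). Showing empty stems and sorting the leaves within each stem are choices made here, not something the notes specify.

```python
from collections import defaultdict

data = [5.4, 0.7, 3.0, 2.6, 0.3, 2.8, 5.2, 2.6]

def stem_and_leaf(values):
    """Return text rows 'stem | leaves' for data with one decimal place."""
    leaves = defaultdict(list)
    for v in values:
        # Work in whole tenths to avoid floating-point surprises.
        stem, leaf = divmod(round(v * 10), 10)
        leaves[stem].append(leaf)
    stems = range(min(leaves), max(leaves) + 1)
    return [
        f"{s} | " + "".join(str(leaf) for leaf in sorted(leaves[s]))
        for s in stems
    ]

print("\n".join(stem_and_leaf(data)))
```

Each of the eight data points contributes exactly one leaf, and the repeated value 2.6 appears as two separate leaves.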



Stem-and-leaf example

• Recall the 80 observations on compressive strengths:

105  97 245 163 207 134 218 199 160 196
221 154 228 131 180 178 157 151 175 201
183 153 174 154 190  76 101 142 149 200
186 174 199 115 193 167 171 163  87 176
121 120 181 160 194 184 165 145 160 150
181 168 158 208 133 135 172 171 237 170
180 167 176 158 156 229 158 148 150 118
143 141 110 133 123 146 169 158 135 149


Stem-and-leaf example

• 80 observations
• Min: 76, Max: 245
• Here, a ‘leaf’ represents the “ones place”.
• Looks somewhat like a histogram turned on its side, but we can identify individual data points.
• Gives you a feel for the distribution of the data.

The decimal point is 1 digit(s) to the right of the |

   7 | 6
   8 | 7
   9 | 7
  10 | 15
  11 | 058
  12 | 013
  13 | 133455
  14 | 12356899
  15 | 001344678888
  16 | 0003357789
  17 | 0112445668
  18 | 0011346
  19 | 034699
  20 | 0178
  21 | 8
  22 | 189
  23 | 7
  24 | 5
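To illustrate that individual data points can be read back from a stem-and-leaf plot, the sketch below parses the plot shown here (stem = tens and hundreds, leaf = ones place) and recovers the values. The parsing code is an illustration written for these notes, not part of the software that produced the plot.

```python
plot = """7 | 6
8 | 7
9 | 7
10 | 15
11 | 058
12 | 013
13 | 133455
14 | 12356899
15 | 001344678888
16 | 0003357789
17 | 0112445668
18 | 0011346
19 | 034699
20 | 0178
21 | 8
22 | 189
23 | 7
24 | 5"""

values = []
for row in plot.splitlines():
    stem, leaves = row.split("|")
    # Each leaf digit is one observation: value = stem * 10 + leaf.
    for leaf in leaves.strip():
        values.append(int(stem) * 10 + int(leaf))

print(len(values), min(values), max(values))
```

The recovered count, minimum, and maximum can be compared with the summary given on the slide (80 observations, Min: 76, Max: 245).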


Line charts

• Also used to represent a quantitative variable.

• Created by connecting the ‘center dots’ at the top of the bars of a histogram.
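As a sketch of which points get connected, the snippet below takes the 10-gram cola-can classes from the earlier histogram example and computes the ‘center dot’ of each bar: the class midpoint paired with the class frequency. Reading ‘center dot’ as the class midpoint is an interpretation made here.

```python
# Classes [low, high) and their frequencies from the cola-can histogram.
classes = {(340, 350): 1, (350, 360): 11, (360, 370): 8, (370, 380): 4}

# One point per bar: (class midpoint, frequency).
points = [((low + high) / 2, count) for (low, high), count in classes.items()]

for x, y in points:
    print(f"({x:g}, {y})")
```

Joining these points in order, left to right, would give the line chart.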


Line chart example

[Figure: a line chart drawn over a histogram. A histogram is also shown here, but it is not part of the line chart.]


Time-Series Graph

• If a histogram or line chart has a horizontal axis of time, then it is a time-series graph.

• Time series plots show how things change over time.

• Often used with financial market information or housing data.


Time-Series Graph – example

• A line chart with a horizontal axis of time (Year) → a time-series graph.


Time-Series Graph – example

[Figure: time-series graph “Homes sold in Iowa City by zip code and month”. Horizontal axis: Year (data by the month).]

1) What is the general trend over the years 2006-2011?
2) What is the general trend within each year?
3) What is the width of the underlying bin?
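A sketch of how points for such a graph might be produced, assuming (as the axis note “data by the month” suggests) that each plotted value counts events within one calendar month. The sale dates below are hypothetical, invented purely for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical sale dates, invented for illustration only.
sale_dates = [
    date(2006, 1, 15), date(2006, 1, 28), date(2006, 2, 3),
    date(2006, 6, 10), date(2006, 6, 11), date(2006, 6, 30),
    date(2007, 6, 5),
]

# Each bin is one calendar month, identified by (year, month).
monthly = Counter((d.year, d.month) for d in sale_dates)

# Sorting by (year, month) puts the points in time order.
series = sorted(monthly.items())
for (year, month), count in series:
    print(f"{year}-{month:02d}: {count}")
```

Months with no sales do not appear in `series`; a full plot would need them filled in with zero.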


Time-Series Graph – example

[Figure: time-series graph. Vertical axis: Number of Olympic medals. Horizontal axis: Year.]

1) What is the width of the underlying bin?