homepage.stat.uiowa.edu › ~rdecook › stat1010 › notes › Section... · 2016-09-12


STAT1010 – picturing data


3.2 Visualizing Distributions of data

• A frequency table provides information on the distribution of data.
  ◦ When we discuss the distribution of a variable, we are referring to the possible values, and which of the values occur more (or less) frequently than the others.

Political affiliation   Frequency
Democrat                517
Republican              371
Independent             112

The categories (Democrat, Republican, Independent) are the possible values. Democrat occurred a lot; Independent occurred less frequently.


The distribution of the data

• The distribution of data is the way the data values are spread over all possible values.
  ◦ What values occur frequently?
  ◦ If the variable is numeric, what is the maximum value? What is the minimum value?
  ◦ What is the “shape” of the distribution?

[Figure: histogram “Weight of Contents of Cans of Cola”; horizontal axis Weight (grams), 330 to 390; vertical axis Frequency, 0 to 15]


Graphical displays of distributions

• As the saying goes, “a picture is worth a thousand words”: distributions are often conveyed better with graphics than with tables.

Political affiliation   Frequency
Democrat                517
Republican              371
Independent             112

[Figure: bar graph “Political affiliation in a 1000 person survey”; horizontal axis political affiliation (Democrat, Republican, Independent); vertical axis frequency of affiliation, 0 to 600]


Pioneer in Statistical Graphics

• Florence Nightingale
  ◦ See video clip from “Joy of Statistics”


Bar graph

• Used to represent frequencies (or relative frequencies) for qualitative (categorical) variables.

[Figure: bar graph “Political affiliation in a 1000 person survey”; horizontal axis political affiliation (Democrat, Republican, Independent); vertical axis frequency of affiliation, 0 to 600]


Bar graph - labels

• Always provide useful labels.

[Figure: the bar graph “Political affiliation in a 1000 person survey” with its parts labelled: main title, vertical axis label, horizontal axis label, categories, tick marks]


Bar graph - formatting

• Some things to remember:
  ◦ Leave space between the bars (specifically when the variable is categorical).
  ◦ Use uniform bar widths (the width itself is arbitrary).
  ◦ Leave some white space at the top.

[Figure: bar graph “Political affiliation in a 1000 person survey” annotated with these three points]


Bar graph – Pareto chart

• A bar graph in which the bars are arranged in frequency order is called a Pareto chart.

[Figure: bar graph “Political affiliation in a 1000 person survey” with bars in descending order of frequency (Democrat, Republican, Independent): a Pareto chart]
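Putting a bar graph into Pareto order is just a sort by frequency, largest first. A minimal Python sketch using the survey counts from these notes (the variable names are our own):

```python
# Frequencies from the 1000-person survey, deliberately listed out of order
counts = {"Democrat": 517, "Independent": 112, "Republican": 371}

# A Pareto chart orders the bars from most to least frequent
pareto_order = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for category, freq in pareto_order:
    print(f"{category:<12} {freq:>4}")
```

Drawing the bars in the order of `pareto_order` gives the Pareto chart; drawing them in the original dictionary order does not.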


Bar graph – Pareto chart


Not a Pareto chart:

[Figure: the same bar graph with bars in the order Democrat, Independent, Republican; the bars are not in frequency order]


Bar graph – Pareto chart

[Figure, two panels: “Not a Pareto chart (but it is a bar chart)” and “A Pareto chart (and also a bar chart)”]


Example: Deflategate

• At the end of the 2014 National Football League (NFL) season, during the January 2015 playoffs, there was a scandal called ‘Deflategate’.

• The New England Patriots were accused of underinflating their game footballs. A softer ball may be easier to grip, so the concern was that it would lead to fewer fumbles (an unfair advantage).

• Did the Patriots appear to fumble less often than other teams? If so, by how much? We will look at the data as the number of plays per fumble (a higher value means fewer fumbles).
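The plays-per-fumble figure is just a ratio of two counts. A minimal sketch in Python; the function name `plays_per_fumble` and the season totals below are hypothetical, not data from the article.

```python
def plays_per_fumble(plays, fumbles):
    """Return plays per fumble; a higher value means the team fumbles less often."""
    if fumbles == 0:
        # A team with no fumbles has an unbounded ratio
        return float("inf")
    return plays / fumbles


# Hypothetical season totals for two teams, for illustration only
team_a = plays_per_fumble(935, 5)    # 187.0 plays per fumble
team_b = plays_per_fumble(1000, 10)  # 100.0 plays per fumble
print(team_a, team_b)
```

Team A runs more plays between fumbles, so it fumbles less often, even though the raw play counts are similar.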


Example: Deflategate (offensive plays)

http://www.sharpfootballanalysis.com/blog/2015/the-new-england-patriots-prevention-of-fumbles-is-nearly-impossible

[Figure: bar graph from the article above; categories (i.e., teams) along one axis, with bar heights giving each team's frequency of fumbles, presented as plays per fumble]


Dot plot – similar to a bar graph

• If there are only a small number of observations (or counts), a dot plot can be used.

• One dot per observation.

• Sometimes seen as a quick and easy plot in the engineering field.
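A dot plot is easy to produce even without graphics software. The sketch below uses a hypothetical helper, `text_dot_plot`, that prints one dot per observation for each distinct value, laid out horizontally rather than in the usual vertical stacks; the shoe-size sample is made up for illustration.

```python
from collections import Counter


def text_dot_plot(observations):
    """Return one text line per distinct value, with one dot per observation.

    Assumes the values can be sorted (all numbers, or all strings).
    """
    counts = Counter(observations)
    width = max(len(str(value)) for value in counts)  # align the value labels
    lines = []
    for value in sorted(counts):
        dots = " ".join(["•"] * counts[value])
        lines.append(f"{str(value):>{width}} | {dots}")
    return lines


# Hypothetical small sample of shoe sizes
sample = [8, 9, 9, 10, 8, 9, 11, 10, 9]
for line in text_dot_plot(sample):
    print(line)
```

With only nine observations every data point stays visible, which is exactly when a dot plot makes sense.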


Pie Charts
• Also used to plot qualitative variables.
• A pie chart is a circle divided so that each wedge represents the relative frequency of a particular category.

Political affiliation   Frequency   Relative frequency
Democrat                517         0.517
Republican              371         0.371
Independent             112         0.112

[Figure: pie chart “Political affiliation in a 1000 person survey”: Democrat 51.7%, Republican 37.1%, Independent 11.2%]
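Relative frequency is the frequency divided by the total count, and each wedge of a pie chart takes up that fraction of the 360 degrees in a circle. A short Python sketch using the survey counts above (variable names are our own):

```python
counts = {"Democrat": 517, "Republican": 371, "Independent": 112}
total = sum(counts.values())  # 1000 respondents

rel_freq = {}  # relative frequency of each category
angles = {}    # wedge angle of each category, in degrees
for category, freq in counts.items():
    rel_freq[category] = freq / total
    angles[category] = 360 * freq / total
    print(f"{category:<12} {rel_freq[category]:.3f}  "
          f"{rel_freq[category]:6.1%}  {angles[category]:6.2f} degrees")
```

For these counts the wedges come to about 186.1, 133.6 and 40.3 degrees, which together make up the full circle.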


Pie Charts
• As I may have mentioned earlier, research has shown that our brains do not interpret pie charts very well: comparing the angles or areas of wedges is harder than comparing the lengths of bars.
• Consider other options first before presenting a pie chart.

[Figure: two charts of the same data, captioned “Our brains comprehend this one better than this one.”]


Histograms

• A histogram is like a bar graph, but it shows the distribution of a quantitative variable.

• The bars have a natural order (thus, the classes must be quantitative in nature), and the bar widths have a specific meaning: each width is the range of values covered by its class.

• The bars in a histogram touch each other because there are no gaps between the classes: each class ends where the next one begins.


Histogram

[Figure: annotated histogram. Horizontal axis, Measurement: quantitative values grouped into bins. Vertical axis, Frequency: how ‘often’ a value falls into a given bin.]


Histogram Example
• 24 cola cans were sampled and weighed.
• A frequency table and histogram were created:

[Figure: histogram “Weight of Contents of Cans of Cola”; horizontal axis Weight (grams), 330 to 390; vertical axis Frequency, 0 to 15]

Class (range of values, grams)   Frequency
[340,350)                        1
[350,360)                        11
[360,370)                        8
[370,380)                        4

(A class such as [340,350) includes 340 but not 350.)
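To see how the frequency table turns into a histogram, here is a rough text version in Python: one `#` per can, with the classes kept in their natural order from smallest to largest. The layout is our own, a sketch rather than a substitute for a proper plot.

```python
# Frequency table for the 24 cola cans; each class is [lower, upper) in grams
classes = [((340, 350), 1), ((350, 360), 11), ((360, 370), 8), ((370, 380), 4)]

n = sum(freq for _, freq in classes)  # total number of cans

for (lower, upper), freq in classes:
    # Bar length is proportional to the class frequency
    print(f"[{lower},{upper})  {'#' * freq:<11}  {freq}")
print("total:", n)
```

Reading down the rows is like reading the histogram from left to right; the tallest bar is the [350,360) class.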


Histogram Example

[Figure: histogram “Weight of Contents of Cans of Cola”; horizontal axis Weight (grams), 330 to 390; vertical axis Frequency, 0 to 15]

• No space between bars (specifically when the variable is quantitative).
• Rearranging these bars (as we did in a Pareto chart for qualitative data) would not make sense here: the classes are in order from smallest to largest.
• Some white space at the top.
• Axes and labels are still important.


Histogram Example
• Same data, more classes (narrower bins): the histogram looks a bit different.

Class (range of values, grams)   Frequency
[345,350)                        1
[350,355)                        6
[355,360)                        5
[360,365)                        1
[365,370)                        7
[370,375)                        3
[375,380)                        1

[Figure: histogram “Weight of Contents of Cans of Cola” with the narrower classes; horizontal axis Weight (grams), 330 to 390; vertical axis Frequency, 0 to 10]
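Because every narrow class in this table fits inside exactly one of the wider classes from the first cola table, the two tables are consistent: adding up the narrow frequencies gives back the wide ones. A small Python sketch of that check; `merge_bins` is a hypothetical helper that assumes each narrow class lies entirely inside one wide class, with the wide classes starting at `origin`.

```python
# Narrow-class frequencies for the 24 cola cans (classes are [lower, upper) in grams)
narrow = {(345, 350): 1, (350, 355): 6, (355, 360): 5, (360, 365): 1,
          (365, 370): 7, (370, 375): 3, (375, 380): 1}


def merge_bins(freqs, width, origin):
    """Add narrow-class frequencies into wider classes [origin + k*width, origin + (k+1)*width).

    Assumes every narrow class lies entirely inside one wide class.
    """
    merged = {}
    for (lower, _upper), count in freqs.items():
        start = origin + ((lower - origin) // width) * width  # wide class containing `lower`
        key = (start, start + width)
        merged[key] = merged.get(key, 0) + count
    return dict(sorted(merged.items()))


wide = merge_bins(narrow, width=10, origin=340)
print(wide)
```

The totals agree, yet the narrow version shows a dip at [360,365) that the wide version hides: different bin choices, different pictures.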


Example: Deflategate (all plays)

http://www.sharpfootballanalysis.com/blog/2015/the-new-england-patriots-prevention-of-fumbles-is-nearly-impossible

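[Figure: histogram from the article above; horizontal axis a numeric variable (plays per fumble), vertical axis the number of teams falling into each bin; the Patriots and their 187 plays/fumble are marked]

NOTE: This author should have made the bars touch each other for a correct histogram presentation. Don’t put space between bars in a histogram.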


Displaying Quantitative Data

• Histogram
  ◦ Provides a picture, or shape, of the distribution of the data.
  ◦ Collects values into bins.
  ◦ Bins should be of equal width, and they should touch each other.
  ◦ Different bin choices can yield different pictures.
  ◦ Can show frequencies or relative frequencies.
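The binning step itself is mechanical: pick a starting point and a bin width, then count how many values land in each class [lower, upper). A minimal Python sketch; `bin_counts` is a hypothetical helper, and the six values are made up. Note that a value sitting exactly on a boundary (20 here) goes into the class that starts there.

```python
import math


def bin_counts(values, start, width):
    """Return (frequencies, relative frequencies) for equal-width classes.

    Class k covers [start + k*width, start + (k+1)*width).
    Classes containing no values are left out of the result.
    """
    freqs = {}
    for v in values:
        k = math.floor((v - start) / width)  # index of the class containing v
        lower = start + k * width
        cls = (lower, lower + width)
        freqs[cls] = freqs.get(cls, 0) + 1
    freqs = dict(sorted(freqs.items()))
    n = len(values)
    rel = {cls: f / n for cls, f in freqs.items()}
    return freqs, rel


values = [12, 15, 21, 20, 29, 30]  # hypothetical measurements
wide, wide_rel = bin_counts(values, start=10, width=10)
narrow, narrow_rel = bin_counts(values, start=10, width=5)
print(wide)
print(narrow)
```

The same six values give three classes with width 10 and five with width 5, and the relative frequencies in each version add up to 1. Empty classes are simply absent here; a plotted histogram would show them as zero-height bars.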


Stem-and-leaf plots
• We can’t see individual data points in a histogram because of the binning and the use of bars for frequencies.

• A stem-and-leaf plot is similar to a histogram, but individual data points are identified.

• As with dot plots, this type of plot probably makes the most sense when the number of observations is relatively small.


Stem-and-leaf plots
• One leaf is associated with one data point.

• Example data: 5.4, 0.7, 3.0, 2.6, 0.3, 2.8, 5.2, 2.6

Here, a ‘leaf’ is the digit one place to the right of the decimal point, and the stem is the part to the left of the decimal point.
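Building the plot by hand means splitting each value into a stem and a leaf digit. A minimal Python sketch; `stem_and_leaf` is a hypothetical helper that assumes non-negative values, takes `leaf_unit` as the place value of the leaf (0.1 here), and keeps empty stems so gaps in the data stay visible.

```python
def stem_and_leaf(values, leaf_unit):
    """Map each stem to its sorted leaf digits, including empty stems.

    Assumes non-negative values; leaf_unit is the place value of the leaf.
    """
    scaled = sorted(round(v / leaf_unit) for v in values)  # e.g. 5.4 -> 54
    stems = {s: "" for s in range(scaled[0] // 10, scaled[-1] // 10 + 1)}
    for t in scaled:
        stems[t // 10] += str(t % 10)  # stem = leading digits, leaf = last digit
    return stems


data = [5.4, 0.7, 3.0, 2.6, 0.3, 2.8, 5.2, 2.6]
for stem, leaves in stem_and_leaf(data, leaf_unit=0.1).items():
    print(f"{stem} | {' '.join(leaves)}")
```

For this data the stems run from 0 to 5; stems 1 and 4 have no leaves, and the two values of 2.6 show up as two separate 6s on stem 2.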


Stem-and-leaf example
• Recall the 80 observations on compressive strengths:

105 97 245 163 207 134 218 199 160 196
221 154 228 131 180 178 157 151 175 201
183 153 174 154 190 76 101 142 149 200
186 174 199 115 193 167 171 163 87 176
121 120 181 160 194 184 165 145 160 150
181 168 158 208 133 135 172 171 237 170
180 167 176 158 156 229 158 148 150 118
143 141 110 133 123 146 169 158 135 149


Stem-and-leaf example
• 80 observations
• Min: 76, Max: 245
• Here, a ‘leaf’ represents the “ones place”, and the stem is the digits to its left.
• Looks somewhat like a histogram turned on its side, but we can identify individual data points.
• Gives you a feel for the distribution of the data.

 7 | 6
 8 | 7
 9 | 7
10 | 15
11 | 058
12 | 013
13 | 133455
14 | 12356899
15 | 001344678888
16 | 0003357789
17 | 0112445668
18 | 0011346
19 | 034699
20 | 0178
21 | 8
22 | 189
23 | 7
24 | 5
The decimal point is 1 digit(s) to the right of the |
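As a check on the display above, and to make the “histogram turned on its side” comparison concrete, here is a short Python sketch (variable names are our own) that rebuilds the stems and leaves from the 80 observations and prints the number of leaves on each stem as a row of stars. Each row of stars is the frequency of one class: [70,80), [80,90), and so on up to [240,250).

```python
from collections import Counter

# The 80 compressive-strength observations listed above
strengths = [
    105, 97, 245, 163, 207, 134, 218, 199, 160, 196,
    221, 154, 228, 131, 180, 178, 157, 151, 175, 201,
    183, 153, 174, 154, 190, 76, 101, 142, 149, 200,
    186, 174, 199, 115, 193, 167, 171, 163, 87, 176,
    121, 120, 181, 160, 194, 184, 165, 145, 160, 150,
    181, 168, 158, 208, 133, 135, 172, 171, 237, 170,
    180, 167, 176, 158, 156, 229, 158, 148, 150, 118,
    143, 141, 110, 133, 123, 146, 169, 158, 135, 149,
]

# Leaf = ones digit, stem = the digits to its left
counts = Counter(v // 10 for v in strengths)
leaves = {s: "".join(str(v % 10) for v in sorted(strengths) if v // 10 == s)
          for s in range(min(counts), max(counts) + 1)}

print(f"n = {len(strengths)}, min = {min(strengths)}, max = {max(strengths)}")
for stem, leaf_digits in leaves.items():
    # Number of leaves on a stem = frequency of the class [10*stem, 10*stem + 10)
    print(f"{stem:>2} | {leaf_digits:<12} {'*' * counts[stem]}")
```

The leaves match the display above stem for stem, and the longest row of stars (stem 15, with 12 values) marks the peak of the distribution.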


Line charts
• Also used to represent a quantitative variable.

• Created by connecting the ‘center dots’ at the top of the bars of a histogram (the point at the middle of each bar's top edge).
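The ‘center dots’ sit at the class midpoints, so a line chart can be built straight from a frequency table. A minimal Python sketch using the cola-can classes from the histogram example; connecting the printed points in order gives the line chart. Some texts also add a zero-frequency point one class width beyond each end so the line comes back down to the horizontal axis; that optional step is not shown.

```python
# Frequency table for the 24 cola cans; each class is [lower, upper) in grams
classes = [((340, 350), 1), ((350, 360), 11), ((360, 370), 8), ((370, 380), 4)]

# Each center dot: x = class midpoint, y = class frequency (the bar height)
points = [((lower + upper) / 2, freq) for (lower, upper), freq in classes]

for x, y in points:
    print(f"({x:g}, {y})")
```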


Line chart example

[Figure: a line chart; a histogram is also shown here, but it is not part of the line chart]


Time-Series Graph

• If a histogram or line chart has a horizontal axis of time, then it is a time-series graph.

• Time-series plots show how things change over time.

• Often used with financial market information or housing data.


Time-Series Graph – example

• A line chart with a horizontal axis of time (Year) is a time-series graph.


Time-Series Graph – example

[Figure: “Homes sold in Iowa City by zip code and month”; horizontal axis Year (data by the month)]

1) What is the general trend over the years 2006-2011?
2) What is the general trend within each year?
3) What is the width of the underlying bin?


Time-Series Graph – example

[Figure: time-series graph; vertical axis Number of Olympic medals, horizontal axis Year]

1) What is the width of the underlying bin?