Chapter 2: Summarizing and Graphing Data

Professor Mike Gilmore

Middlesex Community College

Spring 2012

1


Why Graphs?

• Describe data

• Explore data

• Compare data

• The goal is to convey a message about the data, rather than to decorate…

2

A Bunch of Data

ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCACGGCCACCGCTGCCCTGCC CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCT CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCTG CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCC TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCAA CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCT CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCTG CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCCA TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCTATT CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCT CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCT CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCA TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCTGA CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCTG CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGCC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCTG CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCCA TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCGTG CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCTG 
CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGCC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCTG CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCCA TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCGTA CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCTG CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGCC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCTG CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCCT TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCTAA CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCTG CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGCC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCTC CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAACACG

3

Frequency Table

Nucleotide   Frequency
A            787
C            712
T            79
G            723

Nucleotide   Frequency
ATG          51
CCG          32
AATTG        1
GAGA         3
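A frequency table like the ones above can be tallied by a short script. The sketch below is only an illustration: it uses the first ten letters of the sequence from the previous slide rather than the whole block, and the variable names are made up for this example.

```python
from collections import Counter

# First ten letters of the slide's sequence (a short sample, not the full data).
sequence = "ACAAGATGCC"

# Frequency of each single nucleotide.
nucleotide_counts = Counter(sequence)
for base in "ACTG":
    print(base, nucleotide_counts[base])

# Frequency of a longer pattern; str.count finds non-overlapping matches.
atg_count = sequence.count("ATG")
print("ATG", atg_count)
```

Running the same code on the full sequence should produce counts on the scale of the tables above, although the exact totals depend on how the sequence was transcribed.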

4

Graph

5

Characteristics of Data

1. Center

2. Variation

3. Distribution

4. Outliers

5. Change Over Time

6

Center

155 142 149 130 151 163 151 142 156 133
138 161 128 144 172 137 151 166 147 163
145 116 136 158 114 165 169 145 150 150
150 158 151 145 152 140 170 129 188 156

?

Bin     Frequency
109.5   0
119.5   2
129.5   2
139.5   5
149.5   9
159.5   13
169.5   6
179.5   2
189.5   1
More    0
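The bin table can be reproduced from the 40 raw values. This sketch assumes each "Bin" entry is an upper boundary (a value is counted in the first bin whose boundary is at least as large as the value) and that "More" collects anything above the last boundary. That reading matches the table, but it is an interpretation rather than something the slide states.

```python
from bisect import bisect_left

values = [
    155, 142, 149, 130, 151, 163, 151, 142, 156, 133,
    138, 161, 128, 144, 172, 137, 151, 166, 147, 163,
    145, 116, 136, 158, 114, 165, 169, 145, 150, 150,
    150, 158, 151, 145, 152, 140, 170, 129, 188, 156,
]

# Upper bin boundaries from the slide: 109.5, 119.5, ..., 189.5.
upper_bounds = [109.5 + 10 * i for i in range(9)]

# One counter per boundary, plus a final "More" counter.
counts = [0] * (len(upper_bounds) + 1)
for v in values:
    # bisect_left gives the index of the first boundary >= v,
    # or len(upper_bounds) when v exceeds every boundary ("More").
    counts[bisect_left(upper_bounds, v)] += 1

labels = [str(b) for b in upper_bounds] + ["More"]
for label, count in zip(labels, counts):
    print(f"{label:>6} {count}")
```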

7

Variation

155 142 149 130 151 163 151 142 156 133
138 161 128 144 172 137 151 166 147 163
145 116 136 158 114 165 169 145 150 150
150 158 151 145 152 140 170 129 188 156

?

Bin     Frequency
109.5   0
119.5   2
129.5   2
139.5   5
149.5   9
159.5   13
169.5   6
179.5   2
189.5   1
More    0

8

Distribution

155 142 149 130 151 163 151 142 156 133
138 161 128 144 172 137 151 166 147 163
145 116 136 158 114 165 169 145 150 150
150 158 151 145 152 140 170 129 188 156

?

Bin     Frequency
109.5   0
119.5   2
129.5   2
139.5   5
149.5   9
159.5   13
169.5   6
179.5   2
189.5   1
More    0

9

Outliers

155 142 149 130 151 163 151 142 156 133
138 161 228 144 172 137 151 166 147 163
145 316 136 158 114 165 169 145 150 150
150 158 151 145 152 140 170 129 488 156

?

[Histogram of the data above; vertical axis: Frequency, 0 to 14]

Bin     Frequency
109.5   0
119.5   1
129.5   1
139.5   5
149.5   9
159.5   13
169.5   6
179.5   2
189.5   0
199.5   0
209.5   0
219.5   0
229.5   1
239.5   0
249.5   0
259.5   0
269.5   0
279.5   0
289.5   0
299.5   0
More    2

10

Change Over Time

http://www.ted.com/talks/lang/en/hans_rosling_at_state.html

http://www.maps4kids.com/vizdata_pop.html

http://www.gapminder.org/

11

Summarizing Data

12

Frequency Table

• A frequency table shows how a data set is partitioned among all of several categories (or classes) by listing all of the categories along with the number of data values in each of the categories.

13

Simple Frequency Table

Grade   Cumulative Frequency
A       4 = 4
B       4 + 7 = 11
C       11 + 9 = 20
D       20 + 3 = 23
F       23 + 2 = 25

Grade   Relative Frequency
A       4 / 25 = 0.16
B       7 / 25 = 0.28
C       9 / 25 = 0.36
D       3 / 25 = 0.12
F       2 / 25 = 0.08
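The arithmetic in these two tables can be checked with a few lines of code. The sketch below starts from the grade frequencies implied by the slide (4, 7, 9, 3, 2); the variable names are chosen for this example only.

```python
from itertools import accumulate

grades = ["A", "B", "C", "D", "F"]
frequencies = [4, 7, 9, 3, 2]

total = sum(frequencies)                     # 25 students in all
cumulative = list(accumulate(frequencies))   # running totals: 4, 11, 20, 23, 25
relative = [f / total for f in frequencies]  # each frequency divided by the total

for grade, cum, rel in zip(grades, cumulative, relative):
    print(f"{grade}  cumulative={cum:2d}  relative={rel:.2f}")
```

The last cumulative frequency always equals the total number of data values, and the relative frequencies always sum to 1, which makes a quick check for tallying mistakes.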

Statistical Reasoning, Bennett, et al., 3rd edition

14

Frequency Table Terms for Quantitative Categories

• Lower class limits

• Upper class limits

• Class boundaries

• Class midpoints

• Class width

– No gaps between classes

15

Illustration of Terms

16

Constructing a Frequency Table

1. Determine number of classes

2. Calculate class width

3. Choose first lower class limit

4. List all lower class limits

5. List all upper class limits

6. Tally each data point next to appropriate class limits
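The six steps can be sketched in code using the 40 values from the earlier slides. Several choices here are assumptions, not rules stated on the slide: 8 classes, a class width equal to the range divided by the number of classes and rounded up, and a first lower class limit rounded down to a multiple of the width.

```python
import math

values = [
    155, 142, 149, 130, 151, 163, 151, 142, 156, 133,
    138, 161, 128, 144, 172, 137, 151, 166, 147, 163,
    145, 116, 136, 158, 114, 165, 169, 145, 150, 150,
    150, 158, 151, 145, 152, 140, 170, 129, 188, 156,
]

# Step 1: number of classes (an assumed choice for this example).
num_classes = 8

# Step 2: class width = range / number of classes, rounded up (74 / 8 -> 10).
width = math.ceil((max(values) - min(values)) / num_classes)

# Step 3: first lower class limit, rounded down to a multiple of the width.
first_lower = (min(values) // width) * width

# Steps 4 and 5: list every lower and upper class limit.
lower_limits = [first_lower + i * width for i in range(num_classes)]
upper_limits = [low + width - 1 for low in lower_limits]

# Step 6: tally the data values that fall within each pair of limits.
tally = [
    sum(low <= v <= high for v in values)
    for low, high in zip(lower_limits, upper_limits)
]

for low, high, count in zip(lower_limits, upper_limits, tally):
    print(f"{low}-{high}: {count}")
```

With these choices the classes run from 110-119 up to 180-189, and every data value lands in exactly one class.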

17

Statistical Reasoning, Bennett, et al., 3rd edition

18

Binned Data

Statistical Reasoning, Bennett, et al., 3rd edition

19

Other Frequencies

• Relative Frequency

• Cumulative Frequency

20

Histogram

• A histogram is a graph of bars of equal width drawn adjacent to each other (without gaps). The horizontal scale represents classes of quantitative data values. The vertical scale represents frequencies.

• What characteristic of a data set can be better understood by constructing a histogram?
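A rough text version of a histogram can be printed from a frequency table. The sketch below uses the class boundaries and frequencies from the bin table earlier in the chapter and draws one row per class, so it is rotated relative to the definition above: classes run down the page and bar length shows frequency. The function name text_histogram is made up for this example.

```python
# Class boundaries 109.5, 119.5, ..., 189.5 and the frequency of each class.
boundaries = [109.5 + 10 * i for i in range(9)]
frequencies = [2, 2, 5, 9, 13, 6, 2, 1]

def text_histogram(boundaries, frequencies):
    """Return one line per class: 'low-high | bar frequency'."""
    lines = []
    # Pair each boundary with the next one to get the edges of each class.
    for low, high, freq in zip(boundaries, boundaries[1:], frequencies):
        lines.append(f"{low:5.1f}-{high:5.1f} | {'#' * freq} {freq}")
    return lines

for line in text_histogram(boundaries, frequencies):
    print(line)
```

Because consecutive classes share a boundary, the rows sit directly against each other with no gaps, just as the bars of a histogram do.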

21

Histogram

22

Frequency Polygon

23

Elementary Statistics, Triola, 11th edition

Frequency Polygon

• DIY for BTU data

24

Ogive (“oh-jive”), a.k.a. Cumulative Frequency Polygon

25

Ogive

• DIY for BTU data

26

Dotplot

27


Stemplot

28

Bar Graph

29

Pareto Chart

• When we want to attract attention to more important data.

• Used for qualitative data, nominal not ordinal – WHY?

• Bars arranged in descending order by frequencies.
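The ordering step behind a Pareto chart is a sort by frequency. As an illustration, the sketch below reuses the nucleotide counts from the frequency table earlier in the chapter, since nucleotides are nominal categories with no natural order; the variable names are made up for this example.

```python
# Nucleotide frequencies from the earlier frequency table.
counts = {"A": 787, "C": 712, "T": 79, "G": 723}

# Arrange the categories in descending order of frequency.
pareto_order = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for category, frequency in pareto_order:
    print(category, frequency)
```

Drawing bars in this order puts the most frequent category first, which is what draws the eye to the more important data.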

30

Pareto Chart

31

Scatter Plot

• Paired data

33

Time-Series Graph

34

Gaps in Data

35

Gaps in Data

• [self-reported age graph?]

36

“Bad” Graphs

• Graphics can offer clear and meaningful summaries of statistical data. However, even well-made graphics can be misleading if we are not careful in interpreting them, and poorly made graphics are almost always misleading. Moreover, some people use graphics in deliberately misleading ways.

37

38

Perceptual Distortion – 2D – BAD

Statistical Reasoning, Bennett, et al., 3rd edition

39

Perceptual Distortion – 3D – WORSE

What is the right thing to do here?

Statistical Reasoning, Bennett, et al., 3rd edition

40

Stretching Axes

41

Manipulating Axes

Watch the Scales

Statistical Reasoning, Bennett, et al., 3rd edition

Partial Data

http://www.yale.edu/ynhti/curriculum/units/2008/6/08.06.06.x.html

Percent Change Graphs

Chart Junk

Help Wanted

• Statistical Graphics Designer

46

End of Ch2

• We’ve discussed data and graphs.

• Next we’ll work on comparing data.

• Bring your calculator.

47