Chapter 2
Summarizing and Graphing Data
Professor Mike Gilmore
Middlesex Community College
Spring 2012

Why Graphs?
Describe data
Explore data
Compare data
The goal is to convey a message about the data, rather than to decorate
2A Bunch of DataACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCACGGCCACCGCTGCCCTGCC CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCT CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCTG CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCC TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCAA CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCT CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCTG CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCCA TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCTATT CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCT CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCT CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCA TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCTGA CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCTG CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGCC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCTG CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCCA TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCGTG CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCTG 
CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGCC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCTG CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCCA TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCGTA CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCTG CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGCC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCTG CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGACGGCCACCGCTGCCCTGCCT TTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCTAA CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCACGGCCACCGCTGCCCTG CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGACGGCCACCGCTGCCCTGCC AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCACGGCCACCGCTGCCCTC CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAACACG
Frequency Table

Nucleotide   Frequency
A            787
C            712
T            79
G            723

Nucleotide   Frequency
ATG          51
CCG          32
AATTG        1
GAGA         3

Graph

Characteristics of Data
Center
Variation
Distribution
Outliers
Change Over Time

Center

155 142 149 130 151 163 151 142 156 133
138 161 128 144 172 137 151 166 147 163
145 116 136 158 114 165 169 145 150 150
150 158 151 145 152 140 170 129 188 156
?

Bin      Frequency
109.5    0
119.5    2
129.5    2
139.5    5
149.5    9
159.5    13
169.5    6
179.5    2
189.5    1
More     0
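
The bin table above can be reproduced with a short tally. This is a minimal sketch, assuming each bin value is an inclusive upper limit (the convention of spreadsheet histogram tools, which the "More" row suggests); the function name `tally` is made up for illustration.

```python
from bisect import bisect_left

# The 40 values shown on the slide.
values = [
    155, 142, 149, 130, 151, 163, 151, 142, 156, 133,
    138, 161, 128, 144, 172, 137, 151, 166, 147, 163,
    145, 116, 136, 158, 114, 165, 169, 145, 150, 150,
    150, 158, 151, 145, 152, 140, 170, 129, 188, 156,
]

# Upper bin limits from the slide's table (assumed inclusive).
upper_limits = [109.5, 119.5, 129.5, 139.5, 149.5, 159.5, 169.5, 179.5, 189.5]


def tally(values, upper_limits):
    """Count values per bin; the extra last slot is the "More" bin."""
    counts = [0] * (len(upper_limits) + 1)
    for v in values:
        # bisect_left gives the index of the first limit that is >= v,
        # or len(upper_limits) when v exceeds every limit.
        counts[bisect_left(upper_limits, v)] += 1
    return counts


counts = tally(values, upper_limits)
labels = [str(limit) for limit in upper_limits] + ["More"]
for label, count in zip(labels, counts):
    print(f"{label:<8} {count}")
```

The tallest bin (149.5 to 159.5) is where the data cluster, which is one way to read "center" off the table.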

Variation

155 142 149 130 151 163 151 142 156 133
138 161 128 144 172 137 151 166 147 163
145 116 136 158 114 165 169 145 150 150
150 158 151 145 152 140 170 129 188 156
?

Bin      Frequency
109.5    0
119.5    2
129.5    2
139.5    5
149.5    9
159.5    13
169.5    6
179.5    2
189.5    1
More     0

Distribution

155 142 149 130 151 163 151 142 156 133
138 161 128 144 172 137 151 166 147 163
145 116 136 158 114 165 169 145 150 150
150 158 151 145 152 140 170 129 188 156
?

Bin      Frequency
109.5    0
119.5    2
129.5    2
139.5    5
149.5    9
159.5    13
169.5    6
179.5    2
189.5    1
More     0

Outliers

155 142 149 130 151 163 151 142 156 133
138 161 228 144 172 137 151 166 147 163
145 316 136 158 114 165 169 145 150 150
150 158 151 145 152 140 170 129 488 156
?

Bin      Frequency
109.5    0
119.5    1
129.5    1
139.5    5
149.5    9
159.5    13
169.5    6
179.5    2
189.5    0
199.5    0
209.5    0
219.5    0
229.5    1
239.5    0
249.5    0
259.5    0
269.5    0
279.5    0
289.5    0
299.5    0
More     2

Change Over Time
http://www.ted.com/talks/lang/en/hans_rosling_at_state.html
http://www.maps4kids.com/vizdata_pop.html
http://www.gapminder.org/

Summarizing Data

Frequency Table
A frequency table shows how a data set is partitioned among several categories (or classes) by listing each category along with the number of data values in it.

Simple Frequency Table

Cumulative
Grade   Frequency
A       4 = 4
B       4 + 7 = 11
C       11 + 9 = 20
D       20 + 3 = 23
F       23 + 2 = 25

Relative
Grade   Frequency
A       4 / 25 = 0.16
B       7 / 25 = 0.28
C       9 / 25 = 0.36
D       3 / 25 = 0.12
F       2 / 25 = 0.08

Statistical Reasoning, Bennett et al., 3rd edition

Frequency Table Terms for Quantitative Categories
Lower class limits
Upper class limits
Class boundaries
Class midpoints
Class width
No gaps between classes

Illustration of Terms

Constructing a Frequency Table
Determine number of classes
Calculate class width
Choose first lower class limit
List all lower class limits
List all upper class limits
Tally each data point next to appropriate class limits
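
The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the textbook's procedure verbatim: the data are the 40 values from the Center slide, the choice of 8 classes and a first lower class limit of 110 are arbitrary picks for the example, and the class width is taken as the range divided by the number of classes, rounded up.

```python
import math

# The 40 values from the Center slide, reused here as sample data.
values = [
    155, 142, 149, 130, 151, 163, 151, 142, 156, 133,
    138, 161, 128, 144, 172, 137, 151, 166, 147, 163,
    145, 116, 136, 158, 114, 165, 169, 145, 150, 150,
    150, 158, 151, 145, 152, 140, 170, 129, 188, 156,
]

# Step 1: determine the number of classes (assumption: 8).
num_classes = 8

# Step 2: class width = (max - min) / number of classes, rounded up.
class_width = math.ceil((max(values) - min(values)) / num_classes)

# Step 3: choose the first lower class limit (assumption: a round
# number at or below the minimum value).
first_lower = 110

# Step 4: list all lower class limits.
lower_limits = [first_lower + i * class_width for i in range(num_classes)]

# Step 5: list all upper class limits. For whole-number data each
# upper limit sits one unit below the next lower limit.
upper_limits = [low + class_width - 1 for low in lower_limits]

# Step 6: tally each data point into its class.
frequencies = [
    sum(low <= v <= high for v in values)
    for low, high in zip(lower_limits, upper_limits)
]

for low, high, freq in zip(lower_limits, upper_limits, frequencies):
    print(f"{low}-{high}: {freq}")
```

With these choices the classes run from 110-119 up to 180-189, and every value lands in exactly one class because consecutive classes neither overlap nor skip any whole number.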

Statistical Reasoning, Bennett et al., 3rd edition

Binned Data

Statistical Reasoning, Bennett et al., 3rd edition

Other Frequencies
Relative Frequency
Cumulative Frequency
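
Both quantities follow directly from a simple frequency table. The sketch below uses the grade counts from the Simple Frequency Table slide (A 4, B 7, C 9, D 3, F 2); the variable names are chosen for illustration only.

```python
from itertools import accumulate

# Grade counts from the Simple Frequency Table slide.
frequency = {"A": 4, "B": 7, "C": 9, "D": 3, "F": 2}

total = sum(frequency.values())

# Relative frequency: each count divided by the total number of values.
relative = {grade: count / total for grade, count in frequency.items()}

# Cumulative frequency: running total of the counts, in table order.
cumulative = dict(zip(frequency, accumulate(frequency.values())))

for grade in frequency:
    print(f"{grade}  {frequency[grade]:>2}  {relative[grade]:.2f}  {cumulative[grade]:>2}")
```

Two quick checks on any such table: the relative frequencies should add up to 1, and the last cumulative frequency should equal the total count.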

Histogram
A histogram is a graph of bars of equal width drawn adjacent to each other (without gaps). The horizontal scale represents classes of quantitative data values. The vertical scale represents frequencies.
What characteristic of a data set can be better understood by constructing a histogram?

Histogram

Frequency Polygon
Elementary Statistics, Triola, 11th edition

Frequency Polygon
DIY for BTU data

Ogive (oh-jive)
a.k.a. Cumulative Frequency Polygon

Ogive
DIY for BTU data

Dotplot

Animation Installed?

Stemplot
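
A stemplot can be built by hand or with a few lines of code. The following is a rough sketch, assuming the 40 values from the Center slide as sample data and splitting each value into a stem (all digits but the last) and a leaf (the last digit); the function name `stemplot` is hypothetical.

```python
from collections import defaultdict

# The 40 values from the Center slide, reused here as sample data.
values = [
    155, 142, 149, 130, 151, 163, 151, 142, 156, 133,
    138, 161, 128, 144, 172, 137, 151, 166, 147, 163,
    145, 116, 136, 158, 114, 165, 169, 145, 150, 150,
    150, 158, 151, 145, 152, 140, 170, 129, 188, 156,
]


def stemplot(values):
    """Return the rows of a stem-and-leaf plot as a list of strings."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 155 -> stem 15, leaf 5
        leaves[stem].append(leaf)

    rows = []
    # Include every stem between the smallest and largest, even empty
    # ones, so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {row}".rstrip())
    return rows


for row in stemplot(values):
    print(row)
```

Unlike a histogram, the plot keeps every original value readable: the row for stem 12 with leaves 8 and 9 stands for 128 and 129.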

Bar Graph

Pareto Chart
When we want to attract attention to more important data.
Used for qualitative data, nominal not ordinal. WHY?
Bars arranged in descending order by frequencies.
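
The descending-order rule is simple to express in code. This sketch reuses the single-nucleotide counts from the Frequency Table slide, taken exactly as printed there, and draws rough text bars; the scale of one "#" per 25 observations is an arbitrary choice for the example.

```python
# Nucleotide counts as printed on the Frequency Table slide.
counts = {"A": 787, "C": 712, "T": 79, "G": 723}

# Pareto order: categories sorted by frequency, largest first.
pareto = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for category, count in pareto:
    bar = "#" * round(count / 25)  # assumed scale: one "#" per 25
    print(f"{category} {count:>4} {bar}")
```

Reordering the bars like this only makes sense when the categories have no natural order of their own, as with nucleotide labels; sorting ordinal categories such as letter grades by frequency would scramble their sequence.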

Pareto Chart

Pie Chart
Also for qualitative data
http://assistantvillageidiot.blogspot.com/2007/11/why-would-they-lie-huh.html

Scatter Plot
Paired data

Time-Series Graph

Gaps in Data

Gaps in Data
[self-reported age graph?]

Bad Graphs
Graphics can offer clear and meaningful summaries of statistical data. However, even well-made graphics can be misleading if we are not careful in interpreting them, and poorly made graphics are almost always misleading. Moreover, some people use graphics in deliberately misleading ways.

Perceptual Distortion 2D BAD
Statistical Reasoning, Bennett et al., 3rd edition

Perceptual Distortion 3D WORSE
What is the right thing to do here?
Statistical Reasoning, Bennett et al., 3rd edition

Stretching Axes

Manipulating Axes
Watch the Scales
Statistical Reasoning, Bennett et al., 3rd edition

Partial Data
http://www.yale.edu/ynhti/curriculum/units/2008/6/08.06.06.x.html
Percent Change Graphs
Chart Junk
Help WantedStatistical Graphics Designer

End of Ch2
We've discussed data and graphs.
Next we'll work on comparing data.
Bring your calculator.