7/28/2019 Lecture 02. Graphical Displays Part 1
Statistics
ST 361
Statistics for Engineers
Graphical Displays
Kimberly Weems
5260 SAS Hall
Scales of Measure
Nominal Data: places data in categories
The value is just a name for the group
From the article: Do you believe that extraterrestrial beings have visited Earth at some time in the past? (Believe, Don't Believe, Not sure)
People are in one of three categories.
Scales of Measure
Ordinal Data: categories that have an order
From the article: How superstitious are you? (Very, Somewhat, Not very, Not at all)
We know that Very is more than Somewhat, which is more than Not very, etc.
Scales of Measure
Note: Numbers can be assigned to the categories, but doing arithmetic with those numbers does not make much sense.
Code: 4 = Very, 3 = Somewhat, 2 = Not very, 1 = Not at all
Very is not twice as much as Not very.
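A minimal Python sketch of this note, using the coding from the slide. The name `CODES` and the sample responses are assumptions made up for illustration. The codes support ordering, but a ratio of codes has no real meaning.

```python
# Codes from the slide: they record order, not amount.
CODES = {"Not at all": 1, "Not very": 2, "Somewhat": 3, "Very": 4}

# Hypothetical responses, invented for illustration only.
responses = ["Somewhat", "Very", "Not at all", "Not very", "Somewhat"]

# Sorting by code is meaningful because the categories are ordered.
ordered = sorted(responses, key=CODES.get)
print(ordered)

# The ratio of the codes is 2.0, but "Very" is not twice "Not very":
# the spacing between codes is arbitrary.
ratio = CODES["Very"] / CODES["Not very"]
print(ratio)
```

Any other increasing set of codes (say 1, 2, 5, 10) would give the same ordering but a different ratio, which is why the ratio should not be interpreted.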
Scales of Measure
Interval/Ratio: Numbers are actually numbers => they make sense as numbers and can be used that way
From the article: What is your age in years?
Someone who is 40 is twice as old as someone who is 20. The difference in age between someone 30 and someone 35 is the same as the difference between 25 and 30.
Scales of Measure
Note: Your text refers to this as interval data; some other texts refer to it as ratio data. For the purposes of this course we will not differentiate the two.
There is no real difference between the methods of analysis.
Other terminology:
Categorical data: nominal and ordinal, gives names to categories.
Numeric data: uses meaningful numbers
Quantitative: another name for numeric
Qualitative: another name for categorical
Example: Intro Stat Students
gender  height  textbooks  HSGPA  car
female  69.5    320        3.1    yes
female  65      250        4      no
female  63      150        4      yes
female  64      300        3.35   yes
female  67      90         3.7    yes
female  63      300        3.9    yes
female  60      250        3      yes
female  64      250        3.8    yes
male    72      187        3.1    yes
female  66      150        3.4    no
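As a rough illustration, the table can be loaded in Python and each column labeled categorical or numeric. The helper names `is_number` and `classify` are hypothetical, and the rule used (a column is numeric if every entry parses as a number) is only a heuristic assumed here. It would mislabel ordinal categories stored as number codes, so the final call still belongs to the analyst.

```python
# Data copied from the example table.
HEADER = ["gender", "height", "textbooks", "HSGPA", "car"]
ROWS = [
    ["female", "69.5", "320", "3.1", "yes"],
    ["female", "65", "250", "4", "no"],
    ["female", "63", "150", "4", "yes"],
    ["female", "64", "300", "3.35", "yes"],
    ["female", "67", "90", "3.7", "yes"],
    ["female", "63", "300", "3.9", "yes"],
    ["female", "60", "250", "3", "yes"],
    ["female", "64", "250", "3.8", "yes"],
    ["male", "72", "187", "3.1", "yes"],
    ["female", "66", "150", "3.4", "no"],
]

def is_number(text):
    """Return True if the text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def classify(header, rows):
    """Label a column numeric if every entry parses as a number (heuristic)."""
    kinds = {}
    for i, name in enumerate(header):
        column = [row[i] for row in rows]
        numeric = all(is_number(value) for value in column)
        kinds[name] = "numeric" if numeric else "categorical"
    return kinds

kinds = classify(HEADER, ROWS)
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```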
Why do we care?
The type of data dictates the summary and the analysis that we should use.
Summaries of categorical data: proportions and counts
Example: Superstitious? 2% very, 22% somewhat, 31% not very, 45% not at all.
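A short sketch of counts and proportions in Python. The superstition survey gives only percentages, so this example uses the car column of the ten-student example table instead; the variable names are assumptions.

```python
from collections import Counter

# "car" column from the Intro Stat Students example table.
car = ["yes", "no", "yes", "yes", "yes", "yes", "yes", "yes", "yes", "no"]

# Count how many students fall in each category.
counts = Counter(car)

# Convert each count to a proportion of all students.
proportions = {category: count / len(car) for category, count in counts.items()}

for category in counts:
    print(f"{category}: count = {counts[category]}, proportion = {proportions[category]:.0%}")
```

For this column the summary is 8 yes (80%) and 2 no (20%).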
Why do we care?
Summaries of numeric data: averages, medians, standard deviations
Example: Age? Average 35 years
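A sketch of these numeric summaries using Python's standard `statistics` module. The age data behind the slide's example is not given, so the height column of the ten-student example table is used here as a stand-in.

```python
import statistics

# "height" column from the Intro Stat Students example table.
heights = [69.5, 65, 63, 64, 67, 63, 60, 64, 72, 66]

mean = statistics.mean(heights)      # average: 65.35
median = statistics.median(heights)  # middle of the sorted values: 64.5
sd = statistics.stdev(heights)       # sample standard deviation

print(f"mean = {mean}, median = {median}, sd = {sd:.2f}")
```

None of these summaries would make sense for a categorical column such as gender, which is why the type of data has to be settled first.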
Graphics
Statistical results are often presented in graphical displays
1 picture = 1000 words
Help understand the story behind the data.
Visualize the distribution
The three rules of data analysis won't be difficult to remember
1. Make a picture: reveals aspects not obvious in the raw data; enables you to think clearly about the patterns and relationships that may be hiding in your data.
2. Make a picture: to show important features of and patterns in the data. You may also see things that you did not expect: the extraordinary (possibly wrong) data values or unexpected patterns.
3. Make a picture: the best way to tell others about your data is with a well-chosen picture.
Graphics
Bar Charts: graphical representation of categorical data
Horizontal Axis: categories
Vertical Axis: count or percentage of subjects in that category.
Pie Charts: angle represents proportion of values
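A minimal text-only sketch of a bar chart in Python, using the gender column of the ten-student example table. The function name `bar_chart` is hypothetical. For simplicity the bars run horizontally, so the categories sit on the vertical axis here rather than the horizontal one described on the slide.

```python
from collections import Counter

# "gender" column from the Intro Stat Students example table.
gender = ["female"] * 8 + ["male"] + ["female"]

def bar_chart(values):
    """Return a text bar chart with one row per category, longest bar first."""
    counts = Counter(values)
    width = max(len(category) for category in counts)
    lines = []
    for category, count in counts.most_common():
        # Bar length is the count of subjects in that category.
        lines.append(f"{category:<{width}} | {'#' * count} {count}")
    return "\n".join(lines)

chart = bar_chart(gender)
print(chart)
```

A plotting library would normally draw this; the text version is only meant to show that each bar's length is the count for its category.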
Categorical data
Story: More females than males
What about numeric values?
Plot numeric values to see where they are located.
Stem and Leaf Displays
Dot plot: graphic of numeric data.
Stem and Leaf Displays
Partition each number in the data into a stem and a leaf
Constructing a stem and leaf display
1) determine the stem and leaf partition (5-20 stems)
2) write the stems in a column with the smallest stem at the top; include all stems in the range of the data
3) only 1 digit in the leaves; drop digits or round off
4) record the leaf for each number in the corresponding stem row; ordering the leaves in each row helps
Example: employee ages at a small company
18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39
stem: 10s digit; leaf: 1s digit
18: stem = 1; leaf = 8; 18 = 1 | 8
stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
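The construction steps can be sketched in Python for this example. The function name `stem_and_leaf` is hypothetical, and the sketch assumes non-negative whole numbers with the tens digit as the stem, as in the employee ages.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers (stem = tens, leaf = ones)."""
    leaves = defaultdict(list)
    # Sorting first puts the leaves in ascending order within each row.
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    rows = []
    # Include every stem in the range of the data, even if it has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | " + " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(row.rstrip())
    return rows

ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
display = stem_and_leaf(ages)
print("\n".join(display))
```

Calling `stem_and_leaf(ages + [95])` adds empty rows for stems 7 and 8 before the row `9 | 5`, which makes the gap in the data visible.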
Suppose a 95 yr. old is hired
stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5
Advantages/Disadvantages of Stem-and-Leaf Displays
Advantages
1) each measurement displayed
2) ascending order in each stem row
3) relatively simple (when the data set is not too large)
Disadvantages
1) display becomes unwieldy for large data sets
Dot Plot
A health researcher examined the amount of soda that a group of teenagers consumed during a day. The resulting amounts in ounces were: 9, 9, 6, 15, 12, 14, and 40.
[Dot plot of the amounts on an axis from 0 to 40]
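A text-only sketch of how such a dot plot can be built in Python. The function name `dot_plot` is hypothetical, and the sketch assumes whole-number values so that each value maps to one character column; repeated values stack vertically.

```python
from collections import Counter

def dot_plot(values, lo, hi):
    """Return a text dot plot of integer values on the axis lo..hi."""
    counts = Counter(values)
    width = hi - lo + 1
    lines = []
    # Draw from the tallest stack down so dots pile up from the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join("*" if counts[x] >= level else " " for x in range(lo, hi + 1))
        lines.append(row.rstrip())
    lines.append("-" * width)
    # Tick labels every 10 units, written starting at the tick position.
    labels = [" "] * (width + 1)
    for tick in range(lo, hi + 1, 10):
        for k, ch in enumerate(str(tick)):
            if tick - lo + k < len(labels):
                labels[tick - lo + k] = ch
    lines.append("".join(labels).rstrip())
    return "\n".join(lines)

soda = [9, 9, 6, 15, 12, 14, 40]
plot = dot_plot(soda, 0, 40)
print(plot)
```

The two 9s appear as a stack of two dots, and the single dot at 40 sits far to the right of the rest.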
Dot Plot
We can see the main cluster is around 10. Smallest at 6, largest at 40.
Big gap between 40 and the other values.
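The same reading can be checked numerically. This is a small Python sketch using the soda amounts from the dot plot; the function name `largest_gap` is hypothetical.

```python
def largest_gap(values):
    """Return (gap, lower, upper) for the widest gap between neighboring sorted values."""
    ordered = sorted(values)
    # Pair each value with the next larger one and measure the distance.
    gaps = [(upper - lower, lower, upper) for lower, upper in zip(ordered, ordered[1:])]
    return max(gaps)

soda = [9, 9, 6, 15, 12, 14, 40]
smallest, largest = min(soda), max(soda)
gap, lower, upper = largest_gap(soda)
print(f"smallest = {smallest}, largest = {largest}")
print(f"largest gap = {gap} (between {lower} and {upper})")
```

Here the widest gap is 25 ounces, between 15 and 40, while every other neighboring pair is at most 3 ounces apart.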
Dot Plot
Good for small numbers of values, but becomes cumbersome with many values.
For larger numbers of values we could stack up values in categories.
Histograms
Histogram: bar chart of quantitative data
The range of possible values is broken into categories
Example: Undergraduate university students survey: How much did you spend on textbooks this academic term?
Categories: $0 to $100, $101 to $200, etc.
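A sketch of this binning step in Python. The survey data behind the histogram is not given, so the ten textbook amounts from the example table are used instead. The helper names `bin_index` and `bin_label` are hypothetical, and whole-dollar amounts are assumed.

```python
from collections import Counter

# "textbooks" column from the Intro Stat Students example table (dollars).
textbooks = [320, 250, 150, 300, 90, 300, 250, 250, 187, 150]

def bin_index(amount, width=100):
    """Map a whole-dollar amount to its category: 0 for $0-$100, 1 for $101-$200, ..."""
    return max(amount - 1, 0) // width

def bin_label(index, width=100):
    """Return the category label in the slide's style."""
    low = 0 if index == 0 else index * width + 1
    return f"${low} to ${(index + 1) * width}"

# Count how many amounts fall in each category.
counts = Counter(bin_index(amount) for amount in textbooks)

# Print one bar per category, including any empty ones in between.
for index in range(max(counts) + 1):
    print(f"{bin_label(index):<13} {'#' * counts[index]}")
```

With only ten values the picture is coarse, but the idea is the same as on the slide: the height of each bar is the number of values in that category.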
Textbooks
About 20 people paid between $601 and $700
Textbooks
Amounts centered around $400
More people around $400 than around $200
Values as big as $800 and as low as $0.
Useful for understanding the distribution of quantitative variables.
Where are the main chunks of data?
How spread out are the values?
What is the shape of the data?
TV Time
About 110 people spent between 11 and 15 hours
TV Time
Main cluster between 0 and 10
Smallest value 0 and largest value around 60
Can't be below 0, the minimum possible. Many people around the lower limit.
Max possible 168. No one around there or even close.
Values tail off to the right side.
Shapes of distributions
Skewed to the right (or positively skewed): long tail to the right
Generally because individuals are stacked up near a lower limit and unlimited on the upper end.
Shapes of distributions
Skewed to the left (or negatively skewed): long tail to the left
Generally because individuals are stacked up near an upper limit and unlimited on the lower end.
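The lecture judges skewness by eye from the picture. As a numeric counterpart that is not part of the lecture, the sketch below computes the standard moment coefficient of skewness in Python, which is positive for a long right tail and negative for a long left tail. The function name `skewness` is hypothetical, and the data are the soda amounts from the dot plot example.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

soda = [9, 9, 6, 15, 12, 14, 40]
right = skewness(soda)                   # long tail to the right: positive
left = skewness([-x for x in soda])      # mirror image: negative
print(f"soda: {right:.2f}, mirrored: {left:.2f}")
```

The sign is what matters here: the single large value of 40 pulls the coefficient well above zero, matching the long right tail seen in the dot plot.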
Birth Year
[Figure: distribution of birth year, with these groups annotated]
Traditional students
Switched majors, five or six year students
Nontraditional students
Older students, back for a life change
Shapes of distributions
Symmetric: tails approximately equal in both directions
Major cluster far from limits on both ends.
Textbooks
Approximately symmetric
Bimodal
[Figure: distribution with two peaks, labeled Peak 1 and Peak 2]
Shapes of distributions
Bimodal: two peaks
Caused by two or more groups
Multi-modal: several peaks
Shapes of distributions
Help us understand the data
Skewed => typically because of a natural limit that subjects are near
Symmetric => subjects are not near the limit.
Multi-modal => multiple distinct groups within the distribution.
Outliers
Recall the dot plot: 40 might be considered an outlier.
Maybe a data entry error.
May be an actual value.
Class Problem
The following are 10 observations on October snow cover for Eurasia during the years 1970-1979 (in million km^2):
6.5 12.0 14.9 10.0 10.7
7.9 21.9 12.5 14.5 9.2
Create a stem & leaf display of the data.
Is there an outlier in the data set?