Exploring Data
Chapter 1
Patterns from Histogram A
Center: the value that divides the observations roughly in half
Spread (variability): the extent of the data from smallest to largest value
Histogram A example
center: 35, spread: 25 to 45
[Histogram; horizontal axis from 0 to 100]
Histogram A practice
center, spread
[Histogram; horizontal axis from 0 to 100]
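A minimal sketch of how center and spread could be computed, assuming the center is taken as the median. The observations are hypothetical, chosen only so the summary matches the Histogram A example (center 35, spread 25 to 45); `center_and_spread` is an illustrative name.

```python
from statistics import median

# Hypothetical observations (not from the slides), picked to match the example summary.
values = [25, 28, 31, 33, 35, 37, 40, 42, 45]

def center_and_spread(data):
    """Return (center, smallest, largest): the median and the extent of the data."""
    return median(data), min(data), max(data)

center, low, high = center_and_spread(values)
print(f"center: {center}, spread: {low} to {high}")
```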
Patterns from Histogram B
Center: the value that divides the observations roughly in half
Spread (variability): the extent of the data from smallest to largest value
Shape: overall appearance of distribution
Histogram B example
skewed right
[Histogram; horizontal axis from 0 to 100]
Histogram B example
skewed left
[Histogram; horizontal axis from 0 to 100]
Histogram B example
symmetrical, mound shaped
[Histogram; horizontal axis from 0 to 100]
Histogram B example
uniform, spread from 55 to 80, center around 70
[Histogram; horizontal axis from 0 to 100]
Histogram B example
bimodal
[Histogram; horizontal axis from 0 to 100]
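One rough way to check skew numerically is to compare the mean with the median: a long right tail pulls the mean above the median, a long left tail pulls it below. The sketch below is a rule of thumb only, not a method from the slides. The function name `shape_hint`, the 5% tolerance, and all three data sets are assumptions made for illustration, and the check cannot tell uniform or bimodal shapes from a mound.

```python
from statistics import mean, median

def shape_hint(data, tolerance=0.05):
    """Rough skew hint: (mean - median) as a fraction of the range."""
    spread = max(data) - min(data)
    if spread == 0:
        return "no spread"
    gap = (mean(data) - median(data)) / spread
    if gap > tolerance:
        return "skewed right"
    if gap < -tolerance:
        return "skewed left"
    return "roughly symmetrical"

# Hypothetical data sets, for illustration only.
right_tail = [10, 10, 15, 15, 20, 20, 25, 40, 60, 90]
left_tail = [100 - x for x in right_tail]
mound = [30, 40, 40, 50, 50, 50, 60, 60, 70]

for name, data in [("right_tail", right_tail), ("left_tail", left_tail), ("mound", mound)]:
    print(name, "->", shape_hint(data))
```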
Patterns from Histogram C
Center: the value that divides the observations roughly in half
Spread (variability): the extent of the data from smallest to largest value
Shape: overall appearance of distribution
Unusual features: gaps/clusters and outliers
Histogram C example
roughly symmetrical with gaps at 30 and 40, center at 35, spread from 20 to 55
[Histogram; horizontal axis from 0 to 100]
Histogram C example
uniform with possible outlier at 5, center around 43, spread from 5 to 55
[Histogram; horizontal axis from 0 to 100]
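Gaps can also be found by counting observations per bin and listing the bins that stay empty between the smallest and largest occupied bin. This is a sketch under stated assumptions: integer data, a bin width of 5, and hypothetical observations built to show gaps at 30 and 40 like the first Histogram C example. `empty_bins` is an illustrative name.

```python
from collections import Counter

def empty_bins(data, width=5):
    """Return left edges of empty bins between the lowest and highest occupied bin.

    Assumes integer observations and an integer bin width.
    """
    counts = Counter((x // width) * width for x in data)
    first, last = min(counts), max(counts)
    return [edge for edge in range(first, last + width, width) if edge not in counts]

# Hypothetical observations (not from the slides).
observations = [20, 22, 25, 27, 35, 36, 38, 45, 47, 50, 55]
print("empty bins start at:", empty_bins(observations))
```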
Displaying Distributions with Graphs
categorical versus quantitative
categorical: bar graphs, pie charts
quantitative: dotplots, histograms, stemplots, boxplots
Typing Speeds Stemplot
[Stemplot of typing speeds in words per minute, stems 2 through 9]
Key: 2|2 means 22 wpm
Fairly symmetrical
Median: 62
Spread: from 22 to 91
No unusual features
Alfred Hitchcock Stemplot
13 | 0 6 2
12 | 0 0 0 6 8
11 | 9 6 6 3 1 7
10 | 5 8 3 8 8 1 3
 9 |
 8 | 1
Key: 8|1 means 81 minutes
Slightly skewed
Median: 116
Spread: from 81 to 136
Gap in the 90s
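A stemplot like this one can be built by splitting each value into a stem (the tens and above) and a leaf (the ones digit). The sketch below assumes the film lengths are as read off the stemplot, so any leaf lost in extraction would be missing here too. `stemplot` is an illustrative helper, and it sorts the leaves within each stem.

```python
from collections import defaultdict
from statistics import median

# Film lengths in minutes, as read off the stemplot (assumption: no leaves were lost).
lengths = [81, 105, 108, 103, 108, 108, 101, 103, 119, 116, 116, 113, 111, 117,
           120, 120, 120, 126, 128, 130, 136, 132]

def stemplot(data):
    """Return one row per stem, smallest stem first, keeping empty stems so gaps show."""
    leaves = defaultdict(list)
    for x in sorted(data):
        leaves[x // 10].append(x % 10)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem:>2} | " + " ".join(str(d) for d in leaves.get(stem, []))
        rows.append(row.rstrip())
    return rows

for row in stemplot(lengths):
    print(row)
print("median:", median(lengths), "spread:", min(lengths), "to", max(lengths))
```

The empty row for stem 9 is what makes the gap in the 90s visible.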
Split Stemplot
Ages at which a sample of 35 American mothers first gave birth
4 |
3 | 0 1 2 3
2 | 0 0 0 0 1 1 1 2 3 3 4 4 4 4 6 7 8 8
1 | 4 6 6 6 7 7 8 8 8 9 9 9
Key: 1|4 means 14 years old
As with a histogram, we want to avoid packing too many data points into a small range.
Split Stemplot
A split stemplot typically breaks each stem into Low (0-4) and High (5-9).
4L |
3H |
3L | 0 1 2 3
2H | 6 7 8 8
2L | 0 0 0 0 1 1 1 2 3 3 4 4 4 4
1H | 6 6 6 7 7 8 8 8 9 9 9
1L | 4
Key: 1|4 means 14 years old
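The Low/High split can be sketched in a few lines: leaves 0 to 4 go to the L half of a stem and leaves 5 to 9 to the H half. The ages below are the 34 values that can be read from the listing, which is an assumption about the garbled plot. `split_stemplot` is an illustrative name, and unlike a hand-drawn plot it omits empty half-stems.

```python
# Ages read from the stemplot listing (assumption: 34 of the 35 values are recoverable).
ages = [14,
        16, 16, 16, 17, 17, 18, 18, 18, 19, 19, 19,
        20, 20, 20, 20, 21, 21, 21, 22, 23, 23, 24, 24, 24, 24,
        26, 27, 28, 28,
        30, 31, 32, 33]

def split_stemplot(data):
    """Return rows such as '1H | 6 6 7', splitting each stem into L (0-4) and H (5-9)."""
    rows = {}
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        label = f"{stem}{'L' if leaf <= 4 else 'H'}"
        rows.setdefault(label, []).append(leaf)
    # Dicts keep insertion order, so rows come out from smallest to largest.
    return [f"{label} | " + " ".join(map(str, leaves)) for label, leaves in rows.items()]

for row in split_stemplot(ages):
    print(row)
```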
Back to Back Stemplots
    Babe Ruth |   | Roger Maris
              | 0 | 8
              | 1 | 3 4 6
          5 2 | 2 | 3 6 8
          5 4 | 3 | 3 9
1 6 7 6 9 6 1 | 4 |
        4 9 4 | 5 |
            0 | 6 | 1
Key: 4 | 1 means 41
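A back-to-back stemplot shares one column of stems, with one group's leaves running left and the other's running right. The sketch below assumes the home run counts are as read off the plot, with Ruth taken as the left side; both are assumptions about the garbled figure. `back_to_back` is an illustrative helper that sorts leaves so they increase away from the stem.

```python
# Home run counts as read off the stemplot (assumption: Ruth on the left, Maris on the right).
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
maris = [8, 13, 14, 16, 23, 26, 28, 33, 39, 61]

def back_to_back(left, right):
    """Return rows 'left leaves | stem | right leaves' over a shared range of stems."""
    both = left + right
    rows = []
    for stem in range(min(both) // 10, max(both) // 10 + 1):
        # Left leaves are sorted in reverse so they grow outward from the stem.
        l = sorted((x % 10 for x in left if x // 10 == stem), reverse=True)
        r = sorted(x % 10 for x in right if x // 10 == stem)
        rows.append((" ".join(map(str, l)), stem, " ".join(map(str, r))))
    width = max(len(row[0]) for row in rows)
    return [f"{l:>{width}} | {stem} | {r}".rstrip() for l, stem, r in rows]

for row in back_to_back(ruth, maris):
    print(row)
```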
Babe Ruth vs. Roger Maris
Generally, we can see that Babe Ruth hit more home runs than Roger Maris.
Ruth's center, at 46 home runs, is higher than Maris's, at 24.5.
Roger Maris has a possible outlier at 61, whereas Ruth has none.
Maris has the larger spread, from 8 to 61, but Ruth's totals are higher, from 22 to 60; Ruth's lead is clearer still if we exclude Maris's possible outlier.
Both distributions are fairly symmetrical.
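The centers and spreads quoted in this comparison can be checked with a short sketch. It assumes the home run counts are as read off the back-to-back stemplot; `summarize` and its `drop` parameter are illustrative names, with `drop` used to set aside a possible outlier.

```python
from statistics import median

# Home run counts as read off the stemplot (an assumption about the garbled plot).
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
maris = [8, 13, 14, 16, 23, 26, 28, 33, 39, 61]

def summarize(data, drop=()):
    """Return median, smallest, and largest value, ignoring any values listed in drop."""
    kept = [x for x in data if x not in drop]
    return {"median": median(kept), "low": min(kept), "high": max(kept)}

print("Ruth: ", summarize(ruth))
print("Maris:", summarize(maris))
print("Maris without 61:", summarize(maris, drop={61}))
```

With 61 set aside, Maris's counts run from 8 to 39 with a median of 23.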