53
1 Chapter 2 Chapter 2 Descriptive Statistics: Descriptive Statistics: Tabular and Graphical Methods Tabular and Graphical Methods Summarizing Qualitative Data Summarizing Qualitative Data Summarizing Quantitative Data Summarizing Quantitative Data Exploratory Data Analysis Exploratory Data Analysis Crosstabulations Crosstabulations and Scatter Diagrams and Scatter Diagrams

Lecture_1_2_3

Embed Size (px)

DESCRIPTION

QSM

Citation preview

DESCRIPTIVE STATISTICS I: TABULAR AND GRAPHICAL METHODSFrequency Distribution
*
Guests staying at Marada Inn were asked to rate the
quality of their accommodations as being excellent,
above average, average, below average, or poor. The
ratings provided by a sample of 20 quests are shown
below.
Below Average Average Above Average
Above Average Above Average Above Average Above Average Below Average Below Average Average Poor Poor
Above Average Excellent Above Average
Average Above Average Average
Relative Frequency Distribution
*
Percent Frequency Distribution
The percent frequency of a class is the relative frequency multiplied by 100.
*
Relative Percent
Bar Graph
A bar graph is a graphical device for depicting qualitative data.
On the horizontal axis we specify the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency scale can be used for the vertical axis.
Using a bar of fixed width drawn above each class label, we extend the height appropriately.
*
Pie Chart
The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.
First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) =
90 degrees of the circle.
*
Insights Gained from the Preceding Pie Chart
One-half of the customers surveyed gave Marada a quality rating of “above average” or “excellent” (looking at the left side of the pie). This might please the manager.
For each customer who gave an “excellent” rating, there were two customers who gave a “poor” rating (looking at the top of the pie). This should displease the manager.
Example: Marada Inn
Dot Plot
The manager of Hudson Auto would like to get a
better picture of the distribution of costs for engine
tune-up parts. A sample of 50 customer invoices has
been taken and the costs of parts, rounded to the
nearest dollar, are listed below.
Sheet1
91
78
93
57
75
52
99
80
97
62
71
69
72
89
66
75
79
75
72
76
104
74
62
68
97
105
77
65
80
109
85
97
88
68
83
68
71
69
67
74
62
82
98
101
79
105
79
69
62
73
&A
Guidelines for Selecting Number of Classes
Use between 5 and 20 classes.
Data sets with a larger number of elements usually require a larger number of classes.
Smaller data sets usually require fewer classes.
*
Use classes of equal width.
Approximate Class Width =
Approximate Class Width = (109 - 52)/6 = 9.5 10
Cost ($) Frequency
50-59 2
60-69 13
70-79 16
80-89 7
90-99 7
100-109 5
Total 50
Relative Percent
Insights Gained from the Percent Frequency Distribution
Only 4% of the parts costs are in the $50-59 class.
30% of the parts costs are under $70.
The greatest percentage (32% or almost one-third) of the parts costs are in the $70-79 class.
10% of the parts costs are $100 or more.
*
Dot Plot
One of the simplest graphical summaries of data is a dot plot.
A horizontal axis shows the range of data values.
*
. . . ..... .......... .. . .. . . ... . .. .
. .. .. .. .. . .
Another common graphical presentation of quantitative data is a histogram.
The variable of interest is placed on the horizontal axis.
*
*
Cumulative Distributions
Cumulative frequency distribution -- shows the number of items with values less than or equal to the upper limit of each class.
*
Shown on the vertical axis are the:
cumulative frequencies, or
cumulative percent frequencies
The frequency (one of the above) of each class is plotted as a point.
The plotted points are connected by straight lines.
*
Ogive
Because the class limits for the parts-cost data are 50-59, 60-69, and so on, there appear to be one-unit gaps from 59 to 60, 69 to 70, and so on.
These gaps are eliminated by plotting points halfway between the class limits.
*
Parts
Cost ($)
20
40
60
80
100
*
Exploratory Data Analysis
The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly.
One such technique is the stem-and-leaf display.
*
Stem-and-Leaf Display
A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
The first digits of each data item are arranged to the left of a vertical line.
To the right of the vertical line we record the last digit for each item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.
8 5 7
*
A single digit is used to define each leaf.
In the preceding example, the leaf unit was 1.
Leaf units may be 100, 10, 1, 0.1, and so on.
*
8.6 11.7 9.4 9.1 10.2 11.0 8.8
a stem-and-leaf display of these data will be
Leaf Unit = 0.1
8 6 8
9 1 4
1806 1717 1974 1791 1682 1910 1838
a stem-and-leaf display of these data will be
Leaf Unit = 10
5 2 7
6 2 2 2 2 5 6 7 8 8 8 9 9 9
7 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
8 0 0 2 3 5 8 9
9 1 3 7 7 7 8 9
10 1 4 5 5 9
*
Stretched Stem-and-Leaf Display
*
6 5 6 7 8 8 8 9 9 9
7 1 1 2 2 3 4 4
7 5 5 5 6 7 8 9 9 9
8 0 0 2 3
8 5 8 9
10 1 4
Crosstabulations and Scatter Diagrams
Thus far we have focused on methods that are used to summarize the data for one variable at a time.
Often a manager is interested in tabular and graphical methods that will help understand the relationship between two variables.
*
Slide
Crosstabulation
Crosstabulation is a tabular method for summarizing the data for two variables simultaneously.
Crosstabulation can be used when:
One variable is qualitative and the other is quantitative
Both variables are qualitative
Both variables are quantitative
*
Crosstabulation
The number of Finger Lakes homes sold for each style and price for the past two years is shown below.
Price Home Style
*
Insights Gained from the Preceding Crosstabulation
The greatest number of homes in the sample (19) are a split-level style and priced at less than or equal to $99,000.
*
Crosstabulation: Row or Column Percentages
*
Note: row totals are actually 100.01 due to rounding.
*
*
Scatter Diagram
A scatter diagram is a graphical presentation of the relationship between two quantitative variables.
One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
*
Scatter Diagram
The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.
x = Number of y = Number of
Interceptions Points Scored
The preceding scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.
Higher points scored are associated with a higher number of interceptions.
*