kerseychene
Task for the Day
• Work with a partner and answer the activity.
• Topic: ITEM ANALYSIS
STATISTICS
Descriptive Statistics
• Gives numerical and graphic procedures to summarize a collection of data in a clear and understandable way
Inferential Statistics
• Provides procedures to draw inferences about a population from a sample
Descriptive Statistics: Tabular and Graphical Presentations
• Summarizing Qualitative Data
• Summarizing Quantitative Data
Recall: Qualitative and Quantitative data
Summarizing Qualitative Data
• Frequency Distribution (shows how many)
• Relative Frequency Distribution (shows what fraction)
• Percent Frequency Distribution (shows what percentage)
• Bar Graph
• Pie Chart
Both of these are graphical means for displaying any of the above.
Data: any set of information that describes a given entity
• It can be:
• GROUPED DATA is data that has been organized into classes. This data is no longer “raw”.
• UNGROUPED DATA is simply an arrangement of data from lowest to highest.
A data class is a group of data related by some user-defined property.
Each class has a certain width, referred to as the class width or class size.
Age (years) Frequency
0-9 12
10-19 30
20-29 18
30-39 12
Age (years) Frequency
1 12
2 30
3 18
4 6
Class
Calculating Class Interval or Class Size
• Class interval = (Highest Value - Lowest Value) / Number of classes you want to have
• or
• Class interval = (HV - LV) / k = Range / k
• where k = 1 + 3.3 log n
• and n is the number of observations
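The class-interval rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "log" is taken to be the base-10 logarithm, k is rounded up to a whole number of classes, and the function names and sample values (n = 50, highest value 109, lowest value 52) are hypothetical choices for the example.

```python
import math

def number_of_classes(n: int) -> int:
    """k = 1 + 3.3 log n (assumed base 10), rounded up to a whole class count."""
    return math.ceil(1 + 3.3 * math.log10(n))

def class_interval(highest: float, lowest: float, k: int) -> float:
    """Class interval = (HV - LV) / k = Range / k."""
    return (highest - lowest) / k

# Illustrative values only (assumed for this sketch)
n = 50
k = number_of_classes(n)
width = class_interval(109, 52, k)
print(f"k = {k}, class interval = {width:.2f}")
```

In practice the computed interval is usually rounded to a convenient whole number so that class limits are easy to read.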
Frequency Distribution
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes.
The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
Example: Miranda Inn
• Guests staying at Miranda Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are:
Below Average
Above Average
Above Average
Average
Above Average
Average
Above Average
Average
Above Average
Below Average
Poor
Excellent
Above Average
Average
Above Average
Above Average
Below Average
Poor
Above Average
Average
Frequency Distribution
Rating           Frequency
Poor             2
Below Average    3
Average          5
Above Average    9
Excellent        1
Total            20
Relative Frequency Distribution
The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.
Percent Frequency Distribution
The percent frequency of a class is the relative frequency multiplied by 100.
A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.
Relative Frequency and Percent Frequency Distributions
Rating           Relative Frequency   Percent Frequency
Poor             .10                  10
Below Average    .15                  15
Average          .25                  25
Above Average    .45                  45
Excellent        .05                  5
Total            1.00                 100
• Poor: .10(100) = 10
• Excellent: 1/20 = .05
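As a rough sketch, the relative and percent frequencies in the table can be reproduced from class counts. The counts below are an assumption: they are the frequencies implied by the tabulated relative frequencies for a sample of 20 guests (for example, 1/20 = .05 for Excellent).

```python
# Assumed class counts, implied by the relative frequencies and a sample of 20
counts = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

total = sum(counts.values())

# Relative frequency = class frequency / total number of items
relative = {rating: f / total for rating, f in counts.items()}

# Percent frequency = relative frequency multiplied by 100
percent = {rating: 100 * f / total for rating, f in counts.items()}

for rating in counts:
    print(f"{rating:<15}{relative[rating]:>6.2f}{percent[rating]:>6.0f}")
```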
Bar Graph
• A bar graph is a graphical device for depicting qualitative data.
• On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes.
• A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis).
• Using a bar of fixed width drawn above each class label, we extend the height appropriately.
• The bars are separated to emphasize the fact that each class is a separate category.
[Figure: Bar graph, "Miranda Inn Quality Ratings". Horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent). Vertical axis: Frequency (1 to 10).]
Good? Bad?
Pie Chart
• The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.
• First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
• Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
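The sector arithmetic can be checked with a short Python sketch. It uses the Miranda Inn relative frequencies from the table in this deck; the variable names are illustrative only.

```python
# Relative frequencies for the Miranda Inn ratings
relative = {
    "Poor": 0.10,
    "Below Average": 0.15,
    "Average": 0.25,
    "Above Average": 0.45,
    "Excellent": 0.05,
}

# Each sector takes its relative frequency's share of the 360 degrees
angles = {rating: rf * 360 for rating, rf in relative.items()}

for rating, angle in angles.items():
    print(f"{rating:<15}{angle:>7.1f} degrees")
```

The sector angles add up to 360 degrees because the relative frequencies add up to 1.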
[Figure: Pie chart, "Miranda Inn Quality Ratings". Sectors: Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%.]
Insights Gained from the Preceding Pie Chart
Example: Miranda Inn
• One-half of the customers surveyed gave Miranda Inn a quality rating of “above average” or “excellent” (looking at the left side of the pie). This might please the manager.
• For each customer who gave an “excellent” rating, there were two customers who gave a “poor” rating (looking at the top of the pie). This should displease the manager.
Summarizing Quantitative Data
• Frequency Distribution
• Relative Frequency and Percent Frequency Distributions
• Dot Plot
• Histogram
• Cumulative Distributions
• Ogive
Example: Juson Auto Repair
The manager of Juson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.
Example: Juson Auto Repair
Sample of Parts Cost for 50 Tune-ups
91   78   93   57   75   52   99   80   97   62
71   69   72   89   66   75   79   75   72   76
104  74   62   68   97   105  77   65   80   109
85   97   88   68   83   68   71   69   67   74
62   82   98   101  79   105  79   69   62   73
Including a line in the table for every possible cost is not a good idea.
Need to categorize.
Frequency Distribution: Guidelines for Selecting Number of Classes
• Use between 5 and 20 classes.
• Data sets with a larger number of elements usually require a larger number of classes.
• Smaller data sets usually require fewer classes.
Frequency Distribution: Guidelines for Selecting Width of Classes
• Use classes of equal width.
• Approximate Class Width = (Largest Data Value - Smallest Data Value) / Number of Classes
Frequency Distribution
• For Juson Auto Repair, if we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10
Parts Cost ($)   Frequency
50-59            2
60-69            13
70-79            16
80-89            7
90-99            7
100-109          5
Total            50
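The tally behind this table can be sketched in Python using the 50 parts costs from the sample. Starting the first class at 50 and rounding the approximate width up to 10 follow the slide; the helper name `frequency_distribution` is hypothetical.

```python
import math

# Sample of parts cost ($) for 50 tune-ups
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

def frequency_distribution(data, start, width, num_classes):
    """Count how many values fall in each equal-width class."""
    counts = [0] * num_classes
    for value in data:
        counts[(value - start) // width] += 1
    return counts

num_classes = 6
approx_width = (max(costs) - min(costs)) / num_classes  # (109 - 52)/6
width = math.ceil(approx_width)                         # rounded up to 10

counts = frequency_distribution(costs, start=50, width=width, num_classes=num_classes)
for i, f in enumerate(counts):
    lower = 50 + i * width
    print(f"{lower}-{lower + width - 1}: {f}")
```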
Relative Frequency and Percent Frequency Distributions
Parts Cost ($)   Relative Frequency   Percent Frequency
50-59            .04                  4
60-69            .26                  26
70-79            .32                  32
80-89            .14                  14
90-99            .14                  14
100-109          .10                  10
Total            1.00                 100
• 50-59 class: 2/50 = .04
• 50-59 class: .04(100) = 4
Preview cumulative frequencies here.
Insights Gained from the Percent Frequency Distribution
• Only 4% of the parts costs are in the $50-59 class.
• The greatest percentage (32% or almost one-third) of the parts costs are in the $70-79 class.
• 30% of the parts costs are under $70.
• 10% of the parts costs are $100 or more.
Dot Plot
• One of the simplest graphical summaries of data is a dot plot.
• A horizontal axis shows the range of data values.
• Then each data value is represented by a dot placed above the axis.
[Figure: Dot plot, "Tune-up Parts Cost". Horizontal axis: Cost ($), 50 to 110.]
Not used much anymore. Common when graphical drawing tools were primitive.
Histogram
• Another common graphical presentation of quantitative data is a histogram.
• The variable of interest is placed on the horizontal axis.
• A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.
• Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.
In informal discussions bar graphs and histograms are often equated. In this class you should be careful to keep them straight.
[Figure: Histogram, "Tune-up Parts Cost". Horizontal axis: Parts Cost ($), classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109. Vertical axis: Frequency (2 to 18).]
Histogram (Common categories)
Symmetric
• Left tail is the mirror image of the right tail
• Examples: heights and weights of people
[Figure: Symmetric histogram. Vertical axis: Relative Frequency (0 to .35).]
Histogram: Moderately Skewed Left
• A longer tail to the left
• Example: exam scores
[Figure: Moderately left-skewed histogram. Vertical axis: Relative Frequency (0 to .35).]
Histogram: Moderately Skewed Right
• A longer tail to the right
• Example: housing values
[Figure: Moderately right-skewed histogram. Vertical axis: Relative Frequency (0 to .35).]
Histogram: Highly Skewed Right
• A very long tail to the right
• Example: executive salaries
[Figure: Highly right-skewed histogram. Vertical axis: Relative Frequency (0 to .35).]
Cumulative Distributions
• Cumulative frequency distribution: shows the number of items with values less than or equal to the upper limit of each class.
• Cumulative relative frequency distribution: shows the proportion of items with values less than or equal to the upper limit of each class.
• Cumulative percent frequency distribution: shows the percentage of items with values less than or equal to the upper limit of each class.
Cumulative Distributions: Juson Auto Repair
Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 59       2                      .04                             4
≤ 69       15                     .30                             30
≤ 79       31                     .62                             62
≤ 89       38                     .76                             76
≤ 99       45                     .90                             90
≤ 109      50                     1.00                            100
• ≤ 69 row: 2 + 13 = 15
• ≤ 69 row: 15/50 = .30
• ≤ 69 row: .30(100) = 30
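The running totals in this table can be sketched with `itertools.accumulate`. The class frequencies are those of the six-class parts-cost frequency distribution; the variable names are illustrative.

```python
from itertools import accumulate

upper_limits = [59, 69, 79, 89, 99, 109]
frequencies = [2, 13, 16, 7, 7, 5]

# Cumulative frequency: running total of the class frequencies
cum_freq = list(accumulate(frequencies))
n = cum_freq[-1]

# Cumulative relative and percent frequencies
cum_rel = [c / n for c in cum_freq]
cum_pct = [100 * c / n for c in cum_freq]

for limit, cf, cr, cp in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {limit:<4}{cf:>4}{cr:>7.2f}{cp:>6.0f}")
```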
Ogive
• An ogive is a graph of a cumulative distribution.
• The data values are shown on the horizontal axis.
• Shown on the vertical axis are the:
  • cumulative frequencies, or
  • cumulative relative frequencies, or
  • cumulative percent frequencies
• The frequency (one of the above) of each class is plotted as a point.
• The plotted points are connected by straight lines.
• Because the class limits for the parts-cost data are 50-59, 60-69, and so on, there appear to be one-unit gaps from 59 to 60, 69 to 70, and so on.
• These gaps are eliminated by plotting points halfway between the class limits.
• Thus, 59.5 is used for the 50-59 class, 69.5 is used for the 60-69 class, and so on.
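A minimal sketch of how the ogive's plotted points could be computed, assuming cumulative percent frequencies on the vertical axis and the parts-cost class frequencies 2, 13, 16, 7, 7, 5. The function name `ogive_points` is hypothetical, and the last class is assumed to be followed by the same one-unit gap as the others.

```python
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99), (100, 109)]
frequencies = [2, 13, 16, 7, 7, 5]

def ogive_points(classes, frequencies):
    """Return (x, cumulative percent) pairs, with x halfway between class limits."""
    n = sum(frequencies)
    # Gap between one class's upper limit and the next class's lower limit
    gap = classes[1][0] - classes[0][1]
    points = []
    running = 0
    for (_, upper), f in zip(classes, frequencies):
        running += f
        points.append((upper + gap / 2, 100 * running / n))
    return points

points = ogive_points(classes, frequencies)
for x, pct in points:
    print(f"({x}, {pct:.0f})")
```

Connecting these points with straight lines gives the ogive.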
[Figure: Ogive with cumulative percent frequencies, "Tune-up Parts Cost", Juson Auto Repair. Horizontal axis: Parts Cost ($), 50 to 110. Vertical axis: Cumulative Percent Frequency, 20 to 100. Labeled point: (89.5, 76).]
Class Limits   f    <cf   >cf   <cpf     >cpf
46-48          1    35    1     100.00   2.86
43-45          1    34    2     97.14    5.71
40-42          2    33    4     94.29    11.43
37-39          3    31    7     88.57    20.00
34-36          3    28    10    80.00    28.57
31-33          4    25    14    71.43    40.00
28-30          7    21    21    60.00    60.00
25-27          5    14    26    40.00    74.29
22-24          3    9     29    25.71    82.86
19-21          2    6     31    17.14    88.57
16-18          2    4     33    11.43    94.29
13-15          1    2     34    5.71     97.14
10-12          1    1     35    2.86     100.00
N = 35
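The four cumulative columns can be recomputed from the f column alone. The sketch below assumes that <cf accumulates from the lowest class upward, that >cf accumulates from the highest class downward, and that each cpf value is the matching cf as a percentage of N, rounded to two decimals.

```python
from itertools import accumulate

# Classes are listed from highest to lowest, as in the table
class_limits = ["46-48", "43-45", "40-42", "37-39", "34-36", "31-33", "28-30",
                "25-27", "22-24", "19-21", "16-18", "13-15", "10-12"]
f = [1, 1, 2, 3, 3, 4, 7, 5, 3, 2, 2, 1, 1]
n = sum(f)

# "Less than" cumulative frequency: accumulate from the lowest class upward
less_cf = list(accumulate(reversed(f)))[::-1]
# "Greater than" cumulative frequency: accumulate from the highest class downward
greater_cf = list(accumulate(f))

# Cumulative percent frequencies, rounded to two decimals
less_cpf = [round(100 * c / n, 2) for c in less_cf]
greater_cpf = [round(100 * c / n, 2) for c in greater_cf]

for row in zip(class_limits, f, less_cf, greater_cf, less_cpf, greater_cpf):
    print("{:<8}{:>3}{:>5}{:>5}{:>9.2f}{:>9.2f}".format(*row))
```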