Statistics for Business and Economics. Chapter 2: Methods for Describing Sets of Data. Learning objectives: describe qualitative data graphically; describe quantitative data graphically; explain numerical data properties; describe summary measures.
Statistics for Business and Economics
Chapter 2: Methods for Describing Sets of Data
Learning Objectives
1. Describe Qualitative Data Graphically
2. Describe Quantitative Data Graphically
3. Explain Numerical Data Properties
4. Describe Summary Measures
5. Analyze Numerical Data Using Summary Measures
Thinking Challenge
Our market share far exceeds all competitors! - VP
[Bar graph: market share for Us, Y, and X; the vertical axis runs from 30% to 36%]
Data Presentation
• Qualitative Data: Summary Table, Bar Graph, Pie Chart, Pareto Diagram
• Quantitative Data: Stem-&-Leaf Display, Frequency Distribution, Histogram
Presenting Qualitative Data
Summary Table
1. Lists categories & number of elements in category
2. Obtained by tallying responses in category
3. May show frequencies (counts), % or both

Major        Count
Accounting   130
Economics    20
Management   50
Total        200

Each row is a category; the counts come from tallying the responses.
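The tallying step can be sketched in a few lines of Python. This is only an illustration: the function name `summary_table` is hypothetical, and the raw responses are rebuilt from the counts on the slide rather than taken from any real survey.

```python
from collections import Counter


def summary_table(responses):
    """Tally categorical responses into (category, count, percent) rows."""
    counts = Counter(responses)
    total = sum(counts.values())
    return [(category, n, 100 * n / total) for category, n in counts.items()]


# Hypothetical raw responses, reconstructed from the slide's counts.
responses = ["Accounting"] * 130 + ["Economics"] * 20 + ["Management"] * 50
rows = summary_table(responses)

for category, n, pct in rows:
    print(f"{category:<12}{n:>5}{pct:>7.1f}%")
```

Showing both the count and the percentage matches point 3 of the slide.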
Bar Graph
[Bar graph: frequency (0 to 150) on the vertical axis; major (Acct., Econ., Mgmt.) on the horizontal axis]
• Vertical bars for qualitative variables
• Bar height shows frequency or %
• Zero point
• Percent used also
• Equal bar widths
Pie Chart
1. Shows breakdown of total quantity into categories
2. Useful for showing relative differences
3. Angle size = (360°)(percent)
   Example: (360°)(10%) = 36°
[Pie chart "Majors": Acct. 65%, Mgmt. 25%, Econ. 10%; the Econ. slice spans 36°]
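The angle rule (360°)(percent) can be checked for every slice at once. The sketch below assumes the percentages from the "Majors" chart; the helper name `pie_angles` is made up for illustration.

```python
def pie_angles(percents):
    """Convert category percentages into pie-slice angles in degrees."""
    return {label: 360 * pct / 100 for label, pct in percents.items()}


# Percentages from the "Majors" pie chart.
majors = {"Acct.": 65, "Mgmt.": 25, "Econ.": 10}
angles = pie_angles(majors)

for label, angle in angles.items():
    print(f"{label:<6}{angle:>6.1f} degrees")
```

The angles should add up to 360° whenever the percentages add up to 100%.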
Pareto Diagram
Like a bar graph, but with the categories arranged by height in descending order from left to right.
[Pareto diagram: frequency (0 to 150) on the vertical axis; majors ordered Acct., Mgmt., Econ. on the horizontal axis]
• Vertical bars for qualitative variables
• Bar height shows frequency or %
• Zero point
• Percent used also
• Equal bar widths
Thinking Challenge
You’re an analyst for IRI. You want to show the market shares held by Web browsers in 2006. Construct a bar graph, pie chart, & Pareto diagram to describe the data.

Browser             Mkt. Share (%)
Firefox             14
Internet Explorer   81
Safari              4
Others              1
Bar Graph Solution*
[Bar graph: market share (%) from 0% to 100% by browser, in the order Firefox, Internet Explorer, Safari, Others]
Pie Chart Solution*
[Pie chart "Market Share": Internet Explorer 81%, Firefox 14%, Safari 4%, Others 1%]
Pareto Diagram Solution*
[Pareto diagram: market share (%) from 0% to 100% by browser, in the order Internet Explorer, Firefox, Safari, Others]
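The only step that separates a Pareto diagram from a bar graph is the descending sort. A minimal sketch using the browser data from the Thinking Challenge follows; `pareto_order` is a hypothetical helper, and the text bars use an arbitrary scale of one `#` per two percentage points.

```python
def pareto_order(freqs):
    """Return (category, value) pairs sorted from tallest bar to shortest."""
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)


# Market shares (%) from the Thinking Challenge table.
shares = {"Firefox": 14, "Internet Explorer": 81, "Safari": 4, "Others": 1}
ordered = pareto_order(shares)

# Rough text rendering of the diagram (scale chosen only for display).
for browser, pct in ordered:
    print(f"{browser:<18}{'#' * (pct // 2)} {pct}%")
```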
Presenting Quantitative Data
Stem-and-Leaf Display
1. Divide each observation into stem value and leaf value
• Stem value defines class
• Leaf value defines frequency (count)
2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Example: the observation 26 has stem 2 and leaf 6.

Stem | Leaves
2    | 144677
3    | 028
4    | 1
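The split into stems and leaves can be sketched as follows. The function assumes non-negative two-digit integers, as in the slide's data, so that the tens digit is the stem and the units digit is the leaf; `stem_and_leaf` is an illustrative name, not a library function.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Group two-digit integers by tens digit (stem); leaves are units digits."""
    stems = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        stems[stem].append(leaf)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(stems.items())}


data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
display = stem_and_leaf(data)

for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```

The length of each leaf string is the class frequency, which is why the display doubles as a rough histogram.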
Frequency Distribution Table Steps
1. Determine range
2. Select number of classes
• Usually between 5 & 15 inclusive
3. Compute class intervals (width)
4. Determine class boundaries (limits)
5. Compute class midpoints
6. Count observations & assign to classes
Frequency Distribution Table Example
Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38

Class (Boundaries)   Midpoint   Frequency
15.5 – 25.5          20.5       3
25.5 – 35.5          30.5       5
35.5 – 45.5          40.5       2

Midpoint = (Lower + Upper Boundaries) / 2
Width = Upper Boundary – Lower Boundary
Relative Frequency & % Distribution Tables

Relative Frequency Distribution
Class         Prop.
15.5 – 25.5   .3
25.5 – 35.5   .5
35.5 – 45.5   .2

Percentage Distribution
Class         %
15.5 – 25.5   30.0
25.5 – 35.5   50.0
35.5 – 45.5   20.0
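The counting and midpoint steps can be reproduced with a short sketch. It takes the first lower boundary (15.5), the class width (10), and the number of classes (3) as given, matching the example; `frequency_table` is a hypothetical helper. Because the boundaries end in .5 and the data are whole numbers, no observation can fall exactly on a boundary.

```python
def frequency_table(data, lower, width, n_classes):
    """Return rows of (lower, upper, midpoint, frequency, relative frequency)."""
    rows = []
    for k in range(n_classes):
        lo = lower + k * width
        hi = lo + width
        count = sum(1 for x in data if lo < x <= hi)
        rows.append((lo, hi, (lo + hi) / 2, count, count / len(data)))
    return rows


data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
table = frequency_table(data, lower=15.5, width=10, n_classes=3)

for lo, hi, mid, freq, rel in table:
    print(f"{lo:>5} - {hi:<5} mid={mid:<5} freq={freq}  prop={rel:.1f}  pct={100 * rel:.1f}")
```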
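Histogram
[Histogram: count (0 to 5) on the vertical axis, which may also show frequency, relative frequency, or percent; class lower boundaries 0, 15.5, 25.5, 35.5, 45.5, 55.5 on the horizontal axis; bars touch]

Class         Freq.
15.5 – 25.5   3
25.5 – 35.5   5
35.5 – 45.5   2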
Numerical Data Properties
Thinking Challenge
... employees cite low pay -- most workers earn only $20,000.
... President claims average pay is $70,000!
[Figure: salaries shown are $400,000, $70,000, $50,000, $30,000, and $20,000]
Standard Notation

Measure              Sample   Population
Mean                 X̄        μ
Standard Deviation   S        σ
Variance             S²       σ²
Size                 n        N
Numerical Data Properties
Central Tendency (Location)
Variation (Dispersion)
Shape
Numerical Data Properties & Measures
• Central Tendency: Mean, Median, Mode
• Variation: Range, Interquartile Range, Variance, Standard Deviation
• Relative Standing: Percentiles, Z–scores
Central Tendency
Mean
1. Measure of central tendency
2. Most common measure
3. Acts as ‘balance point’
4. Affected by extreme values (‘outliers’)
5. Formula (sample mean)
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
Mean Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + X_4 + X_5 + X_6}{6} = \frac{10.3 + 4.9 + 8.9 + 11.7 + 6.3 + 7.7}{6} = 8.30$$
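The sample mean is simple enough to compute directly. The sketch below reproduces the example and then appends one extreme value to illustrate the sensitivity to outliers; the value 100.0 is a hypothetical addition, not part of the original data.

```python
def sample_mean(xs):
    """Sample mean: sum of the observations divided by their number."""
    if not xs:
        raise ValueError("need at least one observation")
    return sum(xs) / len(xs)


data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
x_bar = sample_mean(data)
print(round(x_bar, 2))

# Hypothetical outlier, added only to show how far one extreme value
# pulls the balance point.
with_outlier = data + [100.0]
print(round(sample_mean(with_outlier), 2))
```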
Median
1. Measure of central tendency
2. Middle value in ordered sequence
• If n is odd, middle value of sequence
• If n is even, average of 2 middle values
3. Position of median in sequence
$$\text{Positioning Point} = \frac{n + 1}{2}$$
4. Not affected by extreme values
Median Example Odd-Sized Sample
• Raw Data: 24.1 22.6 21.5 23.7 22.6
• Ordered: 21.5 22.6 22.6 23.7 24.1
• Position: 1 2 3 4 5
$$\text{Positioning Point} = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3.0$$
$$\text{Median} = 22.6$$
Median Example Even-Sized Sample
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6
$$\text{Positioning Point} = \frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5$$
$$\text{Median} = \frac{7.7 + 8.9}{2} = 8.30$$
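The positioning-point rule (n + 1)/2 translates directly into code. This sketch, with the hypothetical name `median_by_position`, uses integer division to tell a whole-number position (odd n) from a half position (even n).

```python
def median_by_position(xs):
    """Median via the positioning point (n + 1) / 2 on the ordered data."""
    if not xs:
        raise ValueError("need at least one observation")
    ordered = sorted(xs)
    whole, remainder = divmod(len(ordered) + 1, 2)
    if remainder == 0:
        # n is odd: the position is a whole number (1-based).
        return ordered[whole - 1]
    # n is even: the position ends in .5, so average the two neighbours.
    return (ordered[whole - 1] + ordered[whole]) / 2


odd_sample = [24.1, 22.6, 21.5, 23.7, 22.6]
even_sample = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
print(median_by_position(odd_sample))
print(round(median_by_position(even_sample), 2))
```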
Mode
1. Measure of central tendency
2. Value that occurs most often
3. Not affected by extreme values
4. May be no mode or several modes
5. May be used for quantitative or qualitative data
Mode Example
• No Mode
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• One Mode
Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
• More Than 1 Mode
Raw Data: 21 28 28 41 43 43
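The three cases can be told apart by counting occurrences. The sketch below follows the slide's convention that a data set in which every value occurs exactly once has no mode; `modes` is an illustrative helper, and other conventions exist.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] when every value occurs once."""
    if not values:
        raise ValueError("need at least one observation")
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats: no mode
    return [value for value, count in counts.items() if count == top]


print(modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7]))
print(modes([6.3, 4.9, 8.9, 6.3, 4.9, 4.9]))
print(modes([21, 28, 28, 41, 43, 43]))
```

Because the function only counts labels, it works for qualitative data as well as quantitative data.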
Thinking Challenge
You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11. Describe the stock prices in terms of central tendency.
Central Tendency Solution*
Mean
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_8}{8} = \frac{17 + 16 + 21 + 18 + 13 + 16 + 12 + 11}{8} = 15.5$$
Central Tendency Solution*
Median
• Raw Data: 17 16 21 18 13 16 12 11
• Ordered: 11 12 13 16 16 17 18 21
• Position: 1 2 3 4 5 6 7 8
$$\text{Positioning Point} = \frac{n + 1}{2} = \frac{8 + 1}{2} = 4.5$$
$$\text{Median} = \frac{16 + 16}{2} = 16$$
Central Tendency Solution*
Mode
Raw Data: 17 16 21 18 13 16 12 11
Mode = 16
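All three answers can be cross-checked with Python's standard `statistics` module. This is a verification sketch rather than part of the original solution; `multimode` requires Python 3.8 or later.

```python
import statistics

# Closing stock prices from the Thinking Challenge.
prices = [17, 16, 21, 18, 13, 16, 12, 11]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)     # averages the two middle values
mode_prices = statistics.multimode(prices)   # list of all most-frequent values

print(mean_price, median_price, mode_prices)
```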
Summary of Central Tendency Measures

Measure   Formula                Description
Mean      ΣXi / n                Balance Point
Median    (n + 1) / 2 Position   Middle Value When Ordered
Mode      none                   Most Frequent
Shape
Shape
1. Describes how data are distributed
2. Measures of Shape
• Skew = Symmetry

Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
Variation
Range
1. Measure of dispersion
2. Difference between largest & smallest observations
   Range = Xlargest – Xsmallest
3. Ignores how data are distributed
[Figure: two data sets plotted on a scale from 7 to 10; for each, Range = 10 – 7 = 3]
Variance & Standard Deviation
1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about mean (X̄ or μ)
[Figure: data plotted on a scale from 4 to 12, with X̄ = 8.3 marked]
Sample Variance Formula
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$$
n - 1 in denominator! (Use N if Population Variance)
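Sample Standard Deviation Formula
$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}}$$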
Variance Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \quad \text{where } \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 8.3$$
$$S^2 = \frac{(10.3 - 8.3)^2 + (4.9 - 8.3)^2 + \cdots + (7.7 - 8.3)^2}{6 - 1} = 6.368$$
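The role of the n – 1 denominator is easiest to see by computing both versions side by side. In this sketch the `sample` flag is an illustrative parameter that switches between the sample denominator (n – 1) and the population denominator (N).

```python
def variance(xs, sample=True):
    """Sum of squared deviations about the mean, over n - 1 (sample) or N."""
    n = len(xs)
    if n < 2 and sample:
        raise ValueError("sample variance needs at least two observations")
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(xs) / n
    squared_deviations = sum((x - mean) ** 2 for x in xs)
    return squared_deviations / (n - 1 if sample else n)


data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
sample_var = variance(data)                      # divides by 6 - 1
population_var = variance(data, sample=False)    # divides by 6

print(round(sample_var, 3), round(population_var, 3))
```

For the same data the sample variance is always the larger of the two, since it divides the same sum by a smaller number.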
Thinking Challenge
• You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11.
• What are the variance and standard deviation of the stock prices?
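Variation Solution*
Sample Variance
Raw Data: 17 16 21 18 13 16 12 11
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \quad \text{where } \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 15.5$$
$$S^2 = \frac{(17 - 15.5)^2 + (16 - 15.5)^2 + \cdots + (11 - 15.5)^2}{8 - 1} = 11.14$$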
Variation Solution*
Sample Standard Deviation
$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{11.14} = 3.34$$
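As a cross-check on the hand calculation, the standard library's `statistics.variance` and `statistics.stdev` both use the n – 1 denominator. The sketch below applies them to the stock prices and rounds to two decimals for comparison with the slide.

```python
import statistics

# Closing stock prices from the Thinking Challenge.
prices = [17, 16, 21, 18, 13, 16, 12, 11]

s_squared = statistics.variance(prices)  # sample variance (n - 1 denominator)
s = statistics.stdev(prices)             # square root of the sample variance

print(round(s_squared, 2), round(s, 2))
```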
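Summary of Variation Measures

Measure: Range
Formula: Xlargest – Xsmallest
Description: Total Spread

Measure: Standard Deviation (Sample)
Formula: $\sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
Description: Dispersion about Sample Mean

Measure: Standard Deviation (Population)
Formula: $\sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
Description: Dispersion about Population Mean

Measure: Variance (Sample)
Formula: $\frac{\sum (X_i - \bar{X})^2}{n - 1}$
Description: Squared Dispersion about Sample Mean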
Interpreting Standard Deviation
Interpreting Standard Deviation: Chebyshev’s Theorem
• Applies to any shape data set
• No useful information about the fraction of data in the interval x̄ – s to x̄ + s
• At least 3/4 of the data lies in the interval x̄ – 2s to x̄ + 2s
• At least 8/9 of the data lies in the interval x̄ – 3s to x̄ + 3s
• In general, for k > 1, at least 1 – 1/k² of the data lies in the interval x̄ – ks to x̄ + ks
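Interpreting Standard Deviation: Chebyshev’s Theorem
[Figure: number line marked x̄ – 3s, x̄ – 2s, x̄ – s, x̄, x̄ + s, x̄ + 2s, x̄ + 3s. Within 1s of the mean: no useful information. Within 2s: at least 3/4 of the data. Within 3s: at least 8/9 of the data.]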
Chebyshev’s Theorem Example
• Previously we found the mean closing stock price of new stock issues is 15.5 and the standard deviation is 3.34.
• Use this information to form an interval that will contain at least 75% of the closing stock prices of new stock issues.
Chebyshev’s Theorem Example
At least 75% of the closing stock prices of new stock issues will lie within 2 standard deviations of the mean.
x̄ = 15.5, s = 3.34
(x̄ – 2s, x̄ + 2s) = (15.5 – 2∙3.34, 15.5 + 2∙3.34) = (8.82, 22.18)
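The general bound 1 – 1/k² and the matching interval can be computed together. This sketch uses the mean and standard deviation quoted in the example; `chebyshev_interval` is a hypothetical helper, and the last lines count how many of the eight stock prices actually fall inside the interval.

```python
def chebyshev_interval(mean, s, k):
    """Interval mean ± k*s and the minimum fraction of data it must contain."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return mean - k * s, mean + k * s, 1 - 1 / k ** 2


low, high, min_fraction = chebyshev_interval(mean=15.5, s=3.34, k=2)
print(round(low, 2), round(high, 2), min_fraction)

# Observed fraction for the stock prices used throughout the chapter.
prices = [17, 16, 21, 18, 13, 16, 12, 11]
observed = sum(low <= p <= high for p in prices) / len(prices)
print(observed)
```

The observed fraction can be well above the bound, because the theorem only guarantees a minimum.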
Interpreting Standard Deviation: Empirical Rule
• Applies to data sets that are mound shaped and symmetric
• Approximately 68% of the measurements lie in the interval μ – σ to μ + σ
• Approximately 95% of the measurements lie in the interval μ – 2σ to μ + 2σ
• Approximately 99.7% of the measurements lie in the interval μ – 3σ to μ + 3σ
Interpreting Standard Deviation: Empirical Rule
[Figure: mound-shaped curve over a number line marked μ – 3σ, μ – 2σ, μ – σ, μ, μ + σ, μ + 2σ, μ + 3σ. Within 1σ of the mean: approximately 68% of the measurements. Within 2σ: approximately 95%. Within 3σ: approximately 99.7%.]
Empirical Rule Example
Previously we found the mean closing stock price of new stock issues is 15.5 and the standard deviation is 3.34. If we can assume the data are symmetric and mound shaped, calculate the percentage of the data that lie within the intervals x̄ ± s, x̄ ± 2s, and x̄ ± 3s.
Empirical Rule Example
• According to the Empirical Rule, approximately 68% of the data will lie in the interval (x̄ – s, x̄ + s), (15.5 – 3.34, 15.5 + 3.34) = (12.16, 18.84)
• Approximately 95% of the data will lie in the interval (x̄ – 2s, x̄ + 2s), (15.5 – 2∙3.34, 15.5 + 2∙3.34) = (8.82, 22.18)
• Approximately 99.7% of the data will lie in the interval (x̄ – 3s, x̄ + 3s), (15.5 – 3∙3.34, 15.5 + 3∙3.34) = (5.48, 25.52)
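The three intervals follow one pattern, so they can be generated in a loop. The sketch below hard-codes the approximate percentages from the Empirical Rule and assumes, as the example does, that the data are mound shaped and symmetric; the names are illustrative.

```python
# Approximate percentages from the Empirical Rule, keyed by k standard deviations.
EMPIRICAL_PCT = {1: 68.0, 2: 95.0, 3: 99.7}


def empirical_rule_intervals(mean, s):
    """Map k to (lower, upper, approximate percent) for k = 1, 2, 3."""
    return {k: (mean - k * s, mean + k * s, pct)
            for k, pct in EMPIRICAL_PCT.items()}


intervals = empirical_rule_intervals(mean=15.5, s=3.34)
for k, (low, high, pct) in intervals.items():
    print(f"within {k}s: ({low:.2f}, {high:.2f})  about {pct}%")
```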
Numerical Measures of Relative Standing
Numerical Measures of Relative Standing: Percentiles
• Describes the relative location of a measurement compared to the rest of the data
• The pth percentile is a number such that p% of the data falls below it and (100 – p)% falls above it
• Median = 50th percentile
Percentile Example
• You scored 560 on the GMAT exam. This score puts you in the 58th percentile.
• What percentage of test takers scored lower than you did?
• What percentage of test takers scored higher than you did?
Percentile Example
• What percentage of test takers scored lower than you did?
  58% of test takers scored lower than 560.
• What percentage of test takers scored higher than you did?
  (100 – 58)% = 42% of test takers scored higher than 560.
Numerical Measures of Relative Standing: Z–Scores
• Describes the relative location of a measurement compared to the rest of the data
• Sample z–score: $z = \frac{x - \bar{x}}{s}$
• Population z–score: $z = \frac{x - \mu}{\sigma}$
• Measures the number of standard deviations away from the mean a data value is located
Z–Score Example
• The mean time to assemble a product is 22.5 minutes with a standard deviation of 2.5 minutes.
• Find the z–score for an item that took 20 minutes to assemble.
• Find the z–score for an item that took 27.5 minutes to assemble.
Z–Score Example
x = 20, μ = 22.5, σ = 2.5
$$z = \frac{x - \mu}{\sigma} = \frac{20 - 22.5}{2.5} = -1.0$$
x = 27.5, μ = 22.5, σ = 2.5
$$z = \frac{x - \mu}{\sigma} = \frac{27.5 - 22.5}{2.5} = 2.0$$
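Both answers come from the same one-line formula. A minimal sketch follows; the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Assembly times from the example: mean 22.5 minutes, standard deviation 2.5.
z_fast = z_score(20, mean=22.5, sd=2.5)
z_slow = z_score(27.5, mean=22.5, sd=2.5)
print(z_fast, z_slow)
```

A negative z-score marks a value below the mean, a positive one a value above it.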
Quartiles & Box Plots
Quartiles
1. Measure of noncentral tendency
2. Split ordered data into 4 quarters
   25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
3. Position of i-th quartile
$$\text{Positioning Point of } Q_i = \frac{i \cdot (n + 1)}{4}$$
Quartile (Q1) Example
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6
$$Q_1 \text{ Position} = \frac{1 \cdot (n + 1)}{4} = \frac{1 \cdot (6 + 1)}{4} = 1.75 \approx 2$$
$$Q_1 = 6.3$$
Quartile (Q2) Example
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6
$$Q_2 \text{ Position} = \frac{2 \cdot (n + 1)}{4} = \frac{2 \cdot (6 + 1)}{4} = 3.5$$
$$Q_2 = \frac{7.7 + 8.9}{2} = 8.3$$
Quartile (Q3) Example
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6
$$Q_3 \text{ Position} = \frac{3 \cdot (n + 1)}{4} = \frac{3 \cdot (6 + 1)}{4} = 5.25 \approx 5$$
$$Q_3 = 10.3$$
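The three quartile examples share one rule: compute the positioning point i(n + 1)/4, round it to the nearest whole position, and average the two neighbours when it falls exactly halfway. The sketch below encodes that reading of the examples; it is an assumption about the convention, and other textbooks and software packages interpolate differently, so results may not match them.

```python
def quartile(xs, i):
    """i-th quartile (i = 1, 2, 3) via the positioning point i * (n + 1) / 4."""
    if i not in (1, 2, 3):
        raise ValueError("i must be 1, 2, or 3")
    if not xs:
        raise ValueError("need at least one observation")
    ordered = sorted(xs)
    n = len(ordered)
    whole, remainder = divmod(i * (n + 1), 4)  # position = whole + remainder / 4
    if remainder == 2:        # exactly halfway: average the two neighbours
        low, high = whole, whole + 1
    elif remainder == 3:      # .75 rounds up to the next position
        low = high = whole + 1
    else:                     # .00 or .25 rounds down
        low = high = whole
    # Keep 1-based positions inside the data for very small samples.
    low = min(max(low, 1), n)
    high = min(max(high, 1), n)
    return (ordered[low - 1] + ordered[high - 1]) / 2


data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))
print(q1, round(q2, 2), q3)
```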
Interquartile Range
1. Measure of dispersion
2. Also called midspread
3. Difference between third & first quartiles
• Interquartile Range = Q3 – Q1
4. Spread in middle 50%
5. Not affected by extreme values
Thinking Challenge
• You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11.
• What are the quartiles, Q1 and Q3, and the interquartile range?
Quartile Solution*
Q1
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
$$Q_1 \text{ Position} = \frac{1 \cdot (n + 1)}{4} = \frac{1 \cdot (8 + 1)}{4} = 2.25 \approx 2$$
$$Q_1 = 12$$
Quartile Solution*
Q3
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
$$Q_3 \text{ Position} = \frac{3 \cdot (n + 1)}{4} = \frac{3 \cdot (8 + 1)}{4} = 6.75 \approx 7$$
$$Q_3 = 18$$
Interquartile Range Solution*
Interquartile Range
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
$$\text{Interquartile Range} = Q_3 - Q_1 = 18 - 12 = 6$$
Box Plot
1. Graphical display of data using 5-number summary
[Box plot on a scale from 4 to 12, marking Xsmallest, Q1, Median, Q3, and Xlargest]
Shape & Box Plot
[Three box plots, each marking Q1, Median, and Q3: Left-Skewed, Symmetric, Right-Skewed]
Graphing Bivariate Relationships
• Describes a relationship between two quantitative variables
• Plot the data in a Scattergram
[Three scattergrams of y against x: positive relationship, negative relationship, no relationship]
Scattergram Example
• You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $ (x)   Sales (Units) (y)
1          1
2          1
3          2
4          2
5          4

• Draw a scattergram of the data
Scattergram Example
[Scattergram: Sales (0 to 4) on the vertical axis against Advertising (0 to 5) on the horizontal axis]
Time Series Plot
• Used to graphically display data produced over time
• Shows trends and changes in the data over time
• Time recorded on the horizontal axis
• Measurements recorded on the vertical axis
• Points connected by straight lines
Time Series Plot Example
• The following data show the average retail price of regular gasoline in New York City for 8 weeks in 2006.
• Draw a time series plot for this data.

Date           Average Price
Oct 16, 2006   $2.219
Oct 23, 2006   $2.173
Oct 30, 2006   $2.177
Nov 6, 2006    $2.158
Nov 13, 2006   $2.185
Nov 20, 2006   $2.208
Nov 27, 2006   $2.236
Dec 4, 2006    $2.298
Time Series Plot Example
[Time series plot: Price (2.05 to 2.35) on the vertical axis against Date (10/16, 10/23, 10/30, 11/6, 11/13, 11/20, 11/27, 12/4) on the horizontal axis]
Distorting the Truth with Descriptive Techniques
Errors in Presenting Data
1. Using ‘chart junk’
2. No relative basis in comparing data batches
3. Compressing the vertical axis
4. No zero point on the vertical axis
‘Chart Junk’
[Bad presentation, "Minimum Wage": graphic labeled 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80]
[Good presentation, "Minimum Wage": graph of $ (0 to 4) by year (1960, 1970, 1980, 1990)]
No Relative Basis
[Bad presentation, "A’s by Class": frequency (0 to 300) for FR, SO, JR, SR]
[Good presentation, "A’s by Class": percent (0% to 30%) for FR, SO, JR, SR]
Compressing Vertical Axis
[Bad presentation, "Quarterly Sales": $ for Q1 to Q4 on a vertical axis from 0 to 200]
[Good presentation, "Quarterly Sales": $ for Q1 to Q4 on a vertical axis from 0 to 50]
No Zero Point on Vertical Axis
[Bad presentation, "Monthly Sales": $ by month (J, M, M, J, S, N) on a vertical axis from 36 to 45]
[Good presentation, "Monthly Sales": $ by month (J, M, M, J, S, N) on a vertical axis from 0 to 60]
Conclusion
1. Described Qualitative Data Graphically
2. Described Numerical Data Graphically
3. Explained Numerical Data Properties
4. Described Summary Measures
5. Analyzed Numerical Data Using Summary Measures