Statistics for Business and Economics

Chapter 2: Methods for Describing Sets of Data

Learning Objectives

1. Describe Qualitative Data Graphically
2. Describe Quantitative Data Graphically
3. Explain Numerical Data Properties
4. Describe Summary Measures
5. Analyze Numerical Data Using Summary Measures

Thinking Challenge

"Our market share far exceeds all competitors!" (VP)

[Figure: bar graph of market share for Us, YY, and XX, with the vertical axis running from 30% to 36%]

Data Presentation

• Qualitative Data: Summary Table, Bar Graph, Pie Chart, Pareto Diagram
• Quantitative Data: Stem-&-Leaf Display, Frequency Distribution, Histogram

Presenting Qualitative Data

Summary Table

1. Lists categories & the number of elements in each category
2. Obtained by tallying responses in each category
3. May show frequencies (counts), %, or both

Each row is a category:

Major         Count
Accounting      130
Economics        20
Management       50
Total           200
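
The tallying step can be sketched in Python. This is a minimal illustration, not part of the original material: the individual responses are not given, so the hypothetical `responses` list is rebuilt from the counts in the table, and `summary_table` is a helper name introduced here.

```python
from collections import Counter

# Hypothetical raw responses, rebuilt from the counts in the summary table
responses = ["Accounting"] * 130 + ["Economics"] * 20 + ["Management"] * 50


def summary_table(values):
    """Tally each category; return (category, count, percent) rows."""
    counts = Counter(values)  # categories appear in first-seen order
    total = sum(counts.values())
    return [(cat, n, 100 * n / total) for cat, n in counts.items()]


rows = summary_table(responses)
for category, count, pct in rows:
    print(f"{category:<12}{count:>6}{pct:>7.1f}%")
print(f"{'Total':<12}{len(responses):>6}")
```

Showing counts and percentages side by side covers item 3 above (frequencies, %, or both).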

Bar Graph

[Figure: bar graph of frequency (0 to 150) by major: Acct., Econ., Mgmt.]

• Vertical bars for qualitative variables
• Bar height shows frequency or %
• Percent may be used instead of frequency
• Vertical axis starts at a zero point
• Equal bar widths

Pie Chart

1. Shows the breakdown of a total quantity into categories
2. Useful for showing relative differences
3. Angle size = (360°)(percent)

[Figure: pie chart of majors: Acct. 65%, Mgmt. 25%, Econ. 10%]

For example, Econ.: (360°)(10%) = 36°
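
The angle rule can be checked for every slice with a short sketch. The helper `slice_angles` is a hypothetical name, and the sketch assumes the percentages sum to 100.

```python
# Percentages by major, as shown in the pie chart
shares = {"Acct.": 65, "Mgmt.": 25, "Econ.": 10}


def slice_angles(percentages):
    """Angle size = (360 degrees)(percent) for each category."""
    return {cat: 360 * pct / 100 for cat, pct in percentages.items()}


angles = slice_angles(shares)
for cat, deg in angles.items():
    print(f"{cat:<6}{deg:6.1f} degrees")
```

The Econ. slice comes out at 36 degrees, and the three angles together fill the full 360 degrees.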

Pareto Diagram

Like a bar graph, but with the categories arranged by height in descending order from left to right.

[Figure: Pareto diagram of frequency (0 to 150) by major, ordered Acct., Mgmt., Econ.; vertical bars of equal width on an axis starting at zero]
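
Ordering the categories for a Pareto diagram amounts to a descending sort on frequency. A minimal sketch using the counts from the summary table; `pareto_order` is a hypothetical helper, and the `#` bars (one per 10 students) are only a crude text stand-in for the plotted bars.

```python
# Frequencies by major, from the summary table
counts = {"Acct.": 130, "Econ.": 20, "Mgmt.": 50}


def pareto_order(freqs):
    """Return (category, frequency) pairs, largest frequency first."""
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)


order = pareto_order(counts)
for cat, n in order:
    # Crude text bar: one '#' per 10 observations
    print(f"{cat:<6}{'#' * (n // 10)} {n}")
```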

Thinking Challenge

You’re an analyst for IRI. You want to show the market shares held by Web browsers in 2006. Construct a bar graph, pie chart, & Pareto diagram to describe the data.

Browser               Mkt. Share (%)
Firefox                     14
Internet Explorer           81
Safari                       4
Others                       1

Bar Graph Solution*

[Figure: bar graph of market share (%) by browser, 0% to 100%: Firefox, Internet Explorer, Safari, Others]

Pie Chart Solution*

[Figure: pie chart of market share: Internet Explorer 81%, Firefox 14%, Safari 4%, Others 1%]

Pareto Diagram Solution*

[Figure: Pareto diagram of market share (%) by browser, 0% to 100%, ordered Internet Explorer, Firefox, Safari, Others]

Presenting Quantitative Data

Stem-and-Leaf Display

1. Divide each observation into a stem value and a leaf value
   • Stem value defines the class
   • Leaf value defines the frequency (count)
2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Stem | Leaf
   2 | 144677
   3 | 028
   4 | 1

For example, 26 has stem 2 and leaf 6.
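
A minimal stem-and-leaf sketch, assuming non-negative whole-number data in which the stem is the tens digit and the leaf is the units digit (the helper name `stem_and_leaf` is illustrative). Stems with no leaves are simply omitted here.

```python
from collections import defaultdict

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    display = defaultdict(list)
    for x in sorted(values):
        stem, leaf = divmod(x, 10)
        display[stem].append(leaf)
    return dict(display)


plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(f"{stem} | {''.join(str(leaf) for leaf in plot[stem])}")
```

Run on the example data, this prints the same three rows as the display: 2 | 144677, 3 | 028, 4 | 1.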

Frequency Distribution Table Steps

1. Determine range

2. Select number of classes
   • Usually between 5 & 15 inclusive

3. Compute class intervals (width)

4. Determine class boundaries (limits)

5. Compute class midpoints

6. Count observations & assign to classes

Frequency Distribution Table Example

Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38

Class (Boundaries)   Midpoint   Frequency
15.5 – 25.5          20.5       3
25.5 – 35.5          30.5       5
35.5 – 45.5          40.5       2

Width = 10 (upper boundary – lower boundary)
Midpoint = (Lower + Upper Boundaries) / 2

Relative Frequency & % Distribution Tables

Relative Frequency Distribution

Class          Prop.
15.5 – 25.5    .3
25.5 – 35.5    .5
35.5 – 45.5    .2

Percentage Distribution

Class          %
15.5 – 25.5    30.0
25.5 – 35.5    50.0
35.5 – 45.5    20.0
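
The frequency-distribution steps, applied to the example data, can be sketched as follows. This assumes the class count (3), width (10) and starting boundary (15.5) have already been chosen as in the example; `frequency_table` is a hypothetical helper that returns frequency, proportion and percent together.

```python
# Example data and class choices: 3 classes of width 10, starting at 15.5
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
start, width, n_classes = 15.5, 10, 3


def frequency_table(values, start, width, k):
    """Rows of (lower, upper, midpoint, frequency, proportion, percent)."""
    n = len(values)
    rows = []
    for i in range(k):
        lower = start + i * width
        upper = lower + width
        # Boundaries end in .5, so whole-number data never land on one
        freq = sum(lower < x < upper for x in values)
        rows.append((lower, upper, (lower + upper) / 2, freq, freq / n, 100 * freq / n))
    return rows


table = frequency_table(raw, start, width, n_classes)
for lower, upper, mid, freq, prop, pct in table:
    print(f"{lower} - {upper}  mid {mid}  freq {freq}  prop {prop:.1f}  {pct:.1f}%")
```

Choosing boundaries half a unit off the data values is what keeps every observation unambiguously inside one class.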

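Histogram

[Figure: histogram of count (0 to 5) by class, with lower boundaries 15.5, 25.5, 35.5, and 45.5 marked on the horizontal axis; adjacent bars touch]

• Vertical axis may show frequency, relative frequency, or percent
• Bars touch

Class          Freq.
15.5 – 25.5    3
25.5 – 35.5    5
35.5 – 45.5    2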

Numerical Data Properties

Thinking Challenge

... employees cite low pay: most workers earn only $20,000.

... President claims average pay is $70,000!

[Figure: pay levels of $400,000, $70,000, $50,000, $30,000, and $20,000]

Standard Notation

Measure              Sample   Population
Mean                 X̄        μ
Standard Deviation   S        σ
Variance             S²       σ²
Size                 n        N

Numerical Data Properties

Central Tendency (Location)

Variation (Dispersion)

Shape

Numerical Data Properties & Measures

• Central Tendency: Mean, Median, Mode
• Variation: Range, Interquartile Range, Variance, Standard Deviation
• Relative Standing: Percentiles, Z–scores

Central Tendency

Mean

1. Measure of central tendency
2. Most common measure
3. Acts as ‘balance point’
4. Affected by extreme values (‘outliers’)
5. Formula (sample mean)

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

Mean Example

Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + X_4 + X_5 + X_6}{6} = \frac{10.3 + 4.9 + 8.9 + 11.7 + 6.3 + 7.7}{6} = 8.30$$

Median

1. Measure of central tendency
2. Middle value in ordered sequence
   • If n is odd, middle value of sequence
   • If n is even, average of 2 middle values
3. Position of median in sequence:
$$\text{Positioning Point} = \frac{n+1}{2}$$
4. Not affected by extreme values

Median Example Odd-Sized Sample

• Raw Data: 24.1 22.6 21.5 23.7 22.6
• Ordered: 21.5 22.6 22.6 23.7 24.1
• Position: 1 2 3 4 5

$$\text{Positioning Point} = \frac{n+1}{2} = \frac{5+1}{2} = 3.0 \qquad \text{Median} = 22.6$$

Median Example Even-Sized Sample

• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6

$$\text{Positioning Point} = \frac{n+1}{2} = \frac{6+1}{2} = 3.5 \qquad \text{Median} = \frac{7.7 + 8.9}{2} = 8.30$$
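
The positioning-point rule translates directly into code. A sketch (the function name `median_by_position` is illustrative), with Python's standard `statistics.median` as a cross-check:

```python
import statistics


def median_by_position(values):
    """Median located with the positioning point (n + 1) / 2 (1-based)."""
    ordered = sorted(values)
    pos = (len(ordered) + 1) / 2
    if pos.is_integer():
        # n odd: the positioning point lands on a single middle value
        return ordered[int(pos) - 1]
    # n even: average the two values on either side of the positioning point
    k = int(pos)
    return (ordered[k - 1] + ordered[k]) / 2


odd_sample = [24.1, 22.6, 21.5, 23.7, 22.6]
even_sample = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
print(median_by_position(odd_sample))  # 22.6
print(median_by_position(even_sample), statistics.median(even_sample))
```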

Mode

1. Measure of central tendency
2. Value that occurs most often
3. Not affected by extreme values
4. May be no mode or several modes
5. May be used for quantitative or qualitative data

Mode Example

• No Mode
  Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• One Mode (4.9)
  Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
• More Than 1 Mode (28 and 43)
  Raw Data: 21 28 28 41 43 43
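
Python's standard library offers `statistics.multimode` (Python 3.8+), which returns every most-frequent value. The sketch below wraps it in a hypothetical helper `modes`, assuming the convention that a data set in which every distinct value occurs equally often has no mode.

```python
from statistics import multimode  # Python 3.8+


def modes(values):
    """Most frequent value(s); [] when every distinct value occurs equally often."""
    found = multimode(values)
    # Assumed convention: if all distinct values tie, report no mode
    return [] if len(found) == len(set(values)) else found


samples = {
    "No mode": [10.3, 4.9, 8.9, 11.7, 6.3, 7.7],
    "One mode": [6.3, 4.9, 8.9, 6.3, 4.9, 4.9],
    "More than 1 mode": [21, 28, 28, 41, 43, 43],
}
for label, data in samples.items():
    print(f"{label}: {modes(data)}")
```

Without the wrapper, `multimode` alone would return all six values for the first sample, since each occurs once.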

Thinking Challenge

You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11. Describe the stock prices in terms of central tendency.

Central Tendency Solution*

Mean

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_8}{8} = \frac{17 + 16 + 21 + 18 + 13 + 16 + 12 + 11}{8} = 15.5$$

Central Tendency Solution*

Median
• Raw Data: 17 16 21 18 13 16 12 11
• Ordered: 11 12 13 16 16 17 18 21
• Position: 1 2 3 4 5 6 7 8

$$\text{Positioning Point} = \frac{n+1}{2} = \frac{8+1}{2} = 4.5 \qquad \text{Median} = \frac{16 + 16}{2} = 16$$

Central Tendency Solution*

Mode
Raw Data: 17 16 21 18 13 16 12 11

Mode = 16

Summary of Central Tendency Measures

Measure   Formula               Description
Mean      ΣXi / n               Balance Point
Median    (n + 1)/2 Position    Middle Value When Ordered
Mode      none                  Most Frequent

Shape

1. Describes how data are distributed
2. Measures of Shape
   • Skew: describes symmetry (or the lack of it)

[Figure: three distributions compared]
• Left-Skewed: Mean < Median
• Symmetric: Mean = Median
• Right-Skewed: Median < Mean

Variation

Range

1. Measure of dispersion
2. Difference between largest & smallest observations
$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$
3. Ignores how data are distributed

[Figure: two differently distributed data sets on a scale from 7 to 10; both have Range = 10 – 7 = 3]

Variance & Standard Deviation

1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about the mean (X̄ or μ)

[Figure: data points on a scale from 4 to 12, spread about X̄ = 8.3]

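Sample Variance Formula

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n-1}$$

Note the n – 1 in the denominator! For the population variance σ², divide by N instead and measure deviations from μ.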

Sample Standard Deviation Formula

$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}} = \sqrt{\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n-1}}$$

Variance Example

Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} \quad \text{where} \quad \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 8.3$$

$$S^2 = \frac{(10.3 - 8.3)^2 + (4.9 - 8.3)^2 + \cdots + (7.7 - 8.3)^2}{6 - 1} = 6.368$$
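
The sample variance formula, written out in Python as a sketch (helper names are illustrative). The standard library's `statistics.variance` uses the same n – 1 divisor and serves as a cross-check; `statistics.pvariance` would divide by N instead.

```python
import math
import statistics


def sample_variance(values):
    """S^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """S: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
print(round(sample_variance(data), 3))      # 6.368
print(round(statistics.variance(data), 3))  # library cross-check (also n - 1)
```

Applied to the closing stock prices 17, 16, 21, 18, 13, 16, 12, 11, the same functions give S² ≈ 11.14 and S ≈ 3.34.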

Thinking Challenge

• You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11.
• What are the variance and standard deviation of the stock prices?

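Variation Solution*

Sample Variance
Raw Data: 17 16 21 18 13 16 12 11

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} \quad \text{where} \quad \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 15.5$$

$$S^2 = \frac{(17 - 15.5)^2 + (16 - 15.5)^2 + \cdots + (11 - 15.5)^2}{8 - 1} = 11.14$$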

Variation Solution*

Sample Standard Deviation

$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}} = \sqrt{11.14} = 3.34$$

Summary of Variation Measures

• Range: $X_{\text{largest}} - X_{\text{smallest}}$ (total spread)
• Standard Deviation (Sample): $\sqrt{\sum (X_i - \bar{X})^2 / (n - 1)}$ (dispersion about sample mean)
• Standard Deviation (Population): $\sqrt{\sum (X_i - \mu)^2 / N}$ (dispersion about population mean)
• Variance (Sample): $\sum (X_i - \bar{X})^2 / (n - 1)$ (squared dispersion about sample mean)

Interpreting Standard Deviation

Interpreting Standard Deviation: Chebyshev’s Theorem

• Applies to any shape data set
• No useful information about the fraction of data in the interval x̄ – s to x̄ + s
• At least 3/4 of the data lies in the interval x̄ – 2s to x̄ + 2s
• At least 8/9 of the data lies in the interval x̄ – 3s to x̄ + 3s
• In general, for k > 1, at least 1 – 1/k² of the data lies in the interval x̄ – ks to x̄ + ks

[Figure: number line marked x̄ – 3s, x̄ – 2s, x̄ – s, x̄, x̄ + s, x̄ + 2s, x̄ + 3s]
• x̄ – s to x̄ + s: no useful information
• x̄ – 2s to x̄ + 2s: at least 3/4 of the data
• x̄ – 3s to x̄ + 3s: at least 8/9 of the data

Chebyshev’s Theorem Example

• Previously we found the mean closing stock price of new stock issues is 15.5 and the standard deviation is 3.34.

• Use this information to form an interval that will contain at least 75% of the closing stock prices of new stock issues.

Chebyshev’s Theorem Example

At least 75% of the closing stock prices of new stock issues will lie within 2 standard deviations of the mean.

x̄ = 15.5, s = 3.34

(x̄ – 2s, x̄ + 2s) = (15.5 – 2∙3.34, 15.5 + 2∙3.34) = (8.82, 22.18)
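
Chebyshev's bound and the interval x̄ ± ks can be sketched as two small functions (hypothetical names), using the stock-price mean and standard deviation from the example:

```python
def chebyshev_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem gives no useful information for k <= 1")
    return 1 - 1 / k**2


def interval(mean, sd, k):
    """The interval (mean - k*sd, mean + k*sd)."""
    return (mean - k * sd, mean + k * sd)


xbar, s = 15.5, 3.34  # mean and standard deviation of the closing stock prices
for k in (2, 3):
    lo, hi = interval(xbar, s, k)
    print(f"k = {k}: at least {chebyshev_bound(k):.1%} of the data in ({lo:.2f}, {hi:.2f})")
```

Because the bound holds for any shape of data, it is deliberately conservative: k = 2 guarantees only 75%.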

Interpreting Standard Deviation: Empirical Rule

• Applies to data sets that are mound shaped and symmetric
• Approximately 68% of the measurements lie in the interval μ – σ to μ + σ
• Approximately 95% of the measurements lie in the interval μ – 2σ to μ + 2σ
• Approximately 99.7% of the measurements lie in the interval μ – 3σ to μ + 3σ

[Figure: axis marked μ – 3σ, μ – 2σ, μ – σ, μ, μ + σ, μ + 2σ, μ + 3σ]
• μ – σ to μ + σ: approximately 68% of the measurements
• μ – 2σ to μ + 2σ: approximately 95% of the measurements
• μ – 3σ to μ + 3σ: approximately 99.7% of the measurements

Empirical Rule Example

Previously we found the mean closing stock price of new stock issues is 15.5 and the standard deviation is 3.34. If we can assume the data are symmetric and mound shaped, calculate the percentage of the data that lie within the intervals x̄ ± s, x̄ ± 2s, and x̄ ± 3s.

Empirical Rule Example

• Approximately 68% of the data will lie in the interval (x̄ – s, x̄ + s): (15.5 – 3.34, 15.5 + 3.34) = (12.16, 18.84)
• Approximately 95% of the data will lie in the interval (x̄ – 2s, x̄ + 2s): (15.5 – 2∙3.34, 15.5 + 2∙3.34) = (8.82, 22.18)
• Approximately 99.7% of the data will lie in the interval (x̄ – 3s, x̄ + 3s): (15.5 – 3∙3.34, 15.5 + 3∙3.34) = (5.48, 25.52)
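
As a rough illustration only, the sketch below compares the Empirical Rule percentages with the share of the eight stock prices that actually fall in each interval. With so few observations the data cannot really be judged mound shaped, so the comparison is indicative at best; `share_within` is a hypothetical helper.

```python
prices = [17, 16, 21, 18, 13, 16, 12, 11]  # closing stock prices from the Thinking Challenge
xbar, s = 15.5, 3.34

# Approximate Empirical Rule shares for mound-shaped, symmetric data
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}


def share_within(values, center, spread, k):
    """Fraction of values strictly inside (center - k*spread, center + k*spread)."""
    lo, hi = center - k * spread, center + k * spread
    return sum(lo < x < hi for x in values) / len(values)


for k, rule in EMPIRICAL.items():
    observed = share_within(prices, xbar, s, k)
    print(f"x-bar +/- {k}s: rule about {rule:.1%}, observed {observed:.1%}")
```

For these prices, 5 of 8 (62.5%) fall within x̄ ± s, and all 8 fall within x̄ ± 2s.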

Numerical Measures of Relative Standing

Numerical Measures of Relative Standing: Percentiles

• Describes the relative location of a measurement compared to the rest of the data
• The pth percentile is a number such that p% of the data falls below it and (100 – p)% falls above it
• Median = 50th percentile

Percentile Example

• You scored 560 on the GMAT exam. This score puts you in the 58th percentile.
• What percentage of test takers scored lower than you did?
• What percentage of test takers scored higher than you did?

Percentile Example

• What percentage of test takers scored lower than you did?
  58% of test takers scored lower than 560.
• What percentage of test takers scored higher than you did?
  (100 – 58)% = 42% of test takers scored higher than 560.

Numerical Measures of Relative Standing: Z–Scores

• Describes the relative location of a measurement compared to the rest of the data
• Sample z–score:
$$z = \frac{x - \bar{x}}{s}$$
• Population z–score:
$$z = \frac{x - \mu}{\sigma}$$
• Measures the number of standard deviations away from the mean a data value is located

Z–Score Example

• The mean time to assemble a product is 22.5 minutes with a standard deviation of 2.5 minutes.
• Find the z–score for an item that took 20 minutes to assemble.
• Find the z–score for an item that took 27.5 minutes to assemble.

Z–Score Example

x = 20, μ = 22.5, σ = 2.5

$$z = \frac{x - \mu}{\sigma} = \frac{20 - 22.5}{2.5} = -1.0$$

x = 27.5, μ = 22.5, σ = 2.5

$$z = \frac{x - \mu}{\sigma} = \frac{27.5 - 22.5}{2.5} = 2.0$$
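
The z–score formula as a one-line function (the name `z_score` is illustrative), applied to the assembly-time example:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


mu, sigma = 22.5, 2.5  # assembly time in minutes
for minutes in (20, 27.5):
    print(f"x = {minutes}: z = {z_score(minutes, mu, sigma)}")
```

The sign carries the direction: the 20-minute item is one standard deviation faster than average, the 27.5-minute item two standard deviations slower.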

Quartiles & Box Plots

Quartiles

1. Measure of noncentral tendency
2. Split ordered data into 4 quarters

[Figure: ordered data divided at Q1, Q2, and Q3 into four parts of 25% each]

3. Position of i-th quartile:
$$\text{Positioning Point of } Q_i = \frac{i(n+1)}{4}$$

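Quartile (Q1) Example

• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6

$$Q_1 \text{ Position} = \frac{1(n+1)}{4} = \frac{1(6+1)}{4} = 1.75 \approx 2 \qquad Q_1 = 6.3$$

(The position 1.75 is rounded to the nearest whole position.)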

Quartile (Q2) Example

• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6

$$Q_2 \text{ Position} = \frac{2(n+1)}{4} = \frac{2(6+1)}{4} = 3.5 \qquad Q_2 = \frac{7.7 + 8.9}{2} = 8.3$$

Quartile (Q3) Example

• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6

$$Q_3 \text{ Position} = \frac{3(n+1)}{4} = \frac{3(6+1)}{4} = 5.25 \approx 5 \qquad Q_3 = 10.3$$

Interquartile Range

1. Measure of dispersion
2. Also called midspread
3. Difference between third & first quartiles
   • Interquartile Range = Q3 – Q1
4. Spread in middle 50%
5. Not affected by extreme values

Thinking Challenge

• You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11.
• What are the quartiles, Q1 and Q3, and the interquartile range?

Quartile Solution*

Q1
• Raw Data: 17 16 21 18 13 16 12 11
• Ordered: 11 12 13 16 16 17 18 21
• Position: 1 2 3 4 5 6 7 8

$$Q_1 \text{ Position} = \frac{1(n+1)}{4} = \frac{1(8+1)}{4} = 2.25 \approx 2 \qquad Q_1 = 12$$

Quartile Solution*

Q3
• Raw Data: 17 16 21 18 13 16 12 11
• Ordered: 11 12 13 16 16 17 18 21
• Position: 1 2 3 4 5 6 7 8

$$Q_3 \text{ Position} = \frac{3(n+1)}{4} = \frac{3(8+1)}{4} = 6.75 \approx 7 \qquad Q_3 = 18$$

Interquartile Range Solution*

Interquartile Range
• Raw Data: 17 16 21 18 13 16 12 11
• Ordered: 11 12 13 16 16 17 18 21
• Position: 1 2 3 4 5 6 7 8

$$\text{Interquartile Range} = Q_3 - Q_1 = 18 - 12 = 6$$
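
The positioning-point method for quartiles can be sketched as below. The rounding rule in the hypothetical `quartile` function is an assumption inferred from the worked examples: a whole-number position is used directly, a position ending in .5 averages the two neighbouring values, and any other position is rounded to the nearest whole position. Statistical software often interpolates instead (NumPy's `percentile`, for example, interpolates linearly by default), so its quartiles can differ slightly. The sketch reproduces the six-value examples (Q1 = 6.3, Q2 = 8.3, Q3 = 10.3).

```python
import math


def quartile(values, i):
    """Q_i from the positioning point i(n + 1)/4 on the ordered data.

    Assumed rule (inferred from the worked examples): use a whole-number
    position directly, average the two neighbours for a position ending
    in .5, and otherwise round to the nearest whole position.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    pos = i * (len(ordered) + 1) / 4  # a multiple of 0.25, exact in floating point
    k = math.floor(pos)
    if pos - k == 0.5:
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.floor(pos + 0.5) - 1]  # .25 rounds down, .75 rounds up


def iqr(values):
    """Interquartile range, Q3 - Q1."""
    return quartile(values, 3) - quartile(values, 1)


six = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
stocks = [17, 16, 21, 18, 13, 16, 12, 11]
print([quartile(six, i) for i in (1, 2, 3)])
print(quartile(stocks, 1), quartile(stocks, 3), iqr(stocks))  # 12 18 6
```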

Box Plot

1. Graphical display of data using 5-number summary: X smallest, Q1, Median, Q3, X largest

[Figure: box plot on a scale from 4 to 12, marking X smallest, Q1, Median, Q3, and X largest]

Shape & Box Plot

[Figure: box plots for left-skewed, symmetric, and right-skewed distributions, each marking Q1, Median, and Q3]

Graphing Bivariate Relationships

• Describes a relationship between two quantitative variables
• Plot the data in a Scattergram

[Figure: three scattergrams of y versus x showing a positive relationship, a negative relationship, and no relationship]

Scattergram Example

• You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $ (x)   Sales (Units) (y)
1          1
2          1
3          2
4          2
5          4

• Draw a scattergram of the data

Scattergram Example

[Figure: scattergram of Sales (0 to 4) versus Advertising (0 to 5)]

Time Series Plot

• Used to graphically display data produced over time
• Shows trends and changes in the data over time
• Time recorded on the horizontal axis
• Measurements recorded on the vertical axis
• Points connected by straight lines

Time Series Plot Example

• The following data shows the average retail price of regular gasoline in New York City for 8 weeks in 2006.
• Draw a time series plot for this data.

Date            Average Price
Oct 16, 2006    $2.219
Oct 23, 2006    $2.173
Oct 30, 2006    $2.177
Nov 6, 2006     $2.158
Nov 13, 2006    $2.185
Nov 20, 2006    $2.208
Nov 27, 2006    $2.236
Dec 4, 2006     $2.298

Time Series Plot Example

[Figure: time series plot of Price ($2.05 to $2.35) by Date, 10/16 to 12/4]

Distorting the Truth with Descriptive Techniques

Errors in Presenting Data

1. Using ‘chart junk’
2. No relative basis in comparing data batches
3. Compressing the vertical axis
4. No zero point on the vertical axis

‘Chart Junk’

[Figure: Minimum Wage, bad presentation vs. good presentation. Values shown: 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80. The good presentation is a plain graph of $ (0 to 4) by year, 1960 to 1990.]

No Relative Basis

[Figure: A’s by Class. Bad presentation: frequency (0 to 300) by class (FR, SO, JR, SR). Good presentation: percent (0% to 30%) by class.]

Compressing Vertical Axis

[Figure: Quarterly Sales ($) for Q1 to Q4. Bad presentation: compressed vertical axis from 0 to 200. Good presentation: vertical axis from 0 to 50.]

No Zero Point on Vertical Axis

[Figure: Monthly Sales ($) for J, M, M, J, S, N. Good presentation: vertical axis from 0 to 60. Bad presentation: vertical axis from 36 to 45, with no zero point.]

Conclusion

1. Described Qualitative Data Graphically
2. Described Numerical Data Graphically
3. Explained Numerical Data Properties
4. Described Summary Measures
5. Analyzed Numerical Data Using Summary Measures
