Business Statistics: Graphs, Charts, and Tables – Describing Your Data, Dr. M. Raghunadh Acharya, 06/13/22

Basic Stat Notes


Page 1: Basic Stat Notes

Business Statistics

Graphs, Charts, and Tables – Describing Your Data

Dr. M. Raghunadh Acharya, 04/15/23

Page 2: Basic Stat Notes


Contents …

• Construct a frequency distribution both manually and with a computer

• Construct and interpret a histogram

• Create and interpret bar charts, pie charts, and stem-and-leaf diagrams

• Present and interpret data in line charts and scatter diagrams

Page 3: Basic Stat Notes

Frequency Distributions

What is a Frequency Distribution?

• A frequency distribution is a list or a table …

• containing the values of a variable (or a set of ranges within which the data falls) ...

• and the corresponding frequencies with which each value occurs (or frequencies with which data falls within each range)

Page 4: Basic Stat Notes

Why Use Frequency Distributions?

• A frequency distribution is a way to summarize data

• The distribution condenses the raw data into a more useful form...

• and allows for a quick visual interpretation of the data


Page 5: Basic Stat Notes

Frequency Distribution: Discrete Data

• Discrete data: possible values are countable

Example: An advertiser asks 200 customers how many days per week they read the daily newspaper.

Number of days read    Frequency
0                      44
1                      24
2                      18
3                      16
4                      20
5                      22
6                      26
7                      30
Total                  200

Page 6: Basic Stat Notes

Relative Frequency: What proportion is in each category?

Number of days read    Frequency    Relative Frequency
0                      44           .22
1                      24           .12
2                      18           .09
3                      16           .08
4                      20           .10
5                      22           .11
6                      26           .13
7                      30           .15
Total                  200          1.00

Relative frequency for 0 days: 44/200 = .22

22% of the people in the sample report that they read the newspaper 0 days per week
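The relative-frequency column can be reproduced with a few lines of Python. This is a minimal sketch using the survey counts from the slide; the variable names are illustrative only.

```python
# Frequencies from the newspaper-readership survey (200 customers).
frequencies = {0: 44, 1: 24, 2: 18, 3: 16, 4: 20, 5: 22, 6: 26, 7: 30}

total = sum(frequencies.values())

# Relative frequency = class frequency / total number of observations.
relative = {days: count / total for days, count in frequencies.items()}

for days, count in frequencies.items():
    print(f"{days}  {count:>3}  {relative[days]:.2f}")
print(f"Total {total}  {sum(relative.values()):.2f}")
```

The relative frequencies always sum to 1, which is a quick check that no category was missed.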

Page 7: Basic Stat Notes


Frequency Distribution: Continuous Data

• Continuous Data: may take on any value in some interval

Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature

24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

(Temperature is a continuous variable because it could be measured to any degree of precision desired)


Page 8: Basic Stat Notes

Grouping Data by Classes

Sort raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

• Find range: 58 - 12 = 46

• Select number of classes: 5 (usually between 5 and 20)

• Compute class width: 10 (46/5 = 9.2, then round up)

• Determine class boundaries: 10, 20, 30, 40, 50, 60

• Compute class midpoints: 15, 25, 35, 45, 55

• Count observations & assign to classes

Page 9: Basic Stat Notes

Frequency Distribution Example

Data in ordered array:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Frequency Distribution

Class              Frequency    Relative Frequency
10 but under 20    3            .15
20 but under 30    6            .30
30 but under 40    5            .25
40 but under 50    4            .20
50 but under 60    2            .10
Total              20           1.00
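The grouping procedure can be sketched in Python as follows. The class lower limits (10 to 50) and the width of 10 are taken from the slides; the counting loop and variable names are just one illustrative way to assign observations to "lower but under upper" classes.

```python
# Daily high temperatures for the 20 sampled winter days.
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

lower_limits = [10, 20, 30, 40, 50]  # class lower boundaries
width = 10                           # class width

# Count how many observations fall in each class [low, low + width).
counts = {low: 0 for low in lower_limits}
for t in temps:
    for low in lower_limits:
        if low <= t < low + width:
            counts[low] += 1
            break

n = len(temps)
for low in lower_limits:
    print(f"{low} but under {low + width}: {counts[low]}  {counts[low] / n:.2f}")
```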

Page 10: Basic Stat Notes


Histograms

• The classes or intervals are shown on the horizontal axis

• frequency is measured on the vertical axis

• Bars of the appropriate heights can be used to represent the number of observations within each class

• Such a graph is called a histogram


Page 11: Basic Stat Notes

Histogram Example

Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

[Figure: histogram of the temperature data. Horizontal axis: Class Midpoints (5, 15, 25, 35, 45, 55, More); vertical axis: Frequency (0 to 7). Bar heights: 0, 3, 6, 5, 4, 2, 0.]

No gaps between bars, since continuous data

Page 12: Basic Stat Notes

Questions for Grouping Data into Classes

• 1. How wide should each interval be? (How many classes should be used?)

• 2. How should the endpoints of the intervals be determined?

• Often answered by trial and error, subject to user judgment

• The goal is to create a distribution that is neither too "jagged" nor too "blocky"

• The goal is to show the pattern of variation in the data appropriately

Page 13: Basic Stat Notes

How Many Class Intervals?

• Many (Narrow class intervals)
  – may yield a very jagged distribution with gaps from empty classes
  – can give a poor indication of how frequency varies across classes

• Few (Wide class intervals)
  – may compress variation too much and yield a blocky distribution
  – can obscure important patterns of variation

[Figure: two histograms of Frequency against Temperature for the same data. One uses wide classes (upper endpoints 0, 30, 60, More); the other uses narrow classes (upper endpoints in steps of 4 up to 60, then More).]

(X axis labels are upper class endpoints)

Page 14: Basic Stat Notes

General Guidelines

Number of Data Points    Number of Classes
under 50                 5 - 7
50 – 100                 6 - 10
100 – 250                7 - 12
over 250                 10 - 20

– Class widths can typically be reduced as the number of observations increases

– Distributions with numerous observations are more likely to be smooth and have gaps filled since data are plentiful

Page 15: Basic Stat Notes

Class Width

• The class width is the distance between the lowest possible value and the highest possible value for a frequency class

• The minimum class width is

$$W = \frac{\text{Largest Value} - \text{Smallest Value}}{\text{Number of Classes}}$$
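As a quick illustration, the minimum class width for the temperature data used earlier in these notes can be computed as below. The helper name `min_class_width` is hypothetical, and rounding up to the next whole number is only one convenient convention.

```python
import math

def min_class_width(values, num_classes):
    """Minimum class width: (largest value - smallest value) / number of classes."""
    return (max(values) - min(values)) / num_classes

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

w = min_class_width(temps, 5)   # (58 - 12) / 5 = 9.2
chosen_width = math.ceil(w)     # round up to a convenient width of 10

print(w, chosen_width)
```

Rounding up rather than down ensures that the chosen number of classes still covers the whole range of the data.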

Page 16: Basic Stat Notes

Histograms in Excel

1. Select Tools / Data Analysis

Page 17: Basic Stat Notes

Histograms in Excel (continued)

2. Choose Histogram

3. Input data and bin ranges

Select Chart Output

Page 18: Basic Stat Notes


Stem and Leaf Diagram

• A simple way to see distribution details in a data set

METHOD: Separate the sorted data series into leading digits (the stem) and the trailing digits (the leaves)


Page 19: Basic Stat Notes

Example:

Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

• Here, use the 10's digit for the stem unit:

• 12 is shown as: Stem 1, Leaf 2

• 35 is shown as: Stem 3, Leaf 5

Page 20: Basic Stat Notes

Example:

Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

• Completed Stem-and-leaf diagram:

Stem    Leaves
1       2 3 7
2       1 4 4 6 7 7
3       0 2 5 7 8
4       1 3 4 6
5       3 8
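A stem-and-leaf diagram with the 10's digit as the stem can be built programmatically. The sketch below is one simple way to do it; `divmod(value, 10)` splits each value into its stem and leaf. Since the data contain 27 twice, the leaves for stem 2 come out as 1 4 4 6 7 7.

```python
from collections import defaultdict

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

# Map each stem (10's digit) to its list of leaves (unit digits).
stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))
```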

Page 21: Basic Stat Notes

Using other stem units

• Using the 100's digit as the stem:

– Round off the 10's digit to form the leaves

Stem    Leaf
6       1        (613)
7       8        (776)
. . .
12      2        (1224)

Page 22: Basic Stat Notes

Graphing Categorical Data

Categorical data can be graphed with:

• Pie Charts

• Pareto Diagram

• Bar Charts

Page 23: Basic Stat Notes


Bar and Pie Charts

• Bar charts and Pie charts are often used for qualitative (category) data

• Height of bar or size of pie slice shows the frequency or percentage for each category


Page 24: Basic Stat Notes

Pie Chart Example

Current Investment Portfolio

Investment Type    Amount (in thousands $)    Percentage
Stocks             46.5                       42.27
Bonds              32.0                       29.09
CD                 15.5                       14.09
Savings            16.0                       14.55
Total              110                        100

(Variables are Qualitative)

[Figure: pie chart of the current investment portfolio. Slices: Stocks 42%, Bonds 29%, Savings 15%, CD 14%.]

Percentages are rounded to the nearest percent
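The percentages behind the pie slices are each category's share of the total. A minimal sketch of that arithmetic, using the portfolio amounts from the table (names are illustrative):

```python
# Amounts invested, in thousands of dollars.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(amounts.values())

# Percentage of the portfolio held in each investment type.
percentages = {name: 100 * amount / total for name, amount in amounts.items()}

for name, pct in percentages.items():
    # Two decimals for the table, nearest whole percent for the pie labels.
    print(f"{name:<8} {pct:6.2f}%  (about {round(pct)}%)")
```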

Page 25: Basic Stat Notes

Bar Chart Example

[Figure: horizontal bar chart titled "Investor's Portfolio". Categories: Stocks, Bonds, CD, Savings; horizontal axis: Amount in $1000's (0 to 50).]

Page 26: Basic Stat Notes

Pareto Diagram Example

[Figure: Pareto diagram for the categories Stocks, Bonds, Savings, CD. Bar graph: % invested in each category (axis 0% to 45%); line graph: cumulative % invested (axis 0% to 100%).]

Page 27: Basic Stat Notes

Bar Chart Example

[Figure: bar chart titled "Newspaper readership per week". Horizontal axis: Number of days newspaper is read per week (0 to 7); vertical axis: Frequency (0 to 50).]

Number of days read    Frequency
0                      44
1                      24
2                      18
3                      16
4                      20
5                      22
6                      26
7                      30
Total                  200

Page 28: Basic Stat Notes

Tabulating and Graphing Multivariate Categorical Data

• Investment in thousands of dollars

Investment Category    Investor A    Investor B    Investor C    Total
Stocks                 46.5          55            27.5          129
Bonds                  32.0          44            19.0          95
CD                     15.5          20            13.5          49
Savings                16.0          28            7.0           51
Total                  110.0         147           67.0          324

Page 29: Basic Stat Notes

Tabulating and Graphing Multivariate Categorical Data (continued)

• Side by side charts

[Figure: side-by-side horizontal bar chart titled "Comparing Investors". Categories: Stocks, Bonds, CD, Savings; series: Investor A, Investor B, Investor C; horizontal axis 0 to 60.]

Page 30: Basic Stat Notes

Side-by-Side Chart Example

• Sales by quarter for three sales territories:

         1st Qtr    2nd Qtr    3rd Qtr    4th Qtr
East     20.4       27.4       59         20.4
West     30.6       38.6       34.6       31.6
North    45.9       46.9       45         43.9

[Figure: side-by-side bar chart of the sales above by quarter, with series East, West, North; vertical axis 0 to 60.]

Page 31: Basic Stat Notes

Line Charts and Scatter Diagrams

• Line charts show values of one variable vs. time
  – Time is traditionally shown on the horizontal axis

• Scatter Diagrams show points for bivariate data
  – one variable is measured on the vertical axis and the other variable is measured on the horizontal axis

Page 32: Basic Stat Notes

Line Chart Example

[Figure: line chart titled "U.S. Inflation Rate". Horizontal axis: Year (1984 to 2002); vertical axis: Inflation Rate (%) (0 to 6).]

Year    Inflation Rate
1985    3.56
1986    1.86
1987    3.65
1988    4.14
1989    4.82
1990    5.40
1991    4.21
1992    3.01
1993    2.99
1994    2.56
1995    2.83
1996    2.95
1997    2.29
1998    1.56
1999    2.21
2000    3.36
2001    2.85
2002    1.58

Page 33: Basic Stat Notes

Scatter Diagram Example

[Figure: scatter diagram titled "Production Volume vs. Cost per Day". Horizontal axis: Volume per Day (0 to 70); vertical axis: Cost per Day (0 to 250).]

Volume per day    Cost per day
23                125
26                140
29                146
33                160
38                167
42                170
50                188
55                195
60                200

Page 34: Basic Stat Notes

Types of Relationships

• Linear Relationships

[Figure: two scatter diagrams of Y against X showing linear relationships.]

Page 35: Basic Stat Notes

Types of Relationships (continued)

• Curvilinear Relationships

[Figure: two scatter diagrams of Y against X showing curvilinear relationships.]

Page 36: Basic Stat Notes

Types of Relationships (continued)

• No Relationship

[Figure: two scatter diagrams of Y against X showing no relationship.]

Page 37: Basic Stat Notes

Chapter Summary

• Data in raw form are usually not easy to use for decision making. Some type of organization is needed:
  – Table
  – Graph

• Techniques reviewed in this chapter:
  – Frequency Distributions and Histograms
  – Bar Charts and Pie Charts
  – Stem and Leaf Diagrams
  – Line Charts and Scatter Diagrams

Page 38: Basic Stat Notes

Summarization measures …..

Summarization measures represent a data set by a single number or a few numbers. They help to describe the data and to compare data sets. Population measures can be estimated from the summary measures of a sample.

The different measures used to represent the data are as follows:

1. Measures of Center and Location: mean, median, mode, geometric mean, midrange

2. Other Measures of Location: weighted mean, percentiles, quartiles

3. Measures of Variation: range, interquartile range, variance and standard deviation, coefficient of variation

Page 39: Basic Stat Notes

Summary Measures

Describing Data Numerically

• Center and Location: Mean, Median, Mode, Weighted Mean

• Other Measures of Location: Percentiles, Quartiles

• Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

Page 40: Basic Stat Notes

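Overview: Measures of Center and Location

Center and Location: Mean, Median, Mode, Weighted Mean

Mean:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Weighted Mean:

$$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} \qquad \mu_W = \frac{\sum w_i x_i}{\sum w_i}$$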

Page 41: Basic Stat Notes

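Mean (Arithmetic Average)

• The Mean is the arithmetic average of data values

– Sample mean

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

– Population mean

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$

n = Sample Size

N = Population Size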

Page 42: Basic Stat Notes

Mean (Arithmetic Average)

• The most common measure of central tendency

• Mean = sum of values divided by the number of values

• Affected by extreme values (outliers)

Values 1, 2, 3, 4, 5 (Mean = 3):

$$\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$$

Values 1, 2, 3, 4, 10 (Mean = 4):

$$\frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$$

Page 43: Basic Stat Notes

Median

• Not affected by extreme values

• In an ordered array, the median is the "middle" number
  – If n or N is odd, the median is the middle number
  – If n or N is even, the median is the average of the two middle numbers

[Figure: two dot plots on a 0 to 10 scale, each with Median = 3.]

Page 44: Basic Stat Notes

Mode

• A measure of central tendency

• Value that occurs most often

• Not affected by extreme values

• Used for either numerical or categorical data

• There may be no mode

• There may be several modes

[Figure: two dot plots. On a 0 to 14 scale, Mode = 5; on a 0 to 6 scale, No Mode.]

Page 45: Basic Stat Notes

Weighted Mean

• Used when values are grouped by frequency or relative importance

Example: Sample of 26 Repair Projects

Days to Complete    Frequency
5                   4
6                   12
7                   8
8                   2

Weighted Mean Days to Complete:

$$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31 \text{ days}$$
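The weighted-mean calculation for the repair projects can be checked with a short Python sketch, where the frequencies act as the weights (variable names are illustrative):

```python
days = [5, 6, 7, 8]       # x_i: days to complete
weights = [4, 12, 8, 2]   # w_i: number of projects taking that many days

# Weighted mean = sum(w_i * x_i) / sum(w_i)
weighted_sum = sum(w * x for w, x in zip(weights, days))
weighted_mean = weighted_sum / sum(weights)

print(weighted_sum, sum(weights), round(weighted_mean, 2))
```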

Page 46: Basic Stat Notes

Review Example

• Five houses on a hill by the beach

House Prices:

$2,000,000 500,000 300,000 100,000 100,000

Page 47: Basic Stat Notes

Summary Statistics

House Prices:

$2,000,000 500,000 300,000 100,000 100,000

Sum 3,000,000

• Mean: ($3,000,000/5) = $600,000

• Median: middle value of ranked data = $300,000

• Mode: most frequent value = $100,000
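These three summary statistics can be reproduced with Python's standard `statistics` module. A minimal sketch using the five house prices:

```python
import statistics

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean = statistics.mean(prices)      # sum of values / number of values
median = statistics.median(prices)  # middle value of the ranked data
mode = statistics.mode(prices)      # most frequent value

print(mean, median, mode)
```

The single $2,000,000 house pulls the mean well above the median, which is why the median is often preferred when outliers are present.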

Page 48: Basic Stat Notes

Which measure of location is the "best"?

• Mean is generally used, unless extreme values (outliers) exist

• Then median is often used, since the median is not sensitive to extreme values.
  – Example: Median home prices may be reported for a region, since they are less sensitive to outliers

Page 49: Basic Stat Notes

Shape of a Distribution

• Describes how data is distributed

• Symmetric or skewed

Left-Skewed (Longer tail extends to left): Mean < Median < Mode

Symmetric: Mean = Median = Mode

Right-Skewed (Longer tail extends to right): Mode < Median < Mean

Page 50: Basic Stat Notes

Other Location Measures

Other Measures of Location: Percentiles and Quartiles

Percentiles: the pth percentile in a data array (where 0 ≤ p ≤ 100):

• p% are less than or equal to this value

• (100 – p)% are greater than or equal to this value

Quartiles:

• 1st quartile = 25th percentile

• 2nd quartile = 50th percentile = median

• 3rd quartile = 75th percentile

Page 51: Basic Stat Notes

Percentiles

• The pth percentile in an ordered array of n values is the value in ith position, where

$$i = \frac{p}{100}(n + 1)$$

• Example: The 60th percentile in an ordered array of 19 values is the value in 12th position:

$$i = \frac{p}{100}(n + 1) = \frac{60}{100}(19 + 1) = 12$$

Page 52: Basic Stat Notes

Quartiles

• Quartiles split the ranked data into 4 equal groups: 25% | Q1 | 25% | Q2 | 25% | Q3 | 25%

• Example: Find the first quartile

Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22 (n = 9)

Q1 = 25th percentile, so find the $\frac{25}{100}(9 + 1) = 2.5$ position,

so use the value half way between the 2nd and 3rd values,

so Q1 = 12.5
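The position rule i = (p/100)(n + 1) can be sketched in Python as below. The function name `percentile` is hypothetical. Using linear interpolation for any fractional position is an assumption that generalizes the "half way between" step in the example, and clamping out-of-range positions to the ends is also an assumption.

```python
def percentile(sorted_values, p):
    """Value at 1-based position i = (p / 100) * (n + 1) in an ordered array."""
    n = len(sorted_values)
    position = p / 100 * (n + 1)
    lower = int(position)          # whole part of the position
    fraction = position - lower    # fractional part, e.g. 0.5 for position 2.5
    if lower < 1:                  # assumption: clamp to the smallest value
        return sorted_values[0]
    if lower >= n:                 # assumption: clamp to the largest value
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1 = percentile(data, 25)  # position 2.5: half way between 12 and 13
print(q1)
```

Other textbooks and software packages use slightly different position rules, so quartiles reported elsewhere may differ a little from these.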

Page 53: Basic Stat Notes

Box and Whisker Plot

• A Graphical display of data using 5-number summary:

Minimum -- Q1 -- Median -- Q3 -- Maximum

Example:

[Figure: box and whisker plot marking the Minimum, 1st Quartile, Median, 3rd Quartile and Maximum, with 25% of the data in each of the four segments.]

Page 54: Basic Stat Notes

Shape of Box and Whisker Plots

• The Box and central line are centered between the endpoints if data is symmetric around the median

• A Box and Whisker plot can be shown in either vertical or horizontal format

Page 55: Basic Stat Notes

Distribution Shape and Box and Whisker Plot

[Figure: three box and whisker plots, each marking Q1, Q2 and Q3, for Left-Skewed, Symmetric and Right-Skewed distributions.]

Page 56: Basic Stat Notes

Box-and-Whisker Plot Example

• Below is a Box-and-Whisker plot for the following data:

0 2 2 2 3 3 4 5 5 10 27

Min = 0, Q1 = 2, Q2 = 3, Q3 = 5, Max = 27

• This data is very right skewed, as the plot depicts
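The five-number summary for this data set can be computed with the standard library. This sketch relies on `statistics.quantiles` (Python 3.8+), whose default "exclusive" method is based on the same (n + 1) position rule used in these notes; the variable names are illustrative.

```python
import statistics

data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)
```

The long gap between Q3 and the maximum, compared with the short gap between the minimum and Q1, is what marks the data as right skewed.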

Page 57: Basic Stat Notes

Measures of Variation

Variation:

• Range

• Interquartile Range

• Variance (Population Variance, Sample Variance)

• Standard Deviation (Population Standard Deviation, Sample Standard Deviation)

• Coefficient of Variation

Page 58: Basic Stat Notes

Variation

• Measures of variation give information on the spread or variability of the data values.

[Figure: distributions with the same center, different variation.]

Page 59: Basic Stat Notes

Range

• Difference between the largest and the smallest observations.

$$\text{Range} = x_{\text{maximum}} - x_{\text{minimum}}$$

Example:

[Figure: dot plot on a 0 to 14 scale.]

Range = 14 - 1 = 13

Page 60: Basic Stat Notes

Disadvantages of the Range

• Ignores the way in which data are distributed

[Figure: two dot plots on a 7 to 12 scale with different distributions; in both, Range = 12 - 7 = 5.]

• Sensitive to outliers

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

Range = 120 - 1 = 119

Page 61: Basic Stat Notes

Interquartile Range

• Can eliminate some outlier problems by using the Interquartile range

• Eliminate some high- and low-valued observations and calculate the range from the remaining values.

• Interquartile range = 3rd quartile – 1st quartile

Page 62: Basic Stat Notes

Interquartile Range

Example:

X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70 (25% of the data in each of the four segments)

Interquartile range = 57 – 30 = 27

Page 63: Basic Stat Notes

Variance

• Average of squared deviations of values from the mean

– Sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

– Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Page 64: Basic Stat Notes

Standard Deviation

• Most commonly used measure of variation

• Shows variation about the mean

• Has the same units as the original data

– Sample standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

– Population standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

Page 65: Basic Stat Notes

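Calculation Example: Sample Standard Deviation

Sample Data (Xi) : 10 12 14 15 17 18 18 24

n = 8 Mean = x̄ = 16

$$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}}$$

$$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$$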
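The sample standard deviation for this data can be verified step by step in Python. Summing the eight squared deviations from the mean of 16 gives 36 + 16 + 4 + 1 + 1 + 4 + 4 + 64 = 130, so s = sqrt(130/7), which is about 4.31. The sketch below computes it by hand and cross-checks against `statistics.stdev`.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
mean = sum(data) / n

# Squared deviation of each value from the sample mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample variance divides by n - 1, not n.
sample_variance = sum(squared_deviations) / (n - 1)
s = math.sqrt(sample_variance)

print(mean, sum(squared_deviations), round(s, 4))
print(math.isclose(s, statistics.stdev(data)))
```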

Page 66: Basic Stat Notes

Comparing Standard Deviations

[Figure: three dot plots on an 11 to 21 scale, all with the same mean but different spread.]

Data A: Mean = 15.5, s = 3.338

Data B: Mean = 15.5, s = .9258

Data C: Mean = 15.5, s = 4.57

Page 67: Basic Stat Notes

Coefficient of Variation

• Measures relative variation

• Always in percentage (%)

• Shows variation relative to mean

• Is used to compare two or more sets of data measured in different units

Population:

$$CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$$

Sample:

$$CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$$

Page 68: Basic Stat Notes

Comparing Coefficient of Variation

• Stock A:
  – Average price last year = $50
  – Standard deviation = $5

$$CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$

• Stock B:
  – Average price last year = $100
  – Standard deviation = $5

$$CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$

Both stocks have the same standard deviation, but stock B is less variable relative to its price
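The comparison of the two stocks can be reproduced with a small helper. The function name `coefficient_of_variation` is hypothetical; it simply applies the sample formula CV = (s / x̄) · 100%.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(5, 50)    # Stock A: s = $5, mean = $50
cv_b = coefficient_of_variation(5, 100)   # Stock B: s = $5, mean = $100

print(f"CV_A = {cv_a}%, CV_B = {cv_b}%")
```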

Page 69: Basic Stat Notes

The Empirical Rule

• If the data distribution is bell-shaped, then the interval:

• $\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample

[Figure: bell-shaped curve over X centered at μ, with 68% of the area between μ − 1σ and μ + 1σ.]

Page 70: Basic Stat Notes

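The Empirical Rule

• $\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample

• $\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample

[Figure: two bell-shaped curves, one with 95% of the area between μ − 2σ and μ + 2σ, the other with 99.7% of the area between μ − 3σ and μ + 3σ.]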

Page 71: Basic Stat Notes

Tchebysheff's Theorem

• Regardless of how the data are distributed, at least $(1 - 1/k^2)$ of the values will fall within k standard deviations of the mean

• Examples:

At least                    within
$(1 - 1/1^2)$ = 0%          k=1 (μ ± 1σ)
$(1 - 1/2^2)$ = 75%         k=2 (μ ± 2σ)
$(1 - 1/3^2)$ = 89%         k=3 (μ ± 3σ)
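The bound in the theorem is easy to tabulate. The sketch below evaluates 1 − 1/k² for k = 1, 2, 3; the function name `tchebysheff_lower_bound` is hypothetical.

```python
def tchebysheff_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    return 1 - 1 / k ** 2

bounds = {k: tchebysheff_lower_bound(k) for k in (1, 2, 3)}

for k, bound in bounds.items():
    print(f"k = {k}: at least {bound:.0%} of values within {k} standard deviation(s)")
```

For k = 1 the bound is 0%, so the theorem only becomes informative for k greater than 1.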

Page 72: Basic Stat Notes

Standardized Data Values

• A standardized data value refers to the number of standard deviations a value is from the mean

• Standardized data values are sometimes referred to as z-scores

Page 73: Basic Stat Notes

Standardized Population Values

$$z = \frac{x - \mu}{\sigma}$$

where:

• x = original data value

• μ = population mean

• σ = population standard deviation

• z = standard score (number of standard deviations x is from μ)

Page 74: Basic Stat Notes

Standardized Sample Values

$$z = \frac{x - \bar{x}}{s}$$

where:

• x = original data value

• x̄ = sample mean

• s = sample standard deviation

• z = standard score (number of standard deviations x is from x̄)

Remark: The standardized sample values are used for constructing the confidence limits for the population parameters.
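A z-score can be computed the same way for a population or a sample; only the mean and standard deviation passed in differ. In this sketch the helper name `z_score` and the population figures (μ = 50, σ = 5, x = 60) are hypothetical values chosen for illustration. The sample case reuses the eight-value data set from the standard deviation example.

```python
import statistics

def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Population case with assumed (hypothetical) parameters.
z_population = z_score(60, 50, 5)

# Sample case: standardize the largest value using x-bar and s.
sample = [10, 12, 14, 15, 17, 18, 18, 24]
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)
z_sample = z_score(24, x_bar, s)

print(z_population, round(z_sample, 3))
```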