MBA7025_04.ppt/Jan 27, 2015/Page 1 Georgia State University - Confidential MBA 7025 Statistical Business Analysis Descriptive Statistics Jan 27, 2015

Page 1:

MBA 7025

Statistical Business Analysis

Descriptive Statistics

Jan 27, 2015

Page 2:

Agenda

Central Limit Theorem

Descriptive Summary Measures

1. Measures of Central Location: Mean, Median, Mode

2. Measures of Variation: The Range, Percentile, Variance and Standard Deviation

3. Measures of Association: Coefficient of Variation

Confidence Interval

Page 3:

Mean

1. It is the Arithmetic Average of data values:

Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$

2. The Most Common Measure of Central Tendency

3. Affected by Extreme Values (Outliers)

[Figure: two number lines, one spanning 0 to 10 with Mean = 5 and one extending to 14 with Mean = 6]

Page 4:

Median

1. Important Measure of Central Tendency

2. In an ordered array, the median is the “middle” number.
• If n is odd, the median is the middle number.
• If n is even, the median is the average of the 2 middle numbers.

3. Not Affected by Extreme Values

[Figure: two number lines, one spanning 0 to 10 and one extending to 14, both with Median = 5]
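The contrast between the mean and the median can be checked numerically. The sketch below uses two hypothetical data sets (the slides do not list the plotted values) chosen so that they reproduce the summary values shown: means of 5 and 6, and a median of 5 in both cases.

```python
from statistics import mean, median

# Hypothetical data sets, chosen only to match the slides' summary values.
base = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]  # same data, largest value pushed out to 14

summary = {
    "base": (mean(base), median(base)),
    "with_outlier": (mean(with_outlier), median(with_outlier)),
}

for name, (m, med) in summary.items():
    print(f"{name}: mean = {m}, median = {med}")
```

Moving one value from 9 to 14 shifts the mean from 5 to 6, while the median stays at 5.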

Page 5:

Mode

1. A Measure of Central Tendency

2. Value that Occurs Most Often

3. Not Affected by Extreme Values

4. There May Not be a Mode

5. There May be Several Modes

6. Used for Either Numerical or Categorical Data

[Figure: a number line from 0 to 14 with Mode = 9, and a number line from 0 to 6 with No Mode]
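The cases listed for the mode (one mode, no mode, several modes, categorical data) can be illustrated with a small helper. This is a minimal sketch: the function name `modes` and all four data sets are hypothetical, and treating "every value occurs once" as "no mode" is a convention assumed here to match the slide.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats.

    Assumes `values` is non-empty.
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs exactly once: treated as "no mode"
    return [v for v, c in counts.items() if c == top]

one_mode = modes([0, 3, 9, 9, 9, 12, 14])          # a single most frequent value
no_mode = modes([0, 1, 2, 3, 4, 5, 6])             # nothing repeats
two_modes = modes([1, 1, 2, 4, 4])                 # a tie between two values
colour_mode = modes(["red", "blue", "red", "green"])  # categorical data
```

Because the helper only counts occurrences, it works equally well on numbers and on labels.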

Page 6:

Shape

• Describes How Data Are Distributed

• Measures of Shape: Symmetric or skewed

Left-Skewed: Mean < Median < Mode

Symmetric: Mean = Median = Mode

Right-Skewed: Mode < Median < Mean

Page 7:

Agenda

Central Limit Theorem

Descriptive Summary Measures

1. Measures of Central Location: Mean, Median, Mode

2. Measures of Variation: The Range, Percentile, Variance and Standard Deviation

3. Measures of Association: Coefficient of Variation

Confidence Interval

Page 8:

The Range

• Measure of Variation

• Difference Between Largest & Smallest Observations: $\text{Range} = x_{\text{Largest}} - x_{\text{Smallest}}$

• Ignores How Data Are Distributed:

[Figure: two data sets plotted on number lines from 7 to 12, each with Range = 12 - 7 = 5]

Page 9:

Percentile

1. Arrange data in ascending order.

2. The middle number is the median.

3. The number halfway to the median is the first quartile.

4. The number halfway past the median is the 3rd quartile.

5. A number with (no more than) 66% of the values less than it is the 66th percentile, and so forth.

Page 10:

Percentile

2008 Olympic Medal Tally for top 55 nations.

Obs  Medals    Obs  Medals    Obs  Medals    Obs  Medals    Obs  Medals
  1     110     12      24     23      10     34       6     45       3
  2     100     13      19     24       9     35       6     46       3
  3      72     14      18     25       8     36       6     47       2
  4      47     15      18     26       8     37       5     48       2
  5      46     16      16     27       7     38       5     49       2
  6      41     17      15     28       7     39       5     50       2
  7      40     18      14     29       7     40       4     51       2
  8      31     19      13     30       6     41       4     52       1
  9      28     20      11     31       6     42       4     53       1
 10      27     21      10     32       6     43       4     54       1
 11      25     22      10     33       6     44       3     55       1

What is the percentile score for a country with 9 medals? What is the 50th percentile?

Page 11:

Percentile Solutions

Order all data (ascending or descending).

1. Country with 9 medals ranks 24th out of 55. There are 31 nations (56.36%) below it and 23 nations (41.82%) above it. Hence it can be considered a 57th or 58th percentile score.

2. The medal tally that corresponds to a 50th percentile is the one in the middle of the group, or the 28th country, with 7 medals. Hence the 50th percentile (Median) is 7.
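The two answers can be reproduced by counting directly. The sketch below enters the 55 medal counts from the tally and computes the share of nations below and above 9 medals, plus the median; the variable names are illustrative only.

```python
from statistics import median

# 2008 medal counts for the 55 nations in the tally, in rank order.
medals = [110, 100, 72, 47, 46, 41, 40, 31, 28, 27, 25,
          24, 19, 18, 18, 16, 15, 14, 13, 11, 10, 10,
          10, 9, 8, 8, 7, 7, 7, 6, 6, 6, 6,
          6, 6, 6, 5, 5, 5, 4, 4, 4, 4, 3,
          3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1]

n = len(medals)
below = sum(1 for m in medals if m < 9)   # nations with fewer than 9 medals
above = sum(1 for m in medals if m > 9)   # nations with more than 9 medals
pct_below = 100 * below / n
pct_above = 100 * above / n
fiftieth = median(medals)                 # middle (28th) value of the 55

print(f"below: {below} ({pct_below:.2f}%), above: {above} ({pct_above:.2f}%)")
print(f"50th percentile (median): {fiftieth}")
```

With 56.36% of nations below and 41.82% above, a country with 9 medals sits between the 57th and 58th percentiles, as stated above.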

Page 12:

Box Plot

[Figure: box plot marking, from left to right, Smallest, Q1, Median, Q3, Largest]

Page 13:

Variance

• Important Measure of Variation

• Shows Variation About the Mean:

• For the Population: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$, or $\sigma^2 = \frac{SS}{N}$

• For the Sample: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$, or $s^2 = \frac{SS}{n - 1}$

For the Population: use N in the denominator.

For the Sample: use n - 1 in the denominator.

Page 14:

Standard Deviation

• Most Important Measure of Variation

• Shows Variation About the Mean:

• For the Population: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$, or $\sigma = \sqrt{\frac{SS}{N}}$

• For the Sample: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$, or $s = \sqrt{\frac{SS}{n - 1}}$

For the Population: use N in the denominator.

For the Sample: use n - 1 in the denominator.

Page 15:

Computing Standard Deviation

Computing Sample Variance and Standard Deviation

Mean of X = 6

X    Deviation From Mean    Squared
3            -3                9
4            -2                4
6             0                0
8             2                4
9             3                9

Sum of Squares (SS) = 26
Variance = SS/(n-1) = 6.50
Stdev = Sqrt(Variance) = 2.55
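The same computation can be written out step by step. This sketch uses the five X values from the worked table and cross-checks the hand calculation against Python's `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
from math import sqrt
from statistics import stdev, variance

x = [3, 4, 6, 8, 9]

n = len(x)
x_bar = sum(x) / n                        # mean of X
ss = sum((xi - x_bar) ** 2 for xi in x)   # sum of squared deviations (SS)
sample_var = ss / (n - 1)                 # sample variance: divide by n - 1
sample_sd = sqrt(sample_var)              # sample standard deviation

print(f"mean = {x_bar}, SS = {ss}, variance = {sample_var}, stdev = {sample_sd:.2f}")

# Library cross-check: both functions use the n - 1 denominator.
assert abs(sample_var - variance(x)) < 1e-12
assert abs(sample_sd - stdev(x)) < 1e-12
```

For population figures, the corresponding library calls are `statistics.pvariance` and `statistics.pstdev`, which divide by N.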

Page 16:

The Normal Distribution

A property of normally distributed data is as follows:

Distance from Mean          Percent of observations included in that range
± 1 standard deviation      Approximately 68%
± 2 standard deviations     Approximately 95%
± 3 standard deviations     Approximately 99.7%
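These percentages follow from the standard normal cumulative distribution. A minimal sketch using `statistics.NormalDist` from the Python standard library (version 3.8 or later) computes the share of observations within 1, 2 and 3 standard deviations of the mean:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of a normal population lying within k standard deviations of the mean.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within ±{k} standard deviation(s): {share:.2%}")
```

The shares come out at roughly 68.3%, 95.4% and 99.7%, which is where the rounded figures in the table originate.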

Page 17:

Comparing Standard Deviations

[Figure: three data sets plotted on number lines from 11 to 21, all with Mean = 15.5. Data A: s = 3.338. Data B: s = .9258. Data C: s = 4.57]

Page 18:

Outliers

• Typically, a number beyond a certain number of standard deviations from the mean is considered an outlier.

• In many cases, a number beyond 3 standard deviations (about a 0.3% chance of occurring in normally distributed data) is considered an outlier.

• If identifying an outlier is more critical, one can make the rule more stringent, and consider 2 standard deviations as the limit.

Page 19:

Agenda

Central Limit Theorem

Descriptive Summary Measures

1. Measures of Central Location: Mean, Median, Mode

2. Measures of Variation: The Range, Percentile, Variance and Standard Deviation

3. Measures of Association: Coefficient of Variation

Confidence Interval

Page 20:

Coefficient of Variation

• Measure of Relative Variation

• Always a %

• Shows Variation Relative to Mean

• Used to Compare 2 or More Groups

• Formula (for Sample): $CV = \left(\frac{\text{StDev}}{\bar{X}}\right) \cdot 100\%$

Page 21:

Computing Coefficient of Variation

• Stock A: Average Price last year = $50

Standard Deviation = $5

• Stock B: Average Price last year = $100

Standard Deviation = $5

Coefficient of Variation: CV = (StDev / X̄) · 100%

Stock A: CV = 10%

Stock B: CV = 5%
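The stock comparison is a one-line calculation. The helper name `coefficient_of_variation` below is illustrative; the inputs are the two stocks' standard deviations and average prices.

```python
def coefficient_of_variation(stdev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return 100 * stdev / mean

cv_a = coefficient_of_variation(stdev=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(stdev=5, mean=100)   # Stock B

print(f"Stock A: CV = {cv_a:.0f}%")
print(f"Stock B: CV = {cv_b:.0f}%")
```

Both stocks have the same $5 standard deviation, but relative to its price Stock A varies twice as much as Stock B.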

Page 22:

Agenda

Central Limit Theorem

Descriptive Summary Measures

Confidence Interval

Page 23:

Central Limit Theorem

• Regardless of the population distribution, the distribution of the sample means is approximately normal for sufficiently large sample sizes (as a rule of thumb, n >= 30), with

– mean of sample means = population mean: $\mu_{\bar{x}} = \mu$, and

– standard error = [population standard deviation] / [sqrt(n)]: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
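A small simulation can make the theorem concrete. The sketch below is an illustration under stated assumptions, not part of the original deck: it draws 2,000 samples of size 30 from an exponential population (strongly skewed, with mean 1 and standard deviation 1) and compares the spread of the sample means with the standard error that σ/√n predicts. The seed, population and replicate count are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(7025)  # fixed seed so the run is repeatable

n = 30             # size of each sample
replicates = 2000  # number of samples drawn

# Exponential population with rate 1: skewed, mean 1, standard deviation 1.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(replicates)
]

observed_mean = mean(sample_means)   # should be close to the population mean, 1
observed_se = pstdev(sample_means)   # spread of the sample means
theoretical_se = 1 / sqrt(n)         # sigma / sqrt(n)

print(f"mean of sample means: {observed_mean:.3f}")
print(f"observed SE: {observed_se:.3f}, theoretical SE: {theoretical_se:.3f}")
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the underlying population is far from normal.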

Page 24:

Level of Significance & Level of Confidence

• Level of Significance – α (alpha), equals the maximum allowed percent of error. If the maximum allowed error is 5%, then α = 0.05.

• Level of Confidence is the desired degree of certainty. A 95% Confidence Level is the most common, and it corresponds to a 95% Confidence Interval of the Mean: if intervals were calculated this way from many samples, about 95% of them would contain the actual population mean. A 95% Confidence Level corresponds to a 5% level of significance, or α = 0.05. The Confidence Level therefore equals 1 - α.

Page 25:

Why Does Central Limit Theorem Work?

As Sample Size Increases:

1. Most Sample Means will be Close to Population Mean,

2. Some Sample Means will be Either Relatively Far Above or Below Population Mean.

3. A Few Sample Means will be Either Very Far Above or Below Population Mean.

Page 26:

Agenda

Confidence Interval

Descriptive Summary Measures

Central Limit Theorem

Page 27:

Confidence Intervals

• The population mean is within 2 Standard Errors (SE) of the sample mean, 95% of the time.

• Thus $\mu$ is in the range defined by $\bar{x} \pm 2 \cdot SE$, about 95% of the time.

• (2 * SE) is also called the Margin of Error (MOE). 95% is called the confidence level.

• Sample Mean ± Margin of Error (MOE)

• Called a Confidence Interval

Page 28:

The Standard Normal Distribution

[Figure: Standardized Histogram of X Bar, with a Normal Distribution with Mean 0 and Standard Error of 1. Horizontal axis: X Bar - Number of SEs from the Mean (-4 to 3). Vertical axis: Frequency (0 to 500). Marked regions: 68%, 95%, 99.7%]

Page 29:

Confidence Interval for Mean

• In general, the confidence interval for $\mu$ is given by

$\bar{x} \pm z \cdot \frac{\sigma}{\sqrt{n}}$

• $\bar{x}$ is the sample mean

• z is the confidence factor. It is the number of standard errors one has to go from the mean in order to include a certain percent of observations. For 95% confidence the value is 1.96 (approximately 2.00).

• $\frac{\sigma}{\sqrt{n}}$ is the standard error of the sample means.

In Excel, compute z with 95% confidence level (i.e. level of significance = 0.05):
z score = normsinv(1-0.05/2) = 1.96
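Outside Excel, the same confidence factor can be obtained from the inverse of the standard normal cumulative distribution. The sketch below mirrors normsinv(1 - α/2) using `statistics.NormalDist.inv_cdf` from the Python standard library; the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided z value for a confidence level such as 0.95."""
    alpha = 1 - confidence                      # level of significance
    return NormalDist().inv_cdf(1 - alpha / 2)  # same role as Excel's normsinv

z_95 = z_critical(0.95)
print(f"z for 95% confidence: {z_95:.2f}")
```

Half of α is placed in each tail, which is why the argument is 1 - α/2 rather than 1 - α.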

Page 30:

Confidence Interval for Mean

• Since $\sigma$ is generally not known we substitute the sample standard deviation, ‘s’. This changes the distribution of the sample means from z (standard normal) to a t-distribution, a close relative.

$\bar{x} \pm t \cdot \frac{s}{\sqrt{n}}$

• The t value is slightly larger than the z for a given confidence level, thereby increasing the margin of error. That is the price of using s in place of $\sigma$.

Page 31:

Confidence Interval for Mean (Example 1): Gas Price

• A sample of 49 gas stations nationwide shows an average price of unleaded of $3.87 and a standard deviation of $0.15. Estimate the mean price of gas nationwide with 95% confidence.

In Excel, compute t with 5% error and (n-1), or 48 degrees of freedom:
=tinv(0.05,48) = 2.010635, rounded to 2.01.

95% CI for the Mean is: x̄ ± t · s/√n

= 3.87 ± [2.01 * (0.15/√49)] = $3.87 ± 0.043

Thus, $3.827 < μ < $3.913

Interpret the result!
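The arithmetic of this example can be traced in a few lines. In this sketch the t value 2.01 is taken as given from the Excel step rather than computed, since the Python standard library has no t-distribution; everything else follows the x̄ ± t · s/√n formula.

```python
from math import sqrt

n = 49          # gas stations sampled
x_bar = 3.87    # sample mean price, dollars
s = 0.15        # sample standard deviation, dollars
t_crit = 2.01   # quoted from Excel: tinv(0.05, 48), rounded

standard_error = s / sqrt(n)
moe = t_crit * standard_error          # margin of error
lower, upper = x_bar - moe, x_bar + moe

print(f"SE = {standard_error:.4f}, MOE = {moe:.3f}")
print(f"95% CI: ${lower:.3f} to ${upper:.3f}")
```

Because the sample is fairly large, the standard error is only about two cents, which is what keeps the interval narrow.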

Page 32:

Confidence Interval for Mean (Example 2): Federal Aid Problem

• Suppose a census tract with 5000 families is eligible for aid under program HR-247 if the average income of families of 4 is between $7500 and $8500 (those lower than $7500 are eligible in a different program). A random sample of 12 families yields the data below.

Representative Sample

7,300  7,700  8,100  8,400
7,800  8,300  8,500  7,600
7,400  7,800  8,300  8,600

Page 33:

Confidence Interval for Mean (Example 2): Federal Aid Problem

Representative Sample

7,300  7,700  8,100  8,400
7,800  8,300  8,500  7,600
7,400  7,800  8,300  8,600

Sample mean: x̄ = $7,983

Sample standard deviation (in dollars): $s = \sqrt{\frac{(7300 - 7983)^2 + \cdots + (8600 - 7983)^2}{12 - 1}} = 441$

x̄ ± MOE = 7,983 ± MOE

In Excel, compute t with 5% error and (n-1), or 11 degrees of freedom:
=tinv(0.05,11) = 2.201.

Page 34:

Confidence Interval for Mean (Example 2): Federal Aid Problem

In Excel, compute t with 5% error and (n-1), or 11 degrees of freedom:
=tinv(0.05,11) = 2.201.

95% CI for the Mean is: x̄ ± t · s/√n

= 7,983 ± MOE

= 7,983 ± [2.201 * (441/√12)] = 7,983 ± 280

Thus, $7,703 < μ < $8,263

Interpretation of Confidence Interval

• 95% Confident that Interval $7,983 ± $280 Contains Unknown Population (Not Sample) Mean Income.

• If We Selected 1,000 Samples of Size 12 and Constructed 1,000 Confidence Intervals, about 950 Would Contain Unknown Population Mean and 50 Would Not.
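The Federal Aid figures can be recomputed from the 12 sampled incomes. In the sketch below the t value 2.201 is taken from the Excel step, and the final comparison against the $7,500 to $8,500 limits is one possible way to read the result, added here as an assumption rather than something the slides state.

```python
from math import sqrt
from statistics import mean, stdev

incomes = [7300, 7700, 8100, 8400, 7800, 8300,
           8500, 7600, 7400, 7800, 8300, 8600]
t_crit = 2.201  # quoted from Excel: tinv(0.05, 11)

n = len(incomes)
x_bar = mean(incomes)
s = stdev(incomes)                 # sample standard deviation (n - 1)
moe = t_crit * s / sqrt(n)         # margin of error
lower, upper = x_bar - moe, x_bar + moe

# Assumed reading: is the whole interval inside the program's income limits?
within_limits = 7500 <= lower and upper <= 8500

print(f"mean = {x_bar:.0f}, s = {s:.0f}, MOE = {moe:.0f}")
print(f"95% CI: {lower:.0f} to {upper:.0f}, inside limits: {within_limits}")
```

With only 12 families, the margin of error is about $280, far wider in relative terms than in the gas price example.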

Page 35:

Confidence Interval for Proportions

• For proportions,
• p = population proportion
• $\hat{p}$ = sample proportion

• Confidence Interval for p is given by

$\hat{p} \pm z \cdot \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}$

Page 36:

Confidence Interval for Proportions (Example 1): Presidential Election

• The Wall Street Journal for Sept 10, 2008 reports that a poll of 860 people shows 46% support for Sen. Obama as President.

Find the 95% CI for the proportion of the population that supports him.

In Excel, compute z with 95% confidence level (i.e. level of significance = 0.05):
z score = normsinv(1-0.05/2) = 1.960

95% CI for the Proportion is: $0.46 \pm 1.96 \sqrt{\frac{0.46\,(1 - 0.46)}{860}}$ = 0.46 ± 0.033

Thus, .427 < p < .493
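The poll interval can be verified with a short function. The name `proportion_ci` is illustrative; it implements p̂ ± z · √(p̂(1 - p̂)/n) with the z value 1.96 taken from the Excel step.

```python
from math import sqrt

def proportion_ci(p_hat, n, z):
    """Return (lower, upper, margin of error) for a sample proportion."""
    moe = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe, moe

lower, upper, moe = proportion_ci(p_hat=0.46, n=860, z=1.96)

print(f"MOE = {moe:.3f}")
print(f"95% CI: {lower:.3f} < p < {upper:.3f}")
```

The upper limit stays below 0.50, so at this confidence level the poll does not show majority support.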

Page 37:

Confidence Interval for Proportions (Example 2): Japan Business Survey

Is Japan the Foremost Economic Power Today?

• n = 200 Californians

• Yes = 116

• No = 84

$\hat{p} = \frac{116}{200} = 0.58$

Page 38:

Confidence Interval for Proportions (Example 2): Japan Business Survey

In Excel, compute z with 95% confidence level (i.e. level of significance = 0.05):
z score = normsinv(1-0.05/2) = 1.960

95% CI for the Proportion is: $0.58 \pm 1.96 \sqrt{\frac{0.58\,(1 - 0.58)}{200}}$ = 0.58 ± MOE = 0.58 ± 0.068

Thus, .512 < p < .648

In Excel, compute z with 90% confidence level (i.e. level of significance = 0.10):
z score = normsinv(1-0.10/2) = 1.645

90% CI for the Proportion is: $0.58 \pm 1.645 \sqrt{\frac{0.58\,(1 - 0.58)}{200}}$ = 0.58 ± MOE = 0.58 ± 0.057

Thus, .523 < p < .637
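The effect of the confidence level on the margin of error can be seen by computing both intervals from the raw survey counts. This sketch derives each z value with `statistics.NormalDist.inv_cdf` (Python 3.8 or later) instead of Excel's normsinv; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

yes, n = 116, 200
p_hat = yes / n                          # sample proportion, 0.58
se = sqrt(p_hat * (1 - p_hat) / n)       # standard error of the proportion

margins = {}
for confidence in (0.95, 0.90):
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    margins[confidence] = z * se
    print(f"{confidence:.0%} CI: {p_hat:.2f} ± {margins[confidence]:.3f}")
```

Lowering the confidence level from 95% to 90% shrinks the margin of error, at the cost of a greater chance that the interval misses the true proportion.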

Page 39:

Sample Means versus Sample Proportion

Mean

• Income/Loss

• Time to Complete Loan Papers

• Number of Fat Calories in Burger

• Breaking Strength of Cellular Phone Housing

Proportion of

• Americans Who Believe that Japan is #1 Economic Power

• Circuit Boards with One or More Failed Solder Connections

• African-Americans Who Pass CPA

Means and Proportions Not the Same!

Page 40:

Similarities and Differences Between Sample Means and Proportions

• Sample Means Computed from Data that Are Measured. Estimate Population Means.

• Sample Proportions Computed from Data that Are Counted. Estimate Population Proportions.