Business Statistics for Managerial Decision Making Examining Distributions


Page 1: Business Statistics for Managerial Decision Making

Business Statistics for Managerial Decision Making

Examining Distributions

Page 2: Business Statistics for Managerial Decision Making

Introduction

Descriptive Statistics

Methods that organize and summarize data aid in effective presentation and increased understanding.

Examples include bar charts, tabular displays, various plots of economic data, averages, and percentages.

Often the individuals or objects studied by an investigator come from a much larger collection, and the researcher’s interest goes beyond just data summarization.

Page 3: Business Statistics for Managerial Decision Making

Introduction

Population

The entire collection of individuals or objects about which information is desired.

Sample

A subset of the population selected in some prescribed manner for study.

Page 4: Business Statistics for Managerial Decision Making

Introduction

Inferential Statistics

Involves generalizing from a sample to the population from which it was selected.

This type of generalization involves some risk, since a conclusion about the population will be reached on the basis of available, but incomplete, information.

An important aspect in the development of inference techniques involves quantifying the associated risks.

Page 5: Business Statistics for Managerial Decision Making

Individuals and Variables

Individuals are the objects described by a set of data. They may be people, but they may also be business firms, common stocks, or other objects.

A variable is any characteristic of an individual. A variable can take different values for different individuals.

Page 6: Business Statistics for Managerial Decision Making

Categorical & Quantitative Variables

A Categorical Variable places an individual into one of several groups or categories.

A Quantitative Variable takes numerical values for which arithmetic operations such as adding and averaging make sense.

The distribution of a variable tells us what values it takes and how often it takes these values.

Page 7: Business Statistics for Managerial Decision Making

Example

Page 8: Business Statistics for Managerial Decision Making

Example

Page 9: Business Statistics for Managerial Decision Making

Discrete and Continuous Variables

With numerical data (quantitative variables), it is useful to make a further distinction.

Numerical data is discrete if the possible values are isolated points on the number line.

Numerical data is continuous if the set of possible values forms an entire interval on the number line.

Page 10: Business Statistics for Managerial Decision Making

Stem plot

To make a stem plot:

1. Separate each observation into a stem consisting of all but the final (rightmost) digit and a leaf, the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.

2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column.

3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.
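The three steps above can be sketched in Python. This is a minimal illustration that assumes non-negative integer observations; the function name `stem_plot` and the sample values are hypothetical and do not come from the slides.

```python
from collections import defaultdict

def stem_plot(values):
    """Return the text lines of a stem plot for non-negative integers."""
    leaves = defaultdict(list)
    # Step 1: split each observation into a stem (all but the final digit)
    # and a leaf (the final digit). Sorting first keeps leaves in order.
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # Step 2: list the stems from smallest to largest, including empty ones.
    for stem in range(min(leaves), max(leaves) + 1):
        # Step 3: write the leaves to the right of the stem, in increasing order.
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}")
    return lines

# Hypothetical observations, used only to show the layout.
sample = [12, 15, 21, 23, 23, 30, 47]
for line in stem_plot(sample):
    print(line)
```

Stems with no observations are still printed so that gaps in the data remain visible.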

Page 11: Business Statistics for Managerial Decision Making

Stem plot

Page 12: Business Statistics for Managerial Decision Making

Frequency Distribution

A frequency distribution for categorical data is a table that displays the categories, frequencies, and relative frequencies.

Example

The increasing emphasis on exercise has resulted in an increase of sport-related injuries. A listing of the 82 sample observations would look something like this:

F, Sp, Sp, Co, F, L, F, Ch, De, L, Sp, Di, St, Cn,…

Page 13: Business Statistics for Managerial Decision Making

Frequency Distribution

The following coding is used:

Sp = Sprain, St = Strain, Di = Dislocation,

Co = Contusion, L = Laceration,

Cn = Concussion, F = Fracture,

Ch = Chronic, De = Dental

Page 14: Business Statistics for Managerial Decision Making

Frequency Distribution

Category      Frequency   Relative Frequency
Sprain        22          0.268
Contusion     18          0.22
Fracture      17          0.207
Strain         9          0.11
Laceration     6          0.073
Chronic        4          0.049
Dislocation    3          0.037
Concussion     2          0.024
Dental         1          0.012
Total         82          1
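The relative frequencies in the table are each frequency divided by the total of 82. A short Python sketch, using the counts from the table (the variable names are illustrative only):

```python
# Frequencies copied from the injury table above.
counts = {
    "Sprain": 22, "Contusion": 18, "Fracture": 17, "Strain": 9,
    "Laceration": 6, "Chronic": 4, "Dislocation": 3,
    "Concussion": 2, "Dental": 1,
}

total = sum(counts.values())

# Relative frequency = frequency / total, rounded to three decimals.
relative = {category: round(freq / total, 3) for category, freq in counts.items()}

for category, freq in counts.items():
    print(f"{category:<12} {freq:>3}  {relative[category]:.3f}")
print(f"{'Total':<12} {total:>3}")
```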

Page 15: Business Statistics for Managerial Decision Making

Bar Graph

[Figure: bar graph of the frequency distribution for type of injury. Horizontal axis: Sprain, Contusion, Fracture, Strain, Laceration, Chronic, Dislocation, Concussion, Dental. Vertical axis: count, 0 to 25.]

Page 16: Business Statistics for Managerial Decision Making

Pie Chart

[Figure: pie chart of the frequency distribution for type of injury. Sprain 27%, Contusion 22%, Fracture 21%, Strain 11%, Laceration 7%, Chronic 5%, Dislocation 4%, Concussion 2%, Dental 1%.]

Page 17: Business Statistics for Managerial Decision Making

Frequency Distribution for Discrete Numerical Data

Discrete numerical data almost always results from counting. In such cases, each observation is a whole number.

For example, if the possible values are 0, 1, 2, 3, …, then these are listed in a column, and a running tally is kept as a single pass is made through the data.

Page 18: Business Statistics for Managerial Decision Making

Frequency Distribution for Discrete Numerical Data

Example

A sample of 708 bus drivers employed by public corporations was selected, and the number of traffic accidents in which each was involved during a 4-year period was determined. A listing of the 708 sample observations would look something like this:

3, 0, 6, 0, 0, 2, 1, 4, 1, …
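The running tally can be sketched with Python's `collections.Counter`. Only the nine observations printed on the slide are used here; the remaining 699 are not shown in the source, so the frequencies below describe that short excerpt and not the full sample.

```python
from collections import Counter

# Only the nine values visible on the slide; the rest of the sample is elided.
observations = [3, 0, 6, 0, 0, 2, 1, 4, 1]

# A single pass through the data builds the tally.
tally = Counter(observations)
n = len(observations)

# List every possible value between the smallest and largest observed,
# so values that never occur appear with frequency 0.
for value in range(min(tally), max(tally) + 1):
    freq = tally[value]
    print(f"{value:>2}  {freq:>3}  {freq / n:.3f}")
```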

Page 19: Business Statistics for Managerial Decision Making

Frequency Distribution for Discrete Numerical Data

Number of Accidents   Frequency   Relative Frequency
 0                    117         0.165
 1                    157         0.22
 2                    158         0.223
 3                    115         0.162
 4                     78         0.11
 5                     44         0.062
 6                     21         0.03
 7                      7         0.01
 8                      6         0.008
 9                      1         0.001
10                      3         0.004
11                      1         0.001
Total                 708         0.998

Page 20: Business Statistics for Managerial Decision Making

Bar Graph

[Figure: bar graph of the frequency distribution for number of accidents by bus drivers. Horizontal axis: number of accidents. Vertical axis: count, 0 to 180.]

Page 21: Business Statistics for Managerial Decision Making

Frequency Distributions for Continuous Data

The difficulty with continuous data, such as observations on the unemployment rate by state, is that there are no natural categories.

Therefore we define our own categories by marking off intervals on a horizontal unemployment-rate axis.

[Figure: horizontal unemployment-rate axis running from 1.00 to 9.00.]

Page 22: Business Statistics for Managerial Decision Making

Frequency Distributions for Continuous Data

If the smallest rate is 1.5% and the largest is 8.9%, we might use intervals of width 1%, with the first one starting at 1 and the last one ending at 9.

Each data value should fall in exactly one of these intervals.

Page 23: Business Statistics for Managerial Decision Making

Frequency Distributions for Continuous Data

Page 24: Business Statistics for Managerial Decision Making

Frequency Distributions for Continuous Data

Unemployment Rate Interval   Frequency   Relative Frequency
[1, 2)                        2          0.039
[2, 3)                       13          0.255
[3, 4)                       21          0.412
[4, 5)                       10          0.196
[5, 6)                        3          0.059
[6, 7)                        1          0.020
[7, 8)                        0          0.000
[8, 9)                        1          0.020
Total                        51          1.000
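Assigning each value to exactly one half-open interval [a, b) can be sketched as follows. The function `bin_counts` and the list of rates are hypothetical; the actual state-by-state unemployment rates are not given in the slides.

```python
def bin_counts(values, low, high, width=1.0):
    """Count values in the intervals [low, low+width), ..., ending at high."""
    n_bins = round((high - low) / width)
    counts = [0] * n_bins
    for v in values:
        if not (low <= v < high):
            raise ValueError(f"{v} falls outside [{low}, {high})")
        # Floor division picks the one interval whose left end is <= v.
        counts[int((v - low) // width)] += 1
    return counts

# Hypothetical rates (percent), chosen only to illustrate the binning.
rates = [1.5, 2.0, 2.9, 3.0, 3.4, 3.7, 4.2, 8.9]
counts = bin_counts(rates, low=1, high=9)

for i, count in enumerate(counts):
    left = 1 + i
    print(f"[{left}, {left + 1})  {count}  {count / len(rates):.3f}")
```

Because the intervals are closed on the left and open on the right, a boundary value such as 3.0 is counted in [3, 4) and nowhere else.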

Page 25: Business Statistics for Managerial Decision Making

Histograms

Mark the boundaries of the class intervals on a horizontal axis.

Draw a vertical scale marked with either relative frequencies or frequencies.

The rectangle corresponding to a particular interval is drawn directly above the interval.

The height of each rectangle is then the class frequency or relative frequency.

Page 26: Business Statistics for Managerial Decision Making

Histograms

Page 27: Business Statistics for Managerial Decision Making

Histograms

Page 28: Business Statistics for Managerial Decision Making

Examining a Distribution

In any graph of data, look for the overall pattern and for striking deviations from that pattern.

You can describe the overall pattern of a histogram by its shape, center, and spread.

An important kind of deviation is an outlier, an individual value that falls outside the overall pattern.

Page 29: Business Statistics for Managerial Decision Making

Symmetric & Skewed Distribution

A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

A distribution is skewed to the right if the right side of the histogram (containing the half of the observations with larger values) extends much farther out than the left side.

It is skewed to the left if the left side of the histogram extends much farther out than the right side.

Page 30: Business Statistics for Managerial Decision Making

Symmetric Distribution

Page 31: Business Statistics for Managerial Decision Making

Skewed to the Right

Page 32: Business Statistics for Managerial Decision Making

Symmetric Distribution

Page 33: Business Statistics for Managerial Decision Making

Numerical Summary Measures

Describing the center of a data set: mean, median.

Describing the variability in a data set: variance, standard deviation, quartiles.

Page 34: Business Statistics for Managerial Decision Making

The Mean

To find the mean of a set of observations, add their values and divide by the number of observations. If the n observations are $x_1, x_2, \ldots, x_n$, their mean is

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

In a more compact notation,

$$\bar{x} = \frac{\sum x_i}{n}$$

Page 35: Business Statistics for Managerial Decision Making

The Median

The median M is the midpoint of a distribution, the number such that half of the observations are smaller and the other half are larger. To find the median of a distribution:

1. Arrange all observations in order of size, from smallest to largest.

2. If the number of observations n is odd, the median M is the center observation in the ordered list.

3. If the number of observations n is even, the median M is the mean of the two center observations in the ordered list.
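The three steps translate directly into a few lines of Python. This is a sketch for illustration; the function name `median` is local to the example, and the standard library's `statistics.median` follows the same rule.

```python
import statistics

def median(values):
    """Median M found by the three steps above."""
    ordered = sorted(values)          # Step 1: arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # Step 2: n odd, take the center observation
        return ordered[mid]
    # Step 3: n even, take the mean of the two center observations
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical observations, one list with odd n and one with even n.
odd_data = [5, 1, 3]
even_data = [4, 1, 3, 2]

print(median(odd_data), median(even_data))
print(statistics.mean(even_data))     # the mean, for comparison
```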

Page 36: Business Statistics for Managerial Decision Making

The Quartiles Q1 and Q3

To calculate the quartiles:

1. Arrange the observations in increasing order and locate the median M in the ordered list of observations.

2. The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median.

3. The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.

Page 37: Business Statistics for Managerial Decision Making

The Five Number Summary and Box-Plot

The five number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. In symbols, the five number summary is

Minimum Q1 M Q3 Maximum
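A sketch of the five number summary in Python, using the quartile rule given on the previous slide (Q1 and Q3 are the medians of the observations to the left and right of the overall median's location). The function names and data are hypothetical. Statistical software often uses other quartile conventions, so library results may differ slightly from this rule.

```python
def median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, M, Q3, maximum); needs at least two observations."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # observations left of the median's location
    upper = ordered[half + n % 2:]    # observations right of the median's location
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Hypothetical observations.
data = [7, 1, 5, 3, 2, 6, 4]
print(five_number_summary(data))
```

When n is odd, the center observation itself is excluded from both halves, which is what "to the left of" and "to the right of" the median's location implies.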

Page 38: Business Statistics for Managerial Decision Making

The Five Number Summary and Box-Plot

A box-plot is a graph of the five number summary.

A central box spans the quartiles.

A line in the box marks the median.

Lines extend from the box out to the smallest and largest observations.

Box-plots are most useful for side-by-side comparison of several distributions.

Page 39: Business Statistics for Managerial Decision Making

Example

Page 40: Business Statistics for Managerial Decision Making

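The Standard Deviation s

The variance $s^2$ of a set of observations is the average of the squares of the deviations of the observations from their mean. In symbols, the variance of n observations $x_1, x_2, \ldots, x_n$ is

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

or, more compactly,

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}$$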

Page 41: Business Statistics for Managerial Decision Making

The Standard Deviation s

The standard deviation s is the square root of the variance $s^2$:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}}$$
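Both forms of the variance formula, and the standard deviation as its square root, can be checked numerically. The sketch below uses hypothetical data and compares the results with Python's `statistics` module, which also divides by n - 1.

```python
import math
import statistics

def variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def variance_shortcut(values):
    """Algebraically equivalent form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)

# Hypothetical observations with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s2 = variance(data)
s = math.sqrt(s2)   # the standard deviation is the square root of the variance

print(f"s^2 = {s2:.4f}, s = {s:.4f}")
print(f"shortcut s^2 = {variance_shortcut(data):.4f}")
```

The shortcut form avoids computing the mean first, but it can lose precision when the values are large relative to their spread.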

Page 42: Business Statistics for Managerial Decision Making

Choosing a Summary

The five number summary is usually better than the mean and standard deviation for describing a skewed distribution or a distribution with extreme outliers. Use $\bar{x}$ and s only for reasonably symmetric distributions that are free of outliers.

Page 43: Business Statistics for Managerial Decision Making

Strategies for Exploring Data

Plot the data. Make a graph, usually a histogram or a stem plot.

Look at the distribution of the variable for:

overall pattern (shape, center, spread);

striking deviations such as outliers.

Calculate a numerical summary to briefly describe center and spread.

Describe the overall pattern with a smooth curve.

Page 44: Business Statistics for Managerial Decision Making

Density Curves

Sometimes the overall pattern (the distribution of the variable) of a large number of observations is so regular that we can describe it by a smooth curve, called a density curve.

The curve is a mathematical model for the distribution.

Page 45: Business Statistics for Managerial Decision Making

Density Curve

Histogram of the city gas mileage (miles per gallon) of 856 model year 2001 motor vehicles.

The smooth curve, density curve, shows the overall shape of the distribution.

Page 46: Business Statistics for Managerial Decision Making

Density Curve

The proportion of cars with gas mileage less than 20 from the histogram is

$$\frac{384}{856} = 0.449 = 44.9\%$$

Page 47: Business Statistics for Managerial Decision Making

Density Curve

The proportion of cars with gas mileage less than 20 from the density curve is .410.

The area under the density curve gives a good approximation of the areas given by the histogram.

Page 48: Business Statistics for Managerial Decision Making

Density Curve

A density curve is a curve that

is always on or above the horizontal axis;

has area exactly 1 underneath it.

A density curve describes the overall pattern of a distribution.

The area under the curve and above any range of values is the proportion of all observations that fall in that range.

Page 49: Business Statistics for Managerial Decision Making

Median and Mean of a Density Curve

The median of a density curve is the point that divides the area under the curve in half.

Page 50: Business Statistics for Managerial Decision Making

Median and Mean of a Density Curve

The mean of a density curve is the balance point, at which the curve would balance if made of solid material.

Page 51: Business Statistics for Managerial Decision Making

Median and Mean of a Density Curve

The median and mean are the same for a symmetric density curve.

They both are at the center of the curve.

Page 52: Business Statistics for Managerial Decision Making

Median and Mean of a Density Curve

The mean of a skewed curve is pulled away from the median in the direction of the long tail.

Page 53: Business Statistics for Managerial Decision Making

Normal Density Curve

These density curves, called normal curves, are

symmetric;

single peaked;

bell shaped.

Normal curves describe normal distributions.

Page 54: Business Statistics for Managerial Decision Making

Normal Density Curve

The exact density curve for a particular normal distribution is described by giving its mean $\mu$ and its standard deviation $\sigma$.

The mean is located at the center of the symmetric curve and it is the same as the median.

The standard deviation $\sigma$ controls the spread of a normal curve.

Page 55: Business Statistics for Managerial Decision Making

Normal Density Curve

Page 56: Business Statistics for Managerial Decision Making

The 68-95-99.7 Rule

Although there are many normal curves, they all have common properties. In particular, all Normal distributions obey the following rule.

In a normal distribution with mean $\mu$ and standard deviation $\sigma$:

68% of the observations fall within $\sigma$ of the mean $\mu$.

95% of the observations fall within $2\sigma$ of $\mu$.

99.7% of the observations fall within $3\sigma$ of $\mu$.
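The rule can be checked against the standard Normal distribution using `statistics.NormalDist` from the Python standard library (available from Python 3.8). The helper name `within` is illustrative only. The computed areas are close to, but not exactly, the rounded percentages in the rule.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def within(k):
    """Area under the standard Normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within(k):.4f}")
```

Since every Normal distribution becomes the standard Normal after standardizing, checking the standard case covers all values of the mean and standard deviation.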

Page 57: Business Statistics for Managerial Decision Making

The 68-95-99.7 Rule

Page 58: Business Statistics for Managerial Decision Making

The 68-95-99.7 Rule

Page 59: Business Statistics for Managerial Decision Making

Standard Normal Distribution

The standard Normal distribution is the Normal distribution N(0, 1) with mean $\mu = 0$ and standard deviation $\sigma = 1$.

Page 60: Business Statistics for Managerial Decision Making

The Standard Normal Table

What is the area under the standard normal curve between z = 0 and z = 2.3?

Compact notation:

$$P(0 < z < 2.3) = .9893 - .5 = .4893$$

Page 61: Business Statistics for Managerial Decision Making

Finding the area under a normal curve

1. State the problem in terms of the observed variable x.

2. Standardize x to restate the problem in terms of a standard normal variable z

3. Draw a picture to show the area under the standard Normal curve.

4. Find the required area under the standard Normal curve using Table A and the fact that the total area under the curve is 1.
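As a rough software counterpart to steps 2 and 4, the sketch below standardizes the endpoints and then evaluates the standard Normal cumulative area with `math.erf` in place of Table A. The function names `phi` and `normal_area` are hypothetical helpers, not part of the slides.

```python
import math

def phi(z):
    """Area under the standard Normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_area(lo, hi, mu=0.0, sigma=1.0):
    """Area between lo and hi under the Normal curve with mean mu and sd sigma."""
    z_lo = (lo - mu) / sigma   # standardize both endpoints
    z_hi = (hi - mu) / sigma
    return phi(z_hi) - phi(z_lo)

# Area between z = 0 and z = 2.3 on the standard Normal curve.
print(round(normal_area(0, 2.3), 4))

# Area to the right of z = -2.15, using an infinite upper endpoint.
print(round(normal_area(-2.15, math.inf), 4))
```

The two printed values can be compared with the table-based answers .4893 and .9842 worked out on the Table A slides.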

Page 62: Business Statistics for Managerial Decision Making

Example

The annual rate of return on stock indexes (which combine many individual stocks) is approximately Normal. Since 1954, the Standard & Poor’s 500 stock index has had a mean yearly return of about 12%, with standard deviation of 16.5%. Take this Normal distribution to be the distribution of yearly returns over a long period. The market is down for the year if the return on the index is less than zero. In what proportion of years is the market down?

Page 63: Business Statistics for Managerial Decision Making

Example

State the problem

Call the annual rate of return for the Standard & Poor’s 500 stock index x. The variable x has the N(12, 16.5) distribution. We want the proportion of years with x < 0.

Standardize

Subtract the mean, then divide by the standard deviation, to turn x into a standard Normal z:

$$x < 0 \quad\Longrightarrow\quad z = \frac{x - 12}{16.5} < \frac{0 - 12}{16.5} = -0.73$$

Page 64: Business Statistics for Managerial Decision Making

Example

Draw a picture to show the standard normal curve with the area of interest shaded.

Use the table

The proportion of observations less than -0.73 is .2327. The market is down on an annual basis about 23.27% of the time.
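The table lookup can be reproduced with `statistics.NormalDist`. The sketch below also computes the area without rounding z to two decimals, which gives a slightly different answer; that small gap comes from the rounding needed to use a printed table, not from an error in the method. Variable names are illustrative.

```python
from statistics import NormalDist

returns = NormalDist(mu=12, sigma=16.5)   # N(12, 16.5) from the example

z = (0 - 12) / 16.5          # standardized value of x = 0
z_table = round(z, 2)        # two decimals, as required by a printed table

p_table = NormalDist().cdf(z_table)   # area to the left of the rounded z
p_exact = returns.cdf(0)              # same area without rounding z

print(f"z = {z:.4f}, rounded to {z_table}")
print(f"table-style answer: {p_table:.4f}")
print(f"unrounded answer:   {p_exact:.4f}")
```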

Page 65: Business Statistics for Managerial Decision Making

Example

What percent of years have annual return between 12% and 50%?

State the problem

$$12 < x < 50$$

Standardize

$$\frac{12 - 12}{16.5} < \frac{x - 12}{16.5} < \frac{50 - 12}{16.5} \quad\Longrightarrow\quad 0 < z < 2.30$$

Page 66: Business Statistics for Managerial Decision Making

Example

Draw a picture. Use the table.

The area between 0 and 2.30 is the area below 2.30 minus the area below 0.

0.9893- .50 = .4893

Page 67: Business Statistics for Managerial Decision Making

Finding a Value when Given a Proportion

Sometimes we may want to find the observed value with a given proportion of observations above or below it.

To do this, use table A backward. Find the given proportion in the body of the table, read the corresponding z from the left column and top row, then unstandardize to get the observed value.

Page 68: Business Statistics for Managerial Decision Making

Example

Miles per gallon ratings of compact cars (2001 model year) follow approximately the N(25.7, 5.88) distribution. How many miles per gallon must a vehicle get to place in the top 10% of all 2001 model year compact cars?

Page 69: Business Statistics for Managerial Decision Making

Example

We want to find the miles per gallon rating x with area 0.1 to its right under the Normal curve with mean 25.7 and standard deviation 5.88. That is the same as finding the miles per gallon rating x with area 0.9 to its left.

Page 70: Business Statistics for Managerial Decision Making

Example

Look in the body of Table A for the entry closest to 0.9. It is 0.8997. This is the entry corresponding to z = 1.28.

Page 71: Business Statistics for Managerial Decision Making

Example

Unstandardize to transform the solution from the z scale back to the original x scale.

$$\frac{x - 25.7}{5.88} = 1.28$$

$$x = 25.7 + (1.28)(5.88) = 33.2$$
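Reading Table A backward corresponds to an inverse cumulative distribution function. A minimal sketch with `statistics.NormalDist.inv_cdf`, using the N(25.7, 5.88) distribution from the example (variable names are illustrative):

```python
from statistics import NormalDist

mpg = NormalDist(mu=25.7, sigma=5.88)   # N(25.7, 5.88) from the example

# z with area 0.9 to its left under the standard Normal curve.
z = NormalDist().inv_cdf(0.90)

# Unstandardize: x = mean + z * standard deviation.
x = 25.7 + z * 5.88

print(f"z = {z:.4f}")
print(f"x = {x:.1f} miles per gallon")

# The same value obtained in one step from the N(25.7, 5.88) distribution.
print(f"direct: {mpg.inv_cdf(0.90):.1f}")
```

The software value of z carries more decimals than the table's 1.28, but the answer still rounds to the same miles per gallon figure.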

Page 72: Business Statistics for Managerial Decision Making

Standard Normal Distribution

If a variable x has any normal distribution N($\mu$, $\sigma$) with mean $\mu$ and standard deviation $\sigma$, then the standardized variable

$$z = \frac{x - \mu}{\sigma}$$

has the standard Normal distribution. This standardized value is often called a z-score.

Page 73: Business Statistics for Managerial Decision Making

The Standard Normal Table

Table A is a table of areas under the standard Normal curve. The table entry for each value z is the area under the curve to the left of z.

Or you can use the applet at the following site: http://www.stat.sc.edu/~west/applets/normaldemo.html

Page 74: Business Statistics for Managerial Decision Making

The Standard Normal Table

What is the area under the standard normal curve to the right of z = -2.15?

Compact notation:

$$P(z > -2.15) = 1 - .0158 = .9842$$