
Business Statistics

Outline

Dealing with decision problems in which uncertainty plays an important role.

Descriptive Statistics

Sampling and Sampling Distributions

Point and Interval Estimation

Hypothesis Testing

Non-parametric Test - Chi-square Test

Analysis of Variance

Outline (cont.)

Time Series and Forecasting

Survey and sampling methods

Multivariate Analysis

Bayesian Statistics and Decision Analysis

Descriptive Statistics

Session 1

Population and sample

Measures of Central Tendency

Mean, Median, Mode

Measures of Dispersion

Variance, Standard deviation

Percentile, Inter-quartile range

Grouped data and histogram

Other data representations


Population and Sample

• Population: The population consists of the set of all measurements in which the investigator is interested. The population is also called the universe.

• Sample: A sample is a subset of measurements selected from the population. Sampling from the population is often done randomly, i.e. such that every possible sample of n elements has an equal chance of being selected. A sample created in this way is called a simple random sample or random sample.

Example 1.1.

A medical manufacturer interested in marketing a new drug may be required by the Food and Drug Administration (FDA) to prove that the drug does not cause any serious side effects. A sample of people is selected at random, and the results of testing the drug on this sample may then be used in a statistical inference about the entire population of people who may use the drug if it is introduced.

[Figure: Illustration for simple random sampling. Two panels, "Simple Random sampling" and "Biased Sampling", each show a population and the sample drawn from it.]

Measures of Central Tendency

Mean

Arithmetic Mean - AM: Given a set of data $x_1, x_2, \dots, x_n$, the arithmetic mean is defined as follows:

$$AM = \frac{\sum_{i} x_i}{n}$$

This kind of mean is the most frequently used.

Mode: The mode of a data set is the value that occurs most frequently.

Harmonic Mean - HM

$$HM = \frac{n}{\sum_{i} (1/x_i)}$$

This kind of mean is used when dealing with velocity.
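To see why the harmonic mean suits velocities, here is a minimal sketch. It assumes a hypothetical trip in which the same distance is covered once at 40 km/h and once at 60 km/h; these speeds are illustrative and do not come from the slides. The helper name `harmonic_mean_manual` is also an assumption.

```python
from statistics import harmonic_mean


def harmonic_mean_manual(xs):
    """HM = n / sum(1 / x_i)."""
    return len(xs) / sum(1 / x for x in xs)


# Hypothetical speeds (km/h) over two legs of equal distance.
speeds = [40, 60]

hm = harmonic_mean_manual(speeds)
am = sum(speeds) / len(speeds)

print(hm)                     # about 48: total distance / total time
print(harmonic_mean(speeds))  # standard-library cross-check
print(am)                     # 50.0, which overstates the average speed
```

For equal distances, total distance divided by total time is exactly the harmonic mean of the speeds, so the arithmetic mean is the wrong summary here.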

• Population Mean

$$\mu = \frac{\sum_{i} x_i}{N}$$

• Sample Mean

$$\bar{x} = \frac{\sum_{i} x_i}{n}$$

Median: The median of a set of observations is the point in the middle of the ordered data: half of the data lie below it and half lie above it.

Example 1.2.

Find the median of the following two sets of data.

Set 1: 15 20 7 9 18 (n=5)
Set 2: 20.7 22.5 15.8 40.3 33.4 21.1 (n=6)

Set 1: Ordering 7, 9, 15, 18, 20; median is 15
Set 2: Ordering 15.8 20.7 21.1 22.5 33.4 40.3; median = (21.1 + 22.5)/2 = 21.8
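The medians of Example 1.2 can be checked with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative.

```python
from statistics import mean, median

set1 = [15, 20, 7, 9, 18]                    # n = 5 (odd)
set2 = [20.7, 22.5, 15.8, 40.3, 33.4, 21.1]  # n = 6 (even)

# median() sorts the data internally.
med1 = median(set1)  # middle value of the ordered data
med2 = median(set2)  # average of the two middle values
avg1 = mean(set1)    # arithmetic mean of Set 1, for comparison

print(med1, med2, avg1)
```

With an odd number of points the median is an actual data value. With an even number it is the average of the two central values, as in Set 2.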

Measurements of Dispersion

Variance and Standard Deviation

The variance of a set of observations is the average squared deviation of the data points from their mean.

Sample Variance

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Note: The denominator is (n-1).

Population Variance

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

The standard deviation of a set of observations is the square root of the variance of the set.
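The difference between the two denominators can be sketched as follows, using Set 1 of Example 1.2 as input. The function names are illustrative, and the standard-library `variance` and `pvariance` functions serve as a cross-check.

```python
from math import sqrt
from statistics import pvariance, variance


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def population_variance(xs):
    """Sum of squared deviations from the population mean, divided by N."""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N


data = [15, 20, 7, 9, 18]  # Set 1 of Example 1.2

s2 = sample_variance(data)
sigma2 = population_variance(data)

print(s2, sqrt(s2))          # sample variance and standard deviation
print(sigma2, sqrt(sigma2))  # population variance and standard deviation
```

The sample variance is always the larger of the two for the same data, because it divides the same sum of squares by n - 1 instead of N.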


Percentiles

The Pth percentile of a group of numbers is the value below which P% (P percent) of the numbers in the group lie. Its position in the ordered data is given by (n+1)P/100, where n is the number of data points. Percentiles are used, for example, in reporting GRE and GMAT test scores.

Quartiles: The percentage points that break the ordered data set into 4 groups of equal size.

• 1st quartile Q1 is the 25th percentile.
• 2nd quartile Q2 is the 50th percentile.
• 3rd quartile Q3 is the 75th percentile.

Inter-Quartile Range IQR = Q3 - Q1

Example 1.3.

Given a data set of 22 points: 88, 56, 64, 45, 52, 76, 54, 79, 38, 98, 69, 77, 71, 45, 60, 78, 90, 81, 87, 44, 80, 41. Find the 20th, 30th and 90th percentiles. Also find the IQR. What are the mean, mode and median? What is the variance of the set?
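One way to work Example 1.3 is sketched below. The slides give only the position rule (n+1)P/100, so resolving fractional positions by linear interpolation between the two neighbouring ordered values is an assumption of this sketch. The `percentile` helper is likewise illustrative, and the variance is computed as a sample variance.

```python
from statistics import mean, median, mode, variance


def percentile(data, p):
    """Value at 1-based position (n + 1) * p / 100 in the sorted data.

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighbouring ordered values.
    """
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p / 100
    k = int(pos)      # whole part of the position
    frac = pos - k    # fractional part of the position
    if k < 1:
        return xs[0]
    if k >= n:
        return xs[-1]
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])


data = [88, 56, 64, 45, 52, 76, 54, 79, 38, 98, 69,
        77, 71, 45, 60, 78, 90, 81, 87, 44, 80, 41]

p20, p30, p90 = (percentile(data, p) for p in (20, 30, 90))
q1, q3 = percentile(data, 25), percentile(data, 75)
iqr = q3 - q1

print(p20, p30, p90)
print(q1, q3, iqr)
print(mean(data), mode(data), median(data))
print(variance(data))  # sample variance (denominator n - 1)
```

Statistical packages use several slightly different percentile conventions, so results from other software may differ a little from this interpolation.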

SPSS


Grouped Data and Histogram

• Classes: We divide the data values into classes that have the same length and cover all data points. Each class i is represented by an observation value $m_i$.

• Frequencies $f_i$: The number of observations in each class. The total of the frequencies is the number of observations N. The relative frequency of each class is the ratio of its frequency to N.

• Histogram

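• Mean and Variance of grouped data

Population

Mean: $$\mu = \frac{\sum_{i=1}^{K} f_i m_i}{N}$$

Variance: $$\sigma^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N}$$

Sample

Mean: $$\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n}$$

Variance: $$s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n - 1}$$

Where K is the number of classes and n is the number of observations in the sample.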

Example 1.4

The number of errors per page in a textbook was counted. Column $m_i$ gives the number of errors per page, and column $f_i$ gives the number of pages containing that many errors. The following table and chart show the distribution of errors:

m_i    m_i^2   f_i    Relative f_i   f_i m_i   f_i m_i^2
0      0       102    0.204          0         0
1      1       138    0.276          138       138
2      4       140    0.280          280       560
3      9       79     0.158          237       711
4      16      33     0.066          132       528
5      25      8      0.016          40        200
Total          500    1              827       2137

[Figure: Histogram for Example 1.4 showing the relative frequency (vertical axis, 0 to 0.3) of the number of errors per page (0 to 5): 0.204, 0.276, 0.28, 0.158, 0.066, 0.016.]
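The grouped-data mean and variance for Example 1.4 can be sketched as follows. Treating the 500 pages as the whole population (denominator N) is an assumption of this sketch; the variable names are illustrative.

```python
# Errors per page (m_i) and number of pages with that many errors (f_i).
m = [0, 1, 2, 3, 4, 5]
f = [102, 138, 140, 79, 33, 8]

N = sum(f)                          # total number of pages
relative_f = [fi / N for fi in f]   # relative frequencies

# Grouped mean: sum of f_i * m_i divided by N.
mu = sum(fi * mi for fi, mi in zip(f, m)) / N

# Grouped population variance: sum of f_i * (m_i - mu)^2 divided by N.
sigma2 = sum(fi * (mi - mu) ** 2 for fi, mi in zip(f, m)) / N

# Equivalent shortcut using the f_i * m_i^2 column of the table.
sigma2_shortcut = sum(fi * mi ** 2 for fi, mi in zip(f, m)) / N - mu ** 2

print(N, mu, sigma2, sigma2_shortcut)
```

The shortcut explains why the table carries the $m_i^2$ and $f_i m_i^2$ columns: the variance follows from the column totals 827 and 2137 without computing each deviation.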

Other Descriptive Statistics

Index numbers

Simple index numbers

An index number is a number that measures the relative change in a set of measurements over time.

Index number for period i = 100 × (value in period i / value in base period)

Year   Price   Index (base year 73)   New Index (base year 79)
73     121     100.000                84.615
74     121     100.000                84.615
75     122     100.826                85.315
76     133     109.917                93.007
77     136     112.397                95.105
78     138     114.050                96.503
79     143     118.182                100.000
80     144     119.008                100.699
81     144     119.008                100.699
82     156     128.926                109.091
83     162     133.884                113.287
84     167     138.017                116.783
85     230     190.083                160.839
86     250     206.612                174.825
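Both index columns can be reproduced with a short sketch. The function name `index_numbers` is illustrative; the prices are those in the table, with year 73 and then year 79 as the base period.

```python
def index_numbers(values_by_period, base_period):
    """Index for period i = 100 * (value in period i / value in base period)."""
    base = values_by_period[base_period]
    return {period: 100 * value / base
            for period, value in values_by_period.items()}


years = list(range(73, 87))  # 73 through 86
prices = [121, 121, 122, 133, 136, 138, 143,
          144, 144, 156, 162, 167, 230, 250]
price = dict(zip(years, prices))

index_73 = index_numbers(price, 73)  # "Index" column
index_79 = index_numbers(price, 79)  # "New Index" column

for year in years:
    print(year, price[year],
          round(index_73[year], 3), round(index_79[year], 3))
```

Changing the base period rescales every index by the same factor, so the relative changes between any two years are unaffected.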

[Figure: "Price and Index". Line chart of the Price and Index series (vertical axis 0 to 300) against year (horizontal axis 70 to 90).]

Consumer Price Index - Laspeyres Index

The Laspeyres Index measures the change in the price of a set of items, with each price weighted by the quantity of that item in the base period.

Items      1993 Price   1993 Quantity   1994 Price   1994 Quantity   1995 Price   1995 Quantity
Beef       238          50              240          52              233          54
Pork       140          26              162          24              162          20
Eggs       85           15              102          12              80           10
Milk       105          85              112          91              113          92
Bread      51           30              54           28              55           28
Potatoes   180          10              191          12              160          11
Tomatoes   46           5               50           6               53           4
Oranges    42           7               53           7               52           8

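$$\text{Laspeyres Index}(i) = \frac{\sum p_i q_0}{\sum p_0 q_0} \times 100$$

where $p_i$ is the price in period i, $p_0$ is the price in the base period, $q_0$ is the quantity in the base period, and the sums run over all items.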

• Compute the Laspeyres Index:

– Select year 1993 as the base year. Each sum uses the 1993 quantities with the prices of the given year.
  • For 1993: Sum of quantity x price = 29594
  • For 1994: Sum of quantity x price = 31413
  • For 1995: Sum of quantity x price = 30546

– Laspeyres Index:
  • For 1993: 100
  • For 1994: 106.15
  • For 1995: 103.22
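The Laspeyres computation can be reproduced with a minimal sketch. It uses the prices of each year with the 1993 (base-year) quantities from the table; the function and variable names are illustrative.

```python
# Item order: Beef, Pork, Eggs, Milk, Bread, Potatoes, Tomatoes, Oranges.
prices = {
    1993: [238, 140, 85, 105, 51, 180, 46, 42],
    1994: [240, 162, 102, 112, 54, 191, 50, 53],
    1995: [233, 162, 80, 113, 55, 160, 53, 52],
}
base_year = 1993
base_quantities = [50, 26, 15, 85, 30, 10, 5, 7]  # 1993 quantities


def basket_cost(year_prices, quantities):
    """Sum of price x quantity over all items."""
    return sum(p * q for p, q in zip(year_prices, quantities))


def laspeyres(year):
    """100 * (cost of the base basket at this year's prices / base cost)."""
    base_cost = basket_cost(prices[base_year], base_quantities)
    return 100 * basket_cost(prices[year], base_quantities) / base_cost


costs = {year: basket_cost(p, base_quantities) for year, p in prices.items()}
index = {year: laspeyres(year) for year in prices}

for year in sorted(prices):
    print(year, costs[year], round(index[year], 2))
```

Only the base-year quantities enter the index. The 1994 and 1995 quantity columns of the table are not used.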

Stem-and-Leaf Displays

A way of rearranging data that lets the data “speak for themselves”.

Example

Given the data set: 11, 12, 12, 13, 14, 15, 15, 16, 20, 21, 21, 21, 21, 22, 25, 25, 26, 27, 28, 29, 29, 31, 32, 34, 35, 36, 38, 41, 42, 45, 47, 50, 52, 55, 60, 62

The Stem-and-leaf display

1 | 12234556
2 | 0111125567899
3 | 124568
4 | 1257
5 | 025
6 | 02
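A stem-and-leaf display for this data set can be generated with a short sketch. It assumes non-negative integers, with the tens digit as the stem and the units digit as the leaf; the function name and the `|` separator are illustrative choices.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return one display line per stem: tens digit, then sorted unit digits."""
    stems = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)  # e.g. 47 -> stem 4, leaf 7
        stems[stem].append(str(leaf))
    return [f"{stem} | {''.join(leaves)}"
            for stem, leaves in sorted(stems.items())]


data = [11, 12, 12, 13, 14, 15, 15, 16, 20, 21, 21, 21, 21, 22, 25, 25,
        26, 27, 28, 29, 29, 31, 32, 34, 35, 36, 38, 41, 42, 45, 47, 50,
        52, 55, 60, 62]

lines = stem_and_leaf(data)
print("\n".join(lines))
```

Read sideways, the rows form a histogram that still shows every individual value.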

Box-Whiskers plot

[Figure: Box-Whiskers plot. The box runs from Q1 to Q3 (its length is the IQR) with the median marked inside, and the whiskers end at the smallest and largest observations. Also marked are the inner fences at Q1 - 1.5(IQR) and Q3 + 1.5(IQR), the outer fences at Q1 - 3(IQR) and Q3 + 3(IQR), a suspected outlier, and an outlier.]

Examples for Box-Whiskers plot

[Figure: Example box plots labelled right skewed, left skewed, symmetric, and small variance, plus a plot with a suspected outlier and an outlier marked relative to the inner and outer fences.]

Box-Whisker plots (or box plots) are useful for the following purposes:

• To identify the spread of a data set.
• To identify the location of a data set based on the median.
• To identify possible skewness of the distribution.
• To identify suspected outliers and outliers.
• To quickly compare data sets.
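The fence rules behind outlier detection can be sketched as follows. This assumes the usual convention that a point between an inner and an outer fence is a suspected outlier and a point beyond an outer fence is an outlier. The quartiles and test values are hypothetical, chosen only for illustration, and the function names are not from the slides.

```python
def fences(q1, q3):
    """Inner fences at 1.5 * IQR and outer fences at 3 * IQR beyond the box."""
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3 * iqr, q3 + 3 * iqr),
    }


def classify(x, q1, q3):
    """Label a value by its position relative to the fences."""
    f = fences(q1, q3)
    inner_low, inner_high = f["inner"]
    outer_low, outer_high = f["outer"]
    if x < outer_low or x > outer_high:
        return "outlier"
    if x < inner_low or x > inner_high:
        return "suspected outlier"
    return "within fences"


# Hypothetical quartiles: Q1 = 20, Q3 = 40, so IQR = 20.
q1, q3 = 20, 40
print(fences(q1, q3))
for value in (55, 85, 120):
    print(value, classify(value, q1, q3))
```

With these quartiles the inner fences are at -10 and 70 and the outer fences at -40 and 100, so 55 is unremarkable, 85 is a suspected outlier, and 120 is an outlier.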

See the example in SPSS.
