hilary-bishop
Business Statistics
Outline
Dealing with decision problems in which uncertainty is important.
Descriptive Statistics
Sampling and Sampling Distributions
Point and Interval Estimation
Hypothesis Testing
Non-parametric Test - Chi-square Test
Analysis of Variance
Time Series and Forecasting
Survey and sampling methods
Multivariate Analysis
Bayesian Statistics and Decision Analysis
Descriptive Statistics
Session 1
Population and sample
Measures of Central Tendency: Mean, Median, Mode
Measures of Dispersion: Variance, Standard deviation, Percentile, Inter-quartile range
Grouped data and histogram
Other data representations
Population and Sample
• Population The population consists of the set of all measurements in which the investigator is interested. The population is also called the universe.
• Sample A sample is a subset of measurements selected from the population. Sampling from the population is often done randomly, i.e. such that every possible sample of n elements has an equal chance of being selected. A sample created in this way is called a simple random sample, or just a random sample.
Example 1.1.
A medical manufacturer interested in marketing a new drug may be required by the Food and Drug Administration (FDA) to prove that the drug does not cause any serious side effects.
A sample of people is selected at random and the drug is tested on them. The test results from this sample may then be used in a statistical inference about the entire population of people who may use the drug if it is introduced.
[Figure: Illustration of simple random sampling. Two panels, each showing a population and a sample drawn from it: "Simple Random Sampling" and "Biased Sampling".]
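The idea of a simple random sample can be sketched in a few lines of Python. This is only an illustration: the population of 100 numbered units, the sample size of 10, and the helper name `simple_random_sample` are assumptions made for the example, not part of the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: 100 units labelled 1..100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

A biased scheme, by contrast, would give some units no chance or a larger chance of selection, for instance by always taking the first 10 units.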
Measures of Central Tendency
Mean
Arithmetic Mean - AM
Given a set of data $x_1, x_2, \dots, x_n$, the arithmetic mean is defined as follows:
$AM = \frac{\sum_i x_i}{n}$
This kind of mean is the most frequently used.
Mode
The mode of a data set is the value that occurs most frequently.
Harmonic Mean - HM
$HM = \frac{n}{\sum_i (1/x_i)}$
This kind of mean is used when dealing with rates such as velocity.
• Population Mean: $\mu = \frac{\sum_i x_i}{N}$
• Sample Mean: $\bar{x} = \frac{\sum_i x_i}{n}$
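A short sketch of the two means defined above. The speeds of 40 and 60 (same distance travelled at each speed) are hypothetical values chosen to show why the harmonic mean suits velocities; the function names are illustrative.

```python
import statistics

def arithmetic_mean(xs):
    """AM = sum(x_i) / n."""
    return sum(xs) / len(xs)

def harmonic_mean(xs):
    """HM = n / sum(1 / x_i)."""
    return len(xs) / sum(1 / x for x in xs)

# Hypothetical trip: the same distance driven at 40 km/h and then at 60 km/h.
speeds = [40.0, 60.0]
am = arithmetic_mean(speeds)   # overstates the average speed of the trip
hm = harmonic_mean(speeds)     # total distance / total time
print(am, hm)

# The standard library offers the same quantities.
print(statistics.mean(speeds), statistics.harmonic_mean(speeds))
```

The harmonic mean is lower than the arithmetic mean here because more time is spent at the slower speed.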
Median
The median of a set of observations is the point positioned so that half of the data lie below it and half above it.
Example 1.2.
Find the median of the following two sets of data.
Set 1: 15 20 7 9 18 (n=5)
Set 2: 20.7 22.5 15.8 40.3 33.4 21.1 (n=6)
Set 1: Ordering gives 7, 9, 15, 18, 20; the median is 15.
Set 2: Ordering gives 15.8, 20.7, 21.1, 22.5, 33.4, 40.3; the median is (21.1 + 22.5)/2 = 21.8.
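Example 1.2 can be checked with a small function. This is a minimal sketch; the function name `median` is illustrative and the standard library's `statistics.median` does the same job.

```python
def median(data):
    """Middle value of the sorted data; the mean of the two middle values when n is even."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

set1 = [15, 20, 7, 9, 18]
set2 = [20.7, 22.5, 15.8, 40.3, 33.4, 21.1]
print(median(set1))
print(median(set2))
```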
Measurements of Dispersion
Variance and Standard Deviation
The variance of a set of observations is the average squared deviation of the data points from their mean.
Sample Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Note: The denominator is (n-1), not n.
Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
The standard deviation of a set of observations is the square root of the variance of the set
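The two variance formulas differ only in the denominator, which a short sketch makes concrete. Reusing Set 1 of Example 1.2 as the data is a choice made here for illustration; the function names are assumptions.

```python
import math
import statistics

def sample_variance(xs):
    """s^2 = sum((x_i - xbar)^2) / (n - 1)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def population_variance(xs):
    """sigma^2 = sum((x_i - mu)^2) / N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

data = [15, 20, 7, 9, 18]            # Set 1 from Example 1.2
s2 = sample_variance(data)
sigma2 = population_variance(data)
print(s2, math.sqrt(s2))             # sample variance and standard deviation
print(sigma2, math.sqrt(sigma2))     # population variance and standard deviation
```

For these five points the mean is 13.8 and the sum of squared deviations is 126.8, so the sample variance is 126.8/4 and the population variance is 126.8/5.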
Percentiles
The Pth percentile of a group of numbers is the value below which P% (P percent) of the numbers in the group lie. Its position in the ordered data is given by (n+1)P/100, where n is the number of data points. Percentiles are used, for example, in reporting GRE and GMAT test scores.
Quartiles
The percentage points that break the ordered data set into 4 groups of equal size:
• 1st quartile Q1 is the 25th percentile.
• 2nd quartile Q2 is the 50th percentile.
• 3rd quartile Q3 is the 75th percentile.
Inter-Quartile Range IQR = Q3 - Q1
Example 1.3.
Given a data set of 22 points: 88, 56, 64, 45, 52, 76, 54, 79, 38, 98, 69, 77, 71, 45, 60, 78, 90, 81, 87, 44, 80, 41.
Find the 20th, 30th and 90th percentiles. Also find the IQR. What are the mean, mode and median? What is the variance of the set?
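Example 1.3 can be worked through in Python as a cross-check. The sketch below uses the (n+1)P/100 position rule from the slides; interpolating linearly between the two neighbouring ordered values when the position is not a whole number is an assumption, and statistical packages may use slightly different conventions.

```python
import statistics

def percentile(data, p):
    """Value at position (n + 1) * p / 100 in the sorted data (positions start at 1).

    Linear interpolation between neighbouring points is an assumption here.
    """
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p / 100
    k = int(pos)              # whole part of the position
    frac = pos - k            # fractional part
    if k < 1:
        return xs[0]
    if k >= n:
        return xs[-1]
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

data = [88, 56, 64, 45, 52, 76, 54, 79, 38, 98, 69,
        77, 71, 45, 60, 78, 90, 81, 87, 44, 80, 41]

p20, p30, p90 = (percentile(data, p) for p in (20, 30, 90))
q1, q3 = percentile(data, 25), percentile(data, 75)
iqr = q3 - q1

print(p20, p30, p90)
print(q1, q3, iqr)
print(statistics.mean(data), statistics.mode(data), statistics.median(data))
print(statistics.variance(data))   # sample variance, denominator n - 1
```

With n = 22 the 20th percentile sits at position 4.6, the 30th at 6.9 and the 90th at 20.7 of the ordered data.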
SPSS
Grouped Data and Histogram
• Classes We divide the data values into classes that have the same length and cover all data points. Each class is represented by a single observation value mi (typically the class midpoint).
• Frequencies fi The number of observations in each class. The frequencies sum to the number of observations N. The relative frequency of each class is the ratio of its frequency to N.
• Histogram A chart of the frequencies (or relative frequencies) of the classes.
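• Mean and Variance of grouped data
Population
Mean: $\mu = \frac{\sum_{i=1}^{K} f_i m_i}{N}$
Variance: $\sigma^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N}$
Sample
Mean: $\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n}$
Variance: $s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n - 1}$
where K is the number of classes and n is the number of observations in the sample.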
Example 1.4
The number of errors on each page of a textbook was counted. Column mi gives the number of errors per page and column fi gives the number of pages with that many errors. The following table and chart show the distribution of errors:

mi     mi.mi   fi     Relative fi   fi.mi   fi.mi.mi
0      0       102    0.204         0       0
1      1       138    0.276         138     138
2      4       140    0.280         280     560
3      9       79     0.158         237     711
4      16      33     0.066         132     528
5      25      8      0.016         40      200
Total          500    1             827     2137
[Figure: Histogram for Example 1.4. Relative frequency (0 to 0.3) against number of errors per page (0 to 5); bar heights 0.204, 0.276, 0.28, 0.158, 0.066, 0.016.]
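The grouped-data formulas can be applied to the table of Example 1.4 as follows. This is a sketch; treating the 500 pages as the whole population (denominator N) is an assumption, and the `sample=True` switch gives the (n-1) version instead.

```python
def grouped_mean(m, f):
    """Mean of grouped data: sum(f_i * m_i) / sum(f_i)."""
    return sum(fi * mi for mi, fi in zip(m, f)) / sum(f)

def grouped_variance(m, f, sample=False):
    """Variance of grouped data: sum(f_i * (m_i - mean)^2) / N, or / (n - 1) for a sample."""
    total = sum(f)
    mean = grouped_mean(m, f)
    ss = sum(fi * (mi - mean) ** 2 for mi, fi in zip(m, f))
    return ss / (total - 1 if sample else total)

m = [0, 1, 2, 3, 4, 5]                 # errors per page
f = [102, 138, 140, 79, 33, 8]         # number of pages

N = sum(f)
relative = [fi / N for fi in f]        # heights of the histogram bars
mean = grouped_mean(m, f)
variance = grouped_variance(m, f)
print(N, relative)
print(mean, variance)
```

The mean is the column total 827 divided by 500, and the population variance equals 2137/500 minus the squared mean.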
Other Descriptive Statistics
Index numbers
Simple index numbers
An index number is a number that measures the relative change in a set of measurements over time.
Index number for period i = 100 (value in period i / value in base period)
Year   Price   Index (base 73)   New Index (base 79)
73     121     100.000           84.615
74     121     100.000           84.615
75     122     100.826           85.315
76     133     109.917           93.007
77     136     112.397           95.105
78     138     114.050           96.503
79     143     118.182           100.000
80     144     119.008           100.699
81     144     119.008           100.699
82     156     128.926           109.091
83     162     133.884           113.287
84     167     138.017           116.783
85     230     190.083           160.839
86     250     206.612           174.825
[Figure: "Price and Index". Price and Index (0 to 300) plotted against year (70 to 90).]
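The two index columns in the table can be reproduced from the price column. A minimal sketch: the function name `index_numbers` is illustrative, and the base years 73 and 79 are read off the table (they are the years where each index equals 100).

```python
def index_numbers(values, base_position):
    """Index for period i = 100 * (value in period i / value in base period)."""
    base = values[base_position]
    return [100 * v / base for v in values]

years = list(range(73, 87))
prices = [121, 121, 122, 133, 136, 138, 143, 144, 144, 156, 162, 167, 230, 250]

index = index_numbers(prices, years.index(73))      # base year 73
new_index = index_numbers(prices, years.index(79))  # base year 79

for year, price, old, new in zip(years, prices, index, new_index):
    print(f"{year}  {price:3d}  {old:8.3f}  {new:8.3f}")
```

Changing the base year rescales every index by the same factor, so the relative changes between years are unaffected.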
Consumer Price Index - Laspeyres Index
The Laspeyres Index measures the change in the cost of a fixed basket of items: the prices in each period are weighted by the quantities of the base period.
Items      1993              1994              1995
           Price  Quantity   Price  Quantity   Price  Quantity
Beef       238    50         240    52         233    54
Pork       140    26         162    24         162    20
Eggs       85     15         102    12         80     10
Milk       105    85         112    91         113    92
Bread      51     30         54     28         55     28
Potatoes   180    10         191    12         160    11
Tomatoes   46     5          50     6          53     4
Oranges    42     7          53     7          52     8

$\text{Laspeyres Index}(i) = \frac{\sum p_i q_0}{\sum p_0 q_0} \times 100$
where $p_i$ is the price in period i, and $p_0$ and $q_0$ are the price and quantity in the base period.
• Compute the Laspeyres Index:
– Select year 1993 as the base year, so every price is multiplied by the 1993 quantity.
• For 1993: Sum of quantity x price = 29594
• For 1994: Sum of quantity x price = 31413
• For 1995: Sum of quantity x price = 30546
– Laspeyres Index:
• For 1993: 100
• For 1994: 106.15
• For 1995: 103.22
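The Laspeyres computation can be reproduced directly from the price and quantity table. A sketch, with the data typed in from the table; the dictionary layout and the function name `laspeyres_index` are choices made for this example.

```python
# Prices per item for each year, and the base-year (1993) quantities.
prices = {
    1993: {"Beef": 238, "Pork": 140, "Eggs": 85, "Milk": 105,
           "Bread": 51, "Potatoes": 180, "Tomatoes": 46, "Oranges": 42},
    1994: {"Beef": 240, "Pork": 162, "Eggs": 102, "Milk": 112,
           "Bread": 54, "Potatoes": 191, "Tomatoes": 50, "Oranges": 53},
    1995: {"Beef": 233, "Pork": 162, "Eggs": 80, "Milk": 113,
           "Bread": 55, "Potatoes": 160, "Tomatoes": 53, "Oranges": 52},
}
base_quantities = {"Beef": 50, "Pork": 26, "Eggs": 15, "Milk": 85,
                   "Bread": 30, "Potatoes": 10, "Tomatoes": 5, "Oranges": 7}

def basket_cost(year_prices, quantities):
    """Cost of the base-period basket at the given prices: sum(p_i * q_0)."""
    return sum(year_prices[item] * q for item, q in quantities.items())

def laspeyres_index(year, base_year=1993):
    """100 * sum(p_i * q_0) / sum(p_0 * q_0)."""
    return 100 * basket_cost(prices[year], base_quantities) / basket_cost(
        prices[base_year], base_quantities)

for year in (1993, 1994, 1995):
    print(year, basket_cost(prices[year], base_quantities),
          round(laspeyres_index(year), 2))
```

Only the 1993 quantities enter the calculation; the 1994 and 1995 quantity columns of the table are not used by this index.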
Stem-and-Leaf Displays
A way of rearranging data that lets the data “speak for themselves”.
Example
Given the data set: 11, 12, 12, 13, 14, 15, 15, 16, 20, 21, 21, 21, 21, 22, 25, 25, 26, 27, 28, 29, 29, 31, 32, 34, 35, 36, 38, 41, 42, 45, 47, 50, 52, 55, 60, 62
The Stem-and-leaf display
1 | 12234556
2 | 0111125567899
3 | 124568
4 | 1257
5 | 025
6 | 02
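A stem-and-leaf display for two-digit data like this can be generated by splitting each value into its tens digit (stem) and units digit (leaf). A minimal sketch under that assumption; the function name and the "stem | leaves" layout are choices made for the example.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return display rows 'stem | leaves' using tens digits as stems, units as leaves."""
    rows = defaultdict(list)
    for x in sorted(data):
        rows[x // 10].append(x % 10)
    return [f"{stem} | {''.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in sorted(rows.items())]

data = [11, 12, 12, 13, 14, 15, 15, 16, 20, 21, 21, 21, 21, 22, 25, 25, 26,
        27, 28, 29, 29, 31, 32, 34, 35, 36, 38, 41, 42, 45, 47, 50, 52, 55,
        60, 62]

display = stem_and_leaf(data)
for row in display:
    print(row)
```

The length of each row acts like a histogram bar while still showing every individual value.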
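Box-Whiskers plot
[Figure: Box-Whiskers plot. The box runs from Q1 to Q3, so its length is the IQR, with the median marked inside; the whiskers end at the smallest and largest observations, and points beyond the fences are marked with *.]
• Inner fences: Q1 - 1.5 (IQR) and Q3 + 1.5 (IQR)
• Outer fences: Q1 - 3 (IQR) and Q3 + 3 (IQR)
• Suspected outlier: an observation between an inner fence and the outer fence on the same side.
• Outlier: an observation beyond an outer fence.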
Examples for Box-Whiskers plot
[Figure: Example box plots labelled "Right skewed", "Left skewed", "Symmetric" and "Small variance", with a suspected outlier beyond the inner fence and an outlier beyond the outer fence.]
Box-Whisker plots (or box plots) are useful for the following purposes:
• To identify the spread of a data set.
• To identify the location of a data set, based on the median.
• To identify possible skewness of the distribution.
• To identify suspected outliers and outliers.
• To quickly compare data sets.
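The fence rules used in a box plot are easy to express in code. The sketch below assumes Q1 = 50.25 and Q3 = 80.25 (the quartiles obtained for the Example 1.3 data under the (n+1)P/100 rule with linear interpolation); the function names and the three test values are hypothetical.

```python
def fences(q1, q3):
    """Inner fences at 1.5 * IQR and outer fences at 3 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3 * iqr, q3 + 3 * iqr),
    }

def classify(x, q1, q3):
    """Label a point as 'inside', 'suspected outlier' or 'outlier'."""
    f = fences(q1, q3)
    inner_lo, inner_hi = f["inner"]
    outer_lo, outer_hi = f["outer"]
    if x < outer_lo or x > outer_hi:
        return "outlier"
    if x < inner_lo or x > inner_hi:
        return "suspected outlier"
    return "inside"

q1, q3 = 50.25, 80.25          # assumed quartiles, see the note above
print(fences(q1, q3))
for x in (70, 130, 200):       # hypothetical observations
    print(x, classify(x, q1, q3))
```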
Look at example in SPSS