Dr. C. Ertuna

Statistical Sampling & Analysis of Sample Data

(Lesson - 04/A)

Understanding the Whole from Pieces



Sampling

Sampling is:

• Collecting sample data from a population and

• Estimating population parameters

Sampling is an important tool in business decisions because it is an effective and efficient way of obtaining information about the population.


Sampling (Cont.)

How good is the estimate obtained from the sample?

• The means of multiple samples of a fixed size (n) from some population will form a distribution called the sampling distribution of the mean

• The standard deviation of the sampling distribution of the mean is called the standard error of the mean


Sampling (Cont.)

• Estimates from larger sample sizes provide more accurate results

• If the sample size is large enough the sampling distribution of the mean is approximately normal, regardless of the shape of the population distribution - Central Limit Theorem

• Standard Error of the mean: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$


Sampling Distribution of the Mean

THE CENTRAL LIMIT THEOREM

For samples of n observations taken from a population with mean $\mu$ and standard deviation $\sigma$, regardless of the population's distribution, provided the sample size is sufficiently large, the distribution of the sample mean $\bar{x}$ will be approximately normal with a mean equal to the population mean ($\mu_{\bar{x}} = \mu$). Further, the standard deviation will equal the population standard deviation divided by the square root of the sample size ($\sigma_{\bar{x}} = \sigma / \sqrt{n}$).

The larger the sample size, the better the approximation to the normal distribution.
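The theorem can be checked with a small simulation. The sketch below is only an illustration: the skewed (exponential) population, the seed, and the sample sizes are assumptions, not part of the lesson. It draws repeated samples and compares the observed standard deviation of the sample means with $\sigma / \sqrt{n}$.

```python
import random
import statistics

def sample_means(population, n, repeats, rng):
    """Draw `repeats` samples of size n (with replacement) and return their means."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(repeats)]

rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical, deliberately skewed population (exponential, mean about 1).
population = [rng.expovariate(1.0) for _ in range(20_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

for n in (5, 30, 100):
    means = sample_means(population, n, repeats=2_000, rng=rng)
    observed_se = statistics.stdev(means)   # std. dev. of the sample means
    predicted_se = sigma / n ** 0.5         # standard error from the formula
    print(f"n={n:3d}  mean of means={statistics.fmean(means):.3f}  "
          f"observed SE={observed_se:.3f}  sigma/sqrt(n)={predicted_se:.3f}")
```

The mean of the sample means should stay close to the population mean, while the observed standard error shrinks roughly in step with $\sigma / \sqrt{n}$ as n grows.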


Sampling Statistics

Sampling statistics are statistics that are based on values that are created by repeated sampling from a population, such as:

• Mean of the sampling means

• Standard Error of the sampling mean

• Sampling distribution of the means


Sampling: Key Issues

Key Sampling issues are:

• Sample Design (Planning)

• Sampling Methods (Schemes)

• Sampling Error

• Sample Size Determination.


Sampling: Design

Sample Design (Sample Planning) describes:

• Objective of Sampling

• Target Population

• Population Frame

• Method of Sampling

• Statistical tools for Data Analysis


Sampling: Methods

Sampling Methods (Sampling Schemes)

Subjective Methods

– Judgment Sampling

– Convenience Sampling

Probabilistic Methods

• Simple Random Sampling

• Systematic Sampling

• Stratified Sampling

• Cluster Sampling


Sampling: Methods (Cont.)

Simple Random Sampling Method

• refers to a method of selecting items from a population such that every possible sample of a specified size has an equal chance of being selected

• with or without replacement
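As a minimal sketch of the two variants, assuming a hypothetical population frame of items numbered 1 to 100, Python's standard `random` module can draw both kinds of sample:

```python
import random

rng = random.Random(7)            # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical frame: items numbered 1..100

# Without replacement: no item can appear twice in the sample.
without_replacement = rng.sample(population, k=10)

# With replacement: the same item may be drawn more than once.
with_replacement = rng.choices(population, k=10)

print(without_replacement)
print(with_replacement)
```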


Sampling: Methods (Cont.)

Stratified Sampling Method:

• The population is divided into natural subsets (strata)

• Items are randomly selected from each stratum

• The number of items selected from a stratum is proportional to the size of that stratum.


Stratified Sampling Example

[Figure: The population (cash holdings of all financial institutions in the country) is divided into a stratified population of three strata: Stratum 1 (large institutions), Stratum 2 (medium-size institutions), and Stratum 3 (small institutions). Samples of size n1, n2, and n3 are selected from the respective strata and together form the stratified sample of cash holdings of financial institutions.]
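The sketch below illustrates proportional allocation for a three-stratum population like the one in the example. The stratum sizes, the institution labels, and the helper name `proportional_stratified_sample` are hypothetical, chosen only so that the allocation comes out in whole numbers.

```python
import random

def proportional_stratified_sample(strata, n, rng):
    """Return {stratum: sampled items}, allocating n in proportion to stratum size.

    Rounding means the parts may not add up to exactly n for arbitrary sizes.
    """
    total = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        n_h = round(n * len(items) / total)      # this stratum's share of the sample
        sample[name] = rng.sample(items, k=n_h)  # simple random sample within the stratum
    return sample

rng = random.Random(1)
# Hypothetical strata: 50 large, 150 medium-size, and 300 small institutions.
strata = {
    "large": [f"L{i}" for i in range(50)],
    "medium": [f"M{i}" for i in range(150)],
    "small": [f"S{i}" for i in range(300)],
}
sample = proportional_stratified_sample(strata, n=50, rng=rng)
print({name: len(items) for name, items in sample.items()})
# {'large': 5, 'medium': 15, 'small': 30}
```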


Cluster Sampling

Cluster sampling refers to a method by which the population is divided into groups, or clusters, that are each intended to be mini-populations. A random sample of m clusters is selected.


Cluster Sampling Example

Mid-Level Managers by Location for a Company

Location      Mid-Level Managers
Algeria       42
Scotland      22
California    105
Alaska        20
New York      36
Florida       52
Mexico        76
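A minimal sketch of selecting m clusters at random from the locations in the example. Two assumptions are made here: the counts are paired with the locations in the order in which they appear on the slide, and every manager in a selected cluster is then surveyed (one-stage cluster sampling), which the slide does not state.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable
# Cluster sizes from the example; the pairing of counts with locations
# is assumed from the order in which they appear on the slide.
clusters = {
    "Algeria": 42, "Scotland": 22, "California": 105, "Alaska": 20,
    "New York": 36, "Florida": 52, "Mexico": 76,
}

m = 3                                      # number of clusters to select
chosen = rng.sample(list(clusters), k=m)   # random sample of m clusters
# Assumption: all managers in the chosen clusters are surveyed.
managers_surveyed = sum(clusters[name] for name in chosen)
print(chosen, managers_surveyed)
```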


Sampling Error

SAMPLING ERROR (SINGLE MEAN)

The difference between a value (a statistic) computed from a sample and the corresponding value (a parameter) computed from a population.

Sampling Error = $\bar{x} - \mu$

Where:
$\mu$ = Population mean
$\bar{x}$ = Sample mean


Sampling: Error (Cont.)

Sampling Error is inherent in any sampling process due to the fact that samples are only a subset of the total population.

• Sampling error depends on the size of the sample relative to the population

• Sampling Errors can be minimized but not eliminated.


Sampling: Error (Cont.)

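If the sample size is more than 5% of the population:

• The “with replacement” assumption of the Central Limit Theorem, and hence of the Standard Error calculation, is violated

• The Standard Error must be corrected by multiplying it by the following factor (the finite population correction factor):

$\sqrt{(N - n) / (N - 1)}$

where N = population size and n = sample size.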
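A short sketch of how the correction might be applied in practice. The function name `standard_error` and the numbers are hypothetical; the 5% threshold is the one stated above.

```python
import math

def standard_error(s, n, N=None):
    """Standard error of the mean, with the finite population
    correction applied when the sample exceeds 5% of the population."""
    se = s / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))  # finite population correction factor
    return se

print(standard_error(s=10, n=100))            # population size unknown: no correction
print(standard_error(s=10, n=100, N=10_000))  # n/N = 1%: no correction
print(standard_error(s=10, n=100, N=500))     # n/N = 20%: correction applied
```

The corrected standard error is always smaller than the uncorrected one, since a sample that covers a large share of the population leaves less uncertainty.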


Sampling: Size

Sample Size Determination.

$n = z_{\alpha/2}^{2} \, s^{2} / E^{2}$

where,
n = sample size
z = z-score = a factor representing probability in terms of standard deviation
s = sample standard deviation
α = 100% - confidence level
E = interval on either side of the mean


Estimation

Estimation (Inference) is assessing the value of a population parameter using sample data.

Two types of estimation:

• Point Estimates

• Interval Estimates


Estimation

FOR ESTIMATION, ALWAYS USE THE STANDARD NORMAL DISTRIBUTION


Estimation (Cont.)

• Most common point estimates are the descriptive statistical measures.

• If the expected value of an estimator equals the population parameter, the estimator is called unbiased.


Estimation (Cont.)

Unbiased Estimators

Point Estimate                 Population Parameter
$\bar{x}$ (sample mean)        $\mu$ (population mean)
s (sample Std. Dev.)           $\sigma$ (population Std. Dev.)
p (sample proportion)          $\pi$ (population proportion)

That means that we can use sample estimates in place of population parameters without introducing a systematic error (bias); sampling error still remains.


Estimation (Cont.)

An Interval Estimate provides a range within which the population parameter falls with a certain likelihood.

Confidence Level is the probability (likelihood) that the interval contains the population parameter. Most commonly used confidence levels are 90%, 95%, and 99%.


Confidence Interval

A Confidence Interval (CI) is an interval estimate centered on the point estimate.

In other words, a CI is

• an interval on either side (+/-) of the point estimate

• based on a multiple (t- or z-score) of the Std. Dev. of the point estimate (its Standard Error)


Confidence Intervals

[Figure: a number line showing the Lower Confidence Limit, the Point Estimate at the center, and the Upper Confidence Limit.]


95% Confidence Intervals

[Figure: standard normal curve with a central area of 0.95 between $-z_{.025} = -1.96$ and $z_{.025} = 1.96$.]


CI for Proportions

For categorical variables having only two possible outcomes, proportions are important.

An unbiased estimator of the population proportion (π) is the sample statistic

p = x/n

where,
x = number of observations in the sample with the desired characteristic
n = sample size


Confidence Interval - From General to Specific Format -

Point Estimate ± (Critical Value, based on CL) × (Standard Error)

CI for a unit value (mean) = $\bar{x} \pm z_{\alpha/2} \, s / \sqrt{n}$

CI for a proportion = $p \pm z_{\alpha/2} \sqrt{p(1-p)/n}$


Confidence Interval - From Statistical Expression to Excel Formula -

Where

$z_{\alpha/2}$ = Normsinv(1 – α/tails)

and when n < 30, z is replaced by t (z → t), then

$t_{\alpha/2,\, n-1}$ = Tinv(2α/tails, n-1)
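For readers working outside Excel, the z critical value can be sketched with Python's standard library, where `statistics.NormalDist().inv_cdf` plays the role of Normsinv. The helper name `z_critical` is hypothetical. The standard library has no t-distribution, so a t counterpart would need a third-party package such as SciPy (`scipy.stats.t.ppf`).

```python
from statistics import NormalDist

def z_critical(confidence_level, tails=2):
    """Counterpart of Normsinv(1 - alpha/tails) for a given confidence level."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / tails)

for cl in (0.90, 0.95, 0.99):
    print(f"{cl:.0%}: z = {z_critical(cl):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 99%: z = 2.576
```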


CI of the Mean (Cont.)

where,

z = z-score = a critical factor (critical value) representing probability in terms of Standard Deviation (Standard Error for sampling); valid for the normal distribution

t = t-score = a critical factor (critical value) representing probability in terms of Standard Deviation (or Standard Error); valid for the t distribution

α = 100% - confidence level


CI of the Mean (Cont.)

where,

E = Margin of Error

E for a unit value (mean) = $z_{\alpha/2} \, s / \sqrt{n}$

E for a proportion = $z_{\alpha/2} \sqrt{p(1-p)/n}$


Z-score

A z-score is a critical factor indicating how many standard deviations (standard errors for sampling) away from the mean a value should be to observe a particular (cumulative) probability.

The z-score is related to probability by

p(x) = (1-Normsdist(z))*tails

and to the value of the random variable by

$z = (x - \mu) / \sigma$
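Both relationships can be sketched in a few lines of Python. The function names are hypothetical, and the numbers (a value of 800 against a mean of 750 and a standard deviation of 100) are borrowed from the demand example later in this lesson.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Distance of x from the mean, in standard deviations."""
    return (x - mu) / sigma

def tail_probability(z, tails=2):
    """Counterpart of (1 - Normsdist(z)) * tails, for z >= 0."""
    return (1 - NormalDist().cdf(z)) * tails

z = z_score(x=800, mu=750, sigma=100)
print(z)                                       # 0.5
print(round(tail_probability(z, tails=1), 4))  # 0.3085
```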


Z-score (Cont.)

Since the z-score is a measure of distance from the mean in terms of Standard Deviation (Standard Error for sampling), it provides us with information that a cumulative probability could not. For example, the larger the absolute value of the z-score, the more unusual the observation.


Student’s t-Distribution

The t-distribution is a family of distributions that is bell-shaped and symmetric like the Standard Normal Distribution but with greater area in the tails. Each distribution in the t-family is defined by its degrees of freedom. As the degrees of freedom increase, the t-distribution approaches the normal distribution.


Degrees of freedom

Degrees of freedom (df) refers to the number of independent data values available to estimate the population’s standard deviation. If k parameters must be estimated before the population’s standard deviation can be calculated from a sample of size n, the degrees of freedom are equal to n - k.


Example of a Confidence Interval Estimate for $\mu$

A sample of 100 cans, from a population with $\sigma$ = 0.20, produced a sample mean equal to 12.09. A 95% confidence interval would be:

$\bar{x} \pm z \, \sigma / \sqrt{n} = 12.09 \pm 1.96 \, (0.20 / \sqrt{100}) = 12.09 \pm 0.039$

Lower limit: 12.051 ounces

Upper limit: 12.129 ounces


Example of Impact of Sample Size on Confidence Intervals

Suppose that, instead of a sample of 100 cans, a sample of 400 cans from a population with $\sigma$ = 0.20 produced a sample mean equal to 12.09. A 95% confidence interval would be:

$\bar{x} \pm z \, \sigma / \sqrt{n} = 12.09 \pm 1.96 \, (0.20 / \sqrt{400}) = 12.09 \pm 0.0196$

n=100: 12.051 ounces to 12.129 ounces

n=400: 12.0704 ounces to 12.1096 ounces
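The two intervals can be recomputed with a short sketch. The function name `ci_mean` is hypothetical, and z = 1.96 is hard-coded as on the slides rather than derived from the confidence level.

```python
import math

def ci_mean(x_bar, sigma, n, z=1.96):
    """Return (lower, upper) limits of x_bar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

for n in (100, 400):
    lower, upper = ci_mean(12.09, 0.20, n)
    print(f"n={n}: {lower:.4f} to {upper:.4f} ounces")
```

Quadrupling the sample size halves the margin of error, because the margin shrinks with the square root of n.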


Example of CI for Proportion

62 out of a sample of 100 individuals who were surveyed by Quick-Lube returned within one month to have their oil changed. To find a 90% confidence interval for the true proportion of customers who actually returned:

$p = x / n = 62 / 100 = 0.62$

$0.62 \pm 1.645 \sqrt{(0.62)(1 - 0.62) / 100} = 0.62 \pm 0.080$

Lower limit: 0.540

Upper limit: 0.700
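The same calculation as a minimal sketch, with a hypothetical helper `ci_proportion` and the critical value 1.645 taken from the slide:

```python
import math

def ci_proportion(x, n, z=1.645):
    """Return (lower, upper) limits of p +/- z * sqrt(p(1-p)/n), with p = x/n."""
    p = x / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

lower, upper = ci_proportion(62, 100)
print(f"{lower:.3f} to {upper:.3f}")  # 0.540 to 0.700
```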


From Margin of Error to Sampling Size

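E for a unit value (mean) = $z_{\alpha/2} \, s / \sqrt{n}$

E for a proportion = $z_{\alpha/2} \sqrt{p(1-p)/n}$

Solving each expression for n:

$n = z_{\alpha/2}^{2} \, s^{2} / E^{2}$

$n = z_{\alpha/2}^{2} \, p(1-p) / E^{2}$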


Sampling: Size

Sample Size Determination.

For a mean: $n = z_{\alpha/2}^{2} \, s^{2} / E^{2}$

For a proportion: $n = z_{\alpha/2}^{2} \, p(1-p) / E^{2}$

where,
n = sample size
z = z-score = a factor representing probability in terms of standard deviation
α = 100% - confidence level
E = interval on either side of the mean


Pilot Samples

A pilot sample is a sample taken from the population of interest, of a size smaller than the anticipated sample size, that is used to provide an estimate of the population standard deviation.


Example of Determining Required Sample Size

The manager of the Georgia Timber Mill wishes to construct a 90% confidence interval with a margin of error of 0.50 inches in estimating the mean diameter of logs. A pilot sample of 100 logs yields a sample standard deviation of 4.8 inches.

$n = z^{2} s^{2} / E^{2} = (1.645)^{2} (4.8)^{2} / (0.50)^{2} = 249.38 \approx 250$
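A short sketch of the same calculation; the function name `required_sample_size` is hypothetical. The result is rounded up, since rounding down would leave the margin of error slightly larger than requested.

```python
import math

def required_sample_size(z, s, E):
    """n = z^2 * s^2 / E^2, rounded up to the next whole observation."""
    return math.ceil((z ** 2) * (s ** 2) / (E ** 2))

print(required_sample_size(z=1.645, s=4.8, E=0.50))  # 250
```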


RANGE versus CI

Example: 

The customer’s demand is normally distributed with a mean of 750 units/month and a standard deviation of 100 units/month. What is the probability that the demand will be within 700 units/month and 800 units/month?


RANGE versus CI (Cont.)

1) A RANGE is GIVEN, probability asked (population $\mu$ and $\sigma$ given)

• The customer’s demand is normally distributed with a mean of 750 units/month and a standard deviation of 100 units/month. What is the probability that the demand will be within 700 units/month and 800 units/month?

Answer: p(700≤x≤800) = p(x≤800) - p(x≤700) = NORMDIST(800,750,100,true) - NORMDIST(700,750,100,true)
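Outside Excel, the same probability can be sketched with Python's `statistics.NormalDist`, whose `cdf` method plays the role of NORMDIST(..., true):

```python
from statistics import NormalDist

demand = NormalDist(mu=750, sigma=100)   # monthly demand from the example
p = demand.cdf(800) - demand.cdf(700)    # p(700 <= x <= 800)
print(round(p, 4))  # 0.3829
```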


NORMDIST versus CI (Cont.)

2) PROBABILITY IS GIVEN, Upper and Lower limits are asked (sample mean $\bar{x}$, s, n given)

• What would be the Confidence Interval for an expected sales level of 750 units/month if you wish to have a 90% confidence level based on 30 observations?

U/LL($\bar{x}$) = $\bar{x}$ ± NORMSINV(1-(α/tails))*(s/SQRT(n))

U/LL($\bar{x}$) = 750 ± NORMSINV(0.95)*100/SQRT(30)


Next Lesson

(Lesson - 04/B) Hypothesis Testing