From the population to the sample

The sampling distribution

FETP India

Competency to be gained from this lecture

Use the properties of the sampling distribution to calculate the standard error of the mean

Key issues

• Population parameters versus sample statistics

• Sampling distribution and its properties

• Mean and standard error of the sampling distribution

Things we already know

• Mean
  Arithmetic sum of data divided by number of observations

• Standard deviation
  Index of variability (spread) of data about the mean

• Z-score
  Distance from mean in standard deviation units
  z = (x - mean) / sd

• Normal curve
  Bell-shaped curve that relates probability to z-scores
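As a quick reminder of the z-score formula, here is a minimal Python sketch. The height, mean and standard deviation used are hypothetical values chosen only for illustration; they do not come from the lecture.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, expressed in standard deviation units."""
    return (x - mean) / sd

# Hypothetical example: a height of 180 cm in a group with mean 170 cm and sd 10 cm
z = z_score(180, 170, 10)
print(z)  # 1.0, i.e. one standard deviation above the mean
```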

Parameters and statistics

Population parameters

• A population parameter is a numerical descriptive measure of a population

• Examples:
  Population mean (µ)
  Population standard deviation (σ)


A statistic

• A statistic is a numerical descriptive measure of a sample

• Examples:
  Sample mean (x̄)
  Sample standard deviation (s)


Inference

• The parameter is fixed

• The sample statistic varies from sample to sample

• We try to infer what happens in the population from what we see in the sample


Sample mean: A typical situation

• A sample might be taken

• The mean and standard deviation are computed

• From these data, one will want to infer that the population values are identical or at least similar

• In other words, it is hoped that the sample data reflect the population data

Sampling distribution

Sample mean: Another approach

• Change your thinking from a single sample

• Consider the situation where you:
  Take many samples
  Calculate a mean and standard deviation for each sample


Taking many samples from a population

• Consider a population of 1,000 individuals with various heights

• If we take 10 samples of 100 persons from the population, each of the 10 samples will have a specific frequency distribution with:
  A specific mean
  A specific standard deviation

• In each sample, each data point is a height


Looking at the means of the samples

• We can look at the frequency distribution of the means of each of the 10 samples

• In this case:
  The data points are no longer the heights
  The data points are the means


Intuitive observation

• If we take repeated samples from a population, we are unlikely to sample extreme values every time:
  Values close to the mean are common
  Extreme values are less common

• Thus, when we compare the distribution of the heights and the distribution of the means, we observe:
  More variation in the distribution of individual heights
  Less variation in the distribution of the means


Taking many samples from the population

• If we take many samples, we can plot a complete frequency distribution of the means of the samples

• Each sample produces a statistic (mean)

• The distribution of statistics (means) is called a sampling distribution
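The idea of taking many samples can be tried out in a short simulation. The sketch below is only an illustration: the population of 1,000 heights is generated artificially (a normal distribution with mean 165 cm and standard deviation 10 cm is an assumption, not lecture data), and `sample_means` is a hypothetical helper name.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: 1,000 heights in cm (assumed values, for illustration only)
population = [random.gauss(165, 10) for _ in range(1000)]

def sample_means(pop, n_samples, sample_size):
    """Return the mean of each of n_samples random samples drawn from pop."""
    return [statistics.mean(random.sample(pop, sample_size))
            for _ in range(n_samples)]

# 10 samples of 100 persons, as in the slides
means = sample_means(population, n_samples=10, sample_size=100)

sd_heights = statistics.pstdev(population)  # spread of individual heights
sd_means = statistics.pstdev(means)         # spread of the 10 sample means

print(f"SD of individual heights: {sd_heights:.2f}")
print(f"SD of the sample means:   {sd_means:.2f}")
```

The second figure should come out much smaller than the first, which is the "less variation in the distribution of the means" observation made above.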



Multiple sample means

Important properties of the sampling distribution

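1. The sampling distribution of the mean is approximately normally distributed (exactly so if the population is normal, and increasingly so as the sample size grows)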

2. The mean of the sampling distribution is equal to the mean of the population


Standard deviation of the sampling distribution

• If the standard deviation of the population is σ

• The standard deviation of the sampling distribution will be σ / √n

• n is the sample size
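The formula σ / √n can be written as a small Python function. This is a minimal sketch; the function name `standard_error` and the population standard deviation of 10 are assumptions made for the example.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sigma / math.sqrt(n)

# Hypothetical population standard deviation of 10, for increasing sample sizes
for n in (4, 25, 100):
    print(n, standard_error(10, n))
```

The output shows that the standard error shrinks as the sample size grows: quadrupling n halves the standard error.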


Terminology

• The mean of the sampling distribution continues to be called the mean

• The standard deviation of the sampling distribution is the standard error

Standard error

Distribution of sample means

• One could obtain a standard deviation of sample means, which would describe the variability and spread of sample means about the true population mean

• In a practical situation:
  There is only one sample mean
  One hopes this sample mean is near the real population mean

• Wouldn't it be nice to have an estimate of the standard deviation of sample means, which describes the spread of sample means?

Standard error of the mean

• Divide the standard deviation by the square root of the number of observations

• The resulting estimate of the standard deviation of sample means is called the standard error of means

• It can be interpreted in a manner similar to the standard deviation of raw scores
  For example, the probability of obtaining a sample mean that lies more than 1.96 standard errors from the population mean (outside the -1.96 to +1.96 range) is about 5 out of 100
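The "5 out of 100" figure can be checked from the standard normal curve. The sketch below uses only the Python standard library; `normal_cdf` and `prob_outside` are hypothetical helper names, and the calculation assumes the sampling distribution is normal.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_outside(z):
    """Probability of lying more than z standard errors from the mean (both tails)."""
    return 2.0 * (1.0 - normal_cdf(z))

p = prob_outside(1.96)
print(round(p, 3))  # 0.05
```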

Central limit theorem

• If x possesses any distribution with mean µ and standard deviation SD

• Then the sample mean x̄ based on a random sample of size n will have a distribution that approaches the distribution of a normal random variable as n increases without limit, with:
  Mean µ
  Standard deviation SD / √n

• Special case:
  If x is normally distributed, the result is true for any sample size
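The theorem can be illustrated with a simulation on a clearly non-normal population. In this sketch the population is assumed to be exponential with mean 1 and standard deviation 1 (a choice made only for the example), and `simulate_sample_means` is a hypothetical helper.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

def simulate_sample_means(sample_size, n_samples):
    """Means of n_samples random samples from a skewed (exponential) population."""
    return [statistics.mean([random.expovariate(1.0) for _ in range(sample_size)])
            for _ in range(n_samples)]

sample_size = 50
means = simulate_sample_means(sample_size, n_samples=2000)

mean_of_means = statistics.mean(means)
sd_of_means = statistics.pstdev(means)
expected_se = 1.0 / math.sqrt(sample_size)  # SD / sqrt(n) with SD = 1

print(f"Mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"SD of sample means:   {sd_of_means:.3f} (SD / sqrt(n) is {expected_se:.3f})")
```

A histogram of `means` would look roughly bell-shaped even though the individual values are strongly skewed.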

Simple example

• Let the population be 1, 2, 3, 4, 5
  Mean = 15/5 = 3 = µ

• Let's take a sample of two elements (with replacement)

• The 25 possible samples are:

1,1  1,2  1,3  1,4  1,5
2,1  2,2  2,3  2,4  2,5
3,1  3,2  3,3  3,4  3,5
4,1  4,2  4,3  4,4  4,5
5,1  5,2  5,3  5,4  5,5

The frequency distribution of the population is not normal

[Figure: bar chart of frequency (y-axis) against values 1 to 5 (x-axis); each value occurs once]

Standard deviation of the population

Values   Mean   Deviation to the mean   Square deviation to the mean
1        3      -2                      4
2        3      -1                      1
3        3       0                      0
4        3       1                      1
5        3       2                      4
Total            0                      10

Variance = 10/5 = 2
Standard deviation = √2 ≈ 1.4
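The table can be reproduced with Python's `statistics` module. Note that the slide divides by the population size (5), so the population versions `pvariance` and `pstdev` are the matching functions; this is a sketch of the same arithmetic, not a different method.

```python
import statistics

population = [1, 2, 3, 4, 5]

mu = statistics.mean(population)              # 15 / 5 = 3
deviations = [x - mu for x in population]     # -2, -1, 0, 1, 2
squared = [d ** 2 for d in deviations]        # 4, 1, 0, 1, 4

variance = statistics.pvariance(population)   # 10 / 5 = 2
sigma = statistics.pstdev(population)         # square root of 2, about 1.4

print(mu, sum(deviations), sum(squared), variance, round(sigma, 1))
```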

Looking at the mean of the samples

• The 25 means of the 25 samples are:

1    1.5  2    2.5  3
1.5  2    2.5  3    3.5
2    2.5  3    3.5  4
2.5  3    3.5  4    4.5
3    3.5  4    4.5  5

Mean of sample means = 75/25 = 3
Same as population mean
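The 25 samples and their means can be enumerated rather than written out by hand. The sketch below assumes sampling with replacement where order counts, which is what the list of 25 pairs implies.

```python
from collections import Counter
from itertools import product

population = [1, 2, 3, 4, 5]

# All ordered samples of two elements, drawn with replacement: 5 x 5 = 25
samples = list(product(population, repeat=2))
sample_means = [(a + b) / 2 for a, b in samples]

mean_of_means = sum(sample_means) / len(sample_means)  # 75 / 25
frequencies = Counter(sample_means)                    # how often each mean occurs

print(len(samples), mean_of_means)
for value in sorted(frequencies):
    print(value, frequencies[value])
```

The frequencies rise from 1 (for a mean of 1) to 5 (for a mean of 3) and fall back to 1 (for a mean of 5), a symmetric peaked shape unlike the flat distribution of the population itself.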

The sampling distribution tends to be normal

[Figure: bar chart of frequency (y-axis, 0 to 6) against the values of the sample means (x-axis, 1 to 5 in steps of 0.5)]

Even if the population is not normally distributed, the sampling distribution will tend to be normal

Standard deviation of the sample means

Values   Mean   Deviation to the mean   Square deviation to the mean
1        3      -2                      4
1.5      3      -1.5                    2.25
1.5      3      -1.5                    2.25
2        3      -1                      1
2        3      -1                      1
2        3      -1                      1
2.5      3      -0.5                    0.25
2.5      3      -0.5                    0.25
2.5      3      -0.5                    0.25
2.5      3      -0.5                    0.25
3        3       0                      0
3        3       0                      0
3        3       0                      0
3        3       0                      0
3        3       0                      0
3.5      3       0.5                    0.25
3.5      3       0.5                    0.25
3.5      3       0.5                    0.25
3.5      3       0.5                    0.25
4        3       1                      1
4        3       1                      1
4        3       1                      1
4.5      3       1.5                    2.25
4.5      3       1.5                    2.25
5        3       2                      4
Total            0                      25

Variance = 25/25 = 1.00
Standard error = √1.00 = 1.00

Standard deviation in the population and standard error

• Standard deviation in the population: 1.4

• Sample size: 2

• Square root of the sample size: 1.4

• Standard deviation / square root of the sample size: 1.4 / 1.4 = 1 = Standard error
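The two routes to the standard error in this example (the formula, and the standard deviation of all 25 sample means) can be compared directly. This is a minimal sketch that again assumes samples of two drawn with replacement; the variable names are chosen for the example.

```python
import math
from itertools import product
from statistics import pstdev

population = [1, 2, 3, 4, 5]
n = 2  # sample size

# Route 1: population standard deviation divided by the square root of n
se_formula = pstdev(population) / math.sqrt(n)

# Route 2: standard deviation of the means of all 25 possible samples
all_means = [(a + b) / n for a, b in product(population, repeat=n)]
se_direct = pstdev(all_means)

print(round(se_formula, 2), round(se_direct, 2))  # 1.0 1.0
```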

Applying the standard error: Male's serum uric acid levels (1/2)

• Population mean: 5.4 mg per 100 ml

• Standard deviation: 1

• Take 100 samples of 25 men in each sample

• Compute 100 sample means

• Standard error: 1 / √25 = 0.2

• How many of those means would you expect to fall within the range 5.4 - (1.96 x 0.2) to 5.4 + (1.96 x 0.2)?

• The answer is 95!
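The expected range of the sample means can be worked out, and the experiment simulated, in a few lines. The sketch below computes the standard error as SD / √n = 1 / √25 = 0.2 and, as an assumption made only for the simulation, treats individual uric acid levels as normally distributed.

```python
import math
import random

random.seed(2)  # fixed seed so the sketch is repeatable

mu = 5.4     # population mean, mg per 100 ml
sigma = 1.0  # population standard deviation
n = 25       # men per sample

se = sigma / math.sqrt(n)  # standard error of the mean: 0.2
lower = mu - 1.96 * se
upper = mu + 1.96 * se
print(f"SE = {se:.1f}, range = {lower:.3f} to {upper:.3f}")

def one_sample_mean():
    """Mean of one simulated sample of n men (normal population assumed)."""
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n

means = [one_sample_mean() for _ in range(100)]
inside = sum(lower <= m <= upper for m in means)
print(f"{inside} of 100 sample means fall inside the range")
```

In any single run the count will vary a little around 95, since 95 is the expected number rather than a guaranteed one.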

Applying the standard error: Male's serum uric acid levels (2/2)

• One sample

• Mean serum uric acid level of 8.2

• Would you assume this was "significantly" different from the population mean?
  Yes, because 8.2 lies more than 1.96 standard errors from the population mean of 5.4, so a mean of that magnitude could occur less than 5 times in 100
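How unusual a sample mean of 8.2 would be can be expressed as a z-score on the sampling distribution. The sketch assumes the single sample also contains 25 men, so that the standard error is 1 / √25 = 0.2; the slide does not state the sample size, and `two_sided_p` is a hypothetical helper name.

```python
import math

mu = 5.4           # population mean, mg per 100 ml
se = 0.2           # assumed standard error: SD of 1 with a sample of 25 men
sample_mean = 8.2

# Number of standard errors between the sample mean and the population mean
z = (sample_mean - mu) / se

def two_sided_p(z_value):
    """Probability of a result at least this far from the mean, in either direction."""
    return math.erfc(abs(z_value) / math.sqrt(2.0))

p = two_sided_p(z)
print(f"z = {z:.1f}, beyond 1.96: {abs(z) > 1.96}, p < 0.05: {p < 0.05}")
```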

Key messages

• While population parameters are fixed, samples provide estimates (statistics) that fluctuate

• The distribution of a statistic for all possible samples of given size ‘n’ is called the sampling distribution
  For large ‘n’, the sampling distribution is ‘normal’, even if the original distribution is not
  If the original distribution is normal, the result is true even for small ‘n’

• The mean of the sampling distribution is the population mean and the standard deviation (standard error) is the population SD / √n
