Sampling Distribution

  • 8/7/2019 Sampling Distribution

    The Sampling Distribution of the Mean

    Note: A prerequisite for this chapter is the chapter on the normal distribution.

    Sampling and the Sampling Error

    Inferential statistics is all about making inferences about population parameters from sample statistics. We will concentrate on inferring the population mean from the sample mean. When we take a random sample from a population and calculate the mean of this sample, it will generally not be exactly equal to the true population mean. The difference between the sample mean and the true population mean is called the sampling error. The sampling error is purely due to chance (due to the randomness involved in picking the sample). We now wish to address an important question: how variable can the mean of a sample be?

    The Sampling Distribution of the Mean

    If we are given a normal distribution, we understand how to calculate the probability that a value picked at random lies between two limits (ref ). The section on the normal distribution gave several examples of such calculations. This section shows how this is relevant to questions like the ones asked in Section ___. We will see that even when the observations do not necessarily follow a normal distribution, we can still use the theory of the normal distribution to answer questions about the means of samples.

    This is possible because of an important result in probability theory called the Central Limit Theorem. The theorem is easy enough to understand and what it says is the following:

    Given a population which has an arbitrary distribution (it need not be normal - it could be skewed arbitrarily), suppose we take all possible samples of a certain (arbitrary) size, and look at the means of all the samples. Then, the means are distributed approximately normally.*

    One point needs to be made: the size of the sample should not be small. In practice, the sampling distribution for samples of size 30 or so will provide an excellent approximation to a true normal distribution.
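
    The theorem can be checked informally by simulation. The sketch below is only an illustration under assumptions that are ours, not the text's: the population is taken to be exponential with mean 1 (a strongly skewed distribution), 20000 samples of size 30 are drawn, and the helper name sample_means is hypothetical. If the sample means are close to normal, roughly 68% of them should fall within one standard deviation of their centre.

```python
import random
import statistics


def sample_means(population_draw, n, num_samples, rng):
    """Return the means of num_samples random samples, each of size n."""
    return [
        statistics.fmean(population_draw(rng) for _ in range(n))
        for _ in range(num_samples)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable


def draw(r):
    # Assumed population: exponential with mean 1 and standard deviation 1.
    return r.expovariate(1.0)


means = sample_means(draw, n=30, num_samples=20000, rng=rng)

centre = statistics.fmean(means)   # should be near the population mean, 1
spread = statistics.stdev(means)   # should be near 1 / sqrt(30)

# For a normal distribution about 68% of values lie within one standard
# deviation of the mean; measure how close the sample means come to that.
within_one_sd = sum(abs(m - centre) <= spread for m in means) / len(means)

print(f"mean of sample means: {centre:.3f}")
print(f"sd of sample means:   {spread:.3f}")
print(f"fraction within 1 sd: {within_one_sd:.3f}")
```

    Repeating the run with a much smaller n (say 2 or 3) should show a visibly skewed set of means, which is the reason for the caution about small samples.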

    Definition

    The sampling distribution of the mean is the distribution of the means of samples of a fixed size taken from a population.

    * The actual statement of the Central Limit Theorem is of course more involved, but this is basically what it does say. See ref

    Since we may assume that the sampling distribution of the mean is a normal distribution, it is natural to ask what the mean and the standard deviation of this distribution are. The answer is given by the following result:

    For a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the mean for samples of size $n$ has mean equal to $\mu$ and standard deviation $\sigma/\sqrt{n}$.
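
    A short sketch of where the $\sqrt{n}$ comes from, assuming the $n$ observations $x_1, \dots, x_n$ in a sample are independent draws from the population (this assumption is ours; the text does not state it):

```latex
\operatorname{Var}(\bar{x})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(x_i)
  = \frac{n\,\sigma^2}{n^2}
  = \frac{\sigma^2}{n},
\qquad\text{so}\qquad
\sqrt{\operatorname{Var}(\bar{x})} = \frac{\sigma}{\sqrt{n}}.
```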

    Definition

    The standard deviation of the sampling distribution is called the standard error. It is denoted $\sigma_{\bar{x}}$ and is given by the formula $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

    Notice that the standard error decreases as the size of the sample increases and is proportional to the standard deviation of the original population. This makes sense if you think about it: what the standard error measures is the variability of the sample means. If you take samples of a very large size, the variation between the means is smaller. And if the original population has a large variability, the variability between the sample means is also larger.
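
    As a quick numerical check of these two claims, the following minimal Python sketch evaluates sigma / sqrt(n) for a few sample sizes. The helper name standard_error and the value sigma = 40 (borrowed from Example 1) are illustrative choices, not part of the original text.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)


sigma = 40  # assumed population standard deviation (same value as Example 1)

# Quadrupling the sample size halves the standard error.
for n in (4, 25, 100, 400):
    print(f"n = {n:3d}  standard error = {standard_error(sigma, n):.1f}")

# Doubling the population standard deviation doubles the standard error.
print(standard_error(2 * sigma, 25) / standard_error(sigma, 25))
```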

    z values for the sampling distribution are defined as before. The only difference is that the variable whose distribution we are considering is $\bar{x}$ and the standard deviation of this distribution is $\sigma_{\bar{x}}$. So the z value for the sampling distribution is given by

    $z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}$

    All this is relevant because in practice we are really interested in saying something about the mean of a sample. The example below should illustrate.

    Example 1. Assume that we know that the population of hypertensive people has a mean diastolic value of 120 and a standard deviation of 40. What is the probability of finding a random sample of 25 people with a mean diastolic value less than 100?

    If we had taken all possible samples of size 25 from this population, we know that the means of these samples form a normal distribution with mean 120 and standard deviation 8.

    How did we get the 8? The standard deviation of our hypothetical sampling distribution (the standard error $\sigma_{\bar{x}}$) is $\sigma/\sqrt{n} = 40/\sqrt{25} = 40/5 = 8$.

    Our problem has now reduced to a question that we know how to answer: Given a normal distribution with mean 120 and standard deviation of 8, what is the proportion of cases less than 100? The z value is -2.5.

    How? The z value is given by $z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \dfrac{100 - 120}{8} = \dfrac{-20}{8} = -2.5$

    What is the area to the left of z = -2.5 for the standard normal distribution? Looking up the normal table shows that the answer is 0.0062 (Check this!). So the probability of finding a random sample of 25 people with a mean diastolic value less than 100 is 0.62%.
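
    The table lookup can also be checked in code. The sketch below is one possible way to do it, using NormalDist from Python's standard statistics module in place of the printed normal table; the function name prob_sample_mean_below is hypothetical and introduced only for this illustration.

```python
import math
from statistics import NormalDist


def prob_sample_mean_below(x_bar, mu, sigma, n):
    """Return (z, P(sample mean < x_bar)) for samples of size n.

    Assumes the sampling distribution of the mean is normal with
    mean mu and standard deviation sigma / sqrt(n).
    """
    se = sigma / math.sqrt(n)          # standard error
    z = (x_bar - mu) / se              # z value for the sampling distribution
    return z, NormalDist().cdf(z)      # area to the left of z


# Example 1: mean 120, standard deviation 40, sample size 25, threshold 100.
z, p = prob_sample_mean_below(100, mu=120, sigma=40, n=25)
print(f"z = {z:.2f}, P = {p:.4f}")
```

    For a "more than" question, as in the second exercise below, the area to the right is 1 minus the returned probability.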

    Exercise. A population of hypertensive people has a mean diastolic value of 120 and a standard deviation of 40. Find the probability that a random sample of 30 hypertensive people has a mean diastolic value less than 110.

    (z is -1.37. The probability is 0.0853)

    Exercise. A population of asthmatics is known to have a mean PFR of 100 with a standard deviation of 40. What is the probability that a sample of 10 asthmatics has a mean PFR of more than 150?

    (z is 3.95. The probability is less than 0.0005)

    Looking Ahead

    The reader will have noticed that in the examples above we knew the values of $\mu$ and $\sigma$ for the population. In practice this is almost never true! When the population mean is not known the solution is simple. We know from statistical theory that the sample mean $\bar{x}$ is a good estimator for $\mu$ (see the next section on Estimation). So when the value of $\mu$ is not known we simply use the sample mean $\bar{x}$ in its place. From this one might guess that one could do the same with the standard deviation: if $\sigma$ is not known, use the sample standard deviation $s$ in its place. But while $s$ is a good estimator for $\sigma$, the procedure we used to calculate probabilities above needs a small modification when the sample size is small (less than 30 or so): if we use $s$ in place of $\sigma$ we have to use a modification of the normal distribution called the t distribution.

    But before we do this, we look at the basic idea of estimation.

    Review Questions

    What is sampling error?

    What is standard error?
