sampling

WHAT IS SAMPLING?

Sampling is the act, process, or technique of selecting a suitable sample, that is, a representative part of a population, for the purpose of determining parameters or characteristics of the whole population. The goal is for the smaller study group to resemble the larger group as closely as possible. Sampling permits the researcher to work with a more manageable group size. The study’s findings can then be generalized back to the total population with inferential statistics.

Sampling is used for gathering data from a population. It is a statistical practice by which observations are made upon certain individuals of a population so as to derive certain conclusions about the population.

1.1 What is a sample?

A sample is a finite part of a statistical population whose properties are studied to gain information about the whole (Webster, 1985). When dealing with people, it can be defined as a set of respondents (people) selected from a larger population for the purpose of a survey.

1.2 Why do we use samples?

1. It is frequently too costly, or impracticable, to collect information for the whole population, i.e. to conduct a census. For example, in business, quality control checks may involve destructive testing, so not every component produced can be tested to destruction. Likewise, it would be impractical to measure the diameter of every single ball bearing produced to check for precision.

2. In some cases, for example the population of a country, there are obvious practical and economic reasons that hinder the study of the whole population. A sample is used so that conclusions drawn from the sample data can be generalized to the population.

1.3 Advantages of sampling.

1. Reduced cost. If data are secured from only a small fraction of the aggregate, expenditures may be expected to be smaller than if a complete census is attempted.

2. Greater speed. For the same reason, the data can be collected and summarized more quickly with a sample than with a complete count. This may be a vital consideration when the information is urgently needed.

3. Greater scope. In certain types of inquiry, highly trained personnel or specialized equipment, limited in availability, must be used to obtain the data. A complete census may then be impracticable: the choice lies between obtaining the information by sampling or not at all. Thus surveys which rely on sampling have more scope and flexibility as to the types of information that can be obtained. On the other hand, if information is wanted for many subdivisions or segments of the population, it may be found that a complete enumeration offers the best solution.

4. Greater accuracy. Because personnel of higher quality can be employed and can be given intensive training, a sample may actually produce more accurate results than the kind of complete enumeration that it is feasible to take.

1.4 Disadvantages of sampling

1. Sampling does not give information on every person, business, etc.

2. Sampling does not provide information for action with respect to individual accounts.

3. Sampling produces results that contain sampling error. This is a disadvantage if the sampling error is too large for the purpose one has in mind.

2.0 STEPS IN SAMPLING

1. The population (N) to be sampled is defined.

2. The sample size (n) is determined.

3. Control for bias and error.

4. Sample is selected.

2.1 What is a sampling frame?

A sampling frame is the set of population members that can be identified and used in the sampling exercise. It is the actual set from which the sample is taken. The sampling frame should be representative of the population.

For example, the sampling frame in a household survey may be the people listed in the telephone directory.

In most cases, due to time and size constraints, a representative subset of the population is taken for observation. A sample is selected for data collection purposes from the sampling frame (and hence from the population).

2.2 Definition of population

A population can be defined as including all people or items with the characteristic one wishes to understand. The group of interest, and the characteristics to which the findings of the study will be generalized, must be identified. It is also called the “target” population (the ideal selection). At times, however, the “accessible” or “available” population must be used (the realistic selection).

For example,

1. The target population for a household survey may be the adult population of Mauritius.

2. A manufacturer needs to decide whether a batch of material from production is of high enough quality to be released to the customer, or should be sentenced for scrap or rework due to poor quality. In this case, the batch is the population.

2.3 Determination of the sample size.

There is no specific sample size for a research study. Sample sizes depend on the type of study being conducted and the population being studied. The following is a list of examples to determine sample sizes for different kinds of research studies.

1. Experimental/Causal Research: Most researchers recommend that the sample size of each group in an experimental study be at least 30 participants. In some cases the group could be as small as 15 if tight controls are used in establishing the research groups.

2. Correlational Studies: The recommended number for relationship studies is also 30. Smaller numbers make it difficult to obtain statistical significance.

3. Descriptive Studies: The number of participants in a descriptive study can vary significantly. Usually the size of the population to be studied has more of an effect on the sample size than any general sampling rule. Small populations require a larger percentage of the population to be included in the study.

Sample size for a given population based on a 5% Level of significance

General Rules:

1. A smaller percentage is required for a larger population.

2. Studies using a population less than 100 should use the entire population.

3. Populations of approximately 300 should use a sample of 50%.

4. If the population size is around 1500, 20% should be sampled.

5. Population of above 100000 would require only 384 in the sample population.

Population size   Sample size      Population size   Sample size
10                10               150               108
20                19               200               132
30                28               300               169
40                36               500               217
50                44               1000              278
60                52               2000              322
70                59               5000              357
80                66               10000             370
90                73               50000             381
100               80               100000            384

2.3.1 Calculating a Sample Size

In general, the larger the sample size, the more closely the sample data will match that from the population. In practice, however, one must work out the number of responses that will give sufficient precision at an affordable cost.

Calculation of an appropriate sample size depends upon a number of factors unique to each survey and it is down to us to make the decision regarding these factors. The three most important are:

1. How accurate one wishes to be

2. How confident one is in the results

3. What budget one has available

The temptation is to say all should be as high as possible. The problem is that an increase in either accuracy or confidence (or both) will always require a larger sample and higher budget. Therefore a compromise must be reached and the degree of inaccuracy and confidence one is prepared to accept must be worked out.

For example in a Market research project, values such as mean income and mean height etc are estimated.

For a mean

The required formula is: s = (z / e)^2

Where:

s = the sample size

z = a number relating to the degree of confidence one wishes to have in the result. 95% confidence is most frequently used and accepted. The value of ‘z’ should be 2.58 for 99% confidence, 1.96 for 95% confidence, 1.64 for 90% confidence and 1.28 for 80% confidence.

e = the error that can be accepted, measured as a proportion of the standard deviation (accuracy)

Suppose mean income is being estimated and one wishes to know what sample size to aim for in order to be 95% confident in the result. Assuming that an error of 10% of the population standard deviation can be accepted, the following calculation can be used:

s = (1.96 / 0.1)^2

Therefore s = 384.16

In other words, 385 people would need to be sampled to meet our criterion.
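The calculation above can be sketched in a few lines of Python. This is only an illustration of the formula s = (z / e)^2 as stated in the text; the function name `sample_size_for_mean` and the lookup table `Z_VALUES` are names introduced here, and the result is rounded up to a whole respondent, as in the worked example.

```python
import math

# z-values quoted in the text for common confidence levels.
Z_VALUES = {0.99: 2.58, 0.95: 1.96, 0.90: 1.64, 0.80: 1.28}


def sample_size_for_mean(confidence, error):
    """Return s = (z / e)^2, rounded up to a whole respondent.

    `error` is the acceptable error expressed as a proportion of the
    population standard deviation (0.1 means 10%).
    """
    z = Z_VALUES[confidence]
    return math.ceil((z / error) ** 2)


# The worked example: 95% confidence, error of 10% of the standard deviation.
print(sample_size_for_mean(0.95, 0.1))  # 385
```

Raising the confidence level to 99% with the same error gives a larger sample (666 under this formula), which illustrates the trade-off between confidence and budget described earlier.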

If the whole population had been interviewed, then the confidence level would have been 100%. But since only a sample has been interviewed, one is less confident. As the sample size calculation has been based on the 95% confidence level, there is a 95% chance that the sample mean lies within the acceptable error of the population mean, and a 5% chance that it lies outside this limit. If one wanted to be more confident, the sample size calculation should be based on a 99% confidence level; if a lower level of confidence can be accepted, the calculation can be based on the 90% confidence level.

2.4 Control for sampling bias and error

1. The sources of sampling bias must be known, and ways to avoid them must be identified.

2. It must be decided whether the bias is so severe that the results of the study will be seriously affected.

3. In the final report, awareness of bias, rationale for proceeding, and potential effects must be documented.

2.5 Selection of the sample

It is the process by which the researcher attempts to ensure that the sample is representative of the population from which it is to be selected. The sampling method that will be used must be identified.

3.0 HOW TO SELECT THE MOST APPROPRIATE SAMPLING METHOD

A variety of sampling methods can be employed, individually or in combination. Factors commonly influencing the choice between these designs include:

1. Nature and quality of frame.

2. Availability of auxiliary information about units on the frame.

3. Accuracy requirements and the need to measure accuracy.

4. Whether detailed analysis of the sample is expected.

5. Cost/operational concerns.

4.0 PROBABILITY SAMPLING

A probability sampling scheme is one in which every unit in the population has a chance (greater than zero) of being selected in the sample, and this probability can be accurately determined. The combination of these traits makes it possible to produce unbiased estimates of population totals, by weighting sampled units according to their probability of selection.
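The weighting idea in the paragraph above can be sketched as follows. This is a minimal illustration of inverse-probability weighting (often called the Horvitz-Thompson estimator), not a method spelled out in the text; the function name `estimate_total` and the example figures are hypothetical.

```python
def estimate_total(values, probabilities):
    """Estimate a population total from a probability sample.

    Each sampled value is divided by its probability of selection, so a
    unit that had only a 25% chance of being picked "stands in" for four
    units of the population.
    """
    if any(p <= 0 or p > 1 for p in probabilities):
        raise ValueError("each selection probability must be in (0, 1]")
    return sum(y / p for y, p in zip(values, probabilities))


# Hypothetical sample: two units with values 10 and 20, selected with
# probabilities 0.5 and 0.25 respectively.
print(estimate_total([10, 20], [0.5, 0.25]))  # 100.0
```

The sketch only works when every probability is known and greater than zero, which is exactly the condition that defines probability sampling.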

Probability sampling includes:

1. Simple Random Sampling

2. Systematic Sampling

3. Stratified Sampling

4. Cluster or Multistage Sampling

These various ways of probability sampling have two things in common:

1. Every element has a known nonzero probability of being sampled.

2. It involves random selection at some point.

4.1 SIMPLE RANDOM SAMPLING (SRS)

A simple random sample is one in which each member (person) in the total population has an equal chance of being picked for the sample. In addition, the selection of one member should in no way influence the selection of another. Simple random sampling should be used with a homogeneous population, that is, one composed of members who all possess the same attribute you are interested in measuring. In identifying the population to be surveyed, homogeneity can be determined by asking the question, “What is (are) the common characteristic(s) that are of interest?” These may include such characteristics as age, sex, rank/grade, position, income, religious or political affiliation, etc.

Example: Placing names in a hat and drawing the sample is a method of using simple random sampling.

4.1.1 Steps

The following steps are used to randomly select a sample.

1. The researcher identifies and defines the population.

2. The appropriate sample size is determined.

3. The population is listed and all members are assigned a number. Each assigned number must have the same number of digits as the others.

4. The researcher then uses a table of random numbers to select each member of the sample.

Many statistics and research books contain random number tables similar to the sample shown below.

How to use a random number table.

1. Assume that there is a population of 185 students and that each student has been assigned a number from 1 to 185. Suppose we wish to sample 5 students.

2. Since the population consists of 185 students and 185 is a three-digit number, the first three digits of the numbers listed on the chart must be used.

3. We close our eyes and randomly point to a spot on the chart. For this example, 20631 in the first column was selected.

4. That number is interpreted as 206 (first three digits). Since there is no member of the population with that number, we go to the next number, 899 (89990). Once again there is no one with that number, so we continue at the top of the next column. As we work down the column, the first number to match the population is 100 (actually 10005 on the chart). Student number 100 would be in the sample. Continuing down the chart, the other four subjects in the sample would be students 049, 082, 153, and 005.
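The table-reading procedure above can be sketched in Python. The entries 20631, 89990 and 10005 are the ones quoted in the text; the remaining entries are hypothetical stand-ins chosen so that the sketch reproduces the five students named in step 4, and the flat list simply represents the order in which the chart is read. The function name `select_from_table` is introduced here for illustration.

```python
def select_from_table(table, population_size, sample_size):
    """Pick sample members by reading the leading digits of table entries."""
    digits = len(str(population_size))  # 185 -> use the first three digits
    chosen = []
    for entry in table:
        number = int(entry[:digits])
        # Skip numbers with no matching member and numbers already drawn.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen


# First three entries are from the text; the rest are hypothetical.
table = ["20631", "89990", "10005", "04921", "77112", "08237", "15302", "00541"]
print(select_from_table(table, 185, 5))  # [100, 49, 82, 153, 5]
```

Skipping repeated numbers is an assumption made here so that no student is selected twice; the text does not say how repeats on the chart are handled.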

Microsoft Excel has a function to produce random numbers.

The function is simply

=RAND()

Type that into a cell and it will produce a random number in that cell. Copy the formula throughout a selection of cells and it will produce random numbers between 0 and 1.

Whatever range that is required can be obtained if the formula is modified. For example, if random numbers from 1 to 250 are needed, the following formula could be entered:

=INT(250*RAND())+1

The INT eliminates the digits after the decimal, the 250* creates the range to be covered, and the +1 sets the lowest number in the range.
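For readers working outside a spreadsheet, the same construction can be sketched in Python. The function name `rand_int_like_excel` is introduced here for illustration; it mirrors the Excel formula term by term using the standard `random` module.

```python
import random


def rand_int_like_excel(upper, rng=random):
    """Mirror =INT(upper*RAND())+1: a whole number from 1 to upper inclusive."""
    # rng.random() plays the role of RAND(): a value in [0, 1).
    return int(upper * rng.random()) + 1


# Five random numbers between 1 and 250, as in the Excel example.
draws = [rand_int_like_excel(250) for _ in range(5)]
print(draws)
```

In practice `random.randint(1, 250)` gives the same range in a single call; the longer form is shown only to make the role of each part of the spreadsheet formula visible.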

4.1.2 Example:

The University of Mauritius has decided to offer 10 books to the MBA group B class. So that the books go to 10 randomly chosen students, a simple random sample of 10 students is drawn from the class. The names or roll numbers, available on Excel sheets, are listed and a random number is allocated to each student. The list is then sorted in ascending order of these random numbers, and the first 10 students on the sorted list are given the books.
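The "allocate a random number, sort, take the first 10" procedure can be sketched as below. The class list of 40 students is hypothetical (the text does not give the class size), and the function name `simple_random_sample` and its `seed` parameter are introduced here only so that the example is repeatable.

```python
import random


def simple_random_sample(names, sample_size, seed=None):
    """Draw a simple random sample by sorting on a random key."""
    rng = random.Random(seed)
    # Allocate a random number to every student on the list.
    keyed = [(rng.random(), name) for name in names]
    # Sort in ascending order of the random numbers.
    keyed.sort()
    # The first `sample_size` names on the sorted list form the sample.
    return [name for _, name in keyed[:sample_size]]


class_list = [f"Student {i}" for i in range(1, 41)]  # hypothetical class of 40
winners = simple_random_sample(class_list, 10, seed=1)
print(winners)
```

Python's built-in `random.sample(class_list, 10)` achieves the same result directly; the sort-based version is shown because it matches the spreadsheet procedure described in the example.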

4.1.3 Advantages:

1. It is easy to conduct.

2. The strategy requires minimum knowledge of the population to be sampled.

3. It minimises bias and simplifies analysis of results. In particular, the variance between individual results within the sample is a good indicator of variance in the overall population, which makes it relatively easy to estimate the accuracy of results.

4.1.4 Disadvantages

1. SRS can be vulnerable to sampling error because the randomness of the selection may result in a sample that doesn't reflect the makeup of the population.

For instance, a simple random sample of ten people from a given country will on average produce five men and five women, but any given trial is likely to over represent one sex and under represent the other. Systematic and stratified techniques, discussed below, attempt to overcome this problem by using information about the population to choose a more representative sample.

2. SRS may also be cumbersome and tedious when sampling from an unusually large target population. In some cases, investigators are interested in research questions specific to subgroups of the population.

For example,

Researchers might be interested in examining whether cognitive ability as a predictor of job performance is equally applicable across racial groups. SRS cannot accommodate the needs of researchers in this situation because it does not provide subsamples of the population.

3. It requires the names or a list of all the population members.

4. There is difficulty in reaching all those selected in the sample.
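The illustration under the first disadvantage (a sample of ten people rarely splitting exactly five and five) can be checked with a short calculation. This sketch assumes a large population that is half men and half women, so that each draw can be treated as an independent 50/50 trial; the function name `prob_exact_split` is introduced here.

```python
from math import comb


def prob_exact_split(sample_size, count, share=0.5):
    """Binomial probability of exactly `count` members of one group."""
    return comb(sample_size, count) * share**count * (1 - share) ** (sample_size - count)


# Chance that a simple random sample of ten contains exactly five men.
p_even = prob_exact_split(10, 5)
print(round(p_even, 3))  # 0.246
```

Under these assumptions, roughly three samples in four over-represent one sex, which is the sampling error the text refers to.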

4.2 SYSTEMATIC SAMPLING

Systematic sampling is a method of random sampling. The individuals to be sampled are selected at a uniform interval that is measured in time, order, or space.

4.2.1 Steps:

The following steps are used to systematically select a sample using every (Kth) name.

1. The researcher identifies and defines the total population.

2. The appropriate sample size is determined.

3. The population is listed using the names of the members.

4. The researcher then divides the total population by the sample size to obtain the sampling interval, the Kth number.

5. The researcher then selects a random starting point on the population list within the first K members. For example, if the population is 1,000 and we need a sample of 50, the Kth number is 20. The researcher randomly selects a starting point (some number between 1 and 20).

6. Each subsequent member of the sample is chosen by adding 20 to the previous selection, until the sample is complete.

4.2.2 Example:

Let's assume that we have a population that has N=100 people in it.

We want to take a sample of n=20.

To use systematic sampling, the population must be listed in a random order.

The sampling fraction would be f = 20/100 = 20%.

The interval size, k, is equal to N/n = 100/20 = 5.

Now, select a random integer from 1 to 5.

Using a random-number formula in Excel, suppose we get 4.

Now, to select the sample, we start with the 4th unit in the list and take every k-th unit (every 5th, because k=5). We would be sampling units 4, 9, 14, 19, and so on up to 99, and we would end up with 20 units in our sample.
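The example can be sketched in Python as follows. The function name `systematic_sample` and its parameters are introduced here for illustration, and the sketch assumes that the population size is an exact multiple of the sample size, as it is in the example (100 and 20).

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the unit numbers selected by systematic sampling."""
    k = population_size // sample_size  # the interval size, k = N / n
    if start is None:
        # Random starting point between 1 and k, inclusive.
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]


# The worked example: N = 100, n = 20, random start of 4.
units = systematic_sample(100, 20, start=4)
print(units[:4], units[-1])  # [4, 9, 14, 19] 99
```

Passing `start=4` reproduces the example; leaving it out lets the function choose the starting point at random, which is what the procedure calls for in practice.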

4.2.3 Advantages

1. It is easier to extract the Sample than in simple random Sampling.

2. It ensures that the individuals chosen are spread across the population.

3. It can be used in cases where a sampling frame does not exist.

4.2.4 Disadvantages

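1. Although each member has the same chance of selection when the starting point is random, not every combination of members can be selected, so the method is less random than simple random sampling.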

2. The Kth person may be related to a periodical order in the population list, producing unrepresentativeness in the sample.

3. It may prove to be costly and time consuming if samples are not conveniently located.

4. Bias can occur where there are recurring sets in the population.

4.3 STRATIFIED SAMPLING

A second method of modifying the random sampling process is called stratified sampling. In this case the population is divided into subgroups chosen by the researcher. Stratified sampling can be proportional or non-proportional. In proportional sampling the participants are chosen in proportion to the number in each subgroup. Non proportional sampling occurs when the response weight of the subgroup is not a factor.

4.3.1 Steps:

Sampling is carried out as follows:

1. The population is divided into non-overlapping groups (strata) of sizes N1, N2, N3, ... Ni, such that N1 + N2 + N3 + ... + Ni = N, where N is the population size.

2. The proportion Ni/N is found for each stratum.

3. A simple random sample is drawn within each stratum, using the sampling fraction f = n/N.

4.3.2 Example

An example might be taken from the University of Mauritius. The opinions of students in the Faculty are to be taken in connection with their grievances. Suppose that, of 500 students, 300 are male and 200 are female and that, of these, 60 males and 50 females are part-timers.

1. We are required to take a sample of 100 students, stratified according to the above categories.

2. The first step is to find the total number of students (500) and calculate the percentage in each group.

% male, full time = ( 240 / 500 ) x 100 = 0.48 x 100 = 48

% male, part time = ( 60 / 500 ) x100 = 0.12 x 100 = 12

% female, full time = (150 / 500 ) x 100 = 0.3 x 100 = 30

% female, part time = (50/500)x100 = 0.1 x 100 = 10

3. This tells us that, of our sample of 100:

48% should be male, full time: 48% of 100 is 48.

12% should be male, part time: 12% of 100 is 12.

30% should be female, full time: 30% of 100 is 30.

10% should be female, part time: 10% of 100 is 10.

4. Therefore the above numbers of people are randomly chosen within their strata.
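The proportional allocation worked through above can be sketched in Python. The stratum sizes are those of the example (240, 60, 150 and 50 out of 500); the function name `proportional_allocation` is introduced here for illustration.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Share a sample across strata in proportion to stratum size."""
    total = sum(strata_sizes.values())
    # In general the rounded shares may not add up exactly to sample_size;
    # in this example they do.
    return {
        name: round(sample_size * size / total)
        for name, size in strata_sizes.items()
    }


strata = {
    "male, full time": 240,
    "male, part time": 60,
    "female, full time": 150,
    "female, part time": 50,
}
allocation = proportional_allocation(strata, 100)
print(allocation)
```

Once the allocation is known, a simple random sample of the indicated size would be drawn separately inside each stratum.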

Example of stratified sampling appropriately used.

The state superintendent of schools wants to determine whether geographic location has a significant effect on teacher support for a merit pay plan.

1. The researcher identifies and defines the population.

2. The appropriate sample size is determined.

3. The variable and subgroups (strata) for which the researcher wants to guarantee appropriate, equal representation are identified.

4. All members of the population are classified as members of one identified subgroup.

5. Using a table of random numbers, an “appropriate” number of individuals from each of the subgroups is randomly selected.

Steps:

1. The total teacher population in state is listed.

2. Teachers are stratified into geographic regions.

3. Proportional sampling is used to select study participants: 20% of population in each area.

4. Teachers are randomly placed into two study groups.

4.3.3 Advantages

1. It gives a more precise sample.

2. It can be used for both proportion and stratification sampling.

3. The sample represents the desired strata.

4. It focuses on important subpopulations but ignores irrelevant ones

5. It improves the accuracy of estimation

6. It is efficient

7. Sampling equal numbers from strata varying widely in size may be used to equate the statistical power of tests of differences between strata.

4.3.4 Disadvantages

1. It requires the names of all population members.

2. There is difficulty in reaching all of the selected members.

3. It can be difficult to select relevant stratification variables.

4. It is not useful when there are no homogeneous subgroups.

5. It can be expensive.

6. It requires accurate information about the population.

4.4 CLUSTER SAMPLING

The random sampling process can be modified by using the cluster sampling process. Cluster sampling utilizes the convenience of naturally occurring groups. It is particularly useful in situations where no list of the elements within a population is available, so that elements cannot be selected directly. As this form of sampling is conducted by randomly selecting subgroups of the population, possibly in several stages, it should produce results equivalent to a simple random sample.

Sampling is generally done by first sampling at the higher level(s), e.g. randomly sampling countries, then sampling from subsequent levels in turn, e.g. within the selected countries sampling counties, then within these postcodes, then within these households, until the final stage is reached, at which point the sampling is done in a simple random manner, e.g. sampling people within the selected households. The ‘levels’ in question are defined by the subgroups into which it is appropriate to subdivide the population.

Cluster samples are generally used if:

1. No list of the population exists.

2. Well-defined clusters, which will often be geographic areas, exist.

3. A reasonable estimate of the number of elements in each level of clustering can be made.

4. Often the total sample size must be fairly large to enable cluster sampling to be used effectively.

4.4.1 Steps:

1. The population is identified and defined.

2. The desired sample size is determined.

3. A logical cluster is identified and defined.

4. All clusters that make up the population of clusters are listed (or a list is obtained).

5. The average number of population members per cluster is estimated.

6. The number of clusters needed is determined by dividing the sample size by the estimated size of a cluster.

7. The needed number of clusters is randomly selected by using a table of random numbers.

8. All population members in each selected cluster are included in the study.

4.4.2 Example of where cluster sampling would be appropriately used:

A large suburban school district wants to test the effect of a new integrated reading program on sixth graders. The school division has a sixth grade population of 3,000 students based in 100 classrooms.

Using normal random sampling, the researcher would list all 3,000 students and use a table of random numbers to select the study participants. This process would create a situation where every one of the 100 classes would have a few students represented in the sample.

Some of the problems that random sampling would create here are:

1. It is difficult to administer since each class would have only a few students in the sample.

2. It is difficult to set up a control and experimental group study since some students would be in the same class.

3. Increased cost and time to train the participants in all 100 classrooms.

Steps:

1. The cluster to be used must be determined. The logical cluster to use in this study would be each of the 100 individual classrooms.

2. The 100 classrooms are listed and the number of classrooms needed is determined. In this case, 30 classrooms are to be chosen.

3. The 30 classrooms are randomly selected, and random assignment then places 15 of them in the experimental group and 15 in the control group.

4. The treatment or independent variable is applied to the experimental classrooms.
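The classroom example can be sketched in Python. The classroom numbers 1 to 100 are hypothetical labels, and the function name `assign_cluster_groups` and its `seed` parameter are introduced here so that the example is repeatable; the text itself uses a table of random numbers for the selection.

```python
import random


def assign_cluster_groups(cluster_ids, clusters_needed, seed=None):
    """Randomly select clusters, then split them into two study groups."""
    rng = random.Random(seed)
    # random.sample returns the chosen clusters in random order, so
    # cutting the list in half is itself a random assignment.
    chosen = rng.sample(cluster_ids, clusters_needed)
    half = clusters_needed // 2
    return chosen[:half], chosen[half:]


classrooms = list(range(1, 101))  # the 100 sixth-grade classrooms
experimental, control = assign_cluster_groups(classrooms, 30, seed=7)
print(sorted(experimental))
print(sorted(control))
```

Every student in a selected classroom takes part in the study, so only 30 classrooms, rather than all 100, need to be visited.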

4.4.3 Advantages

1. It is efficient.

2. The researcher doesn’t need the names of all population members.

3. It reduces travel to site

4. It is useful for educational research

4.4.4 Disadvantages

Fewer sampling points make it less likely that the sample is representative.

A comparison between Stratified and Cluster Sampling processes

Stratified Sampling | Cluster Sampling
Homogeneity within groups | Homogeneity between groups
Heterogeneity between groups | Heterogeneity within groups
All groups are included | Random selection of groups
Sampling efficiency improved by increasing accuracy at a faster rate than cost. | Sampling efficiency improved by decreasing cost at a faster rate than accuracy.

5.0 NON-PROBABILITY SAMPLING

Nonprobability sampling is any sampling method where some elements of the population have no chance of selection (these are sometimes referred to as 'out of coverage'/'undercovered'), or where the probability of selection can't be accurately determined. It involves the selection of elements based on assumptions regarding the population of interest, which forms the criteria for selection. Hence, because the selection of elements is nonrandom, nonprobability sampling does not allow the estimation of sampling errors. These conditions place limits on how much information a sample can provide about the population. Information about the relationship between sample and population is limited, making it difficult to extrapolate from the sample to the population.

5.1 CONVENIENCE SAMPLING

It is the process of including whoever happens to be available at the time. It is also called accidental or haphazard sampling.

5.1.1 Steps:

The researcher simply interviews people wherever they happen to walk by, for example on the street. This is easy because respondents are chosen without any random mechanism: the researcher picks from the people who walk by. Some people may ignore the researcher, so the response depends on what is being surveyed.

5.1.2 Examples:

The female moviegoers sitting in the first row of a movie theatre

A group of students in a high school do a study about teacher attitudes. They interview teachers at the school, a couple of teachers in the family and few others who are known to their parents.

In a class of 50 students, the teacher chooses the first 5 students who raise their hands or who answer a question correctly.

5.1.3 Advantages:

1. Convenience sampling is often a preferred option to other methods of sampling because it allows a researcher to pilot-test an experiment with minimal resources and time.

2. It is also relatively inexpensive and allows the researcher to get a gross estimate of the results.

3. It is perhaps the best way of getting some basic information quickly and efficiently.

4. It can be used when it is impossible to access a wider population, for example due to time constraints.

5.1.4 Disadvantages:

1. The sample is not an accurate representation of the population.

2. The findings from this sample are less definitive.

3. Results have to be extrapolated in order to fine-tune them.

4. It is a completely unstructured approach, which makes it difficult to determine how much of the effect (dependent variable) results from the cause (independent variable).

5.2 PURPOSIVE SAMPLING

It is also called “judgement” sampling. Purposive sampling is a non-random sampling method in which the selection of the sample is based on a person's expertise about the population. As purposive sampling is not based on probability theory, no objective method is available for measuring the reliability of the sample results. Because the technique relies on judgement, it is open to the likes and dislikes of the enumerators. The method is useful only when the sample drawn is small, the selection is representative, and the investigator is thoroughly skilled, experienced in the field of inquiry and aware of the drawbacks of deliberate selection.

5.2.1 Steps:

When taking the sample, reject people who do not fit a particular profile.

5.2.2 Example

A researcher wants to get opinions from non-working mothers. They go around an area knocking on doors during the day, when children are likely to be at school. They ask to speak to the 'woman of the house'. Their first questions are then about whether there are children and whether the woman has a day job.

5.2.3 Advantages:

1. The people who do not fit the requirements are eliminated.

2. The sample is an accurate or near to accurate representation of the population.

3. The results are expected to be more accurate.

4. It is less time consuming.

5. It is less expensive as it involves lower search costs.

5.2.4 Disadvantage

Potential for inaccuracy in the researcher’s criteria and resulting sample selections

5.3 QUOTA SAMPLING

In a market research context, the most frequently adopted form of non-probability sampling is known as quota sampling. In some ways this is similar to cluster sampling in that it requires the definition of key subgroups. The main difference lies in the fact that quotas (i.e. the number of people to be surveyed) within subgroups are set beforehand (e.g. 25% 16-24 yr olds, 30% 25-34 yr olds, 20% 35-55 yr olds, and 25% 56+ yr olds); usually the proportions are set to match known population distributions. Interviewers then select respondents according to these criteria rather than at random.

5.3.1 Steps:

Like stratified sampling, the researcher first identifies the strata and their proportions as they are represented in the population. Then convenience or judgment sampling is used to select the required number of subjects from each stratum. This differs from stratified sampling, where the strata are filled by random sampling.

5.3.2 Example:

The student council at UOM wants to gauge student opinion on the quality of their extracurricular activities. They decide to survey 100 of 1,000 students using the grade levels (7 to 12) as the sub-population.

The table below gives the number of students in each grade level.

Table 1. Number of students enrolled at UOM, by grade

Grade level   Number of students   Percentage of students (%)   Quota of students in sample of 100
7             150                  15                           15
8             220                  22                           22
9             160                  16                           16
10            150                  15                           15
11            200                  20                           20
12            120                  12                           12
Total         1,000                100                          100

The student council wants to make sure that the percentage of students in each grade level is reflected in the sample. The formula is:

Percentage of students in Grade 10 = (number of students in Grade 10 ÷ total number of students) x 100% = (150 ÷ 1,000) x 100% = 15%

Since 15% of the school population is in Grade 10, 15% of the sample should contain Grade 10 students. Therefore, use the following formula to calculate the number of Grade 10 students that should be included in the sample:

Sample of Grade 10 students = 15% of 100 = 0.15 x 100 = 15 students
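The same proportional calculation can be repeated for every grade level at once. The following is a minimal Python sketch under stated assumptions: the function name `quota_allocation` and the dictionary `enrolment` are illustrative names, not part of the original example, and the enrolment figures are taken from Table 1.

```python
def quota_allocation(stratum_sizes, sample_size):
    """Return a quota for each stratum, proportional to its share of the population."""
    total = sum(stratum_sizes.values())
    # Rounding is harmless here because every quota is a whole number;
    # in general, rounded quotas may not add up exactly to sample_size.
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}


# Enrolment by grade level, as listed in Table 1.
enrolment = {7: 150, 8: 220, 9: 160, 10: 150, 11: 200, 12: 120}
quotas = quota_allocation(enrolment, 100)

for grade, quota in quotas.items():
    print(f"Grade {grade}: {quota} students")
```

Once the quotas are known, interviewers fill each one by convenience or judgment selection, which is what distinguishes quota sampling from stratified random sampling.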

5.3.3 Advantages:

1. It is easier to organize as compared to random sampling;

2. It is cheaper to collect samples in this form;

3. More reliable than random sampling;

4. Each group to be researched is included in the sample

5.3.4 Disadvantages:

1. People who are less accessible (more difficult to contact, more reluctant to participate) are underrepresented.

2. Because selection is subjective, only a portion of the population has a chance of being selected in a typical quota sampling strategy.

3. It does not meet the basic requirement of randomness.

4. Some units may have no chance of selection, or the chance of selection may be unknown. Therefore, the sample may be biased.

5. It is not as representative of the population as a whole as other sampling methods.

6. Because the sample is non-random, it is impossible to assess the possible sampling error.

5.4 SNOWBALL SAMPLING

In snowball sampling, someone who meets the criteria for inclusion in the study is identified. The person is then asked to recommend others who they may know who also meet the criteria. Although this method would hardly lead to representative samples, there are times when it may be the best method available.

5.4.1 Steps:

1. Find people to study.

2. Ask them to refer you to other people who fit your study requirements, then follow up with these new people.

3. Repeat this method of requesting referrals until you have studied enough people.
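The referral steps above can be sketched as a breadth-first walk over a referral network. This is only an illustration: the function `snowball_sample`, the `referrals` dictionary, and the participant labels are hypothetical, and a real study would gather referrals through interviews rather than from a ready-made table.

```python
from collections import deque


def snowball_sample(referrals, seeds, target_size):
    """Follow referrals outward from the seed participants until enough people are found."""
    sampled = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < target_size:
        person = queue.popleft()
        sampled.append(person)  # step 1: study this person
        for contact in referrals.get(person, []):  # step 2: ask for referrals
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)  # step 3: follow up later
    return sampled


# Hypothetical referral network: each person names the people they know.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["F"]}
sample = snowball_sample(referrals, ["A"], 4)
print(sample)
```

Because everyone in the sample is reached through the seeds' contacts, people outside that network can never be selected, which is the source of the bias listed under the disadvantages.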

5.4.2 Example:

If the homeless are being studied, we are unlikely to find good lists of homeless people within a specific geographical area. However, if we go to that area and identify one or two, we may find that they know very well who the other homeless people in their vicinity are and how we can find them.

5.4.3 Advantages

Snowball sampling is especially useful when we are trying to reach populations that are inaccessible or hard to find.

5.4.4 Disadvantages:

1. It yields good qualitative material but is poor at generating reliable data that apply to the larger population.

2. Because the sample is chosen by the target people themselves, it is liable to various forms of bias. People tend to associate with others who share not only the characteristic used for selection but also other characteristics.

Heterogeneity Sampling

If all opinions or views need to be included, and there is no concern for representing these views proportionately, then heterogeneity sampling is performed. Another term for this is sampling for diversity. In many brainstorming or nominal group processes (including concept mapping), some form of heterogeneity sampling is used because the primary interest is in getting a broad spectrum of ideas, not in identifying the "average" or "modal instance" ones. In effect, what must be sampled is ideas, not people. Here the universe is made up of all possible ideas relevant to some topic, and a sample of this population is needed, not a sample of the people who have the ideas. Clearly, in order to get all of the ideas, and especially the "outlier" or unusual ones, a broad and diverse range of participants must be included. Heterogeneity sampling is, in this sense, almost the opposite of modal instance sampling.

How to ensure that the sample is representative of the population

An essential prerequisite is that any sample must be selected in such a way as to be representative of the population from which it has been drawn. The fundamental consideration is that any sample should be a random sample, i.e. every member of the population should have an equal chance of being selected. However, a representative sample is not an exact replica, in miniature, of the population: its statistics will differ somewhat from the population parameters, so the results are subject to sampling error.

WHAT ARE RESPONSE RATES?

The percentage of people who respond to your survey is considered the response rate. A high survey response rate helps to ensure that the survey results are representative of the survey population.

Sufficient response rates are important for surveys. A survey that collects very little data may not contain substantial information. In order to collect enough responses, researchers must take into consideration the audience, the quantity of online surveys in circulation, and the potential for surveys to be reported as spam. These factors may result in lower respondent interest and acceptance of survey invitations. But there are ways to increase response rates!

The Importance of Response Rates

A high response rate is the key to legitimizing a survey's results. When a survey receives responses from a large percentage of its target population, the findings are seen as more accurate. Low response rates, on the other hand, can damage the credibility of a survey's results, because the sample is less likely to represent the overall target population.

Low response rates are a continuing problem for survey organizations. Some people simply refuse to participate in surveys, while others, for a wide range of reasons, cannot participate. Still, a well-designed survey, coupled with incentives and techniques to elicit response, can help guarantee a healthy response rate.

Reasons for Non-Response

There are many reasons why people might choose not to respond to a survey. Sometimes time is a factor. People may feel they can't spare the time to participate in a survey. Others may see a survey as a nuisance, particularly telephone and mail surveys. However, some factors that can cause non-response lie in the hands of the surveyors themselves, and can thus be avoided. The following list includes some of the pitfalls that can lead to non-response:

If potential respondents have trouble understanding the questions, the chance that they will choose not to participate increases. Survey questions must be clear and concise.

The survey format must be unambiguous and consistent. Question formats should also remain consistent and not jump randomly from type to type (e.g. multiple choice to short answer and back again). Instructions should be as explicit as possible.

People are much more likely to respond to a nicely designed survey. A form that looks unprofessional or haphazardly constructed will undoubtedly lead to a lower response rate. Web surveys that require too much scrolling or contain too many pages can also inhibit response.

Telephone Surveys can occur any time during the day, but the incredible growth of the telemarketing industry has led many people to screen their calls, especially during the dinner hour. If telephone interviewers identify themselves and their purpose up front, instances where people assume they are telemarketers and screen them out can be minimized.

Response Issues

Not only do survey researchers have to be concerned about non-response errors, but they also have to be concerned about the following potential response errors:

Response bias occurs when respondents deliberately falsify their responses. This error greatly jeopardizes the validity of a survey's measurements.

Response order bias occurs when a respondent loses track of all options and picks one that comes easily to mind rather than the most accurate.

Response set bias occurs when respondents do not consider each question and just answer all the questions with the same response. For example, they answer "disagree" or "no" to all questions.

These response errors can seriously distort a survey's results. Unfortunately, response bias is difficult to eliminate; even if the same respondent is questioned repeatedly, he or she may continue to falsify responses. Response order bias and response set errors, however, can be reduced through careful development of the survey questionnaire.

Methods That Can Induce Response

Just as there are ways to avoid causing non-response, there are numerous proven methods that can stimulate response. Some of the methods survey organizations use to help increase response rates include the following:

Incentives are perhaps the most effective method to ensure participation. Survey organizations use many kinds of incentives to elicit response, such as offering to share the survey's findings or awarding a certain number of 'points' for each survey taken that can then be redeemed for prizes. Some survey organizations enter respondents in a sweepstakes or even pay a modest stipend for participation.

Although answering machines are generally viewed as a problem, they can also be used to a survey organization's advantage. A simple message requesting a call back can be very effective, especially if the organization uses an 800 number.

Postcards or e-mails announcing upcoming surveys have been shown to increase response.

Successful survey organizations always follow up the initial invitation with a reminder to those that have not yet responded.

Establishing legitimacy can help convince potential respondents to participate in a survey. A good survey tells potential respondents who is conducting the survey and what credentials they hold. It also outlines procedures for asking questions and providing feedback.

Surveying employees is a great way to gauge both opinion and workplace efficiency, but these surveys only work if enough employees participate. Offering employees time to fill out a survey not only ensures participation, it also sends a positive message that their opinions are valued, leading to honest, more useful responses.

PART 2

Discuss how population parameters (mean, variance and proportion) are estimated from sample statistics

ESTIMATION OF POPULATION PARAMETERS

Every member of a population cannot be examined so we use the data from a sample, taken from the same population, to estimate some measure, such as the mean, of the population itself.

The sample will provide us with the best estimate of the exact 'truth' about the population. The method of sampling depends on the data available but the ideal method, as every member of the population has an equal chance of being selected, is random sampling.

We estimate limits within which we expect the 'truth' about the population to lie and state how confident we are about this estimation.

There are therefore two types of estimate of a population parameter:

Point estimate - one particular value;

Interval estimate - an interval centred on the point estimate.

Point Estimate of Parameter (e.g. mean)

From the sample, a single value is calculated to serve as an estimate for the population parameter.

a) The best estimate of the population percentage, $\pi$, is the sample percentage, $p$.

b) The best estimate of the unknown population mean, $\mu$, is the sample mean, $\bar{x} = \frac{\sum x}{n}$.

This estimate of $\mu$ is often written $\hat{\mu}$ and referred to as 'mu hat'. $\mu$ (mu) is the symbol for the population mean.

c) The sample variance is calculated using the formula $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$.

d) The best estimate of the unknown population standard deviation, $\sigma$, is the sample standard deviation $s$, where:

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

This is from the [xσn-1] key.

N.B. $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$, from the [xσn] key, underestimates $\sigma$.

There is little difference between the two estimates when n is large.

Example 1: The Accountant wishes to obtain some information about all the invoices sent out to a supermarket's account customers. In order to estimate this information, a sample of twenty invoices is randomly selected from the whole population.

Values of Invoices (£):

32.53 22.27 33.38 41.47 38.05

31.47 38.00 43.16 29.05 22.20

25.27 26.78 30.97 38.07 38.06

25.11 24.11 43.48 32.93 42.04

1) The proportion in the population over £40, $\hat{\pi} = p = 4/20 = 0.20$ (20%)

2) The population mean, $\hat{\mu} = \bar{x} = £32.92$

3) The population standard deviation, $s$ (from the [xσn-1] key) = £7.12, so $\hat{\sigma} = £7.12$
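The three point estimates in Example 1 can be reproduced directly from the twenty invoice values. The sketch below uses Python's standard `statistics` module; the variable names (`invoices`, `p_hat`, `mean_hat`, `s`, `s_n`) are illustrative choices rather than notation from the original notes.

```python
import statistics

# The twenty sampled invoice values (in pounds) from Example 1.
invoices = [
    32.53, 22.27, 33.38, 41.47, 38.05,
    31.47, 38.00, 43.16, 29.05, 22.20,
    25.27, 26.78, 30.97, 38.07, 38.06,
    25.11, 24.11, 43.48, 32.93, 42.04,
]

# 1) Sample proportion of invoices over 40 pounds.
p_hat = sum(1 for value in invoices if value > 40) / len(invoices)

# 2) Sample mean, the point estimate of the population mean.
mean_hat = statistics.mean(invoices)

# 3) Sample standard deviation with the (n - 1) divisor.
s = statistics.stdev(invoices)

# For comparison: the n divisor gives a smaller value and underestimates sigma.
s_n = statistics.pstdev(invoices)

print(f"proportion over 40: {p_hat:.2f}")
print(f"mean: {mean_hat:.2f}")
print(f"s (n-1 divisor): {s:.2f}")
print(f"s (n divisor): {s_n:.2f}")
```

`statistics.stdev` corresponds to the (n - 1) calculator key and `statistics.pstdev` to the n key.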

Interval Estimate (Confidence interval)

Often it is more useful to quote two limits between which the parameter is expected to lie, together with the probability of it lying in that range.

The limits are called the confidence limits and the interval between them the confidence interval.

e.g. We are 95% confident that the mean male height lies between 5' 9" and 5' 11".

The width of the confidence interval depends on three sensible factors:

the degree of confidence we wish to have in it, i.e. the chance of it including the 'truth', e.g. 95%;

the size of the sample, n;

the amount of variation among the members of the sample, i.e. its standard deviation, s.

Confidence interval for the population mean ($\mu$) where $\sigma$ is unknown (Usual case)

If the sample size is small, (n < 30), and the population standard deviation is unknown, then the t-tables are used.

These give a wider interval and so compensate for the probable error in estimating the value of the population standard deviation from the sample standard deviation.

(If the sample size is large either table gives a similar result.)

Confidence Interval: $\bar{x} \pm t \frac{s}{\sqrt{n}}$, where t is from the t-table with (n-1) degrees of freedom
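As an illustration, the usual small-sample interval $\bar{x} \pm t\,s/\sqrt{n}$ can be applied to the invoice sample from Example 1. This is a sketch under stated assumptions: the critical value 2.093 is the standard t-table entry for 95% confidence with 19 degrees of freedom, hard-coded here rather than computed, and the function name `t_confidence_interval` is a hypothetical helper.

```python
import math
import statistics

# Assumption: two-sided 95% critical value read from a t-table for n - 1 = 19 df.
T_CRIT_95_DF19 = 2.093


def t_confidence_interval(data, t_crit):
    """Return (lower, upper) limits of mean +/- t * s / sqrt(n)."""
    n = len(data)
    mean = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    margin = t_crit * standard_error
    return mean - margin, mean + margin


invoices = [
    32.53, 22.27, 33.38, 41.47, 38.05,
    31.47, 38.00, 43.16, 29.05, 22.20,
    25.27, 26.78, 30.97, 38.07, 38.06,
    25.11, 24.11, 43.48, 32.93, 42.04,
]

lower, upper = t_confidence_interval(invoices, T_CRIT_95_DF19)
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```

A different sample size or confidence level needs a different table value, so the constant must be changed along with the data.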

The purpose of sampling is to draw inferences about a population parameter on the basis of sample information.

Point estimators

A sample mean derived from a process of random sampling provides a good estimator of the population mean in the sense that it is likely to be near the true population mean. A single sample mean may also be regarded as a good estimator because it provides an unbiased estimate of the population mean. The probability of a sample mean selected at random exceeding $\mu$ by a certain amount is exactly equal to the probability of it falling below $\mu$ by the same amount.

We can say that $\hat{\mu} = \bar{X}$, where the hat (^) on $\mu$ indicates that it is an estimate of $\mu$, the unknown population parameter. Thus the sample mean $\bar{X}$ may be used as an estimator (an unbiased estimator) of the population mean, $\mu$.

Since the value of the estimator, $\bar{X}$, computed from a single sample is a single value, it is referred to as a point estimate of the unknown population mean because it represents a single point on the scale of possible values.

Interval estimators

Interval estimation is, in statistics, the evaluation of a parameter of a population (for example, the mean) by computing an interval, or range of values, within which the parameter is most likely to be located. Intervals are commonly chosen so that the procedure captures the parameter with 95 or 99 percent probability, called the confidence coefficient. Hence, the intervals are called confidence intervals; the end points of such an interval are called upper and lower confidence limits.

The interval containing a population parameter is established by calculating that statistic from values measured on a random sample taken from the population and by applying the knowledge (derived from probability theory) of the fidelity with which the properties of a sample represent those of the entire population.

The confidence level tells us what percentage of the time the procedure produces an interval that contains the parameter, not the chance that any one computed interval is correct. Of the intervals computed from many samples, a certain percentage will contain the true value of the parameter being sought.

For example,

Suppose we want to estimate the mean summer income of a class of business students.

For n = 25 students, the sample mean $\bar{x}$ is calculated to be 400 dollars per week. (Point estimate)

An alternative statement is:

The mean income is between 380 and 420 dollars per week. (Interval estimate)
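The long-run meaning of a confidence level (the share of intervals, over many samples, that contain the true parameter) can be checked by simulation. The sketch below is illustrative only: it assumes a normal population with a true mean of 400 and a known standard deviation of 50, values invented for the demonstration, and it uses the normal-table value z = 1.96 for a 95% interval.

```python
import math
import random


def coverage(true_mean, sigma, n, trials, z=1.96, seed=0):
    """Estimate how often the interval mean +/- z * sigma / sqrt(n) contains true_mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        # The interval covers the truth when the sample mean is within half_width of it.
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / trials


rate = coverage(true_mean=400, sigma=50, n=25, trials=2000)
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The printed share should be close to 0.95. Any single interval either contains the true mean or does not; the 95% describes the procedure, not one particular interval.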

Qualities of Estimators

Qualities desirable in estimators include unbiasedness, consistency, and relative efficiency:

• An unbiased estimator of a population parameter is an estimator whose expected value is equal to that parameter.

• An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size grows larger.

• If there are two unbiased estimators of a parameter, the one whose variance is smaller is said to be relatively efficient.

Unbiasedness

An unbiased estimator of a population parameter is an estimator whose expected value is equal to that parameter.

E.g. the sample mean $\bar{X}$ is an unbiased estimator of the population mean $\mu$, since:

$E(\bar{X}) = \mu$

Consistency

An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size grows larger.

E.g. $\bar{X}$ is a consistent estimator of $\mu$ because:

$V(\bar{X}) = \frac{\sigma^2}{n}$

That is, as n grows larger, the variance of $\bar{X}$ grows smaller.

Efficiency

If there are two unbiased estimators of a parameter, the one whose variance is smaller is said to be relatively efficient.

E.g. both the sample median and the sample mean are unbiased estimators of the population mean (for a symmetric population such as the normal); however, the sample median has a greater variance than the sample mean, so we choose $\bar{X}$, since it is relatively efficient when compared to the sample median.
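The relative efficiency of the sample mean can be illustrated by simulation. The sketch below assumes a standard normal population (an assumption made for the demonstration, not a claim from the original notes) and compares the spread of the sample mean with that of the sample median over many repeated samples; `estimator_variances` is a hypothetical helper name.

```python
import random
import statistics


def estimator_variances(n, trials, seed=0):
    """Return (variance of sample means, variance of sample medians) over repeated samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)


var_mean, var_median = estimator_variances(n=25, trials=2000)
print(f"variance of sample mean:   {var_mean:.4f}")
print(f"variance of sample median: {var_median:.4f}")
```

Both estimators centre on the true mean of zero, but the medians are more widely scattered, which is what relatively less efficient means.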

Four commonly used confidence levels are

Standard error of the mean

As sample size increases, the sample means cluster more and more around the true population mean. Thus the variance and standard deviation of the sampling distribution decline as sample size is increased. This standard deviation is formally referred to as the standard error of the mean:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

where $\sigma$ is the population standard deviation and $n$ is the sample size.

The standard error declines as the sample size is increased, but not proportionately: it falls in proportion to $1/\sqrt{n}$, not $1/n$.
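A short numerical check makes the square-root relationship concrete. In this sketch the population standard deviation of 10 is an arbitrary illustrative value and `standard_error` is a hypothetical helper name.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)


se_25 = standard_error(10.0, 25)
se_100 = standard_error(10.0, 100)

# Quadrupling the sample size only halves the standard error.
print(se_25, se_100)
```

To halve the standard error the sample size must be multiplied by four, which is why precision becomes increasingly expensive to buy with larger samples.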

Importance of confidence interval estimators for parameters.

In statistics, a confidence interval (CI) is a particular kind of interval estimate of a population parameter. Instead of estimating the parameter by a single value, an interval likely to include the parameter is given. Thus, confidence intervals are used to indicate the reliability of an estimate. How likely the interval is to contain the parameter is determined by the confidence level or confidence coefficient. Increasing the desired confidence level will widen the confidence interval.

A confidence interval is always qualified by a particular confidence level, usually expressed as a percentage; thus one speaks of a "95% confidence interval". The end points of the confidence interval are referred to as confidence limits. For a given estimation procedure in a given situation, the higher the confidence level, the wider the confidence interval will be.

The calculation of a confidence interval generally requires assumptions about the nature of the estimation process. It is primarily a parametric method: for example, it may depend on an assumption that the distribution of the population from which the sample came is normal. As such, confidence intervals are not robust statistics, though modifications can be made to add robustness.
