Types of Sampling Design

Sampling design concerns the techniques used to select objects from a set of related objects. Surveys of woodfuel consumption, supply and provision are generally conducted by means of sampling techniques. This means that by studying a small group (sample) selected at random, one obtains information on variables of interest for a larger group (universe⁶), thus permitting inferences about the behaviour of these variables within the universe. This procedure is adopted because surveying an entire universe (unless it is very small) entails high costs.

3.1 UNIVERSE

The universe must be defined in the light of the objectives of the survey. It can be expressed in geographical terms (locality, municipality, district, province, country or some intermediate category) or in sectoral terms (urban population, pottery manufacturers, fuelwood producers). It is also necessary to place time limits on the definition of the universe, because its composition and characteristics can change over time. It is recommended that the universe be given spatial limits that coincide with standard or official groupings (political, administrative, natural, etc.) in common use in countries, so that its dimensions can be estimated from information already available.

The universe is given a preliminary definition at the start of the methodological design of the survey. It is subsequently refined, by reviewing existing information, once its size and its spatial and temporal distribution are known. The redefinition may mean extending or reducing the universe. An extension may be called for when it is realized that an area exists with sizeable woodfuel use or where there is real or potential supply. The universe might be reduced because information on supply and demand in a certain area is so scarce that including the area in the survey would introduce greater error than leaving it out, or because it is realized that a given locality or area does not form part of the universe since it has no major users.

3.2 SAMPLING FRAME

Once the universe has been defined, information that is as precise as possible has to be sought on its dimensions and its spatial and temporal distribution in order to construct the sampling frame, which is the basis on which the sampling design is developed. The sampling frame is the information that locates and defines the dimensions of the universe and may consist of housing censuses and maps grouped by locality, district, quarter, etc.; maps of forest cover with types of vegetation or land use; or housing lists in small localities. Constructing the sampling frame is described in the sections on General Variables – Supply, Demand and Provision (Chapter 2).

3.3 SAMPLING UNIT

A basic concept in sampling theory is the sampling unit, which is the minimum unit of observation for information on the operative variables. The sampling unit must be clearly defined for constructing the sampling frame. By convention in statistics, a capital “N” is used to refer to the number of sampling units making up the universe, and a lowercase “n” for the number of sampling units in the sample itself. The sampling unit best suited for the respective sectors is shown in Table 3.1. Other sampling units can be defined as suggested by the objective of the survey.

Table 3.1: Sampling unit for thematic group and sector or branch under examination

| Group | Sector/branch | Sampling unit |
|---|---|---|
| Demand | Residential (urban, rural) | Home |
| Demand | Industrial, commercial, institutional | Establishment |
| Supply | Direct | Plot |
| Supply | Indirect | Establishment |
| Provision | Producers, transport operators, commercial suppliers | Individual producers, companies |

Once the universe and sampling unit have been defined, and once the sampling frame is ready, the sample design comprises two major stages: definition of type of sampling and determination of sample size.

3.4 TYPES OF SAMPLING

There are different types of sampling, but all are based on the principle of randomness. In order to be able to make valid inferences from a sample for transposition to the universe, the sample must be representative of the universe; and this is achieved by its randomness and adequate size.

The basis for statistical inference, then, is randomness. This means that all the elements making up the universe have the same chance of being selected to form the sample. If the selection is not random, there is a serious risk that the findings will not be representative of the whole population, but of a section only. This is referred to as bias. An example of bias due to non-random selection in an inventory of wood resources occurs when the plots selected are those in the vicinity of access roads, which are likely to be more heavily visited and have smaller stocks of wood. Extrapolating the results of this non-random sample to the universe would lead to an underestimation of stocks.

Sample size will depend on the variability of the phenomenon under study, the level of confidence set and acceptable error. One common mistake is to think that for a sample to be representative of a universe, it must be directly proportional in size to that of the universe, in other words, the larger the universe, the larger the sample. This is not true and details on how to arrive at the required sample size are given later.

3.4.1 Simple random sampling

This consists in randomly selecting “n” sampling units (SU) from the universe, in a way that gives all the SUs the same opportunity of being selected.

A simple random sample is a subset of a statistical population in which each member of the population has an equal probability of being chosen. It is meant to be an unbiased representation of a group. An example of a simple random sample would be a group of 25 employees chosen out of a hat from a company of 250 employees. In this case, the population is all 250 employees, and the sample is random because each employee has an equal chance of being chosen.

The steps involved in simple random sampling (SRS) are as follows:

• Assign a single number to each element in the sampling frame.

• Use random numbers to select elements into the sample until the desired number of cases is obtained.

The method is not very different from a lottery draw.

Each SU is assigned a number and the sample is selected randomly using tables of random numbers, calculators, lots, etc. This technique can only be used when there is a complete sampling frame that includes all the sampling units and where these are readily recognisable and identifiable in the field, for example a telephone directory or a list of homes identified by street and number or by the name of the occupant. When constructing a sample of natural resources, it is usually difficult to identify or locate the selected plots accurately, as this requires a detailed map and instruments for precise geographical location.
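The numbering and random selection described above can be sketched in a few lines of Python. This is only an illustration: the function name `simple_random_sample`, the `seed` parameter and the employee list are assumptions made for the example, and the standard `random` module stands in for a table of random numbers.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n sampling units from a complete frame, without replacement."""
    if n > len(frame):
        raise ValueError("sample size n cannot exceed the frame size N")
    rng = random.Random(seed)  # seed is optional; it only makes the draw repeatable
    # Step 1: assign a single number (here, the list position) to each unit.
    numbered = dict(enumerate(frame))
    # Step 2: pick n distinct numbers at random; every unit is equally likely.
    chosen = rng.sample(range(len(frame)), n)
    return [numbered[i] for i in sorted(chosen)]


# Hypothetical frame: the 250 employees of the example in the text.
employees = [f"employee_{i:03d}" for i in range(1, 251)]
sample = simple_random_sample(employees, 25, seed=42)
print(len(sample), sample[:3])
```

The sketch only works because the frame is a complete list, which is the condition stated above for using simple random sampling at all.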

When simple random sampling should be used:

• When it is known that the variable of greatest interest is randomly distributed within the universe

• With small universes (not more than 200 SUs)

• With universes with little geographical dispersion

• When the pattern of distribution for the variable under study is not known

3.4.2 Stratified random sampling

Stratified random sampling is used when the whole universe of size N is broken down into relatively homogeneous strata for the variable under study. This is advisable provided the variation between strata is greater than the variation within each stratum.

Regarding the selection of sampling units and estimation of parameters, each stratum is treated independently, as if it were a universe on its own. Within each stratum, the sampling units can be selected at random, by clusters or systematically.
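As a rough sketch of treating each stratum as its own universe, the Python example below draws an independent random sample inside every stratum. The allocation rule (sample size proportional to stratum size) is an assumption chosen for illustration, since the text does not prescribe one, and the stratum names and sizes are hypothetical.

```python
import random


def stratified_sample(strata, n, seed=None):
    """strata maps a stratum name to the list of its sampling units."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())  # universe size N
    sample = {}
    for name, units in strata.items():
        # Proportional allocation (an assumption): n_h = n * N_h / N, at least 1.
        n_h = max(1, round(n * len(units) / total))
        n_h = min(n_h, len(units))
        # Each stratum is sampled independently, as if it were a universe on its own.
        sample[name] = rng.sample(units, n_h)
    return sample


# Hypothetical residential universe split into two strata.
strata = {
    "urban": [f"urban_home_{i}" for i in range(600)],
    "rural": [f"rural_home_{i}" for i in range(400)],
}
sample = stratified_sample(strata, 50, seed=1)
print({name: len(units) for name, units in sample.items()})
# prints {'urban': 30, 'rural': 20}
```

Because of rounding, the stratum sizes may not add up exactly to n for other inputs; a real survey would settle the allocation explicitly.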

Stratified sampling makes it possible to improve the precision of estimates with reduced sampling effort, to characterize each stratum separately and to facilitate field work.

It is most important to realize that the sampling units should belong to only one stratum, that the strata should be recognizable by people outside the survey group, and that the actual size of the stratum should be known. It is not advisable to form a large number of strata, because this would unnecessarily complicate field surveying and data analysis.

When it comes to deciding on a stratified sample there are general criteria that one can apply. In the group on woodfuel demand, the advisability of stratification is defined in the first instance by the patterns of saturation and consumption. In the direct supply group, stratification is done by source and type of land cover or use. For the indirect supply group and providers, producers, transport operators and traders, volume of production or sales is used. Since these are variables that need to be known before the survey takes place, the relevant data can be obtained from secondary sources or from indicator variables, as described in Chapter 2.

When stratified sampling should be used:

• It is recommended for universes where it is supposed or known that distribution of the key variable(s) differs between readily identifiable sub-universes;

• Because of its low sampling efficiency, it is not recommended for small universes with fewer than 200 sampling units and variables showing normal distribution.

3.4.3 Sampling by clusters

Cluster sampling is a sampling technique used when "natural" groupings are evident in a statistical population. It is often used in marketing research. In this technique, the total population is divided into these groups (or clusters) and a sample of the groups is selected. Then the required information is collected from the elements within each selected group. This may be done for every element in these groups, or a subsample of elements may be selected within each of these groups. A common motivation for cluster sampling is to reduce the average cost per interview. Given a fixed budget, this can allow an increased sample size. Assuming a fixed sample size, the technique gives more accurate results when most of the variation in the population is within the groups, not between them.

Cluster elements

Elements within a cluster should ideally be as heterogeneous as possible, but there should be homogeneity between cluster means. Each cluster should be a small scale representation of the total population. The clusters should be mutually exclusive and collectively exhaustive. A random sampling technique is then used on any relevant clusters to choose which clusters to include in the study. In single-stage cluster sampling, all the elements from each of the selected clusters are used. In two-stage cluster sampling, a random sampling technique is applied to the elements from each of the selected clusters.
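The difference between single-stage and two-stage cluster sampling can be illustrated with a short Python sketch. The function name `cluster_sample`, its parameters and the village data are hypothetical and exist only for this example.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """clusters maps a cluster name to the list of its elements."""
    rng = random.Random(seed)
    # Stage 1: the cluster itself is the sampling unit.
    chosen = rng.sample(sorted(clusters), n_clusters)
    result = {}
    for name in chosen:
        elements = clusters[name]
        if n_per_cluster is None:
            # Single-stage: use every element of each selected cluster.
            result[name] = list(elements)
        else:
            # Two-stage: draw a random subsample inside each selected cluster.
            k = min(n_per_cluster, len(elements))
            result[name] = rng.sample(elements, k)
    return result


# Hypothetical universe: 10 villages of 20 homes each.
villages = {
    f"village_{v}": [f"village_{v}_home_{h}" for h in range(20)]
    for v in range(10)
}
single_stage = cluster_sample(villages, 3, seed=7)
two_stage = cluster_sample(villages, 3, n_per_cluster=5, seed=7)
print({name: len(homes) for name, homes in single_stage.items()})
print({name: len(homes) for name, homes in two_stage.items()})
```

With the same seed, both calls select the same three villages; only the treatment of the homes inside them differs.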

The main difference between cluster sampling and stratified sampling is that in cluster sampling the cluster is treated as the sampling unit so analysis is done on a population of clusters (at least in the first stage). In stratified sampling, the analysis is done on elements within strata. In stratified sampling, a random sample is drawn from each of the strata, whereas in cluster sampling only the selected clusters are studied. The main objective of cluster sampling is to reduce costs by increasing sampling efficiency. This contrasts with stratified sampling where the main objective is to increase precision.

There also exists multistage sampling, where more than two steps are taken in selecting clusters from clusters.

Aspects of cluster sampling

One version of cluster sampling is area sampling or geographical cluster sampling. Clusters consist of geographical areas. Because a geographically dispersed population can be expensive to survey, greater economy than simple random sampling can be achieved by treating several respondents within a local area as a cluster. It is usually necessary to increase the total sample size to achieve equivalent precision in the estimators, but cost savings may make that feasible.

In some situations cluster sampling is only appropriate when the clusters are approximately the same size. This can be achieved by combining clusters. If this is not possible, probability proportionate to size sampling is used. In this method, the probability of selecting any cluster varies with the size of the cluster, giving larger clusters a greater probability of selection and smaller clusters a lower probability. However, if clusters are selected with probability proportionate to size, the same number of interviews should be carried out in each sampled cluster so that each unit sampled has the same probability of selection.
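The reasoning behind the fixed number of interviews per cluster can be checked numerically. The Python sketch below is an illustration under stated assumptions: clusters are drawn with replacement (the simplest form of probability proportionate to size, so a cluster can be drawn more than once), and the cluster names and sizes are hypothetical.

```python
import random


def pps_draw(cluster_sizes, m, seed=None):
    """Draw m clusters with replacement, with probability proportional to size."""
    rng = random.Random(seed)
    names = sorted(cluster_sizes)
    weights = [cluster_sizes[name] for name in names]
    return rng.choices(names, weights=weights, k=m)


def unit_probability_per_draw(cluster_sizes, k):
    """Chance that one given unit is interviewed in a single draw of a cluster,
    when k units are interviewed in whichever cluster is drawn."""
    total = sum(cluster_sizes.values())
    # P(cluster drawn) * P(unit picked inside it) = (size / total) * (k / size)
    return {
        name: (size / total) * (k / size)
        for name, size in cluster_sizes.items()
    }


sizes = {"A": 100, "B": 300, "C": 600}  # hypothetical cluster sizes
draws = pps_draw(sizes, 4, seed=3)
probabilities = unit_probability_per_draw(sizes, k=10)
print(draws)
print({name: round(p, 4) for name, p in probabilities.items()})
```

The size term cancels, so every unit ends up with the same chance (k divided by the universe size) whatever the size of its cluster.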

Clusters are spatially compact groups of sampling units. They are selected randomly and within each cluster all the sampling units are studied or subjected to further sampling.

When sampling by clusters should be used:

• When there is considerable difficulty in reaching every sampling unit in the universe, because of wide dispersion or physical barriers to access.

3.4.4 Systematic sampling selection

Systematic sampling is a type of probability sampling method in which sample members from a larger population are selected according to a random starting point and a fixed, periodic interval. This interval, called the sampling interval, is calculated by dividing the population size by the desired sample size. Although the members of the sample are determined in advance once the starting point is fixed, systematic sampling is still thought of as being random, provided the periodic interval is determined beforehand and the starting point is random.

Steps:

• Calculate the sampling interval as the ratio between population size and sample size, I = N/n.

• Arrange all elements in the population in an order.

• Select a case in the first interval randomly.

• Select every I-th case from this point.

Systematic sampling is easier and simpler than simple random sampling (SRS).

This is not strictly speaking a type of sampling and is best considered as a frame for regular sample selection.

The first sampling unit is chosen at random, while the remainder are selected at regular intervals of unit, distance or time. Its theoretical limitation lies in the fact that only the first number is chosen at random and the remainder do not have the same probability of being included in the sample. Its advantage is that it facilitates location of the sampling units in areas of difficult access and permits visits to sampling units that are not included in the sampling frame.
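A minimal Python sketch of systematic selection follows. It assumes the units are already arranged in a list, and it rounds the interval I = N/n down to a whole number, which is a simplification made for this example; the function name and the list of homes are hypothetical.

```python
import random


def systematic_sample(units, n, seed=None):
    """Select n units at a fixed interval after a random start."""
    N = len(units)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the number of units")
    rng = random.Random(seed)
    interval = N // n                # sampling interval I = N/n, rounded down
    start = rng.randrange(interval)  # random case within the first interval
    # Every I-th case from the random starting point.
    return [units[start + j * interval] for j in range(n)]


# Hypothetical ordered list of 200 homes along a route.
homes = [f"home_{i:03d}" for i in range(200)]
sample = systematic_sample(homes, 20, seed=5)
print(sample[:3])
```

When N is not an exact multiple of n, rounding the interval down means the last few units of the list can never be selected; this is one practical form of the unequal inclusion probability mentioned above.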

When systematic selection should be used:

• Whenever it is not possible to identify every sampling unit within the sample frame, e.g. in large towns where lists of homes are not kept.

• When access to sampling units is difficult because of distance, lack of roads or difficult terrain, e.g. in forest inventories.

Combining several types of sampling

It is possible to combine different types of sampling within the same survey, depending on the characteristics of the sectors or branches concerned and the degree of acceptable trade-off between precision and cost of the exercise. For example, in a residential sector one may opt for a two-stage stratified sample in clusters, whereas for a small homogeneous and compact industrial branch, simple random sampling may be preferred.

3.5 SAMPLE SIZE

Sample size must be determined independently for each universe, according to three factors: the variability of the most important numerical variable, the level of confidence required and the acceptable level of error. This is summarized by the following formula:⁷

$$n_0 = \frac{s^2 \, t_{\alpha,v}^2}{e^2} \qquad (1)$$

in terms of variance and absolute error, or

$$n_0 = \frac{cv^2 \, t_{\alpha,v}^2}{e^2}$$

in terms of variation coefficient and relative error,

where:

$n_0$ = size of sample

$s^2$ = variance of the sample

$t_{\alpha,v}$ = critical value of Student’s ‘t’ test with significance level $\alpha$ and $v$ degrees of freedom

$e$ = acceptable error

$cv$ = variation coefficient = standard deviation of the sample / sample mean

$v$ = degrees of freedom = $n - 1$

Variance ($s^2$) and variation coefficient ($cv$) indicate the degree of homogeneity of the variable under consideration in the sample. They are calculated (manually, by calculator or with Excel) from the data of a preliminary sample or an earlier survey.

Acceptable error ($e$) refers to the allowable difference between the sample mean and the mean of the universe. It is set in accordance with previous knowledge of the phenomenon under study, and it is advisable to keep it within 10-20%, which can also be expressed in absolute values with the units of measurement of the variable in question.

The critical value of $t$ is obtained from tables in statistics books or from Excel, selecting first the level of significance ($\alpha$) or its complement, the level of confidence ($1-\alpha$). A level of confidence of 0.95, which is equivalent to $\alpha$ = 0.05, is enough for surveys of this kind. In addition, in order to define the degrees of freedom ($v = n-1$), a first assessment of the number ($n$) of cases in the sample is needed. These two values are the entry data for the tables. Subsequently, the sample size is specified by means of an iterative process: the value of $n$ obtained from Formula (1) is used to look up a new value of $t$, which in turn gives a revised $n$, until the result stabilizes.

This formula shows that the number of elements making up the sample is directly proportional to the variance and to the value of $t^2$, and inversely proportional to the square of the error. The sample size will be large when: (a) the element under study is highly variable (high variance or variation coefficient); (b) the level of confidence sought is high; and/or (c) the acceptable error is low. Conversely, the sample size will be small if the phenomenon shows little variance, a low level of confidence is set, and a high level of error is accepted.

From this it is clear that the size of a sample does not depend on the size of the universe. Thus, starting with an equal level of confidence and error acceptance in a tropical rainforest covering the same surface area as a temperate pine forest, the sample size will be larger for the rainforest because of its greater heterogeneity in the wood stock variable in relation to the pine forest.

So far no consideration has been given to the size of the universe in determining sample size.

Nevertheless, for a small universe (fewer than 120 sampling units), it is necessary to correct the value of $n_0$ obtained from Formula (1) by using Formula (2):⁸

$$n = \frac{n_0}{1 + n_0/N} \qquad (2)$$

where:

$n_0$ = sample size obtained from Formula (1)

$N$ = size of universe

$n$ = definitive size of sample
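Formulas (1) and (2) can be worked through with a short Python sketch. The input values are assumptions chosen for illustration: a variation coefficient of 50%, a relative error of 10%, and t = 1.984, the value given by standard t tables for a significance level of 0.05 and about 100 degrees of freedom. The function names are hypothetical, and the t value is passed in by hand because the Python standard library has no t table.

```python
import math


def initial_sample_size(cv, e, t):
    """Formula (1) in relative terms: n0 = cv^2 * t^2 / e^2."""
    return (cv ** 2) * (t ** 2) / (e ** 2)


def corrected_sample_size(n0, N):
    """Formula (2), correction for finite population: n = n0 / (1 + n0 / N)."""
    return n0 / (1 + n0 / N)


# Assumed inputs: cv = 0.5, e = 0.10, t taken from a table (alpha = 0.05).
n0 = initial_sample_size(cv=0.5, e=0.10, t=1.984)
n = corrected_sample_size(n0, N=100)  # hypothetical universe of 100 units

# Round up: a fraction of a sampling unit cannot be surveyed.
print(math.ceil(n0), math.ceil(n))
# prints 99 50
```

The first result (99 units) implies 98 degrees of freedom, for which the table value of t is still about 1.984, so the iterative lookup has already settled. For a universe of only 100 units, the correction roughly halves the required sample.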

Annex III gives the calculated sample size for the estimation of fuelwood consumption in a residential sector for varying universe size and error margin and corrected for finite population. It applies for the variable “specific fuelwood consumption”, where, due to the abundance of case studies, the variation coefficient is known.

Variables to be used in calculating sample size

• To define sample size of any sector or branch of woodfuel demand, it is best to use the unit consumption variable.

• In the industrial, commercial and institutional sectors it is not always possible to find data on unit consumption, but one can use the volume of production per unit time, which is closely correlated with unit consumption.

• In the case of direct supply (from forest, plantation, etc.) the important variables may be stock or productivity, but the first is recommended as there is more secondary information and it is easier to measure in a preliminary sample. If there are no data on stock, basal area data (G) may be used.

• In sectors or branches of indirect supply (sawmills, carpentry workshops, etc.), volume of production per unit time must be used.

• For provision sectors: in the case of producers, it is best to use volume of woodfuel production; traders, volume of sales; and transport operators, transport capacity, all expressed per unit time.

The final decision on the size of the sample will depend on the agreed trade-off between desired accuracy and availability of monetary, human and time resources for conducting the field survey. It is recommended that sectors or branches having greater importance in woodfuel demand, supply and provision be given priority in the allocation of resources for field surveys so that estimations can be more accurate. In situations where it is not possible to realize the sample size determined by statistical calculation, it is essential to survey at least ten sample units per sector, branch or stratum, and to indicate the error in estimation, finding the e value of Formula (1).
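Finding the error that corresponds to a sample size already fixed amounts to solving Formula (1) for e. The Python sketch below does this for the minimum of ten sampling units; the variation coefficient of 50% is an assumed value, t = 2.262 is the standard table value for a significance level of 0.05 and 9 degrees of freedom, and the function name is hypothetical. The correction for finite population is ignored here.

```python
import math


def achieved_error(cv, t, n):
    """Formula (1) solved for e: e = cv * t / sqrt(n)."""
    return cv * t / math.sqrt(n)


# Assumed inputs: cv = 0.5, n = 10 sampling units, t for 9 degrees of freedom.
e = achieved_error(cv=0.5, t=2.262, n=10)
print(f"{e:.1%}")
# prints 35.8%
```

Under these assumptions a sample of ten units gives a relative error of roughly 36%, well outside the 10-20% range, which is why the error has to be reported alongside the estimate.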

⁶ In statistics “universe” is also referred to as “population”.

⁷ Formula used to determine the sample size needed to estimate the population mean; for hypothesis testing for differences between means and variances other formulas are available. Useful statistics reference books include Zar 1999, Cochran 1977, and Steel and Torrie 1988.

⁸ Also termed “correction for finite population”.

Non-probability sample

A quota sample is a type of non-probability sample in which the researcher selects people according to some fixed quota. That is, units are selected into a sample on the basis of pre-specified characteristics so that the total sample has the same distribution of characteristics assumed to exist in the population being studied. For example, if you are a researcher conducting a national quota sample, you might need to know what proportion of the population is male and what proportion is female, as well as what proportions of each gender fall into different age categories, race or ethnic categories, educational categories, etc. The researcher would then collect a sample with the same proportions as the national population.

Nonprobability Sampling

The difference between nonprobability and probability sampling is that nonprobability sampling does not involve random selection and probability sampling does. Does that mean that nonprobability samples aren't representative of the population? Not necessarily. But it does mean that nonprobability samples cannot depend upon the rationale of probability theory. At least with a probabilistic sample, we know the odds or probability that we have represented the population well. We are able to estimate confidence intervals for the statistic. With nonprobability samples, we may or may not represent the population well, and it will often be hard for us to know how well we've done so. In general, researchers prefer probabilistic or random sampling methods over non probabilistic ones, and consider them to be more accurate and rigorous. However, in applied social research there may be circumstances where it is not feasible, practical or theoretically sensible to do random sampling. Here, we consider a wide range of non-probabilistic alternatives.

We can divide nonprobability sampling methods into two broad types: accidental or purposive. Most sampling methods are purposive in nature because we usually approach the sampling problem with a specific plan in mind. The most important distinctions among these types of sampling methods are the ones between the different types of purposive sampling approaches.

Accidental, Haphazard or Convenience Sampling

One of the most common methods of sampling goes under the various titles listed here. I would include in this category the traditional "man on the street" (of course, now it's probably the "person on the street") interviews conducted frequently by television news programs to get a quick (although nonrepresentative) reading of public opinion. I would also argue that the typical use of college students in much psychological research is primarily a matter of convenience. (You don't really believe that psychologists use college students because they believe they're representative of the population at large, do you?) In clinical practice, we might use clients who are available to us as our sample. In many research contexts, we sample simply by asking for volunteers. Clearly, the problem with all of these types of samples is that we have no evidence that they are representative of the populations we're interested in generalizing to, and in many cases we would clearly suspect that they are not.

Purposive Sampling

In purposive sampling, we sample with a purpose in mind. We usually would have one or more specific predefined groups we are seeking. For instance, have you ever run into people in a mall or on the street who are carrying a clipboard and who are stopping various people and asking if they could interview them? Most likely they are conducting a purposive sample (and most likely they are engaged in market research). They might be looking for Caucasian females between 30-40 years old. They size up the people passing by and anyone who looks to be in that category they stop to ask if they will participate. One of the first things they're likely to do is verify that the respondent does in fact meet the criteria for being in the sample. Purposive sampling can be very useful for situations where you need to reach a targeted sample quickly and where sampling for proportionality is not the primary concern. With a purposive sample, you are likely to get the opinions of your target population, but you are also likely to overweight subgroups in your population that are more readily accessible.

All of the methods that follow can be considered subcategories of purposive sampling methods. We might sample for specific groups or types of people as in modal instance, expert, or quota sampling. We might sample for diversity as in heterogeneity sampling. Or, we might capitalize on informal social networks to identify specific respondents who are hard to locate otherwise, as in snowball sampling. In all of these methods we know what we want -- we are sampling with a purpose.

Modal Instance Sampling

In statistics, the mode is the most frequently occurring value in a distribution. In sampling, when we do a modal instance sample, we are sampling the most frequent case, or the "typical" case. In a lot of informal public opinion polls, for instance, they interview a "typical" voter. There are a number of problems with this sampling approach. First, how do we know what the "typical" or "modal" case is? We could say that the modal voter is a person who is of average age, educational level, and income in the population. But, it's not clear that using the averages of these is the fairest (consider the skewed distribution of income, for instance). And, how do you know that those three variables -- age, education, income -- are the only or even the most relevant for classifying the typical voter? What if religion or ethnicity is an important discriminator? Clearly, modal instance sampling is only sensible for informal sampling contexts.

Expert Sampling

Expert sampling involves the assembling of a sample of persons with known or demonstrable experience and expertise in some area. Often, we convene such a sample under the auspices of a "panel of experts." There are actually two reasons you might do expert sampling. First, because it would be the best way to elicit the views of persons who have specific expertise. In this case, expert sampling is essentially just a specific subcase of purposive sampling. But the other reason you might use expert sampling is to provide evidence for the validity of another sampling approach you've chosen. For instance, let's say you do modal instance sampling and are concerned that the criteria you used for defining the modal instance are subject to criticism. You might convene an expert panel consisting of persons with acknowledged experience and insight into that field or topic and ask them to examine your modal definitions and comment on their appropriateness and validity. The advantage of doing this is that you aren't out on your own trying to defend your decisions -- you have some acknowledged experts to back you. The disadvantage is that even the experts can be, and often are, wrong.

Quota Sampling

In quota sampling, you select people nonrandomly according to some fixed quota. There are two types of quota sampling: proportional and non-proportional. In proportional quota sampling you want to represent the major characteristics of the population by sampling a proportional amount of each. For instance, if you know the population has 40% women and 60% men, and that you want a total sample size of 100, you will continue sampling until you get those percentages and then you will stop. So, if you've already got the 40 women for your sample, but not the sixty men, you will continue to sample men, but even if legitimate women respondents come along, you will not sample them because you have already "met your quota." The problem here (as in much purposive sampling) is that you have to decide the specific characteristics on which you will base the quota. Will it be by gender, age, education, race, religion, etc.?

Nonproportional quota sampling is a bit less restrictive. In this method, you specify the minimum number of sampled units you want in each category. Here, you're not concerned with having numbers that match the proportions in the population. Instead, you simply want to have enough to assure that you will be able to talk about even small groups in the population. This method is the nonprobabilistic analogue of stratified random sampling in that it is typically used to assure that smaller groups are adequately represented in your sample.
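The "met your quota" rule of proportional quota sampling can be sketched in Python as follows. The function name, the respondent stream and the category labels are hypothetical, and the arrival order is made deterministic only to keep the example reproducible; in practice respondents arrive in whatever order they happen to be encountered, which is exactly why the method is nonprobabilistic.

```python
def proportional_quota_sample(respondents, quotas):
    """respondents: (person_id, category) pairs in order of arrival.
    quotas: number of people wanted in each category."""
    remaining = dict(quotas)
    sample = []
    for person_id, category in respondents:
        if remaining.get(category, 0) > 0:
            # Quota for this category is still open: accept the respondent.
            sample.append((person_id, category))
            remaining[category] -= 1
        # Otherwise the respondent is skipped, however suitable.
        if not any(remaining.values()):
            break  # every quota is filled
    return sample


# Hypothetical stream of passers-by, alternating woman / man.
stream = [(i, "woman" if i % 2 == 0 else "man") for i in range(300)]
sample = proportional_quota_sample(stream, {"woman": 40, "man": 60})

women = sum(1 for _, category in sample if category == "woman")
men = sum(1 for _, category in sample if category == "man")
print(len(sample), women, men)
# prints 100 40 60
```

After the fortieth woman has been accepted, later women in the stream are passed over while the sampler keeps looking for men.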

Heterogeneity Sampling

We sample for heterogeneity when we want to include all opinions or views, and we aren't concerned about representing these views proportionately. Another term for this is sampling for diversity. In many brainstorming or nominal group processes (including concept mapping), we would use some form of heterogeneity sampling because our primary interest is in getting a broad spectrum of ideas, not identifying the "average" or "modal instance" ones. In effect, what we would like to be sampling is not people, but ideas. We imagine that there is a universe of all possible ideas relevant to some topic and that we want to sample this population, not the population of people who have the ideas. Clearly, in order to get all of the ideas, and especially the "outlier" or unusual ones, we have to include a broad and diverse range of participants. Heterogeneity sampling is, in this sense, almost the opposite of modal instance sampling.

Snowball Sampling

In snowball sampling, you begin by identifying someone who meets the criteria for inclusion in your study. You then ask them to recommend others who they may know who also meet the criteria. Although this method would hardly lead to representative samples, there are times when it may be the best method available. Snowball sampling is especially useful when you are trying to reach populations that are inaccessible or hard to find. For instance, if you are studying the homeless, you are not likely to be able to find good lists of homeless people within a specific geographical area. However, if you go to that area and identify one or two, you may find that they know very well who the other homeless people in their vicinity are and how you can find them.

Researchers use this sampling method if the sample for the study is very rare or is limited to a very small subgroup of the population. This type of sampling technique works like chain referral. After observing the initial subject, the researcher asks the subject for help in identifying people with a similar trait of interest.

The process of snowball sampling is much like asking your subjects to nominate another person with the same trait as your next subject. The researcher then observes the nominated subjects and continues in the same way until a sufficient number of subjects has been obtained.

For example, when obtaining subjects for a study of a rare disease, the researcher may opt to use snowball sampling since it will be difficult to obtain subjects. It is also possible that patients with the same disease have a support group; being able to observe one of the members as your initial subject will then lead you to more subjects for the study.
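The chain-referral process can be sketched in Python as a walk through a referral network. The `referrals` dictionary below is hypothetical and stands in for the nominations a researcher would collect in the field; here every nominee is followed up, in the order in which they were nominated.

```python
from collections import deque


def snowball_sample(seeds, referrals, target_size):
    """Follow nominations from the initial subjects until enough are found."""
    sample = []
    seen = set(seeds)     # people already in the sample or waiting to be seen
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)  # observe this subject
        # Ask the subject to nominate others with the same trait.
        for nominee in referrals.get(person, []):
            if nominee not in seen:
                seen.add(nominee)
                queue.append(nominee)
    return sample


# Hypothetical nominations: who each subject would refer the researcher to.
referrals = {
    "P1": ["P2", "P3"],
    "P2": ["P4"],
    "P3": ["P4", "P5"],
    "P4": [],
    "P5": ["P1"],
}
print(snowball_sample(["P1"], referrals, target_size=4))
# prints ['P1', 'P2', 'P3', 'P4']
```

The result depends entirely on who the initial subject is and whom they know, which makes visible why the representativeness of a snowball sample cannot be guaranteed.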

Types of Snowball Sampling

• Linear Snowball Sampling

• Exponential Non-Discriminative Snowball Sampling

• Exponential Discriminative Snowball Sampling

Advantages of Snowball Sampling

The chain referral process allows the researcher to reach populations that are difficult to sample when using other sampling methods.

The process is cheap, simple and cost-efficient.

This sampling technique needs little planning and a smaller workforce compared to other sampling techniques.

Disadvantages of Snowball Sampling

The researcher has little control over the sampling method. The subjects that the researcher can obtain rely mainly on the previous subjects that were observed.

Representativeness of the sample is not guaranteed. The researcher has no idea of the true distribution of the population and of the sample.

Sampling bias is also a concern for researchers using this sampling technique. Initial subjects tend to nominate people that they know well. Because of this, it is highly possible that the subjects share the same traits and characteristics, so the sample that the researcher obtains may represent only a small subgroup of the entire population.