8/2/2019 Chapter 15 Sampling.pdf1
Prepared by: Devang Kale
Chapter 15
Sampling
About the chapter:
We will study this chapter in two parts.
1. In the first part we will learn some terminology that is needed to understand the concept of sampling.
2. In the second part we will study various sampling techniques under two headings: 1) Probability sampling and 2) Non probability sampling.
Type of Questions:
In the exam, short questions on the various terms may be asked, or long questions on sampling design. Students should give more weight to the probability sampling schemes.
Various terms that are important for understanding the concepts of Sampling and Sampling Design.
Population: It is the collection of people, households or objects that we want to study for research purposes.
For example, if one wants to study the overall satisfaction level of the students of Parul Institute, then all the students of Parul Institute form the population for that research.
It is always necessary to decide on the target population before starting the actual data collection process. Population size helps us to decide on the sample size, and population type gives us an idea about the sampling design.
For example, if someone wants to start a restaurant near Parul Campus and wants to survey potential customers, then it is necessary to decide whether the population of this survey is the employees of Parul Institute, the students of Parul Institute, or both.
Population Element: An individual participant or object on which the measurement is taken is called a population element.
For example, in the above-mentioned research an individual student is a population element, as we are collecting data from a particular student. A population element may sometimes be an object or a household, depending upon the type of study.
Sample: It is a small part of the population; by analyzing it we can draw conclusions about the whole population.
For example, if 15000 students are studying at Parul Institute and, to know the satisfaction level of the students, we randomly select 500 students and collect data from them, then these 500 students are called the sample.
Sample Element: An individual, household or object which is selected in a sample is called a sample element.
For example, in the above situation a particular student selected in the sample of 500 is called a sample element.
Census: If data is collected from all the available elements of the population and the conclusion is drawn on this basis, it is called the census method.
For example, the Indian Population Census, where data is collected from each and every individual who is available in India.
The census method is always very costly and time consuming; because of this the Indian Population Census is conducted only once every 10 years.
Sampling: Here a small sample is drawn from the population, the characteristics of all the sample elements are studied, and a conclusion is drawn for the entire population. This process of generalizing from the sample to the population is also called inference.
Sampling Frame: The specific list of all the population units is called the sampling frame.
For example, a telephone directory, a students' register etc. In the above example the list of all the students with their roll numbers, divisions and addresses is the sampling frame.
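The terms defined above can be tied together in a small sketch. It assumes a frame of 15000 roll numbers and made-up satisfaction scores on a 1 to 5 scale; the scores, the seed and all variable names are illustrative assumptions, not data from Parul Institute.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Sampling frame: a list of every population unit (hypothetical roll numbers).
sampling_frame = list(range(1, 15001))

# Hypothetical satisfaction scores (1-5). In real research these are unknown
# until a student is actually surveyed.
satisfaction = {roll: random.randint(1, 5) for roll in sampling_frame}

# Sample: 500 students drawn at random from the frame (without replacement).
sample = random.sample(sampling_frame, 500)

# Inference: the sample mean is used as an estimate of the population mean.
sample_mean = statistics.mean(satisfaction[roll] for roll in sample)
census_mean = statistics.mean(satisfaction.values())  # what a census would give

print(f"sample of {len(sample)} students, mean satisfaction {sample_mean:.2f}")
print(f"census of {len(sampling_frame)} students, mean satisfaction {census_mean:.2f}")
```

The two printed means should be close to each other, which is the whole idea of sampling: 500 interviews stand in for 15000.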
Advantages of Sample:
Following are the advantages of a sample over a census (complete evaluation of the population).
1. Lower Cost:
Sampling helps to reduce the cost of research to a great extent. In the census method we are supposed to evaluate each and every unit of the population, whereas in sampling we evaluate only a very small portion of the population.
2. Greater Accuracy of Results:
Research has reported that about 90% of the total survey error in one study came from non sampling sources and only 10% or less from random sampling error. Since non sampling errors (for example mistakes in interviewing and data processing) are easier to control in a small study, sampling can give us greater accuracy of results.
3. Greater Speed of Data Collection:
Because far fewer units are studied, data collection and analysis are much faster with a sample than with a census.
4. Availability of Population Elements:
Some experiments are destructive, and then sampling is the only option. For example, in quality control experiments where the unit must be broken to check the quality of the product, we have to go for sampling.
Characteristics of a Good Sample:
The goodness of a sample can be measured on the basis of two components: 1. Accuracy and 2. Precision.
1. Accuracy: It is the degree to which bias is absent from the sample. In a proper sample some observations are less than the actual value and some observations are greater than the actual value, so the lower and higher observations cancel each other out.
For example, suppose the average price of land in Vadodara City is 4000 per sq. foot, but the researcher has selected all the sample elements from a rich area; then the average of the sample elements will definitely be higher than the actual average.
Thus an accurate sample is one in which the underestimates offset the overestimates. A high amount of systematic variance is responsible for low accuracy of a sample. Systematic variance arises from some known or unknown source that pushes the measurements in one direction. By increasing the sample size we can sometimes increase the accuracy of the sample, but a larger sample will not help if the selection method or the sampling frame is itself biased.
For example, suppose a researcher wants to estimate the average price of a house in a particular area. Due to a wrong choice of sampling method he selects the last house of each society every time and notes down its price. We know that in most cases the last house in a row has more area compared to the other houses, and thus there is a chance of overestimating the house price. Thus in this case the source of systematic variance is the wrong sampling method.
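The last-house example can be illustrated with a small simulation. Everything in it is an assumption made for illustration: 50 societies of 10 houses each, ordinary houses priced between 40 and 60 (say, in lakhs) and the last house in each row priced between 70 and 90.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: 50 societies of 10 houses each.
# The last house in every row is assumed to be larger and therefore costlier.
societies = []
for _ in range(50):
    row = [random.uniform(40, 60) for _ in range(9)]   # ordinary houses
    row.append(random.uniform(70, 90))                 # last house in the row
    societies.append(row)

all_prices = [price for row in societies for price in row]
true_mean = statistics.mean(all_prices)

# Biased design: always pick the last house of each society.
biased_sample = [row[-1] for row in societies]

# Unbiased design: pick the same number of houses completely at random.
random_sample = random.sample(all_prices, 50)

biased_mean = statistics.mean(biased_sample)
random_mean = statistics.mean(random_sample)

print(f"true mean {true_mean:.1f}")
print(f"last-house sample mean {biased_mean:.1f}")
print(f"random sample mean {random_mean:.1f}")
```

Both samples contain 50 houses, so the difference between them comes from the selection method and not from the sample size.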
2. Precision: It means how closely the sample represents the population. It is the second criterion of a good sample design. No sample represents its population exactly. After accounting for all the sources of systematic variation, whatever variation is left is due to unsystematic (random) sources and is known as Sampling Error (or Random Sampling Error). Sampling error is an inherent source of variation: it becomes smaller as the sample size increases, but it cannot be removed completely unless the whole population is studied. For example, consider the research problem mentioned above, i.e. estimating the average price of a house in a particular area. If the sampled houses happen by chance to be priced slightly above or below the area average, or respondents report figures that are randomly a little too high or too low, precision is lowered, and such random fluctuation cannot be removed completely from the analysis.
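The link between sample size and random sampling error can be stated with the standard error of the sample mean. The formula below is the standard textbook result for a simple random sample of size n from a large population with standard deviation sigma; it is not given in these notes, and the numbers are only an illustration.

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
\qquad \text{e.g. } \sigma = 10: \quad
n = 100 \Rightarrow \mathrm{SE} = 1, \qquad
n = 400 \Rightarrow \mathrm{SE} = 0.5
```

Quadrupling the sample size only halves the standard error, and for any finite sample it stays above zero.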
Types of Sample Design:
Here we are going to study mainly two types of sampling design: 1. Probability Sampling Design and 2. Non Probability Sampling Design.
1. Probability Sampling Design: It is a systematic approach to selecting a sample, in which each element of the population has a known, pre-assigned probability of being selected in the sample. Data obtained through a probability sampling design can be used to generalize results to the entire population, which is the most important use of probability sampling design. In probability sampling the whole process is systematic: the sample size is decided on the basis of a formula and the selection of the sample also follows a specific process. Following are the probability sampling designs.
1. Simple Random Sampling (with replacement and without replacement)
2. Systematic Sampling
3. Stratified Sampling
4. Cluster Sampling
5. Double Sampling or Multiple Sampling
2. Non Probability Sampling Design: Where probability sampling is not possible we apply non probability sampling, for example where population units are rare or difficult to arrange in a specific manner. In non probability sampling no probability is assigned to the population units and sample selection is based on the convenience of the researcher. The biggest drawback of these designs is that the results obtained from the sample cannot be generalized to the entire population, as there is great scope for bias. Following are the non probability sampling designs.
1. Convenience
2. Purposive
3. Judgment
4. Quota
5. Snowball
Steps for selecting Sampling Design:
To understand this topic we will take the help of one example where research is necessary before taking the actual decision.
Suppose a person wants to set up a restaurant near Parul Campus, as he believes that more than 16000 people visit this campus in a day and that there is therefore great potential for a restaurant. He wants to confirm his belief by carrying out research. In this context the following steps of selecting a sampling design are described.
Step 1: What is the target population?
Before deciding on the sampling design it is necessary to decide the target population, as it gives us an idea about whether a probability or a non probability sampling design is suitable. If the target population is reasonably large then we can go for a probability sampling design. It also has to be confirmed whether we need to target an individual, a household, an employee, a student etc.
In the above research problem the researcher has to identify the target population, as not all of the 16000 people are the target population. Out of these 16000 students and staff members many are local and manage their lunch on their own. Students who use the mess facility on a regular basis are also not target customers. So the researcher first has to decide whether or not to include local students and employees in the population.
Step 2: What are the parameters of interest?
This study is restricted to three parameters of the population: the Mean (mu), the Standard Deviation (sigma) and the Population Proportion (P). Population parameters are usually indicated using Greek letters and can be found using the whole information contained in the population units. If it is not possible to get information from each unit of the population then we can estimate the population parameters using sample information; these estimates are known as sample statistics. The sample mean (x bar), sample standard deviation (s) and sample proportion (p) are the sample statistics.
In the above research problem, the average number of times a student of Parul Institute takes lunch outside the campus and does not use the mess facility may be the parameter of interest. Here we are talking about the parameter mean. By taking a sample of 300 to 500 students we can estimate this number; the estimate is called a sample statistic.
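As a minimal sketch of how the three sample statistics are computed, the block below uses ten made-up survey answers (number of lunches taken outside the campus in a week). The data and variable names are hypothetical; a real study would use 300 to 500 responses.

```python
import statistics

# Hypothetical answers from 10 sampled students:
# number of lunches taken outside the campus last week.
lunches_outside = [0, 2, 5, 1, 0, 3, 4, 0, 2, 3]

# Sample mean (x bar), an estimate of the population mean mu.
x_bar = statistics.mean(lunches_outside)

# Sample standard deviation s (n - 1 divisor), an estimate of sigma.
s = statistics.stdev(lunches_outside)

# Sample proportion p of students who ate outside at least once, an estimate of P.
p = sum(1 for x in lunches_outside if x > 0) / len(lunches_outside)

print(x_bar, round(s, 2), p)
```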
Step 3: What is the appropriate sampling method?
It is necessary to decide whether we should apply probability or non probability sampling. This choice is based on the available budget, the availability of population units and the required accuracy and precision. If the budget is large and population units are available then we may go for probability sampling; on the contrary, if the budget is small or population units are not available then we may go for non probability sampling. If high accuracy and precision are needed for the research then we must go for probability sampling.
In the research problem that we have considered, the researcher may use stratified sampling, where strata can be formed using various categories of students and staff, like local students, hostel students, students who commute from remote areas etc. He may also use cluster sampling by considering each department as a separate cluster, e.g. Management is one cluster, Diploma is a second cluster etc.
Step 4: What is the sampling frame?
The sampling frame helps us to locate and select specific sample units. It is a proper list of all the population units from which the sample is selected. In many cases a telephone directory is used as a sampling frame. If a record of the population units is available from some other source then we may use it. It is often difficult to find a sampling frame in international research, as geographic areas and housing patterns vary from country to country.
In the research problem that we have considered, the sampling frame may be a proper list of all the students containing their sections (i.e. sec. A, sec. B etc.), their departments (i.e. Management, Engineering etc.) and their roll numbers.
Step 5: What is the Sample Size? (i.e. n)
In case of a non probability sampling design the sample size is decided either on the basis of a rule of thumb or on the basis of the available budget. If the available budget is high then the sample size is large, and vice versa.
In case of probability sampling the size of the sample is decided on the basis of other factors, like the variation in the data and the degree of precision required. To decide on the sample size some formulas are available which involve the SD and the mean. By putting in the values of the SD and the mean we can find the sample size scientifically. Some general principles can also be used for deciding the sample size.
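The notes do not state which formula is meant. One widely taught formula for estimating a mean is n = (z * sigma / E)^2, where sigma is the SD, E is the margin of error that can be tolerated on the mean, and z = 1.96 corresponds to 95% confidence. The sketch below assumes that formula; the function name and the input values are hypothetical.

```python
import math

def sample_size_for_mean(sigma, margin_of_error, z=1.96):
    """Smallest n such that z * sigma / sqrt(n) <= margin_of_error."""
    return math.ceil((z * sigma / margin_of_error) ** 2)

# Hypothetical inputs: SD of 2 lunches per week, and the mean is
# wanted to within 0.25 lunches at 95% confidence.
n = sample_size_for_mean(sigma=2.0, margin_of_error=0.25)
print(n)
```

More variation in the data (larger sigma) or a tighter precision requirement (smaller E) both push the required sample size up, which matches the two factors named above.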
Detailed Note on Probability Sampling Designs (important from the exam point of view)
1. Simple Random Sampling: This is the simplest probability sampling design, in which each population unit has an equal and independent chance of being selected in the sample. Simple random sampling is of two types:
1. Simple Random Sampling With Replacement: in this sampling scheme the selected population unit is put back into the population, so the population size remains the same at every draw.
2. Simple Random Sampling Without Replacement: here the selected unit is not put back into the population, so every time a new sample unit is selected the remaining population size reduces by one unit.
Probability of Selection = (Sample Size / Population Size)
If 10 students are to be selected out of 100 students, then the probability of any student being selected in the sample is 10/100 = 0.1
Merits: 1. Simplest method of selecting a sample.
2. If the population units are randomized properly then it gives good results.
Demerits: 1. To select a sample of size n, n random numbers need to be generated.
2. This method is applicable only when the population size is small and it is possible to assign numbers to all the population units.
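The two variants can be sketched with Python's standard `random` module, using `random.sample` for drawing without replacement and `random.choices` for drawing with replacement. The population of 100 numbered students follows the example above; the seed and variable names are illustrative assumptions.

```python
import random

random.seed(7)

population = list(range(1, 101))   # 100 students numbered 1..100
n = 10

# Without replacement: a unit can appear at most once in the sample.
srs_without = random.sample(population, n)

# With replacement: the unit is put back, so repeats are possible.
srs_with = random.choices(population, k=n)

probability_of_selection = n / len(population)

print(sorted(srs_without))
print(sorted(srs_with))
print(probability_of_selection)
```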
2. Systematic Sampling: This method involves a mechanical procedure for selecting the sample, in which selection is done on the basis of a formula. The steps involved are as follows.
Step 1: Assign numbers to all the population units.
Step 2: Find the skip interval k, where k = Population Size / Sample Size
Step 3: Select only one random number from 1 to k.
Step 4: Draw the sample by choosing every k-th population unit after it.
For example, suppose we want to select a sample of size 5 where the population size is 100.
Skip Interval k = 100/5 = 20
So we will select any random number from 1 to 20; suppose it is 12, then the first unit in the sample will be 12.
1st unit = 12
2nd unit = (12+20) = 32
3rd unit = (32+20) = 52
4th unit = (52+20) = 72
5th unit = (72+20) = 92
This way a sample of 5 will be selected out of 100 population units.
Merits: 1. Easy to apply, as only one random number has to be selected and the other sample units are then selected automatically.
Demerits: 1. Possible to apply only when the population size is small.
2. Great scope for bias due to cyclical or seasonal factors. For example, suppose we have observations of ice cream consumption for the last three years, available on a monthly basis, i.e. January 2008, February 2008 ... December 2011. If we use systematic sampling and the first month selected happens to be in the off season, then there is a chance of a biased answer, because the systematic method uses a fixed skip interval. If the skip interval matches the seasonal cycle (for example k = 12), the sample will contain the same month of every year and may skip the peak months of April and May every time.
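The four steps of systematic sampling can be sketched as a small function. The function name and the optional `start` argument are illustrative choices; the second call shows the seasonal problem, assuming 48 monthly observations numbered so that 1 is a January.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Return the unit numbers (1-based) chosen by systematic sampling."""
    k = population_size // sample_size          # Step 2: skip interval
    if start is None:
        start = random.randint(1, k)            # Step 3: one random number from 1 to k
    return [start + i * k for i in range(sample_size)]   # Step 4: every k-th unit

# The worked example: N = 100, n = 5, random start 12.
print(systematic_sample(100, 5, start=12))      # [12, 32, 52, 72, 92]

# Periodicity problem: 48 monthly observations and a sample of 4 give k = 12,
# so every selected observation falls in the same month of the year.
picks = systematic_sample(48, 4, start=1)
months_of_year = [(m - 1) % 12 + 1 for m in picks]
print(months_of_year)                           # [1, 1, 1, 1]
```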
3. Stratified Sampling: Most populations can be segregated into several mutually exclusive subpopulations, or strata. The process in which sample units are selected from each mutually exclusive stratum is called stratified sampling. Stratification is usually statistically more efficient than simple random sampling.
A researcher chooses stratified random sampling for three reasons: 1. to increase the sample's statistical efficiency, 2. to provide adequate data for analyzing the various subpopulations or strata, and 3. to apply different methods and procedures to different strata.
Strata are mutually exclusive in nature, which means that a population unit which falls in the first stratum cannot fall in the second or any subsequent stratum. Within a stratum the population units are homogeneous, and between strata the population units are heterogeneous.
To understand stratified sampling we will consider an example of the buying behavior of customers for a luxury product. Demand for a luxury item depends on the buying capacity of the customers, so we divide the customers into three categories: low income, average income and high income groups. Here each income group forms one stratum. Suppose the total population is 100 customers and it is divided into three strata as follows.
N = 100 = Total Population
S1 = Stratum one (High Income), N1 = 20
S2 = Stratum two (Average Income), N2 = 30
S3 = Stratum three (Low Income), N3 = 50
Here N is the total population, and N1, N2 and N3 are the population sizes of strata one, two and three.
A sample is selected from each stratum and the total sample size is denoted by n, where
n1 = sample size of stratum one
n2 = sample size of stratum two
n3 = sample size of stratum three
For selecting the sample there are two approaches: 1. Proportionate and 2. Disproportionate sampling. In proportionate stratified sampling the sample is selected from each stratum in proportion to the stratum's share of the total population.
This means that in the above situation, if we want to draw a sample of size 10 (i.e. n = 10), we will select 2 units from stratum one, 3 units from stratum two and 5 units from stratum three, as strata 1, 2 and 3 contribute 20%, 30% and 50% of the total population respectively.
In disproportionate sampling, on the other hand, we may select 3 units from stratum one, 4 units from stratum two and 3 units from stratum three, as we are not following the population proportions in this case.
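Proportionate allocation can be sketched as follows, using the stratum sizes from the example (20, 30 and 50) and n = 10. The customer labels, the seed and the variable names are hypothetical.

```python
import random

random.seed(3)

# Stratum sizes from the example; the unit labels are hypothetical.
strata = {
    "high income": [f"H{i}" for i in range(1, 21)],      # N1 = 20
    "average income": [f"A{i}" for i in range(1, 31)],   # N2 = 30
    "low income": [f"L{i}" for i in range(1, 51)],       # N3 = 50
}
N = sum(len(units) for units in strata.values())
n = 10

# Proportionate allocation: n_h = n * N_h / N
# (exact here; rounding needs extra care when the shares are not whole numbers).
allocation = {name: round(n * len(units) / N) for name, units in strata.items()}

# Simple random sampling without replacement inside every stratum.
sample = {name: random.sample(units, allocation[name]) for name, units in strata.items()}

print(allocation)
print(sample)
```

For a disproportionate design the `allocation` dictionary would simply be set by the researcher, for example 3, 4 and 3, instead of being computed from the stratum shares.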
4. Cluster Sampling: In simple random sampling we select each unit individually, whereas in cluster sampling we divide the whole population into non homogeneous groups, select a whole cluster into the sample and evaluate the whole cluster for analysis purposes.
Continuing with the same example that we considered for stratified sampling, suppose we are dealing with a large population and are not in a position to form homogeneous strata. In such a situation we can divide the whole population into non homogeneous groups (clusters) and select an entire group into our sample. Such a group gives us the chance to evaluate all the types of population units.
1. Cluster sampling is economically more efficient than simple random sampling, but statistically it is less efficient than simple random sampling. 2. Within a cluster the population units are non homogeneous, and between clusters the population units are mostly homogeneous. 3. If a few clusters are very small in size and a few are very large, then we can club the small clusters together to form one large cluster.
Generally, to apply cluster sampling we use area sampling. In area sampling we assign numbers (1, 2, 3 ...) to the various areas and generate random numbers using a random number table. Whichever number occurs, that area is selected into the sample and evaluated fully. For example, if it is not possible to divide people into strata using their income level then we can go for area sampling, i.e. Panigate-1, Waghodiya-2, Pratapnagar-3 and so on. If number 2 is selected then the entire Waghodiya area will be evaluated.
N = Total Population = 10000
C1 = Cluster one, N1 = 2000
C2 = Cluster two, N2 = 5000
C3 = Cluster three, N3 = 3000
Here N is the total population, and N1, N2 and N3 are the population sizes of clusters one, two and three.
If cluster two is selected using the random number table, then the entire cluster of 5000 units will be evaluated.
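One-stage cluster (area) sampling can be sketched as below. Pairing the three area names with the cluster sizes 2000, 5000 and 3000 is an assumption made only for illustration, as are the household labels; `random.choice` stands in for the random number table.

```python
import random

random.seed(11)

# Areas act as clusters. Sizes follow the figures above (2000, 5000, 3000);
# attaching them to these area names is an illustrative assumption.
clusters = {
    1: [f"Panigate-{i}" for i in range(1, 2001)],
    2: [f"Waghodiya-{i}" for i in range(1, 5001)],
    3: [f"Pratapnagar-{i}" for i in range(1, 3001)],
}

# A random number plays the role of the random number table.
chosen = random.choice(list(clusters))

# One-stage cluster sampling: every unit of the chosen cluster is evaluated.
sample = clusters[chosen]
print(f"cluster {chosen} selected, {len(sample)} units to evaluate")
```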
Points to remember while applying cluster sampling:
1. When clusters are internally homogeneous this leads to low statistical efficiency, so we should avoid cluster sampling in such a situation, or form new clusters which are heterogeneous in nature.
2. Clusters may be of equal or unequal size. Equal sizes cause no problem, but if the sizes are unequal we may merge small clusters into a larger one or split a large cluster into smaller ones.
3. There is no fixed method for deciding the sample size. By experimenting with different sample sizes and comparing the efficiency at each size we can arrive at a specific sample size decision.
4. Multistage cluster design: If the size of the cluster is very large then we have to use a multistage cluster design. This means that in the first stage we select clusters, and in the second stage we may use simple random sampling within the selected cluster. For example, if by cluster sampling we have selected the Panigate area, it is not possible to evaluate the whole area, as the population of Panigate may be around 50000-60000 people. For this reason we go for simple random sampling in the second stage to reduce the sample size.
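A two-stage design of this kind can be sketched as follows. The area sizes and the second-stage sample size of 500 are assumptions (the text only says Panigate has around 50000-60000 people), and the function name is hypothetical.

```python
import random

def two_stage_sample(cluster_sizes, n_clusters, n_per_cluster, rng):
    """Stage 1: pick clusters at random. Stage 2: SRS inside each picked cluster."""
    picked = rng.sample(sorted(cluster_sizes), n_clusters)
    return {
        name: rng.sample(range(1, cluster_sizes[name] + 1), n_per_cluster)
        for name in picked
    }

rng = random.Random(5)

# Assumed population sizes for the three areas.
areas = {"Panigate": 55000, "Waghodiya": 30000, "Pratapnagar": 40000}

sample = two_stage_sample(areas, n_clusters=1, n_per_cluster=500, rng=rng)
for area, people in sample.items():
    print(area, len(people))
```

Instead of evaluating tens of thousands of people in the selected area, only 500 randomly chosen people are studied.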
Difference between Stratified Sampling and Cluster Sampling:
1. Stratified Sampling: Population units are homogeneous within strata.
   Cluster Sampling: Population units are heterogeneous within a cluster.
2. Stratified Sampling: Population units are heterogeneous between strata.
   Cluster Sampling: Population units are homogeneous between clusters.
3. Stratified Sampling: A few population units are selected into the sample from each stratum.
   Cluster Sampling: The whole cluster is selected into the sample and evaluated.
4. Stratified Sampling: Proportionate and disproportionate sampling are the two methods of selecting the sample from each stratum.
   Cluster Sampling: One stage cluster sampling and multistage cluster sampling are the methods of selecting the sample.
5. Double Sampling: To make data collection and analysis more economical and efficient, we sometimes use more than one sampling scheme at a time. If two sampling designs are used, one in the first and one in the second stage of sampling, it is called double sampling.
For example, in the first stage we select the sample using the cluster method and in the second stage we go for simple random sampling. If by cluster sampling we have selected the Panigate area, it is not possible to evaluate the whole area, as the population of Panigate may be around 50000-60000 people. For this reason we go for simple random sampling in the second stage to reduce the sample size.
Multistage sampling: In multistage sampling more than one sampling scheme is used. If stratified sampling is used in the first stage, then simple random sampling may be used in the second stage. Similarly, if cluster sampling is used in the first stage, then systematic sampling may be used in the second stage and simple random sampling in the third stage.
Comparison of Probability Sampling Designs
1. Simple Random Sampling (Cost: High, Use: Moderate)
Description: Each population unit has an equal and independent chance of being selected in the sample.
Advantages: Easy to implement.
Disadvantages: 1. Requires a list of all population units. 2. Uses a larger sample size.
2. Systematic Sampling (Cost: Moderate, Use: Moderate)
Description: One unit is selected between 1 and k, where k is the skip interval. Thereafter every k-th element is selected.
Advantages: Easier than simple random sampling.
Disadvantages: 1. Seasonal and cyclical bias may be present. 2. With a large population it is not possible to apply systematic sampling.
3. Stratified Sampling (Cost: High, Use: Moderate)
Description: The population is divided into homogeneous strata and a sample is selected from each stratum.
Advantages: 1. Statistical efficiency can be increased. 2. Different methods of analysis are possible in different strata.
Disadvantages: 1. Costly when strata have to be created for a large population.
4. Cluster Sampling (Cost: Moderate, Use: High)
Description: Clusters are created from the population units, where each cluster is heterogeneous within itself. An entire cluster is selected into the sample and evaluated.
Advantages: 1. Economically more efficient than simple random sampling. 2. Easy to apply when geographic areas are used as clusters. 3. A population list is not required.
Disadvantages: Statistical efficiency is often lower.
5. Double Sampling (Cost: Moderate, Use: Moderate)
Description: Two or more sampling designs are used at various stages to select the ultimate sample.
Advantages: Cost can be reduced significantly, as stratification or clustering is used in the 1st stage.
Disadvantages: 1. Cost may increase if the order of the various sampling schemes is not proper. 2. Very complex to apply.
Detailed Note on Non Probability Sampling Methods:
1. Convenience Sampling: This is an unrestricted non probability sampling method in which, as the name suggests, the researcher has the freedom to choose whomever he finds easily. The researcher may include friends, relatives, neighbors or any other person who is easily available.
In convenience sampling there is no control over precision and accuracy, so the results obtained from this method may not be very reliable. Still, this method is useful as the cost of data collection is very low. Data can be collected using this method to learn people's views on current issues.
2. Judgment Sampling: Here the researcher selects sample members according to some criterion. For example, if we are conducting research on the employee welfare policy of an organization, better information can be obtained from those employees who have been with the organization for at least two years.
Thus the researcher uses his judgment about the respondents. This method is very useful in the early stages of exploratory research, where we seek expert opinion. It is also very useful for new product development.
3. Quota Sampling: Here the researcher tries to improve the representativeness of the sample by assigning specific quotas on the basis of certain factors like gender, religion, economic class, education level etc. For example, if research is conducted in Parul Institute where the population units are students, the researcher may find the proportion of male and female students, suppose 60% and 40% respectively, and will then maintain the same proportion in the sample.
Suppose in some other research we are considering three factors: Gender, Education Level and Economic Class.
1. Gender: Male, Female
2. Education Level: Graduate, Undergraduate
3. Economic Class: Upper, Middle, Lower
If we consider the above three factors then there will be a total of 2 x 2 x 3 = 12 combinations. We have to find the population proportion of all 12 combinations and select the sample accordingly. This may increase the representativeness of the sample. This is called precision control.
Quota sampling has some weaknesses: it is very difficult to apply when we consider more than three factors, and there is no guarantee that the quotas will actually increase precision.
Despite the above-mentioned problems quota sampling is widely used, because probability sampling is usually very costly and time consuming. Users of quota sampling argue that while there is some danger of systematic bias, the risks are usually not that great.
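The quota cells can be enumerated with `itertools.product`, as in the sketch below. The sample size of 200 used for the gender quota is an illustrative assumption; the 60%/40% split is the one supposed in the text.

```python
from itertools import product

genders = ["Male", "Female"]
education_levels = ["Graduate", "Undergraduate"]
economic_classes = ["Upper", "Middle", "Lower"]

# Every combination of the three factors is one quota cell.
cells = list(product(genders, education_levels, economic_classes))
print(len(cells))   # 2 x 2 x 3 = 12
for cell in cells:
    print(cell)

# Single-factor quota from the text: 60% male, 40% female.
# The sample size of 200 is an illustrative assumption.
sample_size = 200
gender_quota = {
    "Male": round(0.60 * sample_size),
    "Female": round(0.40 * sample_size),
}
print(gender_quota)
```

Adding a fourth factor with, say, three levels would raise the number of cells to 36, which shows why quota sampling becomes hard to apply with more than three factors.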
4. Snowball Sampling: This method is applicable when we are dealing with a topic for which population units are rare and knowledge can be gathered only through a referral network. Just as a snowball increases in size as it rolls, our knowledge increases as we reach more population units by reference.
For example, suppose we are conducting research on a disease like swine flu. Information about it can be obtained only from those patients who have faced this problem or from doctors who treat the disease.
[Questions that may be asked:
1. Under what kind of conditions would you recommend:
   1. A probability sample? A non probability sample?
   2. A simple random sampling design? A cluster sample? A stratified sampling design?
   3. A disproportionate stratified probability sample?
2. Describe the differences between a probability sample and a non probability sample.
3. Why would a researcher use a quota purposive sample?]