Chapter 15 Sampling.pdf1


  • 8/2/2019 Chapter 15 Sampling.pdf1


    Prepared by: Devang Kale

    Chapter 15

    Sampling

    About the chapter:
    We will study this chapter in two parts.
    1. In the first part we will learn some terminology that is useful for understanding the concept of sampling.
    2. In the second part we will study the various sampling techniques, grouped into two types: 1) Probability sampling and 2) Non probability sampling.

    Type of Questions:
    In the exam, short questions about the various terms or long questions about sampling design may be asked. Students should give more weight to the Probability Sampling Scheme.


    Various terms which are important for understanding the concepts of Sampling and Sampling Design.

    Population: It is the collection of people, households or objects which we want to study for research purposes.
    For example, if one wants to study the overall satisfaction level of the students of Parul Institute, then all the students of Parul Institute form the population for that research.
    It is always necessary to decide on the target population before starting the actual data collection process. Population size helps us to decide the sample size, and population type gives us an idea about the sampling design.
    For example, if one wants to start a restaurant near Parul Campus and wants to carry out a survey of potential customers, then it is necessary to decide whether the population of this survey is the employees of Parul Institute, the students of Parul Institute, or both.

    Population Element: An individual participant or object on which the measurement is taken is called a population element.
    For example, in the above mentioned research an individual student is a population element, as we are collecting data from a particular student. A population element may sometimes be an object or a household, depending upon the type of study.

    Sample: It is a small part of the population, by analyzing which we can draw conclusions about the whole population.
    For example, if 15000 students are studying in Parul Institute and, to know the satisfaction level of the students, we randomly select 500 students and collect data from them, then these 500 students are called the sample.

    Sample Element: An individual, household or object which is selected in a sample is called a sample element.
    For example, in the above situation a particular student selected in the sample of 500 is called a sample element.

    Census: If data is collected from all the available elements of the population and the conclusion is drawn on this basis, it is called the census method.
    For example, the Indian Population Census, where data is collected from each and every individual who is available in India.
    The census method is always very costly and time consuming; because of this the Indian Population Census is carried out only once every 10 years.

    Sampling: Here a small sample is drawn from the population, the characteristics of all the sample elements are studied, and a conclusion is drawn for the entire population. This process is also called inference.

    Sampling Frame: A specific list of all the population units is called the sampling frame.
    For example, a telephone directory, a students' register etc. In the above example a specific list of all the students with their roll numbers, divisions and addresses is called the sampling frame.


    Advantages of Sample:
    Following are the advantages of a sample over a census (complete evaluation of the population).

    1. Lower Cost:
    Sampling helps to reduce the cost of research to a great extent. In the census method we are supposed to evaluate each and every unit of the population, whereas in sampling we evaluate only a very small portion of the population.

    2. Greater Accuracy of Results:
    In one study it was found that about 90% of the total survey error came from non sampling sources and only 10% or less came from random sampling error. Because a sample involves far fewer units, the work can be supervised and checked more carefully, which keeps non sampling errors low. Thus we can say that sampling can give us greater accuracy of results.

    3. Greater Speed of Data Collection:
    As fewer units are covered, data collection and analysis are much faster with a sample than with a census.

    4. Availability of Population Elements:
    Some experiments are destructive, and there sampling is the only option left. For example, in quality control experiments we may have to break the unit to check the quality of the product; in such situations we have to go for sampling.

    Characteristics of a Good Sample:
    The goodness of a sample can be measured on the basis of two components: 1. Accuracy and 2. Precision.

    1. Accuracy: It is the degree to which bias is absent from the sample. In a proper sample some observations are less than the actual value and some are greater than the actual value, so the lower and higher observations nullify the effect of each other.
    For example, suppose the average price of land in Vadodara City is 4000 per sq. foot. If the researcher selects all the sample elements from a rich area, then the average of the sample elements will definitely be higher than the actual average.
    Thus an accurate sample is one in which the underestimates offset the overestimates. A high amount of Systematic Variance is responsible for low accuracy of a sample. Systematic Variance arises from some known or unknown sources that push the measurements in one direction. Increasing the sample size can help to reduce it, but a larger sample will not remove the bias if the sampling method or the sampling frame is itself biased.
    For example, suppose a researcher wants to estimate the average price of a house in a particular area. Due to a wrong choice of sampling method he selects the last house of the society every time and notes down its price. We know that in most cases the last house in a row has more area compared to the other houses, and thus there is a chance of overestimating the price of a house. Thus in this case the source of Systematic Variance is the wrong sampling method.
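    The last-house example can be put into numbers with a small sketch. All prices below are invented for illustration (they are not from the notes); the point is only that a faulty selection rule pushes the sample average in one direction.

```python
from statistics import mean

# Hypothetical society: each inner list is one row of houses, and the last
# house in every row has a larger plot and so a higher price.
# All prices are invented for illustration (in lakh rupees).
rows = [
    [40, 42, 41, 43, 58],
    [38, 40, 39, 41, 55],
    [45, 44, 46, 45, 62],
    [50, 49, 51, 50, 70],
]

# The whole population of 20 houses and its true average price.
population = [price for row in rows for price in row]
true_mean = mean(population)

# Faulty sampling method: always pick the last house of each row.
last_house_sample = [row[-1] for row in rows]
biased_mean = mean(last_house_sample)

# Bias = how far the sample estimate leans away from the true value.
bias = biased_mean - true_mean

print(f"True average price     : {true_mean:.2f}")
print(f"Last-house sample mean : {biased_mean:.2f}")
print(f"Overestimate (bias)    : {bias:.2f}")
```

    Selecting more rows with the same rule would not help here, because every selected house is still a last house.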

    2. Precision: It means how closely the sample represents the population. It is the second criterion of a good sample design. It is obvious that no sample represents its population exactly. After considering all the sources of systematic variation, whatever variation is left is due to unsystematic sources of variation and is known as Sampling Error (or Random Sampling Error). Sampling Error is an inherent source of variation which cannot be removed completely even by increasing the sample size (a larger sample only makes it smaller). For example, consider the above mentioned research problem, i.e. to estimate the average price of a house in a particular area. If respondents give figures which are slightly higher or lower than the true ones, this may be a reason for low precision, which is not possible to remove completely from the analysis.

    Types of Sample Design:
    Here we are going to study mainly two types of sampling design: 1. Probability Sampling Design and 2. Non Probability Sampling Design.

    1. Probability Sampling Design: It is a systematic approach to selecting a sample, in which each element of the population has a pre-assigned probability of being selected in the sample. Data obtained through a probability sampling design can be used to generalize the result to the entire population, which is the most important use of probability sampling design.
    In probability sampling the whole process is systematic: the sample size is decided on the basis of some formula, and the selection of the sample also follows a specific process. Following are the probability sampling designs.
    1. Simple Random Sampling (with replacement and without replacement)
    2. Systematic Sampling
    3. Stratified Sampling
    4. Cluster Sampling
    5. Double Sampling or Multiple Sampling

    2. Non Probability Sampling Design: Where it is not possible to apply probability sampling, we apply non probability sampling. This is the case where population units are rare or difficult to arrange in a specific manner. In non probability sampling no probability is assigned to the population units, and sample selection is based on the convenience of the researcher. The biggest drawback of these designs is that the result obtained through sample analysis cannot be generalized to the entire population, as there is great scope for bias. Following are the non probability sampling designs.

    1. Convenience

    2. Purposive

    3. Judgment

    4. Quota

    5. Snowball

    Steps for selecting Sampling Design:
    To understand this topic we will take the help of an example where research is necessary before taking the actual decision.
    Suppose a person wants to set up a restaurant near Parul Campus, as he believes that more than 16000 people visit this campus in a day and so there is great potential for a restaurant. He wants to confirm his belief, and for that he wants to carry out research. The steps of selecting a sampling design are described below in this context.


    Step 1: What is the target population?
    Before deciding on the sampling design it is necessary to decide the target population, as it gives us an idea about whether to use a probability or non probability sampling design. If the target population is reasonably large then we can go for a probability sampling design. It must also be confirmed whether we need to target an individual, a household, an employee, a student etc.
    In the above research problem the researcher has to identify the target population, as all the 16000 people are not the target population. Out of these 16000 students and staff members many are local and manage their lunch on their own. Also, students who use the mess facility on a regular basis are not target customers. So the researcher first has to decide whether or not to include local students and employees in the population.

    Step 2: What are the parameters of interest?
    This study is restricted to three parameters of the population, which are the Mean (μ), the Standard Deviation (σ) and the Population Proportion (P). Population parameters are usually indicated using Greek letters and can be found using the whole information contained in the population units. If it is not possible to get information from each unit of the population, then we can estimate the population parameters using sample information; these estimates are known as sample statistics. The sample mean (x bar), sample standard deviation (s) and sample proportion (p) are the sample statistics.
    In the above research problem, the average number of times a student of Parul Institute takes lunch outside the campus and does not use the mess facility may be the parameter of interest. Here we are talking about the parameter mean. By taking a sample of 300 to 500 students we can estimate this number, and the estimate will be called a sample statistic.
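    As a small illustration of the three sample statistics, the sketch below computes x bar, s and p from a made-up sample. The eight answers are hypothetical (a real sample would have 300 to 500 students), and the proportion is defined here, as an assumption, as the share of students who ate outside at least once.

```python
from statistics import mean, stdev

# Hypothetical answers from 8 sampled students: number of lunches taken
# outside the campus in the last week (invented figures).
lunches_outside = [2, 3, 0, 5, 1, 4, 2, 3]

n = len(lunches_outside)

# Sample mean (x bar): estimate of the population mean.
x_bar = mean(lunches_outside)

# Sample standard deviation (s), using the n - 1 divisor:
# estimate of the population standard deviation.
s = stdev(lunches_outside)

# Sample proportion (p) of students who ate outside at least once:
# estimate of the population proportion.
p = sum(1 for x in lunches_outside if x > 0) / n

print(f"n = {n}, x bar = {x_bar}, s = {s:.3f}, p = {p}")
```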

    Step 3: What is the appropriate sampling method?
    It is necessary to decide whether we should apply probability or non probability sampling. The selection is based on the availability of budget, the availability of population units and the required accuracy and precision. If the budget is large and population units are available then we may go for probability sampling; on the contrary, if the budget is small or population units are not available then we may go for non probability sampling. If high accuracy and precision are needed for the research then we have to go for probability sampling.
    In the research problem that we have considered, the researcher may use stratified sampling, where strata can be formed using various categories of students and staff, such as local students, hostel students, students who commute from remote areas etc. He may use cluster sampling by considering each department as a separate cluster, e.g. Management is one cluster, Diploma is a second cluster etc.

    Step 4: What is the sampling frame?
    The sampling frame helps us to locate and select specific sample units. It is a proper list of all the population units from which the sample is selected. In many cases we use a telephone directory as a sampling frame. If a record of population units is available from some other source then we may use it. It is always difficult to find a sampling frame in international research, as geographic areas and housing patterns vary from country to country.


    In the research problem that we have considered, the sampling frame may be a proper list of all the students containing their sections (i.e. sec. A, sec. B etc.), their departments (i.e. Management, Engineering etc.) and their roll numbers.

    Step 5: What is the Sample Size? (i.e. n)
    In case of non probability sampling design the sample size is decided either on the basis of a rule of thumb or on the basis of the available budget. If the available budget is high then the sample size is high, and vice versa.
    In case of probability sampling the size of the sample is decided on the basis of other factors, like the variation in the data and the degree of precision required. To decide the sample size some formulas are available which involve the SD and the mean. By putting in the values of SD and mean we can find the sample size scientifically. Some principles can also be used for deciding the sample size.
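    The notes do not state which formula is meant. One commonly used textbook formula for estimating a mean is n = (z × SD / E)², where E is the margin of error we are ready to accept and z is the value for the chosen confidence level (1.96 for 95%). The sketch below assumes this formula and a large population; the function name and the input figures are hypothetical.

```python
import math


def sample_size_for_mean(sigma, margin_of_error, z=1.96):
    """Smallest n such that z * sigma / sqrt(n) <= margin_of_error.

    Assumes simple random sampling from a large population
    (no finite population correction is applied).
    """
    return math.ceil((z * sigma / margin_of_error) ** 2)


# Hypothetical inputs: SD of 2 lunches per week, and we want the sample
# mean to be within 0.25 lunches of the true mean at 95% confidence.
n_required = sample_size_for_mean(sigma=2.0, margin_of_error=0.25)
print(n_required)
```

    More variation in the data (larger SD) or a tighter margin of error both lead to a larger required sample.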

    Detailed Note on Probability Sampling Design (important from the exam point of view)

    1. Simple Random Sampling: This is the simplest probability sampling design. It is a design in which each population unit has an equal and independent chance of being selected in the sample. Simple Random Sampling is divided into two types:
    1) Simple Random Sampling With Replacement: in this scheme the selected population unit is put back into the population, so the population size remains the same at every draw.
    2) Simple Random Sampling Without Replacement: here the selected unit is not put back into the population, so every time a new sample unit is selected the remaining population size reduces by one unit.

    Probability of Selection = (Sample Size / Population Size)

    If out of 100 students 10 students are to be selected, then the probability of any student being selected in the sample is 10/100 = 0.1

    Merits: 1. Simplest method of selecting a sample.
    2. If population units are randomized properly then it gives good results.
    Demerits: 1. To select a sample of size n, n random numbers need to be generated.
    2. This method is applicable when the population size is small and it is possible to assign numbers to all the population units.
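    A minimal sketch of the two schemes using Python's standard random module is given below. The roll numbers 1 to 100 stand in for a hypothetical sampling frame, and the fixed seed is used only so that the run can be repeated.

```python
import random

rng = random.Random(42)            # fixed seed only so the run is repeatable
population = list(range(1, 101))   # roll numbers 1..100 (hypothetical frame)
n = 10

# Without replacement: a selected unit is not put back, so no unit repeats.
without_replacement = rng.sample(population, k=n)

# With replacement: a selected unit is put back, so a unit may repeat.
with_replacement = rng.choices(population, k=n)

# Probability of selection = sample size / population size.
prob_of_selection = n / len(population)

print(without_replacement)
print(with_replacement)
print(prob_of_selection)
```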

    2. Systematic Sampling: This method involves a mechanical procedure for selecting the sample, in which selection is done on the basis of a formula. The steps involved are as follows.
    Step 1: Assign numbers to all the population units.
    Step 2: Find the skip interval k, where k = Population Size / Sample Size
    Step 3: Select only one random number from 1 to k.
    Step 4: Draw the sample by choosing every kth population unit after it.

    For example, suppose we want to select a sample of size 5 where the population size is 100.
    Skip Interval k = 100/5 = 20
    So we will select any random number from 1 to 20; suppose it is 12, then the first unit in the sample will be 12.


    1st unit: 12
    2nd unit: (12+20) = 32
    3rd unit: (32+20) = 52
    4th unit: (52+20) = 72
    5th unit: (72+20) = 92

    This way sample of 5 will be selected out of 100 population units.

    Merits: 1. Easy to apply, as only one random number is to be selected and the other sample units are selected automatically.
    Demerits: 1. Possible to apply only when population size is small.
    2. Great scope for bias due to cyclical or seasonal factors. For example, suppose we have observations of ice cream consumption for the last four years, available on a monthly basis, i.e. January 2008, February 2008, ..., December 2011. If we use systematic sampling and the first month selected falls in the off season, then there is a chance of a biased answer, because the systematic method uses a fixed skip interval. With a skip interval of 12, for instance, the same month is selected every year, so the peak months of April and May are skipped every time.
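    The four steps of systematic sampling can be sketched as a short function. The function name and its parameters are hypothetical, and the sketch assumes that the population size is an exact multiple of the sample size, as in the example of 100 units and a sample of 5.

```python
import random


def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Return the unit numbers (1-based) chosen by systematic sampling.

    Assumes population_size is an exact multiple of sample_size.
    """
    k = population_size // sample_size      # Step 2: skip interval
    if start is None:
        start = rng.randint(1, k)           # Step 3: one random number from 1 to k
    # Step 4: the starting unit and then every kth unit after it.
    return [start + i * k for i in range(sample_size)]


# The example from the text: N = 100, n = 5, random start 12.
print(systematic_sample(100, 5, start=12))

# With no start given, the start is drawn at random from 1 to k.
print(systematic_sample(100, 5))
```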

    3. Stratified Sampling: Most populations can be segregated into several mutually exclusive subpopulations, or strata. The process in which sample units are selected from each mutually exclusive stratum is called stratified sampling. Stratification is usually more efficient statistically than simple random sampling.
    A researcher chooses stratified random sampling for three reasons: 1. to increase the sample's statistical efficiency, 2. to provide adequate data for analyzing the various subpopulations or strata and 3. to apply different methods and procedures to different strata.
    Strata are mutually exclusive in nature, which means that a population unit falling in the first stratum cannot fall in the second or any subsequent stratum. Within a stratum population units are homogeneous, and between strata population units are heterogeneous.

    To understand stratified sampling we will consider an example of the buying behavior of customers for some luxury product. Demand for a luxury item depends on the buying capacity of the customers, so we have to divide customers into three categories: low income, average income and high income groups. Here each income group forms one stratum. Suppose the total population is 100 customers and it is divided into three strata as follows.

    N = 100 = Total Population

    Stratum    Income group      Population size
    S1         High Income       N1 = 20
    S2         Average Income    N2 = 30
    S3         Low Income        N3 = 50

    Here N = total population = 100
    S1 = Stratum one, N1 = Population size of stratum one
    S2 = Stratum two, N2 = Population size of stratum two
    S3 = Stratum three, N3 = Population size of stratum three

    Here a sample is selected from each stratum and the total sample size is denoted by n.
    n1 = sample size of stratum one
    n2 = sample size of stratum two
    n3 = sample size of stratum three

    For selecting a sample there are two approaches: 1. Proportionate and 2. Disproportionate sampling. In proportionate stratified sampling the sample is selected from each stratum in proportion to the stratum's share of the total population.
    This means that in the above situation, if we want to draw a sample of size 10 (i.e. n = 10), we will select 2 units from stratum one, 3 units from stratum two and 5 units from stratum three, as strata 1, 2 and 3 contribute 20%, 30% and 50% of the total population respectively.
    Similarly, in case of disproportionate sampling we may select 3 units from stratum one, 4 units from stratum two and 3 units from stratum three, as we are not considering any proportion in this case.
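    Proportionate allocation is simple arithmetic, n1 = n × N1 / N and so on, and can be sketched as below. The function name is hypothetical. When the shares do not come out as whole numbers, this sketch rounds down and gives the leftover units to the strata with the largest remainders; that rounding rule is an assumption and is not part of the notes.

```python
def proportionate_allocation(n, stratum_sizes):
    """Split a total sample of n across strata in proportion to their sizes."""
    total = sum(stratum_sizes)
    # Whole part and remainder of n * N_h / N for each stratum.
    parts = [divmod(n * size, total) for size in stratum_sizes]
    allocation = [whole for whole, _ in parts]
    # Hand out units lost to rounding down, largest remainder first.
    leftover = n - sum(allocation)
    order = sorted(range(len(parts)), key=lambda i: parts[i][1], reverse=True)
    for i in order[:leftover]:
        allocation[i] += 1
    return allocation


# The example from the text: N1 = 20, N2 = 30, N3 = 50 and n = 10.
print(proportionate_allocation(10, [20, 30, 50]))
```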

    4. Cluster Sampling: In simple random sampling we select each unit individually, whereas in cluster sampling we divide the whole population into some heterogeneous groups (clusters), select a whole cluster in the sample and evaluate the whole cluster for analysis purposes.
    Continuing with the same example that we considered for stratified sampling, suppose we are dealing with a large population and are not in a position to form homogeneous strata. In such a situation we can divide the whole population into some heterogeneous groups (clusters) and select an entire group in our sample. This group gives us a chance to evaluate all the types of population units.
    1. Cluster sampling is economically more efficient than simple random sampling, but statistically it is less efficient than simple random sampling.
    2. Within a cluster population units are heterogeneous, and between clusters population units are mostly homogeneous.
    3. If a few clusters are very small in size and a few are very large, then we can club the small clusters into one and form a large cluster.
    Generally, to apply cluster sampling we use Area Sampling. In area sampling we assign numbers (1, 2, 3, ...) to the various areas, and random numbers are generated using a random number table. Whichever number occurs, that area is selected in the sample and evaluated fully. For example, if it is not possible to divide people into various strata using their income level, then we can go for area sampling, i.e. Panigate-1, Waghodiya-2, Pratapnagar-3 and so on. If number 2 is selected then the entire Waghodiya area will be evaluated.


    N = Total Population = 10000

    Cluster    Population size
    C1         N1 = 2000
    C2         N2 = 5000
    C3         N3 = 3000

    Here N = total population = 10000
    C1 = Cluster one, N1 = Population size of cluster one
    C2 = Cluster two, N2 = Population size of cluster two
    C3 = Cluster three, N3 = Population size of cluster three

    If, using the random number table, cluster two is selected, then the entire cluster of 5000 units will be evaluated.
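    One stage cluster sampling for this example can be sketched as follows. The unit ids are hypothetical stand-ins for the 10000 population units, and Python's random module takes the place of the random number table.

```python
import random

rng = random.Random(7)   # fixed seed only so the run is repeatable

# Hypothetical unit ids for the three clusters in the example.
clusters = {
    "C1": list(range(0, 2000)),       # N1 = 2000
    "C2": list(range(2000, 7000)),    # N2 = 5000
    "C3": list(range(7000, 10000)),   # N3 = 3000
}

# Pick one cluster label at random.
chosen = rng.choice(sorted(clusters))

# The whole chosen cluster enters the sample and is evaluated fully.
sample = clusters[chosen]

print(chosen, len(sample))
```

    Note that the sample size is not fixed in advance here: it is whatever the size of the selected cluster happens to be.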

    Points to remember while applying cluster sampling:
    1. When clusters are homogeneous this contributes to low statistical efficiency, so we will avoid the use of cluster sampling in such a situation, or we can form new clusters which are heterogeneous in nature.
    2. Clusters may be of equal or unequal size. If they are of equal size there is no problem, but if they are of unequal size then we may merge small clusters into a large one or split a large cluster into small ones.
    3. There is no fixed method for deciding a specific sample size. By experimenting with different sample sizes and comparing the efficiency at each, we can arrive at a specific sample size decision.
    4. Multistage cluster design: If the size of the cluster is very large then we have to use a multistage cluster design. This means that in the first stage we select the sample using the cluster method and in the second stage we may go for simple random sampling. For example, if by using the cluster sampling method we have selected the Panigate area, then it is not possible to evaluate the whole area, as the population of the Panigate area may be around 50000-60000 people. For this reason we will go for simple random sampling in the second stage and so reduce the sample size.

    Difference between Stratified Sampling and Cluster Sampling:

    Stratified Sampling                           Cluster Sampling
    1. Population units are homogeneous           1. Population units are heterogeneous
       within strata.                                within cluster.
    2. Population units are heterogeneous         2. Population units are homogeneous
       between strata.                               between clusters.
    3. Few population units are selected          3. Whole cluster is selected in a sample
       in a sample from each stratum.                and evaluated.
    4. Proportionate and disproportionate         4. One stage cluster sampling and
       sampling are two methods of selecting         multistage cluster sampling are the
       sample from each stratum.                     methods of selecting sample.

    5. Double Sampling: To have more economical and efficient data analysis we sometimes use more than one sampling scheme at a time. If two sampling designs are used, one in the first and one in the second stage of sampling, then it is called double sampling.
    For example, in the first stage we select the sample using the cluster method and in the second stage we may go for simple random sampling. If by using the cluster sampling method we have selected the Panigate area, then it is not possible to evaluate the whole area, as the population of the Panigate area may be around 50000-60000 people. For this reason we will go for simple random sampling in the second stage and so reduce the sample size.
    Multistage sampling: In multistage sampling more than one sampling scheme is used. If stratified sampling is used in the first stage then simple random sampling may be used in the second stage. Similarly, if cluster sampling is used in the first stage then systematic sampling may be used in the second stage and simple random sampling in the third stage.
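    The cluster-then-random design described above can be sketched in two stages. The area names are taken from the notes, but the population sizes, the resident ids, the function name and the second stage sample size of 500 are all hypothetical.

```python
import random

rng = random.Random(2019)   # fixed seed only so the run is repeatable

# Hypothetical frame: area name -> list of resident ids (sizes invented).
areas = {
    "Panigate": [f"P{i}" for i in range(55000)],
    "Waghodiya": [f"W{i}" for i in range(30000)],
    "Pratapnagar": [f"R{i}" for i in range(40000)],
}


def two_stage_sample(areas, second_stage_size, rng):
    """Stage 1: pick one area (cluster). Stage 2: simple random sample in it."""
    area = rng.choice(sorted(areas))
    residents = rng.sample(areas[area], k=second_stage_size)
    return area, residents


area, residents = two_stage_sample(areas, 500, rng)
print(area, len(residents))
```

    Only 500 residents of the selected area have to be contacted, instead of the whole area.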

    Comparison of Probability Sampling Design

    Simple Random Sampling (Cost: High, Use: Moderate)
    Description: Each population unit has an equal and independent chance of being selected in the sample.
    Advantages: Easy to implement.
    Disadvantages: 1. Requires a list of all population units. 2. Uses a larger sample size.

    Systematic Sampling (Cost: Moderate, Use: Moderate)
    Description: One unit is selected between 1 and k, where k is the skip interval. After that every kth element is selected.
    Advantages: Easier than simple random sampling.
    Disadvantages: 1. Seasonal and cyclical bias may be there. 2. With a large population it is not possible to apply systematic sampling.

    Stratified Sampling (Cost: High, Use: Moderate)
    Description: Population is divided into homogeneous strata and a sample is selected from each stratum.
    Advantages: 1. Statistical efficiency can be increased. 2. Different methods of analysis are possible in different strata.
    Disadvantages: 1. Costly when we are supposed to create strata for a large population.

    Cluster Sampling (Cost: Moderate, Use: High)
    Description: Clusters are created using population units, where each cluster is heterogeneous within itself. An entire cluster is selected in the sample and evaluated.
    Advantages: 1. Economically more efficient than simple random sampling. 2. Easy to apply when geographic areas are used as clusters. 3. Population list is not required.
    Disadvantages: Statistical efficiency is often lower.

    Double Sampling (Cost: Moderate, Use: Moderate)
    Description: Two or more sampling designs are used at various stages to select the ultimate sample.
    Advantages: Cost can be reduced significantly, as stratification or clustering is used in the 1st stage.
    Disadvantages: 1. Cost may increase if the order of the various sampling schemes is not proper. 2. Very complex to apply.


    Detailed Note on Non Probability Sampling Methods:

    1. Convenience Sampling: This is an unrestricted non probability sampling method where, as the name suggests, the researcher has the freedom to choose whomever he finds easily. The researcher may include his or her friends, relatives, neighbors or any other person who is easily available.
    In convenience sampling there is no control over precision and accuracy, so the results obtained from this method may not be very reliable. Still, this method is useful as the cost of data collection is very low. Data can be collected using this method to know the views of people on some current issue.

    2. Judgment Sampling: Here the researcher selects sample members by keeping some criteria in mind. For example, if we are conducting research on the employee welfare policy of some organization, then better information can be obtained from those employees who have been with that organization for at least two years.
    Thus the researcher is exercising some judgment about the respondents. This method is very useful in the early stages of exploratory research, where we take experts' opinions. It is also very useful for new product development.

    3. Quota Sampling: Here the researcher tries to improve the representativeness of the sample by assigning specific quotas on the basis of certain factors like gender, religion, economic class, education level etc. For example, if some research is conducted in Parul Institute where the population units are students, then the researcher may find the proportion of male and female students; suppose it is 60% and 40% respectively, then the researcher will maintain the same proportion in the sample also.
    Suppose in some other research we are considering three factors: Gender, Education Level and Economic Class.
    1. Gender: Male, Female
    2. Education Level: Graduate, Undergraduate
    3. Economic Class: Upper, Middle, Lower
    If we consider the above three factors then there will be 2 x 2 x 3 = 12 combinations in total. We have to find the population proportion of all 12 combinations and select the sample accordingly. This may increase how well the sample represents the population. This is called precision control.
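    The count of 12 combinations, and the simpler one factor quota of 60% male and 40% female students, can be checked with a short sketch. The sample size of 50 is a hypothetical figure chosen only for illustration.

```python
from itertools import product

genders = ["Male", "Female"]
education_levels = ["Graduate", "Undergraduate"]
economic_classes = ["Upper", "Middle", "Lower"]

# Every combination of the three factors is one quota cell.
cells = list(product(genders, education_levels, economic_classes))
print(len(cells))
for cell in cells:
    print(cell)

# One factor quota from the text: 60% male and 40% female students,
# applied to a hypothetical sample of 50 students.
sample_size = 50
gender_share_percent = {"Male": 60, "Female": 40}
quotas = {
    gender: sample_size * percent // 100
    for gender, percent in gender_share_percent.items()
}
print(quotas)
```

    With every extra factor the number of cells multiplies, which is why quota sampling becomes hard to manage beyond about three factors.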

    Quota sampling has some weaknesses: it is very difficult to apply quota sampling when we consider more than three factors. Also, there is no guarantee that quotas will definitely increase precision.
    Despite the above mentioned problems quota sampling is widely used, because probability sampling is usually very costly and time consuming. Users of quota sampling argue that while there is some danger of systematic bias, the risks are usually not that great.


    4. Snowball Sampling: This method is applicable when we are dealing with a topic for which population units are rarely found and knowledge can be gathered through a referral network. As a snowball increases in size as it rolls, knowledge increases as we approach more population units by reference.
    For example, suppose we are conducting research on a disease like swine flu. Information about this can be obtained only from those patients who have faced this problem or from doctors who treat such a disease.

    [Questions may be asked:
    1. Under what kind of conditions would you recommend:
       1. A probability sample? A non probability sample?
       2. A simple random sampling design? A cluster sample? A stratified sampling design?
       3. A disproportionate stratified probability sample?
    2. Describe the differences between a probability sample and a non probability sample.
    3. Why would a researcher use a quota purposive sample?]