
Sampling Methods in Research


8/2/2019 Sampling Methods in Research

http://slidepdf.com/reader/full/sampling-methods-in-research 1/4

Grace H. Encelan-Brizuela, MD, MSPH
July 29, 2010

SAMPLING METHODS IN RESEARCH

INTRODUCTION

Challenge to every research protocol: it must specify a sample of subjects that
- can be studied at an acceptable cost in time and money
- is large enough to control random error in generalizing the study findings to the population
- is representative enough to control systematic error in these inferences

Basic terms and concepts…

Population – the complete set of people with a specified set of characteristics

Sample – a subset of the population, selected so as to be representative of the larger population (e.g., Population – Medicine Class 2013; Sample – 2013 Section B)

Target population – the large set of patients throughout the world to which the results will be generalized. Defined by clinical and demographic characteristics.

Accessible population – the subset of the target population that is available for the study. Defined by geographic and temporal characteristics.

Reasons for sampling
1. Samples can be studied more quickly than populations
2. A study of a sample is less expensive than studying an entire population
3. A study of an entire population is impossible in most situations (e.g., sex workers)
4. Sample results are often more accurate than results based on a population (measurement and recording mistakes become more likely as the number of subjects grows)
5. If samples are properly selected, probability methods can be used to estimate the error in the resulting statistics (the results can also be generalized)
6. Samples can be selected to reduce heterogeneity (i.e., use of inclusion and exclusion criteria)

RESEARCH QUESTION (Truth in the Universe)

STEP # 1: Target Population
Specification: specify clinical and demographic characteristics
Criteria: well suited to the research question

STEP # 2: Accessible Population
Specification: specify temporal and geographic characteristics
Criteria: representative of target populations and easy to study

STUDY PLAN (Truth in the Study)

STEP # 3: Intended Sample
Sampling: design an approach to selecting the sample
Criteria: representative of accessible population and easy to do

Establishing Inclusion Criteria

Inclusion criteria – define the main characteristics of the target and accessible populations (the exclusion criteria are not simply the opposite of the inclusion criteria, and vice versa)

Considerations: specifying the characteristics that define populations that are relevant to the research question and efficient for study.
Example: a 5-year trial of calcium supplementation for preventing osteoporosis might specify that the subjects be:

Target population (derived from literature)
  Demographic characteristics: white females age 45 – 50
  Clinical characteristics: in good general health; no known life-threatening disease; not taking long-term corticosteroids

Accessible population
  Geographic characteristics: patients attending the medical clinic at the investigator’s hospital
  Temporal characteristics: between Jan 1 and Dec 31, 2006

Establishing Exclusion Criteria

Exclusion criteria – indicate subsets of individuals who meet the eligibility criteria, but are likely to interfere with the quality of the data or the interpretation of the findings

Considerations: specifying subsets of the population that will not be studied, for the reasons listed on the left below.
Example: a 5-year trial of calcium supplementation for preventing osteoporosis might exclude subjects who fit the descriptions on the right below.

A high likelihood of being lost to follow-up: plan to move out of state
An inability to provide good data: disoriented or having language barriers
Ethical barriers: kidney stone formers
The subject’s refusal to participate: unwilling to accept the possibility of random allocation to the placebo group

Choosing an accessible population

Clinic-based samples – inexpensive and easy to recruit, but selection factors that determine who comes to the hospital or clinic may have an important effect

Population-based samples – particularly useful for guiding public health and clinical practice in the whole community, but the chief disadvantage is the expense and difficulty involved


SAMPLING

1. Probability Sampling

- Uses a random process to guarantee that each unit of the population has a specified chance of selection
- If there is no accurate listing of the target population, use non-probability sampling
- The researcher knows the denominator and the characteristics of the population

a. Simple Random sampling

- Every subject has an equal probability of being selected for the study.
- The recommended way is to use a table of random numbers or a computer-generated list of random numbers
- Process of enumerating every unit of the accessible population, and then selecting the sample at random
- Fishbowl sampling is included here

What are needed:
- An accurate listing of the population
- A mechanism to find and enroll those who are chosen
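As a rough illustration of the computer-generated approach mentioned above, the sketch below draws a simple random sample with Python's standard `random` module. The sampling frame of subject IDs 1 to 3400, the sample size of 200, and the function name `simple_random_sample` are assumptions made only for this example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every unit is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical sampling frame: an accurate listing of subject IDs 1..3400
frame = list(range(1, 3401))
chosen = simple_random_sample(frame, 200, seed=42)
print(len(chosen), "subjects selected")
```

The listing itself (`frame`) is the part that is usually hard to obtain in practice; the random draw is the easy step.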

b. Systematic sampling

- Involves selecting by a periodic process; the starting point is chosen at random
- Example: get a sample of 200 from a population of 3400
- Procedure: number all units 1 to 3400; divide the population size by the number to be sampled (3400/200 = 17) to get the sampling interval k = 17. Select a random starting number from 1 to 17, then select every 17th subject thereafter.

NOTE: should not be used when a cyclic repetition is inherent in the sampling frame.
- e.g., not appropriate for selecting months of the year in a study of the frequency of different types of accidents, because some accidents occur most often at certain times of the year
- e.g., selecting all even or all odd numbers when males and females are seated alternately
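The procedure above can be sketched in a few lines of Python. This is only an illustration under the handout's numbers (3400 units, 200 to be sampled); the function name `systematic_sample` is hypothetical, and the sketch assumes the population size is at least k times the sample size.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start from 1..k, then take every k-th unit (1-based positions)."""
    k = len(population) // n      # sampling interval, e.g. 3400 // 200 = 17
    rng = random.Random(seed)
    start = rng.randint(1, k)     # random starting point, inclusive of both ends
    positions = range(start, start + k * n, k)
    return [population[pos - 1] for pos in positions]

frame = list(range(1, 3401))      # hypothetical listing numbered 1..3400
sample = systematic_sample(frame, 200, seed=1)
print(sample[:3], "...", len(sample), "units")
```

Because only the starting point is random, any cycle in the listing with a period related to k carries straight into the sample, which is the reason for the note above.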

c. Stratified Random sampling

- Involves dividing the population into subgroups according to characteristics and taking a random sample from each of these “strata”
- Characteristics used to stratify should be related to the measurement of interest
- In Medicine, commonly used strata include: age, gender, severity of disease
- e.g., use of proportionate numbers in groups with differing population sizes
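A minimal sketch of proportionate stratified sampling follows, assuming a hypothetical frame of 600 female and 400 male subjects and a 10% sampling fraction. The function name `stratified_sample` and the data are illustrative only.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Take the same sampling fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # group units by stratum
    sample = []
    for members in strata.values():
        n = round(len(members) * fraction)      # proportionate allocation
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical frame: (gender, id) pairs, 600 women and 400 men
units = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
sample = stratified_sample(units, lambda u: u[0], 0.10, seed=7)
n_female = sum(1 for u in sample if u[0] == "F")
n_male = sum(1 for u in sample if u[0] == "M")
print(n_female, n_male)
```

With proportionate allocation the sample keeps the 60:40 split of the frame, which a simple random sample of the same size would only match on average.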

d. Cluster sampling

- Process of taking a random sample of natural groupings of individuals in the population; very useful when the population is widely dispersed and it is impractical or costly to list and sample from all of its elements
- Clusters are commonly based on geographic areas or districts, so this approach is used more often in epidemiologic research than in clinical research
- e.g., different areas [Area 48 or 81] in Brgy Dona Imelda
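The idea of sampling whole groupings rather than individuals might be sketched as follows. The area names other than 48 and 81, the household IDs, and the function name `cluster_sample` are all made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters; every member of a chosen cluster is included."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    return {name: clusters[name] for name in chosen}

# Hypothetical barangay with five areas of 20 households each
clusters = {
    f"Area {a}": [f"{a}-{h}" for h in range(1, 21)]
    for a in (12, 48, 57, 81, 90)
}
picked = cluster_sample(clusters, 2, seed=3)
print(sorted(picked))
```

Only a list of clusters is needed up front, not a list of every individual, which is why the method suits widely dispersed populations.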

ADDITIONAL: Multi-Stage sampling

- Combination of the above-mentioned sampling methods
- Used in very big research studies (e.g., nationwide study)
- e.g., nationwide → choose province → choose city/municipality → choose barangay, etc.

2. Nonprobability Sampling

- Sampling method in which the probability that a subject is selected is unknown
- E.g., studies involving abused children/women, sex workers, etc.
- The denominator (total population number) is unknown

a. Consecutive Sampling

- Involves taking every patient who meets the selection criteria over a specified time interval or number of patients; it amounts to taking the complete accessible population over the duration of the study
- Taking an accessible population at the time of the study
- Usually used by residents in their research (due to time constraints)

b. Convenience Sampling

- Process of taking those members of the accessible population who are easily available.
- Many biases; the sample is not homogeneous

c. Judgmental Sampling

- Involves handpicking from the accessible population those individuals judged most appropriate for the study
- E.g., accreditation

d. Snowball Sampling

- Referral method
- After recruiting one subject, ask that subject for other prospective subjects

SAMPLE SIZE COMPUTATION

Sample Size

Factors that affect the number of subjects required for a study:
1. Whether the research design involves paired or unpaired data
2. Whether beta error is considered in addition to alpha error
3. Whether a large or small variance is anticipated in the data set
4. Whether the alpha level chosen is the usual (p value 0.05) or smaller
5. Whether the desired difference between means or proportions to be detected is fairly small or extremely small

Pre-test

Answer with LARGER or SMALLER:

- What sample size would be needed if the investigator wants the answer to be very close to the true value (i.e., have a very narrow confidence interval or a very small p value)? - LARGER
- What sample size would be needed if the anticipated variance is small? - SMALLER
- What sample size would be needed if the difference the investigator wants to detect is extremely small? - LARGER


Review of Basic Concepts and Terms

Effect size – the difference you want to detect between one group and the other group (related to the third pre-test question)

Alpha level/Significance level – the probability that a positive finding is due to chance alone (in medicine, usually set at 0.05, i.e., 95% confidence; Zα = 1.96, a constant)

Power – the probability that the effect will be detected if it exists; equal to 1 − beta error (in medicine, usually set at 80%, i.e., 20% beta error; Zβ = 0.84, a constant)

Alpha error – type I error; the error of finding something when in fact there is nothing (rejecting the null when it is true); the z-value for alpha error is 1.96; p = 0.05 means that you are allowing yourself a 5% chance of committing a type I error

Beta error – type II error; the error of finding nothing when in fact there is something (failing to reject the null when it is false); the z-value for beta error is 0.84
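The two constants quoted above come from the standard normal distribution. As a quick check, the sketch below recovers them with `statistics.NormalDist` from the Python standard library; the variable names are chosen only for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed alpha of 0.05 leaves 0.025 in each tail
z_alpha = standard_normal.inv_cdf(1 - 0.05 / 2)

# 80% power (20% beta error), one-tailed
z_beta = standard_normal.inv_cdf(0.80)

print(round(z_alpha, 2), round(z_beta, 2))  # prints: 1.96 0.84
```

Choosing a stricter alpha or a higher power raises these z-values, and with them the required sample size.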

Recall… (Nice to Know)

$$ t = \frac{d}{s_d / \sqrt{N}} $$

Where: d is the mean difference that was observed, s_d is the standard deviation of that difference (so that s_d/√N is its standard error), and N is the sample size

To solve for N, rearrangements have to be done. With z_α in place of t and s in place of s_d, the formula becomes (Need to Know)

$$ N = \frac{(z_\alpha)^2 \cdot s^2}{d^2} $$
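The rearrangement can be written out step by step. This is a sketch of the algebra only, assuming the t value is replaced by the chosen z-value for alpha and the standard deviation is written s:

```latex
\begin{aligned}
z_\alpha &= \frac{d}{s / \sqrt{N}} = \frac{d \sqrt{N}}{s} \\
\sqrt{N} &= \frac{z_\alpha \cdot s}{d} \\
N &= \frac{(z_\alpha)^2 \cdot s^2}{d^2}
\end{aligned}
```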

Derivation of the Basic Sample Size Formula

Formula for the Calculation of Sample Size for studies commonly pursued in Medical Research

Studies using the paired t test (e.g., before and after studies) and considering alpha (Type I) error only

$$ N = \frac{(z_\alpha)^2 \cdot s^2}{d^2} $$

Use the paired t test if:
- Utilizing a before and after study (involves 1 group only)
- Matching was employed (according to gender, age, etc.)
- Subjects are twins

If none of these applies and the data are continuous, use the independent t test

Study Characteristics: Assumptions made by Investigator

Type of study: before and after study of an anti-HPN drug
Data sets: pre-treatment and post-treatment observations in the same group of subjects
Variable: systolic blood pressure
Standard deviation (s): 15 mmHg
Variance (s²): 225 mmHg²
Data for alpha (zα): p = 0.05; therefore, 95% confidence desired (two-tailed test); Zα = 1.96
Difference to be detected (d): 10 mmHg or larger difference between pre- and post-treatment blood pressure values

$$
\begin{aligned}
N &= \frac{(z_\alpha)^2 \cdot s^2}{d^2} \\
  &= \frac{(1.96)^2 \cdot (15)^2}{(10)^2} \\
  &= \frac{(3.84)(225)}{100} \\
  &= \frac{864}{100} = 8.64 \approx 9 \text{ subjects total}
\end{aligned}
$$
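The same arithmetic can be checked with a short function. This is a sketch under the handout's assumptions (alpha error only, paired data); the function name `n_paired` is hypothetical, and the result is rounded up because a fraction of a subject cannot be enrolled.

```python
import math

def n_paired(z_alpha, s, d):
    """Subjects needed for a paired t test, considering alpha error only."""
    return math.ceil((z_alpha ** 2) * (s ** 2) / (d ** 2))

n_total = n_paired(z_alpha=1.96, s=15, d=10)
print(n_total)  # prints: 9
```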

Studies using the Student’s t test (e.g., one experimental group and one control group) and considering alpha (Type I) error only

$$ N = \frac{(z_\alpha)^2 \cdot 2 \cdot s^2}{d^2} \quad \text{(per group)} $$

Study Characteristics: Assumptions made by Investigator

Type of study: RCT of an anti-HPN drug
Data sets: observations in one experimental group and one control group
Variable: systolic blood pressure
Standard deviation (s): 15 mmHg
Variance (s²): 225 mmHg²
Data for alpha (zα): p = 0.05; therefore, 95% confidence desired (two-tailed test); Zα = 1.96
Difference to be detected (d): 10 mmHg or larger difference between mean blood pressure values of the experimental group and control group

$$
\begin{aligned}
N &= \frac{(z_\alpha)^2 \cdot 2 \cdot s^2}{d^2} \\
  &= \frac{(1.96)^2 \cdot 2 \cdot (15)^2}{(10)^2} \\
  &= \frac{(3.84)(2)(225)}{100} \\
  &= \frac{1728}{100} = 17.28 \approx 18 \text{ subjects per group}
\end{aligned}
$$

18 subjects per group × 2 groups = 36 subjects
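A small sketch of the two-group calculation, again considering alpha error only. The function name `n_per_group_alpha_only` is an assumption for this example; the factor of 2 reflects the variance of a difference between two independent group means.

```python
import math

def n_per_group_alpha_only(z_alpha, s, d):
    """Subjects per group for an unpaired (Student's) t test, alpha error only."""
    return math.ceil((z_alpha ** 2) * 2 * (s ** 2) / (d ** 2))

per_group = n_per_group_alpha_only(z_alpha=1.96, s=15, d=10)
total = per_group * 2  # one experimental group plus one control group
print(per_group, total)  # prints: 18 36
```

Compared with the paired design, the unpaired design needs twice as many subjects in each group and two groups, so four times as many subjects overall before rounding.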

Studies using the Student’s t test and considering alpha (Type I) and beta (Type II) errors

$$ N = \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot s^2}{d^2} \quad \text{(per group)} $$

Study Characteristics: Assumptions made by Investigator

Type of study: RCT of an anti-HPN drug
Data sets: observations in one experimental group and one control group
Variable: systolic blood pressure
Standard deviation (s): 15 mmHg
Variance (s²): 225 mmHg²
Data for alpha (zα): p = 0.05; therefore, 95% confidence desired (two-tailed test); Zα = 1.96
Data for beta (zβ): 20% beta error; therefore, 80% power desired (one-tailed test); Zβ = 0.84
Difference to be detected (d): 10 mmHg or larger difference between mean blood pressure values of the experimental group and control group

$$
\begin{aligned}
N &= \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot s^2}{d^2} \\
  &= \frac{(1.96 + 0.84)^2 \cdot 2 \cdot (15)^2}{(10)^2} \\
  &= \frac{(7.84)(2)(225)}{100} \\
  &= \frac{3528}{100} = 35.28 \approx 36 \text{ subjects per group}
\end{aligned}
$$

36 subjects per group × 2 groups = 72 subjects
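To see how the difference to be detected drives the sample size, the sketch below evaluates the formula for three values of d. Only d = 10 mmHg comes from the handout; d = 5 and d = 20 are added here purely for comparison, and the function name `n_per_group` is hypothetical.

```python
import math

def n_per_group(z_alpha, z_beta, s, d):
    """Subjects per group for an unpaired t test with alpha and beta error."""
    return math.ceil((z_alpha + z_beta) ** 2 * 2 * s ** 2 / d ** 2)

# s = 15 mmHg throughout; d = 10 is the handout's case, 5 and 20 are illustrative
sizes = {d: n_per_group(1.96, 0.84, 15, d) for d in (5, 10, 20)}
print(sizes)  # prints: {5: 142, 10: 36, 20: 9}
```

Halving the difference to be detected roughly quadruples the number of subjects per group, because d enters the formula squared.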

Studies using a test of differences in proportions and considering alpha (Type I) and beta (Type II) errors

$$ N = \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot p(1 - p)}{d^2} \quad \text{(per group)} $$

Study Characteristics: Assumptions made by Investigator

Type of study: RCT of a drug to reduce the 5-yr mortality in patients with a particular form of cancer
Data sets: observations in one experimental group and one control group
Variable: Success = 5-yr survival after Tx; Failure = death within 5 yrs of Tx
Variance, p(1 − p): p = 0.55; therefore, (1 − p) = 0.45
Data for alpha (zα): p = 0.05; therefore, 95% confidence desired (two-tailed test); Zα = 1.96
Data for beta (zβ): 20% beta error; therefore, 80% power desired (one-tailed test); Zβ = 0.84
Difference to be detected (d): 0.1 or larger difference between the success (survival) of the E grp and that of the C grp

$$
\begin{aligned}
N &= \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot p(1 - p)}{d^2} \\
  &= \frac{(1.96 + 0.84)^2 \cdot 2 \cdot (0.55)(0.45)}{(0.1)^2} \\
  &= \frac{(7.84)(2)(0.2475)}{0.01} \\
  &= \frac{3.88}{0.01} = 388 \text{ subjects per group}
\end{aligned}
$$

388 subjects per grp × 2 grps = 776 subjects (the unrounded value is 388.08, which would give 389 per group if rounded up as in the earlier examples)
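A sketch of the proportions formula in code, using the handout's inputs. The function name `n_per_group_proportions` is an assumption. Note that the code keeps full precision and rounds up, so it returns 389 per group, whereas the hand calculation rounds the numerator to 3.88 first and arrives at 388.

```python
import math

def n_per_group_proportions(z_alpha, z_beta, p, d):
    """Subjects per group for a test of a difference in proportions."""
    return math.ceil((z_alpha + z_beta) ** 2 * 2 * p * (1 - p) / d ** 2)

per_group = n_per_group_proportions(z_alpha=1.96, z_beta=0.84, p=0.55, d=0.1)
print(per_group, per_group * 2)  # prints: 389 778
```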

Remember:
N = sample size
Zα = z-value for alpha error = 1.96 (constant)
Zβ = z-value for beta error = 0.84 (constant)
s² = variance (from literature)
p = mean proportion of success (from literature)
d = difference to be detected (researcher assigned)
