Slide 1
Sampling Racial and Ethnic Minorities
William D. Kalsbeek
Director, Survey Research Unit
Professor, Department of Biostatistics
University of North Carolina
June 14, 2000
Copyright 2000, William Kalsbeek
Slide 2
Acknowledgements
- Gayle Shimokura
  - For significant contributions to this presentation through her meticulous background research.
- CDC/National Center for Health Statistics (Contract No. UR6/CCU417428-01)
  - For funding support for this presentation
UNC-CH's Center for Health Statistics Research
http://www.sph.unc.edu/chsr
Slide 3
Race/Ethnic Minorities (% of Population: March 2000 Current Population Survey, CPS)
- Hispanics (11.7%)
  - Settled (95%)
  - Mobile (5%)
- African-American (12.8%)
  - Settled (99.9%)
  - Mobile (0.1%)
- Asian-American (4.0%)
- Native-American (0.9%)
Slide 4
Overview
- Some basics on probability sampling
- Problems in sampling rare population subgroups*
- A review of some existing remedies*

* Note that a reference list is available.
Slide 5
Context: Sampling Race/Ethnic Minorities
- As the population subgroup of interest in a specially targeted study (targeted sampling)
- As a key subgroup in a general population study (oversampling)
[Figure: the ethnic minority shown under "Targeted" and "With Oversampling"]
Slide 6
Probability vs. Nonprobability Sampling?
- Probability sampling:
  - Random sampling methods used
  - Each member of the target population has a known, nonzero selection probability
- Nonprobability sampling in exceptional circumstances:
  - Judgment used
  - Requires models to analyze
- Probability sampling is generally preferred
Slide 7
Sampling Frames and Linkage
- Sampling frame = list(s) used to select a probability sample
  - EXAMPLE: list of patients to sample health care users
- Usefulness of a frame is tied to:
  - The linkage that exists between entries on the list and the population being sampled
Slide 8
Sample Weights
- A number for each member of the sample
  - Reflecting the inverse of the selection probability for the sample member
- May be adjusted for sample imbalance due to:
  - Nonresponse
  - Incomplete frame coverage
  - Other selection problems
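A minimal Python sketch of the weighting idea above, assuming the selection probability is known for every sampled member. The function names are hypothetical, and the nonresponse step uses a simple weighting-class adjustment, which is one common choice rather than a method prescribed in this presentation.

```python
def base_weights(probs):
    """Base weight for each sampled member: the inverse of its selection probability."""
    if any(p <= 0 or p > 1 for p in probs):
        raise ValueError("selection probabilities must be in (0, 1]")
    return [1.0 / p for p in probs]


def nonresponse_adjust(weights, responded, classes):
    """Weighting-class adjustment (assumed method): within each class, respondents
    absorb the weight of nonrespondents; nonrespondents get weight 0."""
    class_total, resp_total = {}, {}
    for w, r, c in zip(weights, responded, classes):
        class_total[c] = class_total.get(c, 0.0) + w
        if r:
            resp_total[c] = resp_total.get(c, 0.0) + w
    for c in class_total:
        if c not in resp_total:
            raise ValueError(f"no respondents in weighting class {c!r}")
    return [w * class_total[c] / resp_total[c] if r else 0.0
            for w, r, c in zip(weights, responded, classes)]


def weighted_total(values, weights):
    """Weighted (Horvitz-Thompson-style) estimate of a population total."""
    return sum(v * w for v, w in zip(values, weights))


if __name__ == "__main__":
    w = base_weights([0.01, 0.01, 0.05, 0.05])
    adj = nonresponse_adjust(w, [True, False, True, True], ["A", "A", "B", "B"])
    print(adj)  # [200.0, 0.0, 20.0, 20.0]
```

The adjustment preserves the total weight within each class, so the adjusted weights still sum to the base-weight total.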
Slide 9
What Are the Statistical Goals of Probability Sampling?
- Validity
  - The ability to produce estimates without bias tied to sampling
  - Achieved if all population members have a known, nonzero chance to be chosen in the sample
- Efficiency
  - Tied to precision of estimates
  - Achieved if the right sampling tools are used
  - Greater efficiency generally costs more (hence cost-efficiency)
Slide 10
What Selection Tools Might Be Used to Sample Race/Ethnic Minorities?
- Stratified sampling
  - Separate sampling within each of a number of population groupings (strata)
- Screening for the targeted minority group
  - Identify subgroup members in an initial sample of the full population
Slide 11
Stratified Sampling:
- Population divided into H subgroups called strata
- Separate probability sample in each stratum
- Combine estimates from each stratum to produce the estimate for the whole population
- Contrast with: stratified analysis
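Assuming simple random sampling within each stratum, the combination step can be sketched in Python with the standard textbook stratified estimator (not a formula taken from the slide): each stratum mean is weighted by its population share $W_h = N_h/N$. The function name is hypothetical.

```python
from statistics import mean, variance


def stratified_mean(strata):
    """Combine per-stratum SRS results into a population-wide estimate.

    strata: list of (N_h, sample_h) pairs, where N_h is the stratum population
    size and sample_h holds the sampled values (at least 2 per stratum).
    Returns (estimate, estimated variance) of the population mean.
    """
    N = sum(N_h for N_h, _ in strata)
    estimate = 0.0
    var = 0.0
    for N_h, sample_h in strata:
        n_h = len(sample_h)
        W_h = N_h / N                      # stratum share of the population
        estimate += W_h * mean(sample_h)   # weight each stratum mean by its share
        # SRS variance of the stratum mean, with finite population correction
        var += W_h ** 2 * (1 - n_h / N_h) * variance(sample_h) / n_h
    return estimate, var


if __name__ == "__main__":
    est, var = stratified_mean([(600, [2, 4, 6]), (400, [10, 10, 12, 12])])
    print(round(est, 4), round(var, 4))  # 6.8 0.5304
```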
Slide 12
Stratified Sampling Used When:
- Wish to improve the efficiency of population-wide estimates, AND/OR
- Wish to control the sample sizes for estimates of important population subgroups
  - Subgroups must be isolatable, to some degree, by the strata
Slide 13
Stratum Allocation Options:
- $C_h$ = average cost of adding another respondent to the sample in the $h$-th stratum
- $f_h$ = sampling rate in the $h$-th stratum
- $S_h$ = standard deviation of all members of the $h$-th stratum (measures intra-stratum variation)
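For reference, a standard textbook result (Cochran's cost-optimal allocation) links these three quantities. It assumes a linear cost function $C = c_0 + \sum_h C_h n_h$, simple random sampling within strata, and $N_h$ = stratum population size; it is included as an assumption, not a formula recovered from the slide.

```latex
n_h = (C - c_0)\,\frac{N_h S_h/\sqrt{C_h}}{\sum_{k=1}^{H} N_k S_k \sqrt{C_k}},
\qquad
f_h = \frac{n_h}{N_h} \propto \frac{S_h}{\sqrt{C_h}}
```

When costs are equal across strata this reduces to Neyman allocation ($f_h \propto S_h$), and when the $S_h$ are also equal it reduces to proportionate allocation ($f_h = f$).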
Slide 14
Stratum Allocation Options

| Option | Description | Analysis priority to estimates for: |
|---|---|---|
| Proportionate | Same sampling rates ($f_h = f$ in all strata) | Overall population |
| Optimum | Most cost-efficient sampling rates | Overall population |
| Balanced | Equal sample sizes ($n_h = n/H$) | Population subgroups* |
| Disproportionate | To "oversample" key subgroups ($f_h$ higher in subgroup strata) | Important population subgroups* |

* Definable by the strata
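A Python sketch of the four options in the table, returning unrounded stratum sample sizes for a fixed total n. The function `allocate` and its `k` parameter (relative oversampling factors for the disproportionate option) are hypothetical; the optimum option assumes the standard cost-optimal proportions $n_h \propto N_h S_h/\sqrt{C_h}$, rescaled to the total n.

```python
from math import sqrt


def allocate(n, N, option, S=None, C=None, k=None):
    """Split a total sample size n across strata with population sizes N.

    option: "proportionate", "optimum", "balanced", or "disproportionate".
    S, C: stratum standard deviations and per-respondent costs (optimum only).
    k: hypothetical relative oversampling factors (disproportionate only);
       stratum h is sampled at a rate proportional to k[h].
    Returns unrounded stratum sample sizes that sum to n.
    """
    if option == "proportionate":
        shares = list(N)                                    # f_h = f everywhere
    elif option == "optimum":
        shares = [N_h * S_h / sqrt(C_h) for N_h, S_h, C_h in zip(N, S, C)]
    elif option == "balanced":
        shares = [1.0] * len(N)                             # n_h = n / H
    elif option == "disproportionate":
        shares = [k_h * N_h for k_h, N_h in zip(k, N)]      # f_h proportional to k_h
    else:
        raise ValueError(f"unknown option: {option!r}")
    total = sum(shares)
    return [n * s / total for s in shares]


if __name__ == "__main__":
    N = [9000, 1000]
    for opt, extra in [("proportionate", {}), ("balanced", {}),
                       ("optimum", {"S": [1, 1], "C": [1, 4]}),
                       ("disproportionate", {"k": [1, 4]})]:
        print(opt, [round(x, 1) for x in allocate(1000, N, opt, **extra)])
```

Rounding to whole interviews is left to the user, since any rounding rule is a design choice.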
Slide 15
Screening for a Targeted Population Subgroup
- Sampling in two phases
  - Goal is to locate members of the population subgroup
  - Usually done by telephone or face-to-face in general population surveys
- Process:
  - Select an initial sample
  - Administer a relatively short interview to determine membership in the targeted subgroup
  - Retain all target subgroup members and (perhaps) a random portion of the rest
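A minimal Python sketch of the two-phase process, assuming the phase-1 sample is a simple random sample from a list of N units and that the screening interview reveals subgroup membership without error. The function `screen_sample` and its `keep_rate` parameter (the random portion of non-members retained) are hypothetical. Retained non-members carry a larger weight so that they also represent the non-members screened out.

```python
import random


def screen_sample(population, n1, keep_rate, seed=0):
    """Two-phase screening sample.

    population: list of (unit_id, is_member) pairs.
    n1: size of the phase-1 simple random sample.
    keep_rate: probability of retaining each screened non-member.
    Returns a list of (unit_id, is_member, weight) for retained units.
    """
    rng = random.Random(seed)
    N = len(population)
    phase1 = rng.sample(population, n1)        # phase 1: initial SRS
    w1 = N / n1                                # phase-1 weight
    retained = []
    for unit_id, is_member in phase1:          # phase 2: short screening interview
        if is_member:
            retained.append((unit_id, True, w1))                # keep every member
        elif rng.random() < keep_rate:
            retained.append((unit_id, False, w1 / keep_rate))   # keep a random portion
    return retained


if __name__ == "__main__":
    pop = [(i, i % 10 == 0) for i in range(1000)]   # 10% subgroup members
    s = screen_sample(pop, 200, keep_rate=0.25, seed=1)
    members = sum(1 for _, m, _ in s if m)
    print(len(s), members)
```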
Slide 16
What May Lead to Problems in Sampling Race/Ethnic Minorities?
- Incomplete frame(s)
  - A sizable portion of the population is not linked to entries on the list(s) used for sampling
- Rarity
  - They usually comprise a relatively small percentage of the target population
Slide 17
What May Lead to Problems in Sampling Race/Ethnic Minorities?
- Mobility
  - Some of them move around a lot, creating a dynamic rather than static linkage between the frame and the sampled population
- Dispersion
  - They are somewhat scattered geographically
  - May have some pockets with relatively high concentrations
Slide 18
Slide 19
Some Remedies
- Targeted sampling
  - Multiple frame methods
  - Linkage exploitation methods
    - Network/multiplicity sampling
    - Snowball sampling
    - Adaptive cluster sampling
  - Time and space sampling
- Oversampling
  - Disproportionate stratified sampling with screening
Slide 20
Multiple Frame Methods: Selection Approaches
- Premise:
  - Frame options taken alone may be inadequate or too costly to use, BUT
  - Choosing the sample jointly from multiple frames may:
    - Produce better coverage of the targeted population, and
    - Be more cost-effective
- Dual-frame designs: two frames
Slide 21
Multiple Frames
[Figure: Frame A, Frame B, and Frame C]
Slide 22
Multiple Frame Methods: EXAMPLE
- Sampling Native Americans
- Two frames:
  - List of tribal rolls
    - Less complete
    - Less expensive to locate Native Americans
  - Area household frame from a list of residential dwellings in a sample of block groups (neighborhoods)
    - More complete
    - More expensive because of the need to screen
- Most cost-effective mix = ?
Slide 23
Multiple Frame Methods: Estimation Approaches
- Work by Hartley (1962), Choudry (1989), and Skinner and Rao (1996)
- Special requirements:
  - Identify/eliminate overlap prior to sampling, OR
  - Require knowledge of membership in intersection groups for analysis adjustments
Slide 24
Multiple Frame Methods: Estimation Approaches
- Eliminate frame duplication; treat as a stratified sample, OR
- Select with duplication present and either:
  - Combine estimates for intersection groups, OR
  - Determine frame membership for sample respondents and weight accordingly
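A Python sketch of the "combine estimates for intersection groups" approach, in the spirit of Hartley's dual-frame estimator: units in the overlap domain are estimated from both frames, and the two estimates are mixed with a factor `theta`. Here `theta` is fixed by the user (an assumption; Hartley chose it to minimize variance), and each sampled unit is assumed to report whether it is also listed on the other frame. The function name is hypothetical.

```python
def dual_frame_total(sample_a, sample_b, theta=0.5):
    """Hartley-style dual-frame estimate of a population total.

    sample_a, sample_b: lists of (y, weight, in_other_frame) for units drawn
    from frame A and frame B; in_other_frame flags overlap-domain membership.
    theta: share given to frame A's estimate of the overlap domain (0..1).
    """
    a_only = sum(y * w for y, w, both in sample_a if not both)
    ab_from_a = sum(y * w for y, w, both in sample_a if both)
    b_only = sum(y * w for y, w, both in sample_b if not both)
    ab_from_b = sum(y * w for y, w, both in sample_b if both)
    # Non-overlap domains come from one frame each; the overlap is a mix of both.
    return a_only + theta * ab_from_a + (1 - theta) * ab_from_b + b_only


if __name__ == "__main__":
    tribal_rolls = [(1, 10, False), (1, 10, True)]   # hypothetical frame A sample
    area_frame = [(1, 20, True), (1, 20, False)]     # hypothetical frame B sample
    print(dual_frame_total(tribal_rolls, area_frame, theta=0.5))  # 45.0
```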
Slide 25
Multiple Frame Methods: Implications for Sampling Race/Ethnic Minorities
- Advantages:
  - Improved sample coverage over using a single list
  - Potential cost savings if cost of frame use differs among frames
- Disadvantages:
  - Higher design/selection/analysis complexity relative to single-frame use
  - Challenge in finding the most cost-effective mix of sample sizes for frames
Slide 26
Linkage Exploitation Methods: Selection Approaches
- Premise: Population members with a rare attribute can often identify others with the same attribute
- Various adaptations:
  - Based on the notion of multiplicity in frames
  - Differ according to how multiplicity is utilized
Slide 27
Multiplicity
[Figure: links between frame listings and population members]
Slide 28
Linkage Exploitation Methods: Various Adaptations
- Network/multiplicity sampling
  - Network: social/spatial/organizational linkage among members of the targeted subgroup
    - EXAMPLES: relatives, friends, co-workers, cohabitants, organization co-members, etc.
  - Linkages may be:
    - Asymmetric
    - Complex (EXAMPLE: friends)
Slide 29
Linkage Exploitation Methods: Various Adaptations
- Network/multiplicity sampling
  - Sampling process:
    - Choose an initial sample of the targeted subgroup
    - Sample members are interviewed and asked to nominate other members of their network who belong to the targeted subgroup
    - Interview those nominated and have them nominate others in like manner
  - Selection probability directly tied to size of network
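To make the last point concrete, a small Python calculation under assumed conditions: suppose the initial sample is a simple random sample of n from N potential reporters, and a given subgroup member is reached if any of the m people linked to them (counting themselves) is drawn. The symbols N, n, and m and the function name are our own notation. The chance of being reached grows with network size m.

```python
from math import comb


def reach_probability(N, n, m):
    """Chance that a person is reached when an SRS of n is drawn from N
    potential reporters and m of those reporters are linked to that person."""
    # Probability that none of the m linked reporters is drawn: C(N-m, n) / C(N, n)
    return 1 - comb(N - m, n) / comb(N, n)


if __name__ == "__main__":
    for m in range(1, 6):
        print(m, round(reach_probability(1000, 50, m), 4))
```

Because this probability differs from person to person, network samples need each respondent's multiplicity to build valid weights.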
Slide 30
Linkage Exploitation Methods: Various Adaptations
- Snowball sampling
  - Network sampling but with multiple phases of nomination
  - Snowballing may be best used to construct frames to sample rare populations
  - Continue waves of nomination until list expansion ceases
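Frame construction by snowballing can be sketched in Python as a breadth-first search over nominations that stops when a wave adds no new names. The `nominations` dictionary is a hypothetical stand-in for interview responses, and the function name is our own. Because links may be asymmetric, someone who nominates others but is never nominated will not be found from other seeds.

```python
def snowball_frame(seeds, nominations):
    """Build a frame by waves of nomination until list expansion ceases.

    seeds: initial list of subgroup members.
    nominations: dict mapping each person to the people they nominate.
    Returns (frame, number_of_waves_that_added_names).
    """
    frame = set(seeds)
    current = set(seeds)
    waves = 0
    while current:
        new = set()
        for person in current:
            for nominee in nominations.get(person, []):
                if nominee not in frame:
                    new.add(nominee)
        if not new:          # no new names: list expansion has ceased
            break
        frame |= new
        current = new        # only the newest names nominate in the next wave
        waves += 1
    return frame, waves


if __name__ == "__main__":
    noms = {"A": ["B"], "B": ["C", "A"], "C": [], "D": ["A"]}
    print(snowball_frame(["A"], noms))  # D is never reached: D -> A is one-way
```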
Slide 31
Linkage Exploitation Methods: Various Adaptations
- Adaptive cluster sampling
  - Exploits the tendency for members of some targeted subgroups to cluster together
  - Original motivation from ecology and geology
  - Sampling process:
    - Select a random sample of the population
    - Where one identifies members of the targeted subgroup, sample others in the neighborhood
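A Python sketch of adaptive cluster sampling on a grid, under stated assumptions: the area is divided into cells whose subgroup counts become known once visited, the condition for expansion is "at least one subgroup member in the cell", and the neighborhood is the four adjacent cells. These choices and the function name are assumptions; the estimation step (e.g., Thompson's estimators) is not shown.

```python
import random


def adaptive_cluster_sample(grid, n_initial, seed=0):
    """grid: dict mapping (row, col) -> count of subgroup members in that cell.
    Returns the set of cells in the final adaptive sample."""
    rng = random.Random(seed)
    initial = rng.sample(sorted(grid), n_initial)   # initial SRS of cells
    sampled = set(initial)
    to_check = list(initial)
    while to_check:
        r, c = to_check.pop()
        if grid[(r, c)] > 0:                          # condition met: add neighbors
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in grid and nb not in sampled:
                    sampled.add(nb)
                    to_check.append(nb)               # neighbors may expand further
    return sampled


if __name__ == "__main__":
    grid = {(r, c): 0 for r in range(5) for c in range(5)}
    grid[(2, 2)], grid[(2, 3)] = 3, 1                 # a small "hotspot"
    print(sorted(adaptive_cluster_sample(grid, 5, seed=3)))
```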
Slide 32
Linkage Exploitation Methods: EXAMPLE
- Snowballing: sampling frame of prenatal care providers
- Study of recent female immigrants from Central and South America
- Process:
  - Contact OB-GYNs in private practices and public clinics
  - Those providing prenatal care to immigrants nominate others doing the same
  - Continue iteratively until no new providers are discovered
Slide 33
Linkage Exploitation Methods: Estimation
- Major contributors: Sirken (network), Goodman (snowball), and Thompson (adaptive)
- Approaches:
  - Weighted multiplicity estimation (Sirken)
  - Rao-Blackwellization to improve estimator efficiency (Thompson)
- Special requirements:
  - Network membership information
  - Multiplicity counts
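A Python sketch of a Sirken-style multiplicity estimator, assuming a simple random sample of n households from N, where each sampled household reports every subgroup member in its network along with that person's multiplicity s_i (the number of frame households that could report them). Dividing each report by s_i offsets the higher chance of reaching well-connected people. The function name and input layout are hypothetical.

```python
def multiplicity_estimate(N, reports):
    """Estimate the number of subgroup members from network reports.

    N: number of households (reporting units) in the frame.
    reports: one list per sampled household (SRS of size len(reports)),
    holding the multiplicity s_i of each subgroup member that household reports.
    """
    n = len(reports)
    # Each report counts 1/s_i; the SRS expansion factor N/n scales up to the frame.
    return (N / n) * sum(1.0 / s for household in reports for s in household)


if __name__ == "__main__":
    # Hypothetical frame of 4 households: person P1 is linked to households 0 and 1
    # (s = 2), person P2 only to household 2 (s = 1); the true count is 2.
    links = {0: [2], 1: [2], 2: [1], 3: []}
    print(multiplicity_estimate(4, [links[0], links[2]]))  # 3.0 for this sample
```

Averaged over all possible samples, the estimate equals the true count, which is what makes the multiplicity weighting valid.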
Slide 34
Linkage Exploitation Methods: Implications for Sampling Race/Ethnic Minorities
- Advantages:
  - Greater operational efficiency in locating members of the target population
  - Find a hotspot; then sample nearby
- Disadvantages:
  - Difficult to determine selection probabilities for weights
  - Asymmetric linkages (A nominates B, but not vice versa)
  - Valid probability samples?
Slide 35
Time and Space Sampling: Selection Approach
- Premise: Portions of ethnic subpopulations are relatively mobile (e.g., migrant farm workers, the homeless)
- Sampling a chunk of time:
  - Linkage between members of the target subgroup and the frame is dynamic over time
  - Those moving more frequently have a greater chance of selection
- Sample both space and time to address this potential for bias
Slide 36
Time and Space Sampling: EXAMPLE
- Sampling migrant seasonal farm workers
- Process:
  - Spatial dimension: sample migrant housing locations
    - On farms
    - In other residential housing areas
  - Time dimension: sample time periods during the data collection period
    - Three consecutive days
Slide 37
Time and Space Sampling: Estimation
- Contributors: Kalsbeek (1988); Kalton (1991)
- Approaches: Multiplicity estimators similar to those used in network samples
- Special requirements:
  - Need multiplicity count for each sample member?
  - Sampling scheme compromise needed between:
    - Statistical precision of estimates
    - Operational effectiveness
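A Python sketch of a time-and-space multiplicity estimator in the spirit of network multiplicity estimators, assuming each sampled location-period u has a known selection probability pi_u and each person found there reports m_i, the number of location-periods in the frame (over the data collection period) where they could have been found. Persons who move more have larger m_i and so contribute less per encounter. Names and inputs are hypothetical.

```python
def time_space_estimate(sampled_units):
    """Estimate the number of distinct persons in the target population.

    sampled_units: list of (pi_u, [m_i for each person found in unit u]),
    where pi_u is the selection probability of location-period u and m_i is
    the number of frame location-periods at which person i could be found.
    """
    # Each encounter counts 1/m_i; each unit is expanded by 1/pi_u.
    return sum((1.0 / pi) * sum(1.0 / m for m in ms) for pi, ms in sampled_units)


if __name__ == "__main__":
    # Hypothetical frame of 3 location-days: worker W1 could be found at all
    # three (m = 3), worker W2 only at the first (m = 1).
    census = [(1.0, [3, 1]), (1.0, [3]), (1.0, [3])]
    print(time_space_estimate(census))  # counts W1 once overall, not three times
```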
Slide 38
Time and Space Sampling: Implications for Sampling Race/Ethnic Minorities
- Advantages:
  - Deals with the fluidity of frame-population linkage in mobile populations
  - Provides a framework for finding a cost-efficient solution
- Disadvantages:
  - Added complexity to selection, data gathering, and analysis of the sample
Slide 39
Disproportionate Stratified Sampling with Screening: Selection Approach
- Premise: Concentrations of the targeted subgroup vary in the population
- Sample strata with higher concentrations more heavily
- Result: larger sample size for the target subgroup relative to a proportionate sample
Slide 40
Slide 41
DSS with Screening: EXAMPLE
- Oversampling African-Americans
- A simple process:
  - Stratify the population by relatively high and low concentrations of African-Americans
    - High-concentration areas in the South and large cities
  - Sample with relatively higher rates in the high-concentration stratum
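A back-of-the-envelope Python sketch of why this works, using made-up stratum sizes rather than real data: with the same number of screening interviews, shifting effort to the high-concentration stratum raises the expected minority yield. The function `expected_yield` is hypothetical.

```python
def expected_yield(N, p, f):
    """Expected screening workload and subgroup yield.

    N: stratum population sizes; p: subgroup proportion in each stratum;
    f: sampling (screening) rate in each stratum.
    Returns (expected number screened, expected subgroup members found).
    """
    screened = sum(f_h * N_h for f_h, N_h in zip(f, N))
    members = sum(f_h * N_h * p_h for f_h, N_h, p_h in zip(f, N, p))
    return screened, members


if __name__ == "__main__":
    N = [20000, 80000]          # high- and low-concentration strata (hypothetical)
    p = [0.4, 0.025]            # 10% minority overall
    print(expected_yield(N, p, [0.01, 0.01]))    # proportionate
    print(expected_yield(N, p, [0.03, 0.005]))   # disproportionate, same workload
```

With these numbers both designs screen about 1,000 people, but the expected minority yield rises from about 100 to about 250.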
Slide 42
DSS with Screening: Estimation
- Approaches:
  - Weighted estimate to account for sample disproportionality
  - Effect of variable weights is to lower precision of some population estimates
- Special requirements:
  - Establishing the most cost-efficient overall and stratum-specific sampling rates
Slide 43
DSS with Screening: Implications for Sampling Race/Ethnic Minorities
- Advantages:
  - Increased sample size for the targeted subgroups
  - Are there target subgroup non-members in the (oversampled) high-concentration strata?
- Disadvantages:
  - Loss in precision on overall population estimates
Slide 44
A Two-Stratum Model for Effects of Oversampling
- Setting:
  - Oversampling a minority group that is 10% of the population
  - Two sampling strata:
    - One with a higher % minority (to oversample)
    - One with a lower % minority (to undersample)
  - Two alternative sets of strata:
    - Nearly pure: strata are virtually all members or all non-members
    - Less pure: strata are mostly members or mostly non-members
Slide 45
Nearly Pure Strata
[Figure: target population split into an oversampled stratum and an undersampled stratum]
Slide 46
Less Pure Strata
[Figure: target population split into an undersampled stratum and an oversampled stratum]
Slide 47
A Two-Stratum Model for Effects of Oversampling
- Assumptions:
  - Simple random sampling in each stratum
  - Stratum unit variances are equal
  - Other minor simplifying conditions
Slide 48
A Two-Stratum Model for Effects of Oversampling
- Sample sizes (relative to proportionate):
  - Minority_Nom = nominal sample size for minority
    - Observed increase in size of minority sample due to oversampling of the predominantly minority stratum
  - Minority_Eff = effective sample size for minority
    - Adjusted size of minority sample, considering the (downward) effect of variable sample weights on statistical quality of estimates
  - Overall_Eff = effective size of overall sample
    - Adjusted size of overall sample, considering the (downward) effect of variable sample weights on statistical quality of estimates
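A Python sketch of how the three quantities might be computed, assuming the slide's conditions (simple random sampling within strata, equal unit variances) and using Kish's approximation for effective sample size, (sum of weights)^2 / (sum of squared weights). This is one plausible model, not necessarily the exact one behind the charts; the function name and inputs are hypothetical.

```python
def oversampling_effects(W, p, n_h):
    """Relative sample sizes under a stratified design (assumed model).

    W:   population share of each stratum (shares sum to 1)
    p:   minority proportion within each stratum
    n_h: sample size allocated to each stratum
    Returns (Minority_Nom, Minority_Eff, Overall_Eff), each relative to a
    proportionate sample with the same total size.
    """
    n = sum(n_h)
    P = sum(W_k * p_k for W_k, p_k in zip(W, p))        # overall minority share
    w = [W_k / n_k for W_k, n_k in zip(W, n_h)]          # relative weight per stratum
    m = [n_k * p_k for n_k, p_k in zip(n_h, p)]          # expected minority sample sizes

    def kish_effective(counts):
        # Kish's approximation: (sum of weights)^2 / (sum of squared weights)
        sw = sum(c * w_k for c, w_k in zip(counts, w))
        sw2 = sum(c * w_k ** 2 for c, w_k in zip(counts, w))
        return sw ** 2 / sw2

    minority_nom = sum(m) / (n * P)
    minority_eff = kish_effective(m) / (n * P)
    overall_eff = kish_effective(n_h) / n
    return minority_nom, minority_eff, overall_eff


if __name__ == "__main__":
    W, p = [0.1, 0.9], [1.0, 0.0]                   # nearly pure strata
    print(oversampling_effects(W, p, [100, 900]))   # proportionate: about (1, 1, 1)
    print(oversampling_effects(W, p, [500, 500]))   # oversampled
```

With nearly pure strata, moving from `[100, 900]` to `[500, 500]` gives Minority_Nom = Minority_Eff = 5, while Overall_Eff drops to about 0.61.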
Slide 49
Effects of Oversampling: Nearly Pure Strata
Slide 50
Effects of Oversampling: Less Pure Strata
Slide 51
Summary
- Sampling rare ethnic groups is possible, BUT
- Accomplishing it effectively is likely to be:
  - Complex (dealing with multiplicity, dealing with multiple frames, resolving statistical-operational dilemmas)
  - Costly (screening, stratification)
  - Harmful to overall population estimates (if oversampling is done)
  - At risk of losing sampling validity? (snowball sampling)
Slide 52
A Case Study in Oversampling Blacks and Mexican-Americans: The Third National Health and Nutrition Examination Survey (NHANES III)
Slide 53
Cluster Sampling:
- Random selection applied to one or more levels of a population hierarchy
- Sampling stage = level of hierarchy at which sampling is done
- Jargon:
  - PSU = Primary Sampling Unit: what is sampled in the first selection stage
  - SSU = Secondary Sampling Unit: what is sampled in the second stage
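A Python sketch of how stage-level probabilities combine in a two-stage cluster sample: the overall probability for an SSU is the product of its PSU's selection probability and its conditional selection probability within that PSU. The design shown (PSUs drawn with probability proportional to size, then a fixed number of SSUs by simple random sampling) is an assumed example, and the function name is hypothetical.

```python
def two_stage_probability(m, mos, i, k):
    """Overall selection probability of an SSU in PSU i.

    Stage 1: m PSUs drawn with probability proportional to size (PPS), where
             mos[i] is PSU i's measure of size (assumed equal to its number
             of SSUs, with m * mos[i] / sum(mos) <= 1).
    Stage 2: k SSUs drawn by simple random sampling within the selected PSU.
    """
    p1 = m * mos[i] / sum(mos)          # PSU selection probability
    if p1 > 1:
        raise ValueError("PSU too large for PPS without certainty selection")
    p2 = k / mos[i]                     # conditional SSU selection probability
    return p1 * p2


if __name__ == "__main__":
    mos = [100, 200, 300, 400]          # hypothetical PSU sizes
    print([round(two_stage_probability(2, mos, i, 10), 6) for i in range(4)])
```

With PPS at stage 1 and a fixed take at stage 2, every SSU ends up with the same overall probability (here m k / sum(mos) = 0.02), a so-called self-weighting design.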
Slide 54
Population Hierarchies:
[Figure: a population member nested within successive levels of a population hierarchy]
Slide 55
Population Hierarchies:
- EXAMPLE: African-American residents of the US non-institutionalized household population
  - Resident > Household > Block Group > Census Tract > Minor Civil Division > County > State > US
Slide 56
NHANES III Overview
- National health survey
  - U.S. civilian noninstitutionalized population
  - Stratified multi-stage sample design
  - Detailed profile and predictors of health status
- Data-gathering timeline: 1988-94
- Data collected by:
  - Face-to-face interviews in the home
  - Detailed examination at mobile sites (mobile examination centers, MECs)
Slide 57
NHANES III Target Population
- U.S. residents
  - Two months and older
  - Including those living in Alaska and Hawaii
- Civilians only
  - Excludes housing on military bases
- Noninstitutionalized population only
  - Excludes some residents of hospitals, nursing homes, prisons, and other comparable institutions
- Eligibility determined as of the time of interview
Slide 58
NHANES III in General
- Key minority domains:
  - Black (non-Hispanic)
  - Mexican American
  - Children: 2 months to 5 years
  - The elderly: > 60 years
Slide 59
Slide 60
Stratification to Oversample Key Minority Domains
- Applied at:
  - The PSU level: race/ethnicity or income indicator
  - The segment level: density of Mexican-Americans
  - The household level: race/ethnicity
  - The (sample) person level: age
Slide 61
Oversampling of Key Minority Domains
- Implementation accomplished by:
  - Disproportionate allocation favoring key minority domains
  - Using a weighted measure of size for the $i$-th cluster, $S_i = \sum_j f_j N_{ij}$, where:
    - $f_j$ = overall sampling probability for the $j$-th cell among all cells of the cross-classification by the race/ethnicity and age categories that define the key minority domains
    - $N_{ij}$ = measure of size for the same cross-classification cell within the $i$-th cluster
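A Python sketch of a weighted (composite) measure of size, assuming it takes the form $S_i = \sum_j f_j N_{ij}$: the sum, over race/ethnicity-by-age cells, of each cell's overall sampling rate times its count in cluster $i$. One rationale usually given is that $S_i$ approximates the number of sample persons cluster $i$ would yield at the target rates, so selecting clusters with probability proportional to $S_i$ helps keep workloads even while delivering domain-specific rates. Cell labels, counts, and the function name are hypothetical.

```python
def composite_mos(rates, counts):
    """Weighted measure of size for each cluster.

    rates:  {cell: f_j}, the desired overall sampling rate for each
            race/ethnicity-by-age cell j
    counts: {cluster: {cell: N_ij}}, the number of cell-j persons in cluster i
    Returns {cluster: S_i} with S_i = sum over j of f_j * N_ij.
    """
    return {i: sum(rates[j] * n_ij for j, n_ij in cells.items())
            for i, cells in counts.items()}


if __name__ == "__main__":
    rates = {"black_child": 0.01, "other_adult": 0.001}
    counts = {"A": {"black_child": 200, "other_adult": 1000},
              "B": {"other_adult": 3000}}
    print(composite_mos(rates, counts))
```

In this example both clusters get a measure of size of about 3, even though cluster B has far more residents, because cluster A contains more of the oversampled cell.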
Slide 62
Stratification to Oversample Key Minority Domains in NHANES III
Slide 63
Stratification to Oversample Key Minority Domains in NHANES III
- Oversampling implies more widely variable selection probabilities and sample weights
- Effect of variable weights is to increase variances of estimates
- One model: variance increased by a factor of $1 + s_w^2/\bar{w}^2$, where:
  - $s_w^2$ = variance of weights among sample respondents
  - $\bar{w}$ = mean of weights among sample respondents
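The "one model" on this slide can be computed directly from respondents' weights. This Python sketch assumes the model is Kish's approximation, $1 + s_w^2/\bar{w}^2$, with the variance taken over respondents (dividing by n). Under that convention, n divided by the factor equals (sum of weights)^2 / (sum of squared weights). The function names are hypothetical.

```python
from statistics import mean, pvariance


def weighting_effect(weights):
    """Approximate variance inflation from variable weights:
    1 + (variance of weights) / (mean weight)^2."""
    wbar = mean(weights)
    return 1 + pvariance(weights) / wbar ** 2


def effective_sample_size(weights):
    """Nominal sample size deflated by the weighting effect."""
    return len(weights) / weighting_effect(weights)


if __name__ == "__main__":
    print(weighting_effect([1, 1, 1, 1]))       # equal weights: no inflation
    print(weighting_effect([1, 3]))             # 1 + 1/4
    print(effective_sample_size([1, 3]))        # 2 / 1.25
```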
Slide 64
Stratification to Oversample Key Minority Domains in NHANES III
- EXAMPLE: Effect of variable sample weights on total population estimates using data from the MEC-examined NHANES III sample
[Table: n = 23,561; 9,397.04; remaining entries not recoverable]