1 Sampling Racial and Ethnic Minorities William D. Kalsbeek Director, Survey Research Unit Professor, Department of Biostatistics University of North Carolina June 14, 2000 Copyright 2000, William Kalsbeek

  • Slide 1
  • 1 Sampling Racial and Ethnic Minorities William D. Kalsbeek Director, Survey Research Unit Professor, Department of Biostatistics University of North Carolina June 14, 2000 Copyright 2000, William Kalsbeek
  • Slide 2
  • Acknowledgements
    • Gayle Shimokura: for significant contributions to this presentation through her meticulous background research.
    • CDC/National Center for Health Statistics (Contract No. UR6/CCU417428-01): for funding support for this presentation
    • UNC-CH's Center for Health Statistics Research: http://www.sph.unc.edu/chsr
  • Slide 3
  • Race/Ethnic Minorities (% of Population: March 2000 CPS)
    • Hispanics (11.7%)
      • Settled (95%)
      • Mobile (5%)
    • African-American (12.8%)
      • Settled (99.9%)
      • Mobile (0.1%)
    • Asian-American (4.0%)
    • Native-American (0.9%)
  • Slide 4
  • Overview
    • Some basics on probability sampling
    • Problems in sampling rare population subgroups*
    • A review of some existing remedies*
    * Note that a reference list is available
  • Slide 5
  • Context: Sampling Race/Ethnic Minorities
    • As the population subgroup of interest in a specially targeted study (targeted sampling)
    • As a key subgroup in a general population study (oversampling)
    • (Figure labels: Ethnic Minority; With Oversampling; Targeted)
  • Slide 6
  • Probability vs. Nonprobability Sampling?
    • Probability sampling:
      • Random sampling methods used
      • Each member of the target population has a known, nonzero selection probability
    • Nonprobability sampling in exceptional circumstances
      • Judgment used
      • Requires models to analyze
    • Probability sampling is generally preferred
  • Slide 7
  • Sampling Frames and Linkage
    • Sampling Frame = List(s) used to select a probability sample
    • EXAMPLE: List of patients to sample health care users
    • Usefulness of a frame is tied to:
      • The linkage that exists between entries on the list and the population being sampled
  • Slide 8
  • Sample Weights
    • A number for each member of the sample
      • Reflecting the inverse of the selection probability for the sample member
    • May be adjusted for sample imbalance due to:
      • Nonresponse
      • Incomplete frame coverage
      • Other selection problems
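The weight definition on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's procedure: the function names are hypothetical, and the nonresponse step assumes a single weighting class in which respondents absorb the weight of nonrespondents.

```python
def base_weight(selection_prob):
    """Base sample weight: the inverse of the selection probability."""
    if not 0 < selection_prob <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_prob


def nonresponse_adjusted(weights, responded):
    """Scale respondent weights so they sum to the full-sample weight total.

    Assumption: one weighting class; nonrespondents get weight 0.
    """
    total = sum(weights)
    respondent_total = sum(w for w, r in zip(weights, responded) if r)
    factor = total / respondent_total
    return [w * factor if r else 0.0 for w, r in zip(weights, responded)]
```

A member selected with probability 1/100 carries a base weight of 100; after adjustment, the respondents' weights still add up to the original weight total.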
  • Slide 9
  • What are the Statistical Goals of Probability Sampling?
    • Validity
      • The ability to produce estimates without bias tied to sampling
      • Achieved if all population members have some known chance to be chosen in the sample
    • Efficiency
      • Tied to precision of estimates
      • Achieved if the right sampling tools are used
      • Greater efficiency costs more (cost-efficiency)
  • Slide 10
  • What Selection Tools Might be Used to Sample Race/Ethnic Minorities?
    • Stratified sampling
      • Separate sampling within each of a number of population groupings (strata)
    • Screening for the targeted minority group
      • Identify subgroup members in initial sample of the full population
  • Slide 11
  • Stratified Sampling (vs. Stratified Analysis):
    • Population divided into H subgroups called strata
    • Separate probability sample in each stratum
    • Combine estimates from each stratum to produce the estimate for the whole population
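The "combine estimates from each stratum" step can be illustrated for a population mean. This is a sketch under the assumption of simple random sampling within each stratum and known stratum population sizes; the function name and input layout are hypothetical.

```python
def stratified_mean(strata):
    """Combine per-stratum sample means into one population estimate.

    strata: list of (N_h, sample_values) pairs, where N_h is the stratum
    population size. Each stratum mean is weighted by N_h / N.
    """
    N = sum(N_h for N_h, _ in strata)
    return sum((N_h / N) * (sum(ys) / len(ys)) for N_h, ys in strata)
```

Because each stratum is weighted by its population share, a small stratum that was sampled heavily does not dominate the overall estimate.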
  • Slide 12
  • Stratified Sampling Used When:
    • Wish to improve the efficiency of population-wide estimates
    AND/OR
    • Wish to control the sample size for important population subgroups
      • Isolatable to some degree by the strata
  • Slide 13
  • Stratum Allocation Options:
    • $C_h$ = average cost of adding another respondent to the sample in the h-th stratum
    • $f_h$ = sampling rate in the h-th stratum
    • $S_h$ = standard deviation of all members of the h-th stratum (measures intra-stratum variation)
  • Slide 14
  • Stratum Allocation Options
      Option           | Description                                                                     | Analysis priority to estimates for
      Proportionate    | Same sampling rates ($f_h = f$ in all strata)                                   | Overall population
      Optimum          | Most cost-efficient sampling rates                                              | Overall population
      Balanced         | Equal sample sizes ($n_h = n/H$)                                                | Population subgroups*
      Disproportionate | To "oversample" important subgroups ($f_h$ higher in subgroup strata)           | Key population subgroups*
    * Definable by the strata
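Three of the allocation options can be compared numerically. The sketch below is an illustration, not the presenter's code: the function name and arguments are hypothetical, and the "optimum" branch assumes the textbook cost-efficient rule in which the stratum sample size is proportional to N_h * S_h / sqrt(C_h), with N_h the stratum population size. Disproportionate allocation is omitted because its rates are set by the analyst.

```python
from math import sqrt


def allocate(n, N, S=None, C=None, method="proportionate"):
    """Split a total sample size n across strata.

    N: stratum population sizes; S: stratum standard deviations;
    C: per-respondent costs by stratum (S and C are needed only for "optimum").
    """
    H = len(N)
    if method == "proportionate":
        raw = list(N)                      # same sampling rate everywhere
    elif method == "balanced":
        raw = [1.0] * H                    # equal sample sizes, n / H
    elif method == "optimum":
        # Assumed rule: n_h proportional to N_h * S_h / sqrt(C_h)
        raw = [Nh * Sh / sqrt(Ch) for Nh, Sh, Ch in zip(N, S, C)]
    else:
        raise ValueError("unknown method: " + method)
    total = sum(raw)
    return [n * r / total for r in raw]
```

With a 90/10 population split, proportionate allocation of 1,000 interviews gives the small stratum only 100 of them, while balanced allocation gives it 500. That gap is why balanced and disproportionate designs are listed as serving subgroup estimates.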
  • Slide 15
  • Screening for a Targeted Population Subgroup
    • Sampling in two phases
      • Goal is to locate members of the population subgroup
      • Usually done by telephone or face-to-face in general population surveys
    • Process:
      • Select an initial sample
      • Administer a relatively short interview to determine membership in the targeted subgroup
      • Retain all target subgroup members and (perhaps) a random portion of the rest
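The retention step of the screening process can be sketched as follows. The names `screen`, `is_target`, and `keep_rate_nontarget` are hypothetical; the second-phase weight factor (1 for target members, the inverse retention rate for the rest) is the standard inverse-probability adjustment, assumed here rather than stated on the slide.

```python
import random


def screen(initial_sample, is_target, keep_rate_nontarget, rng):
    """Second phase of a two-phase screening design.

    Keeps every unit identified as a target-subgroup member and a random
    fraction of the rest. Returns (unit, weight_factor) pairs, where the
    factor is the inverse of the second-phase retention probability.
    """
    retained = []
    for unit in initial_sample:
        if is_target(unit):
            retained.append((unit, 1.0))
        elif rng.random() < keep_rate_nontarget:
            retained.append((unit, 1.0 / keep_rate_nontarget))
    return retained


# Usage: screen 100 households, keeping about 20% of non-target ones.
rng = random.Random(2000)
kept = screen(range(100), lambda u: u % 10 == 0, 0.2, rng)
```

The final weight for a retained unit would be its first-phase weight multiplied by this factor.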
  • Slide 16
  • What May Lead to Problems in Sampling Race/Ethnic Minorities?
    • Incomplete Frame(s)
      • A sizable portion of the population not linked to entries on the list(s) used for sampling
    • Rarity
      • They usually comprise a relatively small percentage of the target population
  • Slide 17
  • What May Lead to Problems in Sampling Race/Ethnic Minorities?
    • Mobility
      • Some of them move around a lot, thus creating a more dynamic than static linkage between the frame and sampled population
    • Dispersion
      • They are somewhat scattered geographically
      • May have some pockets with relatively high concentrations
  • Slide 18
  • (figure only)
  • Slide 19
  • Some Remedies
    • Targeted Sampling
      • Multiple Frame Methods
      • Linkage Exploitation Methods
        • Network/multiplicity sampling
        • Snowball sampling
        • Adaptive cluster sampling
      • Time and Space Sampling
    • Oversampling
      • Disproportionate Stratified Sampling with Screening
  • Slide 20
  • Multiple Frame Methods: Selection Approaches
    • Premise:
      • Frame options taken alone may be inadequate or too costly to use, BUT
      • Choosing the sample jointly from multiple frames may:
        • Produce better coverage of the targeted population and
        • Be more cost-effective
    • Dual-Frame Designs: two frames
  • Slide 21
  • Multiple Frames (figure showing Frame A, Frame B, and Frame C)
  • Slide 22
  • Multiple Frame Methods: EXAMPLE
    • Sampling Native Americans (NAs)
    • Two frames:
      • List of tribal rolls
        • Less complete
        • Less expensive to locate NAs
      • Area household frame from a list of residential dwellings in a sample of block groups (neighborhoods)
        • More complete
        • More expensive because of the need to screen
    • Most cost-effective mix = ?
  • Slide 23
  • Multiple Frame Methods: Estimation Approaches
    • Work by Hartley (1962), Choudry (1989), and Skinner and Rao (1996)
    • Special Requirements:
      • Identify/eliminate overlap prior to sampling OR
      • Require knowledge of membership in intersection groups for analysis adjustments
  • Slide 24
  • Multiple Frame Methods: Estimation Approaches
    • Eliminate frame duplication; treat as a stratified sample
    OR
    • Select with duplication present and either:
      • Combine estimates for intersection groups OR
      • Determine frame membership for sample respondents and weight accordingly
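The "combine estimates for intersection groups" option can be sketched for a population total from two frames. This follows the general form of a Hartley-type composite estimator, in which the overlap is estimated from both samples and the two overlap estimates are mixed with a factor theta between 0 and 1. The function name, record layout, and default theta of 0.5 are assumptions for illustration.

```python
def dual_frame_total(sample_a, sample_b, theta=0.5):
    """Composite dual-frame estimate of a population total.

    Each sample is a list of (y, weight, in_overlap) records, where
    in_overlap says whether the unit also appears on the other frame.
    Overlap units from frame A count with factor theta, overlap units
    from frame B with (1 - theta), so the overlap is not double counted.
    """
    total = 0.0
    for y, w, in_overlap in sample_a:
        total += w * y * (theta if in_overlap else 1.0)
    for y, w, in_overlap in sample_b:
        total += w * y * ((1.0 - theta) if in_overlap else 1.0)
    return total
```

This requires knowing, for each respondent, whether they belong to the intersection, which is the "special requirement" noted for multiple frame estimation.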
  • Slide 25
  • Multiple Frame Methods: Implications for Sampling Race/Ethnic Minorities
    • Advantages:
      • Improved sample coverage over using a single list
      • Potential cost savings if cost of frame use differs among frames
    • Disadvantages:
      • Higher design/selection/analysis complexity relative to single frame use
      • Challenge in finding the most cost-effective mix of sample sizes for frames
  • Slide 26
  • Linkage Exploitation Methods: Selection Approaches
    • Premise:
      • Population members with a rare attribute can often identify others with the same attribute
    • Various adaptations:
      • Based on the notion of multiplicity in frames
      • Differ according to how multiplicity is utilized
  • Slide 27
  • Multiplicity (figure labels: Frame Listing; Population Member)
  • Slide 28
  • Linkage Exploitation Methods: Various Adaptations
    • Network/multiplicity sampling
      • Network: social/spatial/organizational linkage among members of the targeted subgroup
      • EXAMPLES: relatives, friends, co-workers, co-habitants, organization co-members, etc.
      • Linkages may be:
        • Asymmetric
        • Complex
      • EXAMPLE: friends
  • Slide 29
  • Linkage Exploitation Methods: Various Adaptations
    • Network/multiplicity sampling
      • Sampling Process:
        • Choose an initial sample of the targeted subgroup
        • Sample members are interviewed and asked to nominate other members of their network who are members of the targeted subgroup
        • Interview those nominated and have them nominate others in like manner
      • Selection probability directly tied to size of network
  • Slide 30
  • Linkage Exploitation Methods: Various Adaptations
    • Snowball sampling
      • Network sampling but with multiple phases of nomination
      • Snowballing may be best used to construct frames to sample rare populations
      • Continue waves of nomination until list expansion ceases
  • Slide 31
  • Linkage Exploitation Methods: Various Adaptations
    • Adaptive cluster sampling
      • Exploits the tendency for members of some targeted subgroups to cluster together
      • Original motivation from ecology and geology
      • Sampling Process:
        • Select a random sample of the population
        • Where one identifies members of the targeted subgroup, sample others in the neighborhood
  • Slide 32
  • Linkage Exploitation Methods: EXAMPLE
    • Snowballing: sampling frame of prenatal care providers
    • Study of recent female immigrants from Central and South America
    • Process:
      • Contact OB-GYNs in private practices and public clinics
      • Those providing prenatal care to immigrants nominate others doing the same
      • Continue iteratively until no new providers are discovered
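The iterative nomination process in this example amounts to expanding a list until it stops growing. A minimal sketch, assuming a hypothetical `nominate` callback that returns the providers named by a given provider:

```python
from collections import deque


def snowball_frame(seeds, nominate):
    """Build a frame by waves of nomination until no new members appear.

    seeds: initial contacts; nominate(member) returns the members that
    contact names. Returns the set of everyone reached.
    """
    frame = set(seeds)
    queue = deque(seeds)
    while queue:
        member = queue.popleft()
        for other in nominate(member):
            if other not in frame:      # only new names extend the frame
                frame.add(other)
                queue.append(other)
    return frame


# Usage with a toy nomination graph (provider names are made up).
graph = {"A": ["B"], "B": ["C", "A"], "C": [], "D": ["A"]}
frame = snowball_frame(["A"], lambda m: graph.get(m, []))
```

In the toy graph, provider D nominates A but is nominated by no one, so D never enters the frame. This is the asymmetric-linkage problem: coverage depends on who names whom.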
  • Slide 33
  • Linkage Exploitation Methods: Estimation
    • Major contributors: Sirken (network), Goodman (snowball), and Thompson (adaptive)
    • Approaches:
      • Weighted multiplicity estimation (Sirken)
      • Rao-Blackwellization to improve estimator efficiency (Thompson)
    • Special requirements:
      • Network membership information
      • Multiplicity counts
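The role of multiplicity counts in weighted multiplicity estimation can be shown with a simplified sketch. It assumes each reported member comes with a sample weight and a count of how many frame units could have reported that member; dividing by the count offsets the extra chances of being reported. The record layout and function name are hypothetical, and this is a simplification rather than the full estimator from the literature.

```python
def multiplicity_total(records):
    """Multiplicity-adjusted estimate of a population total.

    records: list of (y, weight, multiplicity) tuples, where multiplicity
    is the number of frame units linked to that population member (>= 1).
    """
    total = 0.0
    for y, weight, multiplicity in records:
        if multiplicity < 1:
            raise ValueError("multiplicity count must be at least 1")
        total += weight * y / multiplicity
    return total
```

A member reachable through two frame units contributes half as much per report as a member reachable through one, which is why multiplicity counts are listed as a special requirement.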
  • Slide 34
  • Linkage Exploitation Methods: Implications for Sampling Race/Ethnic Minorities
    • Advantages:
      • Greater operational efficiency in locating members of the target population
      • Find a hotspot; then sample nearby
    • Disadvantages:
      • Difficult to determine selection probabilities for weights
      • Asymmetric linkages (A nominates B, but not vice versa)
      • Valid probability samples?
  • Slide 35
  • Time and Space Sampling: Selection Approach
    • Premise:
      • Portions of ethnic subpopulations are relatively mobile (e.g., migrant farm workers, homeless)
      • Sampling a chunk of time
      • Linkage between members of the target subgroup and the frame is dynamic over time
      • Those moving more frequently have a greater chance of selection
      • Sample space and time to address this potential for bias
  • Slide 36
  • Time and Space Sampling: EXAMPLE
    • Sampling migrant seasonal farm workers
    • Process:
      • Spatial dimension: sample migrant housing locations
        • On farms
        • In other residential housing areas
      • Time dimension: sample time periods during the data collection period
        • Three consecutive days
  • Slide 37
  • Time and Space Sampling: Estimation
    • Contributors: Kalsbeek (1988); Kalton (1991)
    • Approaches:
      • Multiplicity estimators similar to those used in network samples
    • Special Requirements:
      • Need multiplicity count for each sample member?
      • Sampling scheme compromise needed between:
        • Statistical precision of estimates
        • Operational effectiveness
  • Slide 38
  • Time and Space Sampling: Implications for Sampling Race/Ethnic Minorities
    • Advantages:
      • Deals with the fluidity of frame-population linkage in mobile populations
      • Provides a framework for finding a cost-efficient solution
    • Disadvantages:
      • Added complexity to selection, data gathering, and analysis of sample
  • Slide 39
  • Disproportionate Stratified Sampling with Screening: Selection Approach
    • Premise:
      • Concentrations of the targeted subgroup vary in the population
      • Sample strata with higher concentrations more heavily
      • Result: larger sample size for the target subgroup relative to a proportionate sample
  • Slide 40
  • (figure only)
  • Slide 41
  • DSS with Screening: EXAMPLE
    • Oversampling African-Americans
    • A simple process:
      • Stratify the population
        • By relatively high and low concentrations of African-Americans
        • High concentration areas in the South and large cities
      • Sample with relatively higher rates in the high concentration stratum
  • Slide 42
  • DSS with Screening: Estimation
    • Approaches:
      • Weighted estimate to account for sample disproportionality
      • Effect of variable weights is to lower precision of some population estimates
    • Special Requirements:
      • Establishing the most cost-efficient overall and stratum-specific sampling rates
  • Slide 43
  • DSS with Screening: Implications for Sampling Race/Ethnic Minorities
    • Advantages:
      • Increased sample size for the targeted subgroups
      • Are target subgroup non-members in the (oversampled) high concentration strata
    • Disadvantages:
      • Loss in precision on overall population estimates
  • Slide 44
  • A Two-Stratum Model for Effects of Oversampling
    • Setting:
      • Oversampling a minority group that is 10% of the population
      • Two sampling strata:
        • One with higher % minority (to oversample)
        • One with lower % minority (to undersample)
      • Two alternative sets of strata:
        • Nearly Pure: strata virtually all members or non-members
        • Less Pure: strata mostly members or non-members
  • Slide 45
  • Nearly Pure Strata (figure labels: Oversampled Stratum; Undersampled Stratum; TARGET POPULATION)
  • Slide 46
  • Less Pure Strata (figure labels: Undersampled Stratum; Oversampled Stratum; TARGET POPULATION)
  • Slide 47
  • A Two-Stratum Model for Effects of Oversampling
    • Assumptions:
      • Simple random sampling in each stratum
      • Stratum unit variances are equal
      • Other minor simplifying conditions
  • Slide 48
  • A Two-Stratum Model for Effects of Oversampling
    • Sample Sizes (Relative to Proportionate):
      • Minority_Nom = Nominal Sample Size for Minority
        • Observed increase in size of minority sample
        • Due to oversampling of the predominantly minority stratum
      • Minority_Eff = Effective Sample Size for Minority
        • Adjusted size of minority sample
        • Considering the (downward) effect of variable sample weights on statistical quality of estimates
      • Overall_Eff = Effective Size of Overall Sample
        • Adjusted size of overall sample
        • Considering the (downward) effect of variable sample weights on statistical quality of estimates
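The three quantities defined on this slide can be computed for a two-stratum design under stated assumptions. The sketch below is not the presenter's model. It assumes a fixed total sample size, simple random sampling within strata, equal unit variances, and an effective sample size of the form (sum of weights)^2 / (sum of squared weights). The argument names (`W1` for the population share of the oversampled stratum, `p1` and `p2` for the minority proportions in each stratum, `k` for the ratio of the two sampling rates) are hypothetical.

```python
def oversampling_effects(W1, p1, p2, k, n=1000.0):
    """Relative sample sizes under two-stratum oversampling.

    Returns each size as a ratio to its value under proportionate sampling.
    """
    W2 = 1.0 - W1
    D = k * W1 + W2
    n1, n2 = n * k * W1 / D, n * W2 / D       # stratum sample sizes
    w1, w2 = 1.0 / k, 1.0                     # relative sample weights
    m1, m2 = n1 * p1, n2 * p2                 # minority members sampled
    minority_prop = n * (W1 * p1 + W2 * p2)   # minority size if proportionate

    def effective(a, b):
        # (sum of weights)^2 / (sum of squared weights) for a units at w1, b at w2
        return (a * w1 + b * w2) ** 2 / (a * w1 ** 2 + b * w2 ** 2)

    return {
        "Minority_Nom": (m1 + m2) / minority_prop,
        "Minority_Eff": effective(m1, m2) / minority_prop,
        "Overall_Eff": effective(n1, n2) / n,
    }


# Usage: a nearly pure design (90% minority in a stratum holding 10% of the
# population, minority 10% overall) oversampled at four times the other rate.
nearly_pure = oversampling_effects(W1=0.1, p1=0.9, p2=0.1 / 9, k=4.0)
```

With `k = 1` all three ratios equal 1. As `k` grows, the nominal minority sample grows while the effective overall sample shrinks. This is the trade-off the model is set up to show.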
  • Slide 49
  • Effects of Oversampling: Nearly Pure Strata (figure)
  • Slide 50
  • Effects of Oversampling: Less Pure Strata (figure)
  • Slide 51
  • Summary
    • Sampling rare ethnic groups is possible BUT
    • Accomplishing it effectively is likely to be:
      • Complex (dealing with multiplicity, dealing with multiple frames, resolving statistical-operational dilemmas)
      • Costly (screening, stratification)
      • Adverse effect on overall population estimates (if oversampling done)
      • Loss of sampling validity? (snowball sampling)
  • Slide 52
  • A Case-Study in Oversampling Blacks and Mexican-Americans: The Third National Health and Nutrition Examination Survey (NHANES III)
  • Slide 53
  • Cluster Sampling:
    • Random selection applied to one or more levels of a population hierarchy
    • Sampling Stage = Level of hierarchy at which sampling is done
    • Jargon:
      • PSU = Primary Sampling Unit is what is sampled in the first selection stage
      • SSU = Secondary Sampling Unit is what is sampled in the second stage
  • Slide 54
  • Population Hierarchies (figure label: Population Member)
  • Slide 55
  • Population Hierarchies:
    • EXAMPLE: African-American residents of the US non-institutionalized household population
    • Resident > Household > Block Group > Census Tract > Minor Civil Division > County > State > US
  • Slide 56
  • NHANES III Overview
    • National health survey
    • U.S. civilian noninstitutionalized population
    • Stratified multi-stage sample design
    • Detailed profile and predictors of health status
    • Data gathering timeline: 1988-94
    • Data collected by:
      • Face-to-face interviews in the home
      • Detailed examination at mobile sites
  • Slide 57
  • NHANES III Target Population
    • U.S. residents
      • Two months and older
      • Including those living in Alaska and Hawaii
    • Civilians only
      • Excludes housing on military bases
    • Noninstitutionalized population only
      • Excludes some residents of hospitals, nursing homes, prisons, and other comparable institutions
    • Eligibility determined as of the time of interview
  • Slide 58
  • NHANES III in General
    • Key minority domains:
      • Black (non-Hispanic)
      • Mexican American
      • Children: 2 months to 5 years
      • The Elderly: > 60 years
  • Slide 59
  • (figure only)
  • Slide 60
  • Stratification to Oversample Key Minority Domains
    • Applied at:
      • The PSU level: Race/ethnicity or income indicator
      • The segment level: Density of Mexican-Americans
      • The household level: Race/ethnicity
      • The (sample) person level: Age
  • Slide 61
  • Oversampling of Key Minority Domains
    • Implementation accomplished by:
      • Disproportionate allocation favoring key minority domains
      • Using a weighted measure of size, built from:
        • The overall sampling probability for the j-th among all cells of the cross-classification by the race/ethnicity and age categories that define the key minority domains
        • The measure of size for the same cross-classification cell within each cluster
  • Slide 62
  • Stratification to Oversample Key Minority Domains in NHANES III (figure)
  • Slide 63
  • Stratification to Oversample Key Minority Domains in NHANES III
    • Oversampling implies more widely variable selection probabilities and sample weights
    • Effect of variable weights is to increase variances of estimates
    • One model: variance increased by a factor of $1 + s_w^2 / \bar{w}^2$, where
      • $s_w^2$ = variance of weights among sample respondents
      • $\bar{w}$ = mean of weights among sample respondents
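The "one model" on this slide is defined by the variance and the mean of the respondent weights. A widely used approximation built from exactly those two quantities is one plus the squared coefficient of variation of the weights (often attributed to Kish). The sketch below assumes that is the model intended, and assumes the variance uses divisor n; the function names are hypothetical.

```python
def weight_variance_factor(weights):
    """Approximate variance inflation from unequal weights: 1 + var(w) / mean(w)^2.

    Assumption: variance computed with divisor n, which makes the factor
    equal to n * sum(w^2) / (sum(w))^2.
    """
    n = len(weights)
    mean = sum(weights) / n
    variance = sum((w - mean) ** 2 for w in weights) / n
    return 1.0 + variance / mean ** 2


def effective_sample_size(weights):
    """Nominal sample size deflated by the variance inflation factor."""
    return len(weights) / weight_variance_factor(weights)
```

Equal weights give a factor of exactly 1, so the effective sample size equals the nominal one. The more the weights vary, the further the effective size falls below the nominal size.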
  • Slide 64
  • Stratification to Oversample Key Minority Domains in NHANES III
    • EXAMPLE: Effect of variable sample weights on total population estimates using data from the MEC-examined NHANES III sample
    • n = 23,561 9,397.04