
COVERAGE AND SAMPLING Damon Burton University of Idaho


Page 1: COVERAGE AND SAMPLING Damon Burton University of Idaho

COVERAGE AND SAMPLING

Damon Burton

University of Idaho

Page 2: COVERAGE AND SAMPLING Damon Burton University of Idaho

What does each of these important sampling terms mean?

Page 3: COVERAGE AND SAMPLING Damon Burton University of Idaho

ESSENTIAL SAMPLING DEFINITIONS

Survey Population -- consists of all units (i.e., individuals, households, organizations) to which one desires to generalize survey results.

Sample Frame -- the list from which a sample is to be drawn in order to represent the survey population.

Sample -- consists of all units of the population that are drawn for inclusion in the survey.

Page 4: COVERAGE AND SAMPLING Damon Burton University of Idaho

ESSENTIAL SAMPLING DEFINITIONS

Completed Sample -- consists of all units (i.e., persons) that complete the survey.

Coverage Error -- results when not every unit in the survey population has a known, nonzero chance of being included in the sample.

Sampling Error -- results from collecting data from only a subset, rather than all, of the members of the sampling frame.

Page 5: COVERAGE AND SAMPLING Damon Burton University of Idaho

COVERAGE CONSIDERATIONS

Telephone coverage

Internet coverage

Mail coverage

Page 6: COVERAGE AND SAMPLING Damon Burton University of Idaho

TELEPHONE COVERAGE

In 2000, telephones were regarded as the best survey mode for general surveys because (1) coverage was high (i.e., 90% of Americans had phones), (2) Random Digit Dialing (RDD) procedures allowed sampling of most phone users, and (3) people were amenable to answering survey questions over the phone.

By 2003, half of all US citizens used cell phones, and by 2007, 16% had only cell service.

Today almost 20% of US adults would be excluded by RDD sampling procedures.

Page 7: COVERAGE AND SAMPLING Damon Burton University of Idaho

INTERNET COVERAGE

The internet is a useful mode for conducting surveys of specific populations who have service (e.g., students, professionals, and businesses), but it has significant coverage gaps with the general population.

As of 2007, only 71% of Americans used the internet at least occasionally.

Only 67% had internet service in their homes.

Only 47% had high-speed home internet service, with 23% having dial-up and 29% having no home internet access.

Internet growth seems to be slowing.

Page 8: COVERAGE AND SAMPLING Damon Burton University of Idaho

PROBLEMS WITH INTERNET FOR POPULATION SURVEYS

No list of all, or most, internet subscribers is available (i.e., there is no sampling frame).

No simple procedure is available for drawing samples in which individuals, or households, have a known, nonzero chance of inclusion.

People’s ability to use the internet varies significantly, even in households with good access.

Because internet providers are private, not public, legal and cultural barriers prevent contacting randomly generated email addresses.

Web surveyors often use self-selected panels of respondents, creating a number of sampling issues.

Page 9: COVERAGE AND SAMPLING Damon Burton University of Idaho

MAIL COVERAGE

Phone books once were good sources of addresses for mail surveys.

By 1990, 25% of households had unlisted numbers, and cell phone-only households rose sharply.

Address-based sampling has become more feasible with US Postal Service Delivery Sequence File (DSF) lists.

DSF is an electronic file containing all delivery point addresses served by USPS.

Names are provided for addresses except PO boxes.

DSF can’t tell homes from businesses.

Geocoding is possible for stratified sampling or targeting specific populations.

DSF is available through vendors, each of which has different processes for managing and updating lists.

Page 10: COVERAGE AND SAMPLING Damon Burton University of Idaho

MAIL COVERAGE

Missing addresses for multiperson dwellings (e.g., apartments) are problematic.

Initial evaluations have shown that DSF mail surveys with a reminder mailing resulted in 4-7% higher response rates than RDD surveys.

Both RDD and DSF surveys overrepresent married, white, non-Hispanic individuals with higher education levels.

Other lists sometimes used include licensed drivers, utility users, registered voters, and homeowners.

General lists may be compiled from multiple sources, including credit card holders, telephone directories, magazine subscribers, bank depositors, organization membership lists, catalog and internet customers, and other sources.

Page 11: COVERAGE AND SAMPLING Damon Burton University of Idaho

What are the major coverage issues for phone, internet, and mail surveys?

Page 12: COVERAGE AND SAMPLING Damon Burton University of Idaho

REDUCING COVERAGE ERRORS

Many surveys are designed for special populations.

You need to know how a specific list is compiled, maintained, and used.

There are 5 important questions to ask about any potential sampling list.

Page 13: COVERAGE AND SAMPLING Damon Burton University of Idaho

COVERAGE QUESTION 1

Does the list contain everyone in the survey population?

If not, determine whether getting the remainder of the people on the list is possible.

Evaluate the consequences of not obtaining excluded names.

Page 14: COVERAGE AND SAMPLING Damon Burton University of Idaho

COVERAGE QUESTION 2

Does the list include names of people who are not in the study population?

If so, learn up front exactly who is on the list and why, so that respondents answer only the questions appropriate for them.

This targeting strategy saves valuable resources.

Page 15: COVERAGE AND SAMPLING Damon Burton University of Idaho

COVERAGE QUESTION 3

How is the list maintained and updated?

You may need to check the accuracy of addresses before surveying.

Accuracy depends on continual updating of the addresses on the list.

Page 16: COVERAGE AND SAMPLING Damon Burton University of Idaho

COVERAGE QUESTION 4

Are the same sample units included on the list more than once?

Customers’ names may be added to the list each time they order if a slightly different name or address is given.

Divorced parents often appear on the list twice, whereas married parents appear only once.
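As a rough illustration of checking a list for repeated units, the sketch below collapses entries that differ only in capitalization, punctuation, or spacing. The record layout (name, address) and the helper names `normalize` and `deduplicate` are hypothetical, and exact matching after normalization is only a first pass: genuinely different spellings or addresses for the same person would need manual review or fuzzy matching.

```python
import re

def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(text.split())

def deduplicate(records):
    """Keep the first occurrence of each (name, address) pair after normalization."""
    seen = set()
    unique = []
    for name, address in records:
        key = (normalize(name), normalize(address))
        if key not in seen:
            seen.add(key)
            unique.append((name, address))
    return unique

# Hypothetical customer list: the first two rows are the same customer.
customers = [
    ("Pat Lee", "12 Oak St."),
    ("pat  lee", "12 Oak St"),
    ("Sam Roy", "3 Elm Ave"),
]
print(deduplicate(customers))
```

Units that remain on the frame more than once have a higher chance of selection than units listed once, which is why this check comes before sampling.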

Page 17: COVERAGE AND SAMPLING Damon Burton University of Idaho

COVERAGE QUESTION 5

Does the list contain other information that can be used to improve the survey?

Use mixed modes for different aspects of the survey process.

Age and gender can be used to identify nonresponse error.

What other information would be valuable?

Page 18: COVERAGE AND SAMPLING Damon Burton University of Idaho

RESPONDENT SELECTION

Samples drawn from phone books in the 1970s-1990s typically produced a higher proportion of male respondents, even when letters requested that females complete the survey.

Women are more likely to participate in phone surveys because they answer the phone more often.

Surveys commonly ask for “the adult with the most recent birthday” to randomize respondents within the household.

Other surveys target “the individual who shops for groceries most often,” “who makes the investment decisions,” or “who purchases the computer.”

Page 19: COVERAGE AND SAMPLING Damon Burton University of Idaho

COVERAGE OUTCOMES

The goal is that every unit in the survey population appears on the sample frame list only once, so the survey population is prepared for actual sampling.

Often researchers must decide what amount of coverage is acceptable.

Do alternatives exist? What is the cost of those alternatives? Can the coverage error be accurately assessed?

Mixed mode surveys are a possibility. For example, most of the survey is conducted on the internet, but hard copy surveys are mailed to the portion of the sample who don’t have internet access.

Page 20: COVERAGE AND SAMPLING Damon Burton University of Idaho

PROBABILITY SAMPLING

Sampling error is the type of error that occurs because information is requested from only a sample of the population rather than from the entire population.

The first step in drawing a sample is to understand the number of properly selected respondents necessary for generalizing results to the population, and with what degree of accuracy.

Page 21: COVERAGE AND SAMPLING Damon Burton University of Idaho

How do I calculate the desired sample size for a survey study?

Page 22: COVERAGE AND SAMPLING Damon Burton University of Idaho

HOW LARGE SHOULD A SAMPLE BE?

The size of the sample, not the proportion sampled, is what affects precision.

The formula takes into account (1) how much sampling error can be tolerated within a given confidence interval, (2) the amount of confidence one wishes to have in the estimates, (3) how varied the population is with respect to the characteristic of interest, and (4) the size of the population from which the sample is drawn.

Page 23: COVERAGE AND SAMPLING Damon Burton University of Idaho

SAMPLE SIZE

$$N_s = \frac{(N_p)(p)(1-p)}{(N_p - 1)(B/C)^2 + (p)(1-p)}$$

Formula terms

$N_s$ = the completed sample size needed for the desired level of precision,

$N_p$ = the size of the population,

$p$ = the proportion of the population expected to choose one of the 2 response categories,

$B$ = margin of error (i.e., half of the desired confidence interval width, such as 3%),

$C$ = Z score associated with the confidence level (i.e., 1.96 corresponds with a 95% confidence level).
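The sample size formula on this slide can be evaluated directly. The sketch below is one way to do so; the function name `completed_sample_size`, its parameter names, and the defaults (p = 0.5 as the most conservative split, a 3% margin, and a 95% confidence level) are illustrative choices, and rounding up to the next whole respondent is an assumption rather than something stated on the slide.

```python
import math

def completed_sample_size(population, p=0.5, margin=0.03, z=1.96):
    """Ns = Np*p*(1-p) / ((Np-1)*(B/C)^2 + p*(1-p)), rounded up.

    population -> Np, margin -> B, z -> C in the slide's notation.
    """
    numerator = population * p * (1 - p)
    denominator = (population - 1) * (margin / z) ** 2 + p * (1 - p)
    return math.ceil(numerator / denominator)

# Hypothetical population of 800 people, 50/50 split, +/-3% at 95% confidence.
print(completed_sample_size(800))  # → 458
```

Setting p to 0.5 maximizes p(1-p), so it gives the largest (safest) sample size when the true split is unknown.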

Page 24: COVERAGE AND SAMPLING Damon Burton University of Idaho

How do I draw a good simple random sample?

Page 25: COVERAGE AND SAMPLING Damon Burton University of Idaho

5 SAMPLING PREMISES

1. Relatively few completed questionnaires can provide surprising precision at a high level of confidence.

2. Among large populations, there is virtually no difference in the completed sample size needed for a given level of precision and confidence.

3. Within small populations, greater proportions of the population need to be surveyed to achieve estimates with a given margin of error.

4. At higher levels of sample size, increasing your sample size yields smaller and smaller reductions in margin of error.

5. Completed sample sizes must be much larger if one wants to make precise estimates for subgroups of the population.
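Premises 2 through 4 can be checked numerically with the sample size formula from the earlier slide. The sketch below repeats that formula and also solves it for the margin of error B given a completed sample size; the function names, the population sizes, and the defaults (p = 0.5, 95% confidence) are illustrative assumptions.

```python
import math

def completed_sample_size(population, p=0.5, margin=0.03, z=1.96):
    """Completed sample size needed for the given margin of error."""
    numerator = population * p * (1 - p)
    denominator = (population - 1) * (margin / z) ** 2 + p * (1 - p)
    return math.ceil(numerator / denominator)

def margin_of_error(completed, population, p=0.5, z=1.96):
    """The same formula solved for B, given a completed sample size."""
    variance = p * (1 - p) * (population - completed) / (completed * (population - 1))
    return z * math.sqrt(variance)

# Premises 2 and 3: the needed sample levels off for large populations,
# while small populations require a much larger share to be surveyed.
for population in (100, 1_000, 10_000, 1_000_000, 100_000_000):
    needed = completed_sample_size(population)
    print(f"{population:>11,} needs {needed:>5,} ({needed / population:.2%} of population)")

# Premise 4: each quadrupling of the sample only halves the margin of error.
for completed in (100, 400, 1_600, 6_400):
    print(f"n = {completed:>5,} margin = {margin_of_error(completed, 1_000_000):.3%}")
```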

Page 26: COVERAGE AND SAMPLING Damon Burton University of Idaho

DRAWING A SIMPLE RANDOM SAMPLE

Typically numbers are assigned to every member of the sample frame, and the computer randomly selects a certain number of respondents.

Sometimes comparisons require sampling different segments of the population unequally. For example, comparing employees who have worked for a company for more than 6 months with those who have worked there for less than 6 months requires weighted sampling.

Because employees with less than 6 months of service represent only 5% of the workforce, a simple random sample would consist overwhelmingly of veteran employees (about 95%, on average). If you need equal numbers in these 2 groups, you’ll need to sample a higher percentage of new than veteran employees.
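Both approaches on this slide can be sketched with Python's standard `random` module. The workforce below is made up (1,000 employees, 5% with under 6 months of service), and the function names and the choice of 40 respondents per group are hypothetical; the point is only that equal group sizes imply very different sampling rates.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def equal_group_sample(frame, group_of, n_per_group, seed=None):
    """Split the frame into groups and draw the same number from each."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(group_of(unit), []).append(unit)
    return {group: rng.sample(units, n_per_group) for group, units in strata.items()}

def tenure_group(employee):
    """Classify a (employee_id, months_of_service) record."""
    return "new" if employee[1] < 6 else "veteran"

# Hypothetical frame: 50 new employees and 950 veterans.
workforce = [(i, 3) for i in range(50)] + [(i, 24) for i in range(50, 1000)]

srs = simple_random_sample(workforce, 80, seed=1)
print(sum(tenure_group(e) == "new" for e in srs), "new employees in a simple random sample of 80")

groups = equal_group_sample(workforce, tenure_group, 40, seed=1)
print({name: len(sample) for name, sample in groups.items()})
# Sampling rates differ sharply: 40/50 = 80% of new vs 40/950 (about 4%) of veterans.
```

Because the two groups are sampled at different rates, estimates for the whole workforce would need to weight each group back to its true share.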

Page 27: COVERAGE AND SAMPLING Damon Burton University of Idaho

The End