1/18/10
FPP Chapter 19
Surveys
Plan of Study
Conducting surveys is not trivial. We will cover three main topics today:
1. Issues in questionnaire design
2. Methods for selecting units to survey
3. Administration of surveys
Kind of a short (maybe long) tangent
Statistical Literacy
Statistics are often used to support arguments, philosophies, public policy decisions, etc. It behooves us to be educated consumers of statistics.
Lying with statistics:
“There are three kinds of lies: lies, damned lies and statistics.”
“If you torture data enough it will tell you what you want it to.”
Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists by Joel Best
Statistics: Concepts and Controversies by David Moore
Statistical Literacy
An example from Joel Best’s book:
In a dissertation prospectus, a student reported the following claim, which had appeared in a reputable peer-reviewed journal in 1995:
“Every year since 1950 the number of American children gunned down has doubled”
Best says that “I think that it may be the worst – that is, the most inaccurate -- social statistic ever”
What’s wrong with it?
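To see what is wrong, it helps to run the numbers. A minimal sketch, assuming (hypothetically) that only one American child was gunned down in 1950 and that the count really doubled every year through 1995, when the claim was published:

```python
# Hypothetical starting point: 1 child in 1950. Any realistic (larger)
# starting count only makes the result more absurd.
start_year, end_year = 1950, 1995
count_1950 = 1

# Doubling once per year for 45 years multiplies the count by 2**45.
doublings = end_year - start_year
count_1995 = count_1950 * 2 ** doublings

print(f"{doublings} doublings -> {count_1995:,} children in {end_year}")
```

Even starting from a single child, the claim implies about 35 trillion children in 1995, thousands of times the number of people alive on Earth.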
Statistical Literacy
Understand the measure: two reports in the same newspaper (2/90)
“Teens’ sexual activity on the rise” vs. “Teens’ sexual activity on the decline”
First article: average age at first intercourse (17.2 for females, 16.5 for males), which had become younger.
Second article: average number of partners (6) and frequency (3 per month), both of which were lower.
The two analyses actually tackled different questions, but their conclusions were presented the same way.
Always find out what variable is being measured when judging or drawing statistical conclusions.
Statistical Literacy
Which U.S. states are the worst polluters?
In 1993 the E.P.A. ranked New Jersey 22nd worst in the nation for release of toxic chemicals. The E.P.A. used total pounds released (38.6 million).
Using pounds per square mile instead, New Jersey was 4th worst in the nation.
Both analyses assess pollution, but they use different variables.
Pay attention to the variable used when judging statistical claims.
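Rankings can flip when the variable changes. The sketch below uses three made-up states (names and numbers are hypothetical, not E.P.A. data) and ranks them first by total pounds released and then by pounds per square mile:

```python
# Hypothetical data (NOT E.P.A. figures): total toxic releases in pounds
# and land area in square miles for three made-up states.
states = {
    "State A": {"pounds": 90_000_000, "sq_miles": 150_000},
    "State B": {"pounds": 60_000_000, "sq_miles": 50_000},
    "State C": {"pounds": 30_000_000, "sq_miles": 5_000},
}

def rank(key):
    """Return state names ordered from worst (largest value) to best."""
    return sorted(states, key=key, reverse=True)

by_total = rank(lambda s: states[s]["pounds"])
by_density = rank(lambda s: states[s]["pounds"] / states[s]["sq_miles"])

print("Worst by total pounds:       ", by_total)
print("Worst by pounds per sq. mile:", by_density)
```

The smallest state releases the fewest total pounds yet has by far the highest release per square mile, so the two measures produce opposite rankings.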
Statistical Literacy
“Statistics are no substitute for judgment.” (Henry Clay)
Statistical Literacy
Beware hidden agendas: a survey paid for by a disposable diaper company
“It is estimated that disposable diapers account for less than 2% of the trash in today’s landfills. In contrast, beverage containers, third-class mail and yard wastes are estimated to account for about 21% of the trash in landfills. Given this, in your opinion would it be fair to ban disposable diapers?”
The question is prefaced with claims that favor disposable diapers, which leads respondents toward answering “no.”
When judging a survey, get the exact wording of its questions.
Statistical Literacy
Beware hidden agendas 2
“Levi’s 501 Report, a fall fashion survey conducted annually on 100 campuses,” sponsored by Levi’s
Report: 90% of college students chose Levi’s 501 jeans as “in” on campus.
But Levi’s 501 jeans were the only type of blue jeans on the list of choices.
Statistical Literacy
Beware hidden agendas 3
Advertisement for Triumph cigarettes:
“TRIUMPH BEATS MERIT – an amazing 60% said that Triumph tastes as good or better than Merit.”
Actual survey results: 36% preferred Triumph, 40% preferred Merit, 24% had no preference.
The company made Triumph look as good as possible by adding the 36% who preferred Triumph to the 24% with no preference, even though more people preferred Merit.
Statistical Literacy
Economic questions
Did wages increase during the Reagan-Bush years (1980-1992)?
Average wages in private nonagricultural production: $235 in 1980, $345 in 1992.
Adjusted for inflation, the 1980 wage of $235 was equivalent to $388 in 1992 dollars, more than the actual 1992 wage of $345. People were worse off!
Consider inflation when judging statistical statements about money.
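An inflation adjustment amounts to scaling by a ratio of price levels. This sketch uses only the slide's three figures; the ratio 388/235 is the price-level change implied by the slide, not an official CPI value:

```python
nominal_1980 = 235.0                # average wage in 1980 (from the slide)
nominal_1992 = 345.0                # average wage in 1992 (from the slide)
equiv_1980_in_1992_dollars = 388.0  # 1980 wage adjusted to 1992 dollars (slide)

# Price-level ratio 1992/1980 implied by the slide's numbers (about 1.65).
price_ratio = equiv_1980_in_1992_dollars / nominal_1980

# Express the 1992 wage in 1980 dollars so the two years are comparable.
real_1992_in_1980_dollars = nominal_1992 / price_ratio

print(f"Implied price ratio 1992/1980: {price_ratio:.3f}")
print(f"1992 wage in 1980 dollars: ${real_1992_in_1980_dollars:.2f} "
      f"vs. ${nominal_1980:.2f} actually earned in 1980")
```

Comparing in either direction gives the same verdict: $345 in 1992 bought roughly what $209 bought in 1980, less than the $235 earned in 1980.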
Statistical Literacy
Consider raw numbers
“Planes get closer in midair as traffic control errors rise.”
18% increase in errors, 1997 to 1998.
Wow! A huge increase. Was flying much more dangerous in 1998 than in 1997?
Actual error rates: 4.8 errors per million in 1997, 5.5 errors per million in 1998.
0.7 more errors per million isn’t nearly as alarming.
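The same two rates can be summarized as an absolute change or a relative change. A small sketch using the slide's rates:

```python
rate_1997 = 4.8  # errors per million (from the slide)
rate_1998 = 5.5  # errors per million (from the slide)

# Absolute change: how many extra errors per million.
absolute_change = rate_1998 - rate_1997
# Relative change: the increase as a fraction of the 1997 rate.
relative_change = absolute_change / rate_1997

print(f"Absolute change: {absolute_change:.1f} more errors per million")
print(f"Relative change in the rate: {relative_change:.1%}")
```

A double-digit percentage increase on a small base can sound alarming even when the absolute change is less than one error per million.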
Statistical Literacy
Use the proper base
80% of all accidents happen within 10 miles of home.
Well, duh! Most driving is within 10 miles of home, so most accidents should occur within 10 miles of home.
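A sketch with made-up numbers shows why the 80% figure says nothing about danger by itself. It assumes, hypothetically, that 80% of miles are driven near home and that the accident risk per mile is the same everywhere; under those assumptions, 80% of accidents happen near home anyway:

```python
# Hypothetical assumptions, not real traffic data.
miles_near_home = 8_000      # miles driven within 10 miles of home
miles_far_from_home = 2_000  # miles driven farther away
accidents_per_mile = 0.0005  # identical risk per mile in both places

accidents_near = miles_near_home * accidents_per_mile
accidents_far = miles_far_from_home * accidents_per_mile
share_near = accidents_near / (accidents_near + accidents_far)

# The proper base: accidents per mile driven in each location.
rate_near = accidents_near / miles_near_home
rate_far = accidents_far / miles_far_from_home

print(f"Share of accidents within 10 miles of home: {share_near:.0%}")
print(f"Accidents per mile, near vs. far: {rate_near} vs. {rate_far}")
```

The share of accidents tracks the share of driving; only the per-mile rates tell you whether driving near home is riskier.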
Statistical Literacy
Use the proper base
Rankings of 12th-grade students’ math performance in various regions (I.A.E.E.A. rankings, 1991)
Hungary ranked near the bottom of the list; Hong Kong ranked first.
But 50% of Hungary’s 12th graders took math, while only 3% of Hong Kong’s 12th graders did.
It is likely that only the mathematically inclined students took math in Hong Kong, while math education was more universal in Hungary.
Hungary’s average score should be lower because it includes the performance of weaker math students.
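A small simulation makes this selection effect concrete. It assumes, hypothetically, that both regions have the same distribution of math ability (normal, mean 500, SD 100) and that only the strongest students take math. Under those assumptions, the region where 3% take math posts a much higher average than the region where 50% do:

```python
import random
import statistics

rng = random.Random(2010)  # fixed seed so the run is reproducible

# Hypothetical: identical ability distribution in both regions.
abilities = sorted((rng.gauss(500, 100) for _ in range(100_000)), reverse=True)

def mean_of_top(scores_desc, fraction):
    """Average score of the top `fraction` of students (scores sorted high to low)."""
    k = int(len(scores_desc) * fraction)
    return statistics.mean(scores_desc[:k])

top_3_percent = mean_of_top(abilities, 0.03)   # like Hong Kong: 3% take math
top_50_percent = mean_of_top(abilities, 0.50)  # like Hungary: 50% take math

print(f"Average if the top 3% take math:  {top_3_percent:.0f}")
print(f"Average if the top 50% take math: {top_50_percent:.0f}")
```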
Statistical Literacy
Good and bad graphs
Statistical Literacy
What to do?
Ask yourself as many questions as possible.
Try to think of all possible interpretations.
Is it suspicious? If so, investigate.
Make sure that it makes sense.
Statistical Literacy
Swine Flu
General Idea
[Diagram: Population, Sample, Parameter, Statistic, Inference]
Some new vocabulary: population, sample, parameter, statistic, inference, bias, non-response bias, response bias, simple random sample, convenience sampling, frame coverage bias, judgment sampling, voluntary sampling, and probably others that I’ve missed.
Challenges to writing good questions
1. Defining objectives and specifying the kind of answers needed to meet the objectives of the question
2. Ensuring all respondents have a shared, common understanding of the question
3. Ensuring people are asked questions to which they know the answers
4. Asking questions respondents are able to answer in the terms required by the question
5. Asking questions respondents are willing to answer accurately
These come from the book by Floyd Fowler, Jr. Improving Survey Questions, Sage Publications, 1995
Dr. Jerry Reiter wrote a fantastic article about asking good questions. It is on Blackboard, and I encourage you to read it and use it as a resource.
Challenge to writing a good question #1
Objective: Assess the mental health of people with arthritis (Pincus, 1993).
To meet this objective, the researchers used the Minnesota Multiphasic Personality Inventory, which is a standard battery of questions used to judge whether people are depressed. The Inventory contains a series of true/false questions, for example (a) “I am about as able to work as I ever was.” (b) “I am in just as good physical health as my friends.” (c) “I have few or no pains.”
People who answer “false” many times are considered to be depressed.
These questions are not informative for the objective of interest. For example, people with arthritis answer “false” to these questions because of the physical nature of their disease, not necessarily because they are depressed.
Challenge to writing a good question #2
It is surprisingly easy to interpret questions differently, and surprisingly hard to write questions that are interpreted consistently.
Here are questions from two separate 1992 polls of U.S. residents (Moore, 1995):
(a) “Does it seem possible or does it seem impossible that the Nazi extermination of the Jews never happened?”
(b) “Does it seem possible to you that the Nazi extermination of the Jews never happened, or do you feel certain that it happened?”
For the first question, about 22% of respondents answered that it seems possible. But the question contains a double negative (“impossible” combined with “never happened”), which easily could confuse people.
In the second poll, which used much clearer phrasing, only 1% of people answered that it seems possible.
Challenge to writing a good question #3
Ability to answer the question
1. Is your health plan a PPO, HMO, or fee for service plan?
It is unreasonable to expect people to know the terms PPO (preferred provider organization), HMO (health maintenance organization), or fee for service.
What can be done instead is to ask people questions that lead them to describe their plan. The researcher then can categorize the responses once the survey is complete.
Challenge to writing a good question #4
The form of the answer
Here is a question that was asked of AIDS patients in a health survey:
“In the past 30 days, were you able to climb a flight of stairs with no difficulty, with some difficulty, or were you not able to climb stairs at all?”
The problem with this question is that AIDS patients cannot answer it! Their condition varies tremendously from day to day. Some days they have the strength to climb stairs, and other days they do not.
Challenge to writing a good question #5
It’s surprising how many questions people are unwilling to answer truthfully, or even answer at all, because they don’t want to be perceived as doing something socially undesirable. For example:
“Did you vote in the presidential election of November 2000?”
“How many alcoholic drinks did you have altogether yesterday?”
Steps to running a survey
1. Establish the target population (this isn’t always easy)
2. Obtain a sampling frame (this can be very difficult)
3. Select a sample (this can be difficult)
4. Obtain data from the sampled units (this can be difficult)
In addition to being difficult, all of the above can be very expensive.
Misspecifying the target population
1994 Democratic gubernatorial primary in Arizona
All polls predicted that Eddie Basha would trail the front-runner by at least 9 points.
Result of the election: Basha won.
Target population used in the polls: registered voters who had voted in previous primaries
Surveys that use the census as a sampling frame
The U.S. census is often used as the frame for many federal and social surveys; the target population here is people living in the U.S.
The U.S. census misses some people. Can you think of any examples?
Samples taken from such a frame are non-representative even before sampling.
Selecting samples
Units sampled should be representative of the target population. How do we ensure this?
Select a subset of units from the frame at random. The most common method is to obtain a “simple random sample,” in which every group of n units in the frame has the same chance of being chosen.
If the random sample is large enough, its characteristics should mirror those of the population in the frame.
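A minimal sketch of drawing a simple random sample with Python's standard library. The frame is hypothetical (10,000 made-up units, each with a numeric value); `random.Random.sample` draws without replacement, so every subset of size n is equally likely. The population mean plays the role of the parameter and the sample mean is the statistic used to estimate it:

```python
import random
import statistics

rng = random.Random(19)  # fixed seed for reproducibility

# Hypothetical frame: 10,000 units, each with a measured value.
frame = [{"id": i, "value": rng.uniform(0, 100)} for i in range(10_000)]

# Simple random sample of n = 400 units, drawn without replacement.
n = 400
sample = rng.sample(frame, n)

parameter = statistics.mean(u["value"] for u in frame)   # population mean
statistic = statistics.mean(u["value"] for u in sample)  # sample mean

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample mean (statistic):     {statistic:.2f}")
```

Rerunning with different seeds gives sample means that scatter closely around the population mean, which is what "representative on average" means for a random sample.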
Obtaining survey data
Remember the following when designing a survey:
It is imperative that the purpose of the survey is stated clearly.
Confidentiality should be promised and kept. At ISU there is a group that checks that survey confidentiality requirements are met.
The method of asking questions should be the same for all sampled units.
Unreliable methods of selecting samples
What follows are examples of how NOT to select a sample:
Convenience sampling: Picking units that are easy to measure
Judgement sampling: Picking units you judge as representative of the population
Voluntary response sampling: Picking units who respond voluntarily
What are some examples of each?
Additional potential pitfalls
Nonresponse bias: units that do not respond differ from those that do, and they will be underrepresented.
Frame coverage bias: the frame doesn’t include all of the target population.
Can we think of some examples?
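Nonresponse bias can be worked out directly when response rates differ between groups. The numbers below are hypothetical: two equally sized groups whose members answer “yes” at different rates and also respond to the survey at different rates:

```python
# Hypothetical population split into two equally sized groups.
groups = {
    # share of population, P(answer "yes"), P(responds to the survey)
    "A": {"share": 0.5, "yes": 0.70, "respond": 0.60},
    "B": {"share": 0.5, "yes": 0.30, "respond": 0.20},
}

# True population proportion answering "yes".
true_yes = sum(g["share"] * g["yes"] for g in groups.values())

# Among respondents only: each group counts in proportion to share * response rate.
respondent_mass = sum(g["share"] * g["respond"] for g in groups.values())
observed_yes = sum(
    g["share"] * g["respond"] * g["yes"] for g in groups.values()
) / respondent_mass

print(f"True proportion yes:              {true_yes:.0%}")
print(f"Proportion yes among respondents: {observed_yes:.0%}")
```

Group A responds three times as often, so the respondents overstate “yes” (60% instead of 50%) no matter how many surveys are mailed out.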
Example of a voluntary response survey
Nightline call-in poll:
Ted Koppel asked people to call his show to express their opinion on whether the United Nations should continue to have its headquarters in New York.
186,000 people called in, with 67% saying no. An independent random sample found 72% saying yes.
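How can 67% of callers say “no” when only about 28% of the population holds that view? A back-of-the-envelope sketch, assuming the independent random sample (72% yes, 28% no) reflects the population and that each side calls in at its own constant rate (a simplification):

```python
pop_yes, pop_no = 0.72, 0.28  # from the independent random sample
callin_no = 0.67              # share of callers saying "no"
callin_yes = 1 - callin_no

# Callers "no" / callers "yes" = (pop_no * rate_no) / (pop_yes * rate_yes).
# Solve for rate_no / rate_yes: how much more often "no" supporters call.
rate_ratio = (callin_no / callin_yes) * (pop_yes / pop_no)

print(f"'No' supporters would need to call about {rate_ratio:.1f} times as often")
```

A committed minority only needs to be about five times as eager to call in to flip the apparent result, and 186,000 calls do nothing to fix that.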
Examples of problematic survey designs
Shere Hite’s book, Women and Love: A Cultural Revolution in Progress (1987), claims:
84% of women “not satisfied emotionally with their relationships” (pg. 804)
95% of women “report forms of emotional and psychological harassment from men with whom they are in love relationships” (pg. 810)
70% of women “married five or more years are having sex outside of their marriages” (pg. 856)
Hite’s survey
To whom did she send the survey?
100,000 questionnaires were mailed to professional women’s groups, counseling centers, church societies, and senior citizens’ centers.
Her target population was women. What was her actual represented population?
Hite’s survey
What did the survey look like?
127 essay questions on the questionnaire
4.5% of these questionnaires (about 4,500) were returned
What was not taken into account?
Hite’s survey
How did she ask the questions?
Questions used vague words like “love”, which people interpret in different ways.
Questions were leading: “Does your husband/lover treat you as an equal? Or are there times when he seems to you as an inferior? Leave you out of decisions? Act superior?” (pg. 795)
Another problematic survey design
The article “Abortion Rights Groups Surveying Voters’ Views”, by Jack Coffman, appeared in the December 26, 1989 issue of the St. Paul Pioneer Press Dispatch.
Problems with Minnesota survey
Random sampling comment 1
Say you collect data on units using a method other than random sampling, and you know these data are not representative of the population of interest. Then you take a random sample from these collected data. Claim: this random sample is representative of the population.
Wrongo!!
Large random samples are representative of the population in the frame. Effectively, this method uses the unrepresentative collected data as the frame.
By randomly sampling from an unrepresentative sample, you just get a smaller unrepresentative sample.
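A short simulation illustrates the point. Everything here is hypothetical: a population of values with mean 4, a collection method that only captures units with values above 6 (think of an exercise survey that only reached gym members), and a random subsample drawn from that collection:

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical population: 100,000 values (e.g., hours of exercise per week).
population = [rng.expovariate(1 / 4) for _ in range(100_000)]

# Unrepresentative collection: only units with a value above 6 were captured.
collected = [x for x in population if x > 6]

# Attempted "fix": a random sample from the collected data.
subsample = rng.sample(collected, 500)

pop_mean = statistics.mean(population)
collected_mean = statistics.mean(collected)
subsample_mean = statistics.mean(subsample)

print(f"Population mean:       {pop_mean:.2f}")
print(f"Collected data mean:   {collected_mean:.2f}")
print(f"Random subsample mean: {subsample_mean:.2f}")
```

The random subsample faithfully reproduces the collected data, bias included; its mean sits near the collected mean, far above the population mean.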
Random sampling comment 2
Say you obtain data that are representative of the target population. Should you take a random sample from these collected data? This question arises when researchers use data collected by others, for example in a Stat 101 project.
No!
If you have a representative sample, use it. Sub-sampling just reduces the amount of data you work with, which makes your estimates less precise.
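Sub-sampling costs precision. For a simple random sample, the standard error of the sample mean is approximately SD/√n (ignoring the finite-population correction), so discarding data inflates it. A sketch with a hypothetical standard deviation of 10:

```python
import math

def standard_error(sd, n):
    """Approximate standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

sd = 10.0  # hypothetical population standard deviation
for n in (1_000, 100):
    print(f"n = {n:>5}: standard error = {standard_error(sd, n):.2f}")
```

Cutting a sample of 1,000 down to 100 roughly triples the standard error (from about 0.32 to 1.00) while buying no reduction in bias.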
Random sampling comment 3
A census is a measurement of outcomes for all units in the population. For example, the U.S. government conducts a census of the population every 10 years to apportion seats in the House of Representatives. It also takes censuses of agriculture and business.
Why do a survey instead of a census?
Surveys are cheaper: they require contacting far fewer people.
Survey results can be obtained more quickly, for the same reason. This matters because we want to base policy decisions on current answers, not answers that are months or years old.
Surveys can be more accurate: with fewer people to contact, there are fewer problems with interviewer effects and non-response bias.
Upshot: less data of high quality is better than more data of poor quality.
Random sampling comment 4
Most major surveys are not simple random samples; they involve multiple stages of random selection.
For example, randomly pick 100 cities. From these cities, randomly pick 500 households, then randomly pick 1 person from each household.
Data collected like this are NOT representative of the population (for example, a person who lives alone is more likely to be picked than a person in a household of five). However, because units are selected randomly, a statistician can account for the non-representativeness.
This is done by assigning each observation a weight that reflects how many units it represents in the population. A good question to ask here is: where do the weights come from?
Generally, when analyzing data from surveys that are not simple random samples, it is wise to consult a professional statistician.
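One common answer to “where do the weights come from?” is the inverse of each unit's probability of selection. A minimal sketch for a design like the one above, assuming (hypothetically) that cities are picked with equal probability, that 500 households are drawn within each sampled city, and that the frame has 1,000 cities with 50,000 households in the city shown. All function names and numbers are illustrative:

```python
def selection_probability(n_cities_total, households_in_city, people_in_household,
                          cities_sampled=100, households_sampled=500):
    """P(person selected) in a hypothetical three-stage design: cities chosen
    with equal probability, then households within the city, then one person."""
    p_city = cities_sampled / n_cities_total
    p_household = households_sampled / households_in_city
    p_person = 1 / people_in_household
    return p_city * p_household * p_person

def design_weight(*args, **kwargs):
    """Number of population units this respondent stands for: 1 / probability."""
    return 1 / selection_probability(*args, **kwargs)

# Two hypothetical respondents in the same city.
w_single = design_weight(1_000, 50_000, people_in_household=1)
w_family = design_weight(1_000, 50_000, people_in_household=4)
print(f"Weight, 1-person household: {w_single:,.0f}")
print(f"Weight, 4-person household: {w_family:,.0f}")

# Weighted estimate of a proportion from a few hypothetical (answer, weight) pairs.
responses = [(1, w_single), (0, w_family), (1, w_family)]
weighted_mean = sum(y * w for y, w in responses) / sum(w for _, w in responses)
unweighted_mean = sum(y for y, _ in responses) / len(responses)
print(f"Weighted proportion: {weighted_mean:.2f} (unweighted: {unweighted_mean:.2f})")
```

The person in the 4-person household had one quarter the chance of selection, so they stand for four times as many people; ignoring the weights would overcount people who live alone.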
Upshot
Conducting GOOD surveys isn’t trivial.
It requires tons of work in the preparation phase:
Identifying the population
Obtaining a representative sampling frame
Creating a well-written questionnaire
There is lots of work in collecting data.
Depending on the above, analysis may be difficult.
Final projects