
Surveys - Statistical Sciencegp42/sta101/notes/FPP19_2pp.pdfDamned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists by Joel Best Statistics: Concepts


1/18/10


FPP Chapter 19

Surveys

Plan of Study

Conducting surveys is not trivial. We will cover these three main topics today.

1.  Issues in questionnaire design

2.  Methods for selecting units to survey

3.  Administration of surveys


Kind of a short (maybe long) tangent

Statistical Literacy

Statistics are often used to validate arguments, philosophies, public policy decisions, etc.

It behooves us to be educated consumers of statistics.

Lying with statistics

  “There are three kinds of lies: lies, damned lies and statistics”

  “If you torture data enough it will tell you what you want it to.”

Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists by Joel Best

Statistics: Concepts and Controversies by David Moore


Statistical Literacy

An example from Joel Best’s book

In a dissertation prospectus, a student reported the following statement, which had appeared in a reputable peer-reviewed journal in 1995:

 “Every year since 1950 the number of American children gunned down has doubled”

  Best says that “I think that it may be the worst – that is, the most inaccurate -- social statistic ever”

 What’s wrong with it?
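One way to see the problem is to follow the doubling through. The sketch below assumes, purely for illustration, that a single child was gunned down in 1950; the starting value and the 300 million threshold (a round figure larger than the U.S. population of the period) are assumptions, not numbers from the slide.

```python
# Assumed starting point: 1 child in 1950 (illustrative only).
start_year = 1950
count_1995 = 1 * 2 ** (1995 - start_year)  # doubling once per year for 45 years

# First year in which the doubled count would exceed a round 300 million.
year, count = start_year, 1
while count <= 300_000_000:
    year += 1
    count *= 2

print(f"Implied count in 1995: {count_1995:,}")
print(f"Count first exceeds 300 million in {year}")
```

Even from the smallest possible starting value, annual doubling produces impossible numbers within a few decades, so the claim cannot be literally true.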

Statistical Literacy

Understand the measure

Two reports in the same newspaper (2/90):

  Teens’ sexual activity on the rise.

  Teens’ sexual activity on the decline.

First article: average age at first intercourse (17.2 females, 16.5 males), which was younger.

Second article: average number of partners (6) and frequency (3 per month), which were lower.

The two analyses really tackled different questions, but the conclusions were presented the same way.

  Always find out what variable is being measured when judging or making statistical conclusions.


Statistical Literacy

Which U.S. states are the worst polluters?

The E.P.A. in 1993 ranked New Jersey 22nd worst in the nation in release of toxic chemicals. The E.P.A. used total pounds (38.6 million).

 When using pounds per square mile, New Jersey was the 4th worst in the nation.

  The two analyses assess pollution, but they use different variables.

  Pay attention to variable used when judging statistical claims

Statistical Literacy

“Statistics are no substitute for judgments” (Henry Clay)


Statistical Literacy

Beware hidden agendas

Survey paid for by a disposable diaper company:

  “It is estimated that disposable diapers account for less than 2% of the trash in today’s landfills. In contrast, beverage containers, third-class mail and yard wastes are estimated to account for about 21% of the trash in landfills. Given this, in your opinion would it be fair to ban disposable diapers?”

The question is prefaced with claims that favor only diapers, which will lead respondents’ answers.

When judging surveys, get the exact wording of the questions.

Statistical Literacy

Beware hidden agendas 2

“Levi’s 501 Report, a fall fashion survey conducted annually on 100 campuses”, sponsored by Levi’s

Report: 90% of college students chose Levi’s 501 jeans as “in” on campus

Levi’s 501 jeans were the only type of blue jeans on the list


Statistical Literacy

Beware hidden agendas 3

Advertisement for Triumph cigarettes:

  “TRIUMPH BEATS MERIT – an amazing 60% said that Triumph tastes as good or better than Merit.”

Actual survey results: 36% Triumph, 40% Merit, 24% no preference.

The company reported the survey results in the way that made Triumph look as good as possible.
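The 60% in the advertisement can be reproduced from the actual results by lumping the "no preference" group in with Triumph. A minimal sketch of that arithmetic, using only the three percentages quoted on the slide (the variable names are just labels):

```python
# Survey results quoted on the slide (percent of respondents).
prefer_triumph = 36
prefer_merit = 40
no_preference = 24

# "As good or better" counts the no-preference group for whichever brand is advertising.
triumph_as_good_or_better = prefer_triumph + no_preference
merit_as_good_or_better = prefer_merit + no_preference

print(f"Triumph as good or better than Merit: {triumph_as_good_or_better}%")
print(f"Merit as good or better than Triumph: {merit_as_good_or_better}%")
```

By the same accounting, Merit could have advertised an even larger figure, which is why the exact question and the full breakdown matter.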

Statistical Literacy

Economic questions

Did wages increase during the Reagan-Bush years (1980-1992)?

Average wages in private nonagricultural production: $235 in 1980, $345 in 1992.

Adjusted for inflation, the 1980 wage was equivalent to $388 in 1992 dollars. People were worse off!

Consider inflation when judging statistical statements about money.
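The comparison can be made explicit by computing the nominal and the inflation-adjusted change side by side. This sketch uses only the three dollar figures from the slide; the price ratio is merely what those figures imply, not an official index value.

```python
# Figures quoted on the slide.
wage_1980_nominal = 235.0          # 1980 wage in 1980 dollars
wage_1992_nominal = 345.0          # 1992 wage in 1992 dollars
wage_1980_in_1992_dollars = 388.0  # 1980 wage restated in 1992 dollars

# Price ratio implied by the slide's own numbers (not looked up from a price index).
implied_price_ratio = wage_1980_in_1992_dollars / wage_1980_nominal

nominal_change = wage_1992_nominal / wage_1980_nominal - 1
real_change = wage_1992_nominal / wage_1980_in_1992_dollars - 1

print(f"Implied price ratio 1992/1980: {implied_price_ratio:.2f}")
print(f"Nominal change: {nominal_change:+.1%}")
print(f"Real change:    {real_change:+.1%}")
```

The nominal wage rose by nearly half, while the inflation-adjusted wage fell by roughly a tenth.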


Statistical Literacy

Consider raw numbers

“Planes get closer in midair as traffic control errors rise.”

18% increase in errors, 1997 to 1998.

Wow! Huge increase. Flying was much more dangerous in 1998 than in 1997.

Actual error rates:

  5.5 errors per million in 1998

  4.8 errors per million in 1997

0.7 more errors per million isn’t as bad.
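A short sketch of the difference between an absolute and a relative change, using the two rates on the slide. Note that the relative change in these rates comes out below the quoted 18%; the slide does not say what the 18% was computed from (possibly raw error counts), so no attempt is made to reproduce it here.

```python
# Error rates quoted on the slide (errors per million).
rate_1997 = 4.8
rate_1998 = 5.5

absolute_change = rate_1998 - rate_1997        # in errors per million
relative_change = absolute_change / rate_1997  # as a fraction of the 1997 rate

print(f"Absolute change: {absolute_change:.1f} errors per million")
print(f"Relative change: {relative_change:.1%}")
```

A double-digit percentage increase and a change of less than one error per million describe the same data; the raw numbers show how small the underlying risk is.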

Statistical Literacy

Use the proper base

80% of all accidents happen within 10 miles of home

Well, duh! Most driving is within 10 miles of home, so most accidents should occur within 10 miles of home.


Statistical Literacy

Use the proper base

12th grade students’ ranking on math performance in various regions (I.A.E.E.A. rankings, 1991)

  Hungary ranked near the bottom of the list.

  Hong Kong ranked first.

But …

  50% of Hungary’s 12th graders took math

  3% of Hong Kong’s 12th graders took math

It is likely that only the mathematically inclined students took math in Hong Kong, while math education was more universal in Hungary.

The average score in Hungary should be lower because it includes the performance of weaker math students.
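The selection effect can be illustrated with a small simulation. Everything here is hypothetical: both regions are given the same simulated ability distribution, and "taking math" is modeled as strictly the top 50% or top 3% of students, which exaggerates how selection works in practice. The point is only that the tested group's average depends heavily on who is tested.

```python
import random
import statistics


def mean_of_top_fraction(scores, fraction):
    """Average score among the top `fraction` of students."""
    ranked = sorted(scores, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return statistics.mean(ranked[:k])


rng = random.Random(0)  # fixed seed so the run is reproducible
# Hypothetical ability scores, identical distribution for both regions.
ability = [rng.gauss(500, 100) for _ in range(20_000)]

all_students = statistics.mean(ability)
half_take_math = mean_of_top_fraction(ability, 0.50)  # like Hungary's 50%
few_take_math = mean_of_top_fraction(ability, 0.03)   # like Hong Kong's 3%

print(f"Everyone tested:   {all_students:.0f}")
print(f"Top 50% tested:    {half_take_math:.0f}")
print(f"Top 3% tested:     {few_take_math:.0f}")
```

With identical underlying students, the region that tests only a small, selected group reports the higher average.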

Statistical Literacy  Good and bad graphs


Statistical Literacy

What to do?

  Ask yourself as many questions as possible

  Try to think of all possible interpretations

  Is it suspicious?

  If it is suspicious, then investigate

  Make sure that it makes sense

Statistical Literacy  Swine Flu


General Idea

[Figure: diagram relating Population, Sample, Parameter, Statistic, and Inference]

Some new vocabulary

  Population

  Sample

  Parameter

  Statistic

  Inference

  Bias

  Non-response bias

  Response bias

  Simple random sample

  Convenience sampling

  Frame coverage bias

  Judgment sampling

  Voluntary sampling

  Probably others that I’ve missed


Plan of Study

Conducting surveys is not trivial. We will cover these three main topics today.

1.  Issues in questionnaire design

2.  Methods for selecting units to survey

3.  Administration of surveys

Challenges to writing good questions

1.  Defining objectives and specifying the kind of answers needed to meet the objectives of the question

2.  Ensuring all respondents have a shared, common understanding of the question

3.  Ensuring people are asked questions to which they know the answers

4.  Asking questions respondents are able to answer in the terms required by the question

5.  Asking questions respondents are willing to answer accurately

These come from the book by Floyd Fowler, Jr., Improving Survey Questions, Sage Publications, 1995.

Dr. Jerry Reiter wrote a fantastic article about asking good questions. It is on Blackboard, and I encourage you to read it and use it as a resource.


Challenge to writing a good question #1

Objective: Assess the mental health of people with arthritis (Pincus, 1993).

To meet this objective, the researchers used the Minnesota Multiphasic Personality Inventory, which is a standard battery of questions used to judge whether people are depressed. The Inventory contains a series of true/false questions, for example

  (a) “I am about as able to work as I ever was.”

  (b) “I am in just as good physical health as my friends.”

  (c) “I have few or no pains.”

People who answer “false” many times are considered to be depressed.

These questions are not informative for the objective of interest.

  For example, people with arthritis answer “false” to the questions because of the nature of their disease.

Challenge to writing a good question #2

It is surprisingly easy to interpret questions differently, and it is surprisingly hard to write questions that are interpreted consistently.

Here are questions from two separate 1992 polls of U.S. residents (Moore, 1995):

  (a) “Does it seem possible or does it seem impossible that the Nazi extermination of the Jews never happened?”

  (b) “Does it seem possible to you that the Nazi extermination of the Jews never happened, or do you feel certain that it happened?”

For the first question, about 22% of respondents answered that it seems possible.

  But the question has a double negative, which could easily confuse people.

In the second poll, which has a much clearer phrasing, only 1% of people answered that it seems possible.


Challenge to writing a good question #3

Ability to answer the question

  1. Is your health plan a PPO, HMO, or fee for service plan?

It is unreasonable to expect people to know the terms PPO (preferred provider organization), HMO (health maintenance organization), or fee for service.

Instead, ask people questions that lead them to describe their plan. The researcher can then categorize the responses once the survey is complete.

Challenge to writing a good question #4

The form of the answer

Here is a question that was asked of AIDS patients in a health survey.

  “In the past 30 days, were you able to climb a flight of stairs with no difficulty, with some difficulty, or were you not able to climb stairs at all?”

The problem with this question is that AIDS patients cannot answer it! Their condition varies tremendously from day to day. Some days they have the strength to climb stairs, and other days they do not.


Challenge to writing a good question #5

It’s surprising how many questions people are unwilling to answer truthfully, or even answer at all, because they don’t want to be perceived as doing something socially undesirable.

  “Did you vote in the presidential election of November 2000?”

  “How many alcoholic drinks did you have altogether yesterday?”

Steps to running a survey

1.  Establish the target population (this isn’t always easy)

2.  Obtain a sampling frame (this can be very difficult)

3.  Select a sample (this can be difficult)

4.  Obtain data from the sampled units (this can be difficult)

In addition to being difficult, all of the above can be very expensive.


Misspecifying the target population

1994 Democratic gubernatorial primary in Arizona

All polls predicted Eddie Basha would trail the front-runner by at least 9 points

Result of election: Basha won

Target population used in polls: registered voters who had voted in previous primaries

Surveys that use the census as a sampling frame

The U.S. census is often used as the frame for federal and social surveys

  The target population here is people living in the U.S.

The U.S. census misses some people

  Can you think of any examples?

Because the frame itself omits people, it is unrepresentative of the target population before any sample is drawn


Selecting samples

Units sampled should be representative of the target population

  How do we ensure this?

Select a subset of units from the frame at random

  The most common method is to obtain a “simple random sample”

If the random sample is large enough, it should have characteristics that mirror the characteristics of the population in the frame.
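Drawing a simple random sample from a frame can be sketched in a few lines. The frame below is a hypothetical list of 1,000 ID numbers, and the sample size and seed are arbitrary choices for illustration.

```python
import random

# Hypothetical sampling frame: ID numbers for 1,000 units.
frame = list(range(1, 1001))

rng = random.Random(19)  # fixed seed so the draw can be reproduced
sample_size = 50

# random.sample draws without replacement, so every subset of
# `sample_size` units from the frame is equally likely to be chosen.
srs = rng.sample(frame, sample_size)

print(sorted(srs)[:10])
```

The sample can only be as good as the frame: units missing from `frame` have no chance of being selected.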

Obtaining survey data

Remember the following when designing a survey

  It is imperative that the purpose of the survey is stated clearly

  Confidentiality should be promised and kept

    At ISU there is a group that verifies that a survey’s confidentiality requirements are met

  The method for asking questions should be the same for all sampled units


Unreliable methods of selecting samples

What follows are examples of how NOT to select a sample

  Convenience sampling: picking units that are easy to measure

  Judgment sampling: picking units you judge as representative of the population

  Voluntary response sampling: picking units who respond voluntarily

What are some examples of each?

Additional potential pitfalls

Nonresponse bias:

  Units that do not respond differ from those that do. The nonrespondents will be underrepresented.

Frame coverage bias:

  The frame doesn’t include all of the target population

Can we think of some examples?


Example of voluntary response survey

Nightline call-in poll:

Ted Koppel asked people to call his show to express their opinion on whether the United Nations should continue to have its headquarters in New York

186,000 people called in, with 67% saying no.

Independent random sample: 72% said yes.

Examples of problematic survey designs

Shere Hite’s book, Women and Love: A Cultural Revolution in Progress (1987), claims:

  84% of women “not satisfied emotionally with their relationships” (pg. 804)

  95% of women “report forms of emotional and psychological harassment from men with whom they are in love relationships” (pg. 810)

  70% of women “married five or more years are having sex outside of their marriages” (pg. 856)


Hite’s survey

To whom did she send a survey?

100,000 questionnaires mailed to professional women’s groups, counseling centers, church societies, and senior citizens’ centers.

Her target population was women. What was her actual represented population?

Hite’s survey

What did the survey look like?

127 essay questions on the questionnaire

4.5% of the questionnaires were returned

What was not taken into account?


Hite’s survey

How did she ask the questions?

Questions use vague words like “love”.

  People have different interpretations of such words

Questions were leading

  “Does your husband/lover treat you as an equal? Or are there times when he seems to treat you as an inferior? Leave you out of decisions? Act superior?” (pg. 795)

Another problematic survey design

The article “Abortion Rights Groups Surveying Voters’ Views”, by Jack Coffman, appeared in the December 26, 1989 issue of the St. Paul Pioneer Press Dispatch.

Problems with the Minnesota survey


Random sampling comment 1

Say you collect data on units using a method other than a random sample, and you know these data are not representative of the population of interest. Then, you take a random sample from these collected data. Claim: this random sample is representative of the population.

Wrongo!!

Large random samples are representative of the population in the frame.

  Effectively, this method uses the unrepresentative, collected data as a frame.

  By randomly sampling from an unrepresentative sample, you just get a smaller unrepresentative sample.
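A small simulation makes the point concrete. The population, the response probabilities, and the subsample size below are all invented for illustration; the only claim is qualitative: the random subsample inherits the bias of the data it was drawn from.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 people, 30% of whom would answer "yes" (coded 1).
population = [1] * 3000 + [0] * 7000
true_share = statistics.mean(population)

# Hypothetical unrepresentative collection: "yes" people end up in the data
# with probability 0.8, "no" people with probability 0.2 (assumed values).
collected = [x for x in population if rng.random() < (0.8 if x == 1 else 0.2)]
collected_share = statistics.mean(collected)

# A simple random sample taken from the collected (biased) data.
subsample = rng.sample(collected, 500)
subsample_share = statistics.mean(subsample)

print(f"Population share of yes: {true_share:.2f}")
print(f"Collected data share:    {collected_share:.2f}")
print(f"Random subsample share:  {subsample_share:.2f}")
```

The subsample tracks the collected data, not the population, because the collected data served as its frame.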

Random sampling comment 2

Say you obtain data that are representative of the target population. Should you take a random sample from these collected data?

  This question arises when researchers use data collected by others, for example in a Stat 101 project.

No!

  If you have a representative sample, use it.

  This sub-sampling method just reduces the amount of data you work with.


Random sampling comment 3

A census is a measurement of outcomes for all units in the population. For example, the U.S. Government does a census of the population every 10 years to apportion seats in the House of Representatives. It also takes censuses of agriculture and business.

Why do a survey instead of a census?

Surveys are cheaper

  They require far fewer people to contact

Survey results can be obtained more quickly

  Same reason as above

  This is important because we want to make policy decisions on current answers, not answers that are months or years old.

Surveys can be more accurate

  Fewer people to contact means fewer problems with interviewer effects and non-response bias

  Upshot: less data of high quality is better than more data of poor quality

Random sampling comment 4

Most major surveys are not simple random samples

  They involve multiple stages of random selection

  e.g., randomly pick 100 cities. From these cities randomly pick 500 households, then randomly pick 1 person from each household

Data collected like this are NOT representative of the population. However, because units are selected randomly, statisticians can account for the non-representation.

  This is done by assigning a weight to each observation that reflects how many units it represents in the population

  A good question to ask here would be: Where do the weights come from?

Generally, when analyzing data from surveys that are not simple random samples, it is wise to contact a professional statistician
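As a rough illustration of where weights can come from, consider only the last stage of the example: one person picked at random from each selected household. Assuming households were selected with equal probability, a respondent from a household of size k had a 1/k chance of being the one picked and so stands in for k people. The respondents below are made up, and real survey weights also reflect the earlier selection stages and nonresponse adjustments, which this sketch ignores.

```python
# Hypothetical respondents: (household_size, answer), answer 1 = "yes", 0 = "no".
respondents = [(1, 1), (1, 1), (2, 0), (4, 0), (2, 1)]

# Assumption: equal-probability household selection, one person chosen at
# random per household, so weight = 1 / (1 / household_size) = household_size.
weights = [size for size, _ in respondents]
answers = [answer for _, answer in respondents]

unweighted = sum(answers) / len(answers)
weighted = sum(w * a for w, a in zip(weights, answers)) / sum(weights)

print(f"Unweighted share of yes: {unweighted:.2f}")
print(f"Weighted share of yes:   {weighted:.2f}")
```

People in large households are less likely to be picked, so without weights they are underrepresented; the weighted estimate corrects for that under the stated assumptions.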


Upshot

Conducting GOOD surveys isn’t trivial

Requires tons of work in the preparation phase

  Identifying the population

  Obtaining a representative sampling frame

  Creating a well-written questionnaire

Lots of work collecting data

Depending on the above, analysis may be difficult

Final projects