Josh Pasek, Jon A. Krosnick - aapor.org

Measuring the Correlates of Intent to Participate and Participation in the Census and Trends in These Correlates: Comparisons of RDD Telephone and Non-probability Sample Internet Survey Data

Josh Pasek, Stanford & University of Michigan

Jon A. Krosnick, Stanford & U.S. Census Bureau

Monitoring Threats to Census Compliance

U.S. Census Bureau commissioned surveys in late 2009 and early 2010 to determine factors that might cause people to complete or not complete the form

To Assess This . . .

• Examine what proportion of individuals hold particular beliefs about the Census

• Test whether those beliefs relate to intent to complete or completion of Census form

• Regularly examine prevalence of beliefs to assess emergent threats to Census completion

Two Synchronous Data Streams

• 2 data streams collected for Census Bureau

• RDD telephone (Gallup) and non-probability Internet (E-Rewards)

• 13 simultaneous weeks of data collection with > 900 interviews per data stream per week

• Very similar measures

Two Synchronous Data Streams

• Intent to complete Census (before and after forms mailed)

• Reported completion of Census (after forms mailed)

• Census will help/hurt respondent

• Locate illegal immigrants

• Trust confidentiality

• Time to fill out

• Importance of counting everyone

• Respondent's participation does not matter

Today's Question

Would These Two Data Streams Lead To The Same Conclusions?

Three Comparisons

• Proportions

• Who holds particular beliefs about Census?

• Relations between variables

• Which beliefs relate to expected and actual completion?

• Trends over time

• Do the surveys indicate that purported predictors changed similarly over time?

Weighting

• Unweighted

• Weights provided by both houses

• Weighted identically using anesrake (Pasek, 2010)

Both with and without matching on dates
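To make the idea of weighting both data streams identically more concrete, here is a minimal sketch of plain raking (iterative proportional fitting) in Python. It is not the anesrake implementation, which is an R package with its own variable-selection and weight-capping logic; the function name `rake`, the variables `sex` and `age`, and the target shares are all hypothetical and chosen only for illustration.

```python
def rake(rows, targets, max_iter=100, tol=1e-6):
    """Plain raking: adjust weights until weighted margins match the targets.

    rows    -- list of dicts, one per respondent, holding categorical values
    targets -- {variable: {category: population share}}, shares sum to 1
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            # Current weighted total in each category of this variable.
            current = {}
            for row, w in zip(rows, weights):
                current[row[var]] = current.get(row[var], 0.0) + w
            # Scale each respondent so the category hits its target share.
            for idx, row in enumerate(rows):
                factor = shares[row[var]] * total / current[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[idx] *= factor
        if max_change < tol:  # every margin already matches its target
            break
    return weights


# Hypothetical sample that over-represents women and older respondents.
sample = (
    [{"sex": "F", "age": "<45"}] * 30
    + [{"sex": "F", "age": "45+"}] * 30
    + [{"sex": "M", "age": "<45"}] * 10
    + [{"sex": "M", "age": "45+"}] * 30
)
population = {
    "sex": {"F": 0.5, "M": 0.5},
    "age": {"<45": 0.5, "45+": 0.5},
}
sample_weights = rake(sample, population)
```

Applying the same function with the same targets to each data stream is one way to read "weighted identically": any remaining difference between streams then cannot be attributed to different weighting recipes.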

Proportions

• Absolute difference between modal categories of non-demographic variables

• | proportion in RDD - proportion in Internet |

• Bootstrap to test significance

• Assessed within each week
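The comparison described above can be sketched as follows. This is an illustrative percentile bootstrap on unweighted 0/1 indicators; the slides do not say which bootstrap variant was used, so the resampling scheme, the function name `bootstrap_diff`, and the sample sizes of 1,000 per stream are assumptions. The proportions in the usage example are the Week 3 figures reported later in the deck.

```python
import random


def bootstrap_diff(rdd, internet, n_boot=1000, seed=0):
    """Absolute difference in proportions with a 95% percentile bootstrap CI.

    rdd, internet -- lists of 0/1 flags: did the respondent give the modal answer?
    Returns (observed absolute difference, (ci_low, ci_high), significant).
    """
    rng = random.Random(seed)
    observed = abs(sum(rdd) / len(rdd) - sum(internet) / len(internet))
    diffs = []
    for _ in range(n_boot):
        # Resample each data stream independently, with replacement.
        r = rng.choices(rdd, k=len(rdd))
        i = rng.choices(internet, k=len(internet))
        diffs.append(sum(r) / len(r) - sum(i) / len(i))
    diffs.sort()
    ci_low = diffs[int(0.025 * n_boot)]
    ci_high = diffs[int(0.975 * n_boot) - 1]
    # Significant if the interval for the signed difference excludes zero.
    significant = ci_low > 0 or ci_high < 0
    return observed, (ci_low, ci_high), significant


# Hypothetical respondent-level data matching 54.4% vs. 40.5% agreement.
rdd_flags = [1] * 544 + [0] * 456
internet_flags = [1] * 405 + [0] * 595
result = bootstrap_diff(rdd_flags, internet_flags)
print(result)
```

Running this once per variable and per week would give the set of differences that the histograms summarize.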

Proportions: Differences Between Data Streams

[Figure: histogram of differences between data streams. x-axis: Difference in Percentage Points; y-axis: Frequency of Differences]

Week 10, Census can help you

RDD: 43.4%, Internet: 44.2%, Difference: 0.8 percentage points

Week 3, Respondent's participation does not matter - Agree

RDD: 54.4%, Internet: 40.5%, Difference: 13.9 percentage points

Week 12, Important to count everyone - Agree

RDD: 65.7%, Internet: 32.2%, Difference: 33.5 percentage points

Proportions: Differences Between Data Streams

[Figure: histogram of differences between data streams. x-axis: Difference in Percentage Points; y-axis: Frequency of Differences]

Difference (percentage points)   Share of data
> 5                              80%
> 10                             55%
> 15                             40%
> 20                             26%
> 25                             16%

Relations Between Variables

• Regressions predicting relevant outcomes

• Intent to complete Census

• Reported completion of Census

• Correlations between pairs of variables

Predicting Census Completion

Variable                                   RDD        Internet   Difference   Type
Married                                    .31*       .58**      .27          Same story, same magnitude
Age 25-34                                  -1.51***   -.68*      .83*         Same story, different magnitude
Don't have time to fill out - Disagree     .88**      .68        .19          Different story, same magnitude
Importance of counting everyone - Agree    .98*       -.37       1.27*        Different story, different magnitude

Numbers shown are from a logistic regression with all predictors simultaneously. * p<.05 | ** p<.01 | *** p<.001
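The four labels used on these slides can be read as a simple decision rule, sketched below. The rule is inferred from the four example rows rather than stated by the authors: "same story" is taken to mean the coefficient is significant in both data streams, and "different magnitude" to mean the difference between streams is itself significant. The requirement that both coefficients share a sign is an added assumption, and the function names are hypothetical.

```python
def parse_cell(cell):
    """Split a table cell such as '-1.51***' into (coefficient, is_significant)."""
    number = cell.rstrip("*")
    return float(number), len(number) < len(cell)


def classify(rdd_cell, internet_cell, diff_cell):
    """Label one predictor using the slide's story/magnitude scheme.

    Returns None when the predictor is significant in neither data stream,
    since the summary tables only count variables significant in at least one.
    """
    rdd_coef, rdd_sig = parse_cell(rdd_cell)
    net_coef, net_sig = parse_cell(internet_cell)
    _, diff_sig = parse_cell(diff_cell)
    if not (rdd_sig or net_sig):
        return None
    # Assumption: "same story" also requires the two coefficients to agree in sign.
    same_sign = (rdd_coef > 0) == (net_coef > 0)
    story = "Same story" if (rdd_sig and net_sig and same_sign) else "Different story"
    magnitude = "different magnitude" if diff_sig else "same magnitude"
    return f"{story}, {magnitude}"


# The four example rows from the slides: (variable, RDD, Internet, Difference).
examples = [
    ("Married", ".31*", ".58**", ".27"),
    ("Age 25-34", "-1.51***", "-.68*", ".83*"),
    ("Don't have time to fill out - Disagree", ".88**", ".68", ".19"),
    ("Importance of counting everyone - Agree", ".98*", "-.37", "1.27*"),
]
for name, rdd, internet, diff in examples:
    print(name, "->", classify(rdd, internet, diff))
```

Under this reading, the percentages in the summary tables are the shares of predictors that fall under each of the four labels.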

Predicting Census Completion

Type                                    One Predictor at a Time   All Predictors
Same story, same magnitude              57%                       31%
Same story, different magnitude         13                        6
Different story, same magnitude         30                        56
Different story, different magnitude    0                         6
Total                                   100%                      100%
(Number of Variables)                   (23)                      (16)

Numbers shown are for variables that were significant in at least one data stream

Predicting Intent to Complete Form

Type                                    One Predictor at a Time   All Predictors
Same story, same magnitude              41%                       48%
Same story, different magnitude         29                        20
Different story, same magnitude         15                        20
Different story, different magnitude    15                        12
Total                                   100%                      100%
(Number of Variables)                   (41)                      (25)

Numbers shown are for variables that were significant in at least one data stream

Trends Over Time

• Correlations between variable categories over weeks

• Chi-squared tests comparing differences between data streams across weeks

Among variables with significant variations over time in at least one data stream
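As a rough illustration of the first bullet, the sketch below correlates the weekly percentages for one response category across the two data streams. The weekly values are invented for illustration, and `pearson` is a plain textbook implementation rather than the authors' code. A correlation near 1 corresponds to matching trends, near 0 to unrelated trends, and near -1 to opposite trends.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical percent agreeing with one statement, week by week (13 weeks).
rdd_weekly = [41, 42, 44, 43, 46, 48, 47, 50, 52, 51, 54, 55, 57]
internet_weekly = [39, 41, 40, 42, 45, 44, 47, 49, 48, 51, 52, 55, 54]

r = pearson(rdd_weekly, internet_weekly)
print(f"correlation between data streams: {r:.2f}")
```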

Sometimes the trends match

[Figures: percent by date for the RDD and Internet data streams. x-axis: Date; y-axis: Percent]

Sometimes the trends don't match

[Figures: percent by date for the RDD and Internet data streams. x-axis: Date; y-axis: Percent]

And sometimes, the trends are opposites

[Figures: percent by date for the RDD and Internet data streams. x-axis: Date; y-axis: Percent]

Trends Over Time

Among variables with significant variations over time in at least one data stream

[Figure: histogram of correlations between data streams. x-axis: Correlations Between Data Streams; y-axis: Frequency]

Differences Between Streams vs. "Sampling Error"

[Figure: bar chart of the percent of comparisons that differ significantly between data streams, with a reference line marked "Chance". y-axis: 0 to 100; bars listed in slide order]

• Proportions: 73% significantly different
• Relations: 30% significantly different
• Trends: 76% significantly different

None of the weighting strategies changed these basic results

Data from one of the top RDD firms and one of the most visible Internet firms produced very different results

Researchers need to choose, and that choice should depend on the validity of the results

But Which is More Valid?

• More accurate self-reports using Internet mode (Chang and Krosnick, 2009)

• Some theoretical reasons to prefer probability sampling (even with contemporary response rates)

We cannot know based on these data alone

Not really the point . . .

If the methods reach different conclusions, they can't both be correct

What Can We Make Of This?

• Mode effects

• Slight differences in question wording

• Neither survey is consistently within sampling error of accuracy benchmarks (though RDD is a little closer)

The results are not equivalent!

Benchmark Comparison

• Comparison of modal categories

• Variables not used in weighting or quotas

• Primary household language (English)

• Own or rent home (Own)

• Children in household (Yes)

• Absolute difference between sample mean and benchmark (weighted)
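The benchmark comparison above amounts to a small calculation, sketched here under stated assumptions. The function names, the toy respondent data, and all numeric values are hypothetical; they are not the study's estimates or the actual benchmark figures.

```python
def weighted_share(values, weights, category):
    """Weighted percent of respondents whose answer equals `category`."""
    hit = sum(w for v, w in zip(values, weights) if v == category)
    return 100.0 * hit / sum(weights)


def benchmark_errors(estimates, benchmarks):
    """Absolute percentage-point error per variable, plus average and maximum."""
    errors = {name: abs(estimates[name] - benchmarks[name]) for name in benchmarks}
    average = sum(errors.values()) / len(errors)
    return errors, average, max(errors.values())


# Hypothetical weighted estimate for one modal category (owns home).
answers = ["Own", "Rent", "Own", "Own"]
answer_weights = [1.0, 2.0, 0.5, 0.5]
own_estimate = weighted_share(answers, answer_weights, "Own")

# Hypothetical survey estimates and benchmark values, in percent.
survey = {"English at home": 90.0, "Own home": own_estimate, "Children": 30.0}
benchmark = {"English at home": 80.0, "Own home": 65.0, "Children": 35.0}

per_variable, avg_error, max_error = benchmark_errors(survey, benchmark)
print(per_variable, avg_error, max_error)
```

Computing the average and maximum error separately for each data stream gives the pair of summary figures that the benchmark slides report for Telephone and Internet.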

Benchmark Comparison

[Figure: two histograms of differences from benchmarks, one panel for Telephone and one for Internet. x-axis: Percentage Point Difference; y-axis: Frequency]

             Telephone   Internet
Avg. Error   11.4        13.8
Max Error    18.6        21.1

p<.001 difference

Benchmark Conclusions

• Neither survey is consistently within sampling error

• RDD sample is somewhat more accurate than Internet sample