Is the Sky Falling? New Technology, Changing Media, and the Future of Surveys
Mick P. Couper
Survey Research Center, University of Michigan, and Joint Program in Survey Methodology, University of Maryland
A talk featuring Mick’s metaphors and no gratuitous graphics
“To everything there is a season, and a time to every purpose under the heaven … a time
to be born, a time to die, a time to plant, and a time to pluck up that which is planted….”
(Ecclesiastes 3:1)
“A time to tweet, a time to blog, a time to survey, a time to experiment, a time to
interview, a time to observe…”?
3
Is There a Future for Surveys?
With the rise of Big Data, who needs surveys anymore?
With the rise of opt-in panels, Google Consumer Surveys, Mechanical Turk, surveys on Facebook, etc., who needs probability surveys anymore?
With the rise of do-it-yourself (DIY) online survey tools, who needs survey professionals anymore?
4
Overview of Talk
Review three technology-driven trends with implications for surveys
• Big data
• Non-probability samples (especially online panels)
• Mobile data collection
Offer some observations on what this means for survey research and survey researchers
Big Data
6
Big Data
Following Groves (2011), the term organic data may be a better descriptor
Three characteristics of organic data:
• Volume
• Velocity
• Variability
Three related types of organic data:
• Administrative data – provided by persons or organizations for regulatory or other government activities
• Transaction data – generated as an automatic byproduct of transactions and activities (e.g., credit card data, traffic flow data)
• Social media data – created by people with the express purpose of sharing with (at least some) others
7
Big Data Is Exciting
Some people see big data as replacing surveys
• It’s (mostly) free, it’s everywhere, it’s big
• E.g., Savage and Burrows (2007, p. 891): “…where data on whole populations are routinely gathered as a by-product of institutional transactions, the sample survey seems a very poor instrument.”
Some even see big data as replacing science:
• 2008 article in Wired Magazine: “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.”
8
Some Limitations of Big Data
Single variable, few covariates
Bias through self-selection and self-presentation
Volatility or lack of stability
Privacy issues
Access issues
Opportunity for mischief
Size is not everything (bigger is not necessarily better)
File drawer problem
9
Single Variable, Limited Covariates
Surveys are much more than a single variable
The limited demographic variables that are provided or imputed may be wrong
• Only about 1/3 of Facebook users provide demographic information
• Demographic information not available for 30-40% of Google Consumer Survey respondents
• What is available (derived from cookies) may be wrong, e.g., gender matches about 75% of the time
Knowing changing fuel prices is not the same as knowing what people do in response to such changing prices
10
Match of Reported and Inferred Gender
Source: Keeter and Christian (2012)
No demographics available for about 30-40% of Google consumer survey respondents
11
Accuracy of Information from DoubleClick Cookie
For example, what Google thinks I am on one of my browsers and devices:
Check your own profile at:
• https://www.google.com/settings/ads/onweb/
12
Bias
Two sources of bias:
• Selection bias
• Self-presentation (measurement) bias
Selection bias: “haves” versus “have-nots”
• Not everyone uses social media!
• Need to distinguish between producers and users of social media – about 13% of US online population actively tweets
• Not everyone uses loyalty cards or credit cards, or makes purchases online
Measurement bias
• Impression management is a key element of social media
• The average Facebook user has 229 “friends”
13
Volatility or Lack of Stability
What will Facebook look like 5 or 10 years from now? Will it even exist?
• Anyone remember MySpace? Second Life?
• Who’s on Google+?
Twitter has only been around since 2006, and grew 5000% in 5 years
• Twitter today is very different from Twitter 5 years ago
Google.com was registered in 1997 – making it a mere teenager
Social media may be good for measuring short term trends, but surveys may be better for longer-run measurement
14
The Growth of Facebook Users and Articles
Source: Wilson, Gosling, and Graham (2012)
15
Privacy Issues
The more people become aware of what is being done with their data, the more they may opt-out or limit sharing
• E.g., choose to pay cash for certain transactions (alcohol, condoms, etc.)
• E.g., use fake identities or aliases online
EU legislation on cookies
Growth of “do not track” options – now the default in IE 10
Privacy options are changing on social media
16
Access Issues
Social media and transaction data are usually proprietary
• Only available to insiders, or at a cost
• Exception: Twitter
A key strength of surveys is public access to data, permitting replication and reanalysis
17
Opportunity for Mischief
Three factors increase the likelihood of mischief with social media relative to other media (e.g., call-in polls)
• Relative anonymity of the Internet
• Virtually costless
• Automated systems can be written to generate content
83 million Facebook accounts (8.7% of all accounts) are estimated to be fake
18
Opportunity for Mischief
Source: http://www.ubermotive.com/?p=68
19
Opportunity for Mischief
20
Opportunity for Mischief
21
Bigger ≢ Better
Exhibit A:
• Very large sample (n=10 million) from commercial databases
• Response rate comparable to many telephone polls
• 2.3 million surveys returned
• Correctly predicted last 5 elections
But wrong!
This was the Literary Digest Poll of 1936
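The point of Exhibit A can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the population size, the true level of support, and the two response propensities (`0.2` and `0.4`) are hypothetical values chosen for illustration, not figures from the 1936 poll. It compares a very large self-selected sample with a small simple random sample from the same population.

```python
import random


def simulate_polls(pop_size=200_000, true_support=0.55, seed=1936):
    """Compare a large self-selected sample with a small random sample.

    All parameter values are illustrative assumptions, not historical data.
    """
    rng = random.Random(seed)

    # True means the person supports candidate A.
    population = [rng.random() < true_support for _ in range(pop_size)]

    # Large self-selected sample: supporters of A are assumed to respond
    # half as often (0.2) as non-supporters (0.4).
    big_sample = [p for p in population if rng.random() < (0.2 if p else 0.4)]

    # Small probability sample: every person has the same chance of selection.
    small_sample = rng.sample(population, 1000)

    return {
        "truth": sum(population) / len(population),
        "big_n": len(big_sample),
        "big_estimate": sum(big_sample) / len(big_sample),
        "small_n": len(small_sample),
        "small_estimate": sum(small_sample) / len(small_sample),
    }


if __name__ == "__main__":
    for name, value in simulate_polls().items():
        print(f"{name}: {value}")
```

Under these assumptions the large sample is many times bigger than the random sample yet lands much further from the truth, because its error comes from who responds rather than from how many respond.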
22
1948 U.S. Election
Famous “Dewey Defeats Truman” headline illustrates failure of polls in 1948
Failure of quota samples led to rise of probability-based methods
23
2012 U.S. Election Polls
Source: FiveThirtyEight Blog in NYT
24
The File Drawer Effect
“For any given research area, one cannot tell how many studies have been conducted but never reported. The extreme view of the ‘file drawer problem’ is that journals are filled with the 5% of the studies that show Type I errors, while the file drawers are filled with the 95% of the studies that show nonsignificant results” (Rosenthal 1979)
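The "extreme view" in the quotation can be sketched as a simulation. The setup is an assumption made for illustration: every study tests an effect that is truly zero, uses a two-sided z-test with known unit variance, and is "published" only if p < 0.05. The function name and all parameter values are hypothetical.

```python
import math
import random


def simulate_file_drawer(n_studies=2000, n_per_study=50, alpha=0.05, seed=1979):
    """Count 'published' (significant) and 'file drawer' studies under a true null."""
    rng = random.Random(seed)
    published = 0
    for _ in range(n_studies):
        # The true effect is zero: observations are drawn from N(0, 1).
        mean = sum(rng.gauss(0, 1) for _ in range(n_per_study)) / n_per_study
        # z statistic for a mean with known standard deviation 1.
        z = mean * math.sqrt(n_per_study)
        # Two-sided p-value from the standard normal distribution.
        p_value = math.erfc(abs(z) / math.sqrt(2))
        if p_value < alpha:
            published += 1
    return published, n_studies - published


if __name__ == "__main__":
    published, in_drawer = simulate_file_drawer()
    print(f"published: {published}, in file drawer: {in_drawer}")
```

Roughly 5% of the simulated studies clear the significance threshold even though no effect exists, so a literature built only from those studies would consist entirely of Type I errors.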
25
Macroeconomic Conditions and Problem Drinking as Captured by Google Searches
Source: Frijters et al. (2013)
26
Twitter Flu Trends 2009-2010
Source: Paul and Dredze (2011)
27
Twitter Flu Trends 2007-2011
Source: Murphy (2013)
28
Google Flu Trends 2011-2013
Source: Butler (2013), in Nature
29
Salvia Trends Compared to NSDUH
Source: https://blogs.rti.org/surveypost/2012/01/04/can-surveillance-of-tweets-and-google-searches-substitute-survey-research-2/
30
Obesity-Related Tweets and McDonalds Restaurants
Ghosh and Guha (2013) report a “strong correlation” between the two
Any alternative explanation come to mind?
31
Stupid Data Mining Tricks
Source: Leinweber (2007, original paper from 1995)
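A rough sketch of why searching many series invites spurious findings. This is not Leinweber's analysis; it assumes purely random data, and the function names, series length, and number of candidates are hypothetical. One random "target" series is compared with many unrelated random candidates, and the strongest correlation is kept.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def best_spurious_match(n_points=10, n_candidates=1000, seed=1995):
    """Return the strongest correlation found among unrelated random series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        # Each candidate is generated independently of the target.
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    print(f"strongest correlation found: {best_spurious_match():.2f}")
```

With short series and enough candidates, a strong-looking correlation turns up by chance alone, which is the risk when only the best match is reported.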
32
More on the File Drawer Effect
Ioannidis (2005): “In modern research, false findings may be the majority or even the vast majority of published research claims.”
Hirschhorn et al. (2002):
• Reviewed 600 positive associations between gene variants and common diseases
• Of 166 reported associations studied 3 or more times, only 6 were replicated consistently
Fanelli (2011):
• Compared publications in 18 empirical areas
• Ratios of confirmed hypotheses ranged from 70% in space science to 86% in the social sciences, 88% in economics, and 92% in psychology
Some fields are encouraging reporting of nonsignificant effects, replications, etc.
• E.g., http://www.psychfiledrawer.org/
33
Will we be swamped by the big data tsunami, or will we ride the wave?
Non-Probability Samples
35
Gordon Black, CEO of Harris Interactive (1999)
“Internet research is a ‘replacement technology’ – by this I mean any breakthrough invention where the advantages of the new technology are so dramatic as to all but eliminate the traditional technologies it replaces: like the automobile did to the horse and buggy. Market research …is now making a quantum leap forward with new Internet research techniques.” (Harris Interactive press release, August 1, 1999)
36
Non-Probability Samples
Have been around a long time
Fueled by rise in Internet surveys, and especially online opt-in or access panels
Implications
• Ubiquity of surveys, over-surveying effects
• Rise of DIY (do-it-yourself) surveys
• Demand exceeding supply
• Gamification or surveytainment
Mobile Data Collection
38
Mobile Data Collection
Again, not particularly new, but garnering a lot of attention
Rise of smartphones, tablets, and mobile Web
Three types of mobile use:
• Data collectors (interviewers) using tablets and smartphones
• Respondents completing Web surveys on mobile devices
• Respondents using mobile devices to enhance measurement
I’ll focus on the last of these three
39
Mobile Devices Used by Respondents
There is great excitement over the many things respondents could do with mobile devices to enhance and extend measurement
• Many examples based on small groups of volunteers
But the fundamental question is, what are respondents willing and able to do?
40
Mobile Device Use: Three Examples
Biler, Šenk, and Winklerová (2013, NTTS)
• Survey in Czech Republic about willingness to participate in a travel survey using a GPS device
• Only 8% said that they would be willing (67% said no, 25% were uncertain)
Armoogum and colleagues (2013, NTTS)
• Asked participants in the 2007-2008 French National Travel Survey about willingness to accept a GPS device to monitor their travel
• 29.8% said yes without condition, 5.1% said yes as long as they could turn it off, and 64.3% said no
Olson and Wagner (2013, AAPOR)
• Interviewers equipped with smartphones and GPS for tracking
• Data available for 59.4% of interviewer-days
41
Mobile Device Use: Coverage
Despite headlines that may suggest otherwise…
ITU (2013): 126.5 active mobile cellular subscriptions per 100 inhabitants in Europe and 109.4 in the Americas
…not everyone has a mobile phone, and not everyone has a smartphone
• Latest (June 2013) US estimates from Pew: about 91% of telephone-answering adults have a mobile phone*, and about 56% have a smartphone
*Respondents to a survey with a 10% landline and 13% cell response rate (RR)
The Future of Surveys … and the Surveys of the Future
43
The Future of Surveys, and Surveys of the Future
For each one of these trends, proponents argue(d) that they are transformative, and will make existing methods obsolete
I believe the survey method has an important and valuable role to play
But, we do need to adapt
44
Key Areas of Adaptation
Reduce survey length and burden
Make increasing use of technology
Understand the nonresponse problem
Develop better quality metrics
Use and develop better statistical tools
45
Advice for Young Researchers
Be open to new ideas, but don’t be too quick to reject “old” methods
Look towards the future, but don’t ignore the past
Get as much technical and statistical knowledge as you can
Don’t underestimate the value of good theory
46
Finally, Surveys Are Tools
Surveys are one of many tools that can be used to study society
There are many varieties of surveys and they vary in quality, but there are also many other methods that we can use
We should use the right tool for the right job
Let’s remember why we’re doing what we’re doing
• Surveys are not an end in themselves, but a means to a larger end
47
Summary
The sky is not falling!
Surveys still have an important role to play
Survey research is a dynamic and exciting profession, with lots of opportunities
But we have to evolve
Long live surveys!
Long live ESRA!
Thank You
Questions, comments, …
References 1
Armoogum, J., Roux, S., & Pham, T.H.T. (2013), “Total nonresponse of a GPS-based travel surveys.” Paper presented at the conference on New Techniques and Technologies for Statistics, Brussels, March.
Biler, S., Šenk, P., & Winklerová, L. (2013), “Willingness of individuals to participate in a travel behavior survey using GPS devices.” Paper presented at the conference on New Techniques and Technologies for Statistics, Brussels, March.
Butler, D. (2013), “When Google got flu wrong: US outbreak foxes a leading web-based method for tracking seasonal flu.” Nature, 13th February 2013.
Fanelli, D. (2011), “A review of publication bias in various disciplines.” Scientometrics, 90: 891-904.
Ghosh, D., & Guha, R. (2013), “What are we ‘tweeting’ about obesity? Mapping tweets with topic modeling and geographic information system.” Cartography and Geographic Information Science, 40 (2): 90-102.
Groves, R.M. (2011), “Three eras of survey research.” Public Opinion Quarterly, 75 (5): 861-871.
Hirschhorn, J.N., Lohmueller, K., Byrne, E., & Hirschhorn, K. (2002), “A comprehensive review of genetic association studies.” Genetics in Medicine, 4 (2): 45-61.
Ioannidis, J.P.A. (2005), “Why most published research findings are false.” PLOS Medicine, 2 (8): 696-701.
References 2
Keeter, S., & Christian, L.M. (2012), “A comparison of results from surveys by the Pew Research Center and Google Consumer Surveys.” Washington, DC: Pew Research Center for the People & The Press.
Leinweber, D.J. (2007), “Stupid data miner tricks: Overfitting the S&P 500.” The Journal of Investing, 16 (1): 15-22.
Murphy, J. (2013), “10 Things Every Survey Researcher Should Know about Twitter.” Paper presented at FedCASIC, Washington, DC, March.
Olson, K., & Wagner, J. (2013), “A field experiment using GPS devices to measure interviewer travel behavior.” Paper presented at the annual conference of the American Association for Public Opinion Research, Boston, May.
Paul, M.J., & Dredze, M. (2011), “You Are What You Tweet: Analyzing Twitter for Public Health.” Fifth International AAAI Conference on Weblogs and Social Media, Barcelona, July 17-21. Palo Alto, CA: AAAI Publications, pp. 265-272.
Rosenthal, R. (1979), “The file drawer problem and tolerance for null results.” Psychological Bulletin, 86 (3): 638-641.
Savage, M., & Burrows, R. (2007), “The coming crisis of empirical sociology.” Sociology, 41 (5): 885-899.
Wilson, R.E., Gosling, S.D., & Graham, L.T. (2012), “A review of Facebook research in the social sciences.” Perspectives on Psychological Science, 7 (3): 203-220.