Organic data: Sensors, Social Networks, and “Big Data”
Philip S. Brenner
Assistant Professor, Department of Sociology
Senior Research Fellow, Center for Survey Research
What is organic data?
• Unlike survey data (design data), organic data is transmitted or collected for purposes other than generating quantitative descriptors of a population
Organic data: an example
Google Street View
“using Google Street View and an archive of 1990s videotapes … doctoral student Jackelyn Hwang and Harvard sociologist Robert Sampson used Google Street View to take a virtual walking tour of Chicago. As they went, they looked for details like home renovations or new construction that indicate gentrification is underway, or litter and graffiti, which indicate it’s not.”
Organic v. design data
|  | Design | Organic |
|---|---|---|
| Purpose | pre-specified | post-hoc |
| Data-to-noise ratio | very high | very low |
| Pathway to information | analysis | algorithms |
| Sampling | probabilistically selected | self-selected |
What is organic data?
• Location tracking/GPS
• Wearable sensors/monitors
• The “internet of things”
• Social networking services (Facebook, Twitter, Instagram)
• Transaction data (Amazon, Target, Google)
Wearable monitors: Location
• Geotagging: linking a piece of data (tweet, photo, Facebook post, etc.) to a specific location (latitude, longitude)
• Geofencing: defining a virtual boundary around a geographic area to monitor an individual's location/behavior; an event is flagged when the individual enters or leaves the fenced area
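The two definitions above can be sketched in Python. This is a minimal illustration under stated assumptions, not a production geofencing service: a geotag is treated as a plain latitude/longitude pair, the fence is assumed to be a circle (real services also use polygons), distance uses the haversine formula on a spherical Earth, and the names `GeoPoint`, `CircularGeofence`, and `fence_events` are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


@dataclass
class GeoPoint:
    """A geotag: latitude and longitude in decimal degrees."""
    lat: float
    lon: float


def haversine_m(a: GeoPoint, b: GeoPoint) -> float:
    """Great-circle distance between two points, in metres (haversine formula)."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


@dataclass
class CircularGeofence:
    """Hypothetical fence: every point within radius_m of center counts as inside."""
    center: GeoPoint
    radius_m: float

    def contains(self, p: GeoPoint) -> bool:
        return haversine_m(self.center, p) <= self.radius_m


def fence_events(fence: CircularGeofence, track):
    """Convert time-ordered (timestamp, GeoPoint) fixes into arrive/depart events.

    An event is emitted only when the inside/outside state changes between
    consecutive fixes; the first fix just sets the starting state.
    """
    events = []
    inside = None
    for ts, point in track:
        now_inside = fence.contains(point)
        if inside is not None and now_inside != inside:
            events.append((ts, "arrive" if now_inside else "depart"))
        inside = now_inside
    return events


# Usage with made-up coordinates: a 100 m fence and three GPS fixes.
site = CircularGeofence(GeoPoint(42.3505, -71.1054), radius_m=100)
track = [
    ("09:00", GeoPoint(42.3600, -71.0900)),  # roughly 1.6 km away: outside
    ("09:30", GeoPoint(42.3506, -71.1055)),  # about 14 m from the center: inside
    ("10:15", GeoPoint(42.3600, -71.0900)),  # outside again
]
print(fence_events(site, track))  # [('09:30', 'arrive'), ('10:15', 'depart')]
```

Only state changes are reported, so a participant who stays inside the fence produces one "arrive" event rather than one per GPS fix. GPS noise near the boundary can still create spurious arrive/depart pairs, which a study might handle by requiring a minimum dwell time.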
Geotagging
Geofencing
Wearable monitors: Physical condition
• Accelerometers
• Actigraphs
• Pulse monitors
• Oximeters
Wearable monitors: Activity
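Raw accelerometer output is a stream of acceleration samples, not an activity measure, so it has to be summarized before it can stand in for a survey question. One common summary is ENMO (Euclidean norm minus one g), averaged over fixed epochs. The sketch below assumes tri-axial samples in units of g at a known sampling rate; the thresholds in `CUTPOINTS` are hypothetical placeholders, since published cut-points differ by device, wear location, and population.

```python
import math
from statistics import mean


def enmo(x: float, y: float, z: float) -> float:
    """Euclidean norm of one tri-axial sample (in g) minus 1 g, floored at zero."""
    return max(math.sqrt(x * x + y * y + z * z) - 1.0, 0.0)


def epoch_means(samples, hz: float, epoch_s: float = 60):
    """Mean ENMO per consecutive, non-overlapping epoch.

    samples: list of (x, y, z) tuples in g, recorded at `hz` samples per second.
    A trailing partial epoch is dropped.
    """
    n = int(hz * epoch_s)
    if n < 1:
        raise ValueError("epoch must contain at least one sample")
    values = [enmo(*s) for s in samples]
    return [mean(values[i:i + n]) for i in range(0, len(values) - n + 1, n)]


# Hypothetical cut-points in g, checked from highest to lowest.
CUTPOINTS = [(0.100, "moderate-to-vigorous"), (0.040, "light")]


def classify(epoch_value: float) -> str:
    """Label an epoch by the first cut-point it reaches."""
    for threshold, label in CUTPOINTS:
        if epoch_value >= threshold:
            return label
    return "sedentary"


# Usage: four made-up samples at 1 Hz, summarized in 2-second epochs.
samples = [(0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.2), (0, 0, 1.2)]
print([classify(v) for v in epoch_means(samples, hz=1, epoch_s=2)])
# ['sedentary', 'moderate-to-vigorous']
```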
Plug-in sensors: Environmental attributes
• Gas sensor/detector (oxygen, CO2)
• Humidity
• Thermometer
• Light sensors
• Barometer
• Proximity sensors
• Network (Wi-Fi) sensor
Exercise 9(1)
• Consider using location or another sensor in your research project. How would you do it? What would you measure and why?
• Would it provide a better measure than a direct survey question? Why or why not?
• Do you think it would change the way the survey respondent/study participant would behave? Why or why not?
Internet of things
• Connected ("smart") devices that monitor energy use, purchases, and viewing patterns, ostensibly to improve our lives
• Can be used by social scientists, pairing survey questions with device data:
  • survey questions about energy use, or smart meter, thermostat, or appliance usage data
  • questions about healthy eating, or the contents of a smart refrigerator
  • questions about hours spent watching TV, or viewing data from a smart TV
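Pairing a survey item with device data usually means linking the two sources on a respondent identifier and then comparing them. The sketch below does this for the TV-viewing example; the data layout (reported hours per respondent, smart-TV session lengths in minutes) and all names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LinkedCase:
    respondent_id: str
    reported_hours: float  # survey answer, e.g. "hours of TV watched yesterday"
    logged_hours: float    # total smart-TV "on" time for the same day


def link(survey: dict, device_logs: dict) -> list:
    """Join survey reports to device logs on respondent ID.

    survey: respondent ID -> reported hours.
    device_logs: respondent ID -> list of viewing-session lengths in minutes.
    Respondents with no device data are left out.
    """
    linked = []
    for rid, reported in survey.items():
        sessions = device_logs.get(rid)
        if sessions is None:
            continue
        linked.append(LinkedCase(rid, reported, sum(sessions) / 60))
    return linked


def mean_discrepancy(cases) -> float:
    """Mean of (reported - logged) hours; positive values suggest over-reporting."""
    return mean(c.reported_hours - c.logged_hours for c in cases)


# Usage with made-up values: r3 answered the survey but has no device data.
survey = {"r1": 3.0, "r2": 1.0, "r3": 2.0}
device_logs = {"r1": [60, 60], "r2": [90]}
cases = link(survey, device_logs)
print(len(cases), mean_discrepancy(cases))  # 2 0.25
```

The device log is not ground truth either: it records when the set is on, not whether the respondent is watching, so a discrepancy reflects error in both sources.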
Social networks: Facebook
“So I give you a survey you fill it out, which is very artificial, whereas ethnography, as soon as you walk into the room, you change that room, because you are a foreign presence. There’s a scientist in the room. People get self-conscious. They don’t act naturally.”
In comparison, Facebook data is not influenced by the presence of a social science researcher. “It has no artificial construct, you are not bringing people to the lab,” Nelson said. “So you are recording social interaction in real time as it occurs completely naturally.”
Social networks: Facebook
(Wilson, Gosling, and Graham 2012)
Social networks: Twitter
(McCormick et al. 2013)
Using Twitter messages to analyze planned non-voting
• Use the Twitter API (application programming interface) to search for the attribute of interest
• Compare self-identified non-voters with other users on additional information (bio, location, past tweets)
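Once tweets have been collected (the API request itself is not shown, and the `Tweet` fields below are a simplified, hypothetical record rather than Twitter's actual response format), the two steps above reduce to a text filter followed by a comparison. The regular expression is an assumed, illustrative search pattern; in practice its matches would need to be validated by hand-coding a sample.

```python
import re
from dataclasses import dataclass


@dataclass
class Tweet:
    """Simplified, hypothetical tweet record (not the API's actual schema)."""
    user_id: str
    text: str
    user_location: str  # free-text profile field; often blank


# Assumed, illustrative search pattern for announced non-voting.
NON_VOTING = re.compile(r"\b(not|won'?t|never)\s+(going\s+to\s+)?vot(e|ing)\b", re.IGNORECASE)


def split_users(tweets):
    """Return (flagged, others): users with and without a matching tweet."""
    flagged = {t.user_id for t in tweets if NON_VOTING.search(t.text)}
    others = {t.user_id for t in tweets} - flagged
    return flagged, others


def location_share(tweets, users) -> float:
    """Share of `users` with a non-empty location on at least one tweet."""
    if not users:
        return float("nan")
    located = {t.user_id for t in tweets if t.user_id in users and t.user_location.strip()}
    return len(located) / len(users)


# Usage with made-up tweets.
tweets = [
    Tweet("a", "Not voting this year, none of them deserve it", "Ohio"),
    Tweet("b", "Polls open at 7, go vote!", ""),
    Tweet("c", "I won't vote until things change", ""),
    Tweet("b", "Just voted", "Texas"),
]
flagged, others = split_users(tweets)
print(sorted(flagged), sorted(others))  # ['a', 'c'] ['b']
print(location_share(tweets, flagged), location_share(tweets, others))  # 0.5 1.0
```

The location field is only one example of "other information"; bios or past tweets would be compared the same way. Users who post about not voting are themselves self-selected, so such comparisons describe those Twitter users, not non-voters in general.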
Transaction data
• often treated as synonymous with “big data”
• requires data “munging” (cleaning, reshaping, and merging) to extract valuable information from messy records
• used by corporations to target advertising and marketing
• also used by social scientists
• what role in survey research?
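A small illustration of what “munging” involves: the sketch below cleans a handful of made-up transaction records by dropping repeated transaction IDs, parsing two assumed date formats, normalizing item names, and discarding records with unusable amounts. The record layout and field names are hypothetical.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed formats found in the raw feed


def parse_date(text):
    """Try each known format; return a date, or None if none fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """'$1,234.50' -> Decimal('1234.50'); blank or malformed -> None."""
    try:
        return Decimal(text.strip().lstrip("$").replace(",", ""))
    except InvalidOperation:
        return None


def munge(raw_records):
    """Deduplicate by transaction ID, normalize fields, drop unusable rows."""
    seen, clean = set(), []
    for r in raw_records:
        if r["id"] in seen:
            continue  # repeated transaction (e.g. double-logged)
        record = {
            "id": r["id"],
            "date": parse_date(r["date"]),
            "item": " ".join(r["item"].split()).lower(),  # collapse whitespace, lowercase
            "amount": parse_amount(r["amount"]),
        }
        if record["date"] is None or record["amount"] is None:
            continue
        seen.add(r["id"])
        clean.append(record)
    return clean


# Usage with made-up records.
RAW = [
    {"id": "T1", "date": "2015-03-02", "item": "  Orange  Juice ", "amount": "$3.49"},
    {"id": "T1", "date": "2015-03-02", "item": "  Orange  Juice ", "amount": "$3.49"},
    {"id": "T2", "date": "03/02/2015", "item": "ORANGE JUICE", "amount": "3.49"},
    {"id": "T3", "date": "2015-03-05", "item": "bread", "amount": ""},
]
clean = munge(RAW)
print([(r["id"], r["item"], str(r["amount"])) for r in clean])
# [('T1', 'orange juice', '3.49'), ('T2', 'orange juice', '3.49')]
```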
Transaction data: an innocuous example
Transaction data: a not-so-innocuous example
Transaction data
• transaction: searching Google for flu symptoms, medical information about symptoms, etc.
Transaction data
• transaction: searching Google for particular racist epithets
• research associated the area-level frequency of these searches with higher rates of Black mortality
(Chae et al. 2015)
Design approach v. “big data” approach
• Design
  • draw probability sample
  • gain access to organic data source
  • code organic data
  • generate estimates
  • make inferences
• Big data
  • take population
  • use data munging and powerful algorithms
  • generate estimates
  • inferences?
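The question mark after “inferences” can be illustrated with a small simulation. The sketch below assumes a hypothetical population in which 30% have some trait and in which people with the trait are twice as likely to appear in a self-selected (“big data”) source, then compares a simple random sample of 1,000 with the much larger self-selected set. All parameters are illustrative assumptions.

```python
import random


def simulate(seed: int = 2024, N: int = 200_000, n_prob: int = 1_000) -> dict:
    """Compare a probability sample with a self-selected sample (hypothetical setup).

    Population: each person has the trait with probability 0.3.
    Self-selection: people with the trait opt in with probability 0.2, others 0.1,
    so the opted-in share has expected value 0.06 / 0.13, about 0.46.
    """
    rng = random.Random(seed)
    population = [1 if rng.random() < 0.3 else 0 for _ in range(N)]
    srs = rng.sample(population, n_prob)  # simple random sample, no replacement
    opted_in = [y for y in population if rng.random() < (0.2 if y else 0.1)]
    return {
        "true_share": sum(population) / N,
        "probability_sample": sum(srs) / n_prob,
        "self_selected": sum(opted_in) / len(opted_in),
        "self_selected_n": len(opted_in),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>20}: {value:.3f}" if isinstance(value, float) else f"{name:>20}: {value}")
```

Under these assumptions the self-selected set is roughly 26 times larger than the probability sample, yet its expected share is about 0.46 rather than 0.30, because opting in is correlated with the trait. Adding more self-selected records shrinks the variance but not that bias; the small probability sample is unbiased.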
“Big data hubris”
“Instead of focusing on a ‘big data revolution,’ perhaps it is time we were focused on an ‘all data revolution.’”
Exercise 9(2)
• Consider using social network data (not limited to Facebook and Twitter; Instagram, Flickr, and others are options too) in your research project. How would you do it? What would you measure and why?
• Would it provide a better measure than a direct survey question? Why or why not?
• Do you agree with the quotation from the lecture that Facebook updates are “natural” and therefore contain less error than other forms of data collection? Why or why not?