Organic data: Sensors, Social Networks, and “Big Data”

Philip S. Brenner

Assistant Professor, Department of Sociology

Senior Research Fellow, Center for Survey Research

What is organic data?

• Unlike survey data (design data), organic data is transmitted or collected for purposes other than generating quantitative descriptors of a population

Organic data: an example

Google Street View

“using Google Street View and an archive of 1990s videotapes … doctoral student Jackelyn Hwang and Harvard sociologist Robert Sampson used Google Street View to take a virtual walking tour of Chicago. As they went, they looked for details like home renovations or new construction that indicate gentrification is underway, or litter and graffiti, which indicate it’s not.”

Organic v. design data

                       | Design                     | Organic
purpose                | pre-specified              | post-hoc
data-to-noise ratio    | very high                  | very low
pathway to information | analysis                   | algorithms
sampling               | probabilistically selected | self-selected

What is organic data?

• Unlike survey data (design data), organic data is transmitted or collected for purposes other than generating quantitative descriptors of a population

• Location tracking/GPS

• Wearable sensors/monitors

• The “internet of things”

• Social networking services (Facebook, Twitter, Instagram)

• Transaction data (Amazon, Target, Google)

Wearable monitors: Location

• Geotagging: linking a piece of data (tweet, photo, Facebook post, etc.) to a specific location (latitude, longitude)

• Geofencing: defining an area used to measure the location or behavior of an individual; the location is flagged when the individual arrives at or departs from the fenced area
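
The two ideas can be combined in a short sketch. It assumes a circular fence and a participant's track stored as geotagged points (timestamp, latitude, longitude); the coordinates, the 200 m radius, and the function names are hypothetical and chosen only for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geofence_events(track, center, radius_m):
    """Return (timestamp, 'arrive' | 'depart') for each crossing of a circular fence."""
    events = []
    inside = False  # assumption: the participant starts outside the fence
    for ts, lat, lon in track:
        now_inside = haversine_m(lat, lon, *center) <= radius_m
        if now_inside != inside:
            events.append((ts, "arrive" if now_inside else "depart"))
        inside = now_inside
    return events


# Hypothetical geotagged track: (timestamp, latitude, longitude)
track = [
    ("09:00", 40.0100, -83.0000),  # roughly 1.1 km north of the fence center
    ("09:05", 40.0010, -83.0000),  # roughly 110 m away, inside a 200 m fence
    ("09:10", 40.0005, -83.0000),  # still inside
    ("09:15", 40.0100, -83.0000),  # back outside
]
center = (40.0000, -83.0000)

events = geofence_events(track, center, radius_m=200)
print(events)
```

A real deployment would more likely use an irregular fence (a polygon around a building or neighborhood) and would need to handle GPS error near the boundary; the circle keeps the sketch short.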

Geotagging

Geofencing

Wearable monitors: Physical condition

• Accelerometers

• Actigraphs

• Pulse monitors

• Oximeters

Wearable monitors: Activity

Plug-in sensors: Environmental attributes

• Gas sensor/detector (oxygen, CO2)

• Humidity

• Thermometer

• Light sensors

• Barometer

• Proximity sensors

• Network (WiFi) sensor

Exercise 9(1)

• Consider using location or another sensor in your research project. How would you do it? What would you measure and why?

• Would it provide a better measure than a direct survey question? Why or why not?

• Do you think it would change the way the survey respondent/study participant behaves? Why or why not?

Internet of things

• Wired devices to monitor energy use, purchases, viewing patterns (ostensibly) to improve our lives

• Can be used by social scientists

• survey questions about energy use, or data from a smart meter, thermostat, or appliance

• survey questions about healthy eating, or the contents of a refrigerator

• survey questions about hours spent watching TV, or data from a smart TV

Social networks: Facebook

“So I give you a survey you fill it out, which is very artificial, whereas ethnography, as soon as you walk into the room, you change that room, because you are a foreign presence. There’s a scientist in the room. People get self-conscious. They don’t act naturally.”

In comparison, Facebook data is not influenced by the presence of a social science researcher. “It has no artificial construct, you are not bringing people to the lab,” Nelson said. “So you are recording social interaction in real time as it occurs completely naturally.”

Social networks: Facebook

Wilson, Gosling, and Graham 2012

Social networks: Twitter

McCormick et al. 2013

Using Twitter messages to analyze planned non-voting

Use the Twitter API (application programming interface) to search for the attribute of interest

Compare non-voters on other information (bio, location, past tweets)
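
A minimal sketch of the last two steps, assuming the messages have already been retrieved and stored as dictionaries. It does not call the Twitter API; the field names (`user`, `text`, `location`), the sample messages, and the search pattern are all hypothetical, and a real study would need a validated set of phrases.

```python
import re
from collections import Counter

# Hypothetical pattern for statements of planned non-voting
NON_VOTING = re.compile(
    r"\b(?:not|never)\s+(?:going\s+to\s+)?vot(?:e|ing)\b|\bwon'?t\s+vote\b",
    re.IGNORECASE,
)

# Hypothetical messages, as if already collected through an API search
tweets = [
    {"user": "a", "text": "I'm not voting this year", "location": "Ohio"},
    {"user": "b", "text": "Just voted! Everyone go vote", "location": "Texas"},
    {"user": "c", "text": "won't vote for anyone, sorry", "location": "Texas"},
    {"user": "d", "text": "Not going to vote, what's the point", "location": "Ohio"},
    {"user": "e", "text": "Can't wait to vote tomorrow", "location": "Ohio"},
]

# Step 1: flag messages that express the attribute of interest
flagged = [t for t in tweets if NON_VOTING.search(t["text"])]

# Step 2: compare flagged users on other available information (here, location)
by_location = Counter(t["location"] for t in flagged)

print([t["user"] for t in flagged])
print(dict(by_location))
```

Keyword matching of this kind picks up sarcasm, quotations, and retweets along with sincere statements, which is one reason the data-to-noise ratio of organic data is low.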

Transaction data

• often treated as synonymous with “big data”

• data “munging” to extract valuable information from the messiness

• used by the corporate world for focusing advertising, marketing

• also used by social scientists

• role in survey research?

Transaction data: an innocuous example

Transaction data: a not-so-innocuous example

Transaction data

• transaction: searching Google for flu symptoms, medical information about symptoms, etc.

Transaction data

• transaction: searching Google for particular racist epithets

• research found that areas with more of these searches had higher rates of Black mortality (Chae et al. 2015)

Design approach v. “big data” approach

• Design
    • draw probability sample
    • gain access to organic data source
    • code organic data
    • generate estimates
    • make inferences

• Big data
    • take population
    • use data munging and powerful algorithms
    • generate estimates
    • inferences?
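
The design steps can be illustrated with a small sketch. It assumes a frame of 10,000 organic records that have already been coded 1 or 0 for some attribute; the frame, the 30% share, the sample size, and the seed are hypothetical. The estimate uses the usual simple-random-sampling formula for a proportion with a finite population correction.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical frame: 10,000 coded records (1 = attribute present, 0 = absent)
frame = [1] * 3000 + [0] * 7000
N = len(frame)

# Draw a probability sample: simple random sampling without replacement
n = 500
sample = rng.sample(frame, n)

# Generate an estimate of the proportion with the attribute
p_hat = sum(sample) / n

# Make an inference: standard error with finite population correction,
# then an approximate 95% confidence interval
se = math.sqrt((1 - n / N) * p_hat * (1 - p_hat) / (n - 1))
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)

print(f"estimate = {p_hat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The confidence interval is justified by the known selection probabilities. Under the "big data" approach the records are self-selected, so there is no comparable basis for the final inference step, which is why that step carries a question mark.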

“Big data hubris”

“Instead of focusing on a ‘big data revolution,’ perhaps it is time we were focused on an ‘all data revolution.’”

Exercise 9(2)

• Consider using social network data (not limited to Facebook and Twitter; Instagram, Flickr, and others are options too) in your research project. How would you do it? What would you measure and why?

• Would it provide a better measure than a direct survey question? Why or why not?

• Do you agree with the quotation from lecture that Facebook updates are “natural” and therefore contain less error than other forms of data collection? Why or why not?