8/13/2019 Rayid Ghani - Using Data and Analytics Powers for Good #Oct2013
USING DATA AND ANALYTICS POWERS FOR GOOD
RAYID GHANI, TYPED BY TOM LAGATTA
Any errors, inconsistencies, and unclear rambles in the notes are entirely the fault of the typist.
Affiliations: University of Chicago, Edgeflip
Objective function: maximize the probability of winning 270 electoral votes. Emphasis: winner takes all.
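A minimal sketch of how such an objective could be evaluated; this is an illustration, not the campaign's actual method. It assumes each state is won independently with a known probability (a strong simplification), and the function name `prob_at_least` and the toy inputs are hypothetical.

```python
def prob_at_least(states, threshold=270):
    """Probability of winning at least `threshold` electoral votes.

    states: list of (electoral_votes, win_probability) pairs.
    Assumes states are independent and winner-takes-all (illustrative only).
    """
    total = sum(ev for ev, _ in states)
    # dist[k] = probability of holding exactly k electoral votes so far
    dist = [0.0] * (total + 1)
    dist[0] = 1.0
    for ev, p in states:
        new = [0.0] * (total + 1)
        for k, mass in enumerate(dist):
            if mass == 0.0:
                continue
            new[k] += mass * (1.0 - p)   # state lost: total unchanged
            new[k + ev] += mass * p      # state won: all its votes are added
        dist = new
    return sum(dist[threshold:])


# Toy example with two made-up states and a made-up threshold of 3 votes.
toy = [(3, 0.5), (2, 0.5)]
chance = prob_at_least(toy, threshold=3)
```

Because the winner takes all, a small shift in one state's win probability can move the overall number a lot, which is why the objective is stated over the 270 threshold rather than over raw vote share.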
2-2.5 million volunteers (approximately 0.75% of the U.S. population).
Main data source: the voter file, a database of every registered voter in the country. More precisely, every state has
its own voter file. The Obama campaign consolidated these voter files into a single database.
Most people in the database (e.g., emails) are not matched to an entry in the voter file.
Essential quantities:
Support: how likely is somebody to support our side?
Turnout: how likely is somebody to turn out and vote?
Persuasion: how likely is somebody to vote for each side?
Central theme: better than random. Use the data to make estimates that are better than random, and
using these estimates, take actions to influence and affect the outcome. Key point: justify the costs of those
actions.
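A rough sketch of the "better than random" idea and the cost check, under stated assumptions: the per-person probabilities, the value of a success, and the cost per contact are all made-up inputs, and `expected_hits` and `worth_it` are hypothetical helper names.

```python
def expected_hits(probs, k, targeted=True):
    """Expected number of successes when contacting k people.

    targeted=True: contact the k people with the highest estimated probability.
    targeted=False: contact k people at random (expected value).
    """
    if targeted:
        return sum(sorted(probs, reverse=True)[:k])
    return k * sum(probs) / len(probs)


def worth_it(probs, k, value_per_hit, cost_per_contact):
    """True if the expected value of targeted contact exceeds its cost."""
    return expected_hits(probs, k) * value_per_hit > k * cost_per_contact


# Made-up estimates for four people; contact two of them.
probs = [0.9, 0.1, 0.5, 0.3]
targeted = expected_hits(probs, 2)                  # top two by estimate
random_ = expected_hits(probs, 2, targeted=False)   # random baseline
```

The gap between the targeted and random figures is the value the estimates add; the action is justified only if that value, priced in whatever units the campaign uses, exceeds the cost of the contacts.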
Support model. You have some data on who supports whom (e.g., party registration). Augment these data
with polling. The central use of polling was to prime the priors for the model.
Inputs to the model: Demographics, voting history, email history, fundraising history, calling history.
Constraints on the model. Accuracy: get a good ranked list of supportive people, in order to target actions. Need
probabilities to line up with frequencies: if I am a 40% Obama supporter, then 40 out of 100 people like me
should be Obama supporters.
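The requirement that probabilities line up with frequencies is usually called calibration. One simple way to check it, sketched here with a hypothetical helper `calibration_table` and equal-width bins (an assumption, not something the talk specified), is to group people by predicted probability and compare the average prediction with the observed rate in each group.

```python
def calibration_table(probs, outcomes, n_bins=10):
    """Return (mean predicted, observed rate, count) for each non-empty bin.

    probs: predicted probabilities in [0, 1].
    outcomes: matching 0/1 labels (1 = supporter).
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if members:
            n = len(members)
            mean_pred = sum(p for p, _ in members) / n
            observed = sum(y for _, y in members) / n
            rows.append((mean_pred, observed, n))
    return rows


# Ten people scored at 0.4, four of whom turn out to be supporters.
rows = calibration_table([0.4] * 10, [1] * 4 + [0] * 6)
```

A well-calibrated model has the first two columns close to each other in every bin; a model can rank people well and still fail this check.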
Number of features for each person: roughly hundreds. Total database: 10-20 terabytes, very manageable.
Interesting: the Narwhal backend was for web apps, and had little to do with data or analytics. Also:
investment for the future.
To be data-driven is to be rational: change actions based on available data. Most organizations are not
rational in this sense: they still make decisions based on their guts.[1]
Several channels of communication: direct mail, TV ads, knocking on doors, 5 billion emails.
Persuasion scripts for volunteers: Are you going to vote? Where's your polling place? How are you going to
get there? When are you going?
Date: October 2013.
[1] N.b.: this is a particular definition of rationality, and not one agreed upon by the whole community.
Goal: identify that small number of people who are persuadable.
Different channels have different purposes: emails and online ads are for fundraising; TV ads are for persuasion.
Primary variables for support: saying "yes, I support Obama", evidence of past support, donations to the campaign.
Fundamentally: this is a ranking problem. Identify supporters by degree.
In an ideal world, we would do something more game-theoretic with regard to persuasion. This is not that world
(yet).
There is always a tension between people who are comfortable with data and people who are not. The way
to settle this tension is by experiments.
Surprising: Facebook Pages do not have access to full lists of the users who've liked them.
Built a tool called Targeted Sharing. Users authorize our Facebook app, which gives access to their social graph and certain
attributes. Try to match them to the voter database (30-40% matched).
Influence model. How likely is your friend to take an action given that you do? They had approx 1 million
people authorize the app, and used this to get data on 200 million people. Small world phenomenon.
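The question "how likely is your friend to take an action given that you do?" is a conditional probability. A minimal sketch of estimating it from observed (user, friend) pairs follows; the function name `influence_rates` and the toy data are hypothetical, and a raw frequency like this shows correlation only, not that the user caused the friend to act.

```python
def influence_rates(pairs):
    """Estimate P(friend acts | user acted) and P(friend acts | user did not).

    pairs: iterable of (user_acted, friend_acted) booleans, one per friendship.
    Returns a (rate_if_acted, rate_if_not) tuple; a rate is None if its group is empty.
    """
    pairs = list(pairs)
    acted = [friend for user, friend in pairs if user]
    not_acted = [friend for user, friend in pairs if not user]

    def rate(xs):
        return sum(xs) / len(xs) if xs else None

    return rate(acted), rate(not_acted)


# Toy data: six friendships, two where the user took the action.
pairs = [(True, True), (True, False),
         (False, False), (False, True), (False, False), (False, False)]
rate_if_acted, rate_if_not = influence_rates(pairs)
```

Comparing the two rates gives a first, uncontrolled read on influence; an experiment would be needed to separate influence from friends simply being similar.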
There was a sharp correlation between the level of personalization of emails and the level of engagement.
Lots of A/B tests.
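For reference, one common way to read an A/B test on two email variants is a two-proportion z-test. This is a standard textbook test swapped in for illustration; the notes do not say which analysis the campaign used, and the counts below are made up.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference in response rates between A and B.

    Returns (z, p_value). Uses the pooled standard error and a normal
    approximation, so it is only reasonable for fairly large samples.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all: nothing to distinguish
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


# Made-up counts: variant A gets 100 responses from 1000 emails, B gets 150.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
```

A small p-value suggests the difference between variants is unlikely to be noise; with many tests running at once, some correction for multiple comparisons would also be needed.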
A mistake that nonprofits and campaigns make is that they don't send enough emails!
Clever: give us $23, but option is $25. These numbers are carefully optimized.
Prediction is great, but how can we influence behavior?