Rayid Ghani - Using Data and Analytics Powers for Good #Oct2013


  • 8/13/2019 Rayid Ghani - Using Data and Analytics Powers for Good #Oct2013


    USING DATA AND ANALYTICS POWERS FOR GOOD

    RAYID GHANI, TYPED BY TOM LAGATTA

    Any errors, inconsistencies, and unclear rambles in the notes are entirely the fault of the typist.

    Affiliations: University of Chicago, Edgeflip

    Objective function: maximize the probability of winning 270 electoral votes. Emphasis: winner takes all.
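
To make the objective concrete, here is a minimal sketch of computing the probability of reaching 270 electoral votes from per-state win probabilities. It assumes states are won independently of one another (a strong simplification, not something claimed in the talk), and the toy map and its probabilities are invented for illustration.

```python
def prob_at_least(states, threshold):
    """Exact P(total electoral votes >= threshold).

    states: list of (electoral_votes, win_probability) pairs.
    Assumes each state is won independently (a simplifying assumption).
    """
    total = sum(ev for ev, _ in states)
    # dist[v] = probability of holding exactly v electoral votes so far
    dist = [1.0] + [0.0] * total
    for ev, p in states:
        new = [0.0] * (total + 1)
        for v, mass in enumerate(dist):
            if mass == 0.0:
                continue
            new[v] += mass * (1.0 - p)  # lose the state: no votes gained
            new[v + ev] += mass * p     # win the state: all of its votes at once
        dist = new
    return sum(dist[threshold:])


# Hypothetical map: 247 "safe" votes plus three made-up battlegrounds.
toy_map = [(247, 1.0), (29, 0.5), (18, 0.6), (13, 0.5)]
print(prob_at_least(toy_map, 270))
```

Because each state is winner-takes-all, a small shift in one battleground's win probability can move the overall number much more than the same shift in a safe state.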

    2-2.5 million volunteers (approximately 0.75% of the U.S. population).

    Main data source: voter file, database of every registered voter in country. More precisely, every state has

    its own voter file. Obama campaign consolidated these voter files into a single database.

    Most people in the database (e.g., emails) are not matched to an entry in the voter file.

    Essential quantities:

    Support: how likely is somebody to support our side?

    Turnout: how likely is somebody to turn out and vote?

    Persuasion: how likely is somebody to vote for each side?

    Central theme: better than random. Use the data to make estimates that are better than random, and

    using these estimates, take actions to influence and affect the outcome. Key point: justify the costs of those

    actions.
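
One way to read "better than random" together with "justify the costs" is to compare how many of the right people a model-ranked contact list reaches against a randomly chosen list of the same size. The sketch below does that on invented scores and labels; the function names and the $2 cost per contact are hypothetical.

```python
def hits_top_k(scores, labels, k):
    """Number of true targets among the k highest-scored people."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k])


def expected_hits_random(labels, k):
    """Expected number of true targets when k people are picked at random."""
    return k * sum(labels) / len(labels)


def cost_per_hit(hits, k, cost_per_contact):
    """Money spent per true target reached when contacting k people."""
    return float("inf") if hits == 0 else k * cost_per_contact / hits


# Invented data: model scores and whether each person really is a target.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
k = 3

model_hits = hits_top_k(toy_scores, toy_labels, k)
random_hits = expected_hits_random(toy_labels, k)
print(cost_per_hit(model_hits, k, 2.0), cost_per_hit(random_hits, k, 2.0))
```

The action is worth taking only if the cost per person reached is below what reaching that person is worth to the campaign.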

    Support model. You have some data on who supports whom (e.g., party registration). Augment these data

    with polling. The central use of polling was to prime the priors for the model.

    Inputs to the model: Demographics, voting history, email history, fundraising history, calling history.

    Constraints on the model. Accuracy: get a good ranked list of supportive people, in order to target actions.

    Calibration: need probabilities to line up with frequencies: if I am scored as a 40% Obama supporter, then

    40 out of 100 people like me should be Obama supporters.
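
The requirement that probabilities line up with frequencies can be checked by binning people by predicted probability and comparing each bin's mean prediction with the observed share of supporters. This is a minimal sketch on invented data; the helper name and the choice of ten equal-width bins are assumptions, not details from the talk.

```python
def calibration_table(probs, outcomes, n_bins=10):
    """Return (mean predicted, observed frequency, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        table.append((mean_pred, observed, len(members)))
    return table


# Invented example: ten people scored 0.4 (four are supporters),
# five people scored 0.8 (four are supporters).
toy_probs = [0.4] * 10 + [0.8] * 5
toy_outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 0]

for mean_pred, observed, count in calibration_table(toy_probs, toy_outcomes):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

A well-calibrated model shows predicted and observed values close to each other in every bin; a model can rank people well and still fail this check.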

    Number of features for each person: roughly hundreds. Total database: 10-20 terabytes, very manageable.

    Interesting: the Narwhal backend was for web apps, and had little to do with data or analytics. Also:

    investment for the future.

    To be data-driven is to be rational: change actions based on available data. Most organizations are not

    rational in this sense: they still make decisions based on their guts.[1]

    Several channels of communication: direct mail, TV ads, knocking on doors, 5 billion emails.

    Persuasion scripts for volunteers: Are you going to vote? Where's your polling place? How are you going to

    get there? When are you going?

    Date: October 2013.

    [1] N.b.: this is a particular definition of rationality, and not agreed upon by the whole community.


    Goal: identify that small number of people who are persuadable.

    Different channels have different purposes: emails and online ads are for fundraising; TV ads are for persuasion.

    Primary variables for support: saying "yes, I support Obama," evidence of past support, donations to the campaign.

    Fundamentally: this is a ranking problem. Identify supporters by degree.

    In an ideal world, we do something more game-theoretic with regard to persuasion. This is not that world (yet).

    There is always a tension between people who are comfortable with data and people who are not. The way

    to settle this tension is by experiments.

    Surprising: Facebook Pages do not have access to full lists of the users who've liked them.

    Built a tool called Targeted Sharing. Supporters authorize our Facebook app, which gives access to their social

    graph and certain attributes. Try to match them to the voter database (30-40% matched).
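
The notes do not say how app users were matched to the voter database. As one illustration only, the sketch below does exact matching on a normalized key and keeps only unambiguous matches. The fields (`name`, `birth_date`, `city`), the function names, and all records are hypothetical.

```python
def match_key(record):
    """Normalized (name, birth date, city) key; this choice of fields is an assumption."""
    name = " ".join(record["name"].lower().split())  # lowercase, collapse whitespace
    return (name, record["birth_date"], record["city"].lower().strip())


def match_to_voter_file(app_users, voter_file):
    """Map user_id -> voter_id for users whose key matches exactly one voter."""
    index = {}
    for voter in voter_file:
        index.setdefault(match_key(voter), []).append(voter["voter_id"])
    matches = {}
    for user in app_users:
        candidates = index.get(match_key(user), [])
        if len(candidates) == 1:  # skip missing and ambiguous matches
            matches[user["user_id"]] = candidates[0]
    return matches


# Invented records, for illustration only.
toy_app_users = [
    {"user_id": "u1", "name": "Ada  Example", "birth_date": "1980-01-02", "city": "Chicago"},
    {"user_id": "u2", "name": "Bo Sample", "birth_date": "1975-05-06", "city": "Columbus"},
    {"user_id": "u3", "name": "Cy Placeholder", "birth_date": "1990-09-09", "city": "Denver"},
]
toy_voter_file = [
    {"voter_id": "v100", "name": "ada example", "birth_date": "1980-01-02", "city": "chicago "},
    {"voter_id": "v200", "name": "Bo Sample", "birth_date": "1975-05-06", "city": "Columbus"},
    {"voter_id": "v201", "name": "bo sample", "birth_date": "1975-05-06", "city": "columbus"},
]

matched = match_to_voter_file(toy_app_users, toy_voter_file)
print(matched, len(matched) / len(toy_app_users))
```

Exact-key matching misses people whose details differ between the two sources, which is one plausible reason a match rate stays well below 100%.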

    Influence model. How likely is your friend to take an action given that you do? They had approx 1 million

    people authorize the app, and used this to get data on 200 million people. Small world phenomenon.
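
The influence question can be phrased as a conditional frequency: among (person, friend) pairs where the person acted, how often did the friend act too? The sketch below estimates this with simple additive smoothing. The function name, the smoothing constants, and the data are assumptions for illustration, not the campaign's actual model.

```python
def influence_estimate(pairs, prior_successes=1.0, prior_trials=2.0):
    """Smoothed estimate of P(friend acted | you acted).

    pairs: list of (you_acted, friend_acted) booleans, one per (person, friend) pair.
    The prior pulls the estimate toward 0.5 when there is little data.
    """
    trials = sum(1 for you, _ in pairs if you)
    successes = sum(1 for you, friend in pairs if you and friend)
    return (successes + prior_successes) / (trials + prior_trials)


# Invented observations: 8 pairs where the person acted (friend followed in 3),
# plus 4 pairs where the person did not act, which the estimate ignores.
toy_pairs = (
    [(True, True)] * 3
    + [(True, False)] * 5
    + [(False, True)] * 1
    + [(False, False)] * 3
)
print(influence_estimate(toy_pairs))
```

A conditional frequency like this is not by itself evidence of influence, since friends tend to resemble each other; separating the two requires experiments.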

    There was a sharp correlation between the level of personalization of emails and the level of engagement.

    Lots of A/B tests.
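
As an illustration of how a single A/B test on two email variants might be read, here is a minimal sketch of a two-proportion z-test using only the standard library. The notes do not say which statistical test was used, and the counts below are invented.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference in rates between B and A.

    Uses the pooled-variance normal approximation, which assumes reasonably
    large samples and a pooled rate strictly between 0 and 1.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: variant A got 100 responses from 1000 emails, B got 150 from 1000.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

When many tests are run, some will look significant by chance, so a result like this is best confirmed with a repeat test before acting on it.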

    A mistake that nonprofits and campaigns make is that they don't send enough emails!

    Clever: give us $23, but option is $25. These numbers are carefully optimized.

    Prediction is great, but how can we influence behavior?