74
Eytan Bakshy Sean J. Taylor Facebook Core Data Science WWW 2015, Florence, Italy May 19, 2015 Online Experiments for Computational Social Science WWW 15 Tutorial

Www tutorial

Embed Size (px)

Citation preview

Page 1: Www tutorial

facebook data science

Eytan Bakshy Sean J. Taylor Facebook Core Data Science

WWW 2015, Florence, Italy May 19, 2015

Online Experiments for Computational Social Science WWW 15 Tutorial

Page 2: Www tutorial

Outline1. Introduction and Causal Inference (30 minutes)

2. Planning Experiments (30 minutes)<30 minute break>Discussion + analysis exercise (15 minutes)

3. Designing and Implementing Experiments (45 minutes)

4. Analyzing Experiments (30 minutes)

Page 3: Www tutorial

Everything we assume▪ Minimum requirements

▪ Some basic knowledge of statistics

▪ The ability to follow code

▪ Necessary to understand 90%

▪ Intermediate knowledge of R and beginner knowledge of Python

▪ Necessary to understand 100%

▪ Advanced knowledge of R, intermediate Python, intermediate stats, design of experiments

Page 4: Www tutorial

Don’t panic!

Page 5: Www tutorial

Don’t panic! Buddy up! Group learning is good for you!

Page 6: Www tutorial

Software requirements

▪ If you are the kind of person who wants to tinker with code yourself, here are the software requirements

▪ Section 1: None

▪ Section 2: IPython + R

▪ Part 3: IPython + PlanOut

▪ Part 4: Python + R

▪ Can’t install the software? Buddy up

Page 7: Www tutorial

Follow along:

http://bit.ly/www15experiments

Page 8: Www tutorial

Section 1: Introduction and Causal Inference

Page 9: Www tutorial

Obligatory

Page 10: Www tutorial

Associations

(Xi, Yi) ▪ Health: (smoking, cancer)

▪ Sports: (running the ball, winning games)

▪ Education: (completes MOOC course, gets a promotion)

▪ Social Media: (likes a page on Facebook, buys a product)

Page 11: Www tutorial

Possible Relationships

▪Pr(Yi | Xi) = Pr(Yi)(independence)

▪Pr(Yi | Xi) ≠ Pr(Yi)(dependence)

Dependence between variables is useful, but interpretation of this relationship can often be tricky.

Page 12: Www tutorial

Possible Causal Relationships

All three causal relationships are possible when

there’s a dependency between variables.

Xi Yi

Xi Yi

Xi Yishouldn’t there be an arrow going both ways here?I also wonder if this is confusing because it doesn’t leave room for a mediating variable, X->M->Y.

Page 13: Www tutorial

Correlation without Causation: Introducing Zi ▪Smoking causes cancer

(genetics) ▪Running the ball causes winning games (having a lead) ▪Completing a MOOC causes a promotion(self-motivation) ▪Liking a page on Facebook causes a person to buy a product(brand loyalty)

Xi Yi

Ui

?

Page 14: Www tutorial

Bigger Example

Page 15: Www tutorial

Why Causal Inference?

1. Science: Why did something happen?

2. Decisions:What will happen if I change something?

Page 16: Www tutorial

Causal Inference for Science

1. Associations between two variables are always more interesting when they’re causal.

2. Understanding a phenomenon is different from predicting it.

Explanatory Modeling

Predictive Modeling

model captures causal function model captures association

model carefully constructed from theory models constructed from data

retrospective forward-looking

minimize bias minimize variance

basic science applied science

make better data make better features

Page 17: Www tutorial

Two Kinds of Out-of-Sample

People we Observe Similar People

Similar People under different circumstances

Machine Learning / Statistics

Experiments

Page 18: Www tutorial

social adnon-social ad

Attribute Outcomes to Causes

Social Influence in Social Advertising: Evidence from Field Experiments. Bakshy, Eckles, Yan, Rosenn. EC 2012.

Page 19: Www tutorial

Attribute Outcomes to Causes

Social Influence in Social Advertising: Evidence from Field Experiments. Bakshy, Eckles, Yan, Rosenn. EC 2012.

1 liking friend 0 friends shown

1 liking friend 1 friend shown

Page 20: Www tutorial

Causal Inference for Decisions▪ Health: Quit smoking? Brush your teeth? :)

▪ Sports: run the ball more?

▪ Social media: recruit more Facebook fans for my page?

▪ Advertising: purchase ads?

▪ Education: invest time in completing a MOOC?

Page 21: Www tutorial

News Feed (2011) News Feed (2014)

Test complete alternatives

Page 22: Www tutorial

News Feed (2011) News Feed (2014)

Explore a design space

Page 23: Www tutorial

A Social Science ProblemCausal inference is easy for the natural sciences:

▪ Manipulation is easy!

▪ Molecules, cells, animals, plants are exchangeable!

For people, there’s always something we may not be observing perfectly.

▪ Latent traits provide alternative explanations that we often cannot rule out.

▪ Experiments are often the only way of ruling them out.

Page 24: Www tutorial

Potential Outcomes Framework

Yi(1) Yi(0) Di Effect

Eytan 10 -5 15

Anna 0 5 -5

Gary -20 5 -25

Linda -10 10 -20

Edna -5 0 -5

Sean 10 5 5

Mean -2.14 2.86 -5

Yi(1) is the outcome under treatment Yi(0) is the outcome under control

Page 25: Www tutorial

Fundamental Problem of Causal Inference

Yi(1) Yi(0) Di Effect

Eytan 10 1 ?

Anna 5 0 ?

Gary 5 0 ?

Linda 10 0 ?

Edna 0 0 ?

Sean 10 1 ?

Mean

We only ever observe a unit in either treatment or control. Individual level effects are never defined.

Page 26: Www tutorial

Confounding

Yi(1) Yi(0) Di Effect

Eytan 10 1 ?

Anna 0 1 ?

Gary 5 0 ?

Linda -10 1 ?

Edna -5 1 ?

Sean 5 0 ?

Here we assumed they selected Di to maximize their outcome.

Yi(1) Yi(0) Di Effect

Eytan 10 1 ?

Anna 5 0 ?

Gary 5 0 ?

Linda 10 0 ?

Edna 0 0 ?

Sean 10 1 ?

Mean 10 5 0.29 5

Page 27: Www tutorial

Randomization

Yi(1) Yi(0) Di Effect

Eytan 10 1 ?

Anna 0 1 ?

Gary 5 0 ?

Linda -10 1 ?

Edna -5 1 ?

Sean 5 0 ?

With randomly assigned Di, we get unbiased effect estimates. The difference in means is called the Average Treatment Effect

Yi(1) Yi(0) Di Effect

Eytan -5 0 ?

Anna 0 1 ?

Gary -20 1 ?

Linda 10 0 ?

Edna 0 0 ?

Sean 10 1 ?

Mean -3.33 1.67 0.5 -5

Page 28: Www tutorial

The Average Treatment Effect

▪Due to the linearity of expectations, we can separate the ATE into two measurements.

▪ In practice, we estimate these expectations using the means in our treatment and control groups.

ATE = � = E[Yi(1)� Yi(0)] = E[Yi(1)]� E[Yi(0)]

[ATE = 1N

Pi2T Yi(1)� 1

M

Pi2C Yi(0)

Page 29: Www tutorial

Uncertainty▪How sure are we about the ATE we measured?

Variability in ATE estimates comes from:

1. variation in random assignment

2. variation in subjects

Page 30: Www tutorial

Variation due to Random AssignmentHistogram of reps

reps

Frequency

-25 -20 -15 -10 -5 0 5 10

010

2030

4050

60

For a given set of potential outcomes and a randomization, we may not always arrive at the right estimate, but it will not be biased.

-what is this illustrating? sampling variation in subjects or in terms of permutation tests / randomization inference?-I wonder if its worth mentioning variability due to sampling different subjects vs assignments?

Page 31: Www tutorial

Standard Errors▪How can we quantify uncertainty about ATEs we measure?

▪SE decreases with √N

▪SE is smaller when variances of potential outcomes are smaller

▪want n0 and ni to be similar if the variances are the same

▪want more observations for higher variance conditions

cSE([ATE) =q

1n0+n1

qn1n0

Var(Yi(0)) +n0n1

Var(Yi(1))

Page 32: Www tutorial

Confidence Intervals▪95% confidence interval is 1.96 times

▪SEs shrink with the square root of the number of observations, so to double the precision of your experiment you need four times the number of subjects.

cSE([ATE)

Page 33: Www tutorial

Confidence Intervals vs p-values▪Often easy to get statistical significance with big data.

Most things have effects!

▪Harder to get big, practically significant, effects!

▪CIs make uncertainty about an estimate more credible.

▪Should favor reporting CIs.

Page 34: Www tutorial

Section 2: Planning Experiments

Page 35: Www tutorial

The Routine

Page 36: Www tutorial

The routine▪ Step 1: Formulate a research hypothesis

▪ Step 2: State an expected effect size

▪ Step 3: Design your experiment

▪ Step 4: Power analysis

▪ Step 5: Write up analysis plan

▪ Step 6: Collect data

▪ Step 7: Analyze data according to plan

▪ A specific hypothesis: ▪ Population ▪ Empirical context ▪ Treatment(s) ▪ Subgroups ▪ Outcomes

Page 37: Www tutorial

The routine▪ Step 1: Formulate a research hypothesis

▪ Step 2: State an expected effect size

▪ Step 3: Design your experiment

▪ Step 4: Power analysis

▪ Step 5: Write up analysis plan

▪ Step 6: Collect data

▪ Step 7: Analyze data according to plan

▪ Use prior literature ▪ Use existing data ▪ Collect your own

observational data ▪ Small effects need

big data

Page 38: Www tutorial

The routine▪ Step 1: Formulate a research hypothesis

▪ Step 2: State an expected effect size

▪ Step 3: Design your experiment

▪ Step 4: Power analysis

▪ Step 5: Write up analysis plan

▪ Step 6: Collect data

▪ Step 7: Analyze data according to plan

▪ Follows from step 1 ▪ Identify: ▪ Constraints ▪ Threats to validity ▪ Ways to increase

precision

Page 39: Www tutorial

The routine▪ Step 1: Formulate a research hypothesis

▪ Step 2: State an expected effect size

▪ Step 3: Design your experiment

▪ Step 4: Power analysis

▪ Step 5: Write up analysis plan

▪ Step 6: Collect data

▪ Step 7: Analyze data according to plan

▪ Simulate experiment with posited effects and design

▪ You should feel comfortable that you’ll find a clinically significant effect

Page 40: Www tutorial

The routine▪ Step 1: Formulate a research hypothesis

▪ Step 2: State an expected effect size

▪ Step 3: Design your experiment

▪ Step 4: Power analysis

▪ Step 5: Write up analysis plan

▪ Step 6: Collect data

▪ Step 7: Analyze data according to plan

▪ Makes it easier to communicate your study

▪ Helps catch problems with plan

▪ Keeps you honest as a scientist

Page 41: Www tutorial

The routine▪ Step 1: Formulate a research hypothesis

▪ Step 2: State an expected effect size

▪ Step 3: Design your experiment

▪ Step 4: Power analysis

▪ Step 5: Write up analysis plan

▪ Step 6: Collect data

▪ Step 7: Analyze data according to plan

▪ Collect pre-treatment data (if applicable)

▪ Implement ▪ Collect experiment data ▪ Log sane data

Page 42: Www tutorial

The routine▪ Step 1: Formulate a research hypothesis

▪ Step 2: State an expected effect size

▪ Step 3: Design your experiment

▪ Step 4: Power analysis

▪ Step 5: Write up analysis plan

▪ Step 6: Collect data

▪ Step 7: Analyze data according to plan

▪ Wrangle data to get it into a format that you can analyze in R

▪ Apply appropriate statistical procedures

▪ Analysis should be easy!

Page 43: Www tutorial

Walking through the routine for an experiment

Page 44: Www tutorial

Step 1: Formulating research question▪ Prior research suggests

individuals tend to avoid ideologically cross-cutting information

▪ Recent paper shows:1-Pr(click|op)/Pr(click|same)

▪ Conservatives = -17%

▪ Liberals: -6%

▪ Is this causal?

Exposure to Ideologically Diverse News and Opinion on Facebook. Bakshy, Messing, Adamic. Science. 2015

20%

30%

40%

50%

Random Potentialfrom network

Exposed Selected

Perc

ent c

ross−c

uttin

g co

nten

t

Viewer affiliation●

ConservativeLiberal

Page 45: Www tutorial

Step 1: Hypothesis▪ Hypothesis: Source cues have a causal effect on users’

willingness to consider reading summaries of content when given many choices

▪ Treatment: Randomly assigned sources induce selective exposure effects

▪ Context: Convenience sample of Turkers on AMT

▪ Outcomes: Choosing read an article summary

Exposure to Ideologically Diverse News and Opinion on Facebook. Bakshy, Messing, Adamic. Science. 2015

Page 46: Www tutorial

Step 2: Effect size▪ We empirically observe a risk ratio of -6% and -17%

▪ We might expect the effect to be in the ballpark of this, but smaller, e.g., 4-10%

▪ Caveats:

▪ Different population

▪ Different outcomes (processing summary, vs clicks)

Exposure to Ideologically Diverse News and Opinion on Facebook. Bakshy, Messing, Adamic. Science. 2015

Page 47: Www tutorial

Step 3: Design experiment

Page 48: Www tutorial

Next steps

▪ Section 2: Estimation+Power analysis:

▪ Power analysis (step 4)

▪ Write up analysis plan (step 5) ▪ Section 3: Collect data (step 3+6)

▪ Code up experiment yourself in PlanOut

▪ Data already collected :) ▪ Section 4: Analyze data according to plan (step 7)

Page 49: Www tutorial

$ ipython notebook 0-estimation-and-power.ipynb

Estimation and Power analysis

Page 50: Www tutorial

Section 3: Designing and Implementing Experiments with PlanOut

Page 51: Www tutorial

PlanOut scripts are high-level descriptions

of randomized parameterizations

Page 52: Www tutorial

The PlanOut Idea▪ User experiences are parameterized by experimental

assignments

▪ PlanOut scripts describe assignment procedures

▪ Experiments are PlanOut scripts plus a population

▪ Parallel or follow-on experiments are centrally managed

Page 53: Www tutorial

The PlanOut framework▪ Two ways of writing PlanOut scripts

▪ Using the PlanOut language

▪ Using native API (e.g., Python)

▪ PlanOut is multiplatform

▪ Reference: Python

▪ Production: PHP, Java

▪ Alpha: JavaScript, Ruby, Go

Page 54: Www tutorial

Sample PlanOut script

button_color = uniformChoice( choices=["#ff0000", "#00ff00"], unit=userid);

button_text = uniformChoice( choices=["I'm voting", "I'm a voter”], unit=userid);

2x2 factorial design

Page 55: Www tutorial

Compiled PlanOut Code

{ "op": "seq", "seq": [ { "op": "set", "var": "button_color", "value": { "choices": { "op": "array", "values": [ "#ff0000", "#00ff00" ] }, "unit": { "op": "get", "var": "userid" }, "op": "uniformChoice" }

}, { "op": "set", "var": "button_text", "value": { "choices": { "op": "array", "values": [ "I'm voting", "I'm a voter" ] }, "unit": { "op": "get", "var": "userid" }, "op": "uniformChoice" } } ]

Page 56: Www tutorial

$ ipython notebook \ 1-planout-intro.ipynb

Using PlanOut

Page 57: Www tutorial

$ ipython notebook \ 2-making-your-own-data.ipynb

Using PlanOut

Page 58: Www tutorial

open: webapp/app.py

Python Web Application Walkthrough

Page 59: Www tutorial

open: webapp/extract_data.py

Extracting Data from Logs

Page 60: Www tutorial

Section 4: Analyzing Experimental Data

Page 61: Www tutorial

Outline

1. Dependence (non- i.i.d. data)

2. The bootstrap

3. Using covariates

4. Data reduction

5. “Big Data Guide”

6. Example in R

Page 62: Www tutorial

http://www.tucsonsentinel.com/local/report/062712_az_students/state-natl-rankings-vary-widely-az-student-performance/

100 items 100 users 100,000 impressions Does N = 100,000?

Page 63: Www tutorial

Dependence▪Most formulas for confidence intervals assume that each

individual data point is independent of all the others.

▪ In practice, we often have repeated observations of users or content items.

▪ Ignoring this fact in inference will tend to make confidence intervals anti-conservative.

See Bakshy and Eckles, “Uncertainty in Online Experiments with Dependent Data” (KDD 2013)

Page 64: Www tutorial

The Bootstrap

confidence intervals

Page 65: Www tutorial

I both like this and find it to be confusing. The way I explain it is that the bootstrap simulates the process of drawing samples from a very large population. I think if you just relabeled the top as ‘Sampling from the whole population’ and ‘Sampling from your own data (bootstrap)’, this might make it clearer?

I also wonder if you can use my benchmarking paper as illustrations for SEs vs N as a function of ICC and/or my plot that shows how the bootstrap

Page 66: Www tutorial

Bootstrapping in Practice

R1

All Your Data

R2

R500

Generate random sub-samples

s1

s2

s500

Compute statistics or estimate model

parameters

}0.0

2.5

5.0

7.5

-2 -1 0 1 2Statistic

Count

Get a distribution over statistic of interest

(e.g. the ATE)

- take mean - CIs == 95% quantiles - SEs == standard deviation

Page 67: Www tutorial

Using Covariates▪With simple random assignment, using covariates is not

necessary.

▪However, you can improve precision of ATE estimates if covariates explain a lot of variation in the potential outcomes.

▪Can be added to a linear model and SEs should decrease if they are helpful.

▪Should always at least report results without using covariates.

Page 68: Www tutorial

Data Reduction

Subject Di Yi

Evan 0 1

Ashley 0 1

Greg 1 0

Leena 1 0

Ema 0 0

Seamus 1 1

D Y=1 Y=0

0 2 1

1 1 2

N

# treatments

Page 69: Www tutorial

Data Reduction with Covariates

Subject Xi Di Yi

Evan M 0 1

Ashley F 0 1

Greg M 1 0

Leena F 1 0

Ema F 0 0

Seamus M 1 1

X D Y=1 Y=0

M 0 1 0

M 1 1 1

F 0

F 1

N

# treatments X # groups

Can analyze the reduced data using a weighted linear model.

Page 70: Www tutorial

Data Reduction with Dependent Data

Subject Di Yij

Evan 1 1

Evan 1 0

Ashley 0 1

Ashley 0 1

Ashley 0 1

Greg 1 0

Leena 1 0

Leena 1 1

Ema 0 0

Seamus 1 1

Create bootstrap replicates

R1

R2

R3

reduce the replicates as if they’re i.i.d.

r1

r2

r3

s1

s2

s3

compute statistics on reduced data

Page 71: Www tutorial

Experiment Analysis (i.i.d. data)

Data Size Fits in memory < 2M rows Doesn’t fit in memory

Using Covariates linear regression data reduction + weighted linear models

No Covariates t-test data reduction

Page 72: Www tutorial

Experiment Analysis (dependent data)

Data Size Fits in memory < 2M rows Doesn’t fit in memory

Using Covariates random effects models or bootstrap + linear models

bootstrap + data reduction + weighted linear models

No Covariates random effects models or bootstrap in R bootstrap + data reduction

Page 73: Www tutorial

$ ipython notebook \ 4-analyzing-experiments.ipynb

Hands On Analyzing Experiments

Page 74: Www tutorial

(c) 2009 Facebook, Inc. or its licensors.  "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0