70
PROGRAMMING FOR SOCIAL SCIENTISTS WITH INTERPOLL Ben Livshits Todd Mytkowicz Microsoft Research 1

Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

PROGRAMMING FOR SOCIAL SCIENTISTS WITH INTERPOLL

Ben Livshits

Todd Mytkowicz

Microsoft Research

1

Page 2: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

InterPoll 101

2

This is a LINQ query

for asking college

students whether they

are liberal arts majors

We want to give

access to polls and

survey to the regular

developer.

We want to make

access to human-

generated data as

easy as access to

databases.

Page 3: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

SIMPLE LINQ QUERIES

var femaleHeight = from person in people where person.Gender == Gender.FEMALE select person.PoseQuestion<int>("What is your height?");

var maleHeight = from person in people where person.Gender == Gender.MALE select person.PoseQuestion<int>("What is your height?");

if (maleHeight.ToRandomVariable() > femaleHeight.ToRandomVariable()) {

Console.WriteLine(

"Males are taller that females, according to a t-test.");

}

Page 4: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

C# + LINQ

LINQ (Language-integrated queries) is the only way to create surveys, but in has several important advantages.

In InterPoll, We are thinking a lot about the developer experience. LINQ is familiar to developers and does not require learning a new domain-specific language.

LINQ queries integrate well with the rest of the application.

Developers can continue using familiar tools such as Visual Studio, with powerful IntelliSence support and type checking, to avoid making mistakes.

Page 5: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

DEMO

Page 6: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

SAMPLE SURVEYS

Page 7: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

MECHANICAL TURK BACK-END

InterPoll uses Mechanical Turk as a back-end

Others like UHRS are possible

Page 8: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

IS THIS NOT A SOLVED PROBLEM?

8

SurveyMonkey

Page 9: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

OUR FOCUS IS ON AN END-TO-END PROCESS

1) idea

2) poll

3) crowd

4) analysis

9

Page 10: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

KEY DIFFERENCES BETWEEN THEM AND INTERPOLL

Lack of programmability

InterPoll is considerably cheaper

We give statistical significance to the results through unbiasinging

and power analysis

Page 11: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

A SMORGASBORD OF TOPICS

Crowd-sourcing

Programming with

uncertainty

Human and machine

computation combined

Database and PL

optimizations

Statistics

Privacy

11

Page 12: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

One of the wonderful things about InterPoll is that it brings together research fields that are

not frequently combined.

This requires researchers who are not afraid to step outside their comfort zone.

12

Page 13: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

“Experimental psychology is the study of the college sophomore”

Quinn McNemar, 1946

13

Page 14: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

Traditional polling is a painstaking and time-consuming process.

Not everything can be assessed through computer-based polling, but many things can be.

Recent studies show that online polling is frequently better than RDD polls and are good at avoiding the

interviewer bias.

14

Page 15: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

Recent results suggest that online polling is actually cheaper, faster, and is devoid of some of the biases such as the interviewer bias present in

traditional face-to-face or telephone surveys.

Moreover, in recent years, traditional random -dialing surveys suffer from the fact that young people often do not have a land line.

15

Page 16: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

COSTLY… ESPECIALLY AT SCALE

SurveyMonkey Audience pricing Instant.ly cost per completed survey

16

0

2

4

6

8

10

12

14

0 10 20 30 40 50

Cost

per

com

ple

tion,

in

$Number of questions asked

Page 17: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

MECHANICAL TURK (MTURK)

Our main backend is Mechanical Turk

Generally can get answers to most polls for $.10/survey completion

Latency drops as a result of higher rewards, but the quality generally doesn’t seem to increase

InterPoll takes care of creating appropriate HITs, generating XML forms for the UI that appears on Mechanical Turk and collecting the results

17

Page 18: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

How to integrate human and computer computation is a wide open problem in programming languages.

Our approach is to use LINQ to provide seamless language extensions.

But many topics remain unaddressed. How does one program with long-running crowd operations?

How does one combine lazy and eager computation?

18

Page 19: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

Uncertain<T>

19

Page 20: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

24 mph

20

Page 21: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

59 mph

21

Page 22: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

A WALK WITH A GPS

Page 23: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

GeoCoordinate Location;

HOW DO YOU WRITE THIS KIND OF CODE?

23

Page 24: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

Uncertain<GeoCoordinate> Location;

GeoCoordinate Location;

24

HOW DO YOU WRITE THIS KIND OF CODE?

Page 25: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

Uncertain<T> is an uncertain type abstraction.

It encourages developers to

explicitly reason about uncertainty.

25

Page 26: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

Uncertain<GeoCoordinate> LastLoc = GPS.GetLocation();

A variable of type Uncertain<T> is a random variable,

represented by a distribution.

3 D Raleigh DistributionGaussian distribution Arbitrary non-continuous functions

SYNTAX AND SEMANTICS

26

Page 27: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

Uncertain<double> Speed = Dist / 5; Or more generally, Z = X + Y, if X

and Y are distributions.X

YZ=X+Y

is a sample of X

is a sample of Y

is a sample of X+Y *`

If

and

then

x

y

x+y

* if X and Y are independent

PROGRAMMING WITH Uncertain<T>

27

Page 28: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

-

C/

E

BA

D

Sampling function for E recursively samples children.

BAYESIAN NETWORK REPRESENTATION:

28

Page 29: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

if (Speed > 4) print("Great job!");

TESTING DISTRIBUTIONS

29

Page 30: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

4 mph

0 2 4 6 8 10

Speed (mph)

if (Speed > 4) print("Great job!");

TESTING DISTRIBUTIONS

30

Page 31: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

4 mph

0 2 4 6 8 10

Speed (mph)

Pr[Speed > 4]

More likely than not that Speed > 4?

> 0.5?

TESTING DISTRIBUTIONS

if (Speed > 4) print("Great job!");

31

Page 32: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

4 mph

0 2 4 6 8 10

Speed (mph)

Pr[Speed > 4]

At least 90% likely that

Speed > 4?

> 0.9?

if (Speed > 4) print("Great job!");

TESTING DISTRIBUTIONS

32

Page 33: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

Pr[Speed > 4] > 0.9

approximate!

HA:

Pr[Speed > 4] ≤ 0.9H0:

null hypothesis

alternate hypothesis

How many samples? Too many = too slow

Too few = too noisy

Sequential sampling: sample size depends on progress

TESTING DISTRIBUTIONS

if (Speed > 4).Pr(0.9) print("Great job!");

33

Page 34: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

A = X + YB = A + X

(X,Y independent)

A and B depend on X – not independent – take one X sample not two!

BAYESIAN NETWORK ESTIMATION

34

Page 35: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

0.00

0.05

0.10

0.15

0.20

0.25

Location

Den

sity

Prior

Likelihood

Posterior

Uncertain<T> language semantics implements Bayes Rule

posteriorlikelihood prior

INCORPORATING DOMAIN KNOWLEDGE

35

Page 36: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

GPS SPEED

36

Page 37: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

GPS SPEED

37

Page 38: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

GPS SPEED

0

10

20

30

40

50

60

Time

Sp

ee

d (

mp

h)

GPS speed (95% CI)

Improved speed (95% CI)

38

Page 39: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

39

Uncertain<T> is an uncertain type abstraction.

It encourages developers to explicitly reason about uncertainty.

Page 40: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

BRINGING IT ALL TOGETHER

40

sensors people

Page 41: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

BUILDING INTERPOLL APPLICATIONS

41

Page 42: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

SOCIAL SCIENCES

What is the prevalence of

depression in men vs.

women?

How near-sighted

are people over 50

Which video game console

do you prefer? Xbox

One/PS4/Both/Neither)

Do you think that affirmative

action as part of college

admission is a good idea?

Do you itemize your taxes or

use the standard

tax form?

What do you think about

doctor-assisted suicide?

What do you think about

race relations in America

today?

Page 43: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

HEALTHCARE SURVEYS

How many hours of

sleep have you gotten on

average in the past

month?

Do you feel that you

get enough sleep?

How is duration of sleep

correlated with coffee intake

for college students? Are parents satisfied

with the amount of sleep

they get?

Are men more satisfied

than women?

For people between

30-45, do they find

medical providers by

word of mouth?

Which is worse for your

health: alcohol, tobacco,

sugar, or marijuana?

Is Yelp a trusted

source of healthcare

recommendations?

Page 44: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

MARKETING

Are men or women

responsible for shopping for

pet food?

Do people between

20-30 play more video

games than those between

30-45?

Which video game console

do you prefer? What kind of jeans

do you wear?

(Skinny/straight/loose, etc.)

Have you paid for a cable

TV subscription

in the last year and, if so,

are you likely to have a Hulu

subscription?

Do you like the new

Microsoft company logo

better than the old Microsoft

company logo?

What percentage of

the population between 30

and 50 use both cable and

streaming services like

Netflix?

Page 45: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

POLITICAL POLLING

In cases of extreme

drought, should farmers

and food producers have

priority when allocating

water?

Should individuals obtain

permission to

photograph every

person in the photos they

take?

Are you in favor or

against raising the

federal minimum wage?

Do you think

Obamacare will make

things better or worse

for you and your

family?

Should legalized

recreational marijuana

be required to be sold

with a warning label?

How did you or will

you pay for higher

education (anything

post-high school)?

How do you feel

about the direction of

the country?

Page 46: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

46

Demo: basic surveys

Page 47: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

OTHER INTERFACES ARE POSSIBLE

47

Page 48: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

FUNDAMENTAL PROBLEMS

48

Page 49: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

MOVING AWAY FROM SMALL AND UNREPRESENTATIVE SAMPLES

Page 50: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

POWER ANALYSIS

50

Page 51: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

POWER ANALYSIS

Determine the number of samples for a query

We can sample from the crowd sequentially until we satisfy or disprove ourhypothesis.

We will poll the crowd for more until our stopping criterion is reached.

The sopping criterion allows us to conclude that he hypothesis can be proven ordisproven with the required level of confidence.

51

Page 52: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

EXAMPLE QUESTION: HEIGHT

N=29

Once we remove the outliers

height = from person in height where

where

person.Height >=140 &&

person.Height <=220)

N=27

52

Page 53: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

ARE THE TWO CORRELATED?

53

Hospital anxiety and depression scale (HADS)

Normal 0-7

Mild 8-10

Moderate (11–14)

Severe (15–21)

Page 54: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

DOES MONEY BUY HAPPINESS (OR AT LEST TRANQUILITY)?

Are rich more anxious than poor?

N = 105

expected value for poor=8.5714,

expected value for rich=7.9619

54

Page 55: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

DEBATES: INTELLIGENCE SQUARED

55

Page 56: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

CONVERGENCE CURVES: SEQUENTIAL PROBABILITY RATIO TEST (SPRT) OR WALD, 1945

56

=a

=b

Page 57: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

BE CAREFUL WITH YOUR PARAMETERS

As we vary the probability As we vary the confidence

57

Page 58: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

TAXATION, BY GENDER AND INCOME

58

0

5

10

15

20

25

30

Female Male

No Rich are taxed enough

5

71

4

63

10

2 7

3

10

3

22

0

1 3

No Rich are taxed enough

Page 59: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

TWO MORE CONUNDRUMS

Affirmative action does more harm than good

2

8

35

66

2

9

64 63

0

10

20

30

40

50

60

70

No high schooldiploma

High school orequivalent

Some college, lessthan 4-yr degree

Bachelor's degreeor higher

No More hard than good

We should break up the big banks

3

6 44 9 10 7

58

4

2 12 4 4 3

13

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

No Inco

me

$1

to $

4,9

99 o

r loss

$5

,000 to

$9,9

99

$1

0,0

00

to $

14,9

99

$1

5,0

00

to $

24,9

99

$2

5,0

00

to $

34,9

99

$3

5,0

00

to $

49,9

99

$5

0,0

00

to $

74,9

99

$7

5,0

00

and

ove

r

Yes No

59

Page 60: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

PRIORS FOR THE CROWD

Instant.ly crowd US census

Page 61: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

SEGMENTS OF THE CROWD LOOK DIFFERENT

61

Panos Ipeirotis, 2010

Page 62: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

THE UNBIAS OPERATOR

62

Page 63: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

UNBIASING IN INTERPOLL

63

0.000

0.050

0.100

0.150

0.200

0.250

0.300

0.350

0.400

I'm not sure Somewhat likely Somewhat unlikely Very likely Very unlikely

Biased response Unbiased response

Page 64: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

LOCAL OPTIMIZATIONS

Poll twice?

Poll once and reuse the population?

Dead code – reason about collection properties and query/code interactions

Page 65: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

CONDITIONAL BLENDING

65

Page 66: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

CROSS-CROWD SAMPLING STRATEGIES

Having more than one crowd can help us assess the issue of ignorability: does the fact that we are polling online affect the outcome?

Test respective biases brought about by different crowds

Provide answers for different population mixes: 70% Indian, 30% American

Specific optimizations

US -> India

US and India, blended

US during the day, India at night

Use India to find missing population segments

66

Page 67: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

MORE OPTIMIZATIONS

Can we cache answers to common questions and reuse them across surveys? Is that always the correct thing to do?

Can we determine questions that are time-sensitive?

Can we decide if keeping answers to specific questions may lead to privacy violations.

Can we determine questions suitable for different crowds?

Can we determine if someone is likely to cheat or compensate for the possibility of cheating?

67

Page 68: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

EXPLORING THE DATA

68

Page 69: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

INTERFACES

69

Page 70: Programming for Social Scientistslivshits/papers/ppt/ecoop-school-2014.pdfDEMO. SAMPLE SURVEYS. MECHANICAL TURK BACK-END ... SurveyMonkey Audience pricing Instant.ly cost per completed

CONCLUSIONS

InterPoll: a system for large-scale crowd-sourced polling

Geared toward

Developers who want to incorporate human data into their applications

But also social scientists

Marketing professionals

Campaign pollsters

Have explored power analysis

and are doing experiments

on unbiasing

and various optimizations

70