
Information Management via CrowdSourcing

Hector Garcia-Molina
Stanford University

Crowdsourcing


Not to be confused with:

• Wisdom of the Crowd

• Cloud Computing :-)

Question A: Does figure show > 45 dots?
[dot figure]
Report Results for Question A

Question B: Does figure show > 45 dots?
[dot figure]
Report Results for Question B

Many Crowdsourcing Marketplaces!

Real World Examples


Image Matching

Translation

Categorizing Images

Search Relevance

Data Gathering

Many Research Projects!


The Many Faces of Crowdsourcing


Human-Computer Interaction

Software Systems

Machine Learning

Human Issues

Information Management

Crowd Information Management

• Two Aspects:

– Crowd as Information Source

– Crowd as Data Processor


Efficiency: Fundamental Tradeoffs

Cost: How much $$ can I spend?
Latency: How long can I wait?
Uncertainty: What is the desired quality?

• Which questions do I ask humans?

• Do I ask in sequence or in parallel?

• How much redundancy in questions?

• How do I combine the answers? (see the sketch after this list)

• When do I stop?
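The slides leave the combination rule open. A minimal sketch of one common choice, majority voting over redundant YES/NO answers; the function name and the example answers are illustrative, not from the slides:

    from collections import Counter

    def majority_vote(answers):
        """Combine redundant YES/NO answers from several workers into one answer.

        answers: e.g. ["YES", "NO", "YES"].
        Returns the most common answer and the fraction of workers who gave it,
        which can serve as a rough confidence score."""
        counts = Counter(a.strip().upper() for a in answers)
        answer, votes = counts.most_common(1)[0]
        return answer, votes / len(answers)

    # Example: three workers answer "Does figure show > 45 dots?"
    print(majority_vote(["YES", "YES", "NO"]))    # ('YES', 0.666...)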

Example: CrowdScreen

[Figure: a dataset of items and a predicate; for each item X the crowd is asked "Item X satisfies predicate?" (answers Y/N), and the items that pass form the filtered dataset.]

Strategy

[Figure: a grid whose axes are the number of YES answers and the number of NO answers received so far (0 to 6 each). Each grid point is labeled continue, decide PASS, or decide FAIL; the PASS/FAIL points are the strategy's decision points.]
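A minimal sketch of how such a strategy grid could be executed for a single item, assuming answers arrive one at a time; the grid contents, MAX_QUESTIONS, and the ask_worker stub are illustrative choices, not taken from the slides:

    import random

    # Illustrative strategy grid: maps (#YES answers, #NO answers) to an action.
    # Points not listed mean "continue"; MAX_QUESTIONS bounds the walk through the grid.
    MAX_QUESTIONS = 6
    STRATEGY = {
        (3, 0): "PASS", (3, 1): "PASS", (4, 2): "PASS",
        (0, 3): "FAIL", (1, 3): "FAIL", (2, 4): "FAIL",
    }

    def ask_worker(item):
        """Stub for posting one 'does this item satisfy the predicate?' question to the crowd."""
        return random.choice(["YES", "NO"])

    def filter_item(item):
        """Walk the grid: keep asking until a decision point is hit or the budget runs out."""
        yes, no = 0, 0
        while True:
            action = STRATEGY.get((yes, no), "continue")
            if action != "continue":
                return action                              # reached a decision point
            if yes + no >= MAX_QUESTIONS:
                return "PASS" if yes >= no else "FAIL"     # budget exhausted: fall back to majority
            if ask_worker(item) == "YES":
                yes += 1
            else:
                no += 1

    print(filter_item("item-42"))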

More Examples

[Figures: several alternative strategy grids over the same YESs/NOs axes, each with a different arrangement of continue, decide PASS, and decide FAIL regions.]

Some Optimizations

[Figure: a strategy grid over the same YESs/NOs axes.]

What is “best” strategy?

[Figures: two alternative strategy grids over the same YESs/NOs axes.]

p(x,y) = probability of misclassification at (x,y)
end(x,y) = probability of terminating at (x,y)

Expected error:  ∑ p(x,y) · end(x,y)

Expected cost:  ∑ (x+y) · end(x,y)

One (of many) optimization problems:


Find the strategy that minimizes expected cost (number of questions asked),
such that expected error is below a threshold
(and the number of questions never exceeds m).
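The slides do not fix a worker or item model. A minimal sketch under two common simplifying assumptions (each item truly satisfies the predicate with probability SELECTIVITY, and each worker independently answers correctly with probability 1 - WORKER_ERROR): it computes end(x,y) by dynamic programming, the expected cost ∑ (x+y)·end(x,y) and the expected error ∑ p(x,y)·end(x,y) for a given grid, and then brute-forces a small, restricted family of strategies to illustrate the optimization problem above. The constants, names, and the restricted search space are all illustrative:

    from itertools import product

    SELECTIVITY = 0.3     # assumed prior P(item truly satisfies the predicate)
    WORKER_ERROR = 0.2    # assumed P(a single worker answers incorrectly)

    def evaluate(strategy, m):
        """strategy maps (yes, no) -> 'PASS' | 'FAIL' | 'continue'.
        Returns (expected cost, expected error) under the simple model above."""
        cost = error = 0.0
        # reach[(x, y)] = (P(reach (x,y) and item truly positive),
        #                  P(reach (x,y) and item truly negative))
        reach = {(0, 0): (SELECTIVITY, 1.0 - SELECTIVITY)}
        for total in range(m + 1):
            for x in range(total + 1):
                y = total - x
                pos, neg = reach.get((x, y), (0.0, 0.0))
                end_xy = pos + neg                      # end(x,y) if we stop here
                if end_xy == 0.0:
                    continue
                action = strategy.get((x, y), 'continue')
                if action != 'continue' or total == m:
                    decision = action if action != 'continue' else ('PASS' if x >= y else 'FAIL')
                    cost += (x + y) * end_xy            # contributes (x+y) * end(x,y)
                    error += neg if decision == 'PASS' else pos   # p(x,y) * end(x,y)
                else:
                    # One more question: a YES is correct for positives, an error for negatives.
                    px, py = reach.get((x + 1, y), (0.0, 0.0))
                    reach[(x + 1, y)] = (px + pos * (1 - WORKER_ERROR), py + neg * WORKER_ERROR)
                    nx, ny = reach.get((x, y + 1), (0.0, 0.0))
                    reach[(x, y + 1)] = (nx + pos * WORKER_ERROR, ny + neg * (1 - WORKER_ERROR))
        return cost, error

    def best_threshold_strategy(m, error_threshold):
        """Brute-force a small, restricted family of strategies:
        decide PASS once yes - no >= a, decide FAIL once no - yes >= b."""
        best = None
        for a, b in product(range(1, m + 1), repeat=2):
            strat = {(x, y): ('PASS' if x - y >= a else 'FAIL' if y - x >= b else 'continue')
                     for x in range(m + 1) for y in range(m + 1 - x)}
            cost, err = evaluate(strat, m)
            if err <= error_threshold and (best is None or cost < best[0]):
                best = (cost, err, (a, b))
        return best

    print(best_threshold_strategy(m=6, error_threshold=0.1))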

Example of Results


Beyond Single Filter

• Probabilistic Strategies

• Multiple Filters

• Categorizer (output more than 2 types)


Beyond Filtering

• Finding Max

• Sorting

• Clustering

• Entity Resolution

• Adding terms to a taxonomy

• Building a Folksonomy

• ...


Beyond Simple Models

• Worker Error Models

• Task Design

• Tracking Worker Abilities

• Payments

• Response Time Issues

• ...


Crowd As Information Source

[Figure: declarative queries are posed to a DBMS-like system that gathers its data from the crowd and the Web.]

The Deco Data Model

[Figure: as in an RDBMS, the end user sees a conceptual schema (relations) defined by the schema designer, while the actual schema (relations and other stuff) is derived automatically by the system.]

Small Example

User view (conceptual schema):

  restaurant      rating   cuisine
  Chez Panisse    4.9      French
  Chez Panisse    4.9      California
  Bytes           3.8      California
  ...

The user view is the outer join (⋈) of the raw tables in the actual schema: an anchor table and two dependent tables.

Anchor:
  restaurant
  Chez Panisse
  Bytes
  ...

Dependent (rating):
  restaurant      rating
  Chez Panisse    4.8
  Chez Panisse    5.0
  Chez Panisse    4.9
  Bytes           3.6
  Bytes           4.0
  ...

Dependent (cuisine):
  restaurant      cuisine
  Chez Panisse    French
  Chez Panisse    California
  Bytes           California
  Bytes           California
  ...

Fetch rules obtain new raw values from the crowd (e.g., new restaurant names for the anchor table, or the rating and cuisine of a given restaurant such as Bytes or Chez Panisse). Resolution rules reconcile the raw values (e.g., the multiple ratings reported for Chez Panisse) into the single values shown in the user view.

Semantics: 1. Fetch   2. Resolve   3. Join
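A minimal sketch of the Fetch / Resolve / Join semantics on this example, assuming purely for illustration that the resolution rule for ratings averages the raw values and the one for cuisines keeps each distinct value once (in Deco the resolution rules are specified by the schema designer):

    # Raw tables as fetched from the crowd: one anchor table and two dependent tables.
    anchor   = ["Chez Panisse", "Bytes"]
    ratings  = [("Chez Panisse", 4.8), ("Chez Panisse", 5.0), ("Chez Panisse", 4.9),
                ("Bytes", 3.6), ("Bytes", 4.0)]
    cuisines = [("Chez Panisse", "French"), ("Chez Panisse", "California"),
                ("Bytes", "California"), ("Bytes", "California")]

    def resolve_average(pairs):
        """Illustrative resolution rule: average all ratings reported per restaurant."""
        raw = {}
        for name, value in pairs:
            raw.setdefault(name, []).append(value)
        return {name: round(sum(vs) / len(vs), 1) for name, vs in raw.items()}

    def resolve_distinct(pairs):
        """Illustrative resolution rule: keep each distinct cuisine once per restaurant."""
        raw = {}
        for name, value in pairs:
            raw.setdefault(name, set()).add(value)
        return raw

    # 1. Fetch (the raw tables above)   2. Resolve   3. Join into the user view.
    # (A full outer join would also keep restaurants with no resolved cuisine yet.)
    rating_of  = resolve_average(ratings)
    cuisine_of = resolve_distinct(cuisines)
    user_view = [(name, rating_of.get(name), cuisine)
                 for name in anchor
                 for cuisine in sorted(cuisine_of.get(name, []))]
    for row in user_view:
        print(row)
    # ('Chez Panisse', 4.9, 'California'), ('Chez Panisse', 4.9, 'French'), ('Bytes', 3.8, 'California')

This reproduces the user view from the slide: the three Chez Panisse ratings average to 4.9 and the two Bytes ratings to 3.8.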

Many Query Processing Challenges

Query:
  SELECT n, l, c
  FROM country
  WHERE l = ‘Spanish’
  ATLEAST 8

[Figure: a Deco query plan for this query: an AtLeast[8] operator at the root over joins of Scan A(n), Scan D1(n,l), and Scan D2(n,c), with Resolve operators (Resolve[m3], Resolve[d.e]) above the scans, a Filter [l=‘Spanish’] operator, and Fetch operators ([n], [ln], [nl], [ln,c], [nl,c]) that pull new data from the crowd as needed.]
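The slides show only the plan shape. A minimal sketch of how an AtLeast[8] root might drive execution, under the assumption that each round lets the Fetch operators request more data from the crowd until the filtered, resolved result has at least 8 tuples; all function names and the stand-in plan are illustrative:

    def run_atleast(plan_once, minimum, max_rounds=50):
        """Re-run the rest of the plan (scans, resolves, filter, joins), letting its
        Fetch operators pull more data from the crowd each round, until the result
        has at least `minimum` tuples or the round budget is exhausted."""
        result = []
        for _ in range(max_rounds):
            result = plan_once()          # one pass over the tables as they stand now
            if len(result) >= minimum:
                break
        return result

    # Illustrative stand-in for the plan in the figure: each call pretends the Fetch
    # operators obtained one more (n, l, c) tuple, then applies the l = 'Spanish' filter.
    known = []
    def plan_once():
        i = len(known) + 1
        known.append(("country-%d" % i, "Spanish", "c-%d" % i))
        return [t for t in known if t[1] == "Spanish"]

    print(len(run_atleast(plan_once, minimum=8)))   # 8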

Deco Prototype V1.0


Experimental Setup

• Experimental Goals:
  – Different fetch configurations
    • Basic: n + nl + nc
    • Reverse: ln + nl + nc
    • Hybrid: ln,c + nl,c
  – Different filter locations
    • After vs. between joins

• Experimental Setup:
  – 5 cents/task on MTurk
  – Empty tables initially
  – Default: reverse + between

Query: SELECT n,l,c FROM country WHERE l = ‘Spanish’ ATLEAST 8

Experiment 1: Different Fetch Rules

[Figure: cost and latency for the query under the different fetch configurations: $12.00 and 2 hrs, $2.20 and 15 min, $1.30 and 11 min.]

Conclusion

• Crowdsourcing is an exciting area!

• Many challenges!
