Anand Kulkarni, Björn Hartmann (University of California, Berkeley)
Matthew Can (Stanford University)
Collaboratively Crowdsourcing Complex Work With Turkomatic
Microtask marketplaces excel at simple, repetitive work.
Transcribe a business card.
Look up a fact online.
Much of the work we do in our daily lives is not simple or repetitive.
“Create algebra problems for my mathematics exam.”
“Write a research paper.”
“Create a small piece of software.”
“Arrange my trip to Seattle.”
“Write a blog about Mechanical Turk with a few good entries.”
How do we crowdsource complex work?
Complex work with crowds
Soylent: Editing word processing documents (Bernstein et al. ’10)
VizWiz: Answering queries about visual scenes (Bigham et al. ’10)
More complex applications: PlateMate [NHZG11], Adrenaline [BBMK11], CrowdForge [KSK11], …
Workflows: Crowd Algorithms
Divide complex tasks into a sequence of microtasks arranged in a workflow
Soylent, Bernstein et al, UIST 2010
Workflow design is labor-intensive
1. Design individual HITs
2. Implement parallelism to make sure tasks are done correctly
3. Write software to launch HITs and parse worker results
4. Test workflow by running program
5. Identify errors
6. Iterate from step 1
Difficult and domain-specific: Workflow design requires extensive up-front iteration and experimentation and is specific to a given task domain.
Inaccessible to non-experts: Few have the patience to implement this process in code
What is Turkomatic?
Turkomatic is a system for crowdsourcing high-level, complex, and creative work in which the crowd designs the workflow.
Example request: “Create a new blog about Mechanical Turk with two posts.”
Price-Divide-Solve (PDS)
How do we induce the crowd to design a workflow?
PDS is a divide-and-conquer algorithm for creating workflows.
Price: Can this task be solved for 20 cents?
If yes: Solve the task and return the answer.
If no: Divide the task into multiple steps.
For each step, recurse.
Merge the steps into a solution.
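The recursion above can be sketched in a few lines of Python. This is only an illustration, not the Turkomatic implementation: the `Crowd` container, its four callables, and the `max_depth` cap are all hypothetical names introduced here, and the callables stand in for HITs that would really be answered by workers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Crowd:
    """Hypothetical stand-in for the four kinds of HITs posted to workers."""
    price: Callable[[str], bool]             # "Can this task be solved for 20 cents?"
    divide: Callable[[str], List[str]]       # split a task into simpler steps
    solve: Callable[[str], str]              # answer a task directly
    merge: Callable[[str, List[str]], str]   # combine the answers of solved steps


def pds(task: str, crowd: Crowd, depth: int = 0, max_depth: int = 5) -> str:
    """Price-Divide-Solve: solve the task if it is cheap enough, else recurse."""
    # max_depth is an assumption added for this sketch so the recursion
    # always terminates; the slides do not describe such a cap.
    if depth >= max_depth or crowd.price(task):
        return crowd.solve(task)
    steps = crowd.divide(task)
    answers = [pds(step, crowd, depth + 1, max_depth) for step in steps]
    return crowd.merge(task, answers)


# Toy crowd: only the top-level task is judged too expensive.
ROOT = "Create a new blog about Mechanical Turk with two posts."
STEPS = [
    "Create a new blog on Wordpress.com.",
    "Write one entry for a blog.",
    "Write a second entry for a blog.",
]
toy_crowd = Crowd(
    price=lambda task: task != ROOT,
    divide=lambda task: list(STEPS),
    solve=lambda task: f"[done: {task}]",
    merge=lambda task, answers: " | ".join(answers),
)
result = pds(ROOT, toy_crowd)
```

In a real deployment each callable would post a HIT and wait for worker responses, which is where most of the latency and cost would come from.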
Price-Divide-Solve (PDS)
Redundancy is used at each step to ensure quality.
[Diagram: redundant worker tasks feed an aggregation step]
Price check: Price Task (several workers) → Majority → Consensus on price
Divide Task (several workers) → Vote → Best subdivision
Solve Task (several workers) → Vote → Best solution
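The two aggregation rules named in the diagram (a majority on the price check, a vote over candidate subdivisions or solutions) could look roughly like this. The function names and the tie-breaking behavior are assumptions made for illustration; the slides do not specify how ties are resolved or how many workers are asked.

```python
from collections import Counter
from typing import List, Sequence


def majority_price_check(judgments: Sequence[bool]) -> bool:
    """Return True only if a strict majority of workers said the task is solvable."""
    return sum(judgments) * 2 > len(judgments)


def best_by_vote(votes: List[str]) -> str:
    """Return the candidate with the most votes.

    Ties go to the candidate that was seen first, which is how
    Counter.most_common orders equal counts.
    """
    return Counter(votes).most_common(1)[0][0]
```

For example, `majority_price_check([True, False, True])` accepts the price, while an even split such as `[True, False]` does not.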
Price-Divide-Solve (PDS)
Create a new blog about Mechanical Turk with two posts.
Price: Can we solve it for 20 cents? No.
Divide: Divide it into two or more steps.
Create a new blog on Wordpress.com.
Write one entry for a blog.
Write a second entry for a blog.
Price: Can we solve each step for 20 cents?
Create a new blog on Wordpress.com. Yes.
Write one entry for a blog. Yes.
Write a second entry for a blog. Yes.
Solve
Create a new blog on Wordpress.com.
Write one entry for a blog.
Write a second entry for a blog.
Worker answers:
“Welcome to my blog about Mechanical Turk! Here, I’ll be posting some of my favorite recipes for Mechanical Turk. You’ll be able to follow along at home and create delicious HITs. From the comfort of your own home! Stay tuned and i’ll show you some of the best strategies for keeping your Turk workers engaged.”
“You may be inclined to price your HITs at the lowest possible rate, but this isn’t always the best choice. Instead, you should base your pricing on:
- How long will the HIT take?
- Is the HIT similar to other HITs? If so, price it slightly less than theirs.
- If the HIT involves a lot of qualifications, you may want to price it higher, to attract more qualified workers.”
Merge: Combine the results of solved steps.
mtworker.wordpress.com
Can this task be solved for 20 cents?
Yes / No
Write a blog about Mechanical Turk
Submit
Break down the following task.
Write a blog about Mechanical Turk
Step 1:
Step 2:
Add Step / Submit
Solve the following task.
Create a new blank blog on Wordpress
Submit
Merge the following subtasks.
Workers previously divided this task into simpler steps and solved each step. Combine their work into a complete solution.
Write a blog about Mechanical Turk
Step 1: Write a blog post about Mechanical Turk. [answer: This post is…]
Step 2: Create a blank blog about Mechanical Turk [answer: www...]
Submit
Price-Divide-Solve (PDS)
PDS guides the crowd to design workflows in a particular way.
It can attempt to create a workflow for any task, but it can’t produce all workflows.
Write a sentence.
Improve the previous worker’s answer.
Check that the previous answer was improved.
System Recap
Price, Divide, Solve
Requester Interface
System Output
Algorithm
Worker Interface
Experiment 1: Can the crowd plan and execute workflows using PDS?
Over 150 trials, including:
• Java programming
• Booking restaurants
• Sorting and cleaning data
• Blogging
• Creating self-portraits
• Solving an SAT
• Logo design
• Travel planning
• Writing essays
• Web research
…
Experiment 1: Success Modes
Write a 3-paragraph essay about whether it’s ever OK to lie.
Write one paragraph arguing it’s OK to lie sometimes.
Write one paragraph suggesting it’s never OK to lie.
Write a conclusion reconciling the two.
Write one sentence to open the conclusion.
Write 2-3 sentences in the middle of the conclusion.
Write a concluding sentence.
Data:
• 6 subnodes were produced
• 44 separate worker judgments were used
• Task completed with a full essay
“…although many people believe it is always essential to tell the truth, sometimes it may be better to lie. There is credibility in both views. And like many ethical decisions, sometimes the circumstances dictate.
When you tell the truth you develop a stronger bond of trust with those around you. A relationship can not exist without trust. If you lie, you end up telling more lies to cover the first….”
Experiment 1: Failure Modes
We found two ways the algorithm could fail:
- Failing to terminate at all
- Completing, but producing wrong answers
Experiment 1: Failing to terminate
Plan a trip from New York to S.F. that visits 5 interesting places.
Think about where to go next in Ohio.
Think about where to go next in Ohio.
Experiment 1: Wrong answers
List the department chairs of the top 20 US programs in CS.
aalto armchair, poang lounge chair, adirondack chair, aeron chair, balans chair, ball chair, …
Why does the crowd lose context?
Turkomatic worker: “…I’ve taken a look at your instructions, and I understand them perfectly. However, this task seems to have been inadvertently sabotaged by other turkers who do not understand what you are asking them to do…”
Long workflows involve increasing chains of trust.
Each individual worker has a ~30% probability of failure [Chi/Kittur/Suh ’08, Bernstein et al ’10]
Weakest link problem: If one worker early in the workflow design process makes mistakes, the subsequent decompositions will fail.
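A back-of-the-envelope calculation shows why long chains are fragile. The model below is a deliberate simplification that does not come from the slides: it assumes each step fails independently with the ~30% rate quoted above, that there is no redundancy, and that a single failure breaks the whole workflow. The function name is hypothetical.

```python
def chain_success_probability(steps: int, p_fail: float = 0.3) -> float:
    """Probability that every step in a chain succeeds.

    Assumes independent failures and no redundancy or error correction,
    so one bad step anywhere breaks the workflow.
    """
    return (1.0 - p_fail) ** steps


# Success probability for chains of increasing length.
for n in (1, 3, 5, 10):
    print(n, round(chain_success_probability(n), 4))
```

Under these assumptions a five-step chain succeeds only about 17% of the time, which is why errors made early in the decomposition are so costly.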
Including context doesn’t suffice
One explanation
What if we used more competent workers?
Experiment 2: Can expert workers make Turkomatic work?
Setup:
We recruited five graduate students with experience as requesters on Mechanical Turk.
We ran the PDS algorithm on three complex tasks with this crowd: online research, essay writing, and creating a blog.
Results:
Each of the three tested tasks completed correctly when we used only expert workers!
Conclusion: PDS works well with qualified crowds.
How can we successfully run PDS with unskilled workers?
Experiment 3: Can requester management help the crowd?
Workflow visualizer: Monitor the workflow in real-time.
Interactive task editor: Selectively invalidate parts of a workflow.
Workflow seeding: Run previously-designed parts of workflows in the crowd.
Task Graphs (Requester)
Task Graph Nodes
Task Prompt
Status: queued, in progress, completed
Submitted Answer
Task Graph Edges: Parallel
Parent Task [Split]
Sub Task 1 [Solve]
Sub Task 2 [Decide]
Task Graph Edges: Sequential
Parent Task [Split]
Sub Task 1 [Solve]
Sub Task 2 [Decide]
Task Graph Example
Write an essay [Split]
Write an outline [Solve] 1. Thesis: …
Expand the outline [Decide]
Task Graph Editing
Write a 3-paragraph essay… [Split]
Think about the topic… [Split]
Collect information about… [Decide]
Write the paragraphs… [Decide]
Pick one of the topics [Split]
List possible topics [Solve] 1. The word…
EDIT TASK DETAILS
Edit Task / Edit Solution / Edit Subtask / Delete Node
Task: Think about the topic you want to write about
Status: Split
Task Graph Editing
Write a 3-paragraph essay… [Split]
Think about the topic… [Split]
Collect information about… [Decide]
Write the paragraphs… [Decide]
Pick one of the topics [Split]
List possible topics [Solve] 1. The word…
List three main topics… [Solve]
Recomputing Task Graphs
• Delete subtree of edited task
• Recursively:
– Delete stale solutions in parent tasks
– Delete stale solutions in subsequent sibling tasks (for serial decompositions)
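The invalidation rules above can be sketched on a small tree structure. This is a guess at one reasonable reading, not the system's actual code: the `Node` class, its field names, and the choice to also clear the edited task's own solution and the solutions of later siblings' descendants are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent links
class Node:
    """Hypothetical task graph node."""
    prompt: str
    solution: Optional[str] = None
    sequential: bool = False  # True if the children form a serial decomposition
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def clear_solutions(node: Node) -> None:
    """Drop the solution of a node and of everything below it (assumed scope)."""
    node.solution = None
    for child in node.children:
        clear_solutions(child)


def invalidate(edited: Node) -> None:
    """Apply the recomputation rules after a requester edits a task."""
    # Delete the subtree of the edited task; its old answer is assumed stale too.
    edited.children.clear()
    edited.solution = None
    # Walk up the tree, clearing stale solutions along the way.
    node = edited
    while node.parent is not None:
        parent = node.parent
        parent.solution = None
        if parent.sequential:
            # Later siblings may have built on the edited branch's output.
            position = next(i for i, c in enumerate(parent.children) if c is node)
            for sibling in parent.children[position + 1:]:
                clear_solutions(sibling)
        node = parent
```

Earlier siblings keep their solutions, so only the affected part of the workflow has to be sent back to the crowd.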
Seeding workflows
We mitigate poor worker performance by starting with partial workflows.
Run Workflow with Crowd
Experiment 3: Collaboration
Setup:
We ran the PDS algorithm using Turkers on three sets of tasks. We actively monitored the workflows and intervened only to eliminate errors.
Outcomes:
Each of the three tested tasks completed correctly with 1 to 4 requester interventions.
Experiment 3: Collaboration
Paragraph 1
Paragraph 2
Paragraph 3
Experiment 3: Collaboration
Crowdsourcing is a term…
Experiment 3: Collaboration
Crowdsourcing is a term…
Chaordix crowd consulting is…
Experiment 3: Collaboration
Crowdsourcing is a term…
Experiment 3: Collaboration
Crowdsourcing is a term…
Crowdsourcing works best on tasks where…
Experiment 3: Collaboration
Crowdsourcing is a term…
Crowdsourcing works best on tasks where…
One of the best known crowdsourcing platforms…
Conclusion
We presented Turkomatic, a system that lets requesters harness the crowd to design complex workflows.
Our first experiment showed that letting the crowd design its own tasks can produce both successes and failures.
Our second experiment showed that expert workers could successfully design workflows using PDS.
Last, we showed that an interactive, real-time interface for visualizing and selectively editing workflows could produce viable workflows.
One finding of note
In Turkomatic, highly motivated workers had no way to step in and correct others’ errors.
Excessive structure in workflow design prevents the emergence of leaders.
To scale, we may consider giving editing abilities to more capable workers.
Contributions
A simplified interface that lowers the threshold for crowdsourcing complex tasks
A new algorithm, techniques, and interfaces enabling the crowd to decompose complex tasks
A new interface for letting requesters edit, visualize, and seed workflows