Crowdsourcing

• Introduction
• Client Motivations
• Tasks Categories
• Crowd Motivation
• Pros & Cons
• Quality Management
• Scale up with Machine Learning
• Workflows for Complex Tasks
• Market Evolution
• Reputation Systems

ECCO, March 20, 2011. corina.ciechanow@pobox.com

http://bitsofknowledge.waterloohills.com

Introduction

• June 2006: Jeff Howe coined the term in his Wired magazine article "The Rise of Crowdsourcing".

• Elements:
  At least 2 actors:
  - Client/Requester
  - Crowd or community (an online audience)
  A Challenge:
  - What has to be done? Need, task, etc.
  - Reward: money, prize, other motivators.

Ex: “Adult Websites” Classification

• Large number of sites to label
• Get people to look at sites and classify them as:

– G (general audience) – PG (parental guidance) – R (restricted) – X (porn)


Cost/Speed Statistics:
• Undergrad intern: 200 websites/hr, cost: $15/hr
• MTurk: 2,500 websites/hr, cost: $12/hr

[Panos Ipeirotis. WWW2011 tutorial]
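The throughput and wage figures above imply a per-label cost; a quick back-of-the-envelope check (Python is used here purely for illustration, not from the slides):

```python
# Back-of-the-envelope cost per labeled site, from the figures above.
intern_cost = 15 / 200    # undergrad intern: $0.075 per site
mturk_cost = 12 / 2500    # MTurk: $0.0048 per site
print(f"intern: ${intern_cost:.4f}/site, MTurk: ${mturk_cost:.4f}/site")
print(f"MTurk is roughly {intern_cost / mturk_cost:.0f}x cheaper per labeled site")
```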

Client motivation

• Need Suppliers:

- Mass work, distributed work, or just tedious work
- Creative work
- Look for specific talent
- Testing
- Support
- To offload peak demands
- Tackle problems that need specific communities or human variety
- Any work that can be done cheaper this way

Client motivation

• Need customers!

• Need Funding

• Need to be Backed up

• Crowdsourcing is your business!

Crowd Motivation

• Money €€€

• Self-serving purpose (learning new skills, getting recognition, avoiding boredom, enjoyment, creating a network with other professionals)

• Socializing, feeling of belonging to a community, friendship

• Altruism (public good, help others)

Crowd Demography (background defines motivation)

• The 2008 survey at iStockphoto indicates that the crowd is quite homogeneous and elite.

• Amazon’s Mechanical Turk workers come mainly from 2 countries:
  a) USA
  b) India

Crowd Demography

Client Tasks Parameters

3 main goals for a task to be done:

1. Minimize Cost (cheap)

2. Minimize Completion Time (fast)

3. Maximize Quality (good)

Client has other goals when the crowd is not just a supplier

Pros

• Quicker: parallelism reduces time
• Cheap, even free
• Creativity, innovation
• Quality (depends)
• Availability of scarce resources: taps the ‘long tail’
• Multiple feedback
• Allows creating a community (followers)
• Business agility
• Scales up!

Cons

• Lack of professionalism: unverified quality
• Too many answers
• No standards
• No organisation of answers
• Not always cheap: added costs to bring a project to conclusion
• Too few participants if task or pay is not attractive
• If worker is not motivated, lower quality of work

Cons
• Global language barriers
• Different laws in each country: adds complexity
• No written contracts, so no possibility of non-disclosure agreements
• Hard to maintain a long-term working relationship with workers
• Difficulty managing a large-scale, crowdsourced project
• Can be targeted by malicious work efforts
• Lack of guaranteed investment, thus hard to convince stakeholders

Quality Management
Ex: “Adult Website” Classification

• Bad news: Spammers!

• Worker ATAMRO447HWJQ labeled X (porn) sites as G (general audience)

[Panos Ipeirotis. WWW2011 tutorial]

Quality Management: Majority Voting and Label Quality

• Ask multiple labelers, keep majority label as “true” label

• Quality is probability of being correct
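To make “quality is the probability of being correct” concrete: assuming independent workers with equal accuracy q, the chance that the majority label is correct is a binomial tail. A minimal sketch; the 70% accuracy and panel sizes are illustrative assumptions:

```python
# Probability that the majority of n independent workers, each correct with
# probability q, produces the correct label (n odd, binary labels).
from math import comb

def majority_accuracy(q: float, n: int) -> float:
    return sum(comb(n, k) * q**k * (1 - q)**(n - k)
               for k in range(n // 2 + 1, n + 1))

print(majority_accuracy(0.7, 1))   # 0.70  (single worker)
print(majority_accuracy(0.7, 5))   # ~0.84 (5 workers, majority vote)
print(majority_accuracy(0.7, 11))  # ~0.92 (11 workers)
```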

Dealing with Quality
• Majority vote works best when workers have similar quality
• Otherwise better to just pick the vote of the best worker
• Or model worker qualities and combine

Vote combination studies [Clemen and Winkler, 1999, Ariely et al. 2000] show that complex models work slightly better than simple average, but are less robust.

• Spammers try to go undetected
• Well-intentioned workers may have biases, which are difficult to tell apart

Human Computation Biases
• Anchoring Effect: “Humans start with a first approximation (anchor) and then make adjustments to that number based on additional information.” [Tversky & Kahneman, 1974]

• Priming: Exposure to one stimulus (such as stereotypes) influences another [Shih et al., 1999]

• Exposure Effect: Familiarity leads to liking...[Stone and Alonso, 2010]

• Framing Effect: Presenting the same option in different formats leads to different answers. [Tversky and Kahneman, 1981]

Need to remove sequential effects from human computation data…

Dealing with Quality
• Use this process to improve quality (a sketch follows this slide):
1. Initialize by aggregating labels (using majority vote)
2. Estimate error rates for workers (use aggregated labels)
3. Change aggregate labels (using error rates, weight worker votes according to quality)
   Note: Keep labels for “example data” unchanged
4. Iterate from Step 2 until convergence

• Or use an exploration/exploitation scheme:
  – Explore: learn about the quality of the workers
  – Exploit: label new examples using that quality

In both cases, there is a significant advantage under bad conditions such as imbalanced datasets and bad workers.
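A minimal sketch of the iterative scheme above for binary labels: initialize with a majority vote, score each worker against the aggregate, then re-weight votes by worker quality until the labels stabilize. Weighting by raw accuracy (rather than full confusion matrices) and omitting the gold “example data” handling are simplifications for brevity, not the exact method from the tutorial:

```python
from collections import defaultdict

def aggregate_labels(votes, n_iter=20):
    """votes: list of (worker_id, item_id, label) tuples, label in {0, 1}."""
    # Step 1: initialize aggregate labels with a simple majority vote
    counts = defaultdict(lambda: [0, 0])
    for worker, item, label in votes:
        counts[item][label] += 1
    agg = {item: int(c[1] >= c[0]) for item, c in counts.items()}

    for _ in range(n_iter):
        # Step 2: estimate each worker's quality against the current aggregate
        correct, total = defaultdict(int), defaultdict(int)
        for worker, item, label in votes:
            total[worker] += 1
            correct[worker] += int(label == agg[item])
        quality = {w: correct[w] / total[w] for w in total}

        # Step 3: recompute aggregate labels, weighting each vote by worker quality
        weighted = defaultdict(lambda: [0.0, 0.0])
        for worker, item, label in votes:
            weighted[item][label] += quality[worker]
        new_agg = {item: int(s[1] >= s[0]) for item, s in weighted.items()}

        # Step 4: iterate until the aggregate labels stop changing (convergence)
        if new_agg == agg:
            break
        agg = new_agg
    return agg, quality
```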

Effect of Payment: Quality

• Cost does not affect quality [Mason and Watts, 2009, AdSafe]

• Similar results for bigger tasks [Ariely et al, 2009]

[Panos Ipeirotis. WWW2011 tutorial]

Effect of Payment on # of Tasks
• Payment incentives increase speed, though

[Panos Ipeirotis. WWW2011 tutorial]

Optimizing Quality

• Quality tends to remain the same, independent of completion time [Huang et al., HCOMP 2010]

Scale Up with Machine Learning
Build an ‘Adult Website’ Classifier

• Crowdsourcing is cheap but not free

– Cannot scale to web without help

Build automatic classification models using examples from crowdsourced data

Integration with Machine Learning

• Humans label training data

• Use training data to build model

Dealing w/Quality in Machine Learning

Noisy labels lead to degraded task performance

Labeling quality increases → classification quality increases

Tradeoffs for Machine Learning Models

• Get more data → improve model accuracy

• Improve data quality → improve classification

Tradeoffs for Machine Learning Models

• Get more data: Active Learning, select which unlabeled example to label [Settles, http://active-learning.net/]

• Improve data quality: Repeated Labeling, label an already-labeled example again [Sheng et al. 2008, Ipeirotis et al, 2010]

Model Uncertainty (MU)
• Model uncertainty: get more labels for instances that cause model uncertainty
  – For modeling: why improve training data quality where the model is already certain? (“Self-healing” process: [Brodley et al, JAIR 1999], [Ipeirotis et al, NYU 2010])
  – For data quality: low-certainty “regions” may be due to incorrect labeling of the corresponding instances
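A minimal uncertainty-sampling sketch of the idea above: retrain on the current labels and send the crowd the unlabeled examples the model is least sure about. The use of scikit-learn's LogisticRegression and a least-confidence score are illustrative choices, not prescribed by the slides:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pick_examples_for_crowd(X_labeled, y_labeled, X_unlabeled, batch_size=10):
    """Return indices of the unlabeled examples the current model is least certain about."""
    model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
    proba = model.predict_proba(X_unlabeled)
    uncertainty = 1.0 - proba.max(axis=1)   # low top-class probability = high uncertainty
    return np.argsort(-uncertainty)[:batch_size]
```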

Quality Rule of Thumb

• With high quality labelers (80% and above): one worker per case (more data is better)

• With low quality labelers (~60%): multiple workers per case (to improve quality)

[Sheng et al, KDD 2008; Kumar and Lease, CSDM 2011]
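One way to read the rule of thumb is as a redundancy policy; the exact thresholds and repeat counts below are assumptions for illustration, not from the cited papers:

```python
def labels_per_item(worker_accuracy: float) -> int:
    """How many workers to ask per item, given estimated worker accuracy."""
    if worker_accuracy >= 0.8:
        return 1   # high-quality workers: one label per item, spend budget on more items
    if worker_accuracy >= 0.6:
        return 3   # low-quality workers: repeated labels plus majority vote
    return 5       # very noisy workers: heavier redundancy (or filter them out)
```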

Complex tasks: Handle answers through a workflow

• Q: “My task does not have discrete answers….”
• A: Break into two Human Intelligence Tasks (HITs):

– “Create” HIT
– “Vote” HIT

• Vote controls quality of Creation HIT
• Redundancy controls quality of Voting HIT

Catch: If the “creation” is very good, voting workers just vote “yes”

– Solution: Add some random noise (e.g. add typos)
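A minimal sketch of this create/vote loop, in the spirit of the TurKit iterative pattern; create_hit and vote_hit are hypothetical stand-ins for whatever platform calls actually post the HITs:

```python
def iterate_create_and_vote(create_hit, vote_hit, rounds=5, voters=3):
    """create_hit(current_text) -> improved_text; vote_hit(old, new) -> 'old' or 'new'.
    Both callables are placeholders for real crowd-platform requests."""
    best = ""
    for _ in range(rounds):
        candidate = create_hit(best)                     # "Create" HIT: improve current best
        votes = [vote_hit(best, candidate) for _ in range(voters)]  # redundant "Vote" HITs
        if votes.count("new") > voters // 2:             # majority prefers the new version
            best = candidate
    return best
```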

Photo Description
But the free-form answer can be more complex, not just right or wrong…

TurkIt toolkit [Little et al., UIST 2010]: http://groups.csail.mit.edu/uid/turkit/

Description Versions
1. A partial view of a pocket calculator together with some coins and a pen.
2. ...
3. A close-up photograph of the following items: A CASIO multi-function calculator. A ball point pen, uncapped. Various coins, apparently European, both copper and gold. Seems to be a theme illustration for a brochure or document cover treating finance, probably personal finance.
4. …
8. A close-up photograph of the following items: A CASIO multi-function, solar powered scientific calculator. A blue ball point pen with a blue rubber grip and the tip extended. Six British coins; two of £1 value, three of 20p value and one of 1p value. Seems to be a theme illustration for a brochure or document cover treating finance, probably personal finance.

Collective Problem Solving

• Exploration / exploitation tradeoff (independence or not)

– Can accelerate learning, by sharing good solutions

– But can lead to premature convergence on suboptimal solution

[Mason and Watts, submitted to Science, 2011]

Independence or Not?

• Building iteratively (lack of independence) allows better outcomes for the image description task… In the FoldIt game, workers built on each other’s results

• But lack of independence may cause high dependence on starting conditions and create Groupthink

[Little et al, HCOMP 2010]

Exploration/Exploitation?


Group Effect

• Individual search strategies affect group success:

Players copying each other explore less → lower probability of finding the peak in a round

Workflow Patterns

Creation
• Generate / Create
• Find
• Improve / Edit / Fix

Quality Control
• Vote for accept/reject
• Vote up, vote down, to generate rank
• Vote for best / select top-k

Flow Control
• Split task
• Aggregate
• Iterate

Flow Control

AdSafe Crowdsourcing Experience

• Detect pages that discuss swine flu
  – A pharmaceutical firm had a drug “treating” (off-label) swine flu
  – The FDA prohibited pharmaceuticals from displaying the drug ad on pages about swine flu

Two days to comply!

• A big fast-food chain does not want its ad to appear:
  – On pages that discuss the brand (99% negative sentiment)
  – On pages discussing obesity

AdSafe Crowdsourcing Experience: Workflow to Classify URLs

• Find URLs for a given topic (hate speech, gambling, alcohol abuse, guns, bombs, celebrity gossip, etc.)
  http://url-collector.appspot.com/allTopics.jsp

• Classify URLs into appropriate categories
  http://url-annotator.appspot.com/AdminFiles/Categories.jsp

• Measure the quality of the labelers and remove spammers
  http://qmturk.appspot.com/

• Get humans to “beat” the classifier by providing cases where the classifier fails
  http://adsafe-beatthemachine.appspot.com/

Market Design of Crowdsourcing

Aggregators:
• Create a crowd or community
• Create a portal to connect a client to the crowd
• Deal with the workflow of complex tasks, like decomposition into simpler tasks and answer recomposition
• Allow anonymity

Consumers can benefit from a crowd without the need to create it.

Market Design: Crude vs Intelligent Crowdsourcing
• Intelligent Crowdsourcing uses an organized workflow to tackle the cons of crude crowdsourcing:
  – The complex task is divided by experts
  – Subtasks are given to relevant crowds, and not to everyone
  – Individual answers are recomposed by experts into a general answer
  – Usually covert

Lack of Reputation and Market for Lemons

“When the quality of a sold good is uncertain and hidden before the transaction, the price goes to the value of the lowest-valued good” [Akerlof, 1970; Nobel prize winner]

• Market evolution steps (a toy simulation sketch follows below):
  1. Employer pays $10 to a good worker, $0.10 to a bad worker
  2. 50% good workers, 50% bad; indistinguishable from each other
  3. Employer offers a price in the middle: $5
  4. Some good workers leave the market (pay too low)
  5. Employer revises prices downwards as the % of bad workers increases
  6. More good workers leave the market… death spiral

http://en.wikipedia.org/wiki/The_Market_for_Lemons
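The death spiral is easy to see in a toy simulation of the steps above; the 60% retention rate assumed for good workers after each underpaid round is illustrative, not part of Akerlof's model:

```python
# Toy "market for lemons" spiral: the employer offers the expected value of a
# worker, good workers find the offer too low and leave, and the offer drops.
good_value, bad_value = 10.0, 0.10
good, bad = 50, 50                     # 50% good workers, 50% bad, indistinguishable
for round_no in range(1, 7):
    frac_good = good / (good + bad)
    offer = frac_good * good_value + (1 - frac_good) * bad_value
    print(f"round {round_no}: {frac_good:.0%} good workers, employer offers ${offer:.2f}")
    if offer < good_value:             # pay too low: some good workers exit the market
        good = int(good * 0.6)         # assumed attrition per round
```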

Reputation systems
• Great number of reputation mechanisms
• Challenges in the design of reputation systems:
  - Insufficient participation
  - Overwhelmingly positive feedback
  - Dishonest reports
  - Identity changes
  - Value imbalance exploitation (“milking the reputation”)

Reputation systems

[Panos Ipeirotis. WWW2011 tutorial]

Reputation systems

• Dishonest Reports
  1. eBay: “Riddle for a PENNY! No shipping - Positive Feedback”. Sets up an agreement in order to be given unfairly high ratings.
  2. “Bad-mouthing”: the same situation, but to “bad-mouth” other sellers that they want to drive out of the market.

• Design incentive-compatible mechanisms to elicit honest feedback [Jurca and Faltings 2003: pay the rater if the report matches the next one; Miller et al. 2005: use a proper scoring rule to value the report; Papaioannou and Stamoulis 2005: delay the next transaction over time]

[Panos Ipeirotis. WWW2011 tutorial]

Reputation systems: Identity changes

• “Cheap pseudonyms”: easy to disappear and reregister under a new identity with almost zero cost. [Friedman and Resnick 2001]

• Introduce opportunities to misbehave without paying reputational consequences.

Increase the difficulty of online identity changes
• Impose upfront costs on new entrants: allow new identities (forget the past) but make it costly to create them

Challenges for Crowdsourcing Markets

• Two-sided opportunistic behavior
  1. In e-commerce markets, only sellers are likely to behave opportunistically.
  2. In crowdsourcing markets, both sides can be fraudulent.

• Imperfect monitoring and heavy-tailed participation
  - Verifying the answers is sometimes as costly as providing them.
  - Sampling often does not work, due to a heavy-tailed participation distribution (lognormal, according to self-reported surveys)

[Panos Ipeirotis. WWW2011 tutorial]

Challenges for the Crowdsourcing Market

• Constrained capacity of workers
  Workers have constrained capacity (cannot do more than xx hours per day) → Machine Learning techniques

• No “price premium” for high-quality workers
  It is the requester who sets the prices, which are generally the same for all workers, regardless of their reputation or quality.

Market is Organizing the Crowd

• Reputation Mechanisms
  – Crowd: ensure worker quality
  – Employer: ensure employer trustworthiness

• Task organization for task discovery (worker finds employer/task)

• Worker expertise recording for task assignment (employer/task finds worker)

Crowdsourcing Market Possible Evolutions

• Optimize allocation of tasks to worker based on completion time and expected quality

• Recommender system for crowds (“workers like you performed well in…”)

• Create a market with dynamic pricing for tasks, following the pricing model of the stock market (prices for a task increase when the worker supply is low, and vice versa)

[P. Ipeirotis, 2011]

References
• Wikipedia, 2011
• Dion Hinchcliffe. Crowdsourcing: 5 Reasons It's Not Just For Start Ups Anymore, 2009
• Tomoko A. Hosaka, MSNBC. "Facebook asks users to translate for free", 2008
• Daren C. Brabham. "Moving the Crowd at iStockphoto: The Composition of the Crowd and Motivations for Participation in a Crowdsourcing Application", First Monday, 13(6), 2008
• Karim R. Lakhani, Lars Bo Jeppesen, Peter A. Lohse & Jill A. Panetta. The Value of Openness in Scientific Problem Solving (Harvard Business School Working Paper No. 07-050), 2007
• Klaus-Peter Speidel. How to Do Intelligent Crowdsourcing, 2011
• Panos Ipeirotis. Managing Crowdsourced Human Computation, WWW2011 tutorial, 2011
• Omar Alonso & Matthew Lease. Crowdsourcing 101: Putting the WSDM of Crowds to Work for You, WSDM, Hong Kong, 2011
• Sanjoy Dasgupta. http://videolectures.net/icml09_dasgupta_langford_actl/, 2009