Data and text mining workshop: The role of crowdsourcing. Anna Noel-Storr. Wellcome Trust, London, Friday 6th March 2015

Page 1: Role of crowdsourcing

Data and text mining workshop
The role of crowdsourcing

Anna Noel-Storr
Wellcome Trust, London, Friday 6th March 2015

Page 2: Role of crowdsourcing

What is crowdsourcing?

“…the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees…”

Image credit: DesignCareer

Page 7: Role of crowdsourcing

What is crowdsourcing?

Brabham’s problem-focused crowdsourcing typology: 4 types

- Knowledge discovery and management
- Broadcast search
- Peer-vetted creative production
- Distributed human intelligence tasking

Page 8: Role of crowdsourcing

Micro-tasking: process

Breaking down a large corpus of data into smaller units and distributing those units to a large online crowd

“the distribution of small parts of a problem”
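The micro-tasking process described on this slide can be sketched in a few lines. This is only an illustration: the function names, the batch size and the round-robin assignment are assumptions, not details taken from the talk.

```python
def make_batches(records, batch_size):
    """Split a list of records into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def assign_round_robin(batches, screeners):
    """Deal batches out to screeners in turn; returns {screener: [batch, ...]}."""
    assignments = {name: [] for name in screeners}
    for index, batch in enumerate(batches):
        assignments[screeners[index % len(screeners)]].append(batch)
    return assignments


# Hypothetical corpus of ten citations, split into units of four.
citations = [f"citation-{n}" for n in range(1, 11)]
batches = make_batches(citations, batch_size=4)
work = assign_round_robin(batches, ["screener_a", "screener_b"])
```

In practice a crowd platform would more likely hand out units on demand than pre-assign them; round-robin is used here only to keep the sketch short.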

Page 9: Role of crowdsourcing

Human computation

Humans remain better than machines at certain tasks:
- e.g. identifying pizza toppings from a picture of a pizza
- e.g. “preventing obesity without eating like a rabbit”.ti. – autotag: Animal study

Page 10: Role of crowdsourcing

Tools and platforms

What platforms and tools exist and how do they work?

Image credit: ThinkStock

Page 11: Role of crowdsourcing

The Zooniverse

“each project uses the efforts and ability of volunteers to help scientists and researchers deal with the flood of data that confronts them”

Page 12: Role of crowdsourcing

Classification and annotation

Galaxy Zoo

Operation War Diary

Page 13: Role of crowdsourcing

Health related evidence production

Can we use crowdsourcing to identify the evidence in a more timely way?

Trial identification

- Known pressure point within the review production
- Between 2000 and 5000 citations per new review, but can be much more
- A not much loved task

Page 14: Role of crowdsourcing

The Embase project

Step 1: Run a very sensitive search for studies in the largest biomedical database

Step 2: Use a crowd to screen thousands of search results from Embase and feed the identified reports of RCTs into CENTRAL

How will the crowd do this?

[Diagram labels: Embase Crowd; Embase auto; Cochrane’s Central Register of Controlled Trials: CENTRAL]

Page 15: Role of crowdsourcing

The screening tool

Three choices

You are not alone!

(and you can’t go back)

Progress bar

Yellow highlights to indicate a likely RCT

Red highlights

Page 16: Role of crowdsourcing

The Embase project: recruitment

- 900+ people have signed up to screen citations in 12 months
- 110,000+ citations have been collectively screened
- 4,000 RCTs/q-RCTs identified by the crowd

[Chart: number of participants by month, Feb-14 to Mar-15; vertical axis from 0 to 1000]

Page 17: Role of crowdsourcing

Why do people do it?

Made it very easy to participate (and equally easy to stop!)

Gain experience (bulk up the CV)

Provide feedback: both to the individual and to the community

Wanting to do something to contribute (healthcare is a strong hook)

(people are more likely to come back)

Page 18: Role of crowdsourcing

How accurate is the crowd?

- RCT, RCT, RCT → CENTRAL
- Reject, Reject, Reject → Bin
- Unsure → Resolver
- RCT, Reject → Resolver

5%
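The routing on this slide can be read as a small agreement rule: consecutive agreeing classifications send a record to CENTRAL or to the bin, while an Unsure or a disagreement sends it to a resolver. The sketch below is one possible reading of the diagram, not the project's actual implementation; the function name and the `needed=3` threshold are assumptions drawn from the three repeated labels.

```python
VALID_LABELS = {"RCT", "Reject", "Unsure"}


def route_record(classifications, needed=3):
    """Route one record given the crowd classifications received so far.

    Returns "CENTRAL", "Bin" or "Resolver", or None if more
    classifications are still needed before a decision can be made.
    """
    unknown = set(classifications) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # Any Unsure, or any RCT/Reject disagreement, goes to a resolver.
    if "Unsure" in classifications:
        return "Resolver"
    if "RCT" in classifications and "Reject" in classifications:
        return "Resolver"
    # All labels agree at this point; decide once enough have arrived.
    if len(classifications) >= needed:
        return "CENTRAL" if classifications[0] == "RCT" else "Bin"
    return None
```

For example, `route_record(["RCT", "RCT"])` returns `None` because a third agreeing classification is still needed under the assumed threshold.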

Page 19: Role of crowdsourcing

Crowd accuracy

Index test: the crowd. Reference standard: the Info specialist(s).

Validation 1: TP 1565, FP 9, FN 2, TN 2888. Sensitivity: 99.9%. Specificity: 99.7%.

Validation 2: TP 415, FP 5, FN 1, TN 2649. Sensitivity: 99.8%. Specificity: 99.8%.

Validation designs:
- Enriched sample; blinded to crowd decision; dual independent screeners as reference standard
- Enriched sample; blinded to crowd decision; single independent expert screener (me!) as reference standard; possibility of incorporation bias

Individual screener accuracy is also carefully monitored
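The sensitivity and specificity figures on this slide follow from the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A minimal check using the counts reported for the two validations (the function and variable names are illustrative only):

```python
def sensitivity(tp, fn):
    """Proportion of true RCTs that the crowd correctly identified."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of non-RCTs that the crowd correctly rejected."""
    return tn / (tn + fp)


# Counts as reported on the slide for each validation.
validations = {
    "Validation 1": {"tp": 1565, "fp": 9, "fn": 2, "tn": 2888},
    "Validation 2": {"tp": 415, "fp": 5, "fn": 1, "tn": 2649},
}

results = {
    name: (
        round(100 * sensitivity(c["tp"], c["fn"]), 1),
        round(100 * specificity(c["tn"], c["fp"]), 1),
    )
    for name, c in validations.items()
}

for name, (sens, spec) in results.items():
    print(f"{name}: sensitivity {sens}%, specificity {spec}%")
```

Rounded to one decimal place, these reproduce the percentages shown on the slide.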

Page 20: Role of crowdsourcing

How fast is the crowd?

Length of time to screen one month’s worth of records (number of weeks):
- Jan 2014: 6 weeks
- Jul 2014: 5 weeks
- Jan 2015: 2 weeks

More screeners and more screeners screening more quickly

Page 21: Role of crowdsourcing

More of the same, and more tasks

As the crowd becomes more efficient, we plan to do two things:
1. Increase the databases we search – feed in more citations
2. Offer other ‘micro-tasks’

Feed in more citations – from other databases

[Diagram labels: Screen; Y; N; Annotate, appraise; Bin]

And in these tasks the machine plays a vital and complementary role…

e.g. is the healthcare condition Alzheimer’s disease? Y, N, Unsure

Page 22: Role of crowdsourcing

Perfect partnership

Machine driven probability + Collective human decision-making

It’s not one or the other; the ideal is both
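One way to picture the partnership is a machine score used to order the crowd's queue, with the crowd still making the decisions. The sketch below is hypothetical: the `score` field, the 0.1 threshold and the idea of setting low-scoring records aside are assumptions for illustration, not a description of how the project combines the two.

```python
def split_by_machine_score(records, threshold=0.1):
    """Split records into a crowd queue and a low-priority pile.

    Each record is a dict with a "score" key holding a machine-estimated
    probability (0 to 1) that the record reports an RCT. Records at or
    above the threshold are queued for the crowd, most likely first.
    """
    for_crowd = sorted(
        (r for r in records if r["score"] >= threshold),
        key=lambda r: r["score"],
        reverse=True,
    )
    low_priority = [r for r in records if r["score"] < threshold]
    return for_crowd, low_priority


# Hypothetical records with made-up machine scores.
records = [
    {"id": "a", "score": 0.40},
    {"id": "b", "score": 0.02},
    {"id": "c", "score": 0.95},
]
for_crowd, low_priority = split_by_machine_score(records)
```

Here the crowd would see record "c" before "a", and "b" would be set aside for whatever handling the low-priority pile receives.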

Page 23: Role of crowdsourcing

In summary

Crowdsourcing:

• Effective method in large scale study identification
• Identify more studies, more quickly
• No compromise on quality or accuracy
• Offers meaningful ways to contribute
• Feasible to recruit a crowd
• Highly functional tool
• Complements data and text mining

And enables the move towards the living review