
Page 1: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems

© author(s) of these slides including research results from the KOM research network and TU Darmstadt; otherwise it is specified at the respective slide

21-Sep-13

Prof. Dr.-Ing. Ralf Steinmetz

KOM - Multimedia Communications Lab

Rensing_Crowdsourcing_ECTELmeetsECSCW__2013.pptx

Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems

Mojisola Erdt

Florian Jomrich

Katja Schüler

Christoph Rensing

Source: http://www.digitalvisitor.com/cultural-differences-in-online-behaviour-and-customer-reviews/

Page 2: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems

KOM – Multimedia Communications Lab 2

Motivation & Context

Workplace Learning
- To solve a particular work-related problem or to prepare for a new task
- Informal: communities of practice, using resources from the Web or the company's intranet, or resources created by employees
- Train and practice the necessary competences in vocational training

Learning Application
- Social tagging application (see Proceedings of EC-TEL 2011)
- Helps to structure resources
- Helps to retrieve resources
- Reflection

(TEL) Recommender Systems
- Recommend relevant resources, tags or users

Page 3: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Quality Measures for a User Experiment

Quality measures for recommender systems in TEL:

Relevance: “Relevance refers to the ability of a RS (Recommender System) to provide items that fit the user’s preferences” [Epifania 2012].

Novelty: “Novelty (or discovery) is the extent to which users receive new and interesting recommendations” [Pu 2011].

Diversity: “Diversity measures the diversity level of items in the recommendation list” [Pu 2011].
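Diversity as defined above is usually operationalized as an intra-list metric. As one illustration (not the measurement used in this study), the average pairwise Jaccard distance over the tag sets of the recommended resources can serve as a diversity score; the resources and tags below are made up:

```python
from itertools import combinations

def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|: 0 for identical tag sets, 1 for disjoint ones."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def intra_list_diversity(item_tags: list) -> float:
    """Average pairwise Jaccard distance over one recommendation list."""
    pairs = list(combinations(item_tags, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

# Three recommended resources, each described by its tags (illustrative data).
recs = [{"climate", "co2"}, {"climate", "policy"}, {"ocean", "ice"}]
print(round(intra_list_diversity(recs), 3))  # → 0.889
```

A list of near-identical resources would score close to 0; the third, unrelated resource pushes the score towards 1.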

Page 4: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Evaluation Methods for TEL Recommender Systems

| Evaluation Approach | Method                                            | Advantages                      | Disadvantages                                            |
| Offline Evaluation  | Cross-validation using historical datasets        | Fast; less effort; repeatable   | New, unknown resources cannot be evaluated               |
| Offline Evaluation  | Cross-validation using synthetically generated data | For testing and analysing     | Might be biased towards an algorithm; artificial scenarios |
| Online Evaluation   | User experiments (lab & field)                    | User’s perspective              | A lot of effort and time; few users                      |
| Online Evaluation   | Real-life testing                                 | Real-life setting               | Needs a substantial number of users                      |
| Online Evaluation   | Crowdsourcing                                     | Fast; less effort; enough users | Spamming                                                 |
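A minimal sketch of what offline evaluation with historical data means, with made-up interactions and a simple popularity recommender standing in for a real algorithm: hide each user's last interaction, recommend from the remaining data, and count hits.

```python
from collections import Counter

# Toy historical dataset: user -> resources they bookmarked (illustrative).
history = {
    "u1": ["r1", "r2", "r3"],
    "u2": ["r1", "r3", "r4"],
    "u3": ["r2", "r3", "r5"],
}

def evaluate_hit_rate(history: dict, k: int = 2) -> float:
    """Leave-one-out offline evaluation: hide each user's last resource,
    recommend the k most popular unseen resources, count hits."""
    hits = 0
    for user, items in history.items():
        held_out, seen = items[-1], set(items[:-1])
        counts = Counter()
        for other, other_items in history.items():
            if other != user:
                counts.update(other_items)
        recs = [r for r, _ in counts.most_common() if r not in seen][:k]
        hits += held_out in recs
    return hits / len(history)

print(evaluate_hit_rate(history))
```

The table's first drawback shows up directly here: a brand-new resource never appears in `history`, so it can never be recommended or evaluated.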

Page 5: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Experiment

Generate a basis graph structure (extended folksonomy)
- 5 experts researched the topic of climate change for one hour
- Used CROKODIL to create an extended folksonomy
- About 70 resources were attached to 8 activities

Generate recommendations
- Recommendations from the AScore and FolkRank algorithms (see Proceedings of EC-TEL 2012)

Three hypotheses:
1. AScore gives more relevant resources than the baseline (FolkRank)
2. AScore gives more novel resources than the baseline (FolkRank)
3. AScore gives more diverse resources than the baseline (FolkRank)

Create a questionnaire
- 10 questions per recommendation: 3 questions for each hypothesis, plus 1 control question to detect spammers
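A classic folksonomy is a set of (user, tag, resource) assignments; the extension used here additionally attaches resources to learning activities, roughly following CROKODIL's activity concept. A sketch of that structure with made-up names:

```python
from collections import defaultdict

# Classic folksonomy: (user, tag, resource) tag assignments (illustrative).
assignments = [
    ("expert1", "emissions", "r_ipcc"),
    ("expert2", "emissions", "r_unfccc"),
    ("expert1", "sea-level", "r_ipcc"),
]

# Extension: resources are additionally attached to learning activities
# (activity names here are hypothetical, not taken from the study).
activity_of = {
    "r_ipcc": "act_research_causes",
    "r_unfccc": "act_research_policy",
}

# Index resources by tag -- shared tags are the kind of link that
# graph-based recommenders such as FolkRank exploit.
by_tag = defaultdict(set)
for user, tag, resource in assignments:
    by_tag[tag].add(resource)

print(sorted(by_tag["emissions"]))  # → ['r_ipcc', 'r_unfccc']
```

The activity attachment is the extra signal an activity-aware algorithm like AScore can use beyond the plain tag graph.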

Page 6: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Crowdsourcing Evaluation Concept

Jobs on 2 crowdsourcing platforms:
- Microworkers.com: active users and good service quality
- CrowdFlower.com: gives access to other platforms (e.g. Amazon MTurk, with a large number of crowdworkers)

Page 7: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Overview of Participants in the Evaluation

Jobs on 2 crowdsourcing platforms:
- Microworkers.com (active users and good service quality)
- CrowdFlower.com (gives access to other platforms, e.g. Amazon MTurk, with a large number of crowdworkers)

125 completed questionnaires were evaluated

Page 8: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Summary: Results

| Hypothesis         | p-value            | Result        |
| Hyp. 1 Relevance   | p = 0.0065 < 0.05  | supported     |
| Hyp. 2 Novelty     | p = 0.0042 < 0.05  | supported     |
| Hyp. 3 Diversity   | p = 0.677 > 0.05   | not supported |
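The slides do not state which statistical test produced these p-values. As one plausible sketch, an exact one-sided sign test over per-participant preferences between AScore and the baseline could be computed as follows; the win/loss counts are entirely made up:

```python
from math import comb

def sign_test_p(wins: int, losses: int) -> float:
    """One-sided exact sign test: probability of observing at least `wins`
    successes in wins + losses fair coin flips (ties dropped beforehand)."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

# Hypothetical counts: participants who rated AScore above / below FolkRank.
p = sign_test_p(wins=18, losses=7)
print(p < 0.05)  # → True
```

With 18 of 25 participants preferring one algorithm, the null hypothesis of no difference would be rejected at the usual 0.05 level.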

Page 9: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Conclusion & Future Work

Crowdsourcing can be used as an evaluation method for TEL recommender systems

Further analysis of the collected data
- Comparing results between crowdworkers and non-crowdworkers

Improve the crowdsourcing evaluation concept
- Apply a more complex evaluation
- Apply it to further scenarios

Page 10: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Workshop-related Research Questions

How does a recommender have to be designed to support resource-based knowledge acquisition at the workplace?
- Are relevance and novelty valid quality measures?

How does a recommender have to be designed to support resource-based learning?
- Which are the valid quality measures?
- Aggregation of relevance, novelty, diversity, … ?

How can the recommendation be adapted to the current needs of a learner?
- What do we know about the learner? → Learning Analytics

How can we support teachers to train and practice self-regulated learning competences in vocational training?

Page 11: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Questions & Contact

Page 12: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Questions for Hypotheses 1–3

Hypothesis 1: Relevance
Q1. The given internet resource supports me very well in my research about the topic.
Q2. If I could only use this resource, my research would still be very successful.
Q3. Without this resource, just by using my own resources, my research about the given topic would still be very good.

Hypothesis 2: Novelty
Q4. The internet resource gives me new insights and/or information for my task.
Q5. I would have found this resource on my own / anyway / during my research.
Q6. There are many important aspects of the topic described in this resource that are lacking in other resources.

Hypothesis 3: Diversity
Q7. This internet resource differs strongly from my other resources.
Q8. This resource informs me comprehensively about my topic.
Q9. This resource covers the whole spectrum of research about the given topic.
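Note that Q3 and Q5 are negatively phrased: agreement counts against relevance and novelty respectively, so they must be reverse-coded before aggregating. A hypothetical scoring sketch on a 5-point Likert scale (the study's actual aggregation is not given in the slides):

```python
# Likert answers 1 (disagree) .. 5 (agree) for Q1-Q9 (made-up data).
answers = {"Q1": 5, "Q2": 4, "Q3": 2, "Q4": 5, "Q5": 1, "Q6": 4,
           "Q7": 4, "Q8": 3, "Q9": 3}

# Q3 and Q5 are negatively phrased, so invert them (assumed scheme).
REVERSED = {"Q3", "Q5"}
HYPOTHESES = {"relevance": ["Q1", "Q2", "Q3"],
              "novelty":   ["Q4", "Q5", "Q6"],
              "diversity": ["Q7", "Q8", "Q9"]}

def score(answers: dict) -> dict:
    """Mean per-hypothesis score after reverse-coding negative items."""
    adjusted = {q: (6 - v if q in REVERSED else v) for q, v in answers.items()}
    return {h: sum(adjusted[q] for q in qs) / len(qs)
            for h, qs in HYPOTHESES.items()}

print(score(answers))
```

Without the reverse-coding, a participant who found the resource highly relevant but agreed with Q3's phrasing would wrongly pull the relevance score up.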

Page 13: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Control Questions in Questionnaire

Q10a. How many pictures and tables that are relevant to the given research topic does the given resource contain?
Q10b. Give a short summary of the recommended resource above by giving 4 keywords describing its content.
Q10c. Describe the content of the given resource in two sentences.
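Control questions like these only detect spammers if the answers are actually checked. A heuristic sketch of such a check, with illustrative thresholds rather than the ones used in the study:

```python
def looks_like_spam(answer: dict) -> bool:
    """Flag a submission whose control answers suggest the worker did not
    open the resource. Field names and thresholds are hypothetical."""
    keywords = answer.get("q10b_keywords", [])
    summary = answer.get("q10c_summary", "").strip()
    if len(keywords) < 4:           # Q10b asks for 4 keywords
        return True
    if len(summary.split()) < 6:    # Q10c asks for two sentences
        return True
    return False

print(looks_like_spam({"q10b_keywords": ["climate"], "q10c_summary": "ok"}))
# → True
```

Free-text answers such as Q10c also allow a manual plausibility check against the resource's actual content, which a purely numeric control question (Q10a) would not.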



Page 17: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems

Crowdworkers: Age distribution (chart; image not preserved in this text export)