
Page 1: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems

© author(s) of these slides including research results from the KOM research network and TU Darmstadt; otherwise it is specified at the respective slide

21-Sep-13

Prof. Dr.-Ing. Ralf Steinmetz

KOM - Multimedia Communications Lab

Rensing_Crowdsourcing_ECTELmeetsECSCW__2013.pptx

Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems

Mojisola Erdt

Florian Jomrich

Katja Schüler

Christoph Rensing

Source: http://www.digitalvisitor.com/cultural-differences-in-online-behaviour-and-customer-reviews/

Page 2: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems

KOM – Multimedia Communications Lab 2

Motivation & Context

Workplace Learning
- To solve a particular work-related problem or to prepare for a new task
- Informal: communities of practice, using resources from the Web or the company's intranet, or resources created by employees
- Train and practice the necessary competences in vocational training

Learning Application
- Social tagging application (see Proceedings of EC-TEL 2011)
- Helps to structure resources
- Helps to retrieve resources
- Reflection

(TEL) Recommender Systems
- Recommend relevant resources, tags or users

Page 3: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Quality Measures for a User Experiment

Quality measures for recommender systems in TEL:

Relevance: “Relevance refers to the ability of a RS (Recommender System) to provide items that fit the user’s preferences” [Epifania 2012].

Novelty: “Novelty (or discovery) is the extent to which users receive new and interesting recommendations” [Pu 2011].

Diversity: “Diversity measures the diversity level of items in the recommendation list” [Pu 2011].
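Diversity as defined above is usually operationalized as an intra-list metric. As one illustration (not the measurement used in this study), the average pairwise Jaccard distance over the tag sets of the recommended resources can serve as a diversity score; the resources and tags below are made up:

```python
from itertools import combinations

def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|: 0 for identical tag sets, 1 for disjoint ones."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def intra_list_diversity(item_tags: list) -> float:
    """Average pairwise Jaccard distance over one recommendation list."""
    pairs = list(combinations(item_tags, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

# Three recommended resources, each described by its tags (illustrative data).
recs = [{"climate", "co2"}, {"climate", "policy"}, {"ocean", "ice"}]
print(round(intra_list_diversity(recs), 3))  # → 0.889
```

A list of near-identical resources would score close to 0; the third, unrelated resource pushes the score towards 1.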

Page 4: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Evaluation Methods for TEL Recommender Systems

| Evaluation Approach | Method                                            | Advantages                      | Disadvantages                                            |
| Offline Evaluation  | Cross-validation using historical datasets        | Fast; less effort; repeatable   | New, unknown resources cannot be evaluated               |
| Offline Evaluation  | Cross-validation using synthetically generated data | For testing and analysing     | Might be biased towards an algorithm; artificial scenarios |
| Online Evaluation   | User experiments (lab & field)                    | User’s perspective              | A lot of effort and time; few users                      |
| Online Evaluation   | Real-life testing                                 | Real-life setting               | Needs a substantial number of users                      |
| Online Evaluation   | Crowdsourcing                                     | Fast; less effort; enough users | Spamming                                                 |
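A minimal sketch of what offline evaluation with historical data means, with made-up interactions and a simple popularity recommender standing in for a real algorithm: hide each user's last interaction, recommend from the remaining data, and count hits.

```python
from collections import Counter

# Toy historical dataset: user -> resources they bookmarked (illustrative).
history = {
    "u1": ["r1", "r2", "r3"],
    "u2": ["r1", "r3", "r4"],
    "u3": ["r2", "r3", "r5"],
}

def evaluate_hit_rate(history: dict, k: int = 2) -> float:
    """Leave-one-out offline evaluation: hide each user's last resource,
    recommend the k most popular unseen resources, count hits."""
    hits = 0
    for user, items in history.items():
        held_out, seen = items[-1], set(items[:-1])
        counts = Counter()
        for other, other_items in history.items():
            if other != user:
                counts.update(other_items)
        recs = [r for r, _ in counts.most_common() if r not in seen][:k]
        hits += held_out in recs
    return hits / len(history)

print(evaluate_hit_rate(history))
```

The table's first drawback shows up directly here: a brand-new resource never appears in `history`, so it can never be recommended or evaluated.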

Page 5: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Experiment

Generate a basis graph structure (extended folksonomy)
- 5 experts researched the topic of climate change for one hour
- Used CROKODIL to create an extended folksonomy
- About 70 resources were attached to 8 activities

Generate recommendations
- Recommendations from the AScore and FolkRank algorithms (see Proceedings of EC-TEL 2012)

Three hypotheses:
1. AScore gives more relevant resources than the baseline (FolkRank)
2. AScore gives more novel resources than the baseline (FolkRank)
3. AScore gives more diverse resources than the baseline (FolkRank)

Create a questionnaire
- 10 questions per recommendation: 3 questions for each hypothesis, plus 1 control question to detect spammers
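A classic folksonomy is a set of (user, tag, resource) assignments; the extension used here additionally attaches resources to learning activities, roughly following CROKODIL's activity concept. A sketch of that structure with made-up names:

```python
from collections import defaultdict

# Classic folksonomy: (user, tag, resource) tag assignments (illustrative).
assignments = [
    ("expert1", "emissions", "r_ipcc"),
    ("expert2", "emissions", "r_unfccc"),
    ("expert1", "sea-level", "r_ipcc"),
]

# Extension: resources are additionally attached to learning activities
# (activity names here are hypothetical, not taken from the study).
activity_of = {
    "r_ipcc": "act_research_causes",
    "r_unfccc": "act_research_policy",
}

# Index resources by tag -- shared tags are the kind of link that
# graph-based recommenders such as FolkRank exploit.
by_tag = defaultdict(set)
for user, tag, resource in assignments:
    by_tag[tag].add(resource)

print(sorted(by_tag["emissions"]))  # → ['r_ipcc', 'r_unfccc']
```

The activity attachment is the extra signal an activity-aware algorithm like AScore can use beyond the plain tag graph.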

Page 6: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Crowdsourcing Evaluation Concept

Jobs on 2 crowdsourcing platforms:
- Microworkers.com: active users and good service quality
- CrowdFlower.com: gives access to other platforms (e.g. Amazon MTurk, with a large number of crowdworkers)

Page 7: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Overview of Participants in the Evaluation

Jobs on 2 crowdsourcing platforms:
- Microworkers.com (active users and good service quality)
- CrowdFlower.com (gives access to other platforms, e.g. Amazon MTurk, with a large number of crowdworkers)

125 completed questionnaires were evaluated

Page 8: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Summary: Results

| Hypothesis         | p-value            | Result        |
| Hyp. 1 Relevance   | p = 0.0065 < 0.05  | supported     |
| Hyp. 2 Novelty     | p = 0.0042 < 0.05  | supported     |
| Hyp. 3 Diversity   | p = 0.677 > 0.05   | not supported |
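The slides do not state which statistical test produced these p-values. As one plausible sketch, an exact one-sided sign test over per-participant preferences between AScore and the baseline could be computed as follows; the win/loss counts are entirely made up:

```python
from math import comb

def sign_test_p(wins: int, losses: int) -> float:
    """One-sided exact sign test: probability of observing at least `wins`
    successes in wins + losses fair coin flips (ties dropped beforehand)."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

# Hypothetical counts: participants who rated AScore above / below FolkRank.
p = sign_test_p(wins=18, losses=7)
print(p < 0.05)  # → True
```

With 18 of 25 participants preferring one algorithm, the null hypothesis of no difference would be rejected at the usual 0.05 level.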

Page 9: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Conclusion & Future Work

Crowdsourcing can be used as an evaluation method for TEL recommender systems

Further analysis of the collected data
- Comparing results between crowdworkers and non-crowdworkers

Improve the crowdsourcing evaluation concept
- Apply a more complex evaluation
- Apply it to further scenarios

Page 10: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Workshop-related Research Questions

How does a recommender have to be designed to support resource-based knowledge acquisition at the workplace?
- Are relevance and novelty valid quality measures?

How does a recommender have to be designed to support resource-based learning?
- Which are the valid quality measures?
- Aggregation of relevance, novelty, diversity, … ?

How can the recommendation be adapted to the current needs of a learner?
- What do we know about the learner? → Learning Analytics

How can we support teachers to train and practice self-regulated learning competences in vocational training?

Page 11: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Questions & Contact

Page 12: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Questions for Hypotheses 1–3

Hypothesis 1: Relevance
Q1. The given internet resource supports me very well in my research about the topic.
Q2. If I could only use this resource, my research would still be very successful.
Q3. Without this resource, just by using my own resources, my research about the given topic would still be very good.

Hypothesis 2: Novelty
Q4. The internet resource gives me new insights and/or information for my task.
Q5. I would have found this resource on my own / anyway / during my research.
Q6. There are many important aspects of the topic described in this resource that are lacking in other resources.

Hypothesis 3: Diversity
Q7. This internet resource differs strongly from my other resources.
Q8. This resource informs me comprehensively about my topic.
Q9. This resource covers the whole spectrum of research about the given topic.
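Note that Q3 and Q5 are negatively phrased: agreement counts against relevance and novelty respectively, so they must be reverse-coded before aggregating. A hypothetical scoring sketch on a 5-point Likert scale (the study's actual aggregation is not given in the slides):

```python
# Likert answers 1 (disagree) .. 5 (agree) for Q1-Q9 (made-up data).
answers = {"Q1": 5, "Q2": 4, "Q3": 2, "Q4": 5, "Q5": 1, "Q6": 4,
           "Q7": 4, "Q8": 3, "Q9": 3}

# Q3 and Q5 are negatively phrased, so invert them (assumed scheme).
REVERSED = {"Q3", "Q5"}
HYPOTHESES = {"relevance": ["Q1", "Q2", "Q3"],
              "novelty":   ["Q4", "Q5", "Q6"],
              "diversity": ["Q7", "Q8", "Q9"]}

def score(answers: dict) -> dict:
    """Mean per-hypothesis score after reverse-coding negative items."""
    adjusted = {q: (6 - v if q in REVERSED else v) for q, v in answers.items()}
    return {h: sum(adjusted[q] for q in qs) / len(qs)
            for h, qs in HYPOTHESES.items()}

print(score(answers))
```

Without the reverse-coding, a participant who found the resource highly relevant but agreed with Q3's phrasing would wrongly pull the relevance score up.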

Page 13: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems


Control Questions in Questionnaire

Q10a. How many pictures and tables that are relevant to the given research topic does the given resource contain?
Q10b. Give a short summary of the recommended resource above by giving 4 keywords describing its content.
Q10c. Describe the content of the given resource in two sentences.
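Control questions like these only detect spammers if the answers are actually checked. A heuristic sketch of such a check, with illustrative thresholds rather than the ones used in the study:

```python
def looks_like_spam(answer: dict) -> bool:
    """Flag a submission whose control answers suggest the worker did not
    open the resource. Field names and thresholds are hypothetical."""
    keywords = answer.get("q10b_keywords", [])
    summary = answer.get("q10c_summary", "").strip()
    if len(keywords) < 4:           # Q10b asks for 4 keywords
        return True
    if len(summary.split()) < 6:    # Q10c asks for two sentences
        return True
    return False

print(looks_like_spam({"q10b_keywords": ["climate"], "q10c_summary": "ok"}))
# → True
```

Free-text answers such as Q10c also allow a manual plausibility check against the resource's actual content, which a purely numeric control question (Q10a) would not.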



Page 17: Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Systems

Crowdworkers: Age distribution (chart; image not preserved in this text export)