© Author(s) of these slides, including research results from the KOM research network and TU Darmstadt; otherwise it is specified at the respective slide. 28-Dec-14
Prof. Dr.-Ing. Ralf Steinmetz KOM - Multimedia Communications Lab
Evaluating Recommender Algorithms for Learning using Crowdsourcing
Mojisola Erdt, Christoph Rensing
ICALT 2014, Athens
Source: http://www.digitalvisitor.com/cultural-differences-in-online-behaviour-and-customer-reviews/
Motivation
Learning on-the-job
§ To solve a particular problem
§ To learn about a new topic
§ Mostly web resources

Social Tagging Applications
§ Help to manage resources
§ Offer recommendations

TEL Recommender Systems
§ Recommend relevant, novel and diverse resources for a specific learning goal or activity
Evaluation Methods for TEL Recommender Systems

Offline Experiments (historical or synthetic datasets)
§ Advantages: fast, less effort, repeatable
§ Disadvantages: new, unknown resources cannot be evaluated; dependent on the dataset

User Experiments
§ Advantages: user's perspective
§ Disadvantages: a lot of effort and time; few users (ca. 40)

Real-life Testing
§ Advantages: real-life setting
§ Disadvantages: needs a substantial amount of users

Crowdsourcing
§ Advantages: fast, less effort, repeatable, user's perspective, sufficient users
§ Disadvantages: unknown users, "artificial task", spamming
Crowdsourcing Platforms

microworkers (https://microworkers.com)
§ 500,000 crowdworkers worldwide
§ Flexible forwarding to other hosting platforms
§ Since 2009

CrowdFlower (http://www.crowdflower.com)
§ 5 million crowdworkers in 208 countries
§ Gives access to other crowdsourcing platforms, e.g. Amazon MTurk
§ Since 2007
Overview
§ Motivation
§ Crowdsourcing Evaluation Concept
  § Preparation Step
  § Execution Step
§ Crowdsourcing Evaluation Results
§ Conclusion & Future Work
Crowdsourcing Evaluation Concept: Preparation Step

(Workflow diagram) Set Goal → Formulate Hypotheses → Select Topic → Create Activity Hierarchy → Create Seed Dataset → Prepare Algorithms → Generate Recommendations → Filter Duplicates → Create Questionnaire (Create Questions, Add Control Questions)
DeLFI 2013. M. Migenda, M. Erdt, M. Gutjahr, and C. Rensing
Preparation Step: Set Goal

AScore is based on Activity Hierarchies
§ Extends FolkRank by considering activities, activity hierarchies, and the current activity of the learner
EC-TEL 2012. Anjorin et al.
(Figure: example activity hierarchy. Root activity "Understanding Climate Change"; further activities include "Understanding the Carbon Footprint", "Calculating the Carbon Footprint", "Investigate the impact of Climate Change", "Analyze potential Catastrophes due to Climate Change", "Investigate causes of Climate Change", "Give an overview on the history of Global Warming", and "Determine future prognoses on Climate Change".)
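The slides do not give the AScore formulas, so as a rough, non-authoritative illustration of the idea described above (a FolkRank-style personalized ranking over the folksonomy graph, biased towards the learner's current activity and its sub-activities), here is a minimal sketch. The graph contents, edge weights, and the boost parameters are assumptions, not the published algorithm; see EC-TEL 2012, Anjorin et al. for the actual definition.

```python
# Minimal sketch of a FolkRank/AScore-style ranking (an assumption, not the
# published algorithm). FolkRank runs a personalized PageRank over the
# folksonomy graph; AScore additionally considers the learner's current
# activity and the activity hierarchy.
import networkx as nx

# Toy extended folksonomy: tag assignments plus resource-activity
# attachments and one hierarchy edge between activities.
G = nx.Graph()
G.add_edges_from([
    ("user:alice", "tag:co2"),
    ("tag:co2", "res:ipcc-report"),
    ("user:alice", "res:ipcc-report"),
    ("res:ipcc-report", "act:investigate-causes"),
    ("act:investigate-causes", "act:understanding-climate-change"),  # hierarchy
])

def ascore_like(graph, current_activity, sub_activities, boost=5.0):
    """Rank resources with a preference vector biased towards the current
    activity and its sub-activities (hypothetical weighting)."""
    pref = {n: 1.0 for n in graph}          # base preference for every node
    pref[current_activity] = boost          # emphasize the current activity
    for act in sub_activities:
        pref[act] = boost / 2               # weaker emphasis on sub-activities
    rank = nx.pagerank(graph, alpha=0.85, personalization=pref)
    resources = [n for n in rank if n.startswith("res:")]
    return sorted(resources, key=rank.get, reverse=True)

print(ascore_like(G, "act:understanding-climate-change",
                  ["act:investigate-causes"]))
```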
Preparation Step: Set Goal and Formulate Hypotheses

Set Evaluation Goals:
§ Investigate whether AScore recommends more relevant, novel and diverse learning resources for a specified topic than FolkRank.
§ Investigate whether AScore recommends more relevant, novel and diverse learning resources for sub-activities (A_Sub) than for activities higher up in the hierarchy (A_Super).

Formulate Hypotheses:
1. Relevance: AScore vs. FolkRank; A_Sub vs. A_Super
2. Novelty: AScore vs. FolkRank; A_Sub vs. A_Super
3. Diversity: AScore vs. FolkRank; A_Sub vs. A_Super
Preparation Step: Select Topic and Generate Recommendations

Generate a base graph structure for recommendations
§ 5 experts researched the topic of climate change for one hour
§ They used CROKODIL to create an extended folksonomy (users, tags, resources, activities)
§ Ca. 70 resources were tagged and attached to 8 activities
(Figure: the same activity hierarchy on Climate Change, annotated with the activities used in Experiment Spring and Experiment Autumn.)
Preparation Step: Create Questionnaire

Conduct personal research on the topic
§ Level of knowledge on this topic
§ Request to find 5 online resources relevant to this topic

10 Questions per Recommendation
§ 3 questions for each hypothesis (relevance, novelty, diversity)
§ 1 control question to detect spammers, e.g. "Give 4 keywords to summarize the recommended resource"

General Questions
§ Age, gender, level of education, and nationality
Treatment conditions (identical in Experiment Spring and Experiment Autumn):
§ AScore: A_Sub (sub-activity), A_Super (super-activity)
§ FolkRank: F_Sub (sub-activity), F_Super (super-activity)
https://www.soscisurvey.de
Crowdsourcing Evaluation Concept: Execution Step

(Workflow diagram, one cycle per iteration burst) Release next iteration burst on the Crowdsourcing Platform → Questionnaire → Results → Filter Spammers → Make Payments
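The execution step above is an iterative loop. A minimal sketch of how it might be scripted follows; the platform object, its methods, and all field names are hypothetical, since the slides describe the steps but no API. The control-question check follows the questionnaire design from the preparation step (four keywords summarizing the recommended resource).

```python
# Sketch of the execution loop (all platform calls and field names are
# hypothetical; the slides describe the steps, not an API).
def looks_like_spam(response):
    """Hypothetical control-question check: the questionnaire asked workers
    to give 4 keywords summarizing the recommended resource."""
    keywords = [k.strip() for k in response["control_keywords"].split(",")]
    return len([k for k in keywords if k]) < 4   # too few keywords -> spam

def run_iteration_bursts(platform, questionnaire_url, n_bursts, burst_size):
    """Release bursts of tasks, collect results, filter spammers, pay."""
    accepted, rejected = [], []
    for _ in range(n_bursts):
        task_id = platform.release_tasks(questionnaire_url, count=burst_size)
        for response in platform.collect_results(task_id):
            if looks_like_spam(response):
                rejected.append(response)          # spammers are not paid
            else:
                accepted.append(response)
                platform.pay(response["worker_id"])
    return accepted, rejected
```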
Execution Step: Participants and Treatment Conditions

Experiment Spring
§ AScore: A_Sub: 45, A_Super: 39
§ FolkRank: F_Sub: 39, F_Super: 36

Experiment Autumn
§ AScore: A_Sub: 80, A_Super: 73
§ FolkRank: F_Sub: 76, F_Super: 85

(Charts: participant sources CrowdFlower (32), Microworkers (35), Volunteers (92), Crowdworkers (314); filtered spammers: 243 and 549.)
Overview
§ Motivation
§ Crowdsourcing Evaluation Concept
§ Crowdsourcing Evaluation Results
  § AScore and FolkRank: Experiment Spring, Experiment Autumn
  § A_Sub and A_Super: Experiment Spring, Experiment Autumn
§ Conclusion & Future Work
Crowdsourcing Evaluation Results: Experiment Spring (AScore vs. FolkRank)

Significance Tests (p-values):
§ Hypothesis 1 (Relevance): 0.000003578 < 0.05
§ Hypothesis 2 (Novelty): 0.000001531 < 0.05
§ Hypothesis 3 (Diversity): 0.0001618 < 0.05
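The slides report p-values but do not name the statistical test used. One common choice for comparing Likert-type ratings from two independent groups such as the AScore and FolkRank conditions is the Mann-Whitney U test; the sketch below uses made-up rating arrays purely for illustration.

```python
# Hypothetical illustration; the slides do not state which test was used.
# The Mann-Whitney U test is a common choice for ordinal (Likert) ratings
# from two independent participant groups.
from scipy.stats import mannwhitneyu

ascore_relevance   = [5, 6, 4, 5, 7, 6, 5]   # made-up 7-point-scale ratings
folkrank_relevance = [3, 4, 4, 2, 5, 3, 4]

stat, p = mannwhitneyu(ascore_relevance, folkrank_relevance,
                       alternative="greater")  # H1: AScore rated higher
print(f"U = {stat}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```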
Crowdsourcing Evaluation Results: Experiment Autumn (AScore vs. FolkRank)

Significance Tests (p-values):
§ Hypothesis 1 (Relevance): 0.000001362 < 0.05
§ Hypothesis 2 (Novelty): 0.0000007654 < 0.05
§ Hypothesis 3 (Diversity): 0.00000000015 < 0.05
Execution Step: Evaluation Results

Evaluation Goals:
§ Investigate whether AScore recommends more relevant, novel and diverse learning resources for a specified topic than FolkRank.
§ Investigate whether AScore recommends more relevant, novel and diverse learning resources for sub-activities (A_Sub) than for activities higher up in the hierarchy (A_Super).

Hypotheses:
1. Relevance: AScore vs. FolkRank; A_Sub vs. A_Super
2. Novelty: AScore vs. FolkRank; A_Sub vs. A_Super
3. Diversity: AScore vs. FolkRank; A_Sub vs. A_Super

(✔ on the original slide marks the comparisons confirmed so far: AScore vs. FolkRank for all three hypotheses.)
Crowdsourcing Evaluation Results: Experiment Spring (A_Sub vs. A_Super)

Significance Tests (p-values):
§ Hypothesis 1 (Relevance): 0.0005654 < 0.05
§ Hypothesis 2 (Novelty): 0.01666 < 0.05
§ Hypothesis 3 (Diversity): 0.02176 < 0.05
Crowdsourcing Evaluation Results: Experiment Autumn (A_Sub vs. A_Super)

Significance Tests (p-values):
§ Hypothesis 1 (Relevance): 0.0005306 < 0.05
§ Hypothesis 2 (Novelty): 0.000001531 < 0.05
§ Hypothesis 3 (Diversity): 0.0000001608 < 0.05
Crowdsourcing Evaluation Results: Experiment Spring (F_Sub vs. F_Super)

(Bar chart: aggregated mean values for Hypotheses 1, 2 and 3 on a 0-7 scale, legend F_Sub and F_Super; means 3.95, 4.05, 3.97, 3.91, 3.96, 3.83.)

Significance Tests (p-values):
§ Hypothesis 1 (Relevance): 0.3023 > 0.05
§ Hypothesis 2 (Novelty): 0.5216 > 0.05
§ Hypothesis 3 (Diversity): 0.2031 > 0.05
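The aggregated means in the chart above presumably average the three questionnaire items belonging to each hypothesis. A minimal sketch of such an aggregation, with hypothetical column names and made-up ratings, might look like this:

```python
# Hypothetical aggregation: the 3 questionnaire items per hypothesis are
# averaged into one score per participant (column names are made up).
import pandas as pd

responses = pd.DataFrame({
    "rel_q1": [5, 4], "rel_q2": [6, 4], "rel_q3": [5, 5],  # relevance items
    "nov_q1": [4, 3], "nov_q2": [5, 4], "nov_q3": [4, 4],  # novelty items
})

responses["relevance"] = responses[["rel_q1", "rel_q2", "rel_q3"]].mean(axis=1)
responses["novelty"]   = responses[["nov_q1", "nov_q2", "nov_q3"]].mean(axis=1)
print(responses[["relevance", "novelty"]].mean())  # aggregated group means
```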
Crowdsourcing Evaluation Results: Experiment Autumn (F_Sub vs. F_Super)

(Bar chart: aggregated mean values for Hypotheses 1, 2 and 3 on a 0-7 scale, legend F_Sub and F_Super; means 4.04, 3.9, 4.11, 4.09, 4.07, 4.01.)

Significance Tests (p-values):
§ Hypothesis 1 (Relevance): 0.01481 < 0.05
§ Hypothesis 2 (Novelty): 0.7064 > 0.05
§ Hypothesis 3 (Diversity): 0.2881 > 0.05
Execution Step: Evaluation Results

Evaluation Goals:
§ Investigate whether AScore recommends more relevant, novel and diverse learning resources for a specified topic than FolkRank.
§ Investigate whether AScore recommends more relevant, novel and diverse learning resources for sub-activities (A_Sub) than for activities higher up in the hierarchy (A_Super).

Hypotheses:
1. Relevance: AScore vs. FolkRank; A_Sub vs. A_Super
2. Novelty: AScore vs. FolkRank; A_Sub vs. A_Super
3. Diversity: AScore vs. FolkRank; A_Sub vs. A_Super

(✔ on the original slide marks the confirmed comparisons; per the reported p-values, AScore vs. FolkRank and A_Sub vs. A_Super were significant for all three hypotheses, while F_Sub vs. F_Super reached significance only for relevance in Experiment Autumn.)
Conclusion and Future Work

Crowdsourcing can be successfully applied to evaluate TEL recommender algorithms
§ Integrate more user-centric evaluations already during the design and development of TEL recommender algorithms
§ Select the best-fitting evaluation approach

Future Work
§ Can crowdsourcing be used to evaluate other aspects of a recommender system, e.g. explanations or presentation?
§ Can more complex TEL evaluation tasks be evaluated with crowdsourcing?